AWS re:Invent 2025 - Troubleshooting database performance issues at scale (COP331)

Troubleshooting Database Performance Issues at Scale

Identifying Database Performance Challenges

Customers often see slow application services but cannot tell whether the database is the root cause

A lack of application context, combined with limited visibility across a diverse database fleet, makes problems difficult to pinpoint

Reliance on multiple specialized tools to monitor different database engines adds complexity

Fleetwide Database Observability

Unified monitoring across accounts and regions provides a high-level view of database fleet performance

Ability to save custom views (e.g., "Retail Product Application") to quickly identify problem areas

Detailed metrics on load, CPU, memory, disk, and network I/O for each database instance

Visibility into events such as restarts and failures, along with their severity levels

Integration with application performance monitoring to correlate database issues with end-to-end transaction data
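The fleet-wide view boils down to two operations: filtering instances by a saved view and ranking what remains by load. The sketch below is hypothetical and does not use any AWS API. It assumes per-instance metrics have already been collected into plain records, that a saved view is a tag filter (the `application` tag is illustrative), and that "high load" means DB load, measured in average active sessions, exceeding the instance's vCPU count, which is a common rule of thumb rather than a fixed threshold.

```python
from dataclasses import dataclass, field


@dataclass
class InstanceMetrics:
    account: str
    region: str
    instance_id: str
    engine: str
    db_load: float  # average active sessions over the observation window
    vcpus: int
    tags: dict[str, str] = field(default_factory=dict)


def saved_view(fleet, **tag_filter):
    """Hypothetical saved view: keep instances whose tags match every key/value given."""
    return [m for m in fleet if all(m.tags.get(k) == v for k, v in tag_filter.items())]


def hotspots(instances, threshold=1.0):
    """Instances whose load exceeds threshold * vCPUs, most overloaded first (assumed heuristic)."""
    flagged = [m for m in instances if m.db_load > threshold * m.vcpus]
    return sorted(flagged, key=lambda m: m.db_load / m.vcpus, reverse=True)


# Illustrative fleet spanning two accounts and three regions (placeholder data).
fleet = [
    InstanceMetrics("111111111111", "us-east-1", "orders-db", "aurora-postgresql", 12.0, 4,
                    {"application": "retail-product"}),
    InstanceMetrics("111111111111", "us-west-2", "catalog-db", "aurora-mysql", 1.5, 8,
                    {"application": "retail-product"}),
    InstanceMetrics("222222222222", "eu-west-1", "hr-db", "sqlserver-se", 6.0, 2,
                    {"application": "hr"}),
]

for m in hotspots(saved_view(fleet, application="retail-product")):
    print(m.account, m.region, m.instance_id, f"load/vCPU={m.db_load / m.vcpus:.1f}")
```

Filtering before ranking keeps the view focused on one application, so an overloaded instance that belongs to another team does not crowd the list.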

Analyzing Database Instance Performance

Drill down into specific database instances to investigate high load or performance degradation

Identify problematic queries consuming excessive resources, such as a "select * from orders" query running continuously

Trace query execution back to the originating application, user, and host to understand the root cause

Slice performance data by dimensions such as host, user, and application to isolate the issue
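The drill-down described above can be modeled as counting sampled active sessions per dimension value. This is a minimal sketch under stated assumptions: a hypothetical sampler records one snapshot of every active session per second, so dividing the sample count by the window length gives average active sessions. The `SessionSample` fields and the sample data are illustrative only.

```python
from collections import Counter
from typing import NamedTuple


class SessionSample(NamedTuple):
    """One active session captured by a hypothetical once-per-second sampler."""
    second: int
    sql: str
    user: str
    host: str
    application: str


def load_by(samples, dimension, window_seconds):
    """Average active sessions per value of `dimension`, highest load first."""
    counts = Counter(getattr(s, dimension) for s in samples)
    return {value: n / window_seconds for value, n in counts.most_common()}


samples = [
    SessionSample(0, "select * from orders", "report_user", "batch-01", "reporting"),
    SessionSample(0, "select * from orders", "report_user", "batch-01", "reporting"),
    SessionSample(0, "update inventory set qty = qty - 1 where sku = ?", "app_user", "web-07", "retail-product"),
    SessionSample(1, "select * from orders", "report_user", "batch-01", "reporting"),
]

# Step 1: which SQL statements carry the load?
print(load_by(samples, "sql", window_seconds=2))

# Step 2: drill into the top statement and slice it by originating host and user.
top = [s for s in samples if s.sql == "select * from orders"]
print(load_by(top, "host", 2), load_by(top, "user", 2))
```

Because every dimension is just a field on the same sample, switching from "which query" to "which host" or "which application" is a change of argument, not a new data source.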

Troubleshooting Locking and Concurrency Issues

Use database lock analysis to identify contention on popular (heavily accessed) records that causes performance problems

Visualize the lock tree to understand blocking relationships and wait times

Pinpoint specific locked objects, blocking sessions, and locking patterns to resolve concurrency issues
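A lock tree can be built from wait-for pairs: each edge says one session is waiting on a lock held by another. The sketch assumes those pairs have already been collected (in PostgreSQL, for example, from `pg_locks` or `pg_blocking_pids()`); the tuple layout and session names are hypothetical. Root blockers are sessions that block others but are not waiting themselves, which is usually where a fix starts.

```python
from collections import defaultdict


def build_lock_tree(waits):
    """waits: iterable of (waiter, blocker, wait_seconds, locked_object) tuples (assumed layout)."""
    children = defaultdict(list)
    waiters = set()
    for waiter, blocker, secs, obj in waits:
        children[blocker].append((waiter, secs, obj))
        waiters.add(waiter)
    # Root blockers hold locks others need but are not waiting on anyone.
    roots = [b for b in children if b not in waiters]
    return roots, children


def render(roots, children):
    """Indented text view of the blocking hierarchy under each root blocker."""
    lines = []

    def walk(session, depth, seen):
        for waiter, secs, obj in children.get(session, []):
            lines.append(f"{'  ' * (depth + 1)}{waiter} waits {secs}s on {obj}")
            if waiter not in seen:  # guard against cycles (deadlocks)
                walk(waiter, depth + 1, seen | {waiter})

    for root in roots:
        lines.append(f"{root} (root blocker)")
        walk(root, 0, {root})
    return lines


waits = [
    ("s2", "s1", 40, "orders row id=42"),
    ("s3", "s1", 35, "orders row id=42"),
    ("s4", "s3", 10, "inventory row sku=A1"),
]
roots, children = build_lock_tree(waits)
print("\n".join(render(roots, children)))
```

Here `s1` is the only root: resolving its transaction would release `s2` and `s3`, and `s4` would then proceed once `s3` finishes. A pure deadlock cycle has no root, so it would show up as an empty list of roots rather than a tree.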

Optimizing Query Execution Plans

Detect changes in query execution plans that may have caused performance degradation

Compare efficient and inefficient plans to understand differences in index usage, partitioning, and other factors

Identify queries performing full table scans versus more efficient index-only scans
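Comparing an efficient plan with an inefficient one amounts to grouping executions by plan, comparing their latency, and diffing the operators each plan uses. The sketch below is hypothetical: plans are reduced to tuples of operator descriptions, and the full-scan check matches operator names used by PostgreSQL (`Seq Scan`) and Oracle (`TABLE ACCESS FULL`); other engines name these operators differently.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Plan:
    plan_id: str
    operators: tuple[str, ...]  # simplified, flattened plan operators (assumed representation)


def avg_latency_by_plan(executions):
    """executions: iterable of (plan_id, elapsed_ms). Returns {plan_id: mean elapsed ms}."""
    grouped = {}
    for plan_id, ms in executions:
        grouped.setdefault(plan_id, []).append(ms)
    return {pid: mean(v) for pid, v in grouped.items()}


def diff_plans(good, bad):
    """Operators that appear in only one of the two plans."""
    return {
        "only_in_good": sorted(set(good.operators) - set(bad.operators)),
        "only_in_bad": sorted(set(bad.operators) - set(good.operators)),
    }


def has_full_scan(plan):
    """True if any operator is a full table scan (PostgreSQL or Oracle naming)."""
    return any(op.startswith(("Seq Scan", "TABLE ACCESS FULL")) for op in plan.operators)


good = Plan("p1", ("Index Only Scan using orders_customer_idx on orders",))
bad = Plan("p2", ("Seq Scan on orders", "Sort"))

print(avg_latency_by_plan([("p1", 2.0), ("p1", 4.0), ("p2", 180.0)]))
print(diff_plans(good, bad))
print(has_full_scan(good), has_full_scan(bad))
```

Latency tells you that the plan change hurt; the operator diff tells you why, for example an index-only scan replaced by a sequential scan plus a sort.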

Proactive Database Performance Monitoring

Leverage pre-built dashboards with 18+ key metrics per database engine

Customize dashboards to track critical performance indicators like read/write latency

Analyze slow query patterns to identify long-running queries impacting application performance
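Slow query analysis is most useful when statements that differ only in their literal values are grouped into one pattern. The sketch uses a simple regex normalization as a stand-in for the statement digests engines provide (such as MySQL statement digests or the `queryid` in PostgreSQL's `pg_stat_statements`); the log format and the 500 ms threshold are assumptions for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Quoted strings (with '' escapes) and standalone numbers are treated as literals.
_LITERALS = re.compile(r"'(?:[^']|'')*'|\b\d+(?:\.\d+)?\b")


def normalize(sql):
    """Replace literals with '?', collapse whitespace, and lowercase the statement."""
    return " ".join(_LITERALS.sub("?", sql).split()).lower()


def slow_patterns(log, threshold_ms):
    """log: iterable of (sql_text, duration_ms).
    Returns (pattern, calls, mean_ms) for patterns slower than threshold_ms, slowest first."""
    grouped = defaultdict(list)
    for sql, ms in log:
        grouped[normalize(sql)].append(ms)
    rows = [(p, len(d), mean(d)) for p, d in grouped.items() if mean(d) > threshold_ms]
    return sorted(rows, key=lambda r: r[2], reverse=True)


log = [
    ("SELECT * FROM orders WHERE customer_id = 17", 950.0),
    ("SELECT * FROM orders WHERE customer_id = 42", 1050.0),
    ("SELECT name FROM products WHERE sku = 'A1'", 3.0),
]
for pattern, calls, avg_ms in slow_patterns(log, threshold_ms=500):
    print(f"{avg_ms:.0f} ms avg over {calls} calls: {pattern}")
```

Grouping by pattern shows that one query shape is consistently slow, which points toward an indexing or plan problem rather than a single unlucky parameter value.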

End-to-End Transaction Tracing

Integrate application performance monitoring to trace customer transactions from the front end to the database

Visualize the entire transaction flow, including calls to the database, to pinpoint where latency or errors occur

Drill down into specific database queries within the transaction trace to understand their performance
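Within a transaction trace, the question is usually which span spends the most time doing its own work and how much of the request goes to the database. The sketch uses a hypothetical span model loosely inspired by distributed-tracing conventions; it assumes each span records its parent and a `kind`, that sibling spans run sequentially, and that database spans are not nested inside one another (otherwise the database share would double-count).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Span:
    span_id: str
    parent_id: Optional[str]
    name: str
    kind: str  # e.g. "http", "service", "db" (assumed labels)
    start_ms: float
    end_ms: float

    @property
    def duration(self):
        return self.end_ms - self.start_ms


def self_times(spans):
    """Exclusive time per span: its duration minus its direct children's durations.
    Assumes children of the same parent do not overlap."""
    child_total = {}
    for s in spans:
        if s.parent_id is not None:
            child_total[s.parent_id] = child_total.get(s.parent_id, 0.0) + s.duration
    return {s.span_id: s.duration - child_total.get(s.span_id, 0.0) for s in spans}


def db_share(spans):
    """Fraction of the root span's duration spent in database spans."""
    root = next(s for s in spans if s.parent_id is None)
    return sum(s.duration for s in spans if s.kind == "db") / root.duration


spans = [
    Span("a", None, "GET /checkout", "http", 0.0, 1200.0),
    Span("b", "a", "cart-service", "service", 10.0, 1150.0),
    Span("c", "b", "SELECT * FROM orders ...", "db", 20.0, 1020.0),
    Span("d", "b", "UPDATE inventory ...", "db", 1030.0, 1100.0),
]
st = self_times(spans)
worst = max(st, key=st.get)
print(worst, st[worst], f"db share={db_share(spans):.0%}")
```

Exclusive time separates "the service was slow" from "the service was waiting on the database": here the cart service spends little time on its own, and most of the request sits in a single query.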

Troubleshooting Workflow

Identify high-load database instances from the fleet-wide observability view

Analyze queries, execution plans, and locking patterns on the problematic instance

Review key performance metrics to understand the nature and scope of the issue

Leverage application performance monitoring to trace end-to-end transactions and correlate database performance

Implement fixes and monitor to ensure the issue is resolved
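The last step, confirming that a fix worked, can be reduced to comparing load over comparable windows before and after the change. This is a minimal sketch assuming you already have DB load samples (average active sessions) for both windows; the 30% improvement threshold is an arbitrary illustrative value, not a recommendation from the talk.

```python
from statistics import mean


def fix_verified(before, after, min_improvement=0.3):
    """True if mean DB load fell by at least `min_improvement` (a fraction) after the fix.
    `before` and `after` are load samples from comparable time windows (assumed input)."""
    b, a = mean(before), mean(after)
    return b > 0 and (b - a) / b >= min_improvement


print(fix_verified([8.0, 9.0, 10.0], [2.0, 3.0, 4.0]))
print(fix_verified([5.0], [4.5]))
```

Comparing windows with similar traffic (for example, the same hour on consecutive days) matters more than the exact threshold, because a quiet period after a fix can look like an improvement on its own.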

Key Takeaways

Unified observability across a diverse database fleet enables rapid identification of performance problems

In-depth analysis of queries, execution plans, and locking behavior provides insights to resolve issues

Integrating application performance monitoring allows tracing end-to-end transactions to isolate database-related problems

A structured troubleshooting workflow helps efficiently diagnose and remediate database performance challenges
