AWS re:Invent 2025 - Troubleshooting database performance issues at scale (COP331)
Identifying Database Performance Challenges
Customers often see slow application services without knowing whether the database is the root cause
Lack of application context and visibility across a diverse database fleet can make it difficult to pinpoint problems
Reliance on multiple specialized tools to monitor different database engines adds complexity
Fleetwide Database Observability
Unified monitoring across accounts and regions provides a high-level view of database fleet performance
Ability to save custom views (e.g., "Retail Product Application") to quickly identify problem areas
Detailed metrics on load, CPU, memory, disk, and network I/O for each database instance
Visibility into events like restarts, failures, and severity levels
Integration with application performance monitoring to correlate database issues with end-to-end transaction data
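As a rough illustration of how a fleet-wide view can surface problem instances, the sketch below ranks instances by database load relative to available vCPUs. The instance names, the numbers, and the "load above vCPU count" threshold are assumptions made for the example; they are not taken from the talk, and the data would normally come from the monitoring service rather than a hard-coded list.

```python
from dataclasses import dataclass


@dataclass
class InstanceSnapshot:
    name: str       # hypothetical instance identifier
    db_load: float  # average active sessions over the sampling window
    vcpus: int      # vCPUs available to the instance


def rank_by_load(snapshots):
    """Return (name, load-to-vCPU ratio) pairs, busiest first."""
    ratios = [(s.name, s.db_load / s.vcpus) for s in snapshots]
    return sorted(ratios, key=lambda pair: pair[1], reverse=True)


def overloaded(snapshots, threshold=1.0):
    """Names of instances whose load exceeds `threshold` sessions per vCPU."""
    return [name for name, ratio in rank_by_load(snapshots) if ratio > threshold]


# Made-up sample fleet, for illustration only.
fleet = [
    InstanceSnapshot("orders-db", db_load=12.0, vcpus=4),
    InstanceSnapshot("catalog-db", db_load=1.5, vcpus=8),
    InstanceSnapshot("sessions-db", db_load=5.0, vcpus=4),
]

if __name__ == "__main__":
    for name, ratio in rank_by_load(fleet):
        print(f"{name}: {ratio:.2f} active sessions per vCPU")
    print("needs attention:", overloaded(fleet))
```

Normalizing load by vCPU count is one possible way to compare instances of different sizes; a real fleet view may use other signals as well.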
Analyzing Database Instance Performance
Drill down into specific database instances to investigate high load or performance degradation
Identify problematic queries consuming excessive resources, such as a "select * from orders" query running continuously
Trace query execution back to the originating application, user, and host to understand the root cause
Slice and dice performance data by various dimensions (hosts, users, applications) to isolate the issue
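The slicing described above can be sketched as a simple grouping over sampled active sessions. The sample records, field names, and the sampling model (one record per active session per tick) are hypothetical and only stand in for whatever the monitoring tool actually collects.

```python
from collections import Counter

# Hypothetical samples: one record per active session per sampling tick.
samples = [
    {"host": "app-1", "user": "retail_svc", "application": "checkout",
     "sql": "select * from orders"},
    {"host": "app-1", "user": "retail_svc", "application": "checkout",
     "sql": "select * from orders"},
    {"host": "app-1", "user": "retail_svc", "application": "checkout",
     "sql": "select * from orders"},
    {"host": "app-1", "user": "report_user", "application": "reporting",
     "sql": "select count(*) from orders"},
    {"host": "app-2", "user": "retail_svc", "application": "checkout",
     "sql": "update inventory set qty = qty - 1"},
    {"host": "app-2", "user": "report_user", "application": "reporting",
     "sql": "select count(*) from orders"},
]


def load_by(samples, dimension, ticks):
    """Average active sessions attributed to each value of `dimension`.

    `ticks` is the number of sampling intervals the samples cover.
    Results are ordered from the heaviest contributor to the lightest.
    """
    counts = Counter(sample[dimension] for sample in samples)
    return {value: n / ticks for value, n in counts.most_common()}


if __name__ == "__main__":
    for dimension in ("sql", "host", "user", "application"):
        print(dimension, load_by(samples, dimension, ticks=2))
```

Grouping the same samples by different keys is what lets one data set answer "which query", "which host", and "which user" without collecting anything new.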
Troubleshooting Locking and Concurrency Issues
Use database lock analysis to identify contention on frequently accessed ("hot") records that is causing performance problems
Visualize the lock tree to understand blocking relationships and wait times
Pinpoint specific locked objects, blocking sessions, and locking patterns to resolve concurrency issues
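To make the idea of a lock tree concrete, here is a minimal sketch that builds one from waiter/blocker pairs. The session IDs, wait times, and the (waiter, blocker, seconds) row shape are assumptions for illustration; each database engine exposes this information through its own views, which are not shown here.

```python
from collections import defaultdict

# Hypothetical rows: (waiting_session, blocking_session, wait_seconds).
waits = [
    (102, 101, 30.0),
    (103, 101, 12.0),
    (104, 102, 8.0),
]


def build_lock_tree(waits):
    """Return (root_blockers, children), where children maps blocker -> waiters."""
    children = defaultdict(list)
    waiters = set()
    for waiter, blocker, seconds in waits:
        children[blocker].append((waiter, seconds))
        waiters.add(waiter)
    # A root blocker holds locks others wait on but is not waiting itself.
    roots = sorted(b for b in children if b not in waiters)
    return roots, children


def render(waits):
    """Return the lock tree as indented text lines, one per session."""
    roots, children = build_lock_tree(waits)
    lines = []

    def walk(session, depth, path):
        for waiter, seconds in sorted(children.get(session, [])):
            if waiter in path:  # guard against cycles in malformed input
                continue
            lines.append("  " * depth + f"session {waiter} waiting {seconds:.0f}s")
            walk(waiter, depth + 1, path | {waiter})

    for root in roots:
        lines.append(f"session {root} (root blocker)")
        walk(root, 1, {root})
    return lines


if __name__ == "__main__":
    print("\n".join(render(waits)))
```

In this made-up data, session 101 sits at the root: resolving it would release 102 and 103, and 104 in turn once 102 proceeds.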
Optimizing Query Execution Plans
Detect changes in query execution plans that may have caused performance degradation
Compare efficient and inefficient plans to understand differences in index usage, partitioning, and other factors
Identify queries performing full table scans versus more efficient index-only scans
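One simple way to detect a plan change that hurt performance is to compare a query's latest plan against the latency it had under earlier plans. The sketch below assumes per-interval statistics keyed by hypothetical query and plan identifiers, and uses an arbitrary 2x slowdown factor; none of these values come from the talk.

```python
from collections import defaultdict

# Hypothetical per-interval stats, oldest first:
# (query_id, plan_id, avg_latency_ms).
history = [
    ("q1", "plan_a", 4.0),
    ("q1", "plan_a", 5.0),
    ("q1", "plan_b", 120.0),
    ("q2", "plan_c", 10.0),
    ("q2", "plan_c", 11.0),
]


def plan_regressions(history, factor=2.0):
    """Flag queries whose latest plan is new and at least `factor` times slower.

    The latest interval's latency is compared with the mean latency of all
    intervals that ran under a different plan.
    """
    by_query = defaultdict(list)
    for query_id, plan_id, latency in history:
        by_query[query_id].append((plan_id, latency))

    flagged = []
    for query_id, rows in by_query.items():
        latest_plan, latest_latency = rows[-1]
        earlier = [lat for plan, lat in rows if plan != latest_plan]
        if earlier and latest_latency > factor * (sum(earlier) / len(earlier)):
            flagged.append((query_id, latest_plan))
    return flagged


if __name__ == "__main__":
    print(plan_regressions(history))
```

A flagged query is then a candidate for the side-by-side plan comparison described above, for example checking whether the new plan switched from an index scan to a full table scan.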
Proactive Database Performance Monitoring
Leverage pre-built dashboards with 18+ key metrics per database engine
Customize dashboards to track critical performance indicators like read/write latency
Analyze slow query patterns to identify long-running queries impacting application performance
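Slow query pattern analysis usually means grouping statements that differ only in their literal values. The following sketch normalizes literals with regular expressions and totals time per pattern. The log entries are invented, and the regex-based normalization is a simplification that a real SQL parser would handle more robustly.

```python
import re
from collections import defaultdict


def normalize(sql):
    """Collapse a statement into a pattern by replacing literals with '?'."""
    sql = re.sub(r"'[^']*'", "?", sql)            # quoted string literals
    sql = re.sub(r"\b\d+(\.\d+)?\b", "?", sql)    # standalone numeric literals
    return re.sub(r"\s+", " ", sql).strip().lower()


def summarize(slow_log):
    """Return (pattern, count, total_seconds) tuples, most expensive first."""
    totals = defaultdict(lambda: [0, 0.0])
    for sql, seconds in slow_log:
        entry = totals[normalize(sql)]
        entry[0] += 1
        entry[1] += seconds
    rows = [(pattern, count, total) for pattern, (count, total) in totals.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


# Invented slow-log entries: (statement, duration in seconds).
slow_log = [
    ("SELECT * FROM orders WHERE customer_id = 42", 1.5),
    ("select * from orders  where customer_id = 7", 2.5),
    ("SELECT name FROM products WHERE sku = 'A-100'", 0.5),
]

if __name__ == "__main__":
    for pattern, count, total in summarize(slow_log):
        print(f"{total:.1f}s over {count} calls: {pattern}")
```

Ranking by total time rather than by the single slowest execution tends to highlight frequently run queries whose cost adds up.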
End-to-End Transaction Tracing
Integrate application performance monitoring to trace customer transactions from front-end to database
Visualize the entire transaction flow, including calls to the database, to pinpoint where latency or errors occur
Drill down into specific database queries within the transaction trace to understand their performance
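Pinpointing where latency occurs in a trace can be approximated by computing each span's self time, that is, its duration minus the time spent in its children. The span names and durations below are hypothetical, and the calculation assumes child spans run one after another rather than in parallel.

```python
from dataclasses import dataclass, field


@dataclass
class Span:
    name: str
    duration_ms: float
    children: list = field(default_factory=list)


def self_time(span):
    """Time spent in the span itself, excluding its (sequential) children."""
    return span.duration_ms - sum(child.duration_ms for child in span.children)


def slowest_span(root):
    """Return the span with the largest self time anywhere in the trace."""
    best = root
    stack = [root]
    while stack:
        span = stack.pop()
        if self_time(span) > self_time(best):
            best = span
        stack.extend(span.children)
    return best


# Hypothetical trace of a single checkout request.
trace = Span("GET /checkout", 500.0, [
    Span("cart-service", 80.0),
    Span("order-service", 400.0, [
        Span("db: select * from orders", 350.0),
    ]),
])

if __name__ == "__main__":
    worst = slowest_span(trace)
    print(f"{worst.name}: {self_time(worst):.0f} ms of self time")
```

In this made-up trace the database call accounts for most of the request time, which is the point at which one would drill into the query itself.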
Troubleshooting Workflow
Identify high-load database instances from the fleet-wide observability view
Analyze queries, execution plans, and locking patterns on the problematic instance
Review key performance metrics to understand the nature and scope of the issue
Leverage application performance monitoring to trace end-to-end transactions and correlate database performance with application behavior
Implement fixes and monitor to ensure the issue is resolved
Key Takeaways
Unified observability across a diverse database fleet enables rapid identification of performance problems
In-depth analysis of queries, execution plans, and locking behavior provides insights to resolve issues
Integrating application performance monitoring allows tracing end-to-end transactions to isolate database-related problems
A structured troubleshooting workflow helps efficiently diagnose and remediate database performance challenges