AWS re:Invent 2025 - Troubleshooting database performance issues at scale (COP331)

Troubleshooting Database Performance Issues at Scale

Identifying Database Performance Challenges

  • Customers often see slow application services without knowing whether the database is the root cause
  • Lack of application context and visibility across a diverse database fleet can make it difficult to pinpoint problems
  • Reliance on multiple specialized tools to monitor different database engines adds complexity

Fleetwide Database Observability

  • Unified monitoring across accounts and regions provides a high-level view of database fleet performance
  • Ability to save custom views (e.g., "Retail Product Application") to quickly identify problem areas
  • Detailed metrics on load, CPU, memory, disk, and network I/O for each database instance
  • Visibility into events like restarts, failures, and severity levels
  • Integration with application performance monitoring to correlate database issues with end-to-end transaction data
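
A saved view can be thought of as a stored filter over the fleet inventory. The sketch below is illustrative only: the record layout, the `SAVED_VIEWS` mapping, the instance names, and the load figures are all hypothetical and are not an AWS API.

```python
# Hypothetical fleet inventory; every name and number here is made up for illustration.
FLEET = [
    {"instance": "orders-db-1", "region": "us-east-1", "engine": "postgres",
     "tags": {"application": "Retail Product Application"}, "load": 14.2},
    {"instance": "catalog-db-1", "region": "eu-west-1", "engine": "mysql",
     "tags": {"application": "Retail Product Application"}, "load": 2.1},
    {"instance": "hr-db-1", "region": "us-east-1", "engine": "postgres",
     "tags": {"application": "HR Portal"}, "load": 9.7},
]

# A saved view is modeled as a set of tag values that an instance must match.
SAVED_VIEWS = {
    "Retail Product Application": {"application": "Retail Product Application"},
}


def apply_view(fleet, view_tags, top=3):
    """Return the instances matching every tag in the view, highest load first."""
    matches = [
        inst for inst in fleet
        if all(inst["tags"].get(key) == value for key, value in view_tags.items())
    ]
    return sorted(matches, key=lambda inst: inst["load"], reverse=True)[:top]


for inst in apply_view(FLEET, SAVED_VIEWS["Retail Product Application"]):
    print(inst["instance"], inst["region"], inst["load"])
```

Sorting by load puts the instance most likely to need attention at the top of the view, regardless of region or engine.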

Analyzing Database Instance Performance

  • Drill down into specific database instances to investigate high load or performance degradation
  • Identify problematic queries consuming excessive resources, such as a "select * from orders" query running continuously
  • Trace query execution back to the originating application, user, and host to understand the root cause
  • Slice and dice performance data by various dimensions (hosts, users, applications) to isolate the issue
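
Slicing by dimension amounts to grouping sampled active sessions by one attribute at a time. The following is a minimal sketch, assuming session samples have been exported as plain records; the field names (`tick`, `host`, `user`, `app`, `sql`) and the sample data are hypothetical.

```python
from collections import Counter

# Hypothetical samples: one record per active session per sampling tick.
SAMPLES = [
    {"tick": 1, "host": "10.0.0.5", "user": "retail_app", "app": "checkout",
     "sql": "select * from orders"},
    {"tick": 1, "host": "10.0.0.5", "user": "retail_app", "app": "checkout",
     "sql": "select * from orders"},
    {"tick": 1, "host": "10.0.0.9", "user": "reporting", "app": "dashboards",
     "sql": "select count(*) from customers"},
    {"tick": 2, "host": "10.0.0.5", "user": "retail_app", "app": "checkout",
     "sql": "select * from orders"},
    {"tick": 2, "host": "10.0.0.7", "user": "retail_app", "app": "catalog",
     "sql": "select id from products"},
]


def load_by(samples, dimension):
    """Return (value, average active sessions) pairs for one dimension, heaviest first."""
    ticks = len({s["tick"] for s in samples})
    counts = Counter(s[dimension] for s in samples)
    return [(value, n / ticks) for value, n in counts.most_common()]


for dimension in ("sql", "host", "user", "app"):
    print(dimension, load_by(SAMPLES, dimension)[0])
```

Running the same grouping over each dimension in turn shows whether the load concentrates on one statement, one host, or one application.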

Troubleshooting Locking and Concurrency Issues

  • Use database lock analysis to identify the most common record-locking scenarios causing performance problems
  • Visualize the lock tree to understand blocking relationships and wait times
  • Pinpoint specific locked objects, blocking sessions, and locking patterns to resolve concurrency issues
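
A lock tree can be derived from pairs of waiting and blocking sessions. This is a sketch under stated assumptions: the `(waiter, blocker, seconds)` tuples and session numbers are hypothetical, and the code assumes there is no deadlock cycle (a cycle would leave no root blocker to start from).

```python
from collections import defaultdict

# Hypothetical lock waits: (waiting session, blocking session, seconds waited).
WAITS = [
    (102, 101, 42.0),
    (103, 101, 40.5),
    (104, 103, 12.0),
]


def build_lock_tree(waits):
    """Map each blocker to its waiters and find the sessions that block without waiting."""
    children = defaultdict(list)
    waiters = set()
    for waiter, blocker, seconds in waits:
        children[blocker].append((waiter, seconds))
        waiters.add(waiter)
    roots = sorted(b for b in children if b not in waiters)
    return roots, children


def render(roots, children):
    """Return the tree as indented text lines, one session per line."""
    lines = []

    def walk(session, depth, seconds):
        label = f"session {session}"
        if seconds is not None:
            label += f" (waiting {seconds:.1f}s)"
        lines.append("  " * depth + label)
        for waiter, waited in children.get(session, []):
            walk(waiter, depth + 1, waited)

    for root in roots:
        walk(root, 0, None)
    return lines


roots, children = build_lock_tree(WAITS)
print("\n".join(render(roots, children)))
```

The root of each tree is the session to look at first, since resolving it unblocks every session beneath it.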

Optimizing Query Execution Plans

  • Detect changes in query execution plans that may have caused performance degradation
  • Compare efficient and inefficient plans to understand differences in index usage, partitioning, and other factors
  • Identify queries performing full table scans versus more efficient index-only scans
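
Comparing two plans for the same query can be reduced to comparing how each table is scanned. The sketch below assumes plans shaped like PostgreSQL `EXPLAIN (FORMAT JSON)` output, using its `Node Type`, `Relation Name`, and `Plans` keys; the talk summary does not name an engine, and the plans and index name here are hypothetical.

```python
# Hypothetical plans for one query, captured before and after a plan change.
BEFORE = {
    "Node Type": "Aggregate",
    "Plans": [
        {"Node Type": "Index Only Scan", "Relation Name": "orders",
         "Index Name": "orders_customer_idx"},
    ],
}
AFTER = {
    "Node Type": "Aggregate",
    "Plans": [
        {"Node Type": "Seq Scan", "Relation Name": "orders"},
    ],
}


def scan_nodes(plan):
    """Collect (node type, relation name) for every scan node in a plan tree."""
    found = []
    if "Scan" in plan.get("Node Type", ""):
        found.append((plan["Node Type"], plan.get("Relation Name")))
    for child in plan.get("Plans", []):
        found.extend(scan_nodes(child))
    return found


def plan_regressions(before, after):
    """Return relations that moved from an index-based scan to a sequential scan."""
    old = {rel: node for node, rel in scan_nodes(before) if rel is not None}
    new = {rel: node for node, rel in scan_nodes(after) if rel is not None}
    return sorted(
        rel for rel, node in new.items()
        if node == "Seq Scan" and old.get(rel, "Seq Scan") != "Seq Scan"
    )


print(plan_regressions(BEFORE, AFTER))
```

A relation that appears only in the newer plan is not flagged, because there is no earlier scan type to compare it against.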

Proactive Database Performance Monitoring

  • Leverage pre-built dashboards with 18+ key metrics per database engine
  • Customize dashboards to track critical performance indicators like read/write latency
  • Analyze slow query patterns to identify long-running queries impacting application performance
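
Slow query pattern analysis usually groups statements that differ only in their literal values. The following is a simplified sketch: the log entries are hypothetical `(statement, duration in ms)` pairs, and the regular expressions do not handle escaped quotes or engine-specific syntax.

```python
import re
from collections import defaultdict

# Hypothetical slow-log entries: (statement, duration in milliseconds).
LOG = [
    ("SELECT * FROM orders WHERE customer_id = 17", 1800.0),
    ("SELECT * FROM orders WHERE customer_id = 42", 2200.0),
    ("SELECT name FROM products WHERE sku = 'A-100'", 35.0),
    ("UPDATE orders SET status = 'shipped' WHERE id = 9", 1200.0),
]


def fingerprint(sql):
    """Replace literals with '?' so statements differing only in values group together."""
    text = re.sub(r"'[^']*'", "?", sql)            # string literals
    text = re.sub(r"\b\d+(\.\d+)?\b", "?", text)   # numeric literals
    return re.sub(r"\s+", " ", text).strip().lower()


def slow_patterns(entries, threshold_ms):
    """Return (pattern, count, total ms) for entries at or above the threshold, slowest first."""
    stats = defaultdict(lambda: [0, 0.0])
    for sql, duration_ms in entries:
        if duration_ms >= threshold_ms:
            bucket = stats[fingerprint(sql)]
            bucket[0] += 1
            bucket[1] += duration_ms
    rows = [(pattern, count, total) for pattern, (count, total) in stats.items()]
    return sorted(rows, key=lambda row: row[2], reverse=True)


for pattern, count, total in slow_patterns(LOG, threshold_ms=1000):
    print(f"{total:8.1f} ms  x{count}  {pattern}")
```

Ranking by total time rather than by the single slowest execution surfaces patterns that are moderately slow but run often.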

End-to-End Transaction Tracing

  • Integrate application performance monitoring to trace customer transactions from front-end to database
  • Visualize the entire transaction flow, including calls to the database, to pinpoint where latency or errors occur
  • Drill down into specific database queries within the transaction trace to understand their performance
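
Finding where latency occurs in a trace comes down to computing how much time each span spends on its own work rather than waiting on its children. This sketch assumes a hypothetical span layout (`id`, `parent`, `service`, `start`, `end` in milliseconds) and assumes sibling spans do not overlap in time; it does not reflect any particular tracing product's format.

```python
# Hypothetical spans from one customer transaction; all values are illustrative.
SPANS = [
    {"id": "a", "parent": None, "service": "web-frontend", "start": 0, "end": 950},
    {"id": "b", "parent": "a", "service": "order-service", "start": 20, "end": 930},
    {"id": "c", "parent": "b", "service": "database", "start": 40, "end": 900,
     "statement": "select * from orders"},
    {"id": "d", "parent": "b", "service": "cache", "start": 905, "end": 915},
]


def self_times(spans):
    """Return {span id: ms spent in the span itself, excluding its direct children}."""
    result = {s["id"]: s["end"] - s["start"] for s in spans}
    for s in spans:
        if s["parent"] in result:
            result[s["parent"]] -= s["end"] - s["start"]
    return result


def slowest_span(spans):
    """Return the span with the largest self time."""
    times = self_times(spans)
    return max(spans, key=lambda s: times[s["id"]])


worst = slowest_span(SPANS)
print(worst["service"], worst.get("statement"), self_times(SPANS)[worst["id"]], "ms")
```

In this sample data the database span accounts for most of the transaction time, which is the signal to drill into that query.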

Troubleshooting Workflow

  1. Identify high-load database instances from the fleet-wide observability view
  2. Analyze queries, execution plans, and locking patterns on the problematic instance
  3. Review key performance metrics to understand the nature and scope of the issue
  4. Leverage application performance monitoring to trace end-to-end transactions and correlate database performance
  5. Implement fixes and monitor to ensure the issue is resolved

Key Takeaways

  • Unified observability across a diverse database fleet enables rapid identification of performance problems
  • In-depth analysis of queries, execution plans, and locking behavior provides insights to resolve issues
  • Integrating application performance monitoring makes it possible to trace end-to-end transactions and isolate database-related problems
  • A structured troubleshooting workflow helps efficiently diagnose and remediate database performance challenges
