AWS re:Invent 2025 - Maximizing EC2 Performance: A Hands-on Guide to Instance Optimization (CMP333)

Performance Engineering Overview

  • Performance engineering is about finding opportunities for efficiency and performance gains in a system
  • Layers of software and hardware abstraction can hide performance bottlenecks that must be uncovered
  • A bottleneck can mask other, sometimes worse, problems that surface only once it is removed
  • Because every layer adds its own metrics and tuning options, the search space explodes, so performance engineering typically requires many different tools and techniques
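To make the idea of one bottleneck masking another concrete, here is a minimal Python sketch that models a system as stages in series, where end-to-end throughput is capped by the slowest stage. The stage names and capacities are hypothetical and are not taken from the talk.

```python
def pipeline_throughput(capacities: dict[str, float]) -> tuple[str, float]:
    """Return (bottleneck stage, end-to-end throughput) for stages in series.

    Each value is the maximum sustainable rate of that stage (for example,
    requests per second). In series, no request can flow faster than the
    slowest stage allows, so that stage sets the end-to-end rate.
    """
    if not capacities:
        raise ValueError("need at least one stage")
    stage = min(capacities, key=lambda name: capacities[name])
    return stage, capacities[stage]


if __name__ == "__main__":
    # Hypothetical capacities in requests per second (illustrative only).
    stages = {"network": 10_000.0, "app_cpu": 6_000.0, "disk": 2_500.0}
    print(pipeline_throughput(stages))  # ('disk', 2500.0)

    # Removing the disk bottleneck does not unlock 10,000 req/s:
    # the next-slowest stage becomes the new limit.
    stages["disk"] = 12_000.0
    print(pipeline_throughput(stages))  # ('app_cpu', 6000.0)
```

A real system is rarely a clean series of stages, so treat this as a mental model rather than a capacity-planning tool.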

Wide-then-Deep Approach

  • The presenters recommend a "wide-then-deep" approach to performance engineering:
    • First, take a wide view of the entire system to identify the most promising opportunities
    • Then, dive deep into the most impactful areas to optimize them
  • This approach helps avoid getting stuck pursuing the wrong optimization path based on intuition alone
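One way to picture the wide-then-deep workflow is a two-pass triage: a wide pass that reduces many metrics to one score per subsystem, then a short list of subsystems to investigate in depth. The sketch below assumes a hypothetical scoring rule (a subsystem's worst utilization, as a fraction of capacity); it is not how Aperf or the presenters rank findings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    subsystem: str      # e.g. "cpu", "memory", "disk", "network"
    metric: str         # name of the individual metric
    utilization: float  # fraction of capacity in use, 0.0 to 1.0


def wide_then_deep(signals: list[Signal], top_n: int = 2) -> list[str]:
    """Wide pass: score each subsystem by its worst (highest) utilization.
    Deep pass: return the top_n subsystems worth investigating in detail."""
    worst: dict[str, float] = {}
    for s in signals:
        worst[s.subsystem] = max(worst.get(s.subsystem, 0.0), s.utilization)
    ranked = sorted(worst.items(), key=lambda item: item[1], reverse=True)
    return [name for name, _ in ranked[:top_n]]


if __name__ == "__main__":
    # Hypothetical readings, not measurements from the talk.
    readings = [
        Signal("cpu", "user", 0.40),
        Signal("memory", "used", 0.60),
        Signal("disk", "busy_time", 0.97),
        Signal("network", "rx_bandwidth", 0.20),
    ]
    print(wide_then_deep(readings))  # ['disk', 'memory']
```

A single utilization number misses saturation and errors, so in practice the wide pass would weigh several signals per subsystem before choosing where to go deep.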

Introducing Aperf

  • Aperf is a performance analysis tool developed by AWS to enable the wide-then-deep approach
  • Aperf collects hundreds of system-wide metrics and presents them in an easy-to-understand dashboard
  • Key features of Aperf:
    • Collects a comprehensive set of CPU, memory, disk, network, and other system-level statistics
    • Generates flame graphs and other visualizations to identify performance hotspots
    • Allows easy comparison of performance across different EC2 instance types
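The comparison feature can be illustrated in generic form: given summary metrics from a baseline run and a candidate run (for example, the same workload on two instance types), report the percent change of each metric both runs share. The Python sketch below uses hypothetical metric names, values, and a plain dictionary format; it is not Aperf's report format or code.

```python
def compare_runs(baseline: dict[str, float], candidate: dict[str, float]) -> dict[str, float]:
    """Percent change (candidate vs. baseline) for every metric present in both runs.

    Metrics whose baseline value is zero are skipped, since a percent change
    from zero is undefined.
    """
    return {
        name: (candidate[name] - baseline[name]) / baseline[name] * 100.0
        for name in sorted(baseline.keys() & candidate.keys())
        if baseline[name] != 0
    }


if __name__ == "__main__":
    # Hypothetical summary metrics for the same workload on two instance types.
    run_a = {"cpu_user_pct": 40.0, "disk_read_mb_s": 200.0, "p99_latency_ms": 80.0}
    run_b = {"cpu_user_pct": 55.0, "disk_read_mb_s": 300.0, "p99_latency_ms": 40.0}
    for metric, change in compare_runs(run_a, run_b).items():
        print(f"{metric:>16}: {change:+.1f}%")
```

Whether a positive change is good depends on the metric (higher throughput is better, higher latency is worse), so the comparison still needs a human reading it.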

Groovy Demo

  • The presenters demonstrate a Groovy-based web application with aspects for logging, metrics, and other cross-cutting concerns
  • Using Aperf, they identify that the aspect-oriented programming (AOP) implementation is causing significant performance overhead
  • By removing the AOP aspects and inlining their functionality, they achieve a 3x performance improvement on the same EC2 instance type (M7g)
  • This demonstrates how optimizing the underlying implementation, rather than the business logic, can lead to substantial performance gains
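The demo's overhead comes from a Groovy AOP implementation whose code is not shown here. As a rough analogy only, the Python sketch below uses a decorator as a stand-in for an "around" aspect that times and counts every call, next to a version with the same bookkeeping inlined. The function names and the 10% discount logic are hypothetical, and the measured ratio depends on the interpreter and machine; it says nothing about the 3x figure from the talk.

```python
import functools
import time
import timeit

# Shared bookkeeping that both versions update (a stand-in for a metrics aspect).
METRICS = {"calls": 0, "seconds": 0.0}


def record_metrics(func):
    """Decorator standing in for an AOP "around" aspect: times every call
    and counts it, without touching the wrapped function's code."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        start = time.perf_counter()
        try:
            return func(*args, **kwargs)
        finally:
            METRICS["calls"] += 1
            METRICS["seconds"] += time.perf_counter() - start
    return wrapper


@record_metrics
def apply_discount_wrapped(price: float) -> float:
    # Hypothetical business logic: a 10% discount.
    return price * 0.9


def apply_discount_inlined(price: float) -> float:
    # Same logic and bookkeeping, inlined: no wrapper frame and no
    # *args/**kwargs packing on each call.
    start = time.perf_counter()
    result = price * 0.9
    METRICS["calls"] += 1
    METRICS["seconds"] += time.perf_counter() - start
    return result


if __name__ == "__main__":
    n = 200_000
    wrapped = timeit.timeit(lambda: apply_discount_wrapped(100.0), number=n)
    inlined = timeit.timeit(lambda: apply_discount_inlined(100.0), number=n)
    print(f"wrapped {wrapped:.3f}s, inlined {inlined:.3f}s, ratio {wrapped / inlined:.2f}x")
```

In CPython the wrapped version pays for an extra function frame and argument packing on every call; on the JVM the costs differ depending on how aspects are woven, so treat this only as an illustration of why per-call indirection adds up on hot paths.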

MongoDB Demo

  • The presenters set up a MongoDB deployment on two EC2 instance types: M7g (EBS-only storage) and M7gd (with local NVMe-based SSD instance storage)
  • Using Aperf, they identify that the M7g instance is heavily I/O-bound, with the CPU spending most of its time in iowait, waiting for disk I/O
  • By switching to the M7gd instance and its local NVMe storage, they achieve a 3x throughput improvement, from 4,000 to 12,000 requests per second
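Aperf's own collection code is not reproduced here, but the iowait signal behind this kind of diagnosis is available on any Linux host. The Python sketch below assumes a Linux /proc/stat with the standard field order; it takes two snapshots of the aggregate cpu line and reports the share of CPU time spent in each state. A high iowait share is a hint of I/O-bound behavior rather than proof, since iowait counts CPU time that was idle while I/O was outstanding; disk-level metrics should confirm it.

```python
import time
from pathlib import Path

# Field order of the aggregate "cpu" line in /proc/stat (see proc(5)).
# guest and guest_nice are omitted because the kernel already includes
# them in user and nice; adding them again would double-count.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text: str) -> dict[str, int]:
    """Return cumulative tick counters from the aggregate 'cpu' line."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            return {name: int(value) for name, value in zip(FIELDS, parts[1:])}
    raise ValueError("no aggregate 'cpu' line found")


def cpu_breakdown(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Percent of CPU time spent in each state between two snapshots."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("snapshots must be taken at different times, in order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def sample(interval_s: float = 1.0) -> dict[str, float]:
    """Linux only: take two /proc/stat snapshots interval_s apart."""
    first = parse_cpu_line(Path("/proc/stat").read_text())
    time.sleep(interval_s)
    second = parse_cpu_line(Path("/proc/stat").read_text())
    return cpu_breakdown(first, second)


if __name__ == "__main__":
    if Path("/proc/stat").exists():
        pct = sample()
        print(f"user {pct['user']:.1f}%  system {pct['system']:.1f}%  "
              f"iowait {pct['iowait']:.1f}%  idle {pct['idle']:.1f}%")
    else:
        print("/proc/stat not found (this sketch needs Linux)")
```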

Key Takeaways

  • Performance engineering is not just about writing better algorithms or low-level code; it's about finding opportunities for optimization throughout the entire system
  • The wide-then-deep approach, enabled by tools like Aperf, helps identify the most impactful areas for optimization, which may not always be in the application code
  • Optimizing the underlying infrastructure, such as EC2 instance types and storage configurations, can lead to significant performance gains without modifying the application
  • Understanding the performance characteristics and costs of the various components in your system (CPU, memory, storage, network, etc.) is crucial for making informed optimization decisions
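Weighing performance against cost often comes down to a small calculation: cost per unit of work. The sketch below computes cost per million requests from measured throughput and an hourly price. The instance labels, throughputs, and prices are placeholders, not AWS prices or results from the talk; substitute your own measurements and current pricing.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Option:
    name: str
    requests_per_second: float  # sustained throughput measured for your workload
    price_per_hour: float       # hourly price in USD (placeholder values below)


def cost_per_million_requests(opt: Option) -> float:
    """USD per one million requests at the measured sustained throughput."""
    requests_per_hour = opt.requests_per_second * 3600
    return opt.price_per_hour / requests_per_hour * 1_000_000


def cheapest(options: list[Option]) -> Option:
    """Option with the lowest cost per million requests."""
    return min(options, key=cost_per_million_requests)


if __name__ == "__main__":
    candidates = [
        Option("instance-a", requests_per_second=5_000, price_per_hour=1.00),
        Option("instance-b", requests_per_second=8_000, price_per_hour=1.50),
    ]
    for opt in candidates:
        print(f"{opt.name}: ${cost_per_million_requests(opt):.4f} per million requests")
    print("best price-performance:", cheapest(candidates).name)
```

With these placeholder numbers the option with the higher hourly price is still cheaper per request, which is why throughput and price are best compared together.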

Business Impact

  • The techniques and tools presented can help organizations maximize the performance and cost-efficiency of their EC2-based applications
  • By identifying and addressing performance bottlenecks, companies can improve the responsiveness and scalability of their systems, leading to better user experiences and more work done per dollar of infrastructure
  • The ability to quickly assess how different EC2 instance types and configurations affect performance can inform infrastructure planning and decisions about cloud resources

Real-world Examples

  • The Groovy and MongoDB demos showcase how Aperf can surface performance issues that may not be immediately obvious from the application code alone
  • By addressing these hidden bottlenecks, the presenters were able to achieve significant performance improvements without modifying the core business logic
  • These types of optimizations can have a direct impact on the ability to meet service-level objectives (SLOs) and scale applications to handle increased workloads
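Checking an SLO usually means comparing a latency percentile against a target. The sketch below uses the nearest-rank percentile definition, one of several in common use; the latency samples and the target are hypothetical.

```python
import math


def percentile(samples: list[float], p: float) -> float:
    """Nearest-rank percentile: the smallest sample such that at least p% of
    samples are less than or equal to it. Requires 0 < p <= 100."""
    if not samples:
        raise ValueError("need at least one sample")
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def meets_slo(latencies_ms: list[float], p: float, target_ms: float) -> bool:
    """True if the p-th percentile latency is within the target."""
    return percentile(latencies_ms, p) <= target_ms


if __name__ == "__main__":
    # Hypothetical latencies: 1 ms to 100 ms, one sample each.
    latencies = [float(ms) for ms in range(1, 101)]
    print(percentile(latencies, 99))                  # 99.0
    print(meets_slo(latencies, p=99, target_ms=100))  # True
    print(meets_slo(latencies, p=99, target_ms=50))   # False
```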
