AWS re:Invent 2025 - Maximizing EC2 Performance: A Hands-on Guide to Instance Optimization (CMP333)
Performance Engineering Overview
Performance engineering is about finding opportunities for efficiency and performance gains in a system
Abstractions and layers of software/hardware can hide performance bottlenecks that need to be uncovered
Bottlenecks may hide other, worse problems that need to be addressed
The search space of possible causes grows quickly, so performance engineering usually requires many different tools and techniques
Wide-then-Deep Approach
The presenters recommend a "wide-then-deep" approach to performance engineering:
First, take a wide view of the entire system to identify the most promising opportunities
Then, dive deep into the most impactful areas to optimize them
This approach helps avoid getting stuck pursuing the wrong optimization path based on intuition alone
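One way to picture the "wide" step is to compare a broad set of metrics between two runs and rank them by how much they changed, so the largest shifts become the candidates for a deeper look. The sketch below is only an illustration of that idea: the function name, the metric names, and every number are hypothetical and are not taken from the talk or from any tool's output.

```python
def rank_by_relative_change(baseline, candidate, top_n=3):
    """Return up to top_n (metric, relative_change) pairs, largest change first."""
    changes = []
    for name, base in baseline.items():
        if name not in candidate or base == 0:
            continue  # skip metrics missing from one run or with a zero baseline
        rel = (candidate[name] - base) / base
        changes.append((name, rel))
    # Sort by magnitude so both regressions and improvements surface.
    changes.sort(key=lambda item: abs(item[1]), reverse=True)
    return changes[:top_n]


# Hypothetical values for illustration only; not measurements from the talk.
baseline = {"cpu_user_pct": 40.0, "cpu_iowait_pct": 5.0, "ctx_switches_per_s": 10000.0}
candidate = {"cpu_user_pct": 42.0, "cpu_iowait_pct": 20.0, "ctx_switches_per_s": 11000.0}

ranked = rank_by_relative_change(baseline, candidate)
for name, rel in ranked:
    print(f"{name}: {rel:+.0%}")
```

A ranking like this only says where to look first; the "deep" step still needs profiling of the area it points at.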
Introducing Aperf
Aperf is a performance analysis tool developed by AWS to enable the wide-then-deep approach
Aperf collects hundreds of system-wide metrics and presents them in an easy-to-understand dashboard
Key features of Aperf:
Collects a comprehensive set of CPU, memory, disk, network, and other system-level statistics
Generates flame graphs and other visualizations to identify performance hotspots
Allows easy comparison of performance across different EC2 instance types
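Flame graphs are typically built from sampled call stacks that have been collapsed into a "folded" form: one line per unique stack, frames joined by semicolons, followed by a sample count. The sketch below shows that aggregation step in general terms. It does not describe how Aperf produces its flame graphs internally, and the frame names are hypothetical.

```python
from collections import Counter


def fold_stacks(samples):
    """Collapse stack samples (lists of frames, root first) into folded lines."""
    counts = Counter(";".join(frames) for frames in samples)
    # Order by descending count, then by stack text, so output is deterministic.
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [f"{stack} {count}" for stack, count in ordered]


# Hypothetical stack samples; real ones would come from a sampling profiler.
samples = [
    ["main", "handle_request", "log_aspect"],
    ["main", "handle_request", "log_aspect"],
    ["main", "handle_request", "business_logic"],
]

folded = fold_stacks(samples)
for line in folded:
    print(line)
```

The widest frames in the rendered graph correspond to the stacks with the highest counts, which is what makes hotspots easy to spot.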
Groovy Demo
The presenters demonstrate a Groovy-based web application with aspects for logging, metrics, and other cross-cutting concerns
Using Aperf, they identify that the aspect-oriented programming (AOP) implementation is causing significant performance overhead
By removing the AOP aspects and inlining the functionality, they achieve a 3x performance improvement on the same EC2 instance (M7g)
This demonstrates how optimizing the underlying implementation, rather than the business logic, can lead to substantial performance gains
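The demo application is written in Groovy and its code is not reproduced here. As a rough analogy only, the Python sketch below contrasts a cross-cutting concern applied through a generic wrapper with the same work inlined into the function. The names `counted`, `total_with_aspect`, and `total_inlined` are hypothetical, and the timings it prints will not match the 3x figure from the talk.

```python
import functools
import timeit

metrics = {"calls": 0}


def counted(func):
    """Hypothetical 'aspect': counts every call through a generic wrapper."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        metrics["calls"] += 1  # cross-cutting work added around the call
        return func(*args, **kwargs)
    return wrapper


@counted
def total_with_aspect(quantity, unit_price):
    return quantity * unit_price


def total_inlined(quantity, unit_price):
    metrics["calls"] += 1  # same bookkeeping, written directly in the function
    return quantity * unit_price


if __name__ == "__main__":
    n = 100_000
    t_aspect = timeit.timeit(lambda: total_with_aspect(3, 5), number=n)
    t_inline = timeit.timeit(lambda: total_inlined(3, 5), number=n)
    print(f"wrapper: {t_aspect:.4f}s  inlined: {t_inline:.4f}s")
```

Both versions return the same result and record the same metric; only the indirection differs, which is the kind of overhead a profile can expose even when the business logic looks cheap.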
MongoDB Demo
The presenters set up a MongoDB deployment on two different EC2 instance types: M7g and M7gd (which adds local NVMe instance storage)
Using Aperf, they identify that the M7g instance is heavily I/O bound, with the CPU spending most of its time waiting for disk I/O
By switching to the M7gd instance with faster NVMe storage, they achieve a 3x performance improvement, from 4,000 to 12,000 requests per second
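On Linux, an I/O-bound diagnosis like the one above shows up as a large share of CPU time accounted to iowait. The sketch below computes that share from two samples of the aggregate `cpu` line in `/proc/stat`. It is a minimal illustration, not how Aperf gathers its data, and the two sample lines are hypothetical values rather than measurements from the demo.

```python
def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a list of tick counts."""
    fields = line.split()
    if not fields or fields[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line from /proc/stat")
    return [int(value) for value in fields[1:]]


def iowait_fraction(before, after):
    """Fraction of CPU time spent in iowait between two samples."""
    deltas = [a - b for a, b in zip(parse_cpu_line(after), parse_cpu_line(before))]
    # Fields: user nice system idle iowait irq softirq steal guest guest_nice.
    # Guest time is already counted in user/nice, so sum only the first eight.
    total = sum(deltas[:8])
    return deltas[4] / total if total else 0.0


# Hypothetical samples for illustration only.
sample_before = "cpu  100 0 50 800 50 0 0 0 0 0"
sample_after = "cpu  150 0 70 900 880 0 0 0 0 0"

fraction = iowait_fraction(sample_before, sample_after)
print(f"iowait share: {fraction:.0%}")
```

On a live system the two samples would be the first line of `/proc/stat` read a few seconds apart.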
Key Takeaways
Performance engineering is not just about writing better algorithms or low-level code; it's about finding opportunities for optimization throughout the entire system
The wide-then-deep approach, enabled by tools like Aperf, helps identify the most impactful areas for optimization, which may not always be in the application code
Optimizing the underlying infrastructure, such as EC2 instance types and storage configurations, can lead to significant performance gains without modifying the application
Understanding the performance characteristics and costs of the various components in your system (CPU, memory, storage, network, etc.) is crucial for making informed optimization decisions
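To weigh performance against cost, throughput can be normalized by price. The sketch below uses the throughput figures from the MongoDB demo (4,000 and 12,000 requests per second) but a placeholder hourly price of 1.00, since no prices are given in this summary. The function names are hypothetical, and real prices should be substituted before drawing any conclusion.

```python
def requests_per_dollar(requests_per_second, price_per_hour):
    """Requests served per unit of currency at a sustained request rate."""
    return requests_per_second * 3600 / price_per_hour


def break_even_price(base_rps, base_price, new_rps):
    """Hourly price at which the new instance matches the baseline's requests per dollar."""
    return base_price * new_rps / base_rps


BASE_RPS = 4_000    # from the MongoDB demo
NEW_RPS = 12_000    # from the MongoDB demo
BASE_PRICE = 1.00   # placeholder hourly price, NOT a real EC2 price

limit = break_even_price(BASE_RPS, BASE_PRICE, NEW_RPS)
print(f"Faster instance is more cost-efficient below {limit:.2f} per hour")
```

With a 3x throughput gain, the faster configuration stays ahead on cost per request as long as it costs less than three times the baseline.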
Business Impact
The techniques and tools presented can help organizations maximize the performance and cost-efficiency of their EC2-based applications
By identifying and addressing performance bottlenecks, companies can improve the responsiveness and scalability of their systems, which improves the user experience
The ability to quickly assess the performance impact of different EC2 instance types and configurations supports infrastructure planning and better decisions about cloud resources
Real-world Examples
The Groovy and MongoDB demos showcase how Aperf can surface performance issues that may not be immediately obvious from the application code alone
By addressing these hidden bottlenecks, the presenters were able to achieve significant performance improvements without modifying the core business logic
Optimizations of this kind directly affect the ability to meet service-level objectives (SLOs) and to scale applications for increased workloads