AWS re:Invent 2025 - Everything you've wanted to know about performance on EC2 instances (CMP405)

Understanding the EC2 Instance Portfolio

EC2 instances are categorized into different families based on their compute, memory, and networking capabilities

The instance naming convention provides details on the generation, processor type, and size of the instance
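As a rough illustration of that naming convention, the sketch below splits a name such as m7g.2xlarge into family, generation, attribute letters, and size. The names parse_instance_type, processor_brand, and InstanceType are hypothetical helpers, not an AWS API. The regular expression is a simplification that assumes the common family-generation-attributes.size layout, so it does not handle high-memory names such as u-6tb1.metal, and it leaves the brand unspecified when there is no processor letter.

```python
import re
from typing import NamedTuple, Optional


class InstanceType(NamedTuple):
    family: str       # e.g. "m", "c", "r", "hpc"
    generation: int   # e.g. 7
    attributes: str   # processor letter plus extra capabilities, e.g. "g", "iedn"
    size: str         # e.g. "2xlarge", "metal"


# Simplified pattern: letters, digits, optional attribute letters, dot, size.
_PATTERN = re.compile(r"^([a-z]+?)(\d+)([a-z-]*)\.([a-z0-9-]+)$")

# Processor letters used in current instance names.
_PROCESSOR_LETTERS = {"i": "Intel", "a": "AMD", "g": "Graviton"}


def parse_instance_type(name: str) -> InstanceType:
    """Split an instance type name into its parts (hypothetical helper)."""
    match = _PATTERN.match(name.strip().lower())
    if match is None:
        raise ValueError(f"unrecognized instance type name: {name!r}")
    family, generation, attributes, size = match.groups()
    return InstanceType(family, int(generation), attributes, size)


def processor_brand(parsed: InstanceType) -> Optional[str]:
    """Return the brand implied by the first attribute letter, if any.

    Assumption: the processor letter, when present, comes first among the
    attribute letters. Names without one (e.g. c5.large) return None.
    """
    if not parsed.attributes:
        return None
    return _PROCESSOR_LETTERS.get(parsed.attributes[0])


if __name__ == "__main__":
    for name in ("m7g.2xlarge", "c5.large", "x2iedn.metal", "r6a.24xlarge"):
        parsed = parse_instance_type(name)
        print(name, parsed, processor_brand(parsed))
```

Returning None rather than guessing a brand keeps the helper honest for older names that carry no processor letter.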

Within a processor brand (Intel, AMD, Graviton), performance is consistent across instances of the same generation

However, performance can vary significantly between different processor architectures (e.g. Intel vs AMD vs Graviton)

Processor Generations and Performance Improvements

Newer processor generations offer significant performance improvements over previous generations

Key improvements include:

Larger caches (L1, L2, L3)
Improved branch prediction
Increased memory bandwidth
Higher core counts

The price per vCPU has increased over time, but the price per unit of performance has decreased
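The two trends can coexist because newer instances do more work per vCPU. The sketch below works through the arithmetic with made-up numbers: the hourly prices, vCPU counts, and request rates are illustrative assumptions, not figures from the talk or from AWS pricing.

```python
def price_per_vcpu(hourly_price: float, vcpus: int) -> float:
    """Hourly price divided by vCPU count."""
    return hourly_price / vcpus


def cost_per_million_requests(hourly_price: float, requests_per_second: float) -> float:
    """Cost of serving one million requests at a sustained request rate."""
    requests_per_hour = requests_per_second * 3600
    return hourly_price / requests_per_hour * 1_000_000


# Hypothetical older and newer generation instances of the same size.
old = {"hourly_price": 1.00, "vcpus": 8, "requests_per_second": 1000.0}
new = {"hourly_price": 1.10, "vcpus": 8, "requests_per_second": 1400.0}

old_vcpu = price_per_vcpu(old["hourly_price"], old["vcpus"])
new_vcpu = price_per_vcpu(new["hourly_price"], new["vcpus"])
old_work = cost_per_million_requests(old["hourly_price"], old["requests_per_second"])
new_work = cost_per_million_requests(new["hourly_price"], new["requests_per_second"])

if __name__ == "__main__":
    print(f"price per vCPU-hour:   old {old_vcpu:.4f}, new {new_vcpu:.4f}")
    print(f"cost per 1M requests:  old {old_work:.4f}, new {new_work:.4f}")
```

In this made-up case the price per vCPU rises by 10% while throughput rises by 40%, so the cost of each unit of work falls. Measuring throughput for your own workload is what makes the comparison meaningful.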

Virtualization and Bare Metal Instances

The move from the Xen hypervisor to the Nitro virtualization system has reduced virtualization overhead and "noisy neighbor" effects

Bare metal instances provide direct access to the underlying hardware, but are only offered as whole servers (there are no smaller sizes)

For most workloads, the performance difference between virtualized and bare metal instances is negligible, except for highly latency-sensitive applications

NUMA Topology and Memory Access

AMD-based instances expose a non-uniform memory access (NUMA) topology, with groups of 8 cores (a core complex, or CCX) sharing a slice of L3 cache and memory

Accessing memory across different CCXs can result in higher latency and reduced performance

This can be especially problematic when scaling from single-socket to dual-socket instances, because memory accesses that cross sockets have higher latency

Kubernetes can help mitigate NUMA-related issues by keeping a pod's CPUs and memory on the same NUMA node (for example, through the kubelet's Topology Manager)
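Before tuning placement, it helps to see which NUMA nodes an instance actually exposes. The sketch below is a minimal Linux-only example that reads the kernel's sysfs entries under /sys/devices/system/node. The function names parse_cpulist and numa_nodes are hypothetical, and on a system without that directory the result is simply an empty mapping.

```python
from pathlib import Path
from typing import Dict, List


def parse_cpulist(text: str) -> List[int]:
    """Expand a kernel cpulist string such as '0-3,8-11' into CPU numbers."""
    cpus: List[int] = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            low, high = part.split("-")
            cpus.extend(range(int(low), int(high) + 1))
        else:
            cpus.append(int(part))
    return cpus


def numa_nodes(sysfs_root: str = "/sys/devices/system/node") -> Dict[int, List[int]]:
    """Map each NUMA node number to the CPUs that belong to it.

    Returns an empty dict when the sysfs directory is missing
    (non-Linux systems, or kernels built without NUMA support).
    """
    nodes: Dict[int, List[int]] = {}
    root = Path(sysfs_root)
    if not root.is_dir():
        return nodes
    for node_dir in root.glob("node[0-9]*"):
        cpulist_file = node_dir / "cpulist"
        if cpulist_file.is_file():
            node_id = int(node_dir.name[len("node"):])
            nodes[node_id] = parse_cpulist(cpulist_file.read_text())
    return nodes


if __name__ == "__main__":
    for node_id, cpus in sorted(numa_nodes().items()):
        print(f"node {node_id}: {len(cpus)} CPUs -> {cpus}")
```

With the CPU list for one node in hand, a process can be restricted to those CPUs, for example with os.sched_setaffinity on Linux or the numactl command-line tool.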

Hyperthreading and Single-Threaded Performance

Intel instances have traditionally used Hyper-Threading (simultaneous multithreading) to expose two hardware threads per physical core, each presented as a vCPU

Graviton and newer AMD instances run one thread per core, so each vCPU is a full physical core

The performance impact of hyperthreading varies depending on the workload:

Parallelized workloads can benefit from hyperthreading
Single-threaded performance may be higher when each vCPU has a physical core to itself
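One practical consequence is that the same vCPU count can correspond to different numbers of physical cores. The sketch below shows the arithmetic; physical_cores is a hypothetical helper, and the 96-vCPU size is only an example.

```python
def physical_cores(vcpus: int, threads_per_core: int) -> int:
    """Number of physical cores behind a given vCPU count.

    threads_per_core is 2 when two hardware threads share each core
    and 1 when each vCPU is a full core.
    """
    if threads_per_core < 1:
        raise ValueError("threads_per_core must be at least 1")
    if vcpus % threads_per_core != 0:
        raise ValueError("vcpus must be a multiple of threads_per_core")
    return vcpus // threads_per_core


if __name__ == "__main__":
    # Same vCPU count, different amounts of underlying hardware.
    print("96 vCPUs, 2 threads per core:", physical_cores(96, 2), "cores")
    print("96 vCPUs, 1 thread per core: ", physical_cores(96, 1), "cores")
```

When comparing instance types by vCPU count, checking threads per core first avoids comparing a shared core against a dedicated one.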

Key Takeaways

Understand the EC2 instance naming convention to select the right instance type for your workload

Newer processor generations offer significant performance improvements, but the price per vCPU has also increased

Virtualization overhead has been reduced, but bare metal instances may still be beneficial for highly latency-sensitive applications

NUMA topology can have a significant impact on performance, especially when scaling to larger instances

Hyperthreading can provide benefits for parallelized workloads, but single-threaded performance may be higher when each vCPU has a physical core to itself

Real-World Examples and Impact

A customer who migrated a database from a 24xlarge to a 48xlarge instance saw P50 latency double, despite the increased resources, due to the NUMA topology

Customers running SSL/TLS load balancers on Graviton2 instances experienced significant performance degradation because of a single 64-bit integer multiplier, a limitation that was addressed in Graviton3

Kubernetes users need to be aware of NUMA topology and consider scheduling pods on the same NUMA node to avoid performance issues

By understanding the underlying hardware and architectural differences between EC2 instance types, developers and operators can make more informed decisions to optimize the performance and cost-effectiveness of their applications running on AWS.
