AWS re:Invent 2025 - AWS Trn3 UltraServers: Power next-generation enterprise AI performance (AIM3335)
Summary of the session
The Tectonic Shift of AI
AI is enabling a fundamental transformation in how we build, deploy, and interact with the world
AI is driving breakthroughs across scientific domains like protein biology, mathematics, and software engineering
AI is becoming the engine powering scientific discovery and innovation
Building the AI Infrastructure of the Future
AWS has built the most comprehensive and deeply integrated AI stack to power the next generation of AI
Key components include:
Compute: Broad portfolio of GPU instances and latest Inferentia and Trainium instances
Network: UltraClusters with the low-latency, low-jitter Elastic Fabric Adapter
Storage: High-throughput options like FSx for Lustre and S3 Express
Security: Nitro system for workload isolation and data protection
Management: Observability tools like CloudWatch
Trainium 3: Powering Next-Gen AI Workloads
Trainium 3 is designed to support the evolving needs of AI, including:
Longer context lengths for reasoning models
Mixture-of-experts models with communication-heavy architectures
Infrastructure for pre-training, post-training, and inference
High-batch-size, high-throughput systems for concurrent agent-based workloads
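The mixture-of-experts pattern called out above can be sketched with a minimal top-k router. This is a generic numpy illustration (names, shapes, and the routing function are illustrative, not Trainium or Neuron APIs); in a real MoE layer, the all-to-all exchange of tokens between experts is what makes these models communication-heavy.

```python
import numpy as np

def topk_route(logits: np.ndarray, k: int = 2):
    """Pick the top-k experts per token and softmax-normalize their gate weights.

    logits: (tokens, experts) router scores. Illustrative sketch only.
    """
    topk = np.argsort(logits, axis=-1)[:, -k:]        # (tokens, k) expert ids
    gate = np.take_along_axis(logits, topk, axis=-1)  # (tokens, k) raw scores
    gate = np.exp(gate - gate.max(axis=-1, keepdims=True))
    gate = gate / gate.sum(axis=-1, keepdims=True)    # weights sum to 1 per token
    return topk, gate

rng = np.random.default_rng(0)
logits = rng.normal(size=(4, 8))           # 4 tokens, 8 experts
experts, gates = topk_route(logits, k=2)
print(experts.shape, gates.sum(axis=-1))   # each token's gates sum to 1
```

Each token activates only k of the experts, so compute per token stays flat while total parameters grow; the price is the token shuffle between devices that fast interconnects are built to absorb.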
Key Trainium 3 specifications:
360 PFLOPS of microscaled FP8 compute (4.4x more than Trainium 2)
20TB of HBM capacity (3.4x more) and 700TB/s of HBM bandwidth (3.9x more)
2x faster interconnect with new Neuron switches for low-latency, high-bandwidth communication
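As a quick sanity check, the quoted generational multipliers imply the following approximate Trainium 2 figures. This is back-of-envelope arithmetic on the numbers above, not official Trainium 2 specifications:

```python
# Divide the Trainium 3 figures by the stated uplift factors
# to recover the implied Trainium 2 baseline (approximate).
trn3 = {"fp8_pflops": 360, "hbm_tb": 20, "hbm_bw_tbps": 700}
uplift = {"fp8_pflops": 4.4, "hbm_tb": 3.4, "hbm_bw_tbps": 3.9}

implied_trn2 = {k: trn3[k] / uplift[k] for k in trn3}
for k, v in implied_trn2.items():
    print(f"implied Trainium 2 {k}: ~{v:.1f}")
```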
Optimizing Trainium 3 for Performance
Focused on achieving maximum sustained performance, not just peak numbers
Innovations include:
Microscaling for efficient low-precision quantization without accuracy loss
Accelerated softmax instructions to keep tensor engines fully utilized
Extensive co-optimizations across the entire hardware and software stack
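The microscaling idea above (one shared scale per small block of values, as in MX-style FP8 formats) can be shown in miniature. This is a generic sketch of block-scaled quantization, not Trainium's actual datapath; the 448 limit approximates the largest FP8 E4M3 magnitude, and integer rounding stands in for FP8 mantissa rounding:

```python
import numpy as np

def mx_quantize(x: np.ndarray, block: int = 32, max_code: float = 448.0):
    """Quantize a 1-D tensor using one shared power-of-two scale per block.

    Generic microscaling illustration -- not the Trainium implementation.
    """
    blocks = x.reshape(-1, block)
    amax = np.abs(blocks).max(axis=-1, keepdims=True)
    # Power-of-two scale so the block maximum fits in the FP8-like range.
    scale = 2.0 ** np.ceil(np.log2(np.maximum(amax, 1e-30) / max_code))
    q = np.round(blocks / scale)          # stand-in for FP8 rounding
    return (q * scale).ravel()            # dequantized values

rng = np.random.default_rng(1)
x = rng.normal(size=(1024,))
err = np.abs(mx_quantize(x) - x).max()
print(f"max abs error: {err:.4f}")        # stays small: each block gets its own scale
```

Because every block of 32 values carries its own scale, outliers in one block do not force a coarse scale onto the whole tensor, which is why low-precision formats can hold accuracy.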
Scaling Trainium 3 for Massive Workloads
Trainium 3 designed for rapid scalability from the ground up
Modular, top-accessible, and robotically assembled compute sleds enable fast deployment
Leverages AWS's expertise in building massive AI clusters, such as the 1 million-chip Project Rainier
Ease of Use for Different User Personas
ML Developers: Deep integration with popular ML frameworks and pre-optimized models
Researchers: PyTorch-native support with eager execution and automatic optimizations
Performance Engineers: Low-level Neuron Kernel Interface (NKI) and Neuron Explorer profiling tools
The Road Ahead: Trainium 4
Targeting 6x FP4 performance uplift, 4x memory bandwidth, and 2x memory capacity compared to Trainium 3
Continued focus on energy efficiency and end-to-end optimizations
Key Takeaways
Trainium 3 is designed to power the next generation of large-scale, complex AI workloads
AWS has invested heavily in building a comprehensive AI infrastructure stack, from chips to clusters
Performance optimizations focus on achieving maximum sustained throughput, not just peak numbers
Scalability and ease of use are key priorities, catering to different user personas
Rapid iteration continues, with Trainium 4 promising even greater performance and efficiency