AWS re:Invent 2025 - AWS Trn3 UltraServers: Power next-generation enterprise AI performance (AIM3335)

Summary of AWS re:Invent 2025 - AWS Trn3 UltraServers: Power next-generation enterprise AI performance

The Tectonic Shift of AI

  • AI is enabling a fundamental transformation in how we build, deploy, and interact with the world
  • AI is driving breakthroughs across scientific domains like protein biology, mathematics, and software engineering
  • AI is becoming the engine powering scientific discovery and innovation

Building the AI Infrastructure of the Future

  • AWS has built the most comprehensive and deeply integrated AI stack to power the next generation of AI
  • Key components include (a minimal launch sketch follows this list):
    • Compute: A broad portfolio of GPU instances plus the latest Inferentia and Trainium instances
    • Network: UltraClusters with the low-latency, low-jitter Elastic Fabric Adapter (EFA)
    • Storage: High-throughput options such as FSx for Lustre and S3 Express One Zone
    • Security: The AWS Nitro System for workload isolation and data protection
    • Management: Observability tools such as CloudWatch
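
As a minimal sketch of how this stack is consumed, the snippet below launches a single EFA-attached Trainium node with boto3. The AMI, subnet, and security-group IDs are placeholders, and the instance type uses today's trn2.48xlarge as a stand-in, since Trn3 instance type names are not given in this summary:

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Placeholder IDs throughout -- substitute your own AMI, subnet, and security group.
response = ec2.run_instances(
    ImageId="ami-0123456789abcdef0",   # e.g. a Neuron Deep Learning AMI
    InstanceType="trn2.48xlarge",      # current-gen Trainium type as a Trn3 stand-in
    MinCount=1,
    MaxCount=1,
    NetworkInterfaces=[{
        "DeviceIndex": 0,
        "InterfaceType": "efa",        # attach an Elastic Fabric Adapter
        "SubnetId": "subnet-0123456789abcdef0",
        "Groups": ["sg-0123456789abcdef0"],
    }],
    # Cluster placement groups keep nodes physically close for low-latency collectives.
    Placement={"GroupName": "trainium-cluster-pg"},
)
print(response["Instances"][0]["InstanceId"])
```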

Trainium 3: Powering Next-Gen AI Workloads

  • Trainium 3 is designed to support the evolving needs of AI, including:
    • Longer context lengths for reasoning models
    • Mixture-of-experts models with communication-heavy architectures
    • Infrastructure for pre-training, post-training, and inference
    • High-batch-size, high-throughput systems for concurrent agent-based workloads
  • Key Trainium 3 specifications (rough arithmetic on these figures follows the list):
    • 360 PFLOPS of microscaled FP8 compute (4.4x more than Trainium 2)
    • 20TB of HBM capacity (3.4x more) and 700TB/s of HBM bandwidth (3.9x more)
    • 2x faster interconnect with new Neuron switches for low-latency, high-bandwidth communication
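
Working backwards from the stated multipliers gives a rough sense of the generational jump; this is my arithmetic on the figures above, not numbers from the talk:

```python
# Back-of-envelope: implied Trainium 2 baselines and the compute-to-bandwidth ratio.
trn3_fp8_pflops = 360   # microscaled FP8 compute
trn3_hbm_tb = 20        # HBM capacity
trn3_hbm_tbps = 700     # HBM bandwidth

print(f"Implied Trn2 FP8 compute:   {trn3_fp8_pflops / 4.4:.0f} PFLOPS")  # ~82
print(f"Implied Trn2 HBM capacity:  {trn3_hbm_tb / 3.4:.1f} TB")          # ~5.9
print(f"Implied Trn2 HBM bandwidth: {trn3_hbm_tbps / 3.9:.0f} TB/s")      # ~179

# FLOPs available per HBM byte moved: kernels need high arithmetic
# intensity to stay compute-bound rather than memory-bound.
ratio = (trn3_fp8_pflops * 1e15) / (trn3_hbm_tbps * 1e12)
print(f"Trn3 FP8 FLOPs per HBM byte: {ratio:.0f}")  # ~514
```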

Optimizing Trainium 3 for Performance

  • Focused on achieving maximum sustained performance, not just peak numbers
  • Innovations include:
    • Microscaling for efficient low-precision quantization without accuracy loss (sketched after this list)
    • Accelerated softmax instructions to keep tensor engines fully utilized
    • Extensive co-optimizations across the entire hardware and software stack
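
Microscaling attaches a shared scale to each small block of values (the OCP MX formats use 32-element blocks), so an outlier only distorts its own block rather than the whole tensor. The toy NumPy sketch below illustrates the block-scaling idea; the coarse integer grid is only a stand-in for the low-bit element format, not Trainium's actual datapath:

```python
import numpy as np

def block_quantize(x, block=32, levels=256):
    """Toy block-scaled quantization in the spirit of microscaling (MX):
    one shared scale per `block` values, elements stored at low precision.
    (Real MXFP8 stores FP8 elements with a power-of-two scale per block.)"""
    xb = x.reshape(-1, block)
    # Shared scale per block: fit the block's max magnitude onto the grid.
    amax = np.abs(xb).max(axis=1, keepdims=True) + 1e-30
    scale = amax / (levels / 2 - 1)
    q = np.round(xb / scale).clip(-(levels // 2), levels // 2 - 1) * scale
    return q.reshape(x.shape)

rng = np.random.default_rng(0)
x = rng.standard_normal(1024).astype(np.float32)
x[7] = 500.0  # a large outlier lands in the first 32-element block
q = block_quantize(x)
err = np.abs(x - q)
print("error in the outlier's block:", err[:32].max())  # large scale, coarse steps
print("error everywhere else:      ", err[32:].max())   # unaffected by the outlier
```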

Scaling Trainium 3 for Massive Workloads

  • Trainium 3 is designed for rapid scalability from the ground up
  • Modular, top-accessible, and robotically assembled compute sleds enable fast deployment
  • Leverages AWS's expertise in building massive AI clusters, like the 1-million-chip Project Rainier

Ease of Use for Different User Personas

  • ML Developers: Deep integration with popular ML frameworks and pre-optimized models
  • Researchers: PyTorch-native support with eager execution and automatic optimizations
  • Performance Engineers: Low-level Neuron Kernel Interface (NKI) and Neuron Explorer profiling tools (a rough NKI sketch follows this list)
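
For the performance-engineer path, NKI exposes Trainium's on-chip memories and engines directly from Python. The sketch below follows the element-wise example in the public Neuron documentation; exact signatures vary across Neuron SDK releases, so treat it as indicative rather than exact:

```python
import neuronxcc.nki as nki
import neuronxcc.nki.language as nl

@nki.jit
def add_kernel(a_input, b_input):
    # Allocate the output tensor in device memory (HBM).
    out = nl.ndarray(a_input.shape, dtype=a_input.dtype, buffer=nl.shared_hbm)
    # Load tiles from HBM into on-chip SBUF (inputs must fit NKI tile limits).
    a_tile = nl.load(a_input)
    b_tile = nl.load(b_input)
    # Element-wise add on-chip, then store the result back to HBM.
    nl.store(out, value=a_tile + b_tile)
    return out
```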

The Road Ahead: Trainium 4

  • Targeting a 6x FP4 performance uplift, 4x memory bandwidth, and 2x memory capacity compared to Trainium 3 (implied memory figures worked out below)
  • Continued focus on energy efficiency and end-to-end optimizations
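
Applied to the Trainium 3 figures above, the memory multipliers imply the rough targets below (my arithmetic; no absolute FP4 figure follows from the 6x claim, since a Trainium 3 FP4 baseline is not stated):

```python
# Implied Trainium 4 memory targets from the stated multipliers (back-of-envelope).
trn3_hbm_tbps, trn3_hbm_tb = 700, 20
print(f"Implied Trn4 HBM bandwidth: {4 * trn3_hbm_tbps} TB/s")  # 2800 TB/s (2.8 PB/s)
print(f"Implied Trn4 HBM capacity:  {2 * trn3_hbm_tb} TB")      # 40 TB
```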

Key Takeaways

  • Trainium 3 is designed to power the next generation of large-scale, complex AI workloads
  • AWS has invested heavily in building a comprehensive AI infrastructure stack, from chips to clusters
  • Performance optimizations focus on achieving maximum sustained throughput, not just peak numbers
  • Scalability and ease of use are key priorities, catering to different user personas
  • Rapid iteration continues, with Trainium 4 promising even greater performance and efficiency
