AWS re:Invent 2025 - Accelerating AI innovation with NVIDIA GPUs on AWS (AIM251)
Summary of AWS re:Invent 2025 - Accelerating AI innovation with NVIDIA GPUs on AWS
Key Trends in Generative AI
Reasoning: Models are generating more intermediate tokens to break down complex problems, requiring more compute during inference (a rough cost sketch follows this list).
Multimodality: Models are generating a broader set of outputs like images, video, and audio, increasing compute and memory bandwidth requirements.
Agentic AI: Models are taking actions on behalf of users by querying databases, calling APIs, and interacting with other models, needing heterogeneous CPU and GPU compute.
Reinforcement Learning: Customers are using RL to ensure models stay aligned with user intent as they operate in broader environments.
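The reasoning trend is the easiest to quantify. A minimal sketch, using the standard ~2N FLOPs-per-generated-token approximation for an N-parameter dense transformer; the model size and token counts below are hypothetical, not figures from the talk:

```python
# Why reasoning raises inference cost: decoding one token through an
# N-parameter dense transformer costs roughly 2 * N FLOPs, so a long
# chain-of-thought trace multiplies compute linearly with its length.
PARAMS = 70e9                          # hypothetical 70B-parameter model
FLOPS_PER_TOKEN = 2 * PARAMS

for label, tokens in [("direct answer", 200), ("reasoning trace", 4_000)]:
    tflops = FLOPS_PER_TOKEN * tokens / 1e12
    print(f"{label:>15}: {tokens:>5} tokens -> ~{tflops:,.0f} TFLOPs")
# The 4,000-token trace costs 20x the 200-token answer: same model,
# 20x the inference compute.
```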
AWS Investments in AI Infrastructure
Liquid Cooling
Liquid cooling enables higher GPU density, improved serviceability, and more dynamic scaling of GPU compute.
Custom GPU cold plates, coolant distribution units, and in-row heat exchangers were developed to optimize liquid cooling.
EC2 UltraServers
UltraServers provide up to 72 GPUs connected over NVLink, enabling training and deployment of multi-trillion-parameter models.
Features include high-bandwidth low-latency interconnects, resilient networking, and seamless integration across the AI stack.
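As a concrete (if minimal) sketch of how a training job would exercise that NVLink fabric, here is a standard PyTorch DistributedDataParallel setup; nothing here is specific to the talk, and the tiny Linear model is a stand-in:

```python
# Launch with torchrun, e.g.: torchrun --nnodes=N --nproc-per-node=8 train.py
# NCCL routes the gradient all-reduce over NVLink between GPUs in the same
# NVLink domain and over the network between nodes.
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    model = torch.nn.Linear(4096, 4096).to(local_rank)   # stand-in model
    model = DDP(model, device_ids=[local_rank])

    x = torch.randn(32, 4096, device=local_rank)
    loss = model(x).square().mean()
    loss.backward()        # gradients are all-reduced across all GPUs here

    dist.destroy_process_group()

if __name__ == "__main__":
    main()
```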
EC2 UltraClusters
UltraClusters connect tens of thousands of GPUs with non-blocking, low-latency networking built on 51.2 Tbps switches and 400 Gbps links.
Resilient design with features like over-provisioning and redundancy ensures consistent performance at scale.
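Those two numbers fit together cleanly. A back-of-envelope check, assuming the switch figure refers to 51.2 Tbps of aggregate switching bandwidth:

```python
# How many 400 Gbps links one non-blocking switch tier can terminate.
switch_bw_gbps = 51_200          # 51.2 Tbps switch
link_bw_gbps = 400               # per-link bandwidth toward the servers
ports = switch_bw_gbps // link_bw_gbps
print(f"{ports} x {link_bw_gbps} Gbps ports per switch")   # -> 128
```

A high-radix switch like this keeps the fabric flat: fewer tiers between any two GPUs means lower and more uniform latency at cluster scale.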
NVIDIA GPU Platforms on AWS
P6e-GB300 UltraServers
Powered by NVIDIA Blackwell Ultra GPUs, offering 20 TB of GPU memory and 50% higher FP4 performance than the prior generation.
Ideal for reasoning, multimodal models, and training and inference of multi-trillion-parameter models.
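A rough sizing exercise shows why the memory pool matters (weights-only arithmetic, not official guidance; KV cache, activations, and optimizer state all eat into the headroom):

```python
# What 20 TB of pooled GPU memory implies for model scale, weights only.
POOL_BYTES = 20e12
BYTES_PER_PARAM = {"FP16": 2.0, "FP8": 1.0, "FP4": 0.5}

for fmt, b in BYTES_PER_PARAM.items():
    print(f"{fmt}: up to ~{POOL_BYTES / b / 1e12:.0f}T parameters")
# FP16: ~10T, FP8: ~20T, FP4: ~40T -- which is why FP4 support and a 20 TB
# pool together put multi-trillion-parameter models comfortably in reach.
```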
P6-B300 Instances
Offer 50% more GPU memory, twice the networking bandwidth, and 50% more FP4 performance compared to the prior generation.
Optimized for mid to large-scale generative AI training and inference workloads.
Adobe's Generative AI Platform - Firefly Foundry
Adobe trains "commercially safe" base models on curated data, then fine-tunes them on customer IP to create production-ready models.
Firefly Foundry leverages AWS infrastructure for efficient training and inference, including:
Multi-tenant training scheduler for prioritizing and managing workloads (a generic sketch follows this list)
Auto-recovery system to handle hardware failures during large-scale training
Disaggregated inference pipeline optimized for different compute requirements (sketched at the end of this section)
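Adobe did not share implementation details, so the following is only a generic sketch of how a priority scheduler with checkpoint-based auto-recovery could be structured; the names (TrainingJob, run_step, the simulated fault) are invented for illustration:

```python
import heapq
import random
from dataclasses import dataclass, field

@dataclass(order=True)
class TrainingJob:
    priority: int                                # lower = more urgent tenant
    name: str = field(compare=False)
    steps_done: int = field(default=0, compare=False)
    total_steps: int = field(default=10, compare=False)

def run_step(job: TrainingJob) -> None:
    """Stand-in for one checkpointed training step on the cluster."""
    if random.random() < 0.05:                   # simulated hardware fault
        raise RuntimeError("node failure")
    job.steps_done += 1                          # checkpoint written here

queue = []
heapq.heappush(queue, TrainingJob(priority=0, name="tenant-a-finetune"))
heapq.heappush(queue, TrainingJob(priority=1, name="tenant-b-pretrain"))

while queue:
    job = heapq.heappop(queue)
    try:
        run_step(job)
    except RuntimeError:
        # Auto-recovery: swap in a healthy node and resume the job from
        # its last checkpoint by re-queuing it at the same priority.
        heapq.heappush(queue, job)
        continue
    if job.steps_done < job.total_steps:
        heapq.heappush(queue, job)
print("all jobs complete")
```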
Examples showcase how Firefly Foundry can generate high-quality, IP-aligned content for media and entertainment customers.
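To picture the disaggregated pipeline from the list above: prefill (compute-bound, one big pass over the prompt) and decode (memory-bandwidth-bound, token by token) run on separate worker pools, with the KV cache handed over between them. This toy sketch is an assumed design, not Adobe's actual system:

```python
import queue

# Hand-off channel between the two pools; a real system would move KV-cache
# blocks over the network or shared storage, not Python objects.
handoff = queue.Queue()

def prefill_worker(prompt: str) -> None:
    # Compute-optimized pool: batched forward pass builds the KV cache.
    kv_cache = [len(tok) for tok in prompt.split()]   # stand-in for real KV
    handoff.put((prompt, kv_cache))

def decode_worker() -> str:
    # Bandwidth-optimized pool: autoregressive decoding against the cache.
    prompt, kv_cache = handoff.get()
    return f"decoded {len(kv_cache)} steps for {prompt!r}"

prefill_worker("generate a brand-aligned hero image")
print(decode_worker())
```

Splitting the stages lets each pool scale and be hardware-matched independently, which is the "different compute requirements" point above.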
Key Takeaways
Emerging generative AI use cases are driving increased compute and memory requirements, especially for inference.
AWS is investing heavily in liquid cooling, UltraServers, and UltraClusters to deliver the next level of GPU performance.
NVIDIA's latest GPU platforms on AWS, like the P6e-GB300 UltraServers and P6-B300 instances, are optimized for these demanding workloads.
Adobe's Firefly Foundry demonstrates how customers can leverage AWS infrastructure to build production-ready generative AI applications.