AWS re:Invent 2025 - NVIDIA Run:ai & Amazon SageMaker HyperPod Integration for Distributed Training

NVIDIA Run:ai Overview

  • NVIDIA Run:ai is a Kubernetes-based GPU orchestration and scheduling platform
  • Key capabilities:
    • GPU infrastructure pooling and heterogeneous environment management
    • Policy-driven governance and resource management
    • Advanced GPU utilization techniques like fractional GPUs and dynamic memory
    • Seamless user experience with on-demand access to compute
    • Open, API-first architecture to integrate with existing tools and frameworks
  • Run:ai architecture:
    • Control plane manages multiple distributed Kubernetes/EKS clusters
    • Clusters aggregate GPU resources into a large compute pool
    • Run:ai integrates with the clusters to provide advanced scheduling and orchestration

GPU Utilization Optimizations

  • Fractional GPU technologies:
    • NVIDIA vGPU, MIG (Multi-Instance GPU), and Run:ai's own CUDA-based fractional GPU sharing
    • Enables multiple containers/workloads to share a single physical GPU
    • Improves user density for development and inference workloads
  • Dynamic GPU memory:
    • Allows containers to request a dynamic range of GPU memory
    • Enables workloads to scale GPU memory usage on-demand
    • Reduces the need to restart jobs when scaling data/model size
  • GPU memory swap:
    • Transparently swaps idle GPU memory to host RAM
    • Allows suspending and resuming GPU workloads to improve utilization
    • Can push cluster-wide GPU utilization beyond the 85-90% typically achievable without swapping
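As an illustrative sketch of the fractional GPU capability described above, a pod can request a slice of a GPU through pod-level annotations handled by the Run:ai scheduler. The annotation keys and names below follow Run:ai's documented fractional GPU interface but should be verified against your installed Run:ai version:

```yaml
# Hypothetical sketch: a pod requesting half of one physical GPU.
# Annotation keys assume Run:ai's fractional GPU interface;
# verify against your Run:ai version's documentation.
apiVersion: v1
kind: Pod
metadata:
  name: half-gpu-notebook          # illustrative name
  annotations:
    gpu-fraction: "0.5"            # share one physical GPU with another pod
    # gpu-memory: "5000M"          # alternative: request a fixed GPU memory slice
spec:
  schedulerName: runai-scheduler   # hand the pod to Run:ai's scheduler
  containers:
    - name: notebook
      image: jupyter/base-notebook # illustrative image
```

Because the fraction is enforced by the scheduler rather than the Kubernetes device plugin's whole-GPU resource model, several such pods can land on the same physical GPU, which is how the developer-density gains above are realized.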

Scheduling and Resource Management

  • The default Kubernetes scheduler is not designed for batch GPU workloads (no queues, fairness, or gang scheduling)
  • Run:ai implements an HPC-inspired scheduler with features like:
    • Multiple queues, preemption, and reclamation
    • Guaranteed GPU quotas for teams and projects
    • Topology-aware scheduling to optimize network and GPU locality
  • Quotas provide developers reliable access to GPUs, while allowing admins to shift capacity
  • Tight integration with Amazon SageMaker HyperPod:
    • HyperPod provides automated health checking and hardware replacement
    • Run:ai schedules workloads to leverage the resilient HyperPod infrastructure

Demonstration Scenarios

  1. Hardware Fault Tolerance:
    • Run:ai workload continues running despite a simulated GPU failure
    • HyperPod automatically replaces the faulty node, reintegrating it into the cluster
    • Run:ai scales down the workload, then scales it back up on the new node
  2. Multi-Tenant Resource Sharing:
    • Different teams have guaranteed GPU quotas within the cluster
    • A team can burst beyond their quota when capacity is available
    • When a new team requests resources, Run:ai preempts lower-priority workloads
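The quota, burst, and preemption behavior in scenario 2 can be sketched as a toy policy model. This is a conceptual illustration of the scheduling semantics described above, not Run:ai's actual implementation; all names are hypothetical:

```python
# Toy model of guaranteed quotas with over-quota bursting and reclamation.
# Illustrates the policy described above; not Run:ai code.

def schedule(requests, quotas, total_gpus):
    """Grant each team its guaranteed quota first, then let teams
    burst into whatever capacity remains unused."""
    granted = {team: min(requests.get(team, 0), q) for team, q in quotas.items()}
    free = total_gpus - sum(granted.values())
    for team, want in requests.items():
        burst = min(want - granted.get(team, 0), free)
        if burst > 0:  # opportunistic over-quota allocation
            granted[team] = granted.get(team, 0) + burst
            free -= burst
    return granted

def reclaim(granted, quotas, team, need):
    """When `team` claims GPUs up to its guaranteed quota, preempt other
    teams' over-quota (burst) usage to free capacity. For simplicity this
    toy assumes the cluster is otherwise full."""
    want = min(need, quotas[team] - granted.get(team, 0))  # only up to quota
    freed = 0
    for other in list(granted):
        if freed >= want:
            break
        over = granted[other] - quotas.get(other, 0)
        if other != team and over > 0:
            take = min(over, want - freed)
            granted[other] -= take  # preempt burst workloads only
            freed += take
    granted[team] = granted.get(team, 0) + freed
    return granted
```

With two teams holding 4-GPU quotas on an 8-GPU pool, team A can burst to all 8 GPUs while team B is idle; when team B later claims its quota, only team A's over-quota share is preempted, never its guaranteed allocation.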

Additional Capabilities

  • KAI Scheduler (Kubernetes AI Scheduler) - NVIDIA's open-sourced scheduling engine
  • Model Streamer - Optimizes cold starts for large language models
  • Dynamo - Advanced model serving and inference platform integration

Business Impact

  • Increased GPU utilization and return on infrastructure investment
  • Faster time-to-market for AI/ML projects through reliable access to resources
  • Centralized visibility and control over GPU consumption and allocation
  • Resilient, fault-tolerant GPU clusters with automated hardware management

Real-World Examples

  • Customers leveraging fractional GPUs to improve developer density and inference efficiency
  • Enterprises using GPU memory swap to maximize utilization of large GPU servers
  • Teams dynamically shifting GPU quotas based on changing business priorities
