AWS re:Invent 2025 - Scaling instantly to 1000 GPUs for Serverless AI inference (AIM2201)

Scaling Instantly to 1000 GPUs for Serverless AI Inference

Overview

  • Modal is an infrastructure platform that makes it easy for developers to build and scale AI applications.
  • The company was founded in 2021 with the goal of addressing the challenges of deploying and scaling AI workloads on traditional infrastructure.
  • Modal provides a serverless platform that lets developers focus on building AI models and applications rather than on the underlying infrastructure.

Key Challenges in AI Infrastructure

  • Traditional infrastructure like Kubernetes and EC2 is not well-suited for the demands of AI applications, particularly around GPU usage, scaling, and cost management.
  • AI applications often require:
    • Running expensive GPU-based workloads
    • Scaling up and down rapidly to handle unpredictable demand
    • Executing untrusted code in a secure sandbox
    • Managing large-scale batch processing and training jobs
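
The "scale up and down rapidly" requirement can be pictured as a sizing rule that maps current demand to a container count. The sketch below is illustrative only: `desired_containers`, its parameters, and the default cap of 1000 (borrowed from the talk title) are assumptions, not a description of how Modal's scheduler works.

```python
import math


def desired_containers(
    pending_requests: int,
    per_container_concurrency: int = 1,
    min_containers: int = 0,
    max_containers: int = 1000,  # assumed cap, taken from the talk title
) -> int:
    """Return how many GPU containers a simple autoscaler would ask for."""
    if per_container_concurrency < 1:
        raise ValueError("per_container_concurrency must be at least 1")
    # Enough containers to serve every pending request at once.
    needed = math.ceil(pending_requests / per_container_concurrency)
    # Clamp to the configured floor and ceiling; a floor of 0 means scale to zero.
    return max(min_containers, min(needed, max_containers))
```

With no pending requests the rule returns the floor (zero by default), which is what lets a bursty workload avoid paying for idle GPUs.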

Modal's Approach

  • Modal has built its own custom container runtime, file system, and orchestration layer to address the unique needs of AI workloads.
  • The platform allows developers to write a few lines of Python code to define serverless functions that can be deployed and scaled on demand.
  • Modal manages the underlying infrastructure, including provisioning and scaling thousands of GPUs across multiple cloud providers and regions.
  • Developers can focus on building their AI applications without worrying about infrastructure management.
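
The "few lines of Python" model can be pictured as a decorator that attaches resource requirements to an ordinary function. The sketch below is a toy, in-process stand-in and not Modal's SDK: `ToyApp`, `ToyFunction`, `function`, and `remote` are hypothetical names chosen for illustration, and `remote` simply calls the function locally.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Optional


@dataclass
class ToyFunction:
    """A plain function bundled with the resources it declares."""

    fn: Callable[..., Any]
    gpu: Optional[str] = None

    def remote(self, *args: Any, **kwargs: Any) -> Any:
        # A real platform would route this call to a container that has the
        # requested GPU; this toy just runs it in the current process.
        return self.fn(*args, **kwargs)


@dataclass
class ToyApp:
    """Hypothetical registry of deployable functions."""

    name: str
    functions: Dict[str, ToyFunction] = field(default_factory=dict)

    def function(self, gpu: Optional[str] = None) -> Callable[[Callable[..., Any]], ToyFunction]:
        def decorator(fn: Callable[..., Any]) -> ToyFunction:
            wrapped = ToyFunction(fn=fn, gpu=gpu)
            self.functions[fn.__name__] = wrapped  # record it for deployment
            return wrapped

        return decorator


app = ToyApp("demo")


@app.function(gpu="A100")
def count_tokens(text: str) -> int:
    # Placeholder for real model inference.
    return len(text.split())
```

Calling `count_tokens.remote("hello serverless world")` runs the function body; in a real serverless platform the same call shape would hide provisioning, scheduling, and scaling from the caller.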

Technical Capabilities

  • Modal can scale up to 1000 GPUs within seconds to handle large, bursty workloads.
  • The platform supports a range of NVIDIA GPU hardware, including A100 and A40 models.
  • Modal provides fast container startup times, even for large models, by leveraging its custom infrastructure.
  • The platform offers advanced observability and monitoring features, allowing developers to track GPU utilization, latency, and other metrics.

Business Impact

  • Modal's customers range from large technology companies such as Meta to AI startups.
  • Use cases span multiple domains, including machine learning, audio/speech processing, bioinformatics, weather forecasting, and more.
  • By offloading infrastructure management to Modal, customers can focus on building and deploying their AI applications faster, without the overhead of managing complex GPU-based systems.
  • Modal's consumption-based pricing model allows customers to only pay for the resources they use, leading to potential cost savings compared to traditional infrastructure approaches.
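
The cost argument for consumption-based pricing comes down to simple arithmetic: a fleet provisioned for peak load is billed around the clock, while pay-per-use is billed only for GPU time actually consumed. The sketch below uses made-up numbers; the hourly rate, peak size, and usage figures are hypothetical and are not Modal's prices. In practice on-demand rates are often higher than reserved rates, so the two functions take separate rates.

```python
def daily_cost_always_on(peak_gpus: int, hourly_rate: float) -> float:
    """Cost of keeping enough GPUs for peak load running for 24 hours."""
    return peak_gpus * 24 * hourly_rate


def daily_cost_pay_per_use(gpu_hours_used: float, hourly_rate: float) -> float:
    """Cost when only the GPU-hours actually consumed are billed."""
    return gpu_hours_used * hourly_rate


# Hypothetical workload: a burst needing 100 GPUs for 2 hours a day.
always_on = daily_cost_always_on(peak_gpus=100, hourly_rate=2.50)
pay_per_use = daily_cost_pay_per_use(gpu_hours_used=100 * 2, hourly_rate=2.50)
print(always_on, pay_per_use)
```

The gap narrows as utilization rises: a workload that keeps its GPUs busy most of the day gains little from per-use billing.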

Example Use Cases

  • Suno, a company that generates music using AI, runs all of its inference workloads on Modal, allowing a small team to focus on model development rather than infrastructure.
  • Cognition and Decagon, companies building products on large language models, use Modal to execute code in a secure sandbox environment.
  • A robotics company uses Modal to control a robotic arm, taking advantage of Modal's low latency and ability to scale.
  • A customer processed 3,000 years of audio data in just a few days using Modal's GPU-powered infrastructure.
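
To get a feel for the audio-processing figure, the calculation below estimates how much faster than real time each GPU would have to run. The summary does not state the exact duration or GPU count for that job, so "a few days" is taken as 3 days and the fleet as 1000 GPUs (from the talk title); both are assumptions.

```python
HOURS_PER_YEAR = 365 * 24  # leap years ignored for a rough estimate


def required_speedup_per_gpu(audio_years: float, wall_clock_days: float, gpus: int) -> float:
    """Return the faster-than-real-time factor each GPU must sustain."""
    audio_hours = audio_years * HOURS_PER_YEAR
    wall_clock_hours = wall_clock_days * 24
    # Aggregate throughput: hours of audio processed per wall-clock hour.
    aggregate_speedup = audio_hours / wall_clock_hours
    return aggregate_speedup / gpus


speedup = required_speedup_per_gpu(audio_years=3000, wall_clock_days=3, gpus=1000)
print(speedup)
```

Under these assumptions the fleet as a whole processes about 365,000 hours of audio per wall-clock hour, or roughly 365 times real time per GPU.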

Getting Started with Modal

  • Developers can get started with Modal by installing the Python SDK and writing a few lines of code to define their serverless functions.
  • Modal offers $30 per month in free credits for all users, and startups can receive up to $50,000 in credits to help them get started.
  • The platform is designed to support developers throughout their AI journey, from prototyping to large-scale production deployments.
