AWS re:Invent 2025 - Mission-Ready HPC: From NOAA Today to AI Tomorrow (WPS205)

Summary of AWS re:Invent 2025 - Mission-Ready HPC: From NOAA Today to AI Tomorrow

Overview of HPC Market and Trends

  • The HPC market was valued at $37 billion in 2023 and grew by 24% in 2024.
  • By 2028, one-third of the HPC market is expected to be on the cloud.
  • By 2029, the converged AI and HPC market is projected to reach $49 billion.
  • This rapid growth validates the market demand and the innovations being driven by organizations like NOAA.

NOAA's Mission and HPC Workloads

NOAA's Diverse Mission

  • NOAA's mission covers a wide range of domains, from the surface of the sun to the bottom of the oceans.
  • Key focus areas include weather forecasting, hurricane tracking, aviation weather, tsunami warnings, fire weather, and ocean prediction.
  • NOAA operates across the United States, Puerto Rico, Guam, and the Pacific islands, with 122 forecast offices embedded in local communities.
  • The goal is to build trust relationships with decision-makers to provide timely, impactful weather information that saves lives and protects the economy.

HPC Requirements and Workloads

  • NOAA has strict requirements for product timeliness, aiming for 99.9% on-time delivery of 14 million products per day.
  • They operate two high-performance supercomputers (14 petaflops each) that can switch operations between the East and West coasts within 10 minutes.
  • NOAA's modeling workflow includes:
    • Ingesting billions of observations per day
    • Running complex analysis and forecast models (global, regional, climate)
    • Performing post-processing and statistical adjustments
    • Outputting data in formats for forecasters and the public
  • The workload is highly parallel for the core forecast models but also includes serial post-processing tasks.
  • NOAA also runs on-demand workloads, such as hurricane and tropical-storm prediction, plus surge capacity for development work.
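The four-stage workflow above (ingest, forecast, post-process, output) can be sketched as a minimal pipeline. All function names and values here are illustrative stand-ins, not NOAA's actual production suite (which is orchestrated by dedicated workflow managers); the sketch only shows the shape of the work: parallel forecast members feeding a serial post-processing step.

```python
from concurrent.futures import ProcessPoolExecutor

def ingest(n_obs: int) -> list[float]:
    """Stand-in for observation ingest (billions of observations/day in production)."""
    return [0.1 * i for i in range(n_obs)]

def run_forecast_member(args):
    """One member/domain of the forecast model -- the highly parallel stage."""
    member, obs = args
    return member, sum(obs) / len(obs) + member  # toy 'forecast' value

def post_process(forecasts):
    """Serial statistical adjustment applied to each model output."""
    return {m: round(f * 0.98, 3) for m, f in forecasts}

if __name__ == "__main__":
    obs = ingest(1000)
    # Forecast members run in parallel (MPI ranks in production; processes here)
    with ProcessPoolExecutor() as pool:
        tasks = [(m, obs) for m in range(4)]
        forecasts = list(pool.map(run_forecast_member, tasks))
    # Post-processing and formatting for forecasters/public happen serially
    products = post_process(forecasts)
    print(products)
```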

Challenges with On-Premises HPC

  • On-premises systems can face challenges when workloads get backed up, requiring load-shedding of certain models or cycles.
  • The rapid pace of AI model advancements also presents change management challenges, as NOAA needs to carefully validate new models to maintain scientific integrity.
  • Data dissemination is another challenge, with NOAA serving over 12 TB of data per day and handling surges up to 1 billion hits during events like tsunamis.
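A quick back-of-envelope calculation puts those dissemination numbers in perspective: 12 TB/day is a little over a gigabit per second sustained, and a billion hits averaged over a day is on the order of ten thousand requests per second (real event traffic is far more bursty).

```python
# Back-of-envelope for NOAA's dissemination load described above.
TB = 1e12  # decimal terabyte, in bytes
SECONDS_PER_DAY = 24 * 3600

avg_bytes_per_s = 12 * TB / SECONDS_PER_DAY    # ~139 MB/s sustained
avg_gbit_per_s = avg_bytes_per_s * 8 / 1e9     # ~1.11 Gbit/s average

surge_hits = 1e9
surge_req_per_s = surge_hits / SECONDS_PER_DAY  # ~11,600 req/s if spread evenly

print(f"sustained: {avg_bytes_per_s / 1e6:.0f} MB/s (~{avg_gbit_per_s:.2f} Gbit/s)")
print(f"surge: {surge_req_per_s:,.0f} req/s averaged over a day; real peaks are far higher")
```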

NOAA's Journey to the Cloud

Benchmarking and Proof of Concepts

  • NOAA started exploring cloud-based HPC in early 2020, benchmarking their models and porting MPI applications to the cloud.
  • This process helped NOAA understand the performance and cost tradeoffs of running HPC workloads in the cloud.

Cloud Architecture and Offerings

  • NOAA has standardized on AWS's purpose-built HPC instance families (Hpc6a, Hpc7a, Hpc8a).
  • They leverage Elastic Fabric Adapter for low-latency, high-throughput networking and FSx for Lustre for high-performance storage.
  • NOAA offers two cloud HPC options for their scientists and researchers:
    1. Self-managed HPC clusters
    2. HPC-as-a-Service through a partnership with Parallel Works and GDIT
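A self-managed cluster combining the pieces above (HPC instances, EFA networking, FSx for Lustre) might be defined with an AWS ParallelCluster configuration along these lines. This is an illustrative sketch, not NOAA's configuration; the subnet IDs are placeholders, and the instance type, node counts, and storage capacity are arbitrary example values.

```yaml
Region: us-east-1
Image:
  Os: alinux2
HeadNode:
  InstanceType: c6a.2xlarge
  Networking:
    SubnetId: subnet-xxxxxxxx        # placeholder
Scheduling:
  Scheduler: slurm
  SlurmQueues:
    - Name: compute
      ComputeResources:
        - Name: hpc7a
          InstanceType: hpc7a.96xlarge
          MinCount: 0                # scale to zero when idle
          MaxCount: 16               # example ceiling
          Efa:
            Enabled: true            # low-latency Elastic Fabric Adapter
      Networking:
        SubnetIds:
          - subnet-xxxxxxxx          # placeholder
        PlacementGroup:
          Enabled: true              # keep nodes close for MPI traffic
SharedStorage:
  - MountDir: /fsx
    Name: scratch
    StorageType: FsxLustre
    FsxLustreSettings:
      StorageCapacity: 4800          # GiB, example value
      DeploymentType: SCRATCH_2
```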

Benefits of Cloud HPC

  • The cloud-based HPC offerings have enabled NOAA to accelerate their research-to-operations timeline.
  • Scientists and researchers can quickly spin up the compute resources they need without waiting for on-premises queues.
  • NOAA has seen 15% faster runtimes and 25% better cost-performance for their RRFS (Rapid Refresh Forecast System) in the cloud.

Convergence of HPC and AI

AI Weather Prediction

  • NOAA is exploring the use of AI-based weather prediction models, such as FourCastNet, Pangu-Weather, GraphCast, and Aurora.
  • These AI models can run on single GPU instances in minutes, compared to the hours required for traditional physics-based numerical weather prediction models.
  • The AI models are trained on the outputs of the physics-based models, leveraging the strengths of both approaches.
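The speed advantage comes from how these models run: instead of integrating physics equations, a trained network maps the atmospheric state at one time to the state a few hours later, and the forecast is produced by feeding each output back in as the next input (an autoregressive rollout). The sketch below illustrates that loop only; the `learned_step` function is a toy stand-in for a large neural network, not any real model's API.

```python
import numpy as np

def learned_step(state: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Toy stand-in for one learned time step (e.g. 6 hours).
    In models like GraphCast this is a large neural network run on a GPU."""
    return np.tanh(state @ weights)

def rollout(initial_state: np.ndarray, weights: np.ndarray, n_steps: int) -> list:
    """Autoregressive forecast: each output becomes the next input."""
    states = [initial_state]
    for _ in range(n_steps):
        states.append(learned_step(states[-1], weights))
    return states

rng = np.random.default_rng(0)
state0 = rng.standard_normal((8, 8))      # toy gridded atmospheric state
w = rng.standard_normal((8, 8)) * 0.1     # toy 'trained' weights
forecast = rollout(state0, w, n_steps=20)  # 20 steps x 6 h = a 5-day forecast
print(len(forecast), forecast[-1].shape)
```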

Democratizing Weather Prediction

  • The cloud-based AI weather prediction models allow NOAA's scientists and researchers to quickly experiment and iterate, without the constraints of on-premises HPC systems.
  • Even non-scientists, like the presenter, were able to run AI weather prediction models in Jupyter notebooks, demonstrating the accessibility of these tools.

AWS Investments in HPC and AI

  • AWS has announced a $50 billion investment to build new AI and supercomputer centers for the U.S. government.
  • The investment corresponds to roughly 1.3 GW of compute capacity and will serve missions across multiple government agencies.
  • The investment is driven by real-world innovations seen with customers like NOAA, as well as other examples:
    • S2 Labs using deep learning and supercomputing to map buried infrastructure after Hurricane Ivan
    • Sinera using multi-agent AI and HPC to automate complex engineering design processes

Key Takeaways

  • The convergence of HPC and AI is transforming how organizations like NOAA can predict, track, and respond to critical weather events, saving more lives.
  • Cloud-based HPC offerings are enabling NOAA to accelerate their research-to-operations timeline and improve cost-performance.
  • AI-based weather prediction models are democratizing access to advanced forecasting capabilities, allowing for faster experimentation and iteration.
  • AWS's significant investments in HPC and AI infrastructure demonstrate the growing importance of these technologies in mission-critical applications.
