High performance computing: Reinvented to help you think truly big (CMP204)

Summary

Overview

  • Ian Colle, Director of Advanced Computing and Simulation at AWS, discusses how high-performance computing (HPC) on AWS has evolved and the company's strategy for supporting HPC workloads.
  • Michael Bartlett, a Cloud Engineer at the National Renewable Energy Laboratory (NREL), shares his organization's experience in using cloud-based HPC for their research.

Key Takeaways

High-Performance Computing on AWS

  1. Shift in HPC Landscape: Traditional HPC workloads are migrating in significant numbers from on-premises data centers to the cloud. An estimated 20% of HPC workloads already run in the cloud, and that share is expected to grow to about a third within the next 3-4 years.
  2. Composable Architectures: AWS lets customers provision compute resources on demand, tailored to each workload, so they can optimize their HPC workflows.
  3. AWS Parallel Computing Service (PCS): A new fully managed HPC service that provides a family of job schedulers, starting with Slurm, to help customers easily migrate their on-premises HPC workloads to the cloud.
  4. Purchasing Options: AWS offers several purchasing options for HPC resources. These include On-Demand Instances, Savings Plans, Spot Instances, and the new Amazon EC2 Capacity Blocks for ML, which let customers reserve GPU instances for a defined period.
  5. Innovations in Infrastructure: AWS continues to innovate in compute (e.g., Graviton4 processors), networking (Elastic Fabric Adapter, EFA), and storage (Amazon FSx for Lustre) to support demanding HPC workloads.

NREL's Cloud Journey

  1. Hybrid Approach: NREL maintains an on-premises HPC system but also uses the cloud to address specific challenges. Examples include bypassing queue congestion and providing access to researchers who do not have approval to use the on-premises system.
  2. Workload Characterization: NREL categorizes its workloads as either HPC-style (tightly coupled) or high-throughput (embarrassingly parallel), then selects the appropriate cloud-based solution for each.
  3. AWS ParallelCluster and PCS: NREL has used AWS ParallelCluster. It is excited about upcoming features of AWS Parallel Computing Service, such as infrastructure-as-code integration and the ability to give researchers self-service access.
  4. Future Plans: NREL is exploring a hybrid scheduler that can burst smaller jobs to the cloud, expose exotic hardware, and automate the right-sizing of resources based on workload characteristics.
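NREL's workload split and its planned hybrid bursting can be illustrated with a small routing policy. The following is a minimal Python sketch built on stated assumptions. It is not NREL's scheduler or any AWS API. The `Job` fields, the `Target` names, the `route` function, and both thresholds (at most four nodes for bursting, a queue wait over 24 hours as "congested") are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    # Hypothetical destinations mirroring the hybrid setup described above.
    ON_PREM = "on-premises HPC system"
    CLOUD_HPC = "cloud HPC cluster (tightly coupled)"
    CLOUD_HTC = "cloud high-throughput pool (embarrassingly parallel)"


@dataclass
class Job:
    name: str
    tightly_coupled: bool  # True if nodes communicate heavily (e.g., MPI)
    nodes: int
    needs_special_hardware: bool = False  # hardware not available on premises


# Hypothetical threshold: only jobs this small are eligible to burst.
SMALL_JOB_MAX_NODES = 4


def _cloud_target(job: Job) -> Target:
    """Match the job to the cloud environment suited to its workload style."""
    return Target.CLOUD_HPC if job.tightly_coupled else Target.CLOUD_HTC


def route(job: Job, on_prem_queue_hours: float, max_wait_hours: float = 24.0) -> Target:
    """Choose where a job runs under a hypothetical hybrid bursting policy."""
    # Hardware the on-premises system lacks is reachable only in the cloud.
    if job.needs_special_hardware:
        return _cloud_target(job)
    # Burst small jobs, and only when the local queue is congested.
    congested = on_prem_queue_hours > max_wait_hours
    if congested and job.nodes <= SMALL_JOB_MAX_NODES:
        return _cloud_target(job)
    # Everything else stays on the on-premises system.
    return Target.ON_PREM
```

For example, with a 48-hour local queue, `route(Job("cfd", True, 2), on_prem_queue_hours=48.0)` returns `Target.CLOUD_HPC`. A 64-node job under the same conditions stays on premises. Keeping large jobs local in this sketch reflects the idea of bursting only smaller jobs. A real right-sizing policy would also weigh cost, data location, and measured job characteristics.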

Conclusion

AWS and its customers are shaping the future of high-performance computing by taking advantage of the flexibility, scale, and pace of innovation of the cloud. The collaboration between AWS and organizations like NREL shows how cloud-based HPC can accelerate scientific and engineering research.
