Summary
Overview
- Ian Coy, Director of Advanced Computing and Simulation at AWS, discusses the evolution of high-performance computing (HPC) on AWS and the company's strategy for supporting HPC workloads.
- Michael Bartlett, a Cloud Engineer at the National Renewable Energy Laboratory (NREL), shares NREL's experience using cloud-based HPC for research.
Key Takeaways
High-Performance Computing on AWS
- Shift in HPC Landscape: Traditional HPC workloads are migrating from on-premises data centers to the cloud. An estimated 20% of HPC workloads already run in the cloud, and that share is expected to grow to about a third over the next 3-4 years.
- Composable Architectures: AWS offers the flexibility to dynamically provision and instantiate compute resources tailored to specific workloads, allowing customers to optimize their HPC workflows.
- AWS Parallel Computing Service (PCS): A new fully managed HPC service that provides a family of job schedulers, starting with Slurm, to help customers easily migrate their on-premises HPC workloads to the cloud.
- Purchasing Options: AWS offers several purchasing options for HPC resources, including On-Demand, Savings Plans, Spot Instances, and the new Capacity Blocks for GPU instances.
- Innovations in Infrastructure: AWS has continued to innovate in compute (e.g., Graviton4), networking (Elastic Fabric Adapter), and storage (FSx for Lustre) to support demanding HPC workloads.
NREL's Cloud Journey
- Hybrid Approach: NREL maintains an on-premises HPC system but also uses the cloud to address specific challenges, such as bypassing queue congestion and giving access to researchers who lack approval.
- Workload Characterization: NREL categorizes its workloads as HPC-style (tightly coupled) or high-throughput (embarrassingly parallel) and selects a cloud-based solution to match.
- ParallelCluster and PCS: NREL has used the AWS ParallelCluster toolkit and is excited about the upcoming features of the AWS Parallel Computing Service, such as infrastructure-as-code integration and self-service access for researchers.
- Future Plans: NREL is exploring a hybrid scheduler that can burst smaller jobs to the cloud, expose exotic hardware, and automate the right-sizing of resources based on workload characteristics.
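To make the workload categories and the bursting idea above concrete, here is a minimal sketch of a routing policy in Python. It is not NREL's scheduler or any AWS API. The names (Job, Target, route), the node and queue-wait thresholds, and the rule that exotic-hardware jobs always go to the cloud are all assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum


class Target(Enum):
    ON_PREM = "on-prem"
    CLOUD_HPC = "cloud-hpc"         # tightly coupled jobs
    CLOUD_THROUGHPUT = "cloud-htc"  # embarrassingly parallel jobs


@dataclass(frozen=True)
class Job:
    name: str
    nodes: int
    tightly_coupled: bool          # True if tasks communicate during the run
    needs_exotic_hw: bool = False  # hardware assumed unavailable on-premises


# Hypothetical thresholds, chosen only for illustration.
MAX_BURST_NODES = 8
QUEUE_WAIT_LIMIT_HOURS = 4.0


def route(job: Job, est_queue_wait_hours: float) -> Target:
    """Pick a target for one job under the illustrative policy."""
    # The cloud flavor follows the workload category.
    cloud = Target.CLOUD_HPC if job.tightly_coupled else Target.CLOUD_THROUGHPUT

    # Assumption: exotic hardware is only reachable in the cloud.
    if job.needs_exotic_hw:
        return cloud

    # Burst only small jobs, and only when the on-premises queue is congested.
    is_small = job.nodes <= MAX_BURST_NODES
    is_congested = est_queue_wait_hours > QUEUE_WAIT_LIMIT_HOURS
    if is_small and is_congested:
        return cloud
    return Target.ON_PREM
```

For example, under these assumed thresholds, route(Job("sweep", 2, False), 6.0) selects the cloud high-throughput target, while a 64-node tightly coupled job stays on-premises regardless of queue wait. A real policy would presumably also weigh cost, data locality, and right-sizing of instance types, which this sketch leaves out.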
Conclusion
AWS and its customers are actively shaping the future of high-performance computing by leveraging the flexibility, scalability, and innovation offered by the cloud. The collaboration between AWS and organizations like NREL demonstrates the potential for cloud-based HPC to accelerate scientific and engineering research.