AWS re:Invent 2025 - HPC at Scale with AWS Parallel Computing Service (PCS) (CMP340)

Overview of HPC at AWS

AWS has been working on HPC capabilities since 2015, starting with the open-source CfnCluster (CloudFormation Cluster) toolkit
Customers wanted a more fully managed HPC service from AWS, which led to AWS ParallelCluster and eventually to AWS Parallel Computing Service (PCS)
PCS is designed to address the needs of demanding HPC customers like Shell and Toyota, who were initially skeptical about running HPC workloads on AWS

What is AWS Parallel Computing Service (PCS)?

PCS is a managed Slurm offering, providing a fully-managed HPC-as-a-service solution
Slurm was chosen as the initial scheduler because of its popularity in the open-source community and academia, and its wide use for large language model training and other AI workloads
PCS allows customers to focus on their scientific workloads, research, and simulations, while AWS handles the underlying infrastructure and operations

Key Features and Benefits of PCS

Managed Slurm scheduler, allowing dynamic scaling and scheduling of compute resources
Seamless infrastructure-as-code and API-driven development, reducing the need for manual cluster management
Integrated with AWS services like CloudWatch for observability and cost optimization
Flexible architecture supporting CPUs, GPUs, and various storage options
Designed to meet the needs of multiple stakeholders: HPC system administrators, scientists, and engineers
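
To illustrate what dynamic scaling of compute resources can mean in practice, here is a minimal Python sketch that sizes a node group to cover the CPU cores requested by pending jobs, within a minimum and maximum. The function name, its parameters, and the rule itself are assumptions made for illustration; the talk summary does not describe the actual scaling logic used by PCS or Slurm.

```python
import math


def desired_node_count(pending_cores: int, cores_per_node: int,
                       min_nodes: int, max_nodes: int) -> int:
    """Return a node count that covers pending demand within the group's bounds.

    Hypothetical rule for illustration only; not how PCS or Slurm decides.
    """
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    if min_nodes < 0 or max_nodes < min_nodes:
        raise ValueError("require 0 <= min_nodes <= max_nodes")
    # Round up: a partially used node still has to be launched.
    needed = math.ceil(max(pending_cores, 0) / cores_per_node)
    # Clamp the result to the configured bounds.
    return max(min_nodes, min(needed, max_nodes))


if __name__ == "__main__":
    # 1,000 pending cores on hypothetical 96-core nodes, capped at 8 nodes.
    print(desired_node_count(1000, 96, min_nodes=0, max_nodes=8))
```

With a minimum of zero, an idle group in this sketch scales down to no nodes at all, which is the behavior that keeps EC2 spend tied to actual demand.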

Architectural Overview of PCS

PCS clusters consist of login nodes, compute node groups, and queues that can be configured to schedule jobs across different instance types
The service follows a shared responsibility model, where AWS manages the controller and updates, while customers manage their VPC, compute nodes, and workloads
Customers can purchase compute capacity as On-Demand Instances, Spot Instances, or a combination of the two, depending on their needs
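
The relationship between clusters, queues, and compute node groups can be sketched as a small data model. This is only an illustration in plain Python: the class names, field names, purchase-option labels, and instance types below are assumptions chosen for the example, not the PCS API, and login nodes are left out for brevity.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class ComputeNodeGroup:
    name: str
    instance_type: str    # example value, chosen for illustration
    purchase_option: str  # "ONDEMAND" or "SPOT" (labels assumed here)
    max_nodes: int


@dataclass
class Queue:
    name: str
    node_groups: list[ComputeNodeGroup] = field(default_factory=list)

    def max_capacity(self) -> int:
        """Upper bound on nodes this queue can schedule onto."""
        return sum(group.max_nodes for group in self.node_groups)


@dataclass
class Cluster:
    name: str
    queues: dict[str, Queue] = field(default_factory=dict)

    def add_queue(self, queue: Queue) -> None:
        self.queues[queue.name] = queue

    def groups_for(self, queue_name: str) -> list[str]:
        """Names of the node groups that serve the given queue."""
        return [group.name for group in self.queues[queue_name].node_groups]


def example_cluster() -> Cluster:
    """Build a hypothetical cluster: one mixed CPU queue, one GPU queue."""
    cpu_od = ComputeNodeGroup("cpu-ondemand", "hpc7a.96xlarge", "ONDEMAND", 16)
    cpu_spot = ComputeNodeGroup("cpu-spot", "c7i.48xlarge", "SPOT", 32)
    gpu_od = ComputeNodeGroup("gpu-ondemand", "p4d.24xlarge", "ONDEMAND", 4)
    cluster = Cluster("demo")
    cluster.add_queue(Queue("batch", [cpu_od, cpu_spot]))
    cluster.add_queue(Queue("gpu", [gpu_od]))
    return cluster


if __name__ == "__main__":
    cluster = example_cluster()
    print(cluster.groups_for("batch"), cluster.queues["batch"].max_capacity())
```

In this sketch a single queue draws on both an On-Demand and a Spot node group, which is one way to read "a combination" of purchase options.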

Pricing and Availability

PCS pricing includes a fee for the cluster controller and Slurm accounting, in addition to the standard EC2 instance costs
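
The pricing structure described above can be written as a simple sum. The Python sketch below assumes that the controller and accounting fees are billed per hour, which the summary does not state, and every rate in it is a placeholder rather than a real AWS price.

```python
def estimate_cost(hours: float,
                  controller_rate: float,
                  accounting_rate: float,
                  node_hours_by_type: dict,
                  ec2_rates: dict) -> dict:
    """Estimate cluster cost as PCS fees plus EC2 instance charges.

    All rates are caller-supplied placeholders; hourly billing of the
    controller and accounting fees is an assumption of this sketch.
    """
    # Service fees accrue for as long as the cluster exists.
    pcs_fees = hours * (controller_rate + accounting_rate)
    # EC2 charges accrue only for the node-hours actually consumed.
    ec2 = sum(node_hours * ec2_rates[instance_type]
              for instance_type, node_hours in node_hours_by_type.items())
    return {"pcs_fees": pcs_fees, "ec2": ec2, "total": pcs_fees + ec2}


if __name__ == "__main__":
    # Placeholder numbers: 100 cluster-hours and 200 node-hours of one type.
    print(estimate_cost(100, 0.5, 0.25, {"example.type": 200},
                        {"example.type": 2.0}))
```

Separating the two terms shows why scaling node groups down when idle matters: the service fees are fixed for the cluster's lifetime, while the EC2 term follows usage.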
PCS is currently available in select AWS Regions, with a plan to expand it globally by the end of 2026

Customer Adoption and Success Stories

Toyota Central R&D Labs

Toyota faced challenges with managing their on-premises HPC environment, including long lead times for adding new resources and inefficient resource utilization
By adopting PCS, Toyota was able to:
- Reduce environment setup time from 6 weeks to 30 minutes
- Quickly accommodate requests for advanced compute resources such as R7-family 48xlarge and p4d.24xlarge (NVIDIA A100) instances
- Improve overall utilization and cost optimization through dynamic scaling

Shell

Shell initially had concerns about the performance, security, and cost-effectiveness of running HPC workloads on AWS
After a long journey, Shell was able to:
- Achieve a 2.5x acceleration of critical path projects by leveraging PCS and burst capacity
- Seamlessly integrate PCS with their existing Slurm-based workflows
- Benefit from the flexibility and scalability of PCS, allowing them to iterate faster on their HPC solutions
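
Existing Slurm-based workflows are usually built around batch scripts, so integration largely means that those scripts keep working. The Python sketch below renders a minimal script using standard `sbatch` directives. The helper name and all values are hypothetical, and using a PCS queue name as the Slurm partition is an assumption of this example.

```python
def render_sbatch(job_name: str, partition: str, nodes: int,
                  tasks_per_node: int, time_limit: str, command: str) -> str:
    """Render a minimal Slurm batch script as a string.

    Uses only standard sbatch options; the values are supplied by the caller.
    """
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job_name}",
        f"#SBATCH --partition={partition}",  # assumed to match a PCS queue name
        f"#SBATCH --nodes={nodes}",
        f"#SBATCH --ntasks-per-node={tasks_per_node}",
        f"#SBATCH --time={time_limit}",
        "",
        f"srun {command}",  # launch the tasks across the allocated nodes
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical job: 4 nodes, 96 tasks per node, 2-hour limit.
    print(render_sbatch("demo", "batch", 4, 96, "02:00:00", "./simulate"))
```

The resulting text would be saved to a file and submitted with `sbatch`, as with any other Slurm cluster.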

Future Developments and Integrations

AWS announced the upcoming availability of the latest AMD EPYC Trento processors in the Hpc8a instance family
AWS is also investing $50 billion to make the latest HPC and AI resources available in its AWS GovCloud (US) and classified Regions

Key Takeaways

PCS provides a fully-managed HPC-as-a-service solution, allowing customers to focus on their scientific workloads while AWS handles the underlying infrastructure
Customers like Toyota and Shell have seen significant benefits in terms of reduced setup time, improved resource utilization, and accelerated innovation cycles by adopting PCS
PCS offers a flexible and scalable architecture, supporting a variety of compute and storage options, and is designed to meet the needs of multiple stakeholders within HPC organizations
AWS is continuously investing in and expanding its HPC capabilities, including the upcoming availability of the latest AMD EPYC processors and a $50 billion investment in HPC resources for government and classified workloads
