An open collaboration for HPC acceleration (ENU305)
Modernizing HPC for Energy and Utilities on AWS
Collaboration and Ideation
The collaboration started with ideation sessions with energy partners such as Shell and Occidental, aimed at removing barriers to running HPC workloads on AWS, specifically in the oil and gas subsurface space.
The goal was to change how people collaborate, both within companies and across operators, solution providers, ISVs, and academia, so they can leverage the latest HPC technologies for subsurface workloads such as seismic imaging and reservoir simulation.
The work began with writing a PR/FAQ (press release and FAQ), which led to the development of a product now being tested and evaluated by operators and customers, with integrations to various partners and solution providers.
Challenges and Opportunities
HPC requirements are constantly increasing, and migrating large monolithic workloads grows more complex as algorithm fidelity, resolution, and data sizes increase.
Many customers have perfected workflows tuned to their specific hardware, software, network, and environment; the challenge was to help them reimagine those workflows in the cloud, leveraging their hybrid environments and scaling beyond them with the latest technologies.
Vision and Architecture
The Energy HPC Orchestrator is an application that allows customers to build low-code and no-code HPC workflows by dragging and dropping different components, including their custom algorithms, third-party applications, and new ideas from the market, academia, or partners.
The application allows customers to target the right infrastructure and hardware for the right application within an HPC workflow, enabling experimentation and benchmarking.
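As a rough illustration of this idea, a workflow definition might map each component to the instance family best suited to it. This is a minimal sketch with hypothetical names and instance choices, not the orchestrator's actual API:

```python
# Hypothetical workflow definition: each step targets the hardware that
# fits its workload profile. Component names and instance types are
# illustrative assumptions, not taken from the product.
workflow = {
    "name": "seismic-imaging-pipeline",
    "steps": [
        {"component": "preprocessing",  "instance_type": "c7g.16xlarge"},  # CPU-bound filtering (Graviton)
        {"component": "rtm_kernel",     "instance_type": "p5.48xlarge"},   # GPU-accelerated wave propagation
        {"component": "image_stacking", "instance_type": "r7i.8xlarge"},   # memory-heavy reduction
    ],
}

def instance_for(component: str) -> str:
    """Look up the instance type targeted for a given workflow component."""
    for step in workflow["steps"]:
        if step["component"] == component:
            return step["instance_type"]
    raise KeyError(component)
```

Keeping the hardware choice in the workflow definition, rather than in the application code, is what makes it cheap to benchmark the same step on different instance families.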
The serverless architecture decouples the different pieces of a workflow, allowing for fault-tolerant and event-based execution of microservices.
The solution integrates with SDKs and development kits, such as NVIDIA's, to leverage optimized algorithms and hardware.
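The event-based, fault-tolerant execution described above can be sketched as an SQS-triggered Lambda handler. The task schema here is an assumption; the partial-batch response shape (`batchItemFailures`) is the real Lambda/SQS mechanism for retrying only failed messages:

```python
import json

def parse_task(sqs_record: dict) -> dict:
    """Extract the workflow task from an SQS message body (assumed JSON)."""
    return json.loads(sqs_record["body"])

def handler(event, context=None):
    """Minimal sketch of one decoupled microservice step.

    Each SQS message carries one independent task. Records that fail are
    reported back so SQS redelivers only those, giving fault tolerance
    without re-running the rest of the batch.
    """
    failures = []
    for record in event.get("Records", []):
        try:
            task = parse_task(record)
            # ... run this step's algorithm on task["input"] here ...
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```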
Use Case: Reverse Time Migration and Seismic Imaging
Reverse time migration and full waveform inversion are examples of heavily distributed workloads that run 24/7 to understand the subsurface and create 3D models.
The solution breaks the workflow down into specific components and microservices, orchestrated through Lambda and SQS, decoupling tasks that are tightly interconnected in a traditional workflow.
This allows for fault-tolerant, event-based execution and reduced data movement between the microservices.
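A simple way to picture this decomposition is splitting a survey's shots into independent chunks, each of which becomes one queue message. The function and its parameters are hypothetical, sketched only to show the fan-out pattern:

```python
import json

def shot_tasks(survey_id: str, num_shots: int, chunk: int):
    """Split a survey's shots into independent tasks.

    Each task covers a half-open shot range and maps to one queue
    message, so chunks migrate in parallel and a failed chunk can be
    retried without re-running the others.
    """
    for start in range(0, num_shots, chunk):
        yield {
            "survey": survey_id,
            "shots": [start, min(start + chunk, num_shots)],  # half-open range
        }

# A producer would then enqueue each task, e.g. with boto3:
#   sqs = boto3.client("sqs")
#   for task in shot_tasks("survey-01", 40_000, 500):
#       sqs.send_message(QueueUrl=QUEUE_URL, MessageBody=json.dumps(task))
```

Because each message is self-describing, a downstream consumer only ever reads the input data its own shot range needs, which is what reduces data movement between the microservices.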
Collaboration and Integration
The collaboration with partners like Occidental, NVIDIA, and SQ has been crucial in integrating their specialized algorithms and technologies into the orchestrator.
Customers can now run pilot projects and move from pilots to commercial licenses more efficiently, leveraging the best-of-breed algorithms and hardware.
The solution also allows for integrating custom algorithms and workflows, as well as exploring the use of machine learning and generative AI alongside traditional HPC workloads.
Future Developments
The team is exploring the integration of chatbots and generative AI to guide users through the workflow and provide recommendations.
There are ongoing experiments to augment the traditional FWI algorithm with machine learning approaches to optimize the process.
The use of Graviton and custom silicon from AWS is being explored to provide cost-effective and energy-efficient options for HPC workloads.
The integration with NVIDIA's DGX Cloud offering on AWS is an exciting development to provide an appliance-like AI experience for customers.