AWS re:Invent 2025 - FINRA: Accelerate Massive Data Processing with NVIDIA on AWS EMR (AIM279)

Introduction to GPU Acceleration for Apache Spark

Apache Spark is a widely used data processing framework in enterprises and other organizations
Data volumes have been growing exponentially in recent years, especially for AI and machine learning workloads
To handle this data growth, enterprises are turning to GPU acceleration to speed up Spark workloads

The GPU Acceleration Stack

The solution is a plugin that integrates easily into existing Spark workflows
The plugin runs Spark workloads on NVIDIA GPUs, either in the cloud or on-premises
This GPU acceleration can provide significant performance improvements and cost savings
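The summary does not name the plugin. Assuming it is NVIDIA's RAPIDS Accelerator for Apache Spark, a minimal sketch of the Spark properties that load it and request GPUs might look like this. The helper names `rapids_spark_conf` and `to_submit_args` and the default resource amounts are illustrative assumptions, not FINRA's or NVIDIA's published settings.

```python
"""Sketch: Spark properties that enable a GPU SQL plugin for existing jobs.

Assumption: the plugin is NVIDIA's RAPIDS Accelerator for Apache Spark.
The helper names and resource amounts are hypothetical examples.
"""


def rapids_spark_conf(gpus_per_executor: int = 1, tasks_per_gpu: int = 4) -> dict[str, str]:
    """Return Spark properties that load the RAPIDS SQL plugin and request GPUs."""
    if gpus_per_executor < 1 or tasks_per_gpu < 1:
        raise ValueError("gpus_per_executor and tasks_per_gpu must be >= 1")
    return {
        # Load the plugin so supported SQL/DataFrame operators are replaced by GPU versions.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Ask the cluster manager for this many GPUs per executor.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        # A fractional GPU per task lets several tasks share one GPU (0.25 -> 4 tasks).
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }


def to_submit_args(conf: dict[str, str]) -> list[str]:
    """Render properties as spark-submit --conf arguments, sorted for stable output."""
    args: list[str] = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args
```

Because the plugin works at the level of Spark's query plan, the job code itself does not have to change for supported operations. On EMR the same key/value pairs could instead go into a `spark-defaults` configuration classification. The sketch omits cluster-specific settings such as GPU discovery and YARN GPU scheduling, and it assumes the plugin jar is already on the cluster classpath.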

Real-World Use Cases

Companies across various industries have publicly shared their success with this GPU-accelerated Spark technology
One example is fraud detection, where billions of records must be processed with time series analysis, a workload well suited to GPUs
In that case, GPU-accelerated Spark delivered a 14x speedup and 90% cost savings compared to CPU-only processing
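To see how a speedup and a cost saving relate, assume (a simplification not stated in the source) that a job's cost is the cluster's hourly price r times its runtime T, and that the GPU run finishes s times faster than the CPU run. Under that assumption, the reported figures imply that the GPU cluster cost about 1.4 times as much per hour as the CPU cluster:

```latex
\begin{aligned}
C_{\text{CPU}} &= r_{\text{CPU}}\,T, \qquad C_{\text{GPU}} = r_{\text{GPU}}\,\frac{T}{s} \\
\text{savings} &= 1 - \frac{C_{\text{GPU}}}{C_{\text{CPU}}} = 1 - \frac{\rho}{s},
  \qquad \rho = \frac{r_{\text{GPU}}}{r_{\text{CPU}}} \\
s = 14,\ \text{savings} = 0.90 &\;\Rightarrow\; \rho = (1 - 0.90)\times 14 = 1.4
\end{aligned}
```

In other words, a GPU cluster can be more expensive per hour and still be cheaper per job, provided the speedup s exceeds the price ratio ρ.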

FINRA's Journey with GPU-Accelerated Spark

Background on FINRA

FINRA (Financial Industry Regulatory Authority) is a not-for-profit organization responsible for market integrity and investor protection
They operate over 1 PB of storage in the AWS cloud, processing massive datasets for regulatory compliance and fraud detection

Evaluating GPU Acceleration

FINRA initially used Apache Hive for their SQL queries, then transitioned to Apache Spark
After being introduced to GPU-accelerated Spark, FINRA ran tests on the TPC-DS 9B benchmark
The tests showed a 50% performance improvement and a 50% cost reduction compared to CPU-only Spark

Applying to Production Workloads

FINRA then applied GPU-accelerated Spark to its production trading application workloads
Again, FINRA saw around a 50% performance improvement and a 45% cost reduction
However, the initial GPU runs were not optimal, and FINRA worked with NVIDIA to identify and resolve the bottlenecks
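The summary does not say what the bottlenecks were. One common starting point, assuming the RAPIDS Accelerator is the plugin in use, is to set `spark.rapids.sql.explain` to `NOT_ON_GPU` so the driver logs which parts of a query plan stay on the CPU. The sketch below tallies such fallbacks from log lines. Matching the phrase "cannot run on GPU" and operator names in angle brackets is a heuristic assumption about the log format, which can vary by plugin version, and `cpu_fallbacks` is a hypothetical helper.

```python
"""Sketch: tally query-plan operators that fall back to CPU.

Assumptions: the RAPIDS Accelerator is in use with spark.rapids.sql.explain=NOT_ON_GPU;
fallback lines contain "cannot run on GPU" and name the operator as <OperatorName>.
"""
import re
from collections import Counter

# Property that asks the plugin to explain operators that will not run on the GPU.
EXPLAIN_CONF = {"spark.rapids.sql.explain": "NOT_ON_GPU"}

# Operator or expression names appear in angle brackets, e.g. <SortExec>.
_OP_NAME = re.compile(r"<(\w+)>")


def cpu_fallbacks(log_lines: list[str]) -> Counter:
    """Count operators/expressions reported as unable to run on the GPU."""
    counts: Counter = Counter()
    for line in log_lines:
        if "cannot run on GPU" in line:
            match = _OP_NAME.search(line)
            counts[match.group(1) if match else "<unknown>"] += 1
    return counts
```

Frequent fallbacks point at operators, expressions, or UDFs worth rewriting; bottlenecks such as I/O or shuffle would need other tools, such as the Spark UI.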

Integrating GPU Spark into the Data Pipeline

FINRA's data pipeline processes 100,000 CSV files daily: it decompresses them, converts their column types, and converts the results to Parquet
Moving this pipeline to GPU-accelerated Spark delivered consistent savings in both runtime and cost
This required some code changes to use Spark DataFrames instead of the typed Dataset API, which is less GPU-friendly
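The source does not show FINRA's code. As a hedged illustration of the DataFrame style, the sketch below expresses type conversion as Spark SQL `CAST` expressions. A GPU plugin can see and replace these column expressions, whereas lambdas in the typed Dataset API (Scala/Java) are opaque JVM code that generally stays on the CPU. The schema and the `cast_exprs` helper are hypothetical.

```python
"""Sketch: CSV type conversion written as Spark SQL column expressions.

Assumption: HYPOTHETICAL_SCHEMA is made up; the source does not describe FINRA's columns.
CAST expressions are visible to Spark's optimizer (and to a GPU plugin), unlike
typed Dataset lambdas.
"""

# Hypothetical target types for columns that arrive as strings from CSV.
HYPOTHETICAL_SCHEMA = {
    "trade_id": "BIGINT",
    "symbol": "STRING",
    "price": "DECIMAL(18,6)",
    "trade_ts": "TIMESTAMP",
}


def cast_exprs(schema: dict[str, str]) -> list[str]:
    """Build `CAST(col AS type) AS col` expressions for DataFrame.selectExpr."""
    return [f"CAST(`{name}` AS {sql_type}) AS `{name}`" for name, sql_type in schema.items()]
```

In PySpark this could be used as `spark.read.option("header", "true").csv(input_path).selectExpr(*cast_exprs(HYPOTHETICAL_SCHEMA)).write.parquet(output_path)`, where `input_path` and `output_path` are placeholders. Without an explicit schema or `inferSchema`, Spark reads CSV columns as strings, so the casts perform the type conversion, and Spark decompresses files with a `.gz` extension automatically.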

Lessons Learned and the Path Forward

Not every workload benefits immediately from GPU acceleration, so identifying bottlenecks is crucial
FINRA has established a process to compare CPU and GPU performance for its workloads
While CPU remains the default, GPU acceleration is now a strategic part of FINRA's big data technology stack for the future
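The summary does not describe how FINRA's validation process works. Below is a minimal sketch of one way to decide between CPU and GPU for a job. It assumes you have measured the runtime and hourly cluster price of a test run on each engine. `RunResult`, `choose_engine`, and the 10% minimum-savings threshold are hypothetical.

```python
"""Sketch: compare a CPU run and a GPU run of the same job and pick an engine.

Assumptions: runtimes and hourly cluster prices come from your own test runs;
the 10% minimum-savings threshold is a hypothetical policy, not FINRA's.
"""
from dataclasses import dataclass


@dataclass(frozen=True)
class RunResult:
    runtime_hours: float
    cluster_price_per_hour: float

    @property
    def cost(self) -> float:
        # Simplification: cost = runtime x hourly cluster price.
        return self.runtime_hours * self.cluster_price_per_hour


def choose_engine(cpu: RunResult, gpu: RunResult, min_savings: float = 0.10) -> str:
    """Keep CPU as the default unless the GPU run is no slower and cheaper by min_savings."""
    speedup = cpu.runtime_hours / gpu.runtime_hours
    savings = 1 - gpu.cost / cpu.cost
    if speedup >= 1 and savings >= min_savings:
        return "gpu"
    return "cpu"
```

For example, a CPU run of 2 hours at 10 per hour (cost 20) against a GPU run of 1 hour at 11 per hour (cost 11) yields a 2x speedup and 45% savings, so the job would move to GPU. A GPU run that costs more than the CPU run would stay on CPU.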

Key Takeaways

GPU acceleration can provide significant performance improvements and cost savings for large-scale Spark workloads
Integrating GPU-accelerated Spark requires some upfront effort to identify and resolve bottlenecks, but can lead to transformative results
FINRA's experience demonstrates the real-world business impact of this technology, from regulatory compliance to fraud detection
As GPU hardware and software continue to evolve, GPU acceleration is becoming a strategic part of enterprise big data architectures