Accelerate Apache Spark up to 5 times on AWS with RAPIDS (ANT208)

Accelerating Batch Data Processing on GPUs with the RAPIDS Accelerator for Apache Spark

Overview

Challenges of big data processing:
- Exponential growth of data from various sources (internet, consumer devices, IoT)
- Need for high-quality data at low latency
Common approaches to handling data growth:
- Scaling compute resources horizontally
- Downsampling the data
- Accepting longer processing times
Proposed solution: accelerate batch data processing using NVIDIA GPUs and the RAPIDS Accelerator for Apache Spark

Apache Spark 3 and the RAPIDS Accelerator

Apache Spark 3 features that enable GPU processing:
- Resource-aware scheduling
- Columnar data processing
- Plugin architecture
RAPIDS Accelerator for Apache Spark:
- Plugin that works with different Apache Spark distributions (Amazon EMR, Databricks, etc.)
- Integrates with existing user workflows (SQL, DataFrames) without code changes
- Transparent GPU acceleration, with fallback to CPU for unsupported operations
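
To make the three Spark 3 features above concrete, here is a minimal sketch of the settings typically used to turn the plugin on. It only builds the configuration as plain data, so it runs without a Spark installation. The helper names (`rapids_spark_conf`, `to_submit_args`) and the default values are assumptions for illustration, not something the talk prescribes; the configuration keys come from the Spark and RAPIDS Accelerator documentation. The plugin jar must also be available to Spark, which this sketch does not cover.

```python
# Illustrative sketch: helper names and default values are hypothetical.

def rapids_spark_conf(gpus_per_executor=1, tasks_per_gpu=4, explain_fallbacks=True):
    """Return Spark settings that load the RAPIDS Accelerator plugin."""
    conf = {
        # Plugin architecture: Spark 3 loads the accelerator via spark.plugins.
        "spark.plugins": "com.nvidia.spark.SQLPlugin",
        "spark.rapids.sql.enabled": "true",
        # Resource-aware scheduling: GPUs per executor and the share each task claims.
        "spark.executor.resource.gpu.amount": str(gpus_per_executor),
        "spark.task.resource.gpu.amount": str(1 / tasks_per_gpu),
    }
    if explain_fallbacks:
        # Ask the plugin to log which operations stayed on the CPU.
        conf["spark.rapids.sql.explain"] = "NOT_ON_GPU"
    return conf


def to_submit_args(conf):
    """Flatten a settings dict into spark-submit style --conf arguments."""
    args = []
    for key in sorted(conf):
        args += ["--conf", f"{key}={conf[key]}"]
    return args


if __name__ == "__main__":
    print(" ".join(to_submit_args(rapids_spark_conf())))
```

Because the settings live in configuration rather than in job code, the same SQL or DataFrame job can be submitted with or without them, which is what makes the acceleration transparent to the user.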

Performance and Cost Savings

Benchmarks:
- NDS (NVIDIA Decision Support benchmark) on Amazon EC2 instances:
  - Up to 9x faster on GPU, with a 4.8x average speedup and 72% cost savings
- NDS on Amazon EMR:
  - Up to 3.5x faster on GPU, with a 1.3x average speedup and 14% cost savings
Transaction Fraud Use Case:
- 14x speedup and 87% cost savings when running on GPU cluster
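
Speedup and cost savings are linked through the price difference between the GPU and CPU clusters. As a rough sketch, assume both clusters are billed per hour and the job cost is simply hourly price times runtime (an assumption; the talk does not state its cost model). Then savings = 1 - price_ratio / speedup, where `price_ratio` is the GPU cluster's hourly price divided by the CPU cluster's. The function names below are hypothetical.

```python
# Illustrative arithmetic under a simple "price x runtime" cost model (assumed).

def cost_savings(speedup, price_ratio):
    """Fraction of cost saved when a job runs `speedup` times faster on a
    cluster that costs `price_ratio` times as much per hour."""
    if speedup <= 0 or price_ratio <= 0:
        raise ValueError("speedup and price_ratio must be positive")
    return 1.0 - price_ratio / speedup


def implied_price_ratio(speedup, savings):
    """Invert cost_savings: the hourly price ratio consistent with the
    reported speedup and savings."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return (1.0 - savings) * speedup


if __name__ == "__main__":
    # Fraud use case figures quoted above: 14x speedup, 87% savings.
    ratio = implied_price_ratio(14, 0.87)
    print(f"implied GPU/CPU hourly price ratio: {ratio:.2f}")
```

Under this model, the fraud use case figures would correspond to a GPU cluster priced at roughly 1.8 times the CPU cluster per hour. The model also shows why a modest speedup can yield small or even negative savings: the speedup has to exceed the price ratio before the GPU run becomes cheaper.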

Identifying Workloads for GPU Acceleration

Spark RAPIDS User Tools:
- Open-source tool that analyzes Spark event logs and recommends workloads for GPU acceleration
- Provides recommendations on cluster configuration and job tuning
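
To give a feel for the input such a tool works from: a Spark event log is a text file with one JSON object per line, each carrying an `Event` field. The sketch below tallies event types and checks whether the log contains any SQL executions, since SQL and DataFrame work is what the accelerator targets. This is not how the Spark RAPIDS User Tools score workloads; the function names, the threshold, and the sample lines are made up for illustration.

```python
import json
from collections import Counter

# Event name Spark emits when a SQL/DataFrame query starts executing.
SQL_START = "org.apache.spark.sql.execution.ui.SparkListenerSQLExecutionStart"


def summarize_event_log(lines):
    """Count events by type, skipping blank or unparseable lines."""
    counts = Counter()
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            event = json.loads(line)
        except json.JSONDecodeError:
            continue
        if isinstance(event, dict):
            counts[event.get("Event", "unknown")] += 1
    return counts


def looks_sql_heavy(counts, min_sql_executions=1):
    """Crude, hypothetical filter: does the log contain SQL executions?"""
    return counts[SQL_START] >= min_sql_executions


if __name__ == "__main__":
    # Made-up sample lines; real logs carry many more fields per event.
    sample = [
        '{"Event": "SparkListenerApplicationStart"}',
        '{"Event": "' + SQL_START + '"}',
        '{"Event": "SparkListenerJobStart"}',
    ]
    counts = summarize_event_log(sample)
    print(dict(counts), looks_sql_heavy(counts))
```

The real tool goes much further, looking at which operators each query uses and how long they run, but the starting point is the same event log that Spark already writes for the history server.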

Getting Started

Documentation:
- NVIDIA RAPIDS Accelerator for Apache Spark documentation
- Amazon EMR documentation on using the RAPIDS Accelerator
Spark RAPIDS User Tools:
- Available from the Python Package Index (PyPI)
- Notebook for running the tool on Amazon EMR

Conclusion

Exponential data growth can be addressed by accelerating batch data processing on GPUs with the RAPIDS Accelerator for Apache Spark.
Significant performance improvements and cost savings are achievable when the right workloads are chosen for GPU acceleration.
The open-source Spark RAPIDS User Tools help with that choice by analyzing Spark event logs and providing recommendations.
