How Ford unlocked real-time insights using Apache Iceberg on AWS (AUT311)

Connected Vehicles and Modernizing Data Lakes with Apache Iceberg on AWS

Connected Vehicles: Overview

Connected vehicles enable new remote operations and improve the customer experience through real-time data insights and greater operational efficiency.
Key connected vehicle use cases include:
- Predictive maintenance
- Proactive maintenance
- Vehicle health monitoring
- Vehicle event tracking

Challenges in Connected Vehicle Data Management

Data volume growth as more vehicles get connected
Need for real-time data consumption for remote functions
Data lake scalability to handle high volume and concurrency without impacting end-users

Modernizing Data Lakes with Apache Iceberg

Evolution of Modern Data Lakes

Data platforms have moved from traditional relational databases and data warehouses to data lakes built on open table formats such as Apache Hudi, Apache Iceberg, and Delta Lake.
These formats combine the best practices of traditional data warehouses with the open ecosystem of big data analytics.

Advantages of Apache Iceberg

ACID compliance for data reliability and consistency
Schema enforcement and evolution to handle changing data structures
Scalability and performance to support petabyte-scale data and high write throughput
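
To make the ACID point concrete, here is a toy sketch of snapshot-based commits with optimistic concurrency, the general mechanism Iceberg relies on: writers prepare new state off to the side and publish it with a single atomic swap. `ToyTable` and `CommitConflict` are hypothetical names for illustration only, and the model is deliberately simplified (real Iceberg retries many conflicting commits automatically instead of raising).

```python
import threading


class CommitConflict(Exception):
    """Raised when a writer's base snapshot is no longer the current one."""


class ToyTable:
    """Toy stand-in for a table whose state is a chain of immutable snapshots."""

    def __init__(self):
        self._snapshots = [()]         # snapshot 0: an empty table
        self._lock = threading.Lock()  # stands in for the catalog's atomic swap

    def current_id(self):
        return len(self._snapshots) - 1

    def read(self, snapshot_id=None):
        """Return the data files visible in a snapshot (default: the latest)."""
        if snapshot_id is None:
            snapshot_id = self.current_id()
        return self._snapshots[snapshot_id]

    def commit_append(self, base_id, new_files):
        """Publish base + new_files as a new snapshot, or fail with no side effects."""
        with self._lock:
            if base_id != self.current_id():
                raise CommitConflict(f"base snapshot {base_id} is stale")
            self._snapshots.append(self._snapshots[base_id] + tuple(new_files))
            return self.current_id()


table = ToyTable()
base = table.current_id()
table.commit_append(base, ["batch-a.parquet"])

try:
    # A second writer that started from the same base arrives late.
    table.commit_append(base, ["batch-b.parquet"])
except CommitConflict:
    # Nothing was half-written, so the writer can retry on the latest snapshot.
    table.commit_append(table.current_id(), ["batch-b.parquet"])
```

Readers always see a complete snapshot: either the one before a commit or the one after it, never a partially written batch.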

Iceberg Benefits for Connected Vehicle Platforms

Reliable and consistent data ingestion, even during system failures
Seamless schema changes to accommodate new vehicle telemetry and features
Scalable and performant data lake to handle massive data volumes and concurrency
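
As a rough illustration of the schema change point, the helpers below build the Spark SQL statements that Iceberg supports for adding and renaming columns. The talk summary does not show Ford's actual DDL; the table name `glue_catalog.vehicles.events` and the column names are hypothetical, and the returned strings are meant to be passed to `spark.sql(...)` in a session with an Iceberg catalog configured.

```python
def add_column_sql(table, column, spark_type, comment=""):
    """Build an ALTER TABLE ... ADD COLUMNS statement for a single new column."""
    column_def = f"{column} {spark_type}"
    if comment:
        column_def += f" COMMENT '{comment}'"
    return f"ALTER TABLE {table} ADD COLUMNS ({column_def})"


def rename_column_sql(table, old_name, new_name):
    """Build an ALTER TABLE ... RENAME COLUMN statement."""
    return f"ALTER TABLE {table} RENAME COLUMN {old_name} TO {new_name}"


# Hypothetical table and column names, for illustration only.
events_table = "glue_catalog.vehicles.events"
add_stmt = add_column_sql(
    events_table, "tire_pressure_kpa", "double", "new telemetry signal"
)
rename_stmt = rename_column_sql(events_table, "evt_ts", "event_time")
```

Both operations are metadata changes in Iceberg, so existing data files do not need to be rewritten when a new telemetry field appears.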

Ford's Journey with Connected Vehicle Event Store using Apache Iceberg

Ford's Connected Vehicle Platform Overview

The platform manages over 20 million vehicles globally, enabling bidirectional data exchange and remote vehicle operations.

Building the Event Store Platform

Initial scope: High-cardinality data with moderate freshness requirements.
Rapid growth and scalability challenges:
- Increased job processing times
- Degraded query performance
- Rising storage and compute costs
- Data aging issues

Optimizing the Platform with Apache Iceberg

Clean Zone Improvements:
- Leveraging Glue's lazy loading to optimize file listing
- Replacing custom UDFs with Spark native functions
- Moving to a "bucket of files" approach
Migrating to Apache Iceberg:
- Creating a view to seamlessly transition to Iceberg tables
- Leveraging Iceberg's table compaction to address small file issues
- Achieving an 80% improvement in query performance
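
The view and compaction steps above could look roughly like the following Spark SQL, built here as plain strings. This is a sketch under assumptions, not Ford's implementation: it assumes consumers query a view that is repointed at the Iceberg table, and that compaction runs through Iceberg's `rewrite_data_files` Spark procedure. All catalog, table, and view names are hypothetical, and the 512 MiB target file size is only an example value.

```python
def transition_view_sql(view_name, iceberg_table):
    """Point an existing consumer-facing view at the new Iceberg table."""
    return f"CREATE OR REPLACE VIEW {view_name} AS SELECT * FROM {iceberg_table}"


def compaction_sql(catalog, table, target_file_size_bytes=512 * 1024 * 1024):
    """Build a call to Iceberg's rewrite_data_files procedure for one table."""
    return (
        f"CALL {catalog}.system.rewrite_data_files("
        f"table => '{table}', "
        f"options => map('target-file-size-bytes', '{target_file_size_bytes}'))"
    )


# Hypothetical names; each statement would be run with spark.sql(...).
view_stmt = transition_view_sql(
    "vehicles.events", "glue_catalog.vehicles.events_iceberg"
)
compact_stmt = compaction_sql("glue_catalog", "vehicles.events_iceberg")
```

Keeping the view name stable means downstream queries do not change during the migration, while compaction merges many small files into fewer large ones so that queries open fewer objects.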

Future Enhancements: Streaming with Iceberg

Adopting a streaming approach to provide real-time access to critical vehicle data
Maintaining a backup raw data layer using the streaming approach
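
One possible shape for such a streaming write is sketched below, assuming Spark Structured Streaming is the engine (the summary does not name one). The function name, table name, and checkpoint path are hypothetical; `events_df` is assumed to be a streaming DataFrame obtained from `spark.readStream` in a session with an Iceberg catalog configured.

```python
def start_iceberg_stream(events_df, table, checkpoint_path, interval="1 minute"):
    """Append micro-batches from a streaming DataFrame to an Iceberg table.

    Returns the StreamingQuery handle produced by toTable().
    """
    return (
        events_df.writeStream
        .format("iceberg")
        .outputMode("append")
        # Each trigger interval produces one Iceberg commit.
        .trigger(processingTime=interval)
        # The checkpoint lets the query resume after a failure.
        .option("checkpointLocation", checkpoint_path)
        .toTable(table)
    )
```

A call might look like `start_iceberg_stream(events_df, "glue_catalog.vehicles.events", "s3://example-bucket/checkpoints/events")`. Shorter trigger intervals improve freshness but produce more small files, which makes regular compaction more important.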

Conclusion: The Transformative Power of Apache Iceberg on AWS

Iceberg's features, such as ACID compliance, schema management, and scalability, are key to modernizing connected vehicle data lakes.
AWS supports Apache Iceberg across its analytics services, including Amazon EMR, AWS Glue, Amazon Athena, Amazon SageMaker, and Amazon Redshift.
Integration with the AWS Glue Data Catalog and Amazon S3 storage provides a cost-effective, scalable, and optimized transactional data lake.
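
For readers who want to try the Glue and S3 integration, the sketch below collects the Spark properties commonly used to register an Iceberg catalog backed by the AWS Glue Data Catalog with S3 storage. The catalog name `glue_catalog` and the bucket path are hypothetical, and the Iceberg Spark runtime and AWS bundle jars are assumed to be on the classpath already.

```python
def iceberg_glue_spark_conf(catalog_name, warehouse_path):
    """Return Spark properties for an Iceberg catalog backed by AWS Glue and S3."""
    prefix = f"spark.sql.catalog.{catalog_name}"
    return {
        # Enables Iceberg SQL extensions such as CALL procedures.
        "spark.sql.extensions":
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions",
        prefix: "org.apache.iceberg.spark.SparkCatalog",
        # Table metadata pointers live in the Glue Data Catalog.
        f"{prefix}.catalog-impl": "org.apache.iceberg.aws.glue.GlueCatalog",
        # Data and metadata files are read and written through S3.
        f"{prefix}.io-impl": "org.apache.iceberg.aws.s3.S3FileIO",
        f"{prefix}.warehouse": warehouse_path,
    }


# Hypothetical catalog name and bucket, for illustration only.
conf = iceberg_glue_spark_conf("glue_catalog", "s3://example-bucket/warehouse")
```

Each key and value can be passed to `SparkSession.builder.config(key, value)` or supplied as `--conf` arguments when submitting a job.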
