AWS re:Invent 2025 - Building multi-Region data lakes with Replication for Amazon S3 Tables (STG358)

Building Multi-Region Data Lakes with Replication for Amazon S3 Tables

Overview

Presentation on the new replication support for Amazon S3 Tables, a fully managed service for running Apache Iceberg data lakes in the cloud
Covers the key drivers for multi-region data lakes, challenges with existing replication approaches, and details on the new replication capabilities

Why Multi-Region Data Lakes?

Performance: Placing data closer to users reduces latency and inter-region data transfer costs
Compliance: Regulatory requirements often mandate having a secondary isolated copy of data
Data Protection: Protecting against accidental data loss, deletion, or ransomware attacks by having a separate replica

Challenges with Existing Iceberg Replication Approaches

S3 Replication:
- Asynchronous nature requires custom logic to coordinate commits
- Absolute file paths in metadata need transformation
- Requires coordination with catalogs and applications
Custom ETL/Spark Jobs:
- Need to track state and handle errors in the replication process
- Require a deep understanding of the Iceberg spec to handle schema and partition changes
Dual Writes:
- Puts replication infrastructure on the critical path of applications
- Requires application changes and a rethink of latency and performance
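
The path-transformation problem noted under S3 Replication can be illustrated with a minimal Python sketch. It assumes a simplified, JSON-like view of table metadata; the bucket names and the helpers `is_under` and `rewrite_metadata` are hypothetical, and a real Iceberg table also stores absolute paths in Avro manifest files that this sketch does not touch.

```python
def is_under(path: str, prefix: str) -> bool:
    """True if path is the table root itself or lies beneath it."""
    root = prefix.rstrip("/")
    return path == root or path.startswith(root + "/")


def rewrite_metadata(node, source_prefix: str, replica_prefix: str):
    """Return a copy of a JSON-like structure with source paths re-rooted.

    Every string that sits under source_prefix is rewritten to sit under
    replica_prefix; all other values are returned unchanged.
    """
    if isinstance(node, dict):
        return {k: rewrite_metadata(v, source_prefix, replica_prefix)
                for k, v in node.items()}
    if isinstance(node, list):
        return [rewrite_metadata(v, source_prefix, replica_prefix)
                for v in node]
    if isinstance(node, str) and is_under(node, source_prefix):
        suffix = node[len(source_prefix.rstrip("/")):]
        return replica_prefix.rstrip("/") + suffix
    return node


# Hypothetical metadata fragment with absolute paths in the source bucket.
source_metadata = {
    "location": "s3://source-bucket/db/events",
    "snapshots": [
        {"snapshot-id": 1,
         "manifest-list": "s3://source-bucket/db/events/metadata/snap-1.avro"},
    ],
}

replica_metadata = rewrite_metadata(
    source_metadata,
    "s3://source-bucket/db/events",
    "s3://replica-bucket/db/events",
)
```

A copy that skips this step would leave the replica's metadata pointing at files in the source bucket, which is why plain object replication alone does not yield a usable Iceberg table.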

Introducing Replication for Amazon S3 Tables

Fully managed service that replicates Iceberg tables across regions and accounts
Creates "read-only replicas" with the following key characteristics:
1. Same namespace and table names as source
2. Backfills current snapshots and replicates ongoing updates
3. Understands Iceberg data and metadata natively
4. Replicas are always query-ready

Key Benefits

Simplified Operations:
- Configure replication in a few clicks
- Get out-of-the-box auditing and monitoring
- Integrate with AWS analytics services seamlessly
Scalability and Flexibility:
- S3 replicates over 150 PB of data per week
- Customize replicas with different storage classes, retention, and encryption
Purpose-Built for Iceberg:
- Maintains snapshot ordering and transforms metadata file paths
- Merges updates intelligently to maintain long-term replicas
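
To put the weekly replication volume quoted above in perspective, the figure can be converted to an average rate. This is a back-of-the-envelope sketch that assumes decimal units (1 PB = 10^15 bytes) and a perfectly even load across the week, neither of which the talk states.

```python
PB = 10**15                          # decimal petabyte, an assumption
GB = 10**9
SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800 seconds


def average_gb_per_second(petabytes_per_week: float) -> float:
    """Average throughput implied by a weekly volume, in GB/s."""
    return petabytes_per_week * PB / SECONDS_PER_WEEK / GB


rate = average_gb_per_second(150)
print(f"{rate:.0f} GB/s on average")
```

Under those assumptions, 150 PB per week works out to roughly 248 GB/s sustained.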

Replication Use Cases

Distributed Analytics: Fan out data to multiple regions to serve global teams
Centralized Analytics: Aggregate data from distributed sources into a central region
Data Protection: Maintain isolated replicas for compliance, DR, and rollback scenarios
Tiered Retention: Create replicas with different retention policies for different use cases
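
The tiered-retention use case can be sketched as the same snapshot history filtered through two different retention windows. The `Snapshot` class, the `retained` helper, and the 7-day and 90-day windows are all illustrative assumptions, not the service's configuration model.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: datetime


def retained(snapshots, now, max_age):
    """Keep snapshots younger than max_age, plus the latest one regardless."""
    if not snapshots:
        return []
    latest = max(snapshots, key=lambda s: s.committed_at)
    cutoff = now - max_age
    return [s for s in snapshots if s.committed_at >= cutoff or s is latest]


now = datetime(2025, 12, 1, tzinfo=timezone.utc)
history = [
    Snapshot(1, now - timedelta(days=100)),
    Snapshot(2, now - timedelta(days=10)),
    Snapshot(3, now - timedelta(days=1)),
]

# Hypothetical policies: short retention on the source, long on the replica.
source_view = retained(history, now, timedelta(days=7))
replica_view = retained(history, now, timedelta(days=90))
```

Here the source keeps only the most recent snapshot, while the replica still holds the 10-day-old one for rollback or audit.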

Replication Workflow Demonstration

Configuring table-level replication and backfilling existing data
Replicating ongoing updates and monitoring replication status
Centralizing data from multiple regions for analytics

Technical Challenges

Snapshot Ordering: Ensuring replicas maintain the correct sequence of Iceberg snapshots
Concurrency Control: Merging metadata updates intelligently between source and replica
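
The snapshot-ordering challenge can be illustrated with a small reordering buffer: snapshots may arrive at the replica out of order, but they are applied strictly in commit order. This is a sketch under stated assumptions, not the service's implementation; the `OrderedApplier` class is hypothetical, and it assumes each snapshot carries a contiguous, increasing sequence number.

```python
import heapq


class OrderedApplier:
    """Buffers out-of-order snapshots and applies them in sequence order."""

    def __init__(self, next_sequence: int = 1):
        self._next = next_sequence
        self._pending = []   # min-heap of (sequence_number, snapshot_id)
        self.applied = []    # snapshot ids in the order they were applied

    def receive(self, sequence_number: int, snapshot_id: str) -> None:
        if sequence_number < self._next:
            return           # already applied, so ignore the duplicate
        heapq.heappush(self._pending, (sequence_number, snapshot_id))
        self._drain()

    def _drain(self) -> None:
        # Apply every buffered snapshot that is next in line.
        while self._pending:
            sequence_number, snapshot_id = self._pending[0]
            if sequence_number < self._next:
                heapq.heappop(self._pending)   # stale duplicate in the buffer
            elif sequence_number == self._next:
                heapq.heappop(self._pending)
                self.applied.append(snapshot_id)
                self._next += 1
            else:
                break                          # gap: wait for the missing one


applier = OrderedApplier()
applier.receive(2, "snap-2")   # arrives early and is held back
applier.receive(1, "snap-1")   # fills the gap, so both are applied
applier.receive(3, "snap-3")
```

Holding back early arrivals means a reader of the replica never sees a snapshot whose predecessor is missing.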

Customer Case Study: Zeta Global

Zeta Global operates a large-scale AI-powered marketing platform
Ingests 6 TB of data daily into 10,000+ Iceberg tables
Adopted Amazon S3 Tables to handle scale and performance
Uses S3 Tables Replication to maintain region-local data replicas for low-latency access

Key Takeaways

Amazon S3 Tables now offers fully managed replication capabilities for Iceberg data lakes
Simplifies multi-region deployments, improves data protection, and enables flexible replication policies
Purpose-built to maintain Iceberg semantics and metadata during replication
Enables new use cases around distributed and centralized analytics
Demonstrated through real-world customer examples and technical deep dives
