AWS re:Invent 2025 - Building multi-Region data lakes with Replication for Amazon S3 Tables (STG358)
Overview
Presentation on the new replication support for Amazon S3 Tables, a fully managed service for running Apache Iceberg data lakes in the cloud
Covers the key drivers for multi-region data lakes, challenges with existing replication approaches, and details on the new replication capabilities
Why Multi-Region Data Lakes?
Performance: Placing data closer to users reduces latency and inter-region data transfer costs
Compliance: Regulatory requirements often mandate having a secondary isolated copy of data
Data Protection: Protecting against accidental data loss, deletion, or ransomware attacks by having a separate replica
Challenges with Existing Iceberg Replication Approaches
S3 Replication:
Asynchronous nature requires custom logic to coordinate commits
Absolute file paths in metadata need transformation
Requires coordination with catalogs and applications
Custom ETL/Spark Jobs:
Need to track state and handle errors in the replication process
Require a deep understanding of the Iceberg spec to handle schema/partition changes
Dual Writes:
Puts replication infrastructure on the critical path of applications
Requires application changes and rethinking latency and performance
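The path-transformation challenge listed under S3 Replication can be illustrated with a minimal sketch. It assumes a simplified, JSON-like table metadata document and two hypothetical bucket prefixes (`s3://source-bucket/` and `s3://replica-bucket/`); the helper names are made up for illustration and this is not how any AWS service implements the step. Real Iceberg manifest lists and manifests are Avro files that also hold absolute paths, which this sketch does not cover.

```python
from typing import Any

SOURCE_PREFIX = "s3://source-bucket/"    # hypothetical source location
REPLICA_PREFIX = "s3://replica-bucket/"  # hypothetical replica location


def rewrite_location(path: str, source_prefix: str, replica_prefix: str) -> str:
    """Swap the source prefix of one absolute path for the replica prefix."""
    if not path.startswith(source_prefix):
        raise ValueError(f"{path!r} is not under {source_prefix!r}")
    return replica_prefix + path[len(source_prefix):]


def rewrite_metadata(node: Any, source_prefix: str, replica_prefix: str) -> Any:
    """Return a copy of a JSON-like structure with every source path rewritten."""
    if isinstance(node, dict):
        return {k: rewrite_metadata(v, source_prefix, replica_prefix)
                for k, v in node.items()}
    if isinstance(node, list):
        return [rewrite_metadata(v, source_prefix, replica_prefix) for v in node]
    if isinstance(node, str) and node.startswith(source_prefix):
        return rewrite_location(node, source_prefix, replica_prefix)
    return node  # numbers, booleans, None, and unrelated strings pass through


# Simplified stand-in for a table metadata file; real files carry many more fields.
source_metadata = {
    "location": "s3://source-bucket/sales/orders",
    "snapshots": [
        {
            "snapshot-id": 1,
            "manifest-list": "s3://source-bucket/sales/orders/metadata/snap-1.avro",
        }
    ],
}

replica_metadata = rewrite_metadata(source_metadata, SOURCE_PREFIX, REPLICA_PREFIX)
print(replica_metadata["location"])
```

Even in this toy form, the rewrite has to touch every level of the metadata tree, which hints at why object-level replication alone leaves the copied table unreadable at the destination.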
Introducing Replication for Amazon S3 Tables
Fully managed service that replicates Iceberg tables across regions and accounts
Creates "read-only replicas" with the following key characteristics:
Same namespace and table names as source
Backfills current snapshots and replicates ongoing updates
Understands Iceberg data and metadata natively
Replicas are always query-ready
Key Benefits
Simplified Operations:
Configure replication in a few clicks
Get out-of-the-box auditing and monitoring
Integrate with AWS analytics services seamlessly
Scalability and Flexibility:
S3 replicates over 150 PB of data per week
Customize replicas with different storage classes, retention, and encryption
Purpose-Built for Iceberg:
Maintains snapshot ordering and transforms metadata file paths
Merges updates intelligently to maintain long-term replicas
Replication Use Cases
Distributed Analytics: Fan out data to multiple regions to serve global teams
Centralized Analytics: Aggregate data from distributed sources into a central region
Data Protection: Maintain isolated replicas for compliance, DR, and rollback scenarios
Tiered Retention: Create replicas with different retention policies for different use cases
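To make the tiered-retention use case concrete, here is a small sketch of how the same snapshot history could be trimmed differently on a source and on a replica. It assumes retention means age-based snapshot expiration; the function name, the policy values, and the snapshot ids are all hypothetical, and the talk summary does not describe how S3 Tables applies retention internally.

```python
from datetime import datetime, timedelta, timezone


def snapshots_to_expire(snapshots, max_age, now):
    """Return ids of snapshots older than max_age, always keeping the newest one.

    snapshots: list of (snapshot_id, committed_at) tuples, in any order.
    """
    if not snapshots:
        return []
    newest_id = max(snapshots, key=lambda s: s[1])[0]
    cutoff = now - max_age
    return [sid for sid, committed_at in snapshots
            if committed_at < cutoff and sid != newest_id]


now = datetime(2025, 12, 1, tzinfo=timezone.utc)
snapshots = [
    (101, now - timedelta(days=40)),
    (102, now - timedelta(days=10)),
    (103, now - timedelta(days=1)),
]

# Hypothetical policies: short retention at the source, long retention on the replica.
print(snapshots_to_expire(snapshots, timedelta(days=7), now))
print(snapshots_to_expire(snapshots, timedelta(days=90), now))
```

With these made-up policies, the source would drop its two older snapshots while the replica keeps all three, which is the shape of a long-lived rollback copy.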
Replication Workflow Demonstration
Configuring table-level replication and backfilling existing data
Replicating ongoing updates and monitoring replication status
Centralizing data from multiple regions for analytics
Technical Challenges
Snapshot Ordering: Ensuring replicas maintain the correct sequence of Iceberg snapshots
Concurrency Control: Merging metadata updates intelligently between source and replica
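The snapshot-ordering problem can be sketched as follows. This is an illustration only, assuming each replicated snapshot arrives with a contiguous integer sequence number (1, 2, 3, ...) and that deliveries may be reordered or repeated; the `Replica` class and its fields are hypothetical and do not reflect the service's internals. Concurrency control between source and replica metadata is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Replica:
    """Applies snapshots strictly in sequence order, buffering early arrivals."""
    last_applied: int = 0                          # newest sequence number applied
    pending: dict = field(default_factory=dict)    # sequence number -> snapshot id
    applied: list = field(default_factory=list)    # snapshot ids in applied order

    def receive(self, sequence_number: int, snapshot_id: str) -> None:
        if sequence_number <= self.last_applied:
            return  # duplicate delivery of an already applied snapshot
        self.pending[sequence_number] = snapshot_id
        # Drain every buffered snapshot whose predecessor has been applied.
        while self.last_applied + 1 in self.pending:
            self.last_applied += 1
            self.applied.append(self.pending.pop(self.last_applied))


replica = Replica()
replica.receive(2, "snap-b")   # arrives early, so it is buffered
replica.receive(1, "snap-a")   # unblocks both snapshots
replica.receive(1, "snap-a")   # duplicate, ignored
print(replica.applied)
```

Buffering until the predecessor arrives means a reader of the replica never sees a snapshot history with a gap in it, at the cost of holding back newer snapshots while an older one is still in flight.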
Customer Case Study: Zeta Global
Zeta Global operates a large-scale AI-powered marketing platform
Ingests 6 TB of data daily into 10,000+ Iceberg tables
Adopted Amazon S3 Tables to handle scale and performance
Uses S3 Tables Replication to maintain region-local data replicas for low-latency access
Key Takeaways
Amazon S3 Tables now offers fully managed replication capabilities for Iceberg data lakes
Simplifies multi-region deployments, improves data protection, and enables flexible replication policies
Purpose-built to maintain Iceberg semantics and metadata during replication
Enables new use cases around distributed and centralized analytics
Demonstrated through real-world customer examples and technical deep dives