AWS re:Invent 2025 - Building multi-Region data lakes with Replication for Amazon S3 Tables (STG358)

Building Multi-Region Data Lakes with Replication for Amazon S3 Tables

Overview

  • Presentation on the new replication support for Amazon S3 Tables, a fully managed service for running Apache Iceberg data lakes in the cloud
  • Covers the key drivers for multi-region data lakes, challenges with existing replication approaches, and details on the new replication capabilities

Why Multi-Region Data Lakes?

  • Performance: Placing data closer to users reduces latency and inter-region data transfer costs
  • Compliance: Regulatory requirements often mandate having a secondary isolated copy of data
  • Data Protection: Protecting against accidental data loss, deletion, or ransomware attacks by having a separate replica

Challenges with Existing Iceberg Replication Approaches

  1. S3 Replication:
    • Asynchronous nature requires custom logic to coordinate commits
    • Absolute file paths in metadata need transformation
    • Requires coordination with catalogs and applications
  2. Custom ETL/Spark Jobs:
    • Need to track state and handle errors in the replication process
    • Require a deep understanding of the Iceberg spec to handle schema/partition changes
  3. Dual Writes:
    • Puts replication infrastructure on the critical path of applications
    • Requires application changes and a rethink of latency and performance
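
To make the "absolute file paths" problem concrete, here is a minimal sketch of the kind of rewrite a do-it-yourself approach has to perform. It assumes a heavily simplified, JSON-like view of table metadata; the bucket names, the `rewrite_paths` helper, and the sample dictionary are all hypothetical. Real Iceberg manifest lists and manifests are Avro files that also embed absolute paths, which this sketch does not handle.

```python
from typing import Any


def rewrite_paths(node: Any, src_prefix: str, dst_prefix: str) -> Any:
    """Return a copy of `node` in which every string that starts with
    `src_prefix` is re-rooted at `dst_prefix`. Other values are kept as-is."""
    if isinstance(node, str):
        if node.startswith(src_prefix):
            return dst_prefix + node[len(src_prefix):]
        return node
    if isinstance(node, list):
        return [rewrite_paths(item, src_prefix, dst_prefix) for item in node]
    if isinstance(node, dict):
        return {
            key: rewrite_paths(value, src_prefix, dst_prefix)
            for key, value in node.items()
        }
    return node


# Hypothetical, simplified table metadata (not a complete Iceberg metadata file).
source_metadata = {
    "location": "s3://source-bucket/sales/orders",
    "current-snapshot-id": 2,
    "snapshots": [
        {"snapshot-id": 1,
         "manifest-list": "s3://source-bucket/sales/orders/metadata/snap-1.avro"},
        {"snapshot-id": 2,
         "manifest-list": "s3://source-bucket/sales/orders/metadata/snap-2.avro"},
    ],
}

replica_metadata = rewrite_paths(
    source_metadata, "s3://source-bucket/", "s3://replica-bucket/"
)
print(replica_metadata["location"])
```

Even in this toy form, the rewrite has to reach every nested path and leave non-path fields alone, which is part of why plain object replication is not enough for Iceberg tables.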

Introducing Replication for Amazon S3 Tables

  • Fully managed service that replicates Iceberg tables across regions and accounts
  • Creates "read-only replicas" with the following key characteristics:
    1. Same namespace and table names as source
    2. Backfills current snapshots and replicates ongoing updates
    3. Understands Iceberg data and metadata natively
    4. Replicas are always query-ready

Key Benefits

  1. Simplified Operations:
    • Configure replication in a few clicks
    • Get out-of-the-box auditing and monitoring
    • Integrate with AWS analytics services seamlessly
  2. Scalability and Flexibility:
    • S3 replicates over 150 PB of data per week
    • Customize replicas with different storage classes, retention, and encryption
  3. Purpose-Built for Iceberg:
    • Maintains snapshot ordering and transforms metadata file paths
    • Merges updates intelligently to maintain long-term replicas

Replication Use Cases

  1. Distributed Analytics: Fan out data to multiple regions to serve global teams
  2. Centralized Analytics: Aggregate data from distributed sources into a central region
  3. Data Protection: Maintain isolated replicas for compliance, DR, and rollback scenarios
  4. Tiered Retention: Create replicas with different retention policies for different use cases
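
As an illustration of tiered retention, the sketch below shows how the same snapshot history can yield different retained sets under a short source policy and a long replica policy. The `snapshots_to_keep` helper and the example numbers are hypothetical. Its two parameters loosely mirror the Iceberg table properties `history.expire.max-snapshot-age-ms` and `history.expire.min-snapshots-to-keep`; how S3 Tables applies retention to replicas is not described in this summary.

```python
DAY_MS = 86_400_000  # milliseconds in one day


def snapshots_to_keep(snapshot_times_ms, now_ms, max_age_ms, min_to_keep=1):
    """Return the snapshot timestamps a retention policy would retain.

    Keeps every snapshot no older than `max_age_ms`, and always keeps at
    least the `min_to_keep` most recent snapshots (if that many exist).
    """
    ordered = sorted(snapshot_times_ms)
    fresh_count = sum(1 for t in ordered if now_ms - t <= max_age_ms)
    keep = max(fresh_count, min(min_to_keep, len(ordered)))
    # The retained snapshots are always the newest `keep` entries.
    return ordered[len(ordered) - keep:]


now = 100 * DAY_MS
history = [1 * DAY_MS, 40 * DAY_MS, 90 * DAY_MS, 99 * DAY_MS]

# Hypothetical policies: 7 days on the source, 90 days on the replica.
source_kept = snapshots_to_keep(history, now, max_age_ms=7 * DAY_MS)
replica_kept = snapshots_to_keep(history, now, max_age_ms=90 * DAY_MS)
print(len(source_kept), len(replica_kept))
```

With these inputs the source retains only its newest snapshot while the replica retains three, which is the pattern a long-retention replica for audits or rollback would rely on.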

Replication Workflow Demonstration

  1. Configuring table-level replication and backfilling existing data
  2. Replicating ongoing updates and monitoring replication status
  3. Centralizing data from multiple regions for analytics

Technical Challenges

  1. Snapshot Ordering: Ensuring replicas maintain the correct sequence of Iceberg snapshots
  2. Concurrency Control: Merging metadata updates intelligently between source and replica
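
The snapshot-ordering challenge can be sketched as follows. This is a toy model, not the service's implementation: it assumes a linear history (each snapshot has at most one child) and uses hypothetical `Snapshot` and `ReplicaTable` classes. Snapshots that arrive before their parent are buffered and committed only once the parent is in place.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Snapshot:
    snapshot_id: int
    parent_id: Optional[int]  # None for the first snapshot of the table


@dataclass
class ReplicaTable:
    committed: list = field(default_factory=list)  # snapshot ids, in commit order
    pending: dict = field(default_factory=dict)    # parent_id -> waiting Snapshot

    def receive(self, snap: Snapshot) -> None:
        """Accept a snapshot in any arrival order; commit whatever is now ready."""
        self.pending[snap.parent_id] = snap
        self._drain()

    def _drain(self) -> None:
        # Repeatedly commit the snapshot whose parent is the current head.
        head = self.committed[-1] if self.committed else None
        while head in self.pending:
            snap = self.pending.pop(head)
            self.committed.append(snap.snapshot_id)
            head = snap.snapshot_id


replica = ReplicaTable()
# Asynchronous delivery: snapshot 3 arrives first, then 1, then 2.
replica.receive(Snapshot(snapshot_id=3, parent_id=2))
replica.receive(Snapshot(snapshot_id=1, parent_id=None))
replica.receive(Snapshot(snapshot_id=2, parent_id=1))
print(replica.committed)
```

Readers of the replica never observe snapshot 3 before snapshots 1 and 2 in this model, which matches the goal of keeping replicas query-ready with the source's snapshot sequence.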

Customer Case Study: Zeta Global

  • Zeta Global operates a large-scale AI-powered marketing platform
  • Ingests 6 TB of data daily into 10,000+ Iceberg tables
  • Adopted Amazon S3 Tables to handle scale and performance
  • Using S3 Tables Replication to maintain region-local data replicas for low-latency access

Key Takeaways

  • Amazon S3 Tables now offers fully managed replication capabilities for Iceberg data lakes
  • Simplifies multi-region deployments, improves data protection, and enables flexible replication policies
  • Purpose-built to maintain Iceberg semantics and metadata during replication
  • Enables new use cases around distributed and centralized analytics
  • Demonstrated through real-world customer examples and technical deep dives
