AWS re:Invent 2025 - Amazon S3 Tables architecture, use cases, and best practices (STG334)

Summary of AWS re:Invent 2025 - Amazon S3 Tables Architecture, Use Cases, and Best Practices

Overview of Amazon S3 Tables

  • Amazon S3 Tables is a fully managed service that allows customers to create and manage Apache Iceberg tables directly in Amazon S3
  • Key benefits include optimized performance and scale, simplified security controls, and automatic table maintenance
  • S3 Tables is tightly integrated with other AWS services like AWS Glue Data Catalog and provides open access via the Iceberg REST Catalog interface
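To make the Iceberg REST Catalog point concrete, here is a minimal sketch of the connection properties a generic REST client such as PyIceberg might use to reach a table bucket. The endpoint format, the property names, and the `s3tables` SigV4 signing name follow AWS's published PyIceberg example as best understood here; treat them as assumptions to verify. The helper `s3tables_rest_catalog_properties` and the sample ARN are hypothetical.

```python
# Sketch: catalog properties for an S3 Tables Iceberg REST endpoint.
# Assumptions (verify against current AWS docs): endpoint URL format,
# property names, and the "s3tables" SigV4 signing name.


def s3tables_rest_catalog_properties(region: str, table_bucket_arn: str) -> dict[str, str]:
    """Build properties for a REST-catalog client (hypothetical helper)."""
    if not table_bucket_arn.startswith("arn:aws:s3tables:"):
        raise ValueError("expected an S3 Tables table bucket ARN")
    return {
        "type": "rest",
        # Regional Iceberg REST endpoint exposed by S3 Tables (assumed format).
        "uri": f"https://s3tables.{region}.amazonaws.com/iceberg",
        # The table bucket ARN serves as the warehouse identifier.
        "warehouse": table_bucket_arn,
        # Requests are signed with AWS SigV4 for the s3tables service.
        "rest.sigv4-enabled": "true",
        "rest.signing-name": "s3tables",
        "rest.signing-region": region,
    }


if __name__ == "__main__":
    props = s3tables_rest_catalog_properties(
        "us-east-1",
        "arn:aws:s3tables:us-east-1:111122223333:bucket/example-bucket",  # hypothetical ARN
    )
    for key, value in props.items():
        print(f"{key} = {value}")
    # With PyIceberg installed, these could be passed as:
    #   from pyiceberg.catalog import load_catalog
    #   catalog = load_catalog("s3tables", **props)
```

Because the catalog is reached over a standard REST protocol, the same properties (or their equivalents) are what lets third-party engines discover and read the tables without AWS-specific connectors.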

Recent Launches and Updates

  • Over the past year, AWS has shipped numerous features to make S3 Tables more flexible and optimized for data lake environments:
    • Advanced compaction strategies, sort and z-order, for better query performance and cost efficiency
    • Expanded to 32 AWS Regions and scaled up to 100,000 tables per Region
    • Added table-level encryption with AWS KMS keys, plus resource tags for attribute-based access control
    • Enabled direct access to Athena and SageMaker Unified Studio from the S3 console
    • Integrated with partner tools using the Iceberg REST Catalog interface

New Capabilities

  1. Iceberg v3 Support:

    • Adds support for deletion vectors and row lineage, enabling more efficient data modifications and change tracking
    • Available through services like SageMaker Unified Studio's notebook interface
  2. Intelligent Tiering for S3 Tables:

    • Automatically transitions table data across S3's frequent, infrequent, and archive access tiers based on access patterns
    • Reduces storage costs by up to 80% without impacting performance
    • Compaction is now tier-aware, focusing on optimizing the most actively queried data
  3. S3 Tables Replication:

    • Enables replicating Iceberg tables across AWS regions for improved performance, compliance, and data protection
    • Automatically mirrors table and namespace resources, replicates data and metadata, and maintains snapshot history
    • Provides built-in audit trails, real-time monitoring, and the ability to configure replica settings independently
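The deletion vectors that come with Iceberg v3 support can be pictured as a bitmap of deleted row positions attached to a data file, applied at read time instead of rewriting the file. The sketch below is a simplified model under that assumption: real v3 deletion vectors are Roaring bitmaps stored in Puffin files, whereas here a plain Python int stands in for the bitmap, and the `DeletionVector` class and `live_rows` function are hypothetical.

```python
# Simplified model of an Iceberg v3 deletion vector: one bitmap of deleted
# row positions per data file. A Python int stands in for the Roaring bitmap
# that the real format stores in a Puffin file.


class DeletionVector:
    """Tracks deleted row positions for a single (hypothetical) data file."""

    def __init__(self) -> None:
        self._bits = 0  # bit i set => row at position i is deleted

    def delete(self, position: int) -> None:
        if position < 0:
            raise ValueError("row positions are non-negative")
        self._bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return (self._bits >> position) & 1 == 1

    def cardinality(self) -> int:
        # Number of deleted rows recorded in this vector.
        return bin(self._bits).count("1")


def live_rows(rows: list, dv: DeletionVector) -> list:
    """Apply the deletion vector at read time; the data file itself is untouched."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


if __name__ == "__main__":
    data_file = ["order-1", "order-2", "order-3", "order-4"]  # hypothetical rows
    dv = DeletionVector()
    dv.delete(1)
    dv.delete(3)
    print(live_rows(data_file, dv))  # ['order-1', 'order-3']
    print(dv.cardinality())  # 2
```

The efficiency gain comes from bounding read-time work: as the v3 spec is understood here, each data file has at most one deletion vector per snapshot, rather than an open-ended set of position-delete files that readers must merge.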

Customer Examples

  • Zeta Global reduced data freshness latency by 80% and compressed time-to-insights from 15 minutes to a few minutes by using S3 Tables for their petabyte-scale data lake.
  • Indeed is migrating their 85 PB data lake to S3 Tables, streamlining their data infrastructure and reducing costs. The migration has unlocked significant business value:
    • 75% faster reporting for the Heron Insights team
    • 65% cost reduction for the Smart Sourcing team
    • 88% reduction in complexity for the Indeed Interviews team
    • 98% improvement in SLOs for the Partner Analytics team

Best Practices and Recommendations

  1. Iceberg Partitioning:

    • Choose partitioning schemes (time-based or hash-based) that align with your primary query patterns for optimal performance
    • Iceberg allows changing partitioning schemes over time as data and requirements evolve
  2. Compaction:

    • Keep the default "auto" compaction strategy, which uses sort compaction when the table defines a sort order and bin-pack compaction otherwise
    • Consider z-order compaction if your queries frequently filter on multiple columns
  3. Snapshot Management:

    • For batch ETL workloads, keep the default 3-day maximum snapshot age
    • For streaming workloads, reduce the maximum snapshot age to 24 hours or less to avoid performance degradation from large metadata files
  4. Unreferenced File Removal:

    • Generally, keep the default 3-day setting for unreferenced file removal
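The compaction, snapshot, and file-removal recommendations above could be captured as maintenance settings. The sketch below shapes them as keyword arguments for the boto3 `s3tables` client. The operation names, the `type` strings, the setting keys, and the API spelling `"z-order"` (the talk writes "zorder") are assumptions to check against the S3 Tables API reference. The three helper functions are hypothetical, `min_snapshots_to_keep=1` is an illustrative choice, and nothing here calls AWS.

```python
# Sketch only: maintenance settings from the talk, shaped as keyword arguments
# for the boto3 "s3tables" client. Operation names, "type" strings and setting
# keys are assumptions to verify against the S3 Tables API reference.
import json
from typing import Optional

COMPACTION_STRATEGIES = {"auto", "binpack", "sort", "z-order"}  # assumed API values


def compaction_config(strategy: str = "auto",
                      target_file_size_mb: Optional[int] = None) -> dict:
    """Table-level compaction; "auto" uses sort if a sort order exists, else bin-pack."""
    if strategy not in COMPACTION_STRATEGIES:
        raise ValueError(f"unknown compaction strategy: {strategy!r}")
    settings = {"strategy": strategy}
    if target_file_size_mb is not None:  # omit to keep the service default
        settings["targetFileSizeMB"] = target_file_size_mb
    return {
        "type": "icebergCompaction",
        "value": {"status": "enabled",
                  "settings": {"icebergCompaction": settings}},
    }


def snapshot_config(streaming: bool, min_snapshots_to_keep: int = 1) -> dict:
    """Snapshot expiration: 72 hours for batch ETL, 24 hours for streaming."""
    max_age_hours = 24 if streaming else 72
    return {
        "type": "icebergSnapshotManagement",
        "value": {"status": "enabled",
                  "settings": {"icebergSnapshotManagement": {
                      "minSnapshotsToKeep": min_snapshots_to_keep,  # illustrative
                      "maxSnapshotAgeHours": max_age_hours}}},
    }


def unreferenced_file_removal_config(unreferenced_days: int = 3,
                                     noncurrent_days: Optional[int] = None) -> dict:
    """Table-bucket-level cleanup of files that no snapshot references."""
    settings = {"unreferencedDays": unreferenced_days}
    if noncurrent_days is not None:
        settings["nonCurrentDays"] = noncurrent_days
    return {
        "type": "icebergUnreferencedFileRemoval",
        "value": {"status": "enabled",
                  "settings": {"icebergUnreferencedFileRemoval": settings}},
    }


if __name__ == "__main__":
    print(json.dumps(snapshot_config(streaming=True), indent=2))
    # Assumed usage with boto3 (not executed here):
    #   client = boto3.client("s3tables")
    #   client.put_table_maintenance_configuration(
    #       tableBucketARN=bucket_arn, namespace="sales", name="orders",
    #       **snapshot_config(streaming=True))
    #   client.put_table_bucket_maintenance_configuration(
    #       tableBucketARN=bucket_arn, **unreferenced_file_removal_config())
```

Compaction and snapshot expiration are set per table, while unreferenced file removal is set per table bucket, which is why the last helper is paired with a bucket-level call in the usage comment.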
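The z-order recommendation rests on a simple idea: interleave the bits of several column values into one key, so that rows close together in all of those columns also sort close together. Each data file then covers a tighter min/max range per column, and engines can skip more files for multi-column filters. The sketch below shows the general technique (Morton codes) for two non-negative integer columns. It is not S3 Tables' internal implementation, and `morton_key` and `zorder_sort` are hypothetical names.

```python
# Z-ordering sketch: interleave the bits of two column values into a Morton
# key so rows that are close in *both* columns sort near each other.
# Illustrates the general technique, not S3 Tables' internal implementation.


def morton_key(x: int, y: int, bits: int = 16) -> int:
    """Interleave the low `bits` bits of x (even positions) and y (odd positions)."""
    if x < 0 or y < 0:
        raise ValueError("expects non-negative, pre-scaled column values")
    key = 0
    for i in range(bits):
        key |= ((x >> i) & 1) << (2 * i)
        key |= ((y >> i) & 1) << (2 * i + 1)
    return key


def zorder_sort(rows: list[tuple[int, int]]) -> list[tuple[int, int]]:
    """Order (x, y) rows by their Morton key, as a z-order rewrite would."""
    return sorted(rows, key=lambda r: morton_key(r[0], r[1]))


if __name__ == "__main__":
    rows = [(3, 3), (0, 1), (2, 0), (1, 1), (0, 0)]
    print(zorder_sort(rows))  # [(0, 0), (0, 1), (1, 1), (2, 0), (3, 3)]
```

A plain sort on one column keeps only that column tightly clustered; interleaving bits trades a little locality on the first column for much better locality on the second, which is why z-order fits queries that filter on several columns.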

Developing Applications with S3 Tables

  • Demonstrated a web application built using React, Amazon Bedrock, and the DuckDB WebAssembly client to enable natural language querying of S3 Tables
  • Lets users run complex analytical queries on customer and order data without writing SQL
  • Highlights the ease of building serverless, browser-based applications that leverage the Iceberg compatibility of S3 Tables
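The demo's code was not shown, so the following is only a hypothetical illustration of two steps a natural-language query app like this needs: building a prompt that gives the model the table schema, and checking that the returned SQL is a single read-only statement before it is executed. It is written in Python rather than the browser stack used in the demo. The Bedrock model call is left out, and `build_prompt`, `is_read_only`, and the keyword list are assumptions.

```python
# Hypothetical sketch: prompt construction and a conservative read-only check
# for model-generated SQL. The actual model call (e.g. via Amazon Bedrock) and
# query execution (e.g. in DuckDB) are intentionally left out.
import re

# Statements that write or change state; matching is deliberately conservative,
# so a SELECT that merely mentions one of these words is also rejected.
FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|attach|detach|copy|install|load)\b",
    re.IGNORECASE,
)


def build_prompt(question: str, schema: dict[str, list[str]]) -> str:
    """Describe the available tables, then ask for a single SELECT statement."""
    tables = "\n".join(f"{t}({', '.join(cols)})" for t, cols in schema.items())
    return (
        "Translate the question into one SQL SELECT statement.\n"
        f"Tables:\n{tables}\n"
        f"Question: {question}\nSQL:"
    )


def is_read_only(sql: str) -> bool:
    """Accept only one statement that starts with SELECT or WITH and never writes."""
    stripped = sql.strip().rstrip(";").strip()
    if ";" in stripped:  # reject multi-statement payloads
        return False
    if not re.match(r"(?is)^(select|with)\b", stripped):
        return False
    return FORBIDDEN.search(stripped) is None


if __name__ == "__main__":
    schema = {"orders": ["order_id", "customer_id", "total"]}  # hypothetical table
    print(build_prompt("Who are the top 5 customers by spend?", schema))
    candidate = "SELECT customer_id, SUM(total) FROM orders GROUP BY 1 ORDER BY 2 DESC LIMIT 5;"
    print(is_read_only(candidate))
```

Validating before execution matters more in a browser-based app, where generated SQL runs against whatever the client can reach. A keyword filter is a coarse first line of defense; read-only credentials on the table bucket are the stronger control.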

Key Takeaways

  • S3 Tables simplifies the management of petabyte-scale data lakes by providing a fully managed Iceberg storage service
  • Recent launches like Iceberg v3 support, Intelligent Tiering, and S3 Tables Replication unlock significant performance, cost, and operational benefits
  • Customers like Zeta Global and Indeed have seen transformative business impact by migrating to S3 Tables
  • Best practices around partitioning, compaction, snapshot management, and file removal can help optimize S3 Tables deployments
  • The ability to build natural language-driven, serverless applications on top of S3 Tables showcases the platform's developer-friendly capabilities

Additional Resources

  • S3 Tables tutorial: [link]
  • S3 Tables workshop: [link]
  • SageMaker Unified Studio integration: [link]
