AWS re:Invent 2025 - Amazon S3 Tables architecture, use cases, and best practices (STG334)
Summary of AWS re:Invent 2025 - Amazon S3 Tables Architecture, Use Cases, and Best Practices
Overview of Amazon S3 Tables
Amazon S3 Tables is a fully managed service that allows customers to create and manage Apache Iceberg tables directly in Amazon S3
Key benefits include optimized performance and scale, simplified security controls, and automatic table maintenance
S3 Tables is tightly integrated with other AWS services like AWS Glue Data Catalog and provides open access via the Iceberg REST Catalog interface
Recent Launches and Updates
Over the past year, AWS has shipped numerous features to make S3 Tables more flexible and optimized for data lake environments:
Advanced compaction strategies such as sort and z-order for improved query performance and cost-effectiveness
Expanded to 32 AWS regions and scaled up to 100,000 tables per region
Added table-level encryption using KMS and resource tags for attribute-based access control
Enabled direct access to Athena and SageMaker Unified Studio from the S3 console
Integrated with partner tools using the Iceberg REST Catalog interface
New Capabilities
Iceberg v3 Support:
Adds support for deletion vectors and row lineage, enabling more efficient data modifications and change tracking
Available through services like SageMaker Unified Studio's notebook interface
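To make the deletion-vector idea concrete, here is a minimal sketch of the read path: instead of rewriting a data file when rows are deleted, the writer records the deleted row positions, and readers skip those positions. The function name `read_with_deletion_vector` and the sample rows are hypothetical, and a plain Python set stands in for the compressed bitmap a real Iceberg implementation would use.

```python
def read_with_deletion_vector(rows, deleted_positions):
    """Return the live rows of one data file.

    rows: the file's rows, in file order.
    deleted_positions: set of 0-based row positions marked deleted
    (a stand-in for a real bitmap; illustration only).
    """
    return [row for pos, row in enumerate(rows) if pos not in deleted_positions]


# The data file itself is never rewritten; only the small vector changes.
data_file = ["order-1", "order-2", "order-3", "order-4"]
deletion_vector = {1, 3}  # rows at positions 1 and 3 were deleted

live_rows = read_with_deletion_vector(data_file, deletion_vector)
print(live_rows)
```

The point of the design is that a delete touches a small side structure rather than the whole data file, which is why row-level modifications become cheaper.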
Intelligent Tiering for S3 Tables:
Automatically transitions table data across S3's frequent, infrequent, and archive access tiers based on access patterns
Reduces storage costs by up to 80% without impacting performance
Compaction is now tier-aware, focusing on optimizing the most actively queried data
S3 Tables Replication:
Enables replicating Iceberg tables across AWS regions for improved performance, compliance, and data protection
Automatically mirrors table and namespace resources, replicates data and metadata, and maintains snapshot history
Provides built-in audit trails, real-time monitoring, and the ability to configure replica settings independently
Customer Examples
Zeta Global reduced data freshness latency by 80% and compressed time-to-insights from 15 minutes to a few minutes by using S3 Tables for their petabyte-scale data lake.
Indeed is migrating their 85PB data lake to S3 Tables, streamlining their data infrastructure and reducing costs. The migration has unlocked significant business value:
75% faster reporting for the Heron Insights team
65% cost reduction for the Smart Sourcing team
88% reduction in complexity for the Indeed Interviews team
98% improvement in SLOs for the Partner Analytics team
Best Practices and Recommendations
Iceberg Partitioning:
Choose partitioning schemes (time-based or hash-based) that align with your primary query patterns for optimal performance
Iceberg allows changing partitioning schemes over time as data and requirements evolve
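The difference between the two partitioning styles can be sketched in a few lines of Python. This is an illustration under stated assumptions, not Iceberg's implementation: `day_partition` and `bucket_partition` are hypothetical helpers, the sample rows are made up, and `zlib.crc32` is only a stand-in for the hash Iceberg actually uses for bucketing.

```python
import zlib
from collections import defaultdict
from datetime import date, datetime, timezone

EPOCH = date(1970, 1, 1)


def day_partition(ts: datetime) -> int:
    """Time-based: days since the Unix epoch (UTC)."""
    return (ts.astimezone(timezone.utc).date() - EPOCH).days


def bucket_partition(key: str, num_buckets: int) -> int:
    """Hash-based: a stable bucket number. crc32 is a stand-in hash."""
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def group_rows(rows, partition_fn):
    """Group rows by the partition value computed for each row."""
    parts = defaultdict(list)
    for row in rows:
        parts[partition_fn(row)].append(row)
    return dict(parts)


rows = [
    {"order_id": "o-1", "customer_id": "c-17",
     "ts": datetime(2025, 12, 1, 9, 0, tzinfo=timezone.utc)},
    {"order_id": "o-2", "customer_id": "c-42",
     "ts": datetime(2025, 12, 1, 18, 30, tzinfo=timezone.utc)},
    {"order_id": "o-3", "customer_id": "c-17",
     "ts": datetime(2025, 12, 2, 7, 15, tzinfo=timezone.utc)},
]

# Queries that filter on a date range only need the matching day partitions.
by_day = group_rows(rows, lambda r: day_partition(r["ts"]))

# Queries that look up one customer only need that customer's bucket.
by_customer = group_rows(rows, lambda r: bucket_partition(r["customer_id"], 4))
```

Time-based partitioning suits range scans over recent data, while hash-based bucketing spreads high-cardinality keys evenly and suits point lookups, which is why the choice should follow the dominant query pattern.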
Compaction:
Keep the default "auto" compaction mode, which will use sort if a sort order is defined or bin-pack otherwise
Consider using z-order compaction if your queries filter on multiple columns
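The compaction guidance above can be sketched as follows. `resolve_auto_strategy` mirrors the "auto" rule as described in the talk summary, and `interleave_bits` shows the bit-interleaving idea behind z-ordering for two integer columns. Both function names are hypothetical, and this is a conceptual sketch rather than how S3 Tables implements compaction.

```python
def resolve_auto_strategy(sort_order):
    """'auto' mode as summarized: sort if a sort order exists, else bin-pack."""
    return "sort" if sort_order else "binpack"


def interleave_bits(x: int, y: int, bits: int = 16) -> int:
    """Z-value of (x, y): bits of x go to even positions, bits of y to odd."""
    z = 0
    for i in range(bits):
        z |= ((x >> i) & 1) << (2 * i)
        z |= ((y >> i) & 1) << (2 * i + 1)
    return z


# Sorting rows by z-value keeps rows that are close in BOTH columns near
# each other, so a filter on either column can skip more files.
points = [(1, 1), (0, 0), (0, 1), (1, 0)]
z_sorted = sorted(points, key=lambda p: interleave_bits(p[0], p[1]))
print(z_sorted)
```

A plain sort clusters data well for the leading sort column only; interleaving the bits gives each column a share of the ordering, which is why z-order is suggested when queries filter on several columns.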
Snapshot Management:
For batch ETL workloads, keep the default 3-day maximum snapshot age
For streaming workloads, reduce the maximum snapshot age to 24 hours or less to avoid performance degradation from large metadata files
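To see why a shorter maximum snapshot age matters for streaming writers, consider a rough sketch of age-based snapshot expiration. The `Snapshot` class, the `retained_snapshots` function, and the `min_to_keep` parameter are hypothetical names chosen for illustration, and the one-commit-per-hour workload is an assumed example, not a figure from the talk.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    committed_at: datetime


def retained_snapshots(snapshots, now, max_age, min_to_keep=1):
    """Keep snapshots no older than max_age, plus the newest min_to_keep."""
    newest_first = sorted(snapshots, key=lambda s: s.committed_at, reverse=True)
    return [
        snap
        for index, snap in enumerate(newest_first)
        if index < min_to_keep or now - snap.committed_at <= max_age
    ]


now = datetime(2025, 12, 5, tzinfo=timezone.utc)
# Assumed workload: one commit per hour over the last 72 hours.
history = [Snapshot(i, now - timedelta(hours=i)) for i in range(72)]

kept_3_days = retained_snapshots(history, now, timedelta(days=3))
kept_24_hours = retained_snapshots(history, now, timedelta(hours=24))
print(len(kept_3_days), len(kept_24_hours))
```

The more frequently a writer commits, the more snapshots accumulate inside a fixed age window, and each retained snapshot adds to the table metadata that readers and writers must process.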
Unreferenced File Removal:
Generally, keep the default 3-day setting for unreferenced file removal
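Conceptually, unreferenced file removal is a set difference with a grace period: objects that no retained snapshot points to, and that are older than the threshold, become removal candidates. The sketch below assumes a simplified model in which age is measured from object creation time; `removable_files` and the sample paths are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def removable_files(stored, referenced, now, min_age):
    """Return paths that are unreferenced and at least min_age old.

    stored: dict mapping object path -> creation time.
    referenced: set of paths reachable from retained snapshots.
    """
    return sorted(
        path
        for path, created_at in stored.items()
        if path not in referenced and now - created_at >= min_age
    )


now = datetime(2025, 12, 5, tzinfo=timezone.utc)
stored = {
    "data/a.parquet": now - timedelta(days=10),  # still referenced
    "data/b.parquet": now - timedelta(days=10),  # unreferenced and old
    "data/c.parquet": now - timedelta(hours=2),  # unreferenced but recent
}
referenced = {"data/a.parquet"}

candidates = removable_files(stored, referenced, now, timedelta(days=3))
print(candidates)
```

The grace period protects files that a writer has uploaded but not yet committed, which is one reason to leave the setting at its default rather than shortening it aggressively.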
Developing Applications with S3 Tables
The session demonstrated a web application built with React, Amazon Bedrock, and the DuckDB WebAssembly client that enables natural language querying of S3 Tables
The application runs complex analytical queries on customer and order data without requiring the user to write SQL
Highlights the ease of building serverless, browser-based applications that leverage the Iceberg compatibility of S3 Tables
Key Takeaways
S3 Tables simplifies the management of petabyte-scale data lakes by providing a fully managed Iceberg storage service
Recent launches like Iceberg v3 support, Intelligent Tiering, and S3 Tables Replication unlock significant performance, cost, and operational benefits
Customers like Zeta Global and Indeed have seen transformative business impact by migrating to S3 Tables
Best practices around partitioning, compaction, snapshot management, and file removal can help optimize S3 Tables deployments
The ability to build natural language-driven, serverless applications on top of S3 Tables showcases the platform's developer-friendly capabilities