AWS re:Invent 2025 - [NEW LAUNCH] What's new in Apache Iceberg v3 and beyond (OPN201)

Summary of AWS re:Invent 2025 - What's New in Apache Iceberg v3 and Beyond

Overview of Apache Iceberg

  • Iceberg is an open-source project that started at Netflix in 2017 and became an Apache incubator project in 2018, graduating to a top-level Apache project in 2020.
  • The key focus of Iceberg is enabling interoperability between different data processing engines by providing a standardized table format and specification.
  • The presentation covers the key features introduced in the recently released Iceberg v3 specification, as well as upcoming proposals for Iceberg v4.

Iceberg v3 Key Features

Variant Data Type

  • Iceberg v3 introduces a "variant" data type for semi-structured data, so a single column can hold values whose shape differs from row to row.
  • This addresses a common difficulty: processing JSON or other semi-structured data in a fixed-schema table.
  • Key benefits of the variant data type include:
    • Better performance and lower cost: fields can be shredded into columnar sub-columns that carry statistics, which lets engines push predicates down and skip irrelevant data
    • Flexible schema handling: new fields or elements in the semi-structured data are absorbed without a manual schema change
    • Efficient storage through compression of the shredded sub-columns
    • Easy querying using schema navigation through dot notation
  • Common use cases include IoT workloads, data pipelines, and real-time analytics on varied data sources.
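
The shredding idea in the list above can be illustrated with a small Python sketch. It is a toy model only, not Iceberg's variant encoding: the helper names (`flatten`, `shred`, `column_stats`) and the sample IoT records are assumptions made for illustration. It flattens nested records into one column per dot-separated path, tolerates a field that appears in only some records, and computes the min/max statistics a reader could use to skip a file.

```python
from typing import Any, Dict, List, Optional, Tuple


def flatten(value: Any, prefix: str = "") -> Dict[str, Any]:
    """Flatten nested dicts into dot-separated paths (non-dict values are leaves)."""
    if not isinstance(value, dict):
        return {prefix: value}
    out: Dict[str, Any] = {}
    for key, child in value.items():
        path = f"{prefix}.{key}" if prefix else key
        out.update(flatten(child, path))
    return out


def shred(records: List[dict]) -> Dict[str, List[Any]]:
    """Build one column per path; a field missing from a record becomes None."""
    flat = [flatten(r) for r in records]
    paths = sorted({p for row in flat for p in row})
    return {p: [row.get(p) for row in flat] for p in paths}


def column_stats(column: List[Any]) -> Optional[Tuple[Any, Any]]:
    """Min/max over non-null numeric values, the kind of statistic used for pruning."""
    values = [v for v in column
              if isinstance(v, (int, float)) and not isinstance(v, bool)]
    return (min(values), max(values)) if values else None


# Hypothetical IoT events; the second record introduces a new field ("fw").
events = [
    {"device": {"id": "a1", "temp": 21.5}},
    {"device": {"id": "b2", "temp": 30.0, "fw": "1.2"}},
]
columns = shred(events)

# Dot-notation access to a shredded sub-column.
temps = columns["device.temp"]

# A predicate such as temp > 40 cannot match if the column maximum is <= 40.
lo, hi = column_stats(temps)
can_skip_file = hi <= 40.0
```

In this sketch the new `fw` field simply becomes another sub-column with a null for the older record, which is the behaviour the "flexible schema" bullet describes.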

Deletion Vectors

  • Iceberg v3 introduces "deletion vectors" as an optimization for row-level deletes, replacing the positional delete files used in v2.
  • A deletion vector is a bitmap that marks which row positions in a data file are deleted. Writers only record the bitmap and readers skip the marked rows, so a delete costs far less than rewriting entire data files.
  • This avoids the "write amplification" of the "copy-on-write" model, where deleting even a few rows forces every affected data file to be rewritten.
  • Key use cases include GDPR compliance, data cleanup in data lakes, and optimizing incremental data pipelines.
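
A minimal Python sketch of the bitmap idea follows. It is an illustration under simplifying assumptions: the `DeletionVector` class and `read_live_rows` helper are hypothetical names, and a plain integer stands in for the compressed bitmap a real implementation would use.

```python
class DeletionVector:
    """Toy per-data-file bitmap: bit i set means row position i is deleted."""

    def __init__(self) -> None:
        self._bits = 0

    def delete(self, position: int) -> None:
        # Marking a row deleted touches only the bitmap, never the data file.
        self._bits |= 1 << position

    def is_deleted(self, position: int) -> bool:
        return (self._bits >> position) & 1 == 1

    def cardinality(self) -> int:
        return bin(self._bits).count("1")


def read_live_rows(rows, dv):
    """Merge on read: skip the positions flagged in the deletion vector."""
    return [row for pos, row in enumerate(rows) if not dv.is_deleted(pos)]


# Hypothetical data file contents, addressed by row position.
data_file = ["alice", "bob", "carol", "dave"]

dv = DeletionVector()
dv.delete(1)  # e.g. an erasure request for "bob"
dv.delete(3)

live = read_live_rows(data_file, dv)
```

Here `data_file` is never modified; only the small bitmap changes, which is where the saving over a full file rewrite comes from.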

Row Lineage

  • Iceberg v3 adds "row lineage" capabilities: each row receives a stable identifier, and the table records the sequence number of the commit that last changed it.
  • This provides built-in change data capture (CDC) functionality, enabling use cases like incremental processing, event lifecycle tracking, and data debugging.
  • Row lineage information is stored directly with the data, allowing for efficient time travel queries and change correlation across snapshots.
  • Compared to the previous "change log" approach in Iceberg v2, row lineage simplifies implementation and maintenance.
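
The incremental-processing use case can be sketched in Python as below. This is a toy in-memory model, not the Iceberg implementation: `LineageTable`, `Row`, and `changed_since` are hypothetical names, and the two tracked fields loosely mirror the v3 spec's `_row_id` and `_last_updated_sequence_number` metadata columns.

```python
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class Row:
    row_id: int            # stable identity, assigned once at insert
    last_updated_seq: int  # sequence number of the commit that last wrote the row
    value: Any


class LineageTable:
    """Toy table where every commit bumps a sequence number."""

    def __init__(self) -> None:
        self.next_row_id = 0
        self.seq = 0
        self.rows: Dict[int, Row] = {}

    def insert(self, values: List[Any]) -> List[int]:
        self.seq += 1
        ids = []
        for v in values:
            rid = self.next_row_id
            self.rows[rid] = Row(rid, self.seq, v)
            self.next_row_id += 1
            ids.append(rid)
        return ids

    def update(self, row_id: int, value: Any) -> None:
        self.seq += 1
        row = self.rows[row_id]
        row.value = value
        row.last_updated_seq = self.seq  # identity is kept, only the stamp moves

    def changed_since(self, seq: int) -> List[Row]:
        """Rows inserted or updated after the given sequence number."""
        return [r for r in self.rows.values() if r.last_updated_seq > seq]


table = LineageTable()
table.insert(["a", "b", "c"])   # commit 1, row ids 0..2
checkpoint = table.seq          # a downstream job remembers where it stopped
table.update(1, "B")            # commit 2
table.insert(["d"])             # commit 3, row id 3

changes = [(r.row_id, r.value) for r in table.changed_since(checkpoint)]
```

A consumer that stores only `checkpoint` can pick up exactly the rows touched since its last run, without diffing full snapshots.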

Additional Iceberg v3 Features

  • Default values: Ability to specify default values for columns, stored in the table metadata.
  • Table encryption keys: Support for table-level encryption, integrated with a key management service (KMS).
  • Multi-argument transforms: Partition and sort transforms can take more than one source column.
  • New data types: Nanosecond-precision timestamps, geography and geometry types, and an "unknown" type that acts as a placeholder for null-only columns.
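
To make the default-values item concrete, here is a small Python sketch of how a reader might fill a column that older data files do not contain. The schema layout, the `initial_default` key, and the `project` helper are assumptions for illustration, not the Iceberg metadata format.

```python
from typing import Any, Dict, List

# Hypothetical schema metadata: "initial_default" stands in for a column
# default that is stored in table metadata rather than in the data files.
SCHEMA: List[Dict[str, Any]] = [
    {"name": "id"},
    {"name": "region", "initial_default": "unknown"},
]


def project(stored_row: Dict[str, Any],
            schema: List[Dict[str, Any]] = SCHEMA) -> Dict[str, Any]:
    """Return a full row; columns absent from the stored data take the default."""
    out: Dict[str, Any] = {}
    for col in schema:
        name = col["name"]
        if name in stored_row:
            out[name] = stored_row[name]
        else:
            out[name] = col.get("initial_default")
    return out


old_row = {"id": 1}                  # written before "region" was added
new_row = {"id": 2, "region": "eu"}  # written after

projected_old = project(old_row)
projected_new = project(new_row)
```

Because the default lives in metadata, adding the column does not require rewriting the older files in this model.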

Iceberg v4 Proposals

The Iceberg community is also exploring the following performance-focused proposals for Iceberg v4:

  • Improved column statistics: More efficient storage and access of column-level statistics.
  • Adaptive metadata tree: Optimizations to the multi-layered metadata structure to improve the performance of small writes and deletes.
  • Relative paths: Support for storing relative instead of absolute paths to data files, enabling easier table copying and replication.
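
The relative-paths proposal can be pictured with the short Python sketch below. It only illustrates the general idea and does not reflect the proposal's actual design: the `resolve` helper, the bucket names, and the file names are all hypothetical.

```python
def resolve(table_location: str, file_path: str) -> str:
    """Absolute URIs pass through; relative paths are joined to the table location."""
    if "://" in file_path:
        return file_path
    return table_location.rstrip("/") + "/" + file_path.lstrip("/")


# Hypothetical metadata entries stored relative to the table root.
manifest_entries = ["data/part-0.parquet", "data/part-1.parquet"]

# The same entries resolve correctly at the original and at a copied location.
original = [resolve("s3://bucket-a/db/tbl", p) for p in manifest_entries]
replica = [resolve("s3://bucket-b/backup/tbl/", p) for p in manifest_entries]
```

With absolute paths, every metadata entry would have to be rewritten after a copy; with relative paths, only the table location changes.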

Conclusion

Iceberg v3 introduces significant new capabilities around semi-structured data handling, optimized deletes, and built-in change tracking. These features address key challenges faced by organizations working with large-scale, heterogeneous data. The upcoming Iceberg v4 proposals further focus on improving performance and operational efficiency. Adopters are encouraged to evaluate Iceberg v3 and provide feedback to the open-source community as it shapes the future roadmap.
