AWS re:Invent 2025 - Maximize the value of cold data with Amazon S3 Glacier storage classes (STG208)

Maximizing the Value of Cold Data with Amazon S3 Glacier Storage Classes

The Importance of Cold Data

  • S3 currently stores hundreds of trillions of objects, and 70-80% of that data is "cold": rarely accessed data retained for months, years, or decades
  • Cold data is no longer just dormant storage, but a catalyst for innovation across industries
  • Customers are unlocking new insights and competitive advantages by leveraging their historical, archived data

S3 Storage Class Continuum

  • S3 offers a continuum of storage classes balancing access speed and cost efficiency:
    • S3 Standard: Millisecond access for active data
    • S3 Standard-Infrequent Access (S3 Standard-IA): Lower storage cost for less frequently accessed data, still with millisecond access
    • S3 Glacier Storage Classes:
      • Glacier Instant Retrieval: Millisecond access for rarely accessed data that still needs immediate retrieval
      • Glacier Flexible Retrieval: Lower storage cost, with retrievals in minutes to hours
      • Glacier Deep Archive: Lowest-cost storage for long-term retention, with retrievals within 12 to 48 hours
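The continuum above is essentially a trade-off between storage price and time to first byte. The sketch below encodes that trade-off under stated assumptions: the `StorageClass` values are the real `StorageClass` strings used in S3 API requests, but the latency bounds are rounded illustrative figures (millisecond classes rounded up to 0.1 s, the Standard retrieval tier for the two archive classes), and `cheapest_class` is a hypothetical helper, not an AWS API. It ignores retrieval fees, request charges, and minimum storage durations, which matter in a real decision.

```python
from dataclasses import dataclass

HOUR = 3600  # seconds


@dataclass(frozen=True)
class StorageClass:
    name: str                  # display name used in the talk
    api_value: str             # StorageClass value in S3 API requests
    worst_first_byte_s: float  # assumed worst-case time to first byte, in seconds


# Ordered from highest to lowest storage price per GB.
# Latency figures are illustrative assumptions; check current AWS documentation.
CONTINUUM = [
    StorageClass("S3 Standard", "STANDARD", 0.1),
    StorageClass("S3 Standard-IA", "STANDARD_IA", 0.1),
    StorageClass("S3 Glacier Instant Retrieval", "GLACIER_IR", 0.1),
    # Standard retrieval tier, typically 3-5 hours: use the upper bound.
    StorageClass("S3 Glacier Flexible Retrieval", "GLACIER", 5 * HOUR),
    # Standard retrieval tier, typically within 12 hours.
    StorageClass("S3 Glacier Deep Archive", "DEEP_ARCHIVE", 12 * HOUR),
]


def cheapest_class(max_wait_s: float) -> StorageClass:
    """Hypothetical helper: lowest-price class whose assumed latency fits max_wait_s."""
    candidates = [c for c in CONTINUUM if c.worst_first_byte_s <= max_wait_s]
    if not candidates:
        raise ValueError("no storage class meets this latency requirement")
    return candidates[-1]  # CONTINUUM is sorted by descending storage price


print(cheapest_class(1).api_value)          # GLACIER_IR
print(cheapest_class(6 * HOUR).api_value)   # GLACIER
print(cheapest_class(24 * HOUR).api_value)  # DEEP_ARCHIVE
```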

Automating Data Lifecycle Management

  • S3 Lifecycle rules automatically transition objects to colder storage classes as they age, matching data whose access declines over time
  • Rules can apply to an entire bucket or be filtered by prefix, object tags, or object size, and can target current and noncurrent object versions separately
  • S3 Intelligent-Tiering automatically moves objects between access tiers based on each object's observed access patterns
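As a sketch of how these two mechanisms are configured, the dictionaries below follow the request shapes that boto3's `put_bucket_lifecycle_configuration` (`LifecycleConfiguration=`) and `put_bucket_intelligent_tiering_configuration` (`IntelligentTieringConfiguration=`) accept. The rule ID, prefix, tag, size threshold, and day counts are hypothetical values chosen for illustration, and `transitions_are_ordered` is a local sanity check, not part of any AWS API.

```python
# Lifecycle rule: filter by prefix, tag, and size; transition by object age.
# All names and thresholds below are hypothetical.
lifecycle_configuration = {
    "Rules": [
        {
            "ID": "archive-project-logs",
            "Status": "Enabled",
            "Filter": {
                "And": {
                    "Prefix": "logs/",
                    "Tags": [{"Key": "retention", "Value": "long-term"}],
                    "ObjectSizeGreaterThan": 128 * 1024,  # bytes; skip small objects
                }
            },
            # Current versions move to colder classes as they age (days since creation).
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER_IR"},
                {"Days": 180, "StorageClass": "GLACIER"},
                {"Days": 365, "StorageClass": "DEEP_ARCHIVE"},
            ],
            # In versioned buckets, noncurrent versions can follow their own schedule.
            "NoncurrentVersionTransitions": [
                {"NoncurrentDays": 30, "StorageClass": "DEEP_ARCHIVE"},
            ],
        }
    ]
}

# Intelligent-Tiering: opt objects into the archive tiers after days without access.
intelligent_tiering_configuration = {
    "Id": "archive-inactive",
    "Status": "Enabled",
    "Tierings": [
        {"Days": 90, "AccessTier": "ARCHIVE_ACCESS"},
        {"Days": 180, "AccessTier": "DEEP_ARCHIVE_ACCESS"},
    ],
}


def transitions_are_ordered(rule: dict) -> bool:
    """Local check: each transition happens strictly later than the previous one."""
    days = [t["Days"] for t in rule.get("Transitions", [])]
    return all(a < b for a, b in zip(days, days[1:]))


rule = lifecycle_configuration["Rules"][0]
assert transitions_are_ordered(rule)
for t in rule["Transitions"]:
    print(f'day {t["Days"]:>3}: {t["StorageClass"]}')
```

With boto3, these would be applied as `s3.put_bucket_lifecycle_configuration(Bucket=bucket, LifecycleConfiguration=lifecycle_configuration)` and `s3.put_bucket_intelligent_tiering_configuration(Bucket=bucket, Id="archive-inactive", IntelligentTieringConfiguration=intelligent_tiering_configuration)`. Lifecycle rules suit data whose access falls off predictably with age; Intelligent-Tiering suits data whose access pattern is unknown or changes.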

Restoring Archived Data

  • Key use cases for restoring archived data:
    • Reviving historical content for new audiences
    • Leveraging archived data for strategic decision-making
    • Training machine learning models on vast historical datasets
  • Restoration options:
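    • Glacier Instant Retrieval: Read with the same GET request as other S3 classes, with no restore step, but higher per-GB retrieval costs than S3 Standard-IA
    • Glacier Flexible Retrieval and Glacier Deep Archive: A 3-step process (initiate a restore, monitor its progress, then access the temporary restored copy)
    • S3 Batch Operations: Restore many objects in a single job, maximizing requests per second to optimize restore performance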
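The initiate, monitor, access sequence for a single object can be sketched as follows. The `RestoreRequest` shape and the `x-amz-restore` header format (exposed as the `Restore` field of a boto3 `head_object` response) are real; the helper names, the 15-minute polling interval, and the injected `head` callable (used so the logic runs without network access) are assumptions for illustration. For large-scale restores, S3 Batch Operations or event notifications on restore completion avoid per-object polling.

```python
import re
import time
from typing import Callable, Optional

# Retrieval tiers per archive class; Deep Archive has no Expedited tier.
VALID_TIERS = {
    "GLACIER": {"Expedited", "Standard", "Bulk"},
    "DEEP_ARCHIVE": {"Standard", "Bulk"},
}


def build_restore_request(storage_class: str, days: int, tier: str) -> dict:
    """Step 1 (initiate): RestoreRequest body for a RestoreObject call."""
    if tier not in VALID_TIERS.get(storage_class, set()):
        raise ValueError(f"tier {tier!r} not available for {storage_class}")
    return {"Days": days, "GlacierJobParameters": {"Tier": tier}}


def parse_restore_header(value: Optional[str]) -> tuple[bool, Optional[str]]:
    """Parse x-amz-restore, e.g. 'ongoing-request="false", expiry-date="..."'.

    Returns (ongoing, expiry_date); a missing header means no restore was requested.
    """
    if value is None:
        return False, None
    ongoing = re.search(r'ongoing-request="(true|false)"', value)
    expiry = re.search(r'expiry-date="([^"]+)"', value)
    return (ongoing is not None and ongoing.group(1) == "true",
            expiry.group(1) if expiry else None)


def wait_until_restored(head: Callable[[], dict], poll_s: float = 900,
                        sleep: Callable[[float], None] = time.sleep) -> str:
    """Step 2 (monitor): poll until the temporary copy is ready; return its expiry."""
    while True:
        ongoing, expiry = parse_restore_header(head().get("Restore"))
        if not ongoing:
            if expiry is None:
                raise RuntimeError("no restore requested for this object")
            return expiry
        sleep(poll_s)


# Usage with boto3 (bucket and key are placeholders):
# s3 = boto3.client("s3")
# s3.restore_object(Bucket=bucket, Key=key,
#                   RestoreRequest=build_restore_request("GLACIER", 7, "Bulk"))
# wait_until_restored(lambda: s3.head_object(Bucket=bucket, Key=key))
# data = s3.get_object(Bucket=bucket, Key=key)["Body"].read()  # step 3: access
```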

New Archive-Focused Features

  1. Compute Checksum Operation:

    • Verifies the data integrity of objects stored in any S3 storage class, including the Glacier storage classes
    • Eliminates the need to download objects to calculate checksums locally
    • Runs as an S3 Batch Operations job, so checksum verification scales efficiently across large numbers of objects
    • Provides detailed completion reports for auditing and compliance
  2. S3 Metadata:

    • Automatically extracts and stores object metadata (tags, storage class, size, etc.) in queryable tables
    • Enables SQL queries and natural language searches over archived data without restoring the objects themselves
    • Provides a "system of record" for understanding the contents and state of S3 buckets
    • Democratizes access to insights from archived data for teams beyond just data engineers
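For the Compute Checksum operation, the audit step is comparing the checksums reported for each object against values recorded when the data was ingested. The sketch below assumes, as a simplification, that the completion report has already been parsed into a `{key: checksum}` dict of base64-encoded full-object checksums (S3 encodes checksum values in base64); the report's actual layout is not shown here. `s3_style_checksum` computes such a value at ingest time, in chunks, and `find_mismatches` flags keys to investigate. Both are hypothetical helpers.

```python
import base64
import hashlib
import zlib
from typing import Iterable


def s3_style_checksum(chunks: Iterable[bytes], algorithm: str = "SHA256") -> str:
    """Base64-encoded full-object checksum, computed incrementally over chunks.

    This is the full-object form, not the composite form used for some multipart uploads.
    """
    if algorithm == "SHA256":
        digest = hashlib.sha256()
        for chunk in chunks:
            digest.update(chunk)
        return base64.b64encode(digest.digest()).decode()
    if algorithm == "CRC32":
        crc = 0
        for chunk in chunks:
            crc = zlib.crc32(chunk, crc)  # running CRC across chunks
        return base64.b64encode(crc.to_bytes(4, "big")).decode()
    raise ValueError(f"unsupported algorithm: {algorithm}")


def find_mismatches(expected: dict[str, str], reported: dict[str, str]) -> list[str]:
    """Keys whose reported checksum is missing or differs from the ingest-time value."""
    return sorted(k for k, v in expected.items() if reported.get(k) != v)


ingest = {"archive/a.bin": s3_style_checksum([b"hel", b"lo"])}
print(find_mismatches(ingest, {"archive/a.bin": ingest["archive/a.bin"]}))  # []
```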
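To illustrate the kind of question S3 Metadata makes answerable with SQL, the sketch below uses Python's built-in `sqlite3` as a stand-in. Real S3 Metadata tables are Apache Iceberg tables in S3 table buckets, queried with engines such as Amazon Athena, and have a richer schema; the three-column `inventory` table and its rows here are hypothetical sample data, not the actual schema.

```python
import sqlite3

# Hypothetical, simplified stand-in for an S3 Metadata table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (key TEXT, size INTEGER, storage_class TEXT)")
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?)",
    [
        ("raw/2019/video-001.mxf", 50_000_000_000, "DEEP_ARCHIVE"),
        ("raw/2019/video-002.mxf", 42_000_000_000, "DEEP_ARCHIVE"),
        ("reports/2024/q1.parquet", 300_000_000, "GLACIER_IR"),
        ("reports/2025/q3.parquet", 280_000_000, "STANDARD"),
    ],
)

# How much data sits in each storage class, largest first?
QUERY = """
    SELECT storage_class, COUNT(*) AS objects, SUM(size) AS total_bytes
    FROM inventory
    GROUP BY storage_class
    ORDER BY total_bytes DESC
"""


def bytes_by_storage_class(connection: sqlite3.Connection) -> list[tuple]:
    """Run the aggregate query and return (storage_class, objects, total_bytes) rows."""
    return connection.execute(QUERY).fetchall()


for row in bytes_by_storage_class(conn):
    print(row)
```

The same `GROUP BY storage_class` pattern answers questions such as how much archived data a restore project would touch before any object is restored.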

Key Takeaways

  • Cold data is a valuable asset, not just dormant storage, with opportunities to drive innovation and competitive advantage
  • S3 Glacier storage classes offer a continuum of cost and access tradeoffs to optimize for different cold data use cases
  • Automated lifecycle management and restoration capabilities make it easy to manage and access archived data
  • New features like Compute Checksum and S3 Metadata simplify data integrity validation and enable rapid insights from archived data
