AWS re:Invent 2025 - Indeed's migration to Amazon S3 Tables (STG210)

Summary of AWS re:Invent 2025 - Indeed's Migration to Amazon S3 Tables (STG210)

Introduction to Amazon S3 Tables

  • Amazon S3 Tables, announced at AWS re:Invent 2024, addresses key challenges with storing and managing tabular data (Apache Iceberg tables) in data lakes.
  • The main problems S3 Tables aims to solve are:
    1. Simplifying security by allowing policies to be applied at the table level rather than individual objects
    2. Improving performance through automatic compaction of data objects
    3. Reducing the complexity of maintaining data lakes at scale by automating tasks like object cleanup and snapshot management
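
To make the first point concrete, here is a minimal sketch of what a table-level resource policy could look like, built as a plain Python dictionary. The account ID, role name, table ARN, and the helper `table_read_policy` are hypothetical placeholders, not details from the talk. The action names reflect my understanding of the S3 Tables IAM actions and should be checked against current AWS documentation.

```python
import json


def table_read_policy(table_arn: str, reader_role_arn: str) -> dict:
    """Build a resource policy granting one role read access to one table."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowTableRead",
                "Effect": "Allow",
                "Principal": {"AWS": reader_role_arn},
                # Assumed action names; verify against the S3 Tables docs.
                "Action": [
                    "s3tables:GetTableData",
                    "s3tables:GetTableMetadataLocation",
                ],
                "Resource": table_arn,
            }
        ],
    }


# Hypothetical identifiers, used only for illustration.
TABLE_ARN = (
    "arn:aws:s3tables:us-east-1:111122223333:"
    "bucket/example-lake/table/EXAMPLE-TABLE-ID"
)
ROLE_ARN = "arn:aws:iam::111122223333:role/ExampleAnalystRole"

policy = table_read_policy(TABLE_ARN, ROLE_ARN)
policy_json = json.dumps(policy, indent=2)
print(policy_json)
```

The point of the sketch is that a single statement scoped to the table ARN replaces per-object grants. Attaching the document to a table would go through the S3 Tables API; that call is omitted so the example runs without AWS credentials.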

Indeed's Data Lake Challenges

  • Indeed is the world's largest job site, with 635 million job seeker profiles, 3.3 million employers, and operations in over 60 countries.
  • Their data lake stores 87+ PB of data, with 550 TB ingested daily across 15,000+ tables.
  • Key challenges with their previous data lake architecture, which ran Apache Iceberg on general purpose S3 buckets:
    • High maintenance overhead for Iceberg table management (200+ dev hours/year)
    • S3 rate limiting issues due to varying access patterns across tables
    • Complexity of managing object-level access controls for 600M+ objects

Migrating to Amazon S3 Tables

  • After evaluating the benefits, Indeed decided to migrate their entire data lake to Amazon S3 Tables.
  • Key drivers for the migration:
    • An estimated 10% annual cost savings compared to self-managed Iceberg on general purpose S3 buckets
    • Dramatically reduced maintenance overhead (no more custom Iceberg management jobs)
    • Simplified data access and security management at the table level
    • Faster onboarding experience for new data sources (minutes vs. days)

Migration Approach

  • Indeed took an incremental, phased approach rather than a "big bang" migration:
    • Analyzed access logs to categorize data into cohorts and batches based on workloads (Spark, Trino, Athena, etc.)
    • Implemented a dual-write pipeline to migrate data in phases while maintaining full availability
    • Developed custom tooling to automate the ingestion of data directly into S3 Tables
  • The team encountered several challenges during the migration:
    • Tight integration with AWS Lake Formation required significant effort to manage permissions
    • Query performance differences between self-managed Iceberg tables and S3 Tables required extensive testing
    • Limits on S3 table buckets and objects per account had to be carefully managed
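
The phased approach above can be sketched as follows. This is a minimal in-memory illustration under stated assumptions, not Indeed's actual tooling: every name here (`AccessRecord`, `plan_batches`, `DualWriter`, the engine labels, the table names, and the batch size) is hypothetical. Tables are grouped into cohorts by the set of engines that access them, each cohort is split into fixed-size batches, and a dual writer sends every row to the legacy table and, once a table's batch is enabled, to the new table as well.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AccessRecord:
    table: str   # table name seen in an access log entry
    engine: str  # workload that touched it, e.g. "spark" or "trino"


def plan_batches(records, batch_size):
    """Group tables into cohorts by the set of engines that access them,
    then split each cohort into batches of at most batch_size tables."""
    engines_by_table = defaultdict(set)
    for rec in records:
        engines_by_table[rec.table].add(rec.engine)

    cohorts = defaultdict(list)
    for table in sorted(engines_by_table):
        cohort_key = "+".join(sorted(engines_by_table[table]))
        cohorts[cohort_key].append(table)

    return {
        key: [tables[i:i + batch_size] for i in range(0, len(tables), batch_size)]
        for key, tables in cohorts.items()
    }


@dataclass
class DualWriter:
    """Writes every row to the legacy sink; also writes to the new sink
    for tables whose migration batch has been enabled."""
    legacy: dict = field(default_factory=lambda: defaultdict(list))
    new: dict = field(default_factory=lambda: defaultdict(list))
    enabled: set = field(default_factory=set)

    def enable_batch(self, tables):
        self.enabled.update(tables)

    def write(self, table, row):
        self.legacy[table].append(row)   # legacy path keeps receiving all writes
        if table in self.enabled:
            self.new[table].append(row)  # mirrored write during migration


logs = [
    AccessRecord("jobs", "spark"),
    AccessRecord("jobs", "trino"),
    AccessRecord("clicks", "spark"),
    AccessRecord("resumes", "spark"),
    AccessRecord("employers", "athena"),
]
plan = plan_batches(logs, batch_size=1)

writer = DualWriter()
writer.enable_batch(plan["spark"][0])  # first batch of the Spark-only cohort
writer.write("clicks", {"id": 1})      # mirrored to both sinks
writer.write("jobs", {"id": 2})        # not yet enabled: legacy only
```

Because the legacy path keeps receiving every write, readers stay on the old tables until a batch is validated. A real pipeline would presumably also backfill historical data and compare both copies before cutting readers over; the summary does not detail those steps.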

Key Takeaways and Benefits

  • Indeed is on track to migrate 50+ PB of data to Amazon S3 Tables, with an initial 2.5 PB "canary" migration underway.
  • The migration has allowed them to:
    • Reduce data lake maintenance overhead by 80%
    • Onboard new data sources in minutes instead of days
    • Redirect 4 developer-months per year from maintenance to product development
  • Overall, the migration to S3 Tables has enabled Indeed to build a more modern, cost-efficient, and maintainable data lake to power their global job search platform.
