Summary of AWS re:Invent 2025 - Indeed's Migration to Amazon S3 Tables (STG210)
Introduction to Amazon S3 Tables
Amazon S3 Tables is a relatively new AWS service, launched over a year ago, that addresses key challenges in storing and managing tabular data in data lakes.
The main problems S3 Tables aims to solve are:
Simplifying security by allowing policies to be applied at the table level rather than to individual objects
Improving query performance through automatic compaction of data objects
Reducing the complexity of maintaining data lakes at scale by automating tasks like object cleanup and snapshot management
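To make two of these automated tasks concrete, here is a minimal Python sketch: planning compaction by greedily packing undersized data files into target-size output files, and choosing which snapshots to expire under an age limit while always keeping a minimum number. The parameter names and defaults (target_file_size_mb=512, min_snapshots_to_keep=1, max_snapshot_age_hours=120) are illustrative assumptions, and the algorithm is plain first-fit decreasing bin packing, not a description of how S3 Tables performs maintenance internally.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DataFile:
    path: str
    size_mb: float


@dataclass
class Snapshot:
    snapshot_id: int
    age_hours: float


def plan_compaction(files: List[DataFile], target_file_size_mb: float = 512.0) -> List[List[DataFile]]:
    """Group undersized files into bins whose total size stays within the target.

    First-fit decreasing bin packing: each returned bin would be rewritten as one
    larger file. Files already at or above the target are skipped, and single-file
    bins are dropped because rewriting one file alone gains nothing.
    """
    small = sorted((f for f in files if f.size_mb < target_file_size_mb),
                   key=lambda f: f.size_mb, reverse=True)
    bins: List[List[DataFile]] = []
    totals: List[float] = []
    for f in small:
        for i, total in enumerate(totals):
            if total + f.size_mb <= target_file_size_mb:
                bins[i].append(f)
                totals[i] += f.size_mb
                break
        else:  # no existing bin had room, so open a new one
            bins.append([f])
            totals.append(f.size_mb)
    return [b for b in bins if len(b) > 1]


def snapshots_to_expire(snapshots: List[Snapshot], min_snapshots_to_keep: int = 1,
                        max_snapshot_age_hours: float = 120.0) -> List[int]:
    """Return IDs of snapshots past the age limit, always retaining the newest ones."""
    newest_first = sorted(snapshots, key=lambda s: s.age_hours)
    candidates = newest_first[min_snapshots_to_keep:]
    return [s.snapshot_id for s in candidates if s.age_hours > max_snapshot_age_hours]


files = [DataFile(f"part-{i}.parquet", 100.0) for i in range(10)] + [DataFile("big.parquet", 600.0)]
print([len(b) for b in plan_compaction(files)])  # [5, 5]
snaps = [Snapshot(1, 200.0), Snapshot(2, 130.0), Snapshot(3, 10.0)]
print(snapshots_to_expire(snaps))  # [2, 1]
```

Ten 100 MB files collapse into two rewrite groups, while the 600 MB file is left alone because it already exceeds the target.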
Indeed's Data Lake Challenges
Indeed is the world's largest job site, with 635 million job seeker profiles, 3.3 million employers, and operations in over 60 countries.
Their data lake stores 87+ PB of data, with 550 TB ingested daily across 15,000+ tables.
Key challenges with their previous data lake architecture, which used Iceberg tables in general purpose S3 buckets:
High maintenance overhead for Iceberg table management (200+ dev hours/year)
S3 request-rate throttling caused by widely varying access patterns across tables
Complexity of managing object-level access controls for 600M+ objects
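The rate-limiting problem stems from S3's per-prefix request limits: AWS documents at least 3,500 PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per partitioned prefix. The Python sketch below estimates how many prefixes a table's peak workload would need. The workload numbers are made-up inputs, not Indeed's figures, and S3 repartitions prefixes on its own over time, so treat this as back-of-the-envelope arithmetic only.

```python
import math

# Documented per-partitioned-prefix request-rate baselines for Amazon S3 (requests/second).
GET_PER_PREFIX = 5_500   # GET/HEAD
PUT_PER_PREFIX = 3_500   # PUT/COPY/POST/DELETE


def prefixes_needed(peak_get_rps: float, peak_put_rps: float) -> int:
    """Minimum number of partitioned prefixes so neither the read nor the write limit is exceeded."""
    for_reads = math.ceil(peak_get_rps / GET_PER_PREFIX)
    for_writes = math.ceil(peak_put_rps / PUT_PER_PREFIX)
    return max(1, for_reads, for_writes)


# Hypothetical workloads: a hot table scanned by many Spark/Trino tasks,
# and a cold table that is rarely touched.
workloads = {
    "hot_table": (40_000, 2_000),
    "cold_table": (50, 10),
}
for name, (gets, puts) in workloads.items():
    print(name, prefixes_needed(gets, puts))  # hot_table 8, then cold_table 1
```

The spread between the two results is the point: a single bucket-wide layout that suits the cold table can throttle the hot one.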
Migrating to Amazon S3 Tables
After evaluating the benefits, Indeed decided to migrate their entire data lake to Amazon S3 Tables.
Key drivers for the migration:
An estimated 10% annual cost saving compared with Iceberg on general purpose S3 buckets
Dramatically reduced maintenance overhead (no more custom Iceberg management jobs)
Simplified data access and security management at the table level
Faster onboarding of new data sources (minutes instead of days)
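The table-level security point in the list above is essentially a change in the unit of authorization: one grant per table instead of one per object. The Python sketch below collapses per-object read access into per-table grants. The key layout (namespace/table/...), the principal and table names, the "read" action, and the "table/..." resource string are all hypothetical; the dictionaries only imitate the general shape of an IAM statement and are not a verified S3 Tables policy.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set


def table_of(key: str) -> str:
    """Map an object key laid out as '<namespace>/<table>/...' to 'namespace.table'.

    This key layout is a hypothetical convention for the example.
    """
    namespace, table = key.split("/")[:2]
    return f"{namespace}.{table}"


def table_level_grants(access: Dict[str, Iterable[str]]) -> List[dict]:
    """Turn per-object read access into one grant per (principal, table).

    `access` maps a principal name to the object keys it may read.
    The resource string is a placeholder, not a real ARN.
    """
    tables_by_principal: Dict[str, Set[str]] = defaultdict(set)
    for principal, keys in access.items():
        for key in keys:
            tables_by_principal[principal].add(table_of(key))
    return [
        {"Effect": "Allow", "Principal": p, "Action": "read", "Resource": f"table/{t}"}
        for p in sorted(tables_by_principal)
        for t in sorted(tables_by_principal[p])
    ]


access = {
    "analytics-team": [f"jobs/postings/data/part-{i}.parquet" for i in range(1000)]
    + [f"jobs/applies/data/part-{i}.parquet" for i in range(500)],
}
grants = table_level_grants(access)
print(sum(len(v) for v in access.values()), "object grants ->", len(grants), "table grants")
# 1500 object grants -> 2 table grants
```

At Indeed's scale the same collapse runs from hundreds of millions of objects down to roughly the number of tables.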
Migration Approach
Indeed took an incremental, phased approach rather than a "big bang" migration:
Analyzed access logs to categorize data into cohorts and batches based on workloads (Spark, Trino, Athena, etc.)
Implemented a dual-write pipeline to migrate data in phases while maintaining full availability
Developed custom tooling to automate the ingestion of data directly into S3 Tables
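The cohort analysis and dual-write steps above can be sketched in Python as follows: tables are grouped by the exact set of engines seen in access logs, each cohort is split into fixed-size batches, and a dual-write wrapper sends every write to both the legacy and the new store while reads stay on the legacy side until a table is cut over. The (table, engine) log record shape, the batch size, and the in-memory dict stores are assumptions standing in for real access logs and real Iceberg catalogs; this is not Indeed's tooling.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, List, Tuple


def cohorts_from_logs(log_records: Iterable[Tuple[str, str]]) -> Dict[FrozenSet[str], List[str]]:
    """Group tables by the exact set of engines that accessed them.

    Each log record is assumed to be a (table_name, engine) pair.
    """
    engines_by_table: Dict[str, set] = defaultdict(set)
    for table, engine in log_records:
        engines_by_table[table].add(engine)
    cohorts: Dict[FrozenSet[str], List[str]] = defaultdict(list)
    for table in sorted(engines_by_table):
        cohorts[frozenset(engines_by_table[table])].append(table)
    return dict(cohorts)


def batches(tables: List[str], size: int) -> List[List[str]]:
    """Split a cohort into batches of at most `size` tables."""
    return [tables[i:i + size] for i in range(0, len(tables), size)]


class DualWriter:
    """Write to both stores; serve reads from the legacy store until cutover."""

    def __init__(self, legacy: dict, target: dict):
        self.legacy, self.target = legacy, target
        self.cut_over: set = set()

    def write(self, table: str, rows: list) -> None:
        self.legacy.setdefault(table, []).extend(rows)
        self.target.setdefault(table, []).extend(rows)

    def cut_over_table(self, table: str) -> None:
        # History written before dual-write began must be backfilled separately;
        # this check refuses to switch reads until the target matches the legacy copy.
        if self.target.get(table) != self.legacy.get(table):
            raise RuntimeError(f"{table}: target not in sync with legacy")
        self.cut_over.add(table)

    def read(self, table: str) -> list:
        store = self.target if table in self.cut_over else self.legacy
        return store.get(table, [])


logs = [("postings", "spark"), ("postings", "trino"), ("applies", "athena"), ("clicks", "athena")]
athena_only = cohorts_from_logs(logs)[frozenset({"athena"})]
print(athena_only)             # ['applies', 'clicks']
print(batches(athena_only, 1))  # [['applies'], ['clicks']]

w = DualWriter(legacy={}, target={})
w.write("applies", [1, 2])
w.cut_over_table("applies")
print(w.read("applies"))       # [1, 2]
```

Keying cohorts on the full engine set means a table read by both Spark and Trino is migrated only when both engines have been validated against S3 Tables.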
The team encountered several challenges during the migration:
S3 Tables' tight integration with AWS Lake Formation meant significant effort went into managing permissions
Query performance differences between their existing Iceberg tables and S3 Tables required extensive testing
Per-account quotas on S3 table buckets, and on the objects they hold, had to be managed carefully
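One way to keep quota limits from stalling a migration is a capacity check before each batch. The minimal Python sketch below assumes the relevant quotas are supplied as parameters; the defaults (10 table buckets per account and 10,000 tables per bucket) are assumptions to confirm against the current S3 Tables quotas for your account, and the sequential table-to-bucket assignment is a hypothetical layout choice.

```python
import math
from typing import List


def plan_table_buckets(table_count: int, max_tables_per_bucket: int = 10_000,
                       max_buckets_per_account: int = 10) -> int:
    """Return how many table buckets are needed, or raise if the account quota is too small."""
    needed = max(1, math.ceil(table_count / max_tables_per_bucket))
    if needed > max_buckets_per_account:
        raise ValueError(f"{table_count} tables need {needed} buckets; quota allows "
                         f"{max_buckets_per_account}. Request a quota increase first.")
    return needed


def assign_tables(tables: List[str], max_tables_per_bucket: int = 10_000) -> List[List[str]]:
    """Fill buckets sequentially so no bucket exceeds its (assumed) table quota."""
    return [tables[i:i + max_tables_per_bucket]
            for i in range(0, len(tables), max_tables_per_bucket)]


# Roughly the table count given for Indeed's data lake.
print(plan_table_buckets(15_000))  # 2
```

Running the check per migration batch surfaces a needed quota increase before any data is moved, rather than midway through a cohort.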
Key Takeaways and Benefits
Indeed is on track to migrate 50+ PB of data to Amazon S3 Tables, with an initial 2.5 PB "canary" migration underway.
The migration has allowed them to:
Reduce data lake maintenance overhead by 80%
Onboard new data sources in minutes instead of days
Redirect 4 developer-months per year from maintenance to product development
Overall, the migration to S3 Tables has enabled Indeed to build a more modern, cost-efficient, and maintainable data lake to power their global job search platform.