AWS re:Invent 2025 - How Samsung optimized 1.2 PB on Amazon DynamoDB with zero downtime (DAT323)

Optimizing 1.2 PB on Amazon DynamoDB with Zero Downtime: Samsung's Successful Migration

Introduction

  • Samsung Cloud is a key platform for billions of Galaxy users, providing core cloud services like sync, backup, and restore for popular Samsung apps.
  • The platform handles over 1 billion monthly active devices, generating 50 billion requests per day.
  • Samsung chose Amazon DynamoDB as its primary database and uses it to store core synchronization data for applications such as Samsung Internet.

The Challenge

  • As the service grew, the continuous creation and updating of tab data pushed the DynamoDB table to 1.2 PB.
  • This massive table was consuming hundreds of thousands of dollars in AWS storage costs every month.
  • Samsung's mission was to cut storage costs by 50% and recoup the optimization investment within 3 months.
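
To put the storage bill in perspective, here is a rough back-of-the-envelope estimate. The price of $0.25 per GB-month is an assumption (the commonly listed rate for the DynamoDB Standard table class in us-east-1); the talk summary does not state Samsung's region, table class, or actual rate, so treat the figures as illustrative only.

```python
# Assumption: USD per GB-month for DynamoDB Standard storage in us-east-1.
# Actual pricing depends on region and table class.
PRICE_PER_GB_MONTH = 0.25


def monthly_storage_cost(size_tb: float, price_per_gb: float = PRICE_PER_GB_MONTH) -> float:
    """Estimated monthly storage cost in USD for a table of size_tb terabytes (decimal units)."""
    return size_tb * 1000 * price_per_gb


current = monthly_storage_cost(1200)  # 1.2 PB expressed in TB
target = current * 0.5                # the 50% cost-reduction goal
print(f"current: ${current:,.0f}/month, target: ${target:,.0f}/month")
```

Under that assumed rate the estimate lands in the "hundreds of thousands of dollars" range described above.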

Data Analysis and Insights

  • Analyzing the entire 1.2 PB dataset was not feasible, so Samsung took a sample to understand its characteristics.
  • The analysis revealed that 90% of the data consisted of tab deletion records (tombstones).
  • Further investigation showed that 60% of the tombstones were over 6 months old, indicating a lack of long-term data lifecycle management.
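
The kind of sample analysis described above can be sketched as follows. This is a minimal illustration, not Samsung's tooling: the `SampledRecord` type, its `deleted` flag and `modified_at` field are hypothetical names, and the sample is synthetic data chosen to mirror the reported proportions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class SampledRecord:
    deleted: bool          # hypothetical tombstone flag
    modified_at: datetime  # hypothetical last-modified timestamp


def tombstone_stats(sample, now, old_after=timedelta(days=180)):
    """Return (share of tombstones in the sample, share of tombstones older than old_after)."""
    tombstones = [r for r in sample if r.deleted]
    if not tombstones:
        return 0.0, 0.0
    old = [r for r in tombstones if now - r.modified_at > old_after]
    return len(tombstones) / len(sample), len(old) / len(tombstones)


# Synthetic sample: 54 old tombstones, 36 recent tombstones, 10 active tabs.
now = datetime(2025, 1, 1, tzinfo=timezone.utc)
sample = (
    [SampledRecord(True, now - timedelta(days=365))] * 54
    + [SampledRecord(True, now - timedelta(days=2))] * 36
    + [SampledRecord(False, now - timedelta(days=1))] * 10
)
share, old_share = tombstone_stats(sample, now)
print(f"tombstones: {share:.0%}, older than 6 months: {old_share:.0%}")
```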

Optimization Strategies

  • Samsung considered two main strategies:
    1. Strategy A: Directly scan and delete the old data from the existing table.
    2. Strategy B: Create a new table and migrate only the necessary data.
  • Strategy A was ruled out because it would require scanning trillions of records, consume massive read and write capacity (RCUs and WCUs), and take over 6 months to complete.
  • Strategy B, while an improvement, still required processing a significant amount of data, making it difficult to meet the 3-month ROI target.
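
A rough estimate shows why a full scan is so expensive. DynamoDB charges reads in 4 KB units, and an eventually consistent read costs half a unit. The sketch below applies that rule to the whole table; the sustained budget of 10,000 RCUs per second is a made-up figure for illustration, and the estimate ignores the write capacity that the deletes themselves would consume.

```python
def scan_read_units(size_bytes: int, eventually_consistent: bool = True) -> float:
    """Read capacity units needed to scan size_bytes of table data once."""
    blocks = -(-size_bytes // 4096)  # ceiling division into 4 KB blocks
    return blocks * (0.5 if eventually_consistent else 1.0)


def days_to_scan(size_bytes: int, rcu_budget_per_second: float) -> float:
    """Days needed to scan the table at a sustained RCU budget (hypothetical figure)."""
    return scan_read_units(size_bytes) / rcu_budget_per_second / 86_400


TABLE_BYTES = 1_200_000_000_000_000  # 1.2 PB in decimal units
print(f"read units: {scan_read_units(TABLE_BYTES):,.0f}")
print(f"days at 10,000 RCU/s: {days_to_scan(TABLE_BYTES, 10_000):.0f}")
```

Raising the budget shortens the scan but competes directly with production traffic on the same table, which is part of why neither strategy looked attractive as first framed.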

Rethinking the Fundamental Assumption

  • Samsung took a step back and questioned the fundamental assumption of the 6-month retention period for tab deletion records.
  • Their investigation revealed that 99.9% of devices needed less than 1 week for tab deletion history to propagate, and 99% received the deletion history within 3 days.
  • This insight led Samsung to shorten the deletion history window from 6 months to 1 week, drastically reducing the amount of data to be migrated.
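
The reasoning behind the new window can be illustrated with a small coverage calculation: given how long each device took to receive a deletion, find the shortest window that still covers the target share of devices. The function names and the delay data below are hypothetical; the synthetic numbers are chosen only to mirror the 99% and 99.9% figures above.

```python
def coverage(delays_days, window_days):
    """Fraction of devices whose propagation delay fits within window_days."""
    return sum(1 for d in delays_days if d <= window_days) / len(delays_days)


def smallest_window(delays_days, candidates, target):
    """Shortest candidate window (in days) whose coverage meets the target, or None."""
    for window in sorted(candidates):
        if coverage(delays_days, window) >= target:
            return window
    return None


# Synthetic delays: 990 devices at 2 days, 9 at 5 days, 1 straggler at 30 days.
delays = [2.0] * 990 + [5.0] * 9 + [30.0]
print(coverage(delays, 3), coverage(delays, 7))
print(smallest_window(delays, [1, 3, 7, 30, 180], target=0.999))
```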

Migration Architecture and Principles

  • Samsung established two guiding principles for the migration process:
    1. Simple User Experience: The migration could not cause any negative experience for the user.
    2. Controlled Execution: Every step had to be predictable, observable, and fully controllable.
  • To achieve this, Samsung adopted a per-user migration strategy, where users not yet migrated continued to use the old table, and once a user's data migration was completed, all their requests were redirected to the new table.
  • To manage the migration process, Samsung introduced a "Migration Status Table" (MST) to track the migration status for each user, and a "Flow Regulator" to control the migration speed and protect the old and new tables.
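
The per-user routing idea can be sketched in a few lines. This is an in-memory illustration under stated assumptions, not Samsung's implementation: the `MigrationRouter` class, the `Status` values, and the table names are hypothetical, a dictionary stands in for the Migration Status Table, and a simple cap on in-flight users stands in for the Flow Regulator. How requests arriving during a user's migration are handled is not described above; this sketch simply keeps such users on the old table, which a real system would need to reconcile.

```python
from enum import Enum


class Status(Enum):
    NOT_MIGRATED = "not_migrated"
    IN_PROGRESS = "in_progress"
    MIGRATED = "migrated"


class MigrationRouter:
    def __init__(self, max_in_flight: int):
        self.status = {}                    # stand-in for the Migration Status Table
        self.max_in_flight = max_in_flight  # stand-in for the Flow Regulator limit

    def table_for(self, user_id: str) -> str:
        """Route a request: only fully migrated users go to the new table."""
        migrated = self.status.get(user_id) is Status.MIGRATED
        return "new_table" if migrated else "old_table"

    def try_start(self, user_id: str) -> bool:
        """Begin migrating a user unless the in-flight cap is reached or it already started."""
        in_flight = sum(1 for s in self.status.values() if s is Status.IN_PROGRESS)
        current = self.status.get(user_id, Status.NOT_MIGRATED)
        if in_flight >= self.max_in_flight or current is not Status.NOT_MIGRATED:
            return False
        self.status[user_id] = Status.IN_PROGRESS
        return True

    def finish(self, user_id: str) -> None:
        """Mark a user as migrated so later requests are redirected."""
        self.status[user_id] = Status.MIGRATED
```

Keeping the routing decision as a single status lookup per user is what makes the cutover observable and controllable: lowering `max_in_flight` slows the migration without touching request handling.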

Optimizing Data Extraction

  • Samsung leveraged the existing timestamp-based synchronization mechanism and the local secondary index (LSI) on the table to efficiently extract the necessary data.
  • Instead of a full table scan, they used the LSI to find the data to be migrated (active tabs and deletion records from the last week), and then used the BatchGetItem API to fetch the actual data.
  • This approach dramatically reduced the read capacity units (RCUs) required, making the migration process more cost-effective and faster.
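
The two-step extraction can be sketched against the low-level DynamoDB client API (`query` and `batch_get_item`, as exposed by boto3). The schema is an assumption made for illustration: a partition key `user_id`, a sort key `tab_id`, an LSI sorted on `modified_at` (epoch milliseconds), and a `deleted` flag projected into the index. None of these names come from the talk.

```python
def keys_to_migrate(client, table, index, user_id, cutoff_ms):
    """Query the LSI for one user; keep active tabs and tombstones newer than cutoff_ms."""
    keys, start_key = [], None
    while True:
        kwargs = {
            "TableName": table,
            "IndexName": index,
            "KeyConditionExpression": "user_id = :u",
            "ExpressionAttributeValues": {":u": {"S": user_id}},
        }
        if start_key:
            kwargs["ExclusiveStartKey"] = start_key  # continue a paginated query
        page = client.query(**kwargs)
        for item in page["Items"]:
            deleted = item.get("deleted", {}).get("BOOL", False)
            modified = int(item["modified_at"]["N"])
            if not deleted or modified >= cutoff_ms:
                keys.append({"user_id": item["user_id"], "tab_id": item["tab_id"]})
        start_key = page.get("LastEvaluatedKey")
        if not start_key:
            return keys


def fetch_items(client, table, keys):
    """Fetch full items with BatchGetItem, at most 100 keys per request."""
    items = []
    for i in range(0, len(keys), 100):
        pending = {table: {"Keys": keys[i:i + 100]}}
        while pending:
            resp = client.batch_get_item(RequestItems=pending)
            items.extend(resp["Responses"].get(table, []))
            # Retry keys the service could not process; real code should back off here.
            pending = resp.get("UnprocessedKeys", {})
    return items
```

With boto3 installed, `client` would be `boto3.client("dynamodb")`. Index entries are small, so reading them costs far less than reading the full items, and only the items that pass the filter are fetched from the base table.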

Results and Key Takeaways

  • The entire migration was completed in just 1 week, with the number of concurrently migrating users staying steady at a few tens of thousands.
  • The table size was reduced from 1.2 PB to just 100 TB, slashing the storage costs and recovering the full investment before the next AWS bill.
  • Importantly, Samsung received zero customer inquiries or fault reports related to the migration, proving the success of their user-centric approach.

Key Lessons

  1. Power of Data-Driven Decision Making: Samsung's data analysis and sampling revealed the problem and provided the confidence to make critical decisions.
  2. Importance of Domain Knowledge: Understanding the true purpose of the tombstone records and the 99-tab limit policy was crucial for finding the right solution.
  3. User-Centric Mindset: Prioritizing the user experience over technical and financial success was the driving force behind Samsung's successful migration.

Conclusion

Samsung's story demonstrates the power of a data-driven, domain-informed, and user-centric approach to solving complex technical challenges. By rethinking fundamental assumptions and designing a migration process with user experience and controlled execution in mind, Samsung was able to optimize a massive 1.2 PB DynamoDB table with zero downtime and minimal impact on their customers.
