Dive deep on Amazon S3 (STG302)

A summary of the session transcript, broken into sections for readability:

Introduction

  • Seth Markle, a senior principal engineer on the Amazon S3 team, discusses how the scale of S3 can be turned to customers' advantage.
  • James Bornholt, a principal engineer, co-presents the session.
  • S3 has massive scale, holding over 400 trillion objects, averaging over 150 million requests per second, and serving over 1 PB per second of traffic.

The Physics of Data

  • As a storage engineer, Seth cares about the real-world physical properties of the underlying hardware, constraints that can't be engineered away with better software alone.
  • Hard drives have two primary movements required to read data: spinning the platter and moving the actuator arm.
  • These physical limits, seek time and rotational latency, cap how quickly a drive can serve random I/O (a back-of-the-envelope sketch follows this list).
  • S3's storage nodes use a log-structured storage engine called ShardStore, designed around these physical properties.
  • Storage workloads tend to be bursty, with periods of high activity and long periods of low activity.
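
To make the physics concrete, here is a back-of-the-envelope estimate of a single random read on a spinning drive; the 7200 RPM spindle speed, 4 ms average seek, and 0.1 ms transfer time are typical textbook figures, not numbers from the talk.

```python
# Rough access time for one random read on a hard drive (illustrative figures only).

rpm = 7200
avg_seek_ms = 4.0                        # moving the actuator arm to the target track
avg_rotation_ms = (60_000 / rpm) / 2     # waiting, on average, half a revolution for the sector
transfer_ms = 0.1                        # reading the bytes themselves is comparatively tiny

access_ms = avg_seek_ms + avg_rotation_ms + transfer_ms
print(f"avg rotational latency: {avg_rotation_ms:.2f} ms")              # ~4.17 ms
print(f"avg random access time: {access_ms:.2f} ms")                    # ~8.3 ms
print(f"=> roughly {1000 / access_ms:.0f} random reads/sec per drive")  # ~120 IOPS
```

No amount of software removes those milliseconds; the rest of the session is about designing around them.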

Leveraging Scale to Your Advantage

  • At scale, the aggregate workloads across millions of customers become more predictable, allowing S3 to spread customer traffic across many more drives than their individual storage footprint would require.
  • This provides increased throughput and workload isolation, because the peaks and valleys of different customers' workloads are decorrelated (a toy simulation after this list illustrates the smoothing effect).
  • S3 also constantly rebalances data across the fleet to maintain an even distribution of storage temperature (hot and cold data) across drives.
  • When new storage racks are added, S3 takes the opportunity to revisit past placement decisions and redistribute data for better load balancing.
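
The simulation below is a toy model of that smoothing effect; the customer count, burst probability, and 100x burst size are made-up parameters, not figures from the session.

```python
import random
import statistics

random.seed(1)

# One round of demand for one customer: mostly a small baseline, rarely a 100x burst.
def bursty_demand() -> float:
    return 100.0 if random.random() < 0.02 else 1.0

num_customers = 1_000
num_rounds = 1_000

# Dedicated placement: one customer's drives must absorb that customer's full burst.
single_peak = max(bursty_demand() for _ in range(num_rounds))

# Shared fleet: per-round load is the sum over many decorrelated customers.
aggregate = [sum(bursty_demand() for _ in range(num_customers)) for _ in range(num_rounds)]
peak_to_mean = max(aggregate) / statistics.mean(aggregate)

print(f"dedicated drives: peak is {single_peak:.0f}x one customer's baseline")
print(f"shared fleet:     aggregate peak is only {peak_to_mean:.2f}x the mean")
```

A customer confined to its own drives must be provisioned for the full 100x burst, while the shared fleet's aggregate peak stays close to its mean, which is what makes the combined workload predictable.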

Designing for Decorrelation

  • James discusses how S3 uses a technique called "Shuffle Sharding" to intentionally decorrelate workloads across the system.
  • Rather than statically assigning data to specific drives or servers, S3 randomly spreads shards of data across the fleet, so workloads from the same customer or bucket are not tied to one fixed set of resources (a small sketch after this list shows the idea).
  • This decorrelation technique is used throughout the S3 architecture, including in the DNS load balancing and the AWS Common Runtime's handling of tail latency.
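
A minimal sketch of the shuffle-sharding idea: hash a customer ID to a small, stable subset of workers. The worker names, shard size, and hashing scheme here are illustrative assumptions, not S3's actual placement logic.

```python
import hashlib
import math

def shuffle_shard(customer_id: str, workers: list[str], shard_size: int) -> list[str]:
    """Pick a small, stable, pseudo-random subset of workers for a customer.

    Ranking workers by a hash of (customer_id, worker) means the same customer
    always lands on the same shard set, while different customers get mostly
    disjoint sets.
    """
    ranked = sorted(workers,
                    key=lambda w: hashlib.sha256(f"{customer_id}:{w}".encode()).hexdigest())
    return ranked[:shard_size]

workers = [f"worker-{i:03d}" for i in range(100)]
a = shuffle_shard("customer-a", workers, shard_size=4)
b = shuffle_shard("customer-b", workers, shard_size=4)

print("customer-a:", a)
print("customer-b:", b)
print("overlap:   ", set(a) & set(b))   # usually empty, occasionally one shared worker
print(f"chance of identical shard sets: 1 in {math.comb(len(workers), 4):,}")
```

Because any one overloaded or failing worker appears in only a small fraction of customers' shard sets, a noisy neighbor's blast radius stays small.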

Designing for Velocity

  • Fault tolerance is not just about being robust, but also about knowing when something has failed in the first place, which is a challenging problem at S3's scale.
  • S3 uses erasure coding to achieve fault tolerance at lower overhead than replication, and this redundancy also enables faster deployment of new software and hardware (a toy parity-code sketch follows this list).
  • Reasoning based on the birthday paradox ensures that no single piece of new software or hardware is overexposed to the entire fleet.
  • Erasure coding also improves performance: because any sufficiently large subset of the coded shards can reconstruct the original data, S3 can cancel and retry slow requests rather than wait on a straggler.
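
The sketch below is a toy single-parity erasure code, far simpler than whatever S3 actually runs, but it shows the two properties described above: redundancy at a fraction of replication's storage overhead, and recovery from any sufficiently large subset of shards, so a slow or failed shard can simply be skipped. The 4-shard split and sample object are illustrative choices.

```python
from functools import reduce

def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k equal-length shards plus one XOR parity shard (k+1 total)."""
    shard_len = -(-len(data) // k)                       # ceiling division
    padded = data.ljust(shard_len * k, b"\x00")
    shards = [padded[i * shard_len:(i + 1) * shard_len] for i in range(k)]
    return shards + [reduce(xor_bytes, shards)]

def decode(shards: list[bytes | None], k: int, size: int) -> bytes:
    """Rebuild the object when at most one of the k+1 shards is missing or ignored."""
    missing = [i for i, s in enumerate(shards) if s is None]
    assert len(missing) <= 1, "a single parity shard tolerates one loss"
    if missing:
        survivors = [s for s in shards if s is not None]
        shards[missing[0]] = reduce(xor_bytes, survivors)  # XOR of the rest restores it
    return b"".join(shards[:k])[:size]

data = b"hello, durable world"
shards = encode(data, k=4)     # 5 shards: 1.25x storage overhead vs 3x for triple replication
shards[2] = None               # lose (or give up waiting on) any one shard
assert decode(shards, k=4, size=len(data)) == data
print("object recovered from the remaining shards")
```

The same property powers the cancel-and-retry behavior: issue reads for more shards than strictly needed, take whichever arrive first, and ignore the stragglers.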

Conclusion

  • The S3 team's focus on fault tolerance and designing for decorrelation enables them to move quickly and with confidence, as the system is resilient to failures.
  • The scale of S3 is not just a challenge to be tolerated, but a key advantage that allows the team to deliver a more durable, performant, and reliable storage service.
