AWS storage services: The foundation for data-driven innovation (STG209)
Key Takeaways
Evolution of Storage at AWS
Over the past decade, the number of customers storing more than 1 petabyte in Amazon S3 has grown from fewer than 100 to thousands, and some customers now operate at exabyte scale.
The S3 team has had to keep innovating so that S3 scales continuously while customers never have to provision or manage capacity.
Some key innovations include:
Disaggregating storage and compute for storage racks to improve flexibility.
Intelligently managing the placement and distribution of data to balance workloads across the storage fleet.
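As a rough illustration of why placement decisions matter for balance, the sketch below compares purely random placement with a "best of two random candidates" rule. This is a textbook load-balancing technique used here only as an example; it is not a description of how S3 actually places data, and the function name and parameters are hypothetical.

```python
import random


def place_objects(num_objects, num_drives, candidates=2, seed=0):
    """Assign objects to drives and return the per-drive object counts.

    For each object, sample `candidates` distinct drives at random and
    place the object on the least-loaded one. With candidates=1 this
    degrades to purely random placement.
    """
    rng = random.Random(seed)
    load = [0] * num_drives
    for _ in range(num_objects):
        picks = rng.sample(range(num_drives), candidates)
        target = min(picks, key=lambda drive: load[drive])
        load[target] += 1
    return load


def spread(load):
    """Gap between the busiest and the idlest drive."""
    return max(load) - min(load)
```

Comparing `spread(place_objects(10_000, 100, candidates=1))` with `spread(place_objects(10_000, 100, candidates=2))` shows how even a small amount of load awareness narrows the gap between the busiest and idlest drives.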
Loosening Storage Limits
AWS has made several updates to loosen storage limits over the past year:
Amazon FSx for OpenZFS now supports an Intelligent-Tiering storage class that uses S3 for cold data, reducing costs by up to 85% compared to the SSD-based storage class.
Amazon EFS achieved a 10x increase in read IOPS per file system and a 2x increase in the per-file-system throughput limit.
Amazon FSx for Lustre achieved up to 150 GB/s of per-client throughput by integrating with Elastic Fabric Adapter (EFA).
The default S3 bucket quota has been raised from 100 to 10,000 per account, and customers can request increases up to 1 million buckets.
S3 Tables - Managed Iceberg Tables in S3
S3 Tables provides a first-class, managed Apache Iceberg table abstraction on top of S3.
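A minimal sketch of what creating such a table might look like from Python. It assumes a boto3 `s3tables` client whose `create_table_bucket`, `create_namespace`, and `create_table` calls take the parameters shown; those names reflect my understanding of the SDK and should be checked against the current boto3 reference. The helper name and the example bucket, namespace, and table names are hypothetical.

```python
def create_iceberg_table(client, bucket_name, namespace, table_name):
    """Create a table bucket, a namespace, and one Iceberg table in it.

    `client` is expected to behave like boto3.client("s3tables").
    Parameter and response key names are assumptions; verify them
    against the SDK documentation before relying on this.
    """
    # A table bucket is the container that holds namespaces and tables.
    bucket_arn = client.create_table_bucket(name=bucket_name)["arn"]

    # Namespaces group related tables inside the table bucket.
    client.create_namespace(tableBucketARN=bucket_arn, namespace=[namespace])

    # The table itself is created in Iceberg format.
    table = client.create_table(
        tableBucketARN=bucket_arn,
        namespace=namespace,
        name=table_name,
        format="ICEBERG",
    )
    return table["tableARN"]


# Usage, assuming AWS credentials and a configured region:
#   import boto3
#   arn = create_iceberg_table(
#       boto3.client("s3tables"), "example-table-bucket", "analytics", "events"
#   )
```

Passing the client in as an argument keeps the helper easy to exercise with a stub, without touching a real account.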
Key features:
Up to 3x faster query performance and up to 10x higher transactions per second than self-managed Iceberg tables stored in general purpose S3 buckets.
Table-level access control with IAM.
Automatic optimization and maintenance of table data.
S3 Tables integrates with other AWS analytics services such as Amazon Data Firehose and Amazon QuickSight.
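To make the table-level access control listed among the key features more concrete, here is a hedged sketch that builds an IAM policy document granting read access to a single table. The action names and the ARN layout reflect my understanding of the `s3tables` permission model and should be verified against the service authorization reference. The function name, account ID, bucket name, and table ID are placeholders.

```python
import json


def table_read_policy(table_arn):
    """Return an IAM policy document allowing reads on one table only.

    The s3tables action names below are assumptions to confirm against
    the AWS documentation before use.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ReadSingleTable",
                "Effect": "Allow",
                "Action": [
                    "s3tables:GetTableData",
                    "s3tables:GetTableMetadataLocation",
                ],
                # Scoping Resource to one table ARN is what makes this
                # table-level rather than bucket-level access.
                "Resource": table_arn,
            }
        ],
    }


# Placeholder ARN for illustration only.
EXAMPLE_TABLE_ARN = (
    "arn:aws:s3tables:us-east-1:111122223333:"
    "bucket/example-table-bucket/table/EXAMPLE-TABLE-ID"
)

policy_json = json.dumps(table_read_policy(EXAMPLE_TABLE_ARN), indent=2)
```

The resulting `policy_json` string could be attached to a role or user. Because permissions are scoped to a table ARN, analysts can be granted access to one table without seeing the rest of the bucket.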
Customer Perspectives
Nubank, a digital bank, uses S3 extensively and had to develop strategies to keep growing within S3's per-account bucket limits.
Roche, a healthcare company, is building a data platform on AWS to integrate and process vast amounts of unstructured healthcare data, using S3 as a unified data layer.
Key lessons from Roche:
Invest in strong data foundations, data lineage, and metadata governance.
Upskill people to leverage new data technologies.
Radically simplify architecture to enable scalability.
S3 Metadata - Automatic Metadata Indexing
S3 Metadata is a new feature that automatically indexes metadata about objects stored in S3 buckets.
It maintains a queryable metadata table for discovering objects, querying their metadata, and tracking changes to it over time.
This can enable richer metadata-driven applications on top of S3 data.
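The kind of question a queryable metadata table can answer is sketched below with an in-memory SQLite table standing in for the real one. The table name, the column subset, and the sample rows are hypothetical, and the actual S3 Metadata schema and query engine (for example, a SQL analytics service) may differ.

```python
import sqlite3

# Hypothetical change records: one row per object event.
SAMPLE_ROWS = [
    # (key, record_type, size, record_timestamp)
    ("raw/scan-001.dcm", "CREATE", 5_000_000, "2024-12-01T10:00:00"),
    ("raw/scan-002.dcm", "CREATE", 7_500_000, "2024-12-02T09:30:00"),
    ("raw/scan-001.dcm", "DELETE", None, "2024-12-03T08:15:00"),
    ("reports/q4.pdf", "CREATE", 120_000, "2024-12-03T11:45:00"),
]


def build_metadata_table(rows):
    """Load sample rows into an in-memory stand-in for a metadata table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE object_metadata ("
        "key TEXT, record_type TEXT, size INTEGER, record_timestamp TEXT)"
    )
    conn.executemany("INSERT INTO object_metadata VALUES (?, ?, ?, ?)", rows)
    return conn


def changes_since(conn, prefix, since):
    """List change records under a key prefix at or after a timestamp."""
    cursor = conn.execute(
        "SELECT key, record_type, record_timestamp FROM object_metadata "
        "WHERE key LIKE ? AND record_timestamp >= ? "
        "ORDER BY record_timestamp",
        (prefix + "%", since),
    )
    return cursor.fetchall()
```

For example, `changes_since(build_metadata_table(SAMPLE_ROWS), "raw/", "2024-12-02T00:00:00")` returns the create event for `raw/scan-002.dcm` followed by the delete event for `raw/scan-001.dcm`, which is the sort of change tracking a metadata-driven application could build on.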