High-performance storage for AI/ML, analytics, and HPC workloads (STG327)

Amazon S3 for Data Lakes

Key Takeaways:

  1. Amazon S3 is the de facto cloud object storage service. It offers virtually unlimited capacity and scales dynamically with the workload, with no pre-provisioning or capacity management.
  2. S3 is particularly well suited to large-scale data lakes, which combine varying access patterns with long-lived, ever-growing data. Its storage classes (S3 Standard, S3 Standard-Infrequent Access, S3 Glacier Instant Retrieval) let you match storage cost to how often data is accessed.
  3. To get the most out of S3 for data lakes:
    • Work backwards from your workload's performance requirements and your compute and network resources to determine the throughput you need.
    • Use clients that take advantage of S3's multi-value answer DNS, with a short DNS cache TTL, so that connections spread across many S3 IP addresses rather than concentrating on a few.
    • Design prefixes deliberately. S3 scales request rates per prefix, so workloads that should scale independently must not accidentally share one.
  4. For extremely hot, frequently accessed data, S3 Express One Zone directory buckets provide consistent single-digit millisecond first-byte latency and lower request costs than S3 Standard.
  5. A real-world example using ClickHouse showed a 283% increase in query performance and 65% reduction in total cost of ownership by using the two-tier storage approach with S3 Express One Zone and S3 Standard.
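
Working backwards from the hardware to a request budget can be sketched in a few lines of Python. This is a back-of-the-envelope model under stated assumptions. It uses S3's documented per-prefix baseline of at least 5,500 GET/HEAD requests per second and treats 1 Gbit/s as 125 MB/s. It also takes per-connection throughput as an input, because that figure is an assumption you should measure for your own client, not a published number. `Workload` and `plan` are hypothetical names.

```python
import math
from dataclasses import dataclass

# S3 baseline: at least 5,500 GET/HEAD requests per second per partitioned prefix.
GET_REQUESTS_PER_PREFIX = 5_500

@dataclass
class Workload:
    instances: int              # compute instances reading in parallel
    gbps_per_instance: float    # network bandwidth to saturate on each instance (Gbit/s)
    request_size_mb: float      # size of each (ranged) GET request, in MB
    per_connection_mb_s: float  # ASSUMPTION: measured throughput of one connection (MB/s)

def plan(w: Workload) -> dict:
    """Derive aggregate throughput, request rate, connections, and prefixes from the hardware."""
    per_instance_mb_s = w.gbps_per_instance * 125  # 1 Gbit/s = 125 MB/s
    total_mb_s = w.instances * per_instance_mb_s
    requests_per_s = total_mb_s / w.request_size_mb
    return {
        "total_mb_s": total_mb_s,
        "requests_per_s": requests_per_s,
        # Parallel connections each instance needs to fill its network link.
        "connections_per_instance": math.ceil(per_instance_mb_s / w.per_connection_mb_s),
        # Independent prefixes needed so no single prefix exceeds its baseline rate.
        "min_prefixes": math.ceil(requests_per_s / GET_REQUESTS_PER_PREFIX),
    }

if __name__ == "__main__":
    print(plan(Workload(instances=16, gbps_per_instance=100,
                        request_size_mb=8, per_connection_mb_s=90)))
```

With these illustrative inputs (16 instances at 100 Gbit/s, 8 MB ranged GETs, an assumed 90 MB/s per connection), the model asks for 200,000 MB/s, 25,000 GET requests per second, 139 connections per instance, and at least 5 prefixes. S3 raises per-prefix capacity gradually, so a sudden ramp can still receive HTTP 503 Slow Down responses. Clients should retry those with backoff.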
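
One common way to keep workloads from accidentally sharing a prefix is to put a short, stable hash at the front of each key. Keys then spread evenly over a fixed number of distinct leading prefixes. The sketch below uses a hypothetical helper, `partitioned_key`, with SHA-256 as the hash. It is one pattern among several, and it gives up listing keys in their original order, such as listing one day's data under a date prefix.

```python
import hashlib

def partitioned_key(logical_key: str, num_prefixes: int = 16) -> str:
    """Prefix a key with a stable hex bucket so keys spread across num_prefixes leading prefixes."""
    if num_prefixes < 1:
        raise ValueError("num_prefixes must be at least 1")
    digest = hashlib.sha256(logical_key.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % num_prefixes
    # Zero-pad so every prefix has the same width, e.g. "0".."f" for 16, "00".."ff" for 256.
    width = len(format(num_prefixes - 1, "x"))
    return f"{bucket:0{width}x}/{logical_key}"

if __name__ == "__main__":
    for name in ("train/shard-00000.tar", "train/shard-00001.tar", "eval/shard-00000.tar"):
        print(partitioned_key(name))
```

Because the hash is deterministic, a reader can recompute the full key from the logical name without a lookup table. Choose `num_prefixes` from your request-rate target and keep it fixed, since changing it moves every key.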
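
Multi-value answer DNS returns several S3 IP addresses per lookup. A client that rotates across all of them, and re-resolves after a short TTL, spreads its connections over many S3 front-end hosts. SDK clients built on the AWS Common Runtime (CRT) are designed to handle this for you, so the class below is only a sketch of the idea. `MultiValueEndpointPool` and the five-second TTL are hypothetical choices. The resolver and clock are injected so the logic can be exercised without network access.

```python
import itertools
import socket
import time
from typing import Callable, Iterator, List

def resolve_all(host: str, port: int = 443) -> List[str]:
    """Return every distinct address from a DNS lookup (needs network access)."""
    infos = socket.getaddrinfo(host, port, proto=socket.IPPROTO_TCP)
    return sorted({info[4][0] for info in infos})

class MultiValueEndpointPool:
    """Round-robin across all resolved IPs, re-resolving once the cached answer is older than ttl."""

    def __init__(self, resolve: Callable[[], List[str]], ttl_seconds: float = 5.0,
                 clock: Callable[[], float] = time.monotonic):
        self._resolve = resolve
        self._ttl = ttl_seconds
        self._clock = clock
        self._expires_at = float("-inf")  # force a lookup on first use
        self._cycle: Iterator[str] = iter(())

    def next_ip(self) -> str:
        now = self._clock()
        if now >= self._expires_at:
            # Cached answer expired: re-resolve so new S3 addresses are picked up.
            ips = self._resolve()
            if not ips:
                raise RuntimeError("resolver returned no addresses")
            self._cycle = itertools.cycle(ips)
            self._expires_at = now + self._ttl
        return next(self._cycle)
```

In practice you would construct the pool with `lambda: resolve_all(host)`, where `host` is your bucket's endpoint, and open each new connection to `pool.next_ip()`.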

Lift and Shift File Storage Customers

Key Takeaways:

  1. File systems are a familiar way to access data. They support POSIX semantics and file sharing, and they deliver the performance that HPC and ML workloads need.
  2. Amazon FSx for Lustre is a service that provides the high-performance capabilities of Lustre without the need to manage the storage system.
  3. FSx for Lustre focuses on providing low latency access to data by:
    • Using SSD storage
    • Providing a single network hop between clients and servers
    • Leveraging Lustre's caching capabilities
  4. For throughput and IOPS optimization:
    • Provision the right throughput tier based on your workload needs.
    • Use the new provisioned metadata IOPS feature to scale metadata performance independently of storage capacity when needed.
    • Enable Elastic Fabric Adapter (EFA) support, which bypasses the operating system kernel, to reach up to 1,200 Gbps of throughput per client instance.
  5. Customer example: Shell was able to increase their GPU utilization from 90% to 100% by using FSx for Lustre and EC2 in the cloud.
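
Provisioning the right throughput tier comes down to arithmetic. Aggregate throughput is storage capacity (TiB) times the per-unit throughput tier (MB/s per TiB), and the capacity must also hold the dataset. The sketch below assumes Persistent 2 SSD tiers of 125, 250, 500, and 1,000 MB/s/TiB, with capacity of 1.2 TiB followed by 2.4 TiB increments. These are the values as we understand them at the time of writing, so check the current FSx for Lustre documentation before relying on them. `sizing_options` is a hypothetical helper.

```python
import math
from dataclasses import dataclass
from typing import List

# ASSUMPTION: PERSISTENT_2 SSD options as we understand them at time of writing.
TIERS_MB_S_PER_TIB = (125, 250, 500, 1000)
MIN_TENTHS = 12    # 1.2 TiB, kept in tenths of a TiB so the math stays in integers
STEP_TENTHS = 24   # 2.4 TiB increments above the minimum

@dataclass
class Option:
    tier_mb_s_per_tib: int
    capacity_tib: float
    throughput_mb_s: int

def _ceil_div(a: int, b: int) -> int:
    return -(-a // b)

def round_up_capacity(tenths: int) -> int:
    """Round a capacity (in tenths of a TiB) up to the next assumed-valid size."""
    if tenths <= MIN_TENTHS:
        return MIN_TENTHS
    return _ceil_div(tenths, STEP_TENTHS) * STEP_TENTHS

def sizing_options(target_mb_s: int, dataset_tib: float) -> List[Option]:
    """Capacity needed per tier to meet both the throughput target and the dataset size."""
    dataset_tenths = math.ceil(round(dataset_tib * 10, 9))
    options = []
    for tier in TIERS_MB_S_PER_TIB:
        for_throughput = _ceil_div(target_mb_s * 10, tier)  # capacity x tier >= target
        tenths = round_up_capacity(max(for_throughput, dataset_tenths))
        options.append(Option(tier, tenths / 10, tenths * tier // 10))
    return options

if __name__ == "__main__":
    for option in sizing_options(target_mb_s=10_000, dataset_tib=20):
        print(option)
```

For a 10,000 MB/s target and a 20 TiB dataset, the two lower tiers need 81.6 TiB and 40.8 TiB of capacity just to reach the throughput target. The 500 and 1,000 tiers can stop at 21.6 TiB, where dataset size rather than throughput sets the capacity. Weigh the higher per-TiB price of the faster tiers against the extra capacity the slower ones force you to buy.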
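
Metadata performance matters most for workloads that touch many small files, where opens, stats, and directory lookups dominate. The sketch below estimates a metadata operation rate for a hypothetical workload and rounds it up to a provisioned level. The list of levels is our assumption of the values currently offered: 1,500, 3,000, 6,000, 12,000, then increments of 12,000 up to 192,000. The `ops_per_file` figure is a placeholder that you should measure for your own access pattern.

```python
from typing import List

# ASSUMPTION: user-provisioned metadata IOPS levels; verify against current FSx for Lustre docs.
ASSUMED_METADATA_IOPS_LEVELS: List[int] = [1_500, 3_000, 6_000] + list(range(12_000, 192_001, 12_000))

def estimated_metadata_ops(workers: int, files_per_worker_per_s: float, ops_per_file: float) -> float:
    """Rough metadata operations per second for a small-file reader (ops_per_file is a guess to measure)."""
    return workers * files_per_worker_per_s * ops_per_file

def metadata_iops_level(target_ops_per_s: float) -> int:
    """Smallest assumed provisioned level that covers the target rate."""
    for level in ASSUMED_METADATA_IOPS_LEVELS:
        if level >= target_ops_per_s:
            return level
    raise ValueError(f"{target_ops_per_s:.0f} ops/s exceeds the largest assumed level")

if __name__ == "__main__":
    # Hypothetical: 64 data-loader workers, 50 files/s each, ~4 metadata ops per file.
    target = estimated_metadata_ops(64, 50, 4)
    print(target, metadata_iops_level(target))
```

Because this level is provisioned separately from storage, a small-file workload can raise its metadata IOPS without buying extra capacity.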
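
The 1,200 Gbps figure is easier to reason about in bytes: 1,200 Gbit/s is 150,000 MB/s, or 150 GB/s, per client instance. A client can only read that fast if the file system's provisioned aggregate throughput keeps up, so the sketch below takes the minimum of the two. The function name and the example checkpoint size are hypothetical, and the model ignores protocol overhead.

```python
def transfer_seconds(size_gb: float, client_gbps: float, fs_aggregate_mb_s: float,
                     clients: int = 1) -> float:
    """Lower-bound time to move size_gb (10^9 bytes) split across clients.

    Each client is capped by its network link; together they are capped by the
    file system's provisioned aggregate throughput. Protocol overhead is ignored.
    """
    client_mb_s = client_gbps * 125          # 1 Gbit/s = 125 MB/s
    effective_mb_s = min(client_mb_s * clients, fs_aggregate_mb_s)
    return size_gb * 1_000 / effective_mb_s  # GB -> MB, then divide by MB/s

if __name__ == "__main__":
    print(transfer_seconds(1_500, 1_200, 150_000))  # EFA client, file system keeps up
    print(transfer_seconds(1_500, 1_200, 30_000))   # same client, file system is the bottleneck
```

For a 1,500 GB checkpoint, one EFA-enabled client reaches the 10-second lower bound only if the file system is provisioned for at least 150,000 MB/s. At 30,000 MB/s, the same read takes at least 50 seconds. Enabling EFA and provisioning enough file system throughput therefore go together.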

Accessing S3 Data Lakes through File Systems

Key Takeaways:

  1. Customers may want file access to S3 data lakes for two reasons:
    • To use file-based tools and applications
    • To leverage the performance characteristics of file systems
  2. Two solutions are available:
    • Mountpoint for Amazon S3: A FUSE client that provides a file system interface to S3 buckets.
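    • FSx for Lustre integration with S3: Links an S3 bucket (or a prefix within it) to a Lustre file system through a data repository association, so S3 objects appear as files and directories.
  3. Mountpoint suits workloads that need file access but do not need full file system semantics (for example, it does not support editing existing files in place) or the low latency of a high-performance file system.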
  4. FSx for Lustre integration provides a more "first-class" file experience with full POSIX semantics and performance benefits.
  5. Both solutions offer caching to improve performance. FSx for Lustre keeps data loaded from the linked bucket on the file system, and Mountpoint can cache data locally or in a new shared cache backed by S3 Express One Zone.
  6. Customer example: LG AI Research combined S3 for storage with FSx for Lustre for high-performance training to build a foundation AI model.
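
File-based tools reach a Mountpoint-mounted bucket through ordinary file calls, as long as they stay within the access patterns Mountpoint supports. Those patterns are reading existing objects and writing new files sequentially from start to finish. The sketch below writes length-prefixed records to a new shard file and reads them back with plain `open()`. The mount directory, file names, and record format are hypothetical, and the code works the same on any local directory.

```python
from pathlib import Path
from typing import Iterable, List

def write_shard(mount_dir: str, name: str, records: Iterable[bytes]) -> Path:
    """Create a brand-new file and write it in one sequential pass."""
    path = Path(mount_dir) / name
    if path.exists():
        # Mountpoint is built for creating new objects, not editing existing ones in place.
        raise FileExistsError(str(path))
    with open(path, "wb") as f:
        for record in records:
            f.write(len(record).to_bytes(4, "big"))  # 4-byte length prefix
            f.write(record)
    return path

def read_shard(path: Path) -> List[bytes]:
    """Read records back sequentially with ordinary file I/O."""
    records = []
    with open(path, "rb") as f:
        while header := f.read(4):
            records.append(f.read(int.from_bytes(header, "big")))
    return records
```

To use it, point `mount_dir` at the directory where the bucket is mounted. Workloads that need to edit files in place, have several writers append to the same file, or rename directories are better served by the FSx for Lustre integration.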
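
The caching idea can be modeled as a read-through hierarchy. Each client first checks a small local cache, then a cache tier shared by all clients (the role the S3 Express One Zone shared cache plays for Mountpoint), and only then the S3 bucket itself. The class below is a conceptual sketch with hypothetical names that uses in-memory dictionaries for both tiers. It is not how Mountpoint implements its caches, and like any cache it can serve stale data if objects change underneath it.

```python
from collections import OrderedDict
from typing import Callable, Dict, MutableMapping

class TwoLevelReadCache:
    """Read-through cache: per-client LRU, then a shared tier, then the origin store."""

    def __init__(self, fetch_origin: Callable[[str], bytes],
                 shared: MutableMapping[str, bytes], local_capacity: int = 128):
        self._fetch_origin = fetch_origin
        self._shared = shared                      # shared by every client
        self._local: "OrderedDict[str, bytes]" = OrderedDict()
        self._capacity = local_capacity
        self.hits: Dict[str, int] = {"local": 0, "shared": 0, "origin": 0}

    def get(self, key: str) -> bytes:
        if key in self._local:
            self._local.move_to_end(key)           # mark as most recently used
            self.hits["local"] += 1
            return self._local[key]
        if key in self._shared:
            data = self._shared[key]
            self.hits["shared"] += 1
        else:
            data = self._fetch_origin(key)         # e.g. a GET against the S3 bucket
            self._shared[key] = data               # populate the shared tier for other clients
            self.hits["origin"] += 1
        self._local[key] = data
        if len(self._local) > self._capacity:
            self._local.popitem(last=False)        # evict the least recently used entry
        return data
```

The shared tier is what lets a second client skip the trip to S3 for data the first client already fetched. The local tier absorbs repeated reads within one client.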
