AWS re:Invent 2025 - Accelerate gen AI and ML workloads with AWS storage (STG201)

Accelerating Gen AI and ML Workloads with AWS Storage

Improving Productivity with Prompt Engineering

  • Prompt engineering provides examples, context, and constraints in the prompt to guide large language model (LLM) responses
  • Can significantly improve productivity by automating tasks like drafting PR/FAQ documents
  • Challenges include scaling prompt engineering to handle large volumes of data and multiple data sources; a minimal prompt-assembly sketch follows this list
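As a rough illustration (not taken from the talk), a prompt for the PR/FAQ use case might be assembled from context, a few worked examples, and constraints before being sent to a model. The document template and example pair below are hypothetical placeholders.

```python
# A rough sketch of prompt assembly: context, few-shot examples, and
# constraints are combined into one prompt. The PR/FAQ template and the
# example pair are hypothetical placeholders.

EXAMPLES = [
    (
        "Feature: one-click data export for the analytics dashboard",
        "PR/FAQ draft: Today we announced one-click data export...",
    ),
]

def build_prompt(task: str, context: str, examples: list[tuple[str, str]]) -> str:
    parts = [
        "You are drafting an internal PR/FAQ document.",
        f"Context:\n{context}",
        "Constraints: keep the press release under 300 words and include 5 FAQs.",
    ]
    for example_input, example_output in examples:
        parts.append(f"Example input:\n{example_input}\nExample output:\n{example_output}")
    parts.append(f"Task:\n{task}")
    return "\n\n".join(parts)

prompt = build_prompt(
    task="Draft a PR/FAQ for a new analytics dashboard.",
    context="Launch planned for Q3; target users are operations teams.",
    examples=EXAMPLES,
)
print(prompt)
```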

Leveraging Retrieval Augmented Generation (RAG)

  • RAG uses semantic search to find and return relevant data from a data lake to augment the original prompt
  • Converts data into vector embeddings to enable efficient semantic search
  • Allows LLMs to access relevant data without manually loading everything into the prompt; a minimal retrieval sketch follows this list
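A minimal retrieval sketch, assuming an in-memory document set and a placeholder embed() function standing in for a real embedding model (for example, one hosted on Amazon Bedrock):

```python
# Minimal RAG sketch: embed the query, rank documents by cosine similarity,
# and prepend the top matches to the prompt. embed() is a crude placeholder
# for a real embedding model.
import numpy as np

def embed(text: str) -> np.ndarray:
    vec = np.zeros(64)
    for i, byte in enumerate(text.encode()):
        vec[i % 64] += byte
    return vec / (np.linalg.norm(vec) + 1e-9)

DOCS = [
    "Quarterly revenue grew 12% on storage services.",
    "Checkpoint intervals were reduced to five minutes.",
    "The new dashboard launches in Q3 for operations teams.",
]
DOC_VECS = np.stack([embed(d) for d in DOCS])

def retrieve(query: str, k: int = 2) -> list[str]:
    scores = DOC_VECS @ embed(query)  # dot product of unit vectors = cosine similarity
    return [DOCS[i] for i in np.argsort(scores)[::-1][:k]]

question = "When does the dashboard launch?"
context = "\n".join(retrieve(question))
augmented_prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(augmented_prompt)
```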

Optimizing RAG with Metadata Filtering

  • Metadata provides context, lineage, and classification for data
  • Enables more targeted and effective RAG searches by narrowing the search space
  • Metadata becomes the "nervous system" for the entire AI operation; a minimal filtering sketch follows this list
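A minimal sketch of metadata filtering during retrieval, assuming each stored vector carries a metadata dictionary; the record fields and filter keys are hypothetical:

```python
# Candidates are first narrowed by metadata (source, classification), then
# ranked by vector similarity over the reduced search space.
from dataclasses import dataclass
import numpy as np

@dataclass
class Record:
    text: str
    vector: np.ndarray
    metadata: dict

def filtered_search(records, query_vec, *, source=None, classification=None, k=3):
    candidates = [
        r for r in records
        if (source is None or r.metadata.get("source") == source)
        and (classification is None or r.metadata.get("classification") == classification)
    ]
    candidates.sort(key=lambda r: float(r.vector @ query_vec), reverse=True)
    return candidates[:k]

rng = np.random.default_rng(0)
records = [
    Record("Phase 2 trial results for compound A.", rng.normal(size=8),
           {"source": "papers", "classification": "public"}),
    Record("Internal cost model for storage tiers.", rng.normal(size=8),
           {"source": "finance", "classification": "confidential"}),
]
hits = filtered_search(records, rng.normal(size=8), source="papers")
print([r.text for r in hits])
```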

Introducing Agents for Complex Tasks

  • Agents combine LLMs with access to tools like databases, knowledge bases, and APIs
  • Allows LLMs to reason through multi-step workflows and leverage diverse data sources
  • Requires managing tool integrations, which the Model Context Protocol (MCP) standardizes; a minimal agent-loop sketch follows this list
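A minimal agent-loop sketch, with call_llm() standing in for a real model invocation and a local tool registry standing in for tools that MCP servers would expose in a real deployment:

```python
# The model either answers or requests a registered tool; the tool result is
# appended to the conversation and the loop continues until an answer appears.
import json

TOOLS = {
    "lookup_order": lambda order_id: {"order_id": order_id, "status": "shipped"},
}

def call_llm(messages: list[dict]) -> dict:
    # Placeholder policy: request the tool once, then answer from its result.
    if not any(m["role"] == "tool" for m in messages):
        return {"tool": "lookup_order", "arguments": {"order_id": "123"}}
    return {"answer": "Order 123 has shipped."}

def run_agent(question: str) -> str:
    messages = [{"role": "user", "content": question}]
    for _ in range(5):  # bound the number of reasoning steps
        step = call_llm(messages)
        if "answer" in step:
            return step["answer"]
        result = TOOLS[step["tool"]](**step["arguments"])
        messages.append({"role": "tool", "content": json.dumps(result)})
    return "Stopped after too many steps."

print(run_agent("Where is order 123?"))
```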

Updating Models with Labeled Data

  • Supervised fine-tuning: Provides labeled examples of desired inputs and outputs to guide model updates
  • Distillation: Uses a larger "teacher" model to generate outputs that are then used to train a smaller "student" model
  • Alignment: Provides examples of preferred and non-preferred responses to shape model behavior (example data formats are sketched after this list)
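An illustrative sketch of the labeled data behind these techniques: prompt/completion pairs for supervised fine-tuning (or distillation, where a teacher model generates the completions) and chosen/rejected pairs for preference-based alignment. The JSONL field names are assumptions and vary by training framework.

```python
# Write hypothetical fine-tuning and alignment datasets as JSONL.
import json

sft_examples = [
    {"prompt": "Summarize: storage costs fell 8% this quarter.",
     "completion": "Storage costs decreased 8% quarter over quarter."},
]

alignment_pairs = [
    {"prompt": "Explain Amazon S3 to a new engineer.",
     "chosen": "Amazon S3 is an object storage service accessed through an API...",
     "rejected": "It's just a big hard drive in the cloud."},
]

with open("sft.jsonl", "w") as f:
    for example in sft_examples:
        f.write(json.dumps(example) + "\n")

with open("preferences.jsonl", "w") as f:
    for pair in alignment_pairs:
        f.write(json.dumps(pair) + "\n")
```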

Continued Pre-training and Training from Scratch

  • Continued pre-training embeds new knowledge, tone, and relationships into an existing model
  • Training from scratch is required when the existing model structure is fundamentally different from the desired application
  • Efficient data loading and checkpointing to storage are critical for the resource-intensive training process; a minimal checkpointing sketch follows this list
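A minimal checkpointing sketch using PyTorch, assuming checkpoints are written to a high-throughput file system mount such as FSx for Lustre; the path, model, and interval are hypothetical:

```python
# Model and optimizer state are periodically saved so training can resume
# after interruption without losing progress.
import torch
import torch.nn as nn

CHECKPOINT_DIR = "/mnt/fsx/checkpoints"  # hypothetical mount point
model = nn.Linear(1024, 1024)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

def save_checkpoint(step: int) -> None:
    torch.save(
        {
            "step": step,
            "model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
        },
        f"{CHECKPOINT_DIR}/step_{step:08d}.pt",
    )

for step in range(1, 1001):
    # ... forward pass, backward pass, optimizer.step() would go here ...
    if step % 250 == 0:
        save_checkpoint(step)
```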

AWS Storage Solutions for ML Workloads

  • Amazon FSx for Lustre and Amazon FSx for OpenZFS provide high-performance, scalable file storage for research and development
  • Amazon S3 Express One Zone offers low-latency, high-throughput object storage optimized for ML training
  • Mountpoint for Amazon S3 and the Amazon S3 Connector for PyTorch provide file-like interfaces for efficient data access; a data-loading sketch follows this list
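A sketch of streaming training data directly from S3 with the Amazon S3 Connector for PyTorch (the s3torchconnector package); the bucket, prefix, region, and transform are assumptions, and the exact API should be checked against the connector's documentation for your version:

```python
# Build a PyTorch dataset over an S3 prefix and feed it to a DataLoader.
from s3torchconnector import S3MapDataset
from torch.utils.data import DataLoader

def to_sample(obj):
    # Each object exposes its key and a readable body.
    return obj.key, obj.read()

dataset = S3MapDataset.from_prefix(
    "s3://my-training-bucket/tokenized/",  # hypothetical bucket and prefix
    region="us-east-1",
    transform=to_sample,
)
loader = DataLoader(dataset, batch_size=32, num_workers=8)

for keys, payloads in loader:
    pass  # feed payloads into the training step
```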

Business Impact and Examples

  • Biotech firm used S3 Vectors for semantic search on 30 million scientific papers, dramatically reducing research timelines
  • Meta worked with AWS to scale S3 Express One Zone to 140 Tbps of throughput for large-scale model training
  • Customers can leverage AWS storage services to build scalable, cost-effective AI pipelines for diverse use cases
