AWS re:Invent 2025 - Build an AI-ready data foundation (ANT304)

Building an AI-Ready Data Foundation on AWS

Defining Data Foundation for Data Analytics

  • A data foundation for data analytics refers to the set of tools and services that enable:
    • Data storage
    • Data processing
    • Data integration
    • Data governance
  • These foundational constructs come together to form a typical data pipeline
  • The key is being able to leverage this existing data foundation to build AI applications

Evolving Data Needs for AI Applications

  • AI applications have evolved from simple rule-based assistants to autonomous multi-agent systems
  • This evolution impacts the data pipeline in several ways:
    • Need for additional unstructured data sources
    • Advanced data processing requirements for training, inference, and context management
    • Capturing real-time user feedback and personalization
    • Comprehensive data governance across the entire pipeline

Leveraging the AI-Ready Data Foundation on AWS

AWS MCP Servers

  • MCP (Model Context Protocol) servers provide a standardized tool layer between AI models and external systems
  • Enables true agent autonomy by allowing the model itself to determine which tools to use
  • Reduces the need for bespoke, per-service API integrations
  • Examples used in the demo:
    • Redshift MCP server for natural language access to Redshift data
    • S3 Tables MCP server for querying tabular data in S3
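The pattern behind the MCP servers above can be sketched in plain Python: the model is shown a catalog of tool descriptions and names the tool it wants, and a dispatcher invokes it with no per-service glue code. The tool names and canned results below are hypothetical stand-ins for real MCP servers like the Redshift or S3 Tables ones.

```python
# Minimal sketch of the MCP tool-dispatch pattern (illustrative only).
# Each entry maps an advertised tool name to a callable; a real MCP
# server would also publish a schema for each tool's arguments.
TOOLS = {
    "query_redshift": lambda sql: f"redshift-result-for:{sql}",
    "query_s3_tables": lambda sql: f"s3-tables-result-for:{sql}",
}

def describe_tools():
    """Advertise the available tools to the model, MCP-style."""
    return sorted(TOOLS)

def dispatch(tool_name, argument):
    """Invoke whichever tool the model selected by name."""
    if tool_name not in TOOLS:
        raise KeyError(f"unknown tool: {tool_name}")
    return TOOLS[tool_name](argument)

print(describe_tools())
print(dispatch("query_redshift", "SELECT 1"))
```

The key design point is that the agent only ever sees tool names and descriptions; adding a new data source means registering a new tool, not writing new integration code in the agent.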

Context Management

  • LLMs have limited context windows and are stateless, requiring careful context management
  • Context includes instruction sets, knowledge bases, and data from tools/MCP servers
  • Agentic memory using services like Amazon Bedrock AgentCore can retain context across sessions
  • Knowledge bases using Amazon OpenSearch Service as a vector store provide semantic search capabilities
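The semantic-search step a knowledge base performs can be illustrated with a toy in-memory vector store: rank documents by cosine similarity to a query embedding and return the top match. In practice the vectors come from an embedding model and live in a store like Amazon OpenSearch Service; the 3-dimensional "embeddings" below are hand-made stand-ins.

```python
import math

# Hand-made toy embeddings standing in for real model-generated vectors.
DOCS = {
    "refund policy":   [0.9, 0.1, 0.0],
    "shipping times":  [0.1, 0.9, 0.1],
    "api rate limits": [0.0, 0.2, 0.9],
}

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def retrieve(query_vec, k=1):
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(DOCS, key=lambda d: cosine(query_vec, DOCS[d]), reverse=True)
    return ranked[:k]

print(retrieve([0.8, 0.2, 0.1]))  # nearest to "refund policy"
```

The retrieved documents are what gets injected into the model's limited context window, which is why the quality of the embeddings and the store's ranking matter so much.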

Event-Driven Architectures for Multi-Agent Systems

  • Multiple loosely coupled agents working asynchronously can improve resiliency and scalability
  • Amazon MSK (Managed Streaming for Apache Kafka) enables event-driven architectures with real-time data exchange
  • Allows agents to respond to patterns and scenario changes independently
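The loose coupling an MSK (Kafka) topic provides between agents can be sketched with a minimal in-process event bus: producers publish events to a topic and every subscribed agent reacts independently, with no direct calls between agents. The topic and agent names below are illustrative, and a real deployment would use Kafka consumer groups rather than in-memory callbacks.

```python
from collections import defaultdict

# topic name -> list of handler callables (stand-in for Kafka consumers)
subscribers = defaultdict(list)
log = []

def subscribe(topic, handler):
    """Register an agent's handler for a topic."""
    subscribers[topic].append(handler)

def publish(topic, event):
    """Deliver an event to every agent subscribed to the topic."""
    for handler in subscribers[topic]:
        handler(event)

# Two agents react to the same event without knowing about each other.
subscribe("orders", lambda e: log.append(f"fraud-check:{e['id']}"))
subscribe("orders", lambda e: log.append(f"fulfillment:{e['id']}"))

publish("orders", {"id": 42})
print(log)
```

Because neither agent references the other, either one can be scaled, replaced, or taken offline without touching the rest of the system, which is the resiliency argument made above.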

Ensuring Data Readiness for AI

Metadata and Data Governance

  • Metadata is critical for AI agents to understand and utilize data effectively
  • SageMaker provides automated metadata generation, data quality scoring, and lineage tracking
  • Metadata can be published back to S3 for agents to directly access
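A rough sketch of what automated metadata generation produces: derive per-column types and a simple completeness-based quality score from sample rows, emitting a JSON document an agent could read after it is published to S3. The scoring rule and schema here are illustrative, not SageMaker's actual output format.

```python
import json

# Sample rows standing in for a real dataset; None marks missing values.
rows = [
    {"customer_id": 1, "region": "us-east-1", "spend": 120.5},
    {"customer_id": 2, "region": None,        "spend": 80.0},
    {"customer_id": 3, "region": "eu-west-1", "spend": None},
]

def profile(rows):
    """Build a simple metadata document: per-column type + completeness."""
    columns = {}
    for col in rows[0]:
        values = [r[col] for r in rows]
        present = [v for v in values if v is not None]
        columns[col] = {
            "type": type(present[0]).__name__,
            "completeness": round(len(present) / len(values), 2),
        }
    return {"row_count": len(rows), "columns": columns}

metadata = profile(rows)
print(json.dumps(metadata, indent=2))
```

Even a score this simple lets an agent decide, for example, to warn the user before aggregating over a column that is only two-thirds populated.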

SageMaker Unified Studio

  • Provides a single interface for data engineers, scientists, and analysts to work with data and AI
  • Includes features like one-click onboarding of data sets and a polyglot notebook with built-in data agents
  • Data agents can automatically generate code, fix errors, and provide visualizations

Key Takeaways

  • Existing data foundation on AWS can be leveraged to build AI-ready applications
  • Critical components include MCP servers, context management, event-driven architectures, and data readiness
  • SageMaker provides a unified platform to manage data, models, and AI agents
  • Automated metadata generation and data governance are crucial for enabling AI agents to effectively utilize data
