AWS re:Invent 2025 - Build an AI-ready data foundation (ANT304)
Defining Data Foundation for Data Analytics
A data foundation for data analytics refers to the set of tools and services that enable:
Data storage
Data processing
Data integration
Data governance
These foundational constructs come together to form a typical data pipeline
The key is being able to leverage this existing data foundation to build AI applications
Evolving Data Needs for AI Applications
AI applications have evolved from simple rule-based assistants to autonomous multi-agent systems
This evolution impacts the data pipeline in several ways:
Need for additional unstructured data sources
Advanced data processing requirements for training, inference, and context management
Capturing real-time user feedback and personalization
Comprehensive data governance across the entire pipeline
Leveraging the AI-Ready Data Foundation on AWS
AWS MCP Servers
MCP (Model Context Protocol) servers provide a standardized tool layer between AI models and data systems
Enables true agent autonomy by allowing the model to determine which tools to use
Reduces the need for custom, one-off API integrations
Examples used in the demo:
Redshift MCP server for natural language access to Redshift data
S3 Tables MCP server for querying tabular data in S3
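As a sketch, an MCP-capable client is typically pointed at these servers through a JSON configuration like the one below. The `mcpServers` shape is the common MCP client convention; the exact package names and launch arguments are assumptions, not verified identifiers:

```json
{
  "mcpServers": {
    "redshift": {
      "command": "uvx",
      "args": ["awslabs.redshift-mcp-server@latest"]
    },
    "s3-tables": {
      "command": "uvx",
      "args": ["awslabs.s3-tables-mcp-server@latest"]
    }
  }
}
```

With this in place, the model can discover each server's tools at runtime and decide itself when to query Redshift versus S3 Tables.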
Context Management
LLMs have limited context windows and are stateless, requiring careful context management
Context includes instruction sets, knowledge bases, and data from tools/MCP servers
Agentic memory using services like Amazon Bedrock AgentCore can retain context across sessions
Knowledge bases using Amazon OpenSearch as a vector store provide semantic search capabilities
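The semantic-search idea behind a vector-store knowledge base can be sketched without any infrastructure: embed documents and queries, then rank documents by cosine similarity. In the talk's setup the embeddings would live in an OpenSearch k-NN index; here they are hard-coded toy vectors purely for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

# Toy "knowledge base": (document, precomputed embedding).
kb = [
    ("refund policy", [0.9, 0.1, 0.0]),
    ("shipping times", [0.1, 0.9, 0.1]),
    ("api rate limits", [0.0, 0.2, 0.9]),
]

def retrieve(query_embedding, k=1):
    """Return the k documents most similar to the query embedding."""
    ranked = sorted(kb, key=lambda d: cosine(query_embedding, d[1]), reverse=True)
    return [doc for doc, _ in ranked[:k]]
```

The retrieved documents are then injected into the model's context window, which is how a stateless LLM gains access to knowledge it was never trained on.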
Event-Driven Architectures for Multi-Agent Systems
Multiple loosely coupled agents working asynchronously can improve resiliency and scalability
Amazon MSK (Managed Streaming for Apache Kafka) enables event-driven architectures with real-time data exchange
Allows agents to respond to patterns and scenario changes independently
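The loose coupling described above can be illustrated with a minimal in-process event bus standing in for an MSK/Kafka topic (the agent names and event shape are made up for the example):

```python
from collections import defaultdict

class EventBus:
    """In-process stand-in for an MSK/Kafka topic: agents subscribe
    to topics and react to events independently of the publisher."""
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        # Each subscribed agent handles the event on its own;
        # the publisher knows nothing about them.
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
log = []

# Two loosely coupled "agents" reacting to the same event stream.
bus.subscribe("orders", lambda e: log.append(f"fraud-agent checked {e['id']}"))
bus.subscribe("orders", lambda e: log.append(f"inventory-agent reserved {e['id']}"))

bus.publish("orders", {"id": "o-1"})
```

Because neither agent depends on the other, one can fail, scale, or be replaced without touching the rest of the system, which is the resiliency argument for event-driven multi-agent designs.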
Ensuring Data Readiness for AI
Metadata and Data Governance
Metadata is critical for AI agents to understand and utilize data effectively
SageMaker provides automated metadata generation, data quality scoring, and lineage tracking
Metadata can be published back to S3 for agents to directly access
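A sketch of what publishing such metadata back to S3 might look like; the field names are illustrative, not a documented SageMaker schema, and the bucket/prefix are placeholders:

```python
import json

# Hypothetical metadata record describing a dataset for agent consumption.
metadata = {
    "dataset": "s3://my-bucket/sales/2025/",  # assumed location
    "columns": [
        {"name": "order_id", "type": "string", "description": "Unique order key"},
        {"name": "amount", "type": "decimal(10,2)", "description": "Order total in USD"},
    ],
    "quality_score": 0.97,
    "lineage": ["s3://my-bucket/raw/orders/"],
}

body = json.dumps(metadata, indent=2)

# Publishing back to S3 so agents can read it directly (requires AWS
# credentials; shown commented out for illustration):
# import boto3
# boto3.client("s3").put_object(
#     Bucket="my-bucket", Key="sales/2025/_metadata.json", Body=body
# )
```

An agent can then fetch and parse this JSON alongside the data itself, giving it column meanings, quality signals, and lineage without a separate catalog call.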
SageMaker Unified Studio
Provides a single interface for data engineers, scientists, and analysts to work with data and AI
Includes features like one-click onboarding of datasets and a polyglot notebook with built-in data agents
Data agents can automatically generate code, fix errors, and provide visualizations
Key Takeaways
Existing data foundation on AWS can be leveraged to build AI-ready applications
Critical components include MCP servers, context management, event-driven architectures, and data readiness
SageMaker provides a unified platform to manage data, models, and AI agents
Automated metadata generation and data governance are crucial for enabling AI agents to effectively utilize data