AWS re:Invent 2025 - Agentic data engineering with AWS Analytics MCP Servers (ANT335)

Agentic Data Engineering with AWS Analytics MCP Servers

Overview

  • Presentation on using agentic AI and Model Context Protocol (MCP) to enhance data engineering productivity and capabilities
  • Covers challenges faced by data engineers, the agentic AI solution, and a demo of implementing agentic data engineering using AWS Analytics MCP Servers

Data Engineering Challenges

  • Frequent context switching between tasks like data discovery, job development, debugging, and monitoring
  • Reactive rather than proactive approach to data quality issues
  • Time lost optimizing pipeline performance and integrating new tools/services

Agentic AI Solution

  • Agentic loop: AI agent that can reason, take actions, and iterate to complete complex tasks
  • Key components:
    • Agent with short-term and long-term memory
    • Access to tools and data sources
    • Ability to plan, reflect, and self-critique
  • Integrates with tools via Model Context Protocol (MCP)
    • Universal language for AI agents to discover and interact with data/tools
    • Decouples agent from specific tool implementations
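The agentic loop described above (plan, act, observe, reflect) can be sketched in a few lines of Python. This is an illustrative simulation only: the tool names and the keyword-based "planning" stand in for the LLM reasoning and MCP tool calls a real agent would use.

```python
# Minimal sketch of an agentic loop: the agent plans, acts via a tool,
# records the observation in short-term memory, and reflects on whether
# the task is complete. All names here are hypothetical.

def agentic_loop(goal, tools, max_steps=5):
    """Iterate plan -> act -> observe -> reflect until done or out of steps."""
    memory = []  # short-term memory of observations
    for _ in range(max_steps):
        # Plan: pick the first tool whose name appears in the goal.
        # (A real agent would ask an LLM to choose a tool and arguments.)
        tool = next((t for name, t in tools.items() if name in goal), None)
        if tool is None:
            break
        observation = tool(goal)      # Act: invoke the tool
        memory.append(observation)    # Observe: remember the result
        if "done" in observation:     # Reflect: is the task finished?
            break
    return memory

# Hypothetical tools standing in for MCP server calls
tools = {
    "discover": lambda g: "found 3 files in s3://bucket/raw/",
    "validate": lambda g: "validation done",
}

print(agentic_loop("validate customer data", tools))
```

Because MCP decouples the agent from tool implementations, swapping the lambdas above for real MCP client calls would not change the loop itself.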

AWS Analytics MCP Servers

  • Provide agentic capabilities for common data engineering tasks across AWS services
  • Examples:
    • Data Processing MCP Server: create AWS Glue jobs, run Amazon Athena queries, etc.
    • Amazon Redshift MCP Server: list clusters, query data, etc.
    • Amazon MSK MCP Server: create and manage Kafka clusters
  • Agents can leverage appropriate MCP Servers based on user's natural language requests
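An MCP-aware client typically registers these servers in a JSON configuration file. The fragment below is a sketch of what such a configuration might look like; the exact package names, launcher command, and options depend on the client and server release you use.

```json
{
  "mcpServers": {
    "aws-dataprocessing": {
      "command": "uvx",
      "args": ["awslabs.aws-dataprocessing-mcp-server@latest"],
      "env": { "AWS_REGION": "us-east-1" }
    },
    "redshift": {
      "command": "uvx",
      "args": ["awslabs.redshift-mcp-server@latest"]
    }
  }
}
```

Once registered, the agent discovers each server's tools at runtime and selects among them based on the user's request.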

Agentic Data Engineering Demo

  1. Data Discovery: Agent uses S3 MCP tool to locate customer data files
  2. ETL Job Creation: Agent creates a Glue job, handling tasks like:
    • Identifying appropriate IAM role
    • Generating job code
    • Uploading job to S3
  3. Job Execution and Monitoring: Agent runs the Glue job, monitors for errors, and fixes issues
  4. Data Cataloging: Agent creates a Glue crawler to catalog the processed data
  5. Data Validation: Agent uses Athena MCP tool to run validation queries
  6. Data Loading: Agent creates a Redshift notebook to load data into the data warehouse
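The six demo steps above can be sketched as an ordered pipeline of agent actions. Each function below is a stub standing in for the calls an agent would make through the corresponding MCP server (S3, Glue, Athena, Redshift); all names and return values are illustrative, not real AWS APIs.

```python
# Stubbed sketch of the demo pipeline. Each step represents an
# MCP-mediated action; a real agent would call AWS services here.

def discover_data(bucket):
    # 1. Data discovery via an S3 MCP tool
    return f"s3://{bucket}/raw/customers/"

def create_glue_job(script_path, role):
    # 2. ETL job creation: pick an IAM role, generate code, upload to S3
    return {"job": "customer-etl", "script": script_path, "role": role}

def run_and_monitor(job):
    # 3. Execution and monitoring; a real agent would poll and fix failures
    return "SUCCEEDED"

def catalog_with_crawler(path):
    # 4. Cataloging: tables a Glue crawler would register
    return ["customers"]

def validate_with_athena(table):
    # 5. Validation query the agent would run via an Athena MCP tool
    return f"SELECT count(*) FROM {table}"

def pipeline(bucket, role):
    src = discover_data(bucket)
    job = create_glue_job(src + "etl.py", role)
    status = run_and_monitor(job)
    tables = catalog_with_crawler(src)
    queries = [validate_with_athena(t) for t in tables]
    return status, tables, queries  # 6. loading into Redshift would follow

print(pipeline("demo-bucket", "GlueServiceRole"))
```

The point of the sketch is the control flow: the agent sequences these steps itself from a single natural-language request, rather than the engineer driving each console or CLI step by hand.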

Key Takeaways

  • Agentic AI and MCP integration can significantly improve data engineering productivity and quality
  • Reduced context switching, proactive data quality, and automated best practices
  • Ability to scale agentic capabilities across the organization using tools like Amazon SageMaker Unified Studio and Amazon Bedrock AgentCore
  • Importance of human-in-the-loop oversight and configuring appropriate MCP Servers/tools

Technical Details

  • Used large language models on Amazon Bedrock to power the AI agent
  • Integrated with various AWS Analytics services via MCP Servers:
    • Data Processing, Amazon Redshift, Amazon MSK, Amazon OpenSearch
  • Demonstrated capabilities in tools like SageMaker Unified Studio and Kiro

Business Impact

  • Faster time-to-insight for business stakeholders by automating data pipelines
  • Improved data quality and reliability through proactive monitoring and validation
  • Ability to scale data engineering capabilities across the organization

Examples

  • AnyCompany Retail: a fictitious retailer looking to transform its customer experience using data
  • Specific use case of building a customer data pipeline, from data discovery to loading into a data warehouse
