AWS re:Invent 2025 - Accelerating data engineering with AI Agents for AWS Analytics (ANT215)

Accelerating Data Engineering with AI Agents for AWS Analytics

Overview of AI Agents for AWS Analytics

  • Data teams spend 60-70% of their time on undifferentiated tasks like upgrades, data preparation, and troubleshooting
  • Existing AI coding assistants lack awareness of the user's data, resources, and environment, so the code they generate often fails to run as written
  • AWS is introducing AI agents to solve these problems directly within analytics workflows

Guiding Principles for AI Agents in AWS Analytics

  1. Domain-Specific Agents: Agents tailored to specific problems in areas like Spark and SQL
  2. Role-Specific Experiences: Agents adapted to the needs of data engineers, data scientists, and software engineers
  3. Multi-Agent Ecosystem: Agents that can work together to solve end-to-end analytical workflows
  4. MCP-Based Interoperability: Agents that integrate with users' existing IDEs and tools
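
To make the MCP-based interoperability principle concrete: MCP (Model Context Protocol) messages are JSON-RPC 2.0, and a client such as an IDE invokes a server-side tool with the `tools/call` method. The sketch below only builds such a request. The tool name `upgrade_spark_application` and its arguments are hypothetical placeholders, not names taken from the talk or from any AWS API.

```python
import json
from itertools import count
from typing import Any, Dict

# Monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def build_tool_call(tool_name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(
    "upgrade_spark_application",
    {"source_version": "2.4", "target_version": "3.5"},
)
print(json.dumps(request, indent=2))
```

Because the request is plain JSON-RPC, any MCP-capable client can send it, which is what lets one agent server plug into several IDEs and tools.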

Launched AI Agents

1. Apache Spark Upgrade Agent

  • Presented as the industry's first automated Spark upgrade agent
  • Supports upgrading Spark applications from 2.4 to 3.5, with 4.0 support coming soon
  • Operates through an MCP-based remote server that integrates with users' IDEs
  • Handles the entire upgrade process:
    1. Planning and orchestration
    2. Dependency management
    3. Code modifications to address breaking changes
    4. Data quality testing to ensure output consistency
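
The data quality step can be pictured as comparing the output of the original job with the output of the upgraded job. The talk summary does not describe how the agent performs this check, so the following is only a minimal sketch, assuming both outputs fit in memory as lists of row tuples. The function name `compare_outputs` and the sample rows are illustrative.

```python
from collections import Counter
from typing import Any, Dict, Iterable, Tuple

Row = Tuple[Any, ...]


def compare_outputs(baseline_rows: Iterable[Row], upgraded_rows: Iterable[Row]) -> Dict[str, Any]:
    """Compare two job outputs as multisets of rows, ignoring row order."""
    baseline = Counter(tuple(r) for r in baseline_rows)
    upgraded = Counter(tuple(r) for r in upgraded_rows)
    missing = baseline - upgraded      # rows only the old job produced
    unexpected = upgraded - baseline   # rows only the new job produced
    return {
        "row_count_baseline": sum(baseline.values()),
        "row_count_upgraded": sum(upgraded.values()),
        "missing": sorted(missing.elements(), key=repr),
        "unexpected": sorted(unexpected.elements(), key=repr),
        "consistent": not missing and not unexpected,
    }


# Illustrative data: same rows in a different order count as consistent.
old_run = [(1, "a", 10.0), (2, "b", 20.0)]
new_run = [(2, "b", 20.0), (1, "a", 10.0)]
report = compare_outputs(old_run, new_run)
print(report["consistent"])
```

Comparing multisets rather than ordered lists matters because a Spark job does not guarantee row order unless it sorts explicitly. On large datasets the same idea would more plausibly be applied to row counts and per-column aggregates than to every row.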

2. SageMaker Data Agent

  • Persona-specific agent for data engineers, data scientists, and analysts
  • Aware of the user's business data and catalog
  • Writes queries and code that can run without modification
  • Provides multi-step planning for complex tasks like building ML pipelines
  • Includes a Spark troubleshooting agent to resolve issues
  • Enforces security guardrails to prevent destructive actions
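
The talk summary does not say how the guardrails are implemented. As a rough illustration of the idea of blocking destructive actions, the sketch below flags SQL statements that start with a destructive keyword before they are executed. The keyword list and the function name `find_destructive_statements` are assumptions, and the semicolon split is deliberately naive (it would mis-handle semicolons inside string literals).

```python
import re
from typing import List

# Assumed set of "destructive" statement types, for illustration only.
DESTRUCTIVE = re.compile(r"^\s*(DROP|TRUNCATE|DELETE)\b", re.IGNORECASE)


def find_destructive_statements(sql: str) -> List[str]:
    """Return the statements in `sql` that begin with a destructive keyword."""
    statements = [s.strip() for s in sql.split(";") if s.strip()]
    return [s for s in statements if DESTRUCTIVE.match(s)]


script = "SELECT * FROM orders; DROP TABLE orders"
flagged = find_destructive_statements(script)
if flagged:
    print("Blocked, needs explicit approval:", flagged)
```

A production guardrail would more likely rely on a real SQL parser and on access controls than on pattern matching, but the control flow is the same: inspect the generated action, then refuse or ask for confirmation.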

Demo: Building a Customer Lifetime Value Prediction Model

  • A data scientist uses the agent to build an ML model that predicts customer lifetime value (LTV)
  • Key steps:
    1. Discovering and exploring the relevant data tables
    2. Analyzing the impact of customer satisfaction on LTV trends
    3. Preparing the data for a linear regression model
       • Identifying categorical features
       • Applying one-hot encoding
    4. Training the model and evaluating its performance
    5. Visualizing the predicted LTV and feature importance
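
The data preparation and training steps above can be sketched in plain Python. This is not the code the agent generated in the demo. The column names (`segment`, `satisfaction`, `ltv`) and the toy rows are invented for illustration, and the model is ordinary least squares solved through the normal equations. One category is dropped during encoding so the one-hot columns are not collinear with the intercept.

```python
from typing import List, Sequence, Tuple


def one_hot(values: Sequence[str], drop_first: bool = True) -> Tuple[List[str], List[List[float]]]:
    """One-hot encode a categorical column; optionally drop the first category."""
    categories = sorted(set(values))
    kept = categories[1:] if drop_first else categories
    rows = [[1.0 if v == c else 0.0 for c in kept] for v in values]
    return kept, rows


def solve_linear_system(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular (collinear features?)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_ols(features: List[List[float]], targets: List[float]) -> List[float]:
    """Fit y = b0 + b1*x1 + ... by solving the normal equations (X^T X) b = X^T y."""
    design = [[1.0] + list(row) for row in features]  # prepend intercept column
    k = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(design, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Toy data, constructed so that ltv = 100 + 20*satisfaction + 50*(segment == "premium").
segment = ["basic", "basic", "premium", "premium", "basic"]
satisfaction = [1.0, 3.0, 2.0, 4.0, 5.0]
ltv = [120.0, 160.0, 190.0, 230.0, 200.0]

kept, encoded = one_hot(segment)
features = [[s] + row for s, row in zip(satisfaction, encoded)]
coefficients = fit_ols(features, ltv)
print(["intercept", "satisfaction"] + [f"segment={c}" for c in kept])
print([round(c, 6) for c in coefficients])
```

Because the toy targets are exactly linear in the features, the fit recovers the intercept 100, the satisfaction slope 20, and the premium uplift 50 up to floating-point error. With real data the coefficients would only approximate such relationships, and their magnitudes on scaled features are one simple way to read feature importance.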

Business Impact and Applications

  • AI agents can significantly reduce the time and effort required for common data engineering tasks
  • Agents can scale to handle growing data volumes and complexity, addressing the "capacity crunch" faced by data teams
  • The multi-agent ecosystem and MCP-based integration enable seamless workflows across different tools and platforms
  • Specific use cases include:
    • Upgrading Spark applications with minimal effort
    • Building data pipelines and running queries on various data sources
    • Developing and deploying machine learning models

Key Takeaways

  • AWS is introducing a suite of AI agents to address the challenges faced by data teams, including workflow complexity, knowledge gaps, and capacity constraints
  • The agents are designed with guiding principles of domain-specificity, role-specific experiences, multi-agent collaboration, and MCP-based interoperability
  • The launched agents, such as the Spark Upgrade Agent and SageMaker Data Agent, demonstrate the capabilities of this new approach
  • AI agents can significantly accelerate data engineering tasks, improve productivity, and enable data teams to focus on higher-value work
