AWS re:Invent 2025 - Accelerating data engineering with AI Agents for AWS Analytics (ANT215)
Accelerating Data Engineering with AI Agents for AWS Analytics
Overview of AI Agents for AWS Analytics
Data teams spend 60-70% of their time on undifferentiated tasks like upgrades, data preparation, and troubleshooting
Existing AI coding assistants lack awareness of the user's data, resources, and environment, leading to code that doesn't work
AWS is introducing AI agents to solve these problems directly within analytics workflows
Guiding Principles for AI Agents in AWS Analytics
Domain-Specific Agents: Agents tailored to specific problems in areas like Spark and SQL
Role-Specific Experiences: Agents adapted to the needs of data engineers, data scientists, and software engineers
Multi-Agent Ecosystem: Agents that can work together to solve end-to-end analytical workflows
MCP-Based Interoperability: Agents that integrate with users' existing IDEs and tools
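For context on the interoperability point: MCP (Model Context Protocol) clients and servers exchange JSON-RPC 2.0 messages, and a client invokes a server-side tool with the `tools/call` method. The sketch below only builds such a request to show its shape. The tool name `run_analytics_task` and its arguments are hypothetical placeholders, since the talk summary does not list the tools the AWS agents expose.

```python
import json
from itertools import count

_request_ids = count(1)  # each JSON-RPC request carries its own id


def build_tool_call(tool_name: str, arguments: dict) -> str:
    """Serialize an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    request = {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }
    return json.dumps(request)


# Hypothetical tool name and arguments, for illustration only.
message = build_tool_call("run_analytics_task", {"project_path": "./my_job"})
print(message)
```

An IDE with MCP support would normally construct and send these messages itself; the point here is only that any MCP-capable client can reach the same agent through one protocol.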
Launched AI Agents
1. Apache Spark Upgrade Agent
Presented in the talk as the industry's first automated Spark upgrade agent
Supports upgrading Spark applications from 2.4 to 3.5, with 4.0 support coming soon
Operates through an MCP-based remote server that integrates with users' IDEs
Handles the entire upgrade process:
Planning and orchestration
Dependency management
Code modifications to address breaking changes
Data quality testing to ensure output consistency
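The data quality step can be pictured as a comparison between the output of the job before and after the upgrade. The following is a minimal sketch of that idea, not the agent's actual implementation: it treats each output as a multiset of rows and ignores row order, since distributed jobs rarely guarantee ordering. The function name `compare_outputs` and the toy rows are assumptions made for illustration.

```python
from collections import Counter


def compare_outputs(old_rows, new_rows):
    """Compare two job outputs as multisets of rows, ignoring row order."""
    old_counts = Counter(map(tuple, old_rows))
    new_counts = Counter(map(tuple, new_rows))
    return {
        "match": old_counts == new_counts,
        # Rows present before the upgrade but absent afterwards.
        "missing_in_new": sorted((old_counts - new_counts).elements()),
        # Rows that only appear after the upgrade.
        "unexpected_in_new": sorted((new_counts - old_counts).elements()),
    }


# Toy rows standing in for the job output before and after the upgrade.
before = [("c1", 120.0), ("c2", 75.5), ("c3", 10.0)]
after = [("c3", 10.0), ("c1", 120.0), ("c2", 75.0)]

report = compare_outputs(before, after)
print(report)
```

In practice a check like this would run on the engine itself rather than on collected rows, and floating-point columns may need a tolerance instead of exact equality.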
2. SageMaker Data Agent
Persona-specific agent for data engineers, data scientists, and analysts
Aware of the user's business data and catalog
Writes queries and code that can run without modification
Provides multi-step planning for complex tasks like building ML pipelines
Includes a Spark troubleshooting agent to resolve issues
Enforces security guardrails to prevent destructive actions
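The summary does not describe how the guardrails work. As one way to picture the idea, the sketch below refuses to run SQL that starts with a destructive keyword. The keyword list, the function names `is_destructive` and `guarded_execute`, and the naive split on semicolons are all assumptions for illustration, not the SageMaker Data Agent's real policy.

```python
import re

# Statement types treated as destructive in this sketch (an assumption,
# not the agent's actual policy).
DESTRUCTIVE = re.compile(r"^\s*(DROP|TRUNCATE|DELETE|ALTER)\b", re.IGNORECASE)


def is_destructive(sql: str) -> bool:
    """Return True if any statement starts with a destructive keyword.

    Splitting on ';' is naive: it does not handle semicolons inside
    string literals or comments.
    """
    statements = [s for s in sql.split(";") if s.strip()]
    return any(DESTRUCTIVE.match(s) for s in statements)


def guarded_execute(sql: str, execute):
    """Call `execute(sql)` only when the statement passes the guardrail."""
    if is_destructive(sql):
        raise PermissionError(f"Blocked destructive statement: {sql.strip()!r}")
    return execute(sql)


# A stand-in executor; a real one would submit the query to an engine.
print(guarded_execute("SELECT 1", lambda q: "ran"))
```

A production guardrail would more likely rely on a SQL parser and on permissions enforced by the data platform than on keyword matching.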
Demo: Building a Customer Lifetime Value Prediction Model
A data scientist uses the agent to build an ML model that predicts customer lifetime value (LTV)
Key steps:
Discovering and exploring the relevant data tables
Analyzing the impact of customer satisfaction on LTV trends
Preparing the data for a linear regression model
Identifying categorical features
Applying one-hot encoding
Training the model and evaluating its performance
Visualizing the predicted LTV and feature importance
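The data preparation steps in the demo (identifying categorical features and one-hot encoding them) can be sketched in plain Python. This is an illustrative sketch, not the code the agent generated in the talk: the column names, the sample records, and the rule that a column is categorical when all of its values are strings are assumptions.

```python
def categorical_columns(rows):
    """Treat a column as categorical if every value in it is a string."""
    return [c for c in rows[0] if all(isinstance(r[c], str) for r in rows)]


def one_hot_encode(rows, columns):
    """Replace each categorical column with one 0/1 indicator per category."""
    categories = {c: sorted({r[c] for r in rows}) for c in columns}
    encoded = []
    for r in rows:
        # Keep the non-categorical columns unchanged.
        out = {k: v for k, v in r.items() if k not in columns}
        for c in columns:
            for value in categories[c]:
                out[f"{c}_{value}"] = 1 if r[c] == value else 0
        encoded.append(out)
    return encoded


# Made-up customer records; the column names are hypothetical.
customers = [
    {"segment": "retail", "satisfaction": 4.5, "ltv": 1200.0},
    {"segment": "enterprise", "satisfaction": 3.8, "ltv": 5400.0},
    {"segment": "retail", "satisfaction": 2.1, "ltv": 300.0},
]

cat_cols = categorical_columns(customers)
features = one_hot_encode(customers, cat_cols)
print(cat_cols)
print(features[0])
```

When the encoded features feed a linear regression with an intercept, one indicator per categorical column is often dropped so that the indicator columns are not perfectly collinear.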
Business Impact and Applications
AI agents can significantly reduce the time and effort required for common data engineering tasks
Agents can scale to handle growing data volumes and complexity, addressing the "capacity crunch" faced by data teams
The multi-agent ecosystem and MCP-based integration let workflows span different tools and platforms
Specific use cases include:
Upgrading Spark applications with minimal effort
Building data pipelines and running queries on various data sources
Developing and deploying machine learning models
Key Takeaways
AWS is introducing a suite of AI agents to address the challenges faced by data teams, including workflow complexity, knowledge gaps, and capacity constraints
The agents follow four guiding principles: domain specificity, role-specific experiences, multi-agent collaboration, and MCP-based interoperability
The launched agents, such as the Spark Upgrade Agent and SageMaker Data Agent, demonstrate the capabilities of this new approach
AI agents can significantly accelerate data engineering tasks, improve productivity, and enable data teams to focus on higher-value work