Model Context Protocol (MCP): an open standard that allows AI assistants to access real-time data
Integrated MCP servers for AWS Glue, Amazon EMR, and Amazon Athena
Reduces integration complexity, provides AI-driven insights, and simplifies observability
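MCP messages are JSON-RPC 2.0 requests. A client discovers a server's tools with `tools/list` and invokes one with `tools/call`, passing the tool `name` and its `arguments`. The minimal sketch below only builds those two request payloads in Python. It opens no connection. The tool name `get_table_schema`, the `analytics_db` database, and the `diabetes` table are hypothetical placeholders, not the actual tools exposed by the AWS Glue, EMR, or Athena MCP servers.

```python
import json
from typing import Any, Dict, Optional


def make_request(request_id: int, method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request object, the message format MCP is built on."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": request_id, "method": method}
    if params is not None:
        message["params"] = params
    return message


# Step 1: ask the server which tools it exposes.
list_tools = make_request(1, "tools/list")

# Step 2: invoke one tool by name. The tool name and arguments are
# hypothetical; a real server advertises its own tools and input schemas
# in its tools/list response.
call_tool = make_request(
    2,
    "tools/call",
    {
        "name": "get_table_schema",  # hypothetical tool name
        "arguments": {"database": "analytics_db", "table": "diabetes"},  # hypothetical
    },
)

for message in (list_tools, call_tool):
    print(json.dumps(message))
```

In practice an MCP client library handles this framing and the transport (for example, stdio). The sketch shows only the shape of the discover-then-call exchange.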
Improving Data Pipeline Productivity
Auto-generating code and SQL statements using Amazon Q Developer
Building visual ETL pipelines from natural-language prompts with AI-driven automation
Demo 1: Data Processing MCP Server
Onboarded diabetes dataset into AWS Glue Data Catalog
Configured and activated Data Processing MCP server in SageMaker Studio
Used Amazon Q Developer to generate a Jupyter notebook that accesses the diabetes data via the MCP server
Demo 2: Auto-Generating Visual ETL Pipelines
Used prompts to auto-generate a visual ETL pipeline to join customer behavior and customer dimension data
Performed transformations such as type casting and column renaming
Aggregated data by state to calculate total page views and purchase amount
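The demo produced a visual ETL job, but the logic it describes (join, cast and rename, aggregate by state) can be sketched in plain Python to make each step concrete. This is a minimal sketch, assuming an inner join on the customer key. The column names (`cust_id`, `st`, `pageviews`, `purchase_amt`) and the sample rows are hypothetical.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]

# Hypothetical raw inputs: values arrive as strings, as they would from CSV files.
customer_behavior = [
    {"cust_id": "1", "pageviews": "10", "purchase_amt": "25.50"},
    {"cust_id": "2", "pageviews": "4", "purchase_amt": "0"},
    {"cust_id": "3", "pageviews": "7", "purchase_amt": "12.25"},
    {"cust_id": "4", "pageviews": "3", "purchase_amt": "5.00"},  # no dimension row
]
customer_dimension = [
    {"cust_id": "1", "st": "WA"},
    {"cust_id": "2", "st": "WA"},
    {"cust_id": "3", "st": "OR"},
]


def join(behavior: List[Row], dimension: List[Row]) -> List[Row]:
    """Inner join on cust_id: customers without a dimension row are dropped."""
    dim_by_id = {row["cust_id"]: row for row in dimension}
    return [{**b, **dim_by_id[b["cust_id"]]} for b in behavior if b["cust_id"] in dim_by_id]


def cast_and_rename(rows: List[Row]) -> List[Row]:
    """Cast string columns to numeric types and give columns clearer names."""
    return [
        {
            "customer_id": int(r["cust_id"]),
            "state": r["st"],
            "page_views": int(r["pageviews"]),
            "purchase_amount": float(r["purchase_amt"]),
        }
        for r in rows
    ]


def aggregate_by_state(rows: List[Row]) -> List[Row]:
    """Sum page views and purchase amount per state, sorted by state."""
    totals: Dict[str, Row] = {}
    for r in rows:
        t = totals.setdefault(
            r["state"], {"state": r["state"], "total_page_views": 0, "total_purchase_amount": 0.0}
        )
        t["total_page_views"] += r["page_views"]
        t["total_purchase_amount"] += r["purchase_amount"]
    return sorted(totals.values(), key=lambda t: t["state"])


result = aggregate_by_state(cast_and_rename(join(customer_behavior, customer_dimension)))
print(result)
```

Keeping each step as a separate function mirrors how the visual pipeline chains one node per transformation. An in-memory dictionary join is fine for illustration, but a distributed engine would perform the same steps at scale.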
Enhancing AWS Data Processing Engines for AI Readiness
Challenges: high-volume data ingestion, data quality, identity and access control, an integrated AI/ML environment, and model training and inference
Identity and Access Control:
Trusted Identity Propagation for single sign-on and fine-grained permissions
S3 Access Grants and Lake Formation Full Table Access for data access control
Lake Formation Fine-Grained Access Control for column/row/cell-level security
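Conceptually, row-level security keeps only the rows that match a predicate, column-level security keeps only the permitted columns, and cell-level security applies both at once. Lake Formation data filters loosely follow this model by pairing a row filter expression with a column list. The minimal plain-Python sketch below illustrates the idea only. `DataCellFilter` and `apply_filter` are hypothetical names, not the Lake Formation API, which is enforced by the integrated query engines rather than by client code. The patient rows are made up.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, FrozenSet, List

Row = Dict[str, Any]


@dataclass(frozen=True)
class DataCellFilter:
    """Hypothetical filter: a row predicate plus a column allowlist."""
    allowed_columns: FrozenSet[str]
    row_predicate: Callable[[Row], bool]


def apply_filter(rows: List[Row], cell_filter: DataCellFilter) -> List[Row]:
    # Row-level security: keep only rows the predicate accepts.
    # Column-level security: keep only allowlisted columns.
    # Applying both together yields cell-level security.
    return [
        {col: val for col, val in row.items() if col in cell_filter.allowed_columns}
        for row in rows
        if cell_filter.row_predicate(row)
    ]


patients = [
    {"patient_id": 1, "region": "east", "age": 54, "glucose": 148},
    {"patient_id": 2, "region": "west", "age": 31, "glucose": 85},
    {"patient_id": 3, "region": "east", "age": 29, "glucose": 183},
]

# An analyst who may see only eastern-region rows and no patient identifiers.
east_analyst = DataCellFilter(
    allowed_columns=frozenset({"region", "age", "glucose"}),
    row_predicate=lambda row: row["region"] == "east",
)

visible = apply_filter(patients, east_analyst)
print(visible)
```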
Integrated AI/ML Experience:
SageMaker Notebooks: Spark-powered notebooks for a quick start, with AI-driven code assistance
Spark Upgrade Agent to automate upgrading Spark applications with error handling and data quality checks
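One kind of data quality check an upgrade workflow might run is comparing a job's output before and after the Spark version change. The minimal sketch below assumes both outputs fit in memory as lists of dicts. It compares row counts and an order-insensitive hash of the rows, since Spark does not guarantee output row order. `fingerprint` and `outputs_match` are hypothetical helpers and do not describe how the Spark Upgrade Agent is implemented.

```python
import hashlib
import json
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def fingerprint(rows: List[Row]) -> Tuple[int, str]:
    """Return (row count, order-insensitive hash of all rows)."""
    row_digests = sorted(
        hashlib.sha256(json.dumps(row, sort_keys=True).encode("utf-8")).hexdigest()
        for row in rows
    )
    combined = hashlib.sha256("".join(row_digests).encode("utf-8")).hexdigest()
    return len(rows), combined


def outputs_match(baseline: List[Row], upgraded: List[Row]) -> bool:
    """Hypothetical check: same row count and the same multiset of rows."""
    return fingerprint(baseline) == fingerprint(upgraded)


baseline = [{"id": 1, "total": 10}, {"id": 2, "total": 5}]
reordered = [{"id": 2, "total": 5}, {"id": 1, "total": 10}]  # same rows, new order
changed = [{"id": 1, "total": 10}, {"id": 2, "total": 6}]  # one value differs

print(outputs_match(baseline, reordered))  # True
print(outputs_match(baseline, changed))  # False
```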
Performance Enhancements:
4.4 to 4.5x faster Spark performance than open-source Apache Spark
2x better write performance with Apache Iceberg tables
EMR Serverless Storage Provisioning for remote shuffle storage
Key Takeaways
Enterprises need to revisit their data foundation to make it AI-ready, addressing people, process, and technology challenges
Unlocking enterprise data for AI agents through RAG and MCP servers can improve AI model accuracy and flexibility
Automating data pipeline development with AI-driven code generation and visual ETL can boost productivity
AWS is enhancing its data processing services like EMR, Glue, and Athena to address identity, access, integration, and performance needs for AI workloads
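RAG grounds a model's answer in retrieved enterprise data by placing the most relevant documents into the prompt. The minimal sketch of the retrieval step below uses simple bag-of-words cosine similarity in place of the embedding model and vector store that a production system would use. The documents and helper names are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List


def vectorize(text: str) -> Counter:
    """Bag-of-words term counts (a stand-in for an embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, documents: List[str], k: int = 2) -> List[str]:
    """Return the k documents most similar to the query."""
    q = vectorize(query)
    ranked = sorted(documents, key=lambda doc: cosine(q, vectorize(doc)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, documents: List[str], k: int = 2) -> str:
    """Augment the question with retrieved context before sending it to a model."""
    context = "\n".join(f"- {doc}" for doc in retrieve(query, documents, k))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


documents = [
    "Orders are stored in the sales_orders table in the Glue Data Catalog.",
    "The diabetes dataset includes glucose and BMI columns.",
    "Quarterly revenue is reported by the finance team.",
]

prompt = build_prompt("Which table stores orders?", documents, k=1)
print(prompt)
```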