AWS re:Invent 2025 - Deep dive into databases zero-ETL integrations (DAT445)
Introduction
The presentation covers the business use cases and technical details of AWS's "Zero ETL" integrations, which aim to simplify the process of moving transactional data from operational databases into analytical systems.
The speaker, Dave Gardner, is a Database Specialist Solutions Architect at AWS with 9 years of experience.
The session is targeted at data architects, data scientists, and ETL programmers who are responsible for building data pipelines and enabling analytics on operational data.
Business Use Cases
Bringing transactional data into analytical systems enables key business insights and value, such as:
Customer relationship management
Fraud detection
Gaming leaderboards
Inventory optimization
Sentiment analysis
Product insights and sales
These use cases require mining the "gold" in transactional data to drive business decisions and improve customer experience.
Architecture Overview
The presentation outlines a "Command Query Responsibility Segregation" (CQRS) architecture:
Transactional systems (e.g. DynamoDB, DocumentDB, RDS) are optimized for high availability, performance, and reliability to support peak workloads.
Analytical systems (e.g. S3, Redshift, OpenSearch) are used to extract, enrich, and analyze the transactional data.
The data pipeline between these two systems is complex and fragile, and it is this pipeline that AWS Zero ETL aims to simplify.
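The separation described above can be illustrated with a minimal sketch. The class and function names (OrderStore, SalesReport, replicate) are hypothetical and are not from the talk; the point is only that the write side and the read side hold differently shaped data, and that something has to move changes between them.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class OrderStore:
    """Write side: keyed records optimized for single-item writes."""
    orders: dict = field(default_factory=dict)
    changes: list = field(default_factory=list)  # stand-in for a change feed

    def place_order(self, order_id: str, customer: str, amount: float) -> None:
        record = {"order_id": order_id, "customer": customer, "amount": amount}
        self.orders[order_id] = record
        self.changes.append(record)


@dataclass
class SalesReport:
    """Read side: an aggregate shaped for analytical queries."""
    revenue_by_customer: dict = field(
        default_factory=lambda: defaultdict(float)
    )

    def apply(self, record: dict) -> None:
        self.revenue_by_customer[record["customer"]] += record["amount"]


def replicate(store: OrderStore, report: SalesReport) -> int:
    """The pipeline: drain pending changes into the read side."""
    applied = 0
    while store.changes:
        report.apply(store.changes.pop(0))
        applied += 1
    return applied
```

In this toy version the pipeline is three lines long. In practice it has to cope with retries, ordering, schema drift, and backfills, which is the part a managed integration is meant to take over.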
AWS Zero ETL Integrations
AWS zero-ETL integrations are fully managed, which makes it easier to set up data pipelines from operational databases to analytical systems.
Key features:
Simple setup and configuration
Secure and easy way to enable analytics on transactional data
Offloads the "undifferentiated heavy lifting" of managing data pipelines
DynamoDB as a Source
Challenges with moving DynamoDB data to analytical systems:
Mapping NoSQL data structures to columnar databases
Handling single-table designs with mixed data types
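To make these two challenges concrete, here is a rough sketch of what a hand-written mapping might involve. It decodes DynamoDB-typed JSON, flattens nested maps into dotted column names, and splits a single-table design into one row set per entity. The entity_type attribute and the dotted naming scheme are assumptions made for illustration, not the mapping the integration itself applies, and set and binary types are left out for brevity.

```python
from collections import defaultdict
from decimal import Decimal


def decode(attr: dict):
    """Convert one DynamoDB-typed attribute value into a plain Python value."""
    (type_tag, value), = attr.items()
    if type_tag == "S":
        return value
    if type_tag == "N":
        return Decimal(value)  # DynamoDB numbers arrive as strings
    if type_tag == "BOOL":
        return value
    if type_tag == "NULL":
        return None
    if type_tag == "M":
        return {k: decode(v) for k, v in value.items()}
    if type_tag == "L":
        return [decode(v) for v in value]
    raise ValueError(f"unsupported type tag: {type_tag}")


def flatten(value: dict, prefix: str = "") -> dict:
    """Flatten nested maps into dotted column names; lists stay as values."""
    row = {}
    for key, item in value.items():
        name = f"{prefix}{key}"
        if isinstance(item, dict):
            row.update(flatten(item, prefix=f"{name}."))
        else:
            row[name] = item
    return row


def split_by_entity(items: list, type_attr: str = "entity_type") -> dict:
    """Group single-table items into one list of flat rows per entity type."""
    tables = defaultdict(list)
    for item in items:
        plain = {k: decode(v) for k, v in item.items()}
        entity = plain.pop(type_attr)  # hypothetical discriminator attribute
        tables[entity].append(flatten(plain))
    return dict(tables)
```

Each entity type ends up with its own set of columns, which is roughly the shape a columnar target needs.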
AWS Zero ETL for DynamoDB:
Requires enabling point-in-time recovery and a resource policy on the DynamoDB table
Provides options for data mapping, partitioning, and table naming when moving to Redshift or S3 data lake
Leverages DynamoDB Streams and exports to S3 to enable near-real-time data replication
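The two prerequisites could be scripted roughly as follows. This is a sketch under assumptions: the policy's service principal, action list, and condition keys are illustrative and should be checked against the documentation for the chosen target, and the helper names are hypothetical. The client argument is expected to behave like a boto3 DynamoDB client, whose update_continuous_backups and put_resource_policy operations are the only calls used.

```python
import json


def build_export_policy(
    account_id: str,
    integration_arn_pattern: str,
    service_principal: str = "redshift.amazonaws.com",
) -> dict:
    """Build a table resource policy that lets the integration export data.

    The principal and actions here are assumptions for illustration.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowZeroEtlExport",
                "Effect": "Allow",
                "Principal": {"Service": service_principal},
                "Action": [
                    "dynamodb:ExportTableToPointInTime",
                    "dynamodb:DescribeTable",
                ],
                "Resource": "*",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {"aws:SourceArn": integration_arn_pattern},
                },
            }
        ],
    }


def enable_prerequisites(
    client, table_name: str, table_arn: str, policy: dict
) -> None:
    """Turn on point-in-time recovery and attach the resource policy."""
    client.update_continuous_backups(
        TableName=table_name,
        PointInTimeRecoverySpecification={"PointInTimeRecoveryEnabled": True},
    )
    client.put_resource_policy(
        ResourceArn=table_arn,
        Policy=json.dumps(policy),
    )
```

Passing the client in, rather than creating it inside the function, keeps the sketch testable without AWS credentials.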
Aurora as a Source
AWS Zero ETL for Aurora MySQL and PostgreSQL:
Utilizes direct parallel export from Aurora storage to Redshift storage
Enables continuous CDC (change data capture) replication using MySQL binlogs or PostgreSQL logical replication
Provides data freshness on the order of seconds between Aurora and Redshift
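The general pattern behind the points above, an initial bulk copy followed by a continuous stream of changes, can be sketched in a few lines. The event shape used here (seq, op, key, row) is hypothetical and far simpler than a real binlog or logical replication record; the sketch only shows how ordered changes are applied to a replica seeded from a snapshot, and how tracking the last applied sequence number makes replays harmless.

```python
def apply_changes(replica: dict, events: list, last_applied: int = 0) -> int:
    """Apply ordered change events to a replica keyed by primary key.

    Events at or below last_applied are skipped, so replaying a batch
    after a retry does not change the result.
    """
    for event in sorted(events, key=lambda e: e["seq"]):
        if event["seq"] <= last_applied:
            continue
        if event["op"] in ("insert", "update"):
            replica[event["key"]] = event["row"]
        elif event["op"] == "delete":
            replica.pop(event["key"], None)
        else:
            raise ValueError(f"unknown operation: {event['op']}")
        last_applied = event["seq"]
    return last_applied
```

The returned sequence number plays the role of a checkpoint: the next batch starts from it.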
Other RDS Engines
AWS Zero ETL also covers other engines such as Oracle, SQL Server, and PostgreSQL, including self-managed databases running on-premises or on EC2.
Uses database-specific replication mechanisms like redo log mining for Oracle.
Enables bringing data from self-managed databases into AWS analytical services.
DocumentDB to OpenSearch
AWS Zero ETL can also integrate DocumentDB NoSQL data with OpenSearch for real-time search and analytics.
Requires an intermediate S3 bucket and custom JSON mapping between the document collections and OpenSearch indices.
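As a rough illustration of what a collection-to-index mapping involves, the sketch below turns documents from a named collection into the newline-delimited body that the OpenSearch bulk API accepts. The function name and the mapping dictionary are hypothetical, and the managed integration does this work through its own pipeline configuration rather than code like this.

```python
import json


def to_bulk_body(
    collection_to_index: dict, collection: str, documents: list
) -> str:
    """Render documents as an OpenSearch bulk request body.

    Each document becomes two lines: an action line naming the target
    index and id, then the document itself without its _id field, since
    _id is metadata in OpenSearch and cannot appear in the document body.
    """
    index = collection_to_index[collection]
    lines = []
    for doc in documents:
        action = {"index": {"_index": index, "_id": str(doc["_id"])}}
        body = {k: v for k, v in doc.items() if k != "_id"}
        lines.append(json.dumps(action))
        lines.append(json.dumps(body))
    # The bulk API expects a trailing newline after the last line.
    return "\n".join(lines) + "\n"
```

Reusing the source _id as the index id means a changed document overwrites its earlier copy instead of creating a duplicate.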
Monitoring and Management
AWS Zero ETL provides visibility and monitoring through CloudWatch:
Pipeline status (creating, modifying, syncing, needs attention)
Metrics on data throughput, record counts, errors, and latency
Specific monitoring for OpenSearch pipelines, tracking OCU (OpenSearch Compute Unit) utilization.
Enables proactive alerting and troubleshooting of data pipeline issues.
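Proactive alerting on latency could be set up along these lines. The helper below only assembles the arguments for a CloudWatch alarm. The namespace, metric name, and dimensions are deliberately left to the caller because they differ by integration type and are not stated in the talk, and the helper name and default values are assumptions.

```python
def build_lag_alarm(
    alarm_name: str,
    namespace: str,
    metric_name: str,
    dimensions: dict,
    max_lag_seconds: float,
    period_seconds: int = 60,
    evaluation_periods: int = 5,
) -> dict:
    """Assemble keyword arguments for a CloudWatch metric alarm on lag."""
    return {
        "AlarmName": alarm_name,
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [
            {"Name": name, "Value": value}
            for name, value in sorted(dimensions.items())
        ],
        "Statistic": "Maximum",
        "Period": period_seconds,
        "EvaluationPeriods": evaluation_periods,
        "Threshold": float(max_lag_seconds),
        "ComparisonOperator": "GreaterThanThreshold",
        # Missing data usually means replication stalled, so treat it as bad.
        "TreatMissingData": "breaching",
    }
```

The resulting dictionary is shaped to be passed to a boto3 CloudWatch client as put_metric_alarm(**kwargs), once the real metric name and dimensions for the integration have been looked up.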
Key Benefits
Simplifies the complexity and fragility of traditional ETL pipelines:
Automatically handles schema changes in source databases
Provides reliable, near-real-time data replication to analytical systems
Allows data teams to focus on value-added analytics and business insights, rather than managing data pipelines.
Continues to expand support for additional data sources and targets based on customer feedback.
Conclusion
AWS Zero ETL integrations aim to make it easier for customers to unlock the value of their transactional data by simplifying the process of moving it to analytical systems.
The service provides a managed, secure, and performant way to enable advanced analytics, machine learning, and business intelligence on operational data.
Customers can leverage Zero ETL to accelerate their data-driven initiatives and focus on high-impact use cases, rather than managing complex data pipelines.