AWS re:Invent 2025 - Deep dive into databases zero-ETL integrations (DAT445)

Introduction

  • The presentation covers the business use cases and technical details of AWS's "Zero ETL" integrations, which aim to simplify the process of moving transactional data from operational databases into analytical systems.
  • The speaker, Dave Gardner, is a Database Specialist Solutions Architect at AWS with 9 years of experience.
  • The session is targeted at data architects, data scientists, and ETL programmers who are responsible for building data pipelines and enabling analytics on operational data.

Business Use Cases

  • Bringing transactional data into analytical systems enables key business insights and value, such as:
    • Customer relationship management
    • Fraud detection
    • Gamer leaderboards
    • Inventory optimization
    • Sentiment analysis
    • Product insights and sales
  • These use cases require mining the "gold" in transactional data to drive business decisions and improve customer experience.

Architecture Overview

  • The presentation outlines a "Command Query Responsibility Segregation" (CQRS) architecture:
    • Transactional systems (e.g. DynamoDB, DocumentDB, RDS) are optimized for high availability, performance, and reliability to support peak workloads.
    • Analytical systems (e.g. S3, Redshift, OpenSearch) are used to extract, enrich, and analyze the transactional data.
  • The data pipeline between these two systems is complex and fragile, and it is this part that AWS Zero ETL aims to simplify.
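
As a rough illustration of the separation described above, the following Python sketch keeps a write-optimized store and a read-optimized projection apart, connected only by an ordered list of change events. The names `OrderStore` and `SalesByProduct` and the event fields are hypothetical stand-ins for the operational database, the analytical system, and the pipeline between them. It does not show how any AWS service is implemented.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class OrderStore:
    """Command side: a hypothetical transactional store keyed by order id."""
    orders: dict = field(default_factory=dict)
    changes: list = field(default_factory=list)  # ordered change events

    def place_order(self, order_id, product, amount):
        self.orders[order_id] = {"product": product, "amount": amount}
        # Every write also emits a change event for downstream consumers.
        self.changes.append({"op": "insert", "id": order_id,
                             "product": product, "amount": amount})


class SalesByProduct:
    """Query side: a hypothetical read model shaped for analytics."""

    def __init__(self):
        self.totals = defaultdict(float)

    def apply(self, event):
        if event["op"] == "insert":
            self.totals[event["product"]] += event["amount"]


store = OrderStore()
store.place_order("o-1", "widget", 20.0)
store.place_order("o-2", "widget", 5.0)
store.place_order("o-3", "gadget", 12.5)

report = SalesByProduct()
for event in store.changes:  # stands in for the pipeline between the two sides
    report.apply(event)

sales = dict(report.totals)
```

The write path never queries `report`, and the read path never touches `store.orders` directly, so each side can be sized and tuned for its own workload.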

AWS Zero ETL Integrations

  • AWS Zero ETL provides a managed service to make it easier to set up data pipelines from operational databases to analytical systems.
  • Key features:
    • Simple setup and configuration
    • Secure and easy way to enable analytics on transactional data
    • Offloads the "undifferentiated heavy lifting" of managing data pipelines

DynamoDB as a Source

  • Challenges with moving DynamoDB data to analytical systems:
    • Mapping NoSQL data structures to columnar databases
    • Handling single-table designs with mixed data types
  • AWS Zero ETL for DynamoDB:
    • Requires enabling point-in-time recovery and a resource policy on the DynamoDB table
    • Provides options for data mapping, partitioning, and table naming when moving to Redshift or S3 data lake
    • Leverages DynamoDB Streams and S3 exports to enable near-real-time data replication
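
To make the mapping challenge more concrete, here is a minimal Python sketch that converts items in DynamoDB's typed JSON form (`S`, `N`, `BOOL`, `NULL`, `L`, `M`) into plain rows and splits a single-table design into one row set per entity. The `ENTITY#id` partition-key convention, the sample items, and the function names are assumptions made for illustration. The managed integration's actual mapping options are not shown here.

```python
from collections import defaultdict
from decimal import Decimal


def unmarshal(value):
    """Convert one DynamoDB-JSON attribute value into a plain Python value."""
    (tag, raw), = value.items()
    if tag == "S":
        return raw
    if tag == "N":
        return Decimal(raw)  # numbers are transported as strings
    if tag == "BOOL":
        return raw
    if tag == "NULL":
        return None
    if tag == "L":
        return [unmarshal(v) for v in raw]
    if tag == "M":
        return {k: unmarshal(v) for k, v in raw.items()}
    raise ValueError(f"unsupported type tag: {tag}")


def split_by_entity(items, key="pk"):
    """Group items by the prefix before '#' in the key (an assumed convention)."""
    tables = defaultdict(list)
    for item in items:
        row = {name: unmarshal(v) for name, v in item.items()}
        entity = row[key].split("#", 1)[0].lower()
        tables[entity].append(row)
    return dict(tables)


# Hypothetical single-table data mixing two entity types.
items = [
    {"pk": {"S": "CUSTOMER#1"}, "name": {"S": "Ana"}, "vip": {"BOOL": True}},
    {"pk": {"S": "ORDER#77"}, "total": {"N": "19.99"},
     "tags": {"L": [{"S": "gift"}]}},
    {"pk": {"S": "CUSTOMER#2"}, "name": {"S": "Raj"}},
]
tables = split_by_entity(items)
```

Note that the two customer rows do not share the same set of attributes. A columnar target would have to fill the missing `vip` value with a null or keep the item as a semi-structured value, which is the kind of decision the data mapping options address.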

Aurora as a Source

  • AWS Zero ETL for Aurora MySQL and PostgreSQL:
    • Utilizes direct parallel export from Aurora storage to Redshift storage
    • Enables continuous CDC (change data capture) replication using MySQL binlogs or PostgreSQL logical replication
    • Provides data freshness on the order of seconds between Aurora and Redshift
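
The two phases above (an initial bulk export followed by continuous change data capture) can be sketched in a few lines of Python. This is a simplified model: the `lsn` log position, the event shape, and `apply_changes` are hypothetical names. It does not reproduce the binlog or logical replication formats, nor the actual Aurora-to-Redshift mechanism.

```python
def apply_changes(snapshot, changes, snapshot_lsn):
    """Replay ordered change events on top of an initial snapshot.

    Events at or before snapshot_lsn are skipped because the snapshot
    already reflects them.
    """
    replica = {key: dict(row) for key, row in snapshot.items()}
    last_applied = snapshot_lsn
    for change in sorted(changes, key=lambda c: c["lsn"]):
        if change["lsn"] <= last_applied:
            continue
        if change["op"] == "delete":
            replica.pop(change["key"], None)
        else:  # insert or update: upsert the full row image
            replica[change["key"]] = dict(change["row"])
        last_applied = change["lsn"]
    return replica, last_applied


# Snapshot taken at log position 10, followed by a stream of changes.
snapshot = {1: {"status": "new"}, 2: {"status": "new"}}
changes = [
    {"lsn": 9, "op": "update", "key": 1, "row": {"status": "stale"}},
    {"lsn": 11, "op": "update", "key": 1, "row": {"status": "paid"}},
    {"lsn": 12, "op": "delete", "key": 2},
    {"lsn": 13, "op": "insert", "key": 3, "row": {"status": "new"}},
]
replica, position = apply_changes(snapshot, changes, snapshot_lsn=10)
```

Tracking the last applied position is what lets replication resume after an interruption without replaying or skipping changes.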

Other RDS Engines

  • AWS Zero ETL also supports other database engines such as Oracle, SQL Server, and PostgreSQL, including self-managed databases running on-premises or on EC2.
  • Uses database-specific replication mechanisms like redo log mining for Oracle.
  • Enables bringing data from self-managed databases into AWS analytical services.

DocumentDB to OpenSearch

  • AWS Zero ETL can also integrate DocumentDB NoSQL data with OpenSearch for real-time search and analytics.
  • Requires an intermediate S3 bucket and custom JSON mapping between the document collections and OpenSearch indices.
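
As an illustration of what a collection-to-index mapping involves, the Python sketch below renders source documents as newline-delimited JSON in the shape accepted by the OpenSearch bulk API. The `COLLECTION_TO_INDEX` table, the index names, and the sample documents are hypothetical, and the managed integration performs this step through its own pipeline configuration rather than code like this.

```python
import json

# Hypothetical mapping from source collections to target index names.
COLLECTION_TO_INDEX = {"products": "products-v1", "reviews": "reviews-v1"}


def to_bulk_lines(collection, documents):
    """Render documents as newline-delimited JSON in the bulk-request format."""
    index = COLLECTION_TO_INDEX[collection]
    lines = []
    for doc in documents:
        # _id is index metadata on the target, so it is moved out of the body.
        body = {k: v for k, v in doc.items() if k != "_id"}
        # Reusing the source _id means a repeated write overwrites the
        # existing document instead of creating a duplicate.
        action = {"index": {"_index": index, "_id": str(doc["_id"])}}
        lines.append(json.dumps(action))
        lines.append(json.dumps(body))
    return "\n".join(lines) + "\n"


payload = to_bulk_lines("products", [
    {"_id": 101, "name": "widget", "price": 9.5},
    {"_id": 102, "name": "gadget", "price": 12.0},
])
```

Each document becomes two lines, an action line naming the index and id followed by the document body, and the payload ends with a newline as the bulk format requires.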

Monitoring and Management

  • AWS Zero ETL provides visibility and monitoring through CloudWatch:
    • Pipeline status (creating, modifying, syncing, needs attention)
    • Metrics on data throughput, record counts, errors, and latency
  • OpenSearch pipelines have dedicated monitoring that tracks OCU (OpenSearch Compute Unit) utilization.
  • Enables proactive alerting and troubleshooting of data pipeline issues.
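
A minimal sketch of the alerting idea, assuming replication lag is sampled once per period: the function below mimics an "M out of N datapoints" rule of the kind CloudWatch alarms support and returns one of the three CloudWatch alarm states. The function name, the threshold, and the sample values are illustrative assumptions, and the real service evaluates alarms with more nuance (for example, configurable handling of missing data).

```python
def alarm_state(lag_seconds, threshold, datapoints_to_alarm, evaluation_periods):
    """Return 'ALARM' when enough recent datapoints breach the threshold."""
    recent = lag_seconds[-evaluation_periods:]
    if len(recent) < evaluation_periods:
        return "INSUFFICIENT_DATA"
    breaches = sum(1 for value in recent if value > threshold)
    return "ALARM" if breaches >= datapoints_to_alarm else "OK"


# Hypothetical lag samples in seconds, oldest first.
falling_behind = [4, 6, 5, 40, 75, 90]
single_spike = [4, 6, 5, 7, 40, 6]

state_behind = alarm_state(falling_behind, threshold=30,
                           datapoints_to_alarm=2, evaluation_periods=3)
state_spike = alarm_state(single_spike, threshold=30,
                          datapoints_to_alarm=2, evaluation_periods=3)
state_new = alarm_state([50], threshold=30,
                        datapoints_to_alarm=2, evaluation_periods=3)
```

Requiring two breaches out of three periods keeps a single transient spike from paging anyone while still catching a pipeline that is steadily falling behind.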

Key Benefits

  • Simplifies the complexity and fragility of traditional ETL pipelines:
    • Automatically handles schema changes in source databases
    • Provides reliable, near-real-time data replication to analytical systems
  • Allows data teams to focus on value-added analytics and business insights, rather than managing data pipelines.
  • AWS continues to expand support for additional data sources and targets based on customer feedback.

Conclusion

  • AWS Zero ETL integrations aim to make it easier for customers to unlock the value of their transactional data by simplifying the process of moving it to analytical systems.
  • The service provides a managed, secure, and performant way to enable advanced analytics, machine learning, and business intelligence on operational data.
  • Customers can leverage Zero ETL to accelerate their data-driven initiatives and focus on high-impact use cases, rather than managing complex data pipelines.
