AWS re:Invent 2025 - Accelerate analytics and AI w/ an open and secure lakehouse architecture-ANT309

Accelerating Analytics and AI with an Open and Secure Lakehouse Architecture

Overview

  • The presentation explored how to accelerate analytics and AI using an open and secure lakehouse architecture, featuring insights from AWS and a real-world implementation by Intuit.
  • Key topics included data foundations for AI, the benefits of lakehouse architecture, and technical capabilities like discoverability, consistent data access, and open scalable systems.

Building a Strong Data Foundation for AI

  • Agentic AI, the next step beyond generative AI, requires high-quality and accessible data as the foundation.
  • This involves:
    • Secure storage and organization of data from diverse sources and formats
    • Transforming raw data into valuable AI insights
    • Implementing data catalogs and governance for structure, trust, and responsible use

Addressing Data Fragmentation Challenges

  • In most enterprises, data is spread across many different systems with varying access patterns, security models, and APIs.
  • This fragmentation creates integration overhead, performance bottlenecks, inconsistent data quality and governance, and new attack vectors for sensitive data.
  • Achieving the speed, agility, and flexibility required for AI systems is difficult with this level of data fragmentation.

Lakehouse Architecture for Analytics and AI

  • The lakehouse architecture addresses three key challenges:
    1. Discoverability: Rich metadata management makes it easy for AI systems and developers to find the right data.
    2. Consistent Data Access: Centralized access control policies govern all data, ensuring consistent security and compliance.
    3. Open Scalable Systems: Support for open formats lets teams use a broad range of tools without converting or duplicating data.

Components of the Lakehouse Architecture

  1. Amazon S3: Provides the foundational storage layer, with S3 Tables for optimized analytics workloads.
  2. Amazon Redshift: The data warehouse for structured, highly curated analytical data.
  3. AWS Glue Data Catalog: The central metadata layer that maintains schema, partition, and location information for both S3-based and Redshift tables.
  4. Apache Iceberg: An open table format that gives engines a standard way to read and write table data, enabling interoperability across Iceberg-compatible engines.
  5. AWS Lake Formation: Enforces fine-grained access control, applying column- and row-level filtering at query time.
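
As a rough illustration of what a central metadata layer tracks, the sketch below models a catalog entry holding schema, partition, and location information, plus a simple lookup by column name. The class names, fields, and location strings are hypothetical placeholders for illustration; this is not the Glue Data Catalog API.

```python
from dataclasses import dataclass, field


@dataclass
class TableEntry:
    """Hypothetical, simplified catalog record (not the real Glue API shape)."""
    database: str
    name: str
    location: str                 # where the data lives (placeholder string)
    table_format: str             # e.g. "iceberg" or "redshift"
    columns: dict                 # column name -> type
    partition_keys: list = field(default_factory=list)


class Catalog:
    """In-memory stand-in for a central metadata layer."""

    def __init__(self):
        self._tables = {}

    def register(self, entry):
        self._tables[(entry.database, entry.name)] = entry

    def lookup(self, database, name):
        return self._tables[(database, name)]

    def find_tables_with_column(self, column_name):
        # Discoverability: search metadata instead of scanning the data itself.
        return sorted(
            f"{e.database}.{e.name}"
            for e in self._tables.values()
            if column_name in e.columns
        )


catalog = Catalog()
catalog.register(TableEntry(
    "sales", "orders", "s3://example-bucket/sales/orders/", "iceberg",
    {"order_id": "bigint", "customer_id": "bigint", "order_date": "date"},
    ["order_date"],
))
catalog.register(TableEntry(
    "crm", "customers", "example-warehouse/crm/customers", "redshift",
    {"customer_id": "bigint", "segment": "string"},
))

print(catalog.find_tables_with_column("customer_id"))
print(catalog.lookup("sales", "orders").location)
```

Because both the S3-based table and the warehouse table are registered in the same catalog, one metadata query finds every table that carries a given column, regardless of where the data is stored.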

Glue Data Catalog: The Backbone of the Lakehouse

  • Glue Data Catalog operates at very large scale, managing hundreds of millions of tables and serving billions of requests weekly.
  • It provides a highly available, scalable, and durable central metadata repository, with open API compatibility and integration with AWS analytics services.
  • Glue Data Catalog enables a "self-describing, trusted data platform" where every dataset is discoverable, governed, and optimized for intelligent use.

Model Context Protocol (MCP) for AI-Driven Data Interaction

  • MCP is an open standard for connecting AI agents to external tools and data sources; in the talk, it lets agents interact with the data catalog directly in response to natural-language requests.
  • Lake Formation security policies extend to access made through MCP, so AI agents are governed consistently with other data consumers.
  • The demo showcased how MCP enables AI-driven data exploration, schema management, and sentiment analysis while respecting access controls.
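
For readers unfamiliar with the wire format, MCP messages are JSON-RPC 2.0, and a client invokes a server-side tool with the `tools/call` method. The sketch below builds such a request in Python. The tool name `get_table_schema` and its arguments are hypothetical; the talk summary does not name the tools the catalog's MCP server exposes.

```python
import json


def build_tool_call(request_id, tool_name, arguments):
    """Build an MCP tool invocation as a JSON-RPC 2.0 request."""
    return {
        "jsonrpc": "2.0",
        "id": request_id,
        "method": "tools/call",
        "params": {"name": tool_name, "arguments": arguments},
    }


# Hypothetical tool name and arguments, for illustration only.
request = build_tool_call(
    1, "get_table_schema", {"database": "sales", "table": "orders"}
)
print(json.dumps(request, indent=2))
```

The agent turns a natural-language request into a structured call like this one; the server that executes it is where access policies are checked.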

Integrating Data from Diverse Sources

  • AWS offers three complementary approaches to bring data into the lakehouse:
    1. Zero-ETL Integrations: Automatic, continuous replication of data from operational databases, S3, and other sources.
    2. Query Federation: On-demand access to data across multiple source systems without moving the data.
    3. Catalog Federation: Centralized discovery and secure access to Iceberg tables stored in remote catalogs.
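
The trade-off between the first two approaches can be sketched with a toy model: a replicated table serves queries from its own copy and is only as fresh as its last sync, while a federated table reads the source on every query. All classes here are hypothetical stand-ins, and the full-copy `sync` is a simplification of replication, which real systems typically perform incrementally. Catalog federation is not modeled.

```python
class OperationalSource:
    """Stand-in for an operational database (hypothetical)."""

    def __init__(self, rows):
        self.rows = list(rows)


class ReplicatedTable:
    """Zero-ETL style: the lakehouse keeps its own refreshed copy."""

    def __init__(self, source):
        self.source = source
        self.rows = []

    def sync(self):
        # Simplification: copy everything; real replication is incremental.
        self.rows = list(self.source.rows)

    def query(self, predicate):
        return [r for r in self.rows if predicate(r)]


class FederatedTable:
    """Query federation: no copy; each query reads the source directly."""

    def __init__(self, source):
        self.source = source

    def query(self, predicate):
        return [r for r in self.source.rows if predicate(r)]


def is_open(row):
    return row["status"] == "open"


source = OperationalSource([{"id": 1, "status": "open"}])
replica = ReplicatedTable(source)
replica.sync()
federated = FederatedTable(source)

# A new row arrives in the source after the last sync.
source.rows.append({"id": 2, "status": "open"})

stale_count = len(replica.query(is_open))     # sees only the synced copy
live_count = len(federated.query(is_open))    # sees the source as it is now
replica.sync()
fresh_count = len(replica.query(is_open))     # catches up after syncing
print(stale_count, live_count, fresh_count)
```

Replication moves query load off the source at the cost of some lag; federation avoids moving data but puts every query on the source system.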

Optimizing Lakehouse Performance and Cost with Iceberg

  • Apache Iceberg provides ACID compliance, scalable metadata handling, schema evolution, and time travel capabilities.
  • AWS announced support for Iceberg v3, including deletion vectors for improved query performance and row lineage tracking for audit and impact analysis.
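
To make time travel and deletion vectors concrete, here is a minimal in-memory sketch. Every commit produces a new immutable snapshot, and a delete records the positions of removed rows next to the data file instead of rewriting it; readers skip those positions at scan time. The `Table` and `Snapshot` classes are hypothetical, and a Python `frozenset` stands in for the compact bitmap a real implementation would use.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    # Each file is (rows, deleted_positions); deleted_positions plays the
    # role of the deletion vector for that file.
    files: tuple


class Table:
    """Toy table: an append-only list of immutable snapshots."""

    def __init__(self):
        self.snapshots = [Snapshot(0, ())]

    def current(self):
        return self.snapshots[-1]

    def _commit(self, files):
        next_id = self.current().snapshot_id + 1
        self.snapshots.append(Snapshot(next_id, tuple(files)))

    def append(self, rows):
        new_file = (tuple(rows), frozenset())
        self._commit(self.current().files + (new_file,))

    def delete_where(self, predicate):
        new_files = []
        for rows, deleted in self.current().files:
            hits = {i for i, r in enumerate(rows) if predicate(r)}
            # The data file is untouched; only its deletion vector grows.
            new_files.append((rows, deleted | hits))
        self._commit(new_files)

    def scan(self, snapshot_id=None):
        # Snapshot ids double as list indexes in this toy model.
        snap = self.current() if snapshot_id is None else self.snapshots[snapshot_id]
        return [
            r
            for rows, deleted in snap.files
            for i, r in enumerate(rows)
            if i not in deleted
        ]


table = Table()
table.append(["a", "b", "c"])                 # snapshot 1
table.delete_where(lambda r: r == "b")        # snapshot 2

print(table.scan())                # current state: ['a', 'c']
print(table.scan(snapshot_id=1))   # time travel:   ['a', 'b', 'c']
```

Because older snapshots are never modified, reading as of snapshot 1 still returns the deleted row, which is the behavior time travel relies on.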

Glue Data Catalog Primitives for Iceberg Table Management

  • Automated compaction and snapshot retention policies to maintain performance and cost-efficiency at scale.
  • Materialized views for simplified data transformation, with Spark SQL compatibility and automatic query rewriting.
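
The two maintenance tasks in the first bullet can be sketched as planning functions: one groups small files into rewrite batches up to a target size, the other decides which snapshots to retain. The function names, parameters, and default values below are assumptions chosen for illustration, not Glue Data Catalog settings.

```python
def plan_compaction(file_sizes_mb, target_mb=128):
    """Greedily group small files into batches of at most target_mb."""
    groups, current, current_size = [], [], 0
    for size in sorted(file_sizes_mb):
        if size >= target_mb:
            continue  # already large enough; leave it alone
        if current and current_size + size > target_mb:
            groups.append(current)
            current, current_size = [], 0
        current.append(size)
        current_size += size
    if current:
        groups.append(current)
    # Rewriting a single file on its own gains nothing, so drop such groups.
    return [g for g in groups if len(g) > 1]


def snapshots_to_keep(snapshot_ages_days, max_age_days=5, min_keep=1):
    """Keep the newest min_keep snapshots plus any within max_age_days."""
    ordered = sorted(snapshot_ages_days)  # newest (smallest age) first
    return [
        age
        for i, age in enumerate(ordered)
        if i < min_keep or age <= max_age_days
    ]


compaction_plan = plan_compaction([10, 20, 30, 200, 100], target_mb=128)
kept = snapshots_to_keep([0, 1, 3, 10, 30], max_age_days=5, min_keep=1)
kept_when_all_old = snapshots_to_keep([10, 30], max_age_days=5, min_keep=1)
print(compaction_plan, kept, kept_when_all_old)
```

Fewer, larger files reduce per-file overhead during scans, and expiring old snapshots lets the storage they reference be reclaimed; the `min_keep` floor ensures the table always has at least one snapshot to read.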

Comprehensive Security and Governance Controls

  • Fine-grained access control at the table, column, and row level, with support for tag-based and attribute-based policies.
  • Authentication, encryption, auditing, and compliance certifications suited to regulated industries.
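
A minimal sketch of how tag-based column permissions and a row filter might combine at query time: columns carry tags, a principal is granted a set of tags, and the scan returns only permitted columns from rows that pass the filter. The helper names, tags, and sample data are hypothetical and do not reflect the Lake Formation API.

```python
def columns_for_tags(column_tags, allowed_tags):
    """Resolve a tag-based grant into the set of readable columns."""
    return {col for col, tag in column_tags.items() if tag in allowed_tags}


def authorized_scan(rows, allowed_columns, row_filter):
    """Apply row-level then column-level filtering to a result set."""
    return [
        {col: value for col, value in row.items() if col in allowed_columns}
        for row in rows
        if row_filter(row)
    ]


rows = [
    {"customer": "a", "email": "a@example.com", "region": "EU", "amount": 10},
    {"customer": "b", "email": "b@example.com", "region": "US", "amount": 20},
]
column_tags = {
    "customer": "public",
    "email": "pii",
    "region": "public",
    "amount": "financial",
}

# Hypothetical analyst grant: no PII columns, EU rows only.
analyst_columns = columns_for_tags(column_tags, {"public", "financial"})
result = authorized_scan(rows, analyst_columns, lambda r: r["region"] == "EU")
print(result)
```

Tagging columns once and granting on tags means a newly added PII column is protected as soon as it is tagged, without editing each principal's policy.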

Intuit's Lakehouse Journey

  • Intuit, a financial technology company, is building an AI-driven expert platform to provide money management solutions for consumers and businesses.
  • Intuit's data landscape includes 300,000+ tables, 70,000+ data pipelines, 2,500+ users, and over 200 PB of data, with technical heterogeneity and complex compliance requirements.
  • Intuit adopted a dual-native catalog approach, federating AWS Glue Data Catalog and Databricks Unity Catalog, with a unified control plane for metadata management, compliance, and access control.
  • Key features Intuit leveraged include the Iceberg table format, fine-grained access controls, and automated data tagging and policy management.

Key Takeaways

  • Lakehouse architecture combines the benefits of data warehouses and data lakes, providing performance, flexibility, and openness.
  • AWS Glue Data Catalog is the backbone of the lakehouse, enabling a self-describing, trusted data platform at scale.
  • MCP allows AI agents to directly interact with the data catalog using natural language, while respecting security policies.
  • Iceberg table format and Glue Data Catalog primitives optimize lakehouse performance, cost, and manageability.
  • Comprehensive security and governance controls, including fine-grained access policies and compliance certifications, enable lakehouse adoption in regulated industries.
  • Intuit's real-world implementation demonstrates the practical application of lakehouse architecture to power an AI-driven expert platform at scale.
