AWS re:Invent 2025 - Scaling Amazon Redshift with a multi-warehouse architecture (ANT318)

Scaling Amazon Redshift with a Multi-Warehouse Architecture

Overview of Multi-Warehouse Architectures

  • Addresses challenges of workload interference and resource contention in monolithic data warehouse architectures
  • Introduces two key design patterns:
    1. Hub and Spoke: Separate compute clusters for different workloads (e.g. streaming, batch, analytics, data science) with a centralized data store
    2. Data Mesh: Separate compute clusters and data ownership for different business units/teams, with controlled data sharing
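
One common way to wire up either pattern in Redshift is with datashares: a producer warehouse shares a schema, and each consumer warehouse mounts it as a local database. The sketch below only builds the SQL statements as strings. The share name, schema name, and namespace IDs are hypothetical placeholders, and this summary does not confirm that datashares are the exact mechanism the speakers described.

```python
def producer_statements(share: str, schema: str, consumer_namespaces: list[str]) -> list[str]:
    """Build the SQL a producer (hub) warehouse would run to share one schema.

    Identifiers are interpolated without escaping, so this is for
    illustration only and should not be fed untrusted input.
    """
    statements = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
    ]
    # One grant per consumer (spoke) namespace.
    for namespace in consumer_namespaces:
        statements.append(
            f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{namespace}';"
        )
    return statements


def consumer_statement(local_db: str, share: str, producer_namespace: str) -> str:
    """Build the SQL a consumer (spoke) warehouse would run to mount the share."""
    return (
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';"
    )


if __name__ == "__main__":
    # Hypothetical names and namespace IDs, for illustration only.
    for sql in producer_statements("sales_share", "sales", ["consumer-ns-1", "consumer-ns-2"]):
        print(sql)
    print(consumer_statement("sales_from_hub", "sales_share", "producer-ns"))
```

In a hub-and-spoke layout the hub would typically be the only producer. In a data mesh each domain could act as producer for the data it owns and as consumer for what other domains share.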

Key Features of Multi-Warehouse Architectures

Redshift Managed Storage and Compute

  • Redshift Managed Storage provides highly optimized columnar storage for analytics
  • Hybrid compute model: provisioned clusters and Redshift Serverless workgroups can be mixed and matched based on workload needs

Federated Permissions Management

  • Centralized management of fine-grained access control policies across multiple Redshift clusters
  • Policies tied to user identity and data sovereignty requirements

Integration with Data Lake and Ecosystem

  • Ability to query data in Redshift and open table formats like Apache Iceberg
  • 2x performance improvements for Iceberg queries using Redshift Serverless
  • Iceberg write support for append-only workloads
  • Integration with SageMaker Unified Studio for end-to-end data and AI workflows
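
As a rough illustration of querying warehouse tables together with Iceberg tables in the data lake, the sketch below builds two SQL strings: one that maps a Glue Data Catalog database into Redshift as an external schema, and one that joins a local table to a lake table. The schema, table, column, and IAM role names are all hypothetical, and the setup used in the talk may differ.

```python
def external_schema_sql(schema: str, catalog_db: str, iam_role_arn: str) -> str:
    """SQL that exposes a Glue Data Catalog database as an external schema.

    Identifiers are interpolated without escaping; illustration only.
    """
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG DATABASE '{catalog_db}'\n"
        f"IAM_ROLE '{iam_role_arn}';"
    )


def combined_report_sql(local_table: str, lake_table: str, key: str) -> str:
    """SQL that joins a warehouse table to a lake table on a shared key."""
    return (
        f"SELECT w.{key}, COUNT(*) AS lake_rows\n"
        f"FROM {local_table} AS w\n"
        f"JOIN {lake_table} AS l ON l.{key} = w.{key}\n"
        f"GROUP BY w.{key};"
    )


if __name__ == "__main__":
    # Hypothetical names; the role ARN is a placeholder, not a real role.
    print(external_schema_sql(
        "lake", "lake_db", "arn:aws:iam::123456789012:role/example-redshift-role"
    ))
    print(combined_report_sql("public.products", "lake.product_events", "product_id"))
```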

AI-Powered Use Cases

  • Natural language querying of Redshift data using Amazon Bedrock
  • Embedding Redshift data as knowledge bases for generative AI applications
  • Integration with AWS MCP (Model Context Protocol) servers for AI orchestration

Vanguard's Journey with Multi-Warehouse Architectures

  • Started with a centralized data warehouse on Redshift, unlocking BI and analytics use cases
  • Faced challenges with resource contention, workload management complexity, and scaling as data and use cases grew
  • Transitioned to a multi-warehouse "hub and spoke" architecture:
    • Separate Redshift clusters for ETL, analytics, and data science workloads
    • Improved SLAs, analyst experience, and workload isolation
  • Moving towards a "data mesh" architecture:
    • Separate data ownership and compute for different business domains
    • Leveraging Iceberg tables and Redshift Serverless for increased agility
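
The workload isolation described above amounts to a routing decision: each class of work is sent to its own warehouse. A minimal sketch of that idea follows. The warehouse names and the provisioned/serverless assignments are assumptions made for illustration, not Vanguard's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Warehouse:
    name: str
    kind: str  # "provisioned" or "serverless"


# Hypothetical routing table: one warehouse per workload class.
ROUTES = {
    "etl": Warehouse("etl-wh", "provisioned"),
    "analytics": Warehouse("analytics-wh", "serverless"),
    "data_science": Warehouse("ds-wh", "serverless"),
}


def route(workload: str) -> Warehouse:
    """Return the warehouse assigned to a workload class."""
    try:
        return ROUTES[workload]
    except KeyError:
        raise ValueError(f"no warehouse configured for workload {workload!r}") from None


if __name__ == "__main__":
    for workload in ("etl", "analytics", "data_science"):
        target = route(workload)
        print(f"{workload} -> {target.name} ({target.kind})")
```

Keeping the mapping explicit means an ad-hoc workload cannot silently land on the ETL warehouse; an unknown workload class fails loudly instead.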

Key Lessons and Best Practices

  • Start simple and gradually evolve the architecture as needs grow
  • Collaborate with AWS solution architects to identify and adopt new features
  • Track key metrics like active users, storage, costs, and query performance
  • Embrace a flexible, multi-layered architecture to meet diverse and evolving business requirements
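
To make the metrics bullet concrete, here is a small sketch that derives active users, query count, and a 95th-percentile duration from a list of query records. The `QueryRecord` shape is hypothetical; in practice the records would come from whatever query history the warehouse exposes. The percentile uses the nearest-rank method, which is one of several reasonable definitions.

```python
from dataclasses import dataclass


@dataclass
class QueryRecord:
    user: str
    duration_ms: float


def summarize(records: list[QueryRecord]) -> dict:
    """Compute simple usage and performance metrics from query records."""
    if not records:
        raise ValueError("no query records to summarize")
    durations = sorted(r.duration_ms for r in records)
    n = len(durations)
    # Nearest-rank p95: the smallest 1-based rank covering at least 95% of
    # the records, i.e. ceil(0.95 * n), computed with integer arithmetic.
    rank = (95 * n + 99) // 100
    return {
        "active_users": len({r.user for r in records}),
        "query_count": n,
        "p95_duration_ms": durations[rank - 1],
    }


if __name__ == "__main__":
    sample = [
        QueryRecord("alice", 100.0),
        QueryRecord("bob", 200.0),
        QueryRecord("alice", 300.0),
        QueryRecord("carol", 4000.0),
    ]
    print(summarize(sample))
```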

Technical Details and Metrics

  • Vanguard's data landscape:
    • 20 TB in Redshift Managed Storage
    • 150 TB in S3 data lake
    • 600 tables, 400 views, 100 active users
    • 500,000+ queries per month, powered by thousands of batch jobs
  • Redshift Serverless workgroups used for workload isolation and improved performance
  • Apache Iceberg used as the open table format for the data lake
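
For a sense of scale, the figures above can be turned into a few derived numbers. This is back-of-the-envelope arithmetic only: it assumes a 30-day month and treats "500,000+" as exactly 500,000, so the query rates are lower bounds on the average and say nothing about peak load.

```python
# Figures quoted in the summary.
rms_tb = 20                  # Redshift Managed Storage
lake_tb = 150                # S3 data lake
queries_per_month = 500_000  # stated as "500,000+", so a lower bound

# Assumption for illustration: a 30-day month.
days_per_month = 30

total_tb = rms_tb + lake_tb
rms_share = rms_tb / total_tb
queries_per_day = queries_per_month / days_per_month
queries_per_hour = queries_per_day / 24

print(f"total data: {total_tb} TB")
print(f"share in managed storage: {rms_share:.1%}")
print(f"average queries per day: {queries_per_day:,.0f}")
print(f"average queries per hour: {queries_per_hour:,.0f}")
```

Under these assumptions roughly 12% of the data sits in managed storage, and the average load is at least about 16,700 queries per day, or around 690 per hour.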

Business Impact

  • Enabled new use cases like comparative product analysis, leading to new product offerings
  • Improved analyst experience and self-service analytics capabilities
  • Increased agility and reduced coordination overhead through the data mesh architecture
  • Ensured business-critical workloads (ETL, reporting) are isolated from ad-hoc queries and AI use cases

Example Use Cases

  • Ingesting real-time sales data from an Oracle database using a zero-ETL integration
  • Combining data from the data warehouse and data lake (Iceberg) for reporting
  • Exposing Redshift data as a knowledge base for generative AI applications using Amazon Bedrock
