AWS re:Invent 2025 - Scaling Amazon Redshift with a multi-warehouse architecture (ANT318)
Overview of Multi-Warehouse Architectures
Addresses challenges of workload interference and resource contention in monolithic data warehouse architectures
Introduces two key design patterns:
Hub and Spoke: separate compute clusters for different workloads (e.g., streaming, batch, analytics, data science) sharing a centralized data store
Data Mesh: separate compute and data ownership for each business unit or team, with controlled data sharing between them
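In Redshift, both patterns are commonly built on data sharing: a producer warehouse shares schemas held in Redshift Managed Storage, and consumer warehouses query them without copying the data. The minimal sketch below generates the producer-side and consumer-side statements for a hub-and-spoke setup. The share, schema, and database names and the namespace IDs are hypothetical placeholders (real namespace IDs are GUIDs).

```python
"""Hub-and-spoke sketch: generate Redshift data sharing statements.

The hub (producer) shares a schema; each spoke (consumer) warehouse mounts
it as a database. All names and namespace IDs are hypothetical placeholders.
"""


def producer_sql(share: str, schema: str, consumer_namespaces: list[str]) -> list[str]:
    """Statements to run on the hub (producer) warehouse."""
    stmts = [
        f"CREATE DATASHARE {share};",
        f"ALTER DATASHARE {share} ADD SCHEMA {schema};",
        f"ALTER DATASHARE {share} ADD ALL TABLES IN SCHEMA {schema};",
    ]
    # One grant per spoke warehouse, identified by its namespace ID.
    for ns in consumer_namespaces:
        stmts.append(f"GRANT USAGE ON DATASHARE {share} TO NAMESPACE '{ns}';")
    return stmts


def consumer_sql(share: str, local_db: str, producer_namespace: str) -> str:
    """Statement to run on each spoke (consumer) warehouse."""
    return (
        f"CREATE DATABASE {local_db} FROM DATASHARE {share} "
        f"OF NAMESPACE '{producer_namespace}';"
    )


if __name__ == "__main__":
    for stmt in producer_sql("sales_share", "sales", ["spoke-bi-ns", "spoke-ds-ns"]):
        print(stmt)
    print(consumer_sql("sales_share", "sales_from_hub", "hub-ns"))
```

Because each spoke reads through its own compute, a heavy data science query on one spoke does not consume the hub's or another spoke's compute resources.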
Key Features of Multi-Warehouse Architectures
Redshift Managed Storage and Compute
Redshift Managed Storage (RMS) provides columnar storage optimized for analytics and decoupled from compute
Hybrid compute model: mix and match provisioned clusters and Redshift Serverless workgroups based on each workload's needs
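One way to picture the hybrid model is a routing table that assigns each workload to a warehouse of a given compute type. The sketch below is purely illustrative: the warehouse names are hypothetical, and placing steady, predictable jobs on provisioned compute and bursty or ad-hoc work on serverless is an assumed rule of thumb, not guidance from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Warehouse:
    name: str
    compute: str  # "provisioned" or "serverless"


# Hypothetical routing table. Assumption: steady, predictable jobs run on
# provisioned clusters; bursty or ad-hoc work runs on serverless workgroups.
ROUTES = {
    "etl": Warehouse("etl-cluster", "provisioned"),
    "bi": Warehouse("bi-cluster", "provisioned"),
    "adhoc": Warehouse("analyst-serverless", "serverless"),
    "data_science": Warehouse("ds-serverless", "serverless"),
}


def route(workload: str) -> Warehouse:
    """Return the warehouse for a workload; fail loudly if it is unmapped."""
    try:
        return ROUTES[workload]
    except KeyError:
        raise ValueError(f"no warehouse configured for workload {workload!r}") from None


if __name__ == "__main__":
    print(route("adhoc"))
```

Failing on unmapped workloads keeps new use cases from silently landing on a business-critical warehouse.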
Federated Permissions Management
Centralized management of fine-grained access control policies across multiple Redshift warehouses
Policies tied to user identity and data sovereignty requirements
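The idea behind federated permissions is that policies are defined once and every warehouse enforces the same result. The conceptual model below illustrates this in plain Python; it is not the Redshift API. The `User` and `PolicyStore` types, the role-per-table rule, and the region-based row filter (standing in for a data sovereignty requirement) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class User:
    name: str
    region: str               # hypothetical identity attribute, e.g. from the IdP
    roles: frozenset[str]


@dataclass
class PolicyStore:
    """One central policy store consulted by every warehouse (conceptual only)."""
    table_roles: dict[str, str] = field(default_factory=dict)     # table -> required role
    region_columns: dict[str, str] = field(default_factory=dict)  # table -> residency column

    def filter_rows(self, user: User, table: str, rows: list[dict]) -> list[dict]:
        # Default deny: a table with no policy entry is unreadable.
        if self.table_roles.get(table) not in user.roles:
            raise PermissionError(f"{user.name} may not read {table}")
        col = self.region_columns.get(table)
        if col is None:
            return list(rows)
        # Row-level filter: users only see rows from their own region.
        return [r for r in rows if r[col] == user.region]


if __name__ == "__main__":
    store = PolicyStore({"sales": "analyst"}, {"sales": "region"})
    rows = [{"id": 1, "region": "eu"}, {"id": 2, "region": "us"}]
    eu_analyst = User("ana", "eu", frozenset({"analyst"}))
    # Any warehouse calling the same store gets the same filtered view.
    print(store.filter_rows(eu_analyst, "sales", rows))
```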
Integration with Data Lake and Ecosystem
Query data in Redshift Managed Storage alongside data lake tables in open table formats such as Apache Iceberg
2x performance improvements for Iceberg queries using Redshift Serverless
Iceberg write support for append-only workloads
Integration with SageMaker Unified Studio for end-to-end data and AI workflows
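Querying Redshift and Iceberg data together typically means mapping an AWS Glue Data Catalog database into Redshift as an external schema and then joining its tables with local ones. The sketch below builds those two statements, assuming the Iceberg table is already registered in the Glue Data Catalog; every schema, table, database, and role name is hypothetical.

```python
"""Sketch: combine a Redshift-managed table with an Iceberg table in the lake.

Assumes the Iceberg table is registered in the AWS Glue Data Catalog.
All schema, table, database, and role names are hypothetical.
"""


def external_schema_sql(schema: str, glue_db: str, role_arn: str) -> str:
    """Map a Glue Data Catalog database into Redshift as an external schema."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema} "
        f"FROM DATA CATALOG DATABASE '{glue_db}' "
        f"IAM_ROLE '{role_arn}';"
    )


def combined_report_sql(lake_schema: str) -> str:
    """Join an Iceberg fact table in the lake with a dimension table in RMS."""
    return (
        "SELECT d.region_name, SUM(f.amount) AS total_sales\n"
        f"FROM {lake_schema}.sales_history AS f\n"  # Iceberg table (data lake)
        "JOIN public.dim_region AS d\n"              # local table (RMS)
        "  ON d.region_id = f.region_id\n"
        "GROUP BY d.region_name;"
    )


if __name__ == "__main__":
    print(external_schema_sql("lake", "sales_lake", "arn:aws:iam::123456789012:role/Example"))
    print(combined_report_sql("lake"))
```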
AI-Powered Use Cases
Natural language querying of Redshift data using Amazon Bedrock
Exposing Redshift data as Amazon Bedrock knowledge bases for generative AI applications
Integration with Model Context Protocol (MCP) servers for AI agent orchestration
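When a model translates natural language into SQL, a common client-side safeguard is to supply schema context in the prompt and to refuse anything that is not a single read-only query. The sketch below shows both steps without calling Amazon Bedrock; the prompt format is hypothetical, and the keyword check is a deliberately conservative assumption (it may reject harmless queries, for example one with the word "update" inside a string literal), not a complete SQL parser.

```python
import re

# Write and DDL keywords that an AI-generated query must not contain (non-exhaustive).
_FORBIDDEN = re.compile(
    r"\b(insert|update|delete|merge|drop|alter|create|grant|revoke|truncate|copy|unload)\b",
    re.IGNORECASE,
)


def build_prompt(question: str, schema: dict[str, list[str]]) -> str:
    """Assemble a text-to-SQL prompt listing tables and columns (hypothetical format)."""
    lines = [f"- {table}({', '.join(cols)})" for table, cols in sorted(schema.items())]
    return (
        "Write one Amazon Redshift SELECT statement that answers the question.\n"
        "Tables:\n" + "\n".join(lines) + f"\nQuestion: {question}\nSQL:"
    )


def is_read_only(sql: str) -> bool:
    """Accept a single SELECT or WITH statement containing no write/DDL keywords."""
    stmt = sql.strip().rstrip(";").strip()
    if ";" in stmt:  # reject multi-statement payloads
        return False
    if not re.match(r"(select|with)\b", stmt, re.IGNORECASE):
        return False
    return _FORBIDDEN.search(stmt) is None


if __name__ == "__main__":
    print(build_prompt("Total sales by region?", {"sales": ["region", "amount"]}))
    print(is_read_only("SELECT region, SUM(amount) FROM sales GROUP BY region;"))
```

Running generated queries on an isolated warehouse adds a second layer of protection for business-critical workloads.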
Vanguard's Journey with Multi-Warehouse Architectures
Started with a centralized data warehouse on Redshift, unlocking BI and analytics use cases
Faced challenges with resource contention, workload management complexity, and scaling as data and use cases grew
Transitioned to a multi-warehouse "hub and spoke" architecture:
Separate Redshift clusters for ETL, analytics, and data science workloads
Improved SLAs, analyst experience, and workload isolation
Moving towards a "data mesh" architecture:
Separate data ownership and compute for different business domains
Leveraging Iceberg tables and Redshift Serverless for increased agility
Key Lessons and Best Practices
Start simple and gradually evolve the architecture as needs grow
Collaborate with AWS solution architects to identify and adopt new features
Track key metrics like active users, storage, costs, and query performance
Embrace a flexible, multi-layered architecture to meet diverse and evolving business requirements
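Tracking metrics such as active users and query performance can start as a simple monthly rollup of query history. In Redshift, the raw records could come from system views such as `SYS_QUERY_HISTORY`; the `(user, month, duration_s)` tuple shape used below is a hypothetical simplification.

```python
from collections import defaultdict
from statistics import median


def monthly_metrics(query_log):
    """Summarize hypothetical (user, month, duration_s) records per month."""
    users = defaultdict(set)
    durations = defaultdict(list)
    for user, month, duration_s in query_log:
        users[month].add(user)
        durations[month].append(duration_s)
    return {
        month: {
            "active_users": len(users[month]),
            "queries": len(durations[month]),
            "median_duration_s": median(durations[month]),
        }
        for month in sorted(durations)
    }


if __name__ == "__main__":
    log = [("ana", "2025-01", 2.0), ("bo", "2025-01", 4.0), ("ana", "2025-02", 1.0)]
    print(monthly_metrics(log))
```

The median is used instead of the mean so that a few very long queries do not mask a typical analyst's experience.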
Technical Details and Metrics
Vanguard's data landscape:
20 TB in Redshift Managed Storage
150 TB in S3 data lake
600 tables, 400 views, 100 active users
500,000+ queries per month, powered by thousands of batch jobs
Redshift Serverless workgroups used for workload isolation and improved performance
Apache Iceberg used as the open table format for the data lake
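A quick back-of-the-envelope check puts these figures in perspective. The calculation below assumes a 30-day month and an even spread of queries over time (real load is likely bursty), and treats "500,000+" as exactly 500,000, so the rates are lower bounds.

```python
QUERIES_PER_MONTH = 500_000  # lower bound from "500,000+"
DAYS_PER_MONTH = 30          # assumption: 30-day month

per_day = QUERIES_PER_MONTH / DAYS_PER_MONTH  # about 16,667 queries per day
per_minute = per_day / (24 * 60)              # about 11.6 queries per minute

rms_tb, lake_tb = 20, 150
lake_share = lake_tb / (rms_tb + lake_tb)     # about 0.88 of all data sits in S3

if __name__ == "__main__":
    print(f"{per_day:,.0f}/day, {per_minute:.1f}/min, {lake_share:.0%} in the lake")
```

Roughly 88% of the data lives in the S3 data lake, which is why efficient Iceberg querying matters as much as Redshift Managed Storage performance.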
Business Impact
Enabled new use cases like comparative product analysis, leading to new product offerings
Improved analyst experience and self-service analytics capabilities
Increased agility and reduced coordination overhead through the data mesh architecture
Ensured business-critical workloads (ETL, reporting) are isolated from ad-hoc queries and AI use cases
Example Use Cases
Ingesting real-time sales data from an Oracle database using a zero-ETL integration
Combining data from the data warehouse and data lake (Iceberg) for reporting
Exposing Redshift data as a knowledge base for generative AI applications using Amazon Bedrock