AWS re:Invent 2025 - Building resilient multi-Region applications with Capital One (ARC404)

Building Resilient Multi-Region Applications with Capital One

Motivation for Multi-Region Architectures

  • Many applications require geographic separation and disaster recovery capabilities due to regulatory and compliance requirements, especially in industries like finance, healthcare, and government.
  • Critical applications with high availability needs also benefit from multi-region architectures to provide predictable, bounded recovery during regional disruptions.
  • Capital One has evolved their approach from treating multi-region as a specialized DR capability to making it a fundamental part of their application architecture.

Identifying and Managing Dependencies

  • Dependencies, whether on AWS services, internal systems, third-party services, or configuration requirements, can ruin recovery plans if not properly identified and accounted for.
  • Identifying dependencies is challenging because they are often "out of sight, out of mind" and not reflected in application architecture diagrams.
  • Capital One uses a mental model of "runtime recovery dependencies" and "runtime dependencies" to systematically identify and manage dependencies.
    • AWS Capabilities in AWS Builder Center helps identify service and feature availability across regions.
    • Dependency chain mapping and network traffic blocking during failover testing uncover hidden dependencies.
    • Centralizing platform capabilities and automating best practices is key to managing dependencies at scale.
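
To make dependency chain mapping concrete, here is a minimal sketch, not Capital One's tooling. It walks a dependency graph and flags any direct or indirect dependency that has no deployment outside the failed Region. The component names, the `DEPENDS_ON` and `DEPLOYED_IN` maps, and both function names are hypothetical and exist only for illustration.

```python
from collections import deque

# Hypothetical dependency map: component -> components it calls at runtime.
DEPENDS_ON = {
    "payments-api": ["auth-service", "ledger-db"],
    "auth-service": ["token-cache", "config-store"],
    "ledger-db": [],
    "token-cache": [],
    "config-store": [],
}

# Hypothetical inventory: Regions in which each component is deployed.
DEPLOYED_IN = {
    "payments-api": {"us-east-1", "us-west-2"},
    "auth-service": {"us-east-1", "us-west-2"},
    "ledger-db": {"us-east-1", "us-west-2"},
    "token-cache": {"us-east-1", "us-west-2"},
    "config-store": {"us-east-1"},  # single-Region: a hidden dependency
}


def transitive_dependencies(root):
    """Breadth-first walk returning every direct and indirect dependency of root."""
    seen = set()
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for dep in DEPENDS_ON.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def failover_blockers(root, failed_region):
    """Dependencies that are deployed nowhere except the failed Region."""
    return sorted(
        dep
        for dep in transitive_dependencies(root)
        if not (DEPLOYED_IN.get(dep, set()) - {failed_region})
    )


print(failover_blockers("payments-api", "us-east-1"))  # ['config-store']
```

In this toy data, `config-store` never appears as a direct dependency of `payments-api`, which is the kind of second-hop dependency that an architecture diagram tends to omit.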

Reliable Recovery Orchestration

  • Manual recovery processes using runbooks or repurposed deployment pipelines are error-prone and lack visibility.
  • AWS Application Recovery Controller's "Region Switch" capability provides a fully managed, highly available orchestration service to automate regional failover.
    • Supports sequential and parallel execution of recovery steps like scaling compute, failing over databases, and updating DNS.
    • Performs continuous plan validation and executes recovery from the target region to ensure reliability.
    • Provides flexibility to incorporate custom recovery actions through Lambda functions.
  • Capital One's journey evolved from slow, manual processes to fully automated, dependency-aware recovery workflows that reduced average recovery time by 70%.
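
The idea of mixing sequential and parallel recovery steps can be sketched in a few lines. This is a generic illustration under stated assumptions, not the Region Switch API and not Capital One's workflow engine. The step functions, the `PLAN` structure, and `run_plan` are hypothetical stand-ins that only return strings instead of calling real services.

```python
from concurrent.futures import ThreadPoolExecutor


# Hypothetical recovery steps; real ones would call cloud APIs.
def scale_compute(region):
    return f"scaled compute in {region}"


def fail_over_database(region):
    return f"promoted database in {region}"


def update_dns(region):
    return f"pointed DNS at {region}"


# Stages run one after another; steps inside a stage run in parallel.
PLAN = [
    [scale_compute, fail_over_database],
    [update_dns],
]


def run_plan(plan, target_region):
    """Run each stage in order and wait for all of its steps before moving on."""
    log = []
    with ThreadPoolExecutor() as pool:
        for stage in plan:
            futures = [pool.submit(step, target_region) for step in stage]
            # result() re-raises any step failure, which halts the plan
            # before the next stage starts.
            log.extend(future.result() for future in futures)
    return log


for entry in run_plan(PLAN, "us-west-2"):
    print(entry)
```

Placing the DNS update in its own final stage means traffic is redirected only after compute and the database in the target Region report success.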

Ensuring Data Consistency

  • Data consistency is a key challenge in multi-region architectures: under the CAP theorem, a system must trade consistency against availability whenever a network partition occurs.
  • Initial solutions using Aurora Global Database and async DynamoDB replication favored availability over consistency.
  • Aurora DSQL and DynamoDB Multi-Region Strong Consistency enable true active-active architectures with synchronous data replication across regions.
    • Requires a third "witness" region but provides stronger consistency guarantees.
    • Reduces availability during network partitions but ensures data integrity.
  • Practicing recovery under realistic conditions, not just ideal scenarios, is crucial for validating data consistency.
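
A generic majority-quorum model shows why a third Region matters and why availability drops during a partition. This is a simplified sketch, not the internal protocol of Aurora DSQL or DynamoDB. The Region labels and the `can_commit` function are hypothetical.

```python
def can_commit(reachable, regions):
    """A write commits only if a strict majority of Regions can acknowledge it."""
    regions = set(regions)
    acks = len(set(reachable) & regions)
    return acks > len(regions) // 2


THREE = {"region-a", "region-b", "witness"}
TWO = {"region-a", "region-b"}

# Two active Regions plus a witness: the side that still reaches the witness
# keeps accepting writes, while the isolated side stops.
print(can_commit({"region-b", "witness"}, THREE))  # True
print(can_commit({"region-a"}, THREE))             # False

# Only two Regions: a partition leaves neither side with a majority.
print(can_commit({"region-a"}, TWO))               # False
print(can_commit({"region-b"}, TWO))               # False
```

The isolated Region refusing writes is the reduced availability noted above. In exchange, two sides of a partition can never both accept conflicting writes.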

Key Takeaways

  1. Identify and manage both runtime recovery dependencies and runtime dependencies through testing, analysis, and automation.
  2. Implement reliable, automated recovery orchestration to reduce errors and improve recovery time.
  3. Choose data consistency strategies carefully, leveraging services like Aurora DSQL and DynamoDB Multi-Region Strong Consistency.
  4. Practice recovery under realistic conditions to validate end-to-end resilience.

Resources

  • Multi-Region Fundamentals whitepaper
  • AWS Fault Injection Service for cross-region connectivity testing
  • AWS Application Recovery Controller Region Switch capability
  • AWS Builder Center's AWS Capabilities for service/feature availability visibility
  • Capital One's in-house recovery automation tooling
