AWS re:Invent 2025 - Architecting resilient multicloud operations, feat. Monzo Bank (HMC201)

Architecting Resilient Multicloud Operations: Lessons from Monzo Bank

Understanding Multicloud and Resilience

  • Multicloud refers to operating business applications in more than one cloud service provider (CSP)
  • Resilience is about preventing, mitigating, and recovering from failures as quickly as possible across the application stack
  • Resilience involves practices like high availability (HA) and disaster recovery (DR), as well as broader considerations around observability, automation, and alignment of people, processes, and technology

The SEEMS Framework for Resilience

  • The SEEMS framework names five failure categories (single points of failure, excessive load, excessive latency, misconfiguration and bugs, shared fate) and pairs each with the capability a resilient workload needs to counter it:
    1. Redundancy: Avoiding single points of failure
    2. Sufficient Capacity: Handling excessive load
    3. Timely Output: Preventing excessive latency
    4. Correct Output: Avoiding misconfiguration and bugs
    5. Fault Isolation: Preventing shared fate across boundaries
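
One way to put the five categories to work is as a review checklist: for each failure category, record the mitigations a workload already has and flag the categories with none. The sketch below is illustrative only; the function name and the sample mitigations are assumptions, not something presented in the talk.

```python
# Failure categories as listed above, in the same order.
FAILURE_CATEGORIES = (
    "single points of failure",
    "excessive load",
    "excessive latency",
    "misconfiguration and bugs",
    "shared fate",
)


def unaddressed_categories(mitigations: dict[str, list[str]]) -> list[str]:
    """Return the categories that have no recorded mitigation, in list order."""
    return [c for c in FAILURE_CATEGORIES if not mitigations.get(c)]


# Hypothetical review notes for a single workload.
review = {
    "single points of failure": ["secondary environment in another CSP"],
    "excessive load": ["load tests against both environments"],
    "shared fate": ["separate CI/CD pipeline per provider"],
}

gaps = unaddressed_categories(review)
print(gaps)
```

Here the review has nothing recorded for excessive latency or for misconfiguration and bugs, so those two categories are reported as gaps.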

Multicloud Resilience Best Practices

  1. Leverage Fault Isolation Boundaries: CSPs provide inherent fault isolation, allowing failures to be contained within a single provider.
  2. Implement a "Lifeboat" Strategy: Deploy a minimal set of critical functionality in a secondary CSP to act as a backup when the primary fails.
  3. Understand Data Access Patterns: Evaluate how data is shared between the primary and secondary environments to manage load and latency.
  4. Avoid Single Points of Failure: Carefully design communication, CI/CD, security, and network components to prevent single points of failure.
  5. Test Extensively: Regularly test the system under load, measure latency, and validate behavior in both the primary and secondary environments.
  6. Align People, Processes, and Technology: Ensure resilience is embedded across the software development lifecycle, with clear roles, responsibilities, and documented procedures.
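
The lifeboat idea can be sketched as a small routing rule: while the primary is healthy everything goes there, and once it is judged unavailable only the critical operations are served from the secondary. This is a minimal sketch under stated assumptions; the class name, the operation names, and the threshold of three consecutive failed health probes are all hypothetical and not taken from the talk.

```python
from dataclasses import dataclass

# Hypothetical operation names; which operations count as critical is an assumption.
CRITICAL_OPERATIONS = frozenset({"card_payment", "balance_lookup"})


@dataclass
class LifeboatRouter:
    failure_threshold: int = 3      # consecutive failed probes before failing over (assumed)
    consecutive_failures: int = 0

    def record_probe(self, primary_healthy: bool) -> None:
        """Reset the counter on a healthy probe, otherwise increment it."""
        self.consecutive_failures = (
            0 if primary_healthy else self.consecutive_failures + 1
        )

    @property
    def primary_available(self) -> bool:
        return self.consecutive_failures < self.failure_threshold

    def route(self, operation: str) -> str:
        """Pick the environment that should serve the operation."""
        if self.primary_available:
            return "primary"
        if operation in CRITICAL_OPERATIONS:
            return "secondary"      # the lifeboat serves only critical functionality
        return "unavailable"        # non-critical features degrade gracefully


router = LifeboatRouter()
for healthy in (True, False, False, False):
    router.record_probe(healthy)

routes = {op: router.route(op) for op in ("card_payment", "savings_report")}
print(routes)
```

Requiring several consecutive failures before switching is a design choice to avoid flapping on a single bad probe. A real deployment would also need the probe and the router themselves to run outside the primary provider, so that they do not become a single point of failure.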

Monzo Bank's Multicloud Resilience Strategy

  • Monzo, a digital-only bank, built a "Monzo Standin" platform in Google Cloud as a secondary environment to their primary AWS platform.
  • Key features of Monzo's approach:
    • Monzo Standin is a simplified version of the primary application, focused on critical functionality like payments and account management.
    • Data is continuously synchronized from the primary platform to Monzo Standin using an event-driven architecture.
    • Monzo can automatically or manually route traffic to Monzo Standin in the event of an outage in the primary platform.
    • Monzo Standin is tested daily by enrolling a subset of real customers to use the platform, and by running shadow testing to compare decisions between the two environments.
    • Monzo Standin can directly connect to payment networks if the primary platform is unavailable, avoiding a single point of failure.
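
Shadow testing, as described above, can be sketched as follows: every request is answered by the primary, the same request is also evaluated by the secondary, and any disagreement is recorded for review. This illustrates the general technique, not Monzo's implementation; the class, the two decision functions, and the 500 payment cap are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class ShadowTester:
    primary: Callable[[dict], str]
    secondary: Callable[[dict], str]
    total: int = 0
    mismatches: list = field(default_factory=list)

    def handle(self, request: dict) -> str:
        """Serve the primary's decision and compare it with the secondary's."""
        served = self.primary(request)
        try:
            shadow = self.secondary(request)
        except Exception as exc:  # a shadow failure must never affect the customer
            shadow = f"error: {exc!r}"
        self.total += 1
        if shadow != served:
            self.mismatches.append((request, served, shadow))
        return served

    def mismatch_rate(self) -> float:
        return len(self.mismatches) / self.total if self.total else 0.0


# Hypothetical decision logic: the secondary is simpler and caps payment size.
def primary_decision(req: dict) -> str:
    return "approve" if req["amount"] <= req["balance"] else "decline"


def secondary_decision(req: dict) -> str:
    ok = req["amount"] <= req["balance"] and req["amount"] <= 500
    return "approve" if ok else "decline"


tester = ShadowTester(primary_decision, secondary_decision)
for req in (
    {"amount": 20, "balance": 100},
    {"amount": 600, "balance": 1000},
    {"amount": 50, "balance": 10},
):
    tester.handle(req)

print(tester.mismatches)
```

Only the second request produces a mismatch, because the simplified secondary declines a payment that the primary approves. Tracking the mismatch rate over time shows whether the two environments are drifting apart.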

Lessons and Considerations

  • Monzo Standin costs only 1% of what the primary platform costs to run, even though it runs continuously.
  • The reduced complexity of Monzo Standin makes it easier to maintain and test, with fewer than 1% of changes made explicitly for the secondary platform.
  • This strategy is most effective for organizations that:
    • Operate both client-side and server-side components
    • Cannot tolerate any downtime for critical business functions
    • Can accept certain trade-offs, such as reduced functionality in the secondary environment

Additional Resources

  • AWS Multicloud page: [link]
  • AWS Multicloud blog: [link]
  • Monzo's blog post on Monzo Standin: [link]
  • Report on UK aviation outage: [link]
