AWS re:Invent 2025 - Building at global scale: Engineering AWS expansion (ARC312)

Building at Global Scale: Engineering AWS Expansion

AWS Global Infrastructure Overview

AWS has launched 38 regions, with more in development (including the EU sovereign cloud, Chile, and the Kingdom of Saudi Arabia)

Regions are the highest level of abstraction, each comprising three or more availability zones

Availability zones consist of one or more isolated data centers within a region, with redundant power, networking, and connectivity, designed for high resiliency

Local zones are extensions of a region, providing low-latency compute and storage closer to end users

Resilient Architecture

Understanding the scope of services is key for building resilient systems

Zonal services (e.g. EC2) vs. regional services (e.g. S3, DynamoDB)
Leveraging multiple availability zones within a region can provide sufficient resiliency for most use cases
Multi-region strategies add complexity but may be required for certain workloads
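The multi-AZ point above can be made concrete with a little capacity arithmetic. The following is a minimal sketch, not something shown in the talk: it assumes a workload needs a fixed number of instances to serve peak load and should keep that capacity if one availability zone is lost. The function names and zone labels (`az-a`, `az-b`, `az-c`) are hypothetical.

```python
import math


def per_zone_capacity(required: int, zones: int, tolerated_zone_failures: int = 1) -> int:
    """Instances to run in each zone so that `required` instances remain
    after `tolerated_zone_failures` zones are lost."""
    surviving = zones - tolerated_zone_failures
    if surviving < 1:
        raise ValueError("need at least one surviving zone")
    # Round up so the surviving zones always cover the full requirement.
    return math.ceil(required / surviving)


def placement(required: int, zone_names: list[str], tolerated_zone_failures: int = 1) -> dict[str, int]:
    """Spread the over-provisioned capacity evenly across the given zones."""
    per_zone = per_zone_capacity(required, len(zone_names), tolerated_zone_failures)
    return {zone: per_zone for zone in zone_names}


# Hypothetical workload: 12 instances needed, three zones, survive losing one.
plan = placement(12, ["az-a", "az-b", "az-c"])
print(plan)  # {'az-a': 6, 'az-b': 6, 'az-c': 6}
```

With three zones the workload runs 18 instances to guarantee 12 after a zone failure. The extra capacity is already in place, so nothing has to be launched during the failure itself.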

AWS focuses on building resilience into its services from the ground up

Evolving Region Build Processes

Early region builds involved manually configuring each availability zone ("region bootstrap ninjas")

This was error-prone and difficult to scale as the global footprint expanded

Modern region builds leverage a "bootstrap region" to parallelize the physical and software build processes

The bootstrap region is used to pre-build core services and infrastructure
This allows the new region to be launched more quickly by migrating the pre-built components

Dependency Management Challenges

The AWS service ecosystem has grown extremely complex, with hundreds of interdependent services

Attempting to map and orchestrate all dependencies is impractical due to the dynamic nature of the system
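To illustrate why a static dependency map goes stale, here is a small sketch using Python's standard `graphlib` module. It is an illustration only, not AWS tooling: the service names and the two dependency snapshots are hypothetical. A build order computed from one snapshot stops being valid once a service changes what it depends on.

```python
from graphlib import TopologicalSorter


def build_order(deps: dict[str, set[str]]) -> list[str]:
    """Return an order in which every service comes after its dependencies."""
    return list(TopologicalSorter(deps).static_order())


def order_is_valid(order: list[str], deps: dict[str, set[str]]) -> bool:
    """Check that each dependency appears before the service that needs it."""
    position = {name: i for i, name in enumerate(order)}
    return all(
        position[dep] < position[service]
        for service, service_deps in deps.items()
        for dep in service_deps
    )


# Hypothetical snapshot: each service maps to the services it depends on.
snapshot = {
    "identity": set(),
    "network": {"identity"},
    "compute": {"identity", "network"},
}
order = build_order(snapshot)
print(order)  # ['identity', 'network', 'compute']

# Later, the dependencies shift: network now needs compute instead.
current = {
    "identity": set(),
    "network": {"identity", "compute"},
    "compute": {"identity"},
}
print(order_is_valid(order, snapshot))  # True
print(order_is_valid(order, current))   # False
```

With three services the stale order is easy to spot and recompute. With hundreds of services changing independently, any centrally maintained map is likely to be out of date by the time it is used.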

AWS relies on "static stability": a service keeps operating on its last known good state when a dependency is impaired, so it can recover gracefully without relying on perfect dependency resolution
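One common way to approximate static stability in application code is to keep serving the last known good value when a dependency cannot be reached. The sketch below assumes a hypothetical configuration service reached through a `fetch` callable. The class name, the `ConnectionError` failure mode, and the sample data are all illustrative and do not come from the talk.

```python
from typing import Callable, Optional


class StaticallyStableConfig:
    """Serve the last known good config when the dependency is unavailable."""

    def __init__(self, fetch: Callable[[], dict]):
        self._fetch = fetch
        self._last_good: Optional[dict] = None

    def get(self) -> dict:
        try:
            # Normal path: refresh from the dependency and remember the result.
            self._last_good = self._fetch()
        except ConnectionError:
            # Dependency impaired: keep operating on cached state if we have any.
            if self._last_good is None:
                raise
        return self._last_good


# Hypothetical dependency that succeeds once, then becomes unreachable.
calls = {"count": 0}


def flaky_fetch() -> dict:
    calls["count"] += 1
    if calls["count"] > 1:
        raise ConnectionError("config service unreachable")
    return {"max_connections": 100}


config = StaticallyStableConfig(flaky_fetch)
print(config.get())  # {'max_connections': 100}
print(config.get())  # {'max_connections': 100}
```

The second call succeeds even though the dependency is down. The trade-off is that the cached value may be stale, so this pattern fits data that changes slowly.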

Continuous Improvement and Testing

AWS conducts regular "game day" exercises to test failure scenarios and validate resilience

The "Correction of Errors" (COE) process is used to thoroughly investigate incidents and drive systemic improvements

Operational Readiness Reviews (ORRs) ensure services are operationally healthy before launch

Key Takeaways

Architect services to be aware of their zonal/regional scope and dependencies

Leverage infrastructure-as-code to automate configuration and deployment

Embrace a culture of continuous improvement, with rapid feedback loops

Empower engineers to escalate issues and drive systemic changes

Test extensively to validate resilience and uncover hidden dependencies

Additional Resources

Amazon Builders' Library article on "Building Resilient Services"

Amazon Builders' Library article on "Static Stability"
