AWS re:Invent 2025 - Amazon Aurora HA and DR design patterns for global resilience (DAT442)
Resilient Database Design with Amazon Aurora
Defining Resilience
Resilience is the ability of a workload to recover from disruption, dynamically acquire resources to meet demand, and mitigate issues like misconfiguration.
Resilience has two key pillars: availability and disaster recovery (DR).
Availability is the proportion of time a workload is available for use, often measured as historical uptime.
Disaster recovery refers to techniques used to recover a workload when something goes wrong, measured by recovery time objective (RTO) and recovery point objective (RPO).
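As a rough illustration of how these measures relate, the sketch below computes availability from historical uptime and checks a recovery against RTO and RPO targets. The function names and sample figures are hypothetical and chosen only for illustration; they do not come from the talk.

```python
from datetime import timedelta


def availability(uptime: timedelta, total: timedelta) -> float:
    """Fraction of the observation period during which the workload was usable."""
    if total <= timedelta(0):
        raise ValueError("total must be a positive duration")
    return uptime / total


def downtime_budget(target: float, period: timedelta) -> timedelta:
    """Maximum downtime allowed in `period` while still meeting `target`."""
    return period * (1.0 - target)


def meets_objectives(downtime: timedelta, data_loss_window: timedelta,
                     rto: timedelta, rpo: timedelta) -> bool:
    """True if a recovery stayed within both objectives.

    downtime         -- time from the disruption until service was restored (vs. RTO)
    data_loss_window -- age of the newest data that survived the disruption (vs. RPO)
    """
    return downtime <= rto and data_loss_window <= rpo


if __name__ == "__main__":
    month = timedelta(days=30)
    print(availability(month - timedelta(minutes=30), month))
    print(downtime_budget(0.999, month))
    print(meets_objectives(timedelta(minutes=2), timedelta(seconds=1),
                           rto=timedelta(minutes=5), rpo=timedelta(seconds=5)))
```

A lower RPO limits how much recent data may be lost, while a lower RTO limits how long the workload may stay down; the two are set independently.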
Amazon Aurora MySQL and PostgreSQL
Aurora MySQL (AMS) and Aurora PostgreSQL (APG) are relational database services that combine the speed and availability of commercial databases with the simplicity and cost-effectiveness of the cloud and open source.
They are fully compatible with the open-source MySQL and PostgreSQL engines, allowing existing applications and tools to run without modification.
Handling Database Node Failures
In a self-managed database, a single node failure can lead to significant downtime and data loss.
Aurora addresses this with its storage layer, Grover, which keeps six copies of the data spread across three Availability Zones (AZs), so data remains durable even if an entire AZ fails.
Aurora's log-structured storage allows for fast, zero-RPO backups and point-in-time restores without impacting performance.
To ensure high availability, you can add read replicas, which Aurora can fail over to quickly if the primary node fails.
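The effect of keeping six copies across three AZs can be sketched with simple quorum arithmetic. The quorum sizes below (4 of 6 for writes, 3 of 6 for reads during repair) are an assumption taken from Aurora's published storage design rather than from this summary, and the AZ names are placeholders.

```python
COPIES_PER_AZ = 2
AZS = ("az1", "az2", "az3")   # placeholder AZ names
WRITE_QUORUM = 4              # assumption: 4-of-6 write quorum
READ_QUORUM = 3               # assumption: 3-of-6 read quorum (used for repair)


def healthy_copies(failed_azs=(), extra_failed_copies=0):
    """Number of storage copies still reachable after the given failures."""
    total = COPIES_PER_AZ * len(AZS)
    lost = COPIES_PER_AZ * len(set(failed_azs)) + extra_failed_copies
    return max(total - lost, 0)


def can_write(healthy):
    return healthy >= WRITE_QUORUM


def can_read(healthy):
    return healthy >= READ_QUORUM


if __name__ == "__main__":
    one_az_down = healthy_copies(failed_azs=("az1",))
    az_plus_one = healthy_copies(failed_azs=("az1",), extra_failed_copies=1)
    print(one_az_down, can_write(one_az_down), can_read(one_az_down))
    print(az_plus_one, can_write(az_plus_one), can_read(az_plus_one))
```

Under these assumed quorum sizes, losing a whole AZ leaves four copies, which is still enough to accept writes, and losing an AZ plus one more copy leaves three, which is still enough to read and repair the data.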
Scaling Performance
Aurora automatically scales storage capacity and performance as needed, without manual provisioning.
You can add read replicas to offload read traffic and isolate workloads, improving performance.
Aurora provides connection pooling and load balancing features to efficiently manage connections across the cluster.
Multi-Region Resilience with Aurora Global Database
Aurora Global Database allows you to replicate your database asynchronously across up to 10 secondary AWS Regions.
This provides cross-region durability and high write availability, because a secondary region can be promoted to primary if the primary region becomes unavailable.
The global endpoint and automated failover capabilities make it easy to manage the multi-region setup.
Planned switchovers between the primary and a secondary region can be performed with minimal downtime and no data loss.
Amazon Aurora DSQL
DSQL Architecture
DSQL separates the compute (Query Processors) and storage components, allowing for independent scaling and high availability.
The Adjudicator service handles concurrency control using an optimistic concurrency control protocol.
The Journal service provides durable storage of transactions, replicated across multiple AZs.
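To make the idea of optimistic concurrency control concrete, here is a toy, single-process sketch: transactions buffer their writes without taking locks, and a conflict check runs only at commit time. This is not DSQL's actual protocol; the class names are hypothetical, and reads see the latest committed value rather than a true snapshot to keep the example short.

```python
class Conflict(Exception):
    """Raised when a commit loses the write-write conflict check."""


class ToyStore:
    def __init__(self):
        self._data = {}        # key -> committed value
        self._last_write = {}  # key -> commit version that last wrote the key
        self._version = 0      # latest commit version

    def begin(self):
        return Txn(self, self._version)


class Txn:
    def __init__(self, store, start_version):
        self.store = store
        self.start_version = start_version
        self.writes = {}       # buffered writes, invisible to others until commit

    def read(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store._data.get(key)

    def write(self, key, value):
        self.writes[key] = value   # no lock taken

    def commit(self):
        store = self.store
        # Conflict check: did anyone commit a write to our keys after we began?
        for key in self.writes:
            if store._last_write.get(key, 0) > self.start_version:
                raise Conflict(key)
        store._version += 1
        for key, value in self.writes.items():
            store._data[key] = value
            store._last_write[key] = store._version
        return store._version


if __name__ == "__main__":
    store = ToyStore()
    t1, t2 = store.begin(), store.begin()
    t1.write("balance", 100)
    t2.write("balance", 200)
    t1.commit()
    try:
        t2.commit()
    except Conflict as exc:
        print("retry needed, conflicting key:", exc)
```

The practical consequence for applications is that a transaction can be rejected at commit time, so the caller should be prepared to retry it.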
Availability and Failover
DSQL is designed for high availability, with no single points of failure. Components automatically fail over in the event of failures.
Connections are managed by a Session Routing Layer, which can quickly provide new connections even in the event of mass connection churn.
Applications need to handle connection errors and retries, as individual connections may fail, but the overall service remains available.
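One common way to handle failed connections is to retry with capped exponential backoff and jitter, sketched below. `TransientConnectionError` and `with_retries` are hypothetical names; in a real application you would catch whichever retryable exception your database driver raises and open a fresh connection inside `operation`.

```python
import random
import time


class TransientConnectionError(Exception):
    """Hypothetical stand-in for a driver's retryable connection error."""


def with_retries(operation, max_attempts=5, base_delay=0.1, max_delay=2.0,
                 sleep=time.sleep):
    """Call `operation` until it succeeds or `max_attempts` is reached.

    Waits a random time between 0 and an exponentially growing cap
    (full jitter) so that many clients do not reconnect in lockstep.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientConnectionError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the error to the caller
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            sleep(random.uniform(0, cap))


if __name__ == "__main__":
    attempts = []

    def flaky_query():
        # Simulated operation: fails twice, then succeeds.
        attempts.append(1)
        if len(attempts) < 3:
            raise TransientConnectionError("connection dropped")
        return "ok"

    print(with_retries(flaky_query), "after", len(attempts), "attempts")
```

Only retry operations that are safe to repeat, or wrap the whole transaction so that a retry restarts it from the beginning.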
Multi-Region Resilience
DSQL supports multi-region clusters, with a witness region to facilitate quorum-based failover between regions.
Applications can be deployed actively in multiple regions, with a global endpoint automatically routing traffic to the closest healthy region.
In the event of a regional failure, DSQL will automatically reconfigure to continue serving the application without data loss.
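The role of the witness region can be illustrated with a majority-quorum check over three members: two regions that serve traffic plus the witness. This is a simplified model of quorum-based decisions in general, not a description of DSQL's internals, and the region names are placeholders.

```python
# Placeholder membership: two serving regions plus a witness.
MEMBERS = frozenset({"region-a", "region-b", "witness"})


def has_quorum(reachable, members=MEMBERS):
    """True if the reachable set contains a strict majority of the members."""
    return len(set(reachable) & members) > len(members) // 2


def can_keep_serving(region, reachable_peers, members=MEMBERS):
    """A region may keep committing only while it sits on the majority side."""
    return has_quorum({region} | set(reachable_peers), members)


if __name__ == "__main__":
    # region-b is down: region-a still reaches the witness and keeps serving.
    print(can_keep_serving("region-a", {"witness"}))
    # region-b is isolated from both peers: it must stop accepting commits.
    print(can_keep_serving("region-b", set()))
```

With only two members, a network partition would leave neither side able to tell whether the other had failed; the third member breaks the tie so that at most one side continues.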
Maintenance and Upgrades
DSQL is a fully managed service, with no maintenance required by customers.
Automated security updates, minor version upgrades, and other maintenance tasks are handled by AWS, with no downtime or manual intervention required.
Key Takeaways
Amazon Aurora provides built-in resilience features, including high durability storage, automated failover, and multi-region capabilities, to ensure availability and disaster recovery.
Aurora DSQL takes resilience a step further with a fully managed, highly available architecture that separates compute and storage and supports active-active multi-region deployments.
Both Aurora offerings allow customers to focus on building applications, rather than managing the underlying database infrastructure and resilience mechanisms.
Specific technical features include:
6-way data replication across 3 AZs for Aurora storage
Automated failover and connection management in Aurora
Optimistic concurrency control and quorum-based multi-region failover in DSQL
Fully automated maintenance and upgrades for both Aurora offerings
Business Impact
The resilience features of Amazon Aurora and DSQL enable customers to build highly available, fault-tolerant applications that can withstand infrastructure failures and regional outages.
This reduces the operational burden and risk associated with running mission-critical databases, allowing teams to focus on developing new features and capabilities rather than managing the underlying database platform.
The multi-region capabilities of Aurora Global Database and DSQL's active-active architecture enable customers to serve users from the closest available region, improving application performance and responsiveness.
The seamless maintenance and upgrade processes ensure that customers always have access to the latest database capabilities and security improvements without disrupting their applications.
Examples and Use Cases
The presenters did not provide specific customer examples or use cases, but the resilience features of Aurora and DSQL would benefit any mission-critical, globally distributed application that requires high availability and disaster recovery, such as:
E-commerce platforms
Financial trading systems
SaaS applications with global user bases
IoT platforms that ingest and process data at scale