Mastering resilience at every layer of the cake (ARC327)
Key Takeaways from the Video Transcription
Introduction
The session is about advanced resilience characteristics, features, and patterns in the cloud.
The presenters are Shlomo Ezra (AWS), Christopher Vocos (Vanguard), and Jenny (United Airlines).
The goal is to provide useful insights and takeaways that the audience can implement.
AWS Cloud Resilience Offerings
AWS offers different types of services:
Zonal services (e.g., EC2, EBS), whose resources live in a single Availability Zone (AZ) and are designed to fail independently of other AZs
Regional services (e.g., Lambda), which AWS builds across multiple AZs so that customers do not have to manage AZ placement themselves
Global services (e.g., IAM), whose control plane is centralized and is less reliable than their data plane
Key Resilience Characteristics
Static stability: The system keeps operating in the same way, without needing to make any changes (for example, launching new capacity), even in the face of AZ failures.
Shared responsibility: AWS is responsible for the resilience of the cloud, while customers are responsible for the resilience of what they build on top of it.
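Static stability can be made concrete with a small capacity calculation. The sketch below is illustrative arithmetic rather than anything shown in the session: it assumes load is spread evenly across AZs, and the function names `instances_per_az` and `total_provisioned` are hypothetical. The idea is to pre-provision enough capacity in each AZ that the surviving AZs can carry peak load without any scaling action during the failure.

```python
def instances_per_az(peak_instances: int, az_count: int, az_failures_tolerated: int = 1) -> int:
    """Instances to keep running in every AZ so that peak load is still
    covered after losing `az_failures_tolerated` AZs, with no scaling needed.

    Assumes load is balanced evenly across the AZs (an assumption of this sketch).
    """
    surviving = az_count - az_failures_tolerated
    if surviving < 1:
        raise ValueError("at least one AZ must survive the failure")
    # Integer ceiling division: round up so the surviving AZs are never short.
    return -(-peak_instances // surviving)


def total_provisioned(peak_instances: int, az_count: int, az_failures_tolerated: int = 1) -> int:
    """Total fleet size when every AZ is pre-provisioned for the failure case."""
    return instances_per_az(peak_instances, az_count, az_failures_tolerated) * az_count


if __name__ == "__main__":
    # Hypothetical workload: 12 instances are needed at peak.
    for azs in (2, 3):
        per_az = instances_per_az(12, azs)
        total = total_provisioned(12, azs)
        print(f"{azs} AZs: {per_az} per AZ, {total} total")
```

For a peak of 12 instances, three AZs need 6 instances each (18 in total), while two AZs need 12 each (24 in total), which is one reason three-AZ deployments are often cheaper to make statically stable.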
Trends and Demands
Growing demand for resilience testing, from basic "game days" to continuous experimentation and integration with CI/CD.
Increased interest in multi-region deployment for certain use cases, such as financial services, healthcare, and media/entertainment.
Emphasis on building a resilience culture around a life cycle: setting objectives, designing and implementing resilient architectures, evaluating and testing, operating, and responding and learning.
Resilience Testing and Tools
AWS Fault Injection Service (FIS) and its Scenario Library: Let customers test real-life scenarios, such as AZ failures, without writing custom code.
The session included a demonstration of testing an application's resilience to an AZ failure using FIS.
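As a rough illustration of what an FIS experiment definition looks like, the sketch below builds a template that stops tagged EC2 instances in one AZ and restarts them after a delay. It is not the experiment shown in the session, and it is much narrower than the Scenario Library's AZ scenarios, which combine several actions. The helper name `build_az_stop_template`, the target and action labels, the tag, and the role ARN are all hypothetical; the field names follow the FIS experiment template format as best understood here and should be checked against the current FIS documentation before use.

```python
def build_az_stop_template(az: str, role_arn: str, tag_key: str, tag_value: str,
                           restart_after: str = "PT10M") -> dict:
    """Build a dict describing an FIS experiment template (no AWS call is made).

    The experiment stops every EC2 instance carrying the given tag in one AZ,
    then starts them again after `restart_after` (an ISO 8601 duration).
    """
    return {
        "description": f"Stop tagged EC2 instances in {az}",
        "roleArn": role_arn,  # IAM role that FIS assumes; hypothetical value below
        # No stop condition in this sketch; a real experiment would normally
        # reference a CloudWatch alarm here so FIS can halt it automatically.
        "stopConditions": [{"source": "none"}],
        "targets": {
            "instancesInAz": {
                "resourceType": "aws:ec2:instance",
                "resourceTags": {tag_key: tag_value},
                "filters": [
                    {"path": "Placement.AvailabilityZone", "values": [az]}
                ],
                "selectionMode": "ALL",
            }
        },
        "actions": {
            "stopInstances": {
                "actionId": "aws:ec2:stop-instances",
                "parameters": {"startInstancesAfterDuration": restart_after},
                "targets": {"Instances": "instancesInAz"},
            }
        },
    }


if __name__ == "__main__":
    import json

    template = build_az_stop_template(
        az="us-east-1a",
        role_arn="arn:aws:iam::123456789012:role/example-fis-role",  # placeholder
        tag_key="app",
        tag_value="demo",
    )
    print(json.dumps(template, indent=2))
    # With credentials configured, the dict could be passed to boto3, e.g.:
    #   boto3.client("fis").create_experiment_template(**template)
```

Keeping the template as plain data makes it easy to review and version alongside application code, which fits the trend toward running resilience experiments from CI/CD.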
Customer Experiences
Vanguard's approach to building a resilience testing suite, enabling engineers to run on-demand performance and chaos tests.
United Airlines' implementation of an automated disaster recovery solution, "Rapid Recovery," to streamline and accelerate disaster recovery across their applications.
Next Steps
Explore the resilience life cycle framework and AWS's offerings at each stage.
Consider implementing resilience testing and building a resilience culture within your organization.
Check out the provided resources and blog posts for further information.