Dr. Werner Vogels Keynote

Lessons in Managing Complexity

Evolvability as a Requirement

  • Make evolvability a key requirement when designing systems
  • Evolvability is a system's ability to accommodate future changes easily; it is distinct from maintainability, which concerns fine-grained, short-term changes
  • Key principles for evolvable systems:
    • Build focused components modeling business concepts with fine-grained interfaces
    • Use decentralization and independently deployable "smart endpoints"
    • Make systems highly observable and support multiple paradigms, so implementations can stay flexible

Breaking Down Complexity

  • Amazon S3 as an example of a simple API evolving into a complex system under the covers
    • Started with 6 microservices, now over 300
    • Maintained simplicity for customers despite growing complexity
  • AWS reworked its networking and host infrastructure so that later changes would be easier to make
    • Example: Blackfoot network devices and Nitro host architecture

Aligning Organizations

  • Organize teams and architecture in parallel
  • "2-pizza teams" - small, autonomous teams able to independently deliver functionality
  • Importance of ownership and agency in teams to drive urgency and quality

Cell-Based Architecture

  • Break down systems into isolated, independent "cells" to limit the blast radius of issues
  • Use deterministic algorithms like hash functions to map requests to specific cells
  • Implement a simple routing layer to forward requests to the right cell
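
The mapping and routing steps above can be sketched in a few lines of Python. This is a minimal illustration, not how AWS implements it: the cell names, the `customer_id` partition key, and the functions `cell_for` and `route` are all assumptions made for the example.

```python
import hashlib

# Hypothetical cell names; a real system would load these from configuration.
CELLS = ["cell-0", "cell-1", "cell-2", "cell-3"]


def cell_for(partition_key: str, cells: list[str] = CELLS) -> str:
    """Deterministically map a partition key to exactly one cell."""
    # Use a stable hash. Python's built-in hash() is randomized per process
    # for strings, so it would not give the same answer on every router.
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(cells)
    return cells[index]


def route(request: dict) -> str:
    """Thin routing layer: choose the target cell and do nothing else."""
    # "customer_id" is an assumed partition key for this sketch.
    return cell_for(request["customer_id"])
```

Because the mapping is a pure function of the key, every router instance sends the same customer to the same cell without shared state. One caveat of this simple modulo scheme is that changing the number of cells remaps most keys, so a production design would likely need a more stable mapping.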

Designing for Predictability

  • Avoid event-driven, unpredictable processing patterns
  • Use "constant work" patterns like periodic retrieval of configuration files
  • Example: Route 53 health checkers pull their configuration on a fixed schedule instead of relying on pushed updates
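
A "constant work" loop can be sketched as follows. This is an illustrative Python example under stated assumptions, not the Route 53 implementation: `fetch_config` and `apply_config` are hypothetical callables supplied by the caller, and the configuration is assumed to be a JSON document of roughly fixed size.

```python
import json
import time
from typing import Callable


def run_constant_work_loop(
    fetch_config: Callable[[], str],
    apply_config: Callable[[dict], None],
    interval_seconds: float,
    iterations: int,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Fetch and apply the full configuration on a fixed schedule."""
    for _ in range(iterations):
        # Always fetch, parse, and apply the whole document, whether or not
        # anything changed, so each cycle performs the same steps in quiet
        # periods and during bursts of change.
        config = json.loads(fetch_config())
        apply_config(config)
        sleep(interval_seconds)
```

The design choice is that load does not depend on how many changes occurred: ten changes or none, each cycle performs one fetch and one apply. The `sleep` parameter exists only so the loop can be exercised without waiting.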

Automating Complexity

  • Automate everything that doesn't require high human judgment
  • Security automation (e.g. automated threat intelligence)
  • Ticket triage using "agentic" workflow automation
  • General principle: Automate the common, standard behavior, and make manual input the exception

The Power of Time

  • Importance of synchronized, accurate clocks in distributed systems
    • Enables much simpler coordination algorithms compared to traditional distributed algorithms
  • The Amazon Time Sync Service provides microsecond-level clock accuracy
  • The open-source "ClockBound" library returns nanosecond-level clock readings together with explicit error bounds
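
To show why error bounds simplify coordination, here is a minimal Python sketch of ordering two events by bounded timestamps. It does not use the ClockBound API: the `BoundedTimestamp` type and the functions `bounded_now` and `definitely_before` are hypothetical names for this example, and the clock reading and error bound are passed in as plain integers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundedTimestamp:
    """An interval that is assumed to contain the true time."""
    earliest_ns: int
    latest_ns: int


def bounded_now(clock_ns: int, error_bound_ns: int) -> BoundedTimestamp:
    """Turn a clock reading and its error bound into an interval."""
    return BoundedTimestamp(clock_ns - error_bound_ns, clock_ns + error_bound_ns)


def definitely_before(a: BoundedTimestamp, b: BoundedTimestamp) -> bool:
    """True only when a's interval ends before b's interval begins."""
    return a.latest_ns < b.earliest_ns
```

If the two intervals overlap, neither event is definitely first, and the caller has to wait or fall back to another tie-breaker. The tighter the error bound, the less often that happens, which is why accurate clocks make the simple comparison sufficient in most cases.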

Sharing Expertise

  • Importance of the AWS Heroes community in sharing lessons learned
  • Opportunity for AWS engineers to donate time and expertise to organizations addressing global challenges
