AWS re:Invent 2025 - Architecting for hypergrowth: Scaling to 200 million users w/ Skyscanner-ARC209

Scaling to 200 Million Users: Lessons from Skyscanner's AWS Journey

Building an Initial Architecture

  • Start with a simple 3-tier web application architecture - front-end, back-end, and data store
  • Consider the spectrum of compute options on AWS, from serverless functions (Lambda) to managed container services (ECS, EKS, Fargate)
  • Choose a middle ground such as EKS to balance control against operational overhead
  • Use a purpose-built API service like API Gateway, ALB, or AppSync to expose business logic to the front-end

Scaling the Data Tier

  • Begin with a relational database like Amazon Aurora, then consider NoSQL options like DynamoDB for specific use cases
  • Leverage multi-region Aurora DSQL for global reach and high availability
  • Implement caching with services like ElastiCache to speed up reads
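
The caching bullet above can be illustrated with a cache-aside read path. This is a minimal sketch, not Skyscanner's implementation: an in-process dictionary stands in for ElastiCache, and the names `CacheAside`, `loader`, and `ttl_seconds` are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class CacheAside:
    """Check the cache first; on a miss, load from the database and store the result."""

    def __init__(
        self,
        loader: Callable[[str], str],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._loader = loader      # hypothetical database read, e.g. an Aurora query
        self._ttl = ttl_seconds    # how long a cached value stays fresh
        self._clock = clock        # injectable so expiry can be tested
        self._store: Dict[str, Tuple[float, str]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: str) -> str:
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the source of truth, then cache it.
        self.misses += 1
        value = self._loader(key)
        self._store[key] = (now + self._ttl, value)
        return value
```

A short TTL keeps results fresher at the cost of more database reads; the right value depends on how quickly the underlying data changes.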

Scaling the Back-end

  • Use Kubernetes pod autoscaling: the Horizontal Pod Autoscaler adds or removes replicas, and the Vertical Pod Autoscaler adjusts each pod's resource requests
  • Use cluster autoscaling with tools like Cluster Autoscaler and Karpenter to dynamically provision worker nodes
  • Adopt a "cellular" Kubernetes architecture with bounded failure domains
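
To make horizontal pod scaling concrete, the sketch below applies the replica formula from the Kubernetes Horizontal Pod Autoscaler documentation: desired = ceil(current replicas × current metric / target metric), skipped when the ratio is within a tolerance of 1.0 (0.1 by default). The function name and the default replica bounds are assumptions for illustration; the real controller also handles missing metrics, pod readiness, and stabilization windows.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 10,
    tolerance: float = 0.1,
) -> int:
    """Return the replica count the HPA formula would ask for."""
    ratio = current_metric / target_metric
    # Within tolerance of the target: leave the replica count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * ratio)
    # Clamp to the configured bounds.
    return max(min_replicas, min(max_replicas, desired))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6, while 4 replicas at 62% stay at 4 because the ratio is within tolerance.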

Skyscanner's 10-Year Scaling Journey

  • Started with a hybrid on-premises and AWS architecture for burst capacity
  • Moved to containerized microservices on ECS, then Kubernetes (EKS)
  • Evolved to a "cellular" EKS architecture with bounded failure domains
  • Leveraged AWS services like CloudFront, Route 53, NLBs, and caching to scale their flights pricing service
  • Emphasized cost control, observability, and cultural/organizational scaling tactics

Key Lessons Learned

  • Be opinionated - choose a small set of standardized, hardened technologies
  • Manage blast radius and failure propagation through architectural patterns
  • Speak the business language of cost and value, and make cost everyone's responsibility
  • Stay pragmatic - less-than-perfect architectures can still serve massive scale
  • Invest in observability, control plane simplicity, and operational readiness
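
One way to bound blast radius in a cellular design is to pin each request key to a single cell deterministically. The summary does not describe how Skyscanner routes traffic between cells, so the sketch below is an assumption: it uses rendezvous (highest-random-weight) hashing, and `pick_cell`, the cell names, and the `unhealthy` set are hypothetical.

```python
import hashlib
from typing import AbstractSet, List


def pick_cell(key: str, cells: List[str], unhealthy: AbstractSet[str] = frozenset()) -> str:
    """Map a key to one cell; only keys on an unhealthy cell are moved elsewhere."""
    healthy = [cell for cell in cells if cell not in unhealthy]
    if not healthy:
        raise RuntimeError("no healthy cells available")

    def score(cell: str) -> int:
        # Hash the (cell, key) pair; the cell with the highest score owns the key.
        digest = hashlib.sha256(f"{cell}:{key}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    return max(healthy, key=score)
```

With this scheme, marking one cell unhealthy reassigns only the keys that lived on it. Whether to fail traffic over at all is a design choice: moving a problematic workload onto healthy cells can spread the failure the cells were meant to contain.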

Technical Details and Metrics

  • Skyscanner serves 160M+ monthly users and handles 100B+ flight prices per day
  • Runs 300+ Java microservices, 24 production Kubernetes clusters, and 37,000+ cores
  • Processes 400,000 service-to-service requests per second and manages hundreds of terabytes of caching
  • Emits 55 billion data events per day into a 25 PB data lake
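
The daily and per-second figures above are easier to compare on a common scale. The arithmetic below converts them, assuming the daily totals are spread evenly over 24 hours and the 400,000 requests per second is sustained; the summary does not say whether these are averages or peaks.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: int) -> float:
    """Average per-second rate for a daily total, assuming an even spread."""
    return per_day / SECONDS_PER_DAY


flight_prices_per_second = per_second(100_000_000_000)  # about 1.16 million
data_events_per_second = per_second(55_000_000_000)     # about 637,000
service_requests_per_day = 400_000 * SECONDS_PER_DAY    # 34.56 billion
```

Real traffic is rarely flat, so peak rates are likely higher than these averages.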

Business Impact

  • Skyscanner's scalable, cloud-native architecture enabled them to grow from a small startup to a global travel marketplace serving hundreds of millions of users
  • Their ability to rapidly iterate, measure, and improve their architecture allowed them to keep pace with explosive growth in user demand and data volumes
  • The lessons they learned around managing blast radius, cost control, and organizational scaling tactics are applicable to any business facing hypergrowth challenges

Examples and Use Cases

  • Skyscanner's "cellular" Kubernetes architecture with bounded failure domains
  • Use of AWS services like CloudFront, Route 53, NLBs, and caching to scale their flights pricing service
  • Adoption of open-source tools like Karpenter for efficient Kubernetes autoscaling
