AWS re:Invent 2025 - Architecting for hypergrowth: Scaling to 200 million users w/ Skyscanner-ARC209
Scaling to 200 Million Users: Lessons from Skyscanner's AWS Journey
Building an Initial Architecture
Start with a simple 3-tier web application architecture - front-end, back-end, and data store
Use a spectrum of compute options on AWS, from fully managed serverless (Lambda) to container orchestration (ECS, EKS), with Fargate as a serverless way to run containers
Choose a middle ground such as EKS to balance control against operational overhead
Use a purpose-built API service like API Gateway, ALB, or AppSync to expose business logic to the front-end
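The following minimal Python sketch makes the three tiers concrete. It is not taken from the talk. It assumes a hypothetical flight-quote service in which:

- `InMemoryFlightStore` stands in for the data store (for example, Aurora).
- `FlightService` holds the business logic.
- `handle_get_quote` shows roughly the shape of a handler behind API Gateway or an ALB.

All of these names are illustrative.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Protocol


@dataclass(frozen=True)
class Flight:
    flight_id: str
    origin: str
    destination: str
    price_gbp: float


class FlightStore(Protocol):
    """Data tier contract: anything that can look a flight up by ID."""

    def get(self, flight_id: str) -> Optional[Flight]: ...


class InMemoryFlightStore:
    """Hypothetical stand-in for a real database such as Aurora."""

    def __init__(self, flights: Dict[str, Flight]) -> None:
        self._flights = dict(flights)

    def get(self, flight_id: str) -> Optional[Flight]:
        return self._flights.get(flight_id)


class FlightService:
    """Back-end tier: business logic, unaware of HTTP or storage details."""

    def __init__(self, store: FlightStore) -> None:
        self._store = store

    def quote(self, flight_id: str) -> Optional[float]:
        flight = self._store.get(flight_id)
        return None if flight is None else round(flight.price_gbp, 2)


def handle_get_quote(service: FlightService, path_params: Dict[str, str]) -> Dict[str, object]:
    """API tier: hypothetical handler shape for API Gateway or an ALB target."""
    flight_id = path_params["flight_id"]
    price = service.quote(flight_id)
    if price is None:
        return {"statusCode": 404, "body": {"error": "flight not found"}}
    return {"statusCode": 200, "body": {"flight_id": flight_id, "price_gbp": price}}
```

The service depends only on the `FlightStore` protocol. You can therefore swap the in-memory store for a real database client later without touching the API handler.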
Scaling the Data Tier
Begin with a relational database like Amazon Aurora, then consider NoSQL options like DynamoDB for specific use cases
Leverage multi-region Aurora DSQL for global reach and high availability
Implement caching with services such as Amazon ElastiCache to speed up reads
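A common way to put a cache such as ElastiCache in front of the database is the cache-aside (lazy-loading) pattern:

1. Check the cache.
2. On a miss, fall back to the database.
3. Populate the cache with a TTL.

The sketch below is a simplification. A plain Python dict stands in for the cache cluster, and the caller supplies a hypothetical `load_from_db` function. A real deployment would use a Redis or Memcached client instead.

```python
import time
from typing import Callable, Dict, Generic, Tuple, TypeVar

K = TypeVar("K")
V = TypeVar("V")


class CacheAside(Generic[K, V]):
    """Cache-aside reads with a TTL; the dict is a stand-in for ElastiCache (assumption)."""

    def __init__(
        self,
        load_from_db: Callable[[K], V],
        ttl_seconds: float,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._load = load_from_db
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[K, Tuple[float, V]] = {}  # key -> (expires_at, value)
        self.hits = 0
        self.misses = 0

    def get(self, key: K) -> V:
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and entry[0] > now:
            self.hits += 1
            return entry[1]
        # Miss or expired entry: read from the database, then populate the cache.
        self.misses += 1
        value = self._load(key)
        self._entries[key] = (now + self._ttl, value)
        return value

    def invalidate(self, key: K) -> None:
        """Drop a key after a write so the next read reloads fresh data."""
        self._entries.pop(key, None)
```

The TTL bounds how stale a cached value can become. Calling `invalidate` after writes tightens that bound further.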
Scaling the Back-end
Leverage Kubernetes autoscaling features such as the Horizontal Pod Autoscaler and Vertical Pod Autoscaler to scale pods
Use cluster autoscaling with tools like Cluster Autoscaler and Karpenter to dynamically provision worker nodes
Adopt a "cellular" Kubernetes architecture with bounded failure domains
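The Kubernetes documentation describes the Horizontal Pod Autoscaler's core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). HPA skips scaling when that ratio is within a tolerance of 1.0 (0.1 by default).

The sketch below reproduces only that rule plus min/max clamping. It ignores unready pods, missing metrics, and stabilization windows. When the extra replicas cannot be scheduled, a node provisioner such as Cluster Autoscaler or Karpenter adds worker nodes.

```python
import math


def desired_replicas(
    current_replicas: int,
    current_metric: float,
    target_metric: float,
    min_replicas: int = 1,
    max_replicas: int = 100,
    tolerance: float = 0.1,
) -> int:
    """Simplified HPA scaling rule: proportional scaling, tolerance band, then clamping."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        # Close enough to target: keep the current count to avoid flapping.
        proposed = current_replicas
    else:
        proposed = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, proposed))
```

For example, 4 replicas averaging 90% CPU against a 60% target scale to 6, while 4 replicas at 63% stay at 4 because the ratio is within tolerance.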
Skyscanner's 10-Year Scaling Journey
Started with a hybrid on-premises and AWS architecture for burst capacity
Moved to containerized microservices on ECS, then Kubernetes (EKS)
Evolved to a "cellular" EKS architecture with bounded failure domains
Leveraged AWS services such as CloudFront, Route 53, Network Load Balancers (NLBs), and caching to scale their flights pricing service
Emphasized cost control, observability, and cultural/organizational scaling tactics
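The talk summary does not say how Skyscanner assigns traffic to cells. The following is therefore only a sketch of one common approach: rendezvous (highest-random-weight) hashing, which pins each routing key, such as a user or session ID, to a single cell. The cell names and the choice of key are assumptions.

A useful property of this scheme is that removing an unhealthy cell moves only the keys that were on it. That keeps the blast radius of a cell failure bounded.

```python
import hashlib
from typing import Iterable, List


def _score(key: str, cell: str) -> int:
    """Stable pseudo-random weight for a (key, cell) pair."""
    digest = hashlib.sha256(f"{cell}:{key}".encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def route_to_cell(key: str, healthy_cells: Iterable[str]) -> str:
    """Pin a routing key to the healthy cell with the highest score (rendezvous hashing)."""
    cells: List[str] = list(healthy_cells)
    if not cells:
        raise RuntimeError("no healthy cells available")
    return max(cells, key=lambda cell: _score(key, cell))
```

Routing is deterministic and independent of the order in which cells are listed, so every router instance agrees on a key's cell without shared state.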
Key Lessons Learned
Be opinionated - choose a small set of standardized, hardened technologies
Manage blast radius and failure propagation through architectural patterns
Speak the business language of cost and value, and make cost everyone's responsibility
Stay pragmatic - less-than-perfect architectures can still serve massive scale
Invest in observability, control plane simplicity, and operational readiness
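The circuit breaker is one widely used pattern for limiting failure propagation between services. The talk summary does not name the patterns Skyscanner uses, so treat this as an illustrative, single-threaded sketch:

- After `failure_threshold` consecutive failures, the breaker opens and fails fast.
- Once `reset_timeout_s` has elapsed, it lets a trial call through (half-open).
- A failed trial reopens the breaker, and a successful one closes it.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised instead of calling a dependency that is considered unhealthy."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 5,
        reset_timeout_s: float = 30.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._threshold = failure_threshold
        self._reset_timeout = reset_timeout_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self._reset_timeout:
            return "half-open"
        return "open"

    def call(self, fn: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("dependency unhealthy; failing fast")
        try:
            result = fn()
        except Exception:
            self._failures += 1
            # A failed trial call, or too many consecutive failures, (re)opens the circuit.
            if self.state == "half-open" or self._failures >= self._threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the consecutive-failure count.
        self._failures = 0
        self._opened_at = None
        return result
```

Production libraries such as Resilience4j for Java services add thread safety, metrics, and limits on the number of half-open trial calls.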
Technical Details and Metrics
Skyscanner serves 160M+ monthly users, handles 100B+ flight prices per day
Runs 300+ Java microservices, 24 production Kubernetes clusters, 37,000+ cores
Processes 400,000 service-to-service requests per second, manages hundreds of terabytes of caching
Emits 55 billion data events per day into a 25PB data lake
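Converting the daily figures above into per-second rates gives a feel for their scale. The inputs come from the figures quoted here, and several are stated as lower bounds ("100B+", "37,000+"). The results are long-run averages only, so real peaks will be higher.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def per_second(per_day: float) -> float:
    """Average per-second rate for a daily volume."""
    return per_day / SECONDS_PER_DAY


def per_day(per_sec: float) -> float:
    """Daily volume implied by a sustained per-second rate."""
    return per_sec * SECONDS_PER_DAY


flight_prices_per_s = per_second(100e9)      # about 1.16 million prices per second
data_events_per_s = per_second(55e9)         # about 637,000 events per second
service_calls_per_day = per_day(400_000)     # 34.56 billion service-to-service requests per day
cores_per_cluster = 37_000 / 24              # about 1,542 cores per production cluster on average
```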
Business Impact
Skyscanner's scalable, cloud-native architecture enabled them to grow from a small startup to a global travel marketplace serving hundreds of millions of users
Their ability to rapidly iterate, measure, and improve their architecture allowed them to keep pace with explosive growth in user demand and data volumes
The lessons they learned around managing blast radius, cost control, and organizational scaling tactics are applicable to any business facing hypergrowth challenges
Examples and Use Cases
Skyscanner's "cellular" Kubernetes architecture with bounded failure domains
Use of AWS services such as CloudFront, Route 53, Network Load Balancers (NLBs), and caching to scale their flights pricing service
Adoption of open-source tools like Karpenter for efficient Kubernetes autoscaling