Here is a detailed summary of the key takeaways from the video transcription, broken down into sections:
Introduction
- The presentation is about a 10x scale program that Coinbase and AWS Professional Services worked on over the last 18 months.
- The key business objectives were:
  - Reducing cost
  - Scaling the architecture to support 10x the current traffic
  - Modernizing the compute infrastructure from EC2 to a Kubernetes-based architecture
Coinbase Overview
- Coinbase was introduced as the first and only publicly traded cryptocurrency exchange.
- Sudden volatility in the crypto market drives sharp traffic spikes into their systems.
- They need to scale their infrastructure up and down quickly to absorb these fluctuations.
Cloud Center of Excellence (CCOE) at Coinbase
- The CCOE's role is to ensure Coinbase is using the cloud effectively and efficiently.
- The three pillars are:
  - Cloud architecture excellence
  - Cloud usage excellence
  - Cloud lifecycle management
- Cloud optimization was framed as working on two levers: the usage of cloud workloads and the rates paid for that usage.
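The cost relationship above is stated loosely. A common way to make it concrete, assumed here rather than taken from the talk, is the FinOps-style decomposition spend = usage × rate: architecture and usage work shrink the first factor, while pricing work (for example cheaper instance types) shrinks the second. This sketch uses invented numbers to show that the two levers multiply.

```python
"""Hypothetical sketch: cloud spend as usage x rate (all numbers are invented)."""


def monthly_spend(instance_hours: float, hourly_rate: float) -> float:
    """Spend = usage (instance-hours) times rate (price per instance-hour)."""
    return instance_hours * hourly_rate


def remaining_spend_fraction(usage_cut: float, rate_cut: float) -> float:
    """Fraction of baseline spend left after cutting usage and rate.

    Cuts are fractions: 0.5 means a 50% reduction. The levers multiply,
    so their savings compound rather than simply add.
    """
    if not (0 <= usage_cut <= 1 and 0 <= rate_cut <= 1):
        raise ValueError("cuts must be between 0 and 1")
    return (1 - usage_cut) * (1 - rate_cut)


if __name__ == "__main__":
    baseline = monthly_spend(instance_hours=10_000, hourly_rate=0.10)
    # Hypothetical: halve usage (e.g. via bin packing) and cut the rate by 20%.
    left = remaining_spend_fraction(usage_cut=0.5, rate_cut=0.2)
    print(f"baseline: ${baseline:,.2f}, after both levers: ${baseline * left:,.2f}")
```

Because the factors multiply, a 50% usage cut combined with a 20% rate cut leaves 40% of baseline spend, not 30%.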
The 10x Program Phases
- The program was broken into 3 distinct phases, each with its own statement of work and learnings.
Phase 1: EC2 x86 to EC2 Graviton
- Coinbase first worked on improving their autoscaling capabilities before bringing in AWS Professional Services.
- They implemented warm pools, step scaling policies, and more granular metrics to improve scalability and responsiveness.
- The migration to Graviton instances resulted in a 20% cost savings, better performance, and reduced carbon footprint.
- Challenges included build pipeline issues, instance availability, and limitations with autoscaling groups.
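To make the Phase 1 autoscaling changes more concrete, here is a local simulation (not an AWS API call) of how a step scaling policy maps the size of an alarm breach to a capacity change, and why a warm pool of pre-initialized instances shortens time to ready. The thresholds, step bands, and start-up times are hypothetical; only the band semantics follow EC2 Auto Scaling's documented behavior for step adjustments on a metric above its threshold.

```python
"""Local simulation of an ASG step scaling policy plus a warm pool.

All thresholds, step bands and start-up times are hypothetical. The band
semantics mirror EC2 Auto Scaling step adjustments for a metric above its
alarm threshold: bounds are offsets from the threshold, the lower bound is
inclusive, the upper bound is exclusive, and None means unbounded.
"""
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Step:
    lower: Optional[float]  # inclusive offset above the threshold; None = unbounded
    upper: Optional[float]  # exclusive offset above the threshold; None = unbounded
    adjustment: int         # instances to add when the breach falls in this band


def step_adjustment(metric: float, threshold: float, steps: List[Step]) -> int:
    """Capacity change for the band that the breach falls into."""
    breach = metric - threshold
    if breach < 0:
        return 0  # metric is below the alarm threshold: no scale-out
    for s in steps:
        above_lower = s.lower is None or breach >= s.lower
        below_upper = s.upper is None or breach < s.upper
        if above_lower and below_upper:
            return s.adjustment
    return 0


def scale_out(current: int, metric: float, threshold: float,
              steps: List[Step], max_size: int) -> int:
    """New desired capacity, capped at the group's maximum size."""
    return min(current + step_adjustment(metric, threshold, steps), max_size)


def seconds_until_ready(new_instances: int, warm_available: int,
                        warm_start_s: int = 30, cold_start_s: int = 300) -> int:
    """Time until the slowest new instance is in service.

    Instances taken from the warm pool are already initialized, so they
    start much faster than cold launches (times are hypothetical). If the
    pool cannot cover the whole scale-out, the cold launches set the pace.
    """
    if new_instances <= 0:
        return 0
    return warm_start_s if new_instances <= warm_available else cold_start_s


if __name__ == "__main__":
    # Hypothetical CPU-utilization policy with a 60% alarm threshold.
    steps = [Step(0, 10, 1), Step(10, 25, 3), Step(25, None, 6)]
    desired = scale_out(current=10, metric=90.0, threshold=60.0,
                        steps=steps, max_size=40)
    added = desired - 10
    print("desired capacity:", desired)
    print("seconds to ready with 8 warm instances:", seconds_until_ready(added, 8))
    print("seconds to ready with no warm pool:", seconds_until_ready(added, 0))
```

In a real group, the equivalent settings go through the EC2 Auto Scaling `PutScalingPolicy` (with a `StepScaling` policy type) and `PutWarmPool` operations; the local model only mirrors their intent.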
Phase 2: EC2 to EKS
- The goal was to shrink compute costs by 50% or more by moving to a Kubernetes-based architecture.
- Coinbase used a larger project-team model, with weekly program check-ins, management check-ins, and dedicated teams assigned to the work.
- They focused on delivering ROI by migrating the highest-ROI services first, bin-packing services onto shared nodes, and leveraging the managed aspects of EKS.
- Key learnings included getting prerequisites in place, using migrations as teaching moments, and planning for IP address space needs.
- Results included a 68% reduction in resources for migrated services and a 50% increase in scaling speeds.
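Bin packing lets many pods share each node instead of giving every service its own EC2 instance. The sketch below is a first-fit-decreasing heuristic, a simplified stand-in rather than the Kubernetes scheduler's actual algorithm, with hypothetical node and pod sizes. It also caps pods per node using the Amazon VPC CNI max-pods formula, ENIs × (IPv4 addresses per ENI − 1) + 2 (without prefix delegation): every pod consumes a VPC IP address, which is one concrete reason IP space needs planning on EKS.

```python
"""First-fit-decreasing bin packing of pods onto identical EKS nodes.

A simplified stand-in, not the Kubernetes scheduler's real algorithm.
Node and pod sizes are hypothetical. The per-node pod cap uses the Amazon
VPC CNI max-pods formula (secondary IPs, no prefix delegation).
"""
from dataclasses import dataclass, field
from typing import List, Tuple


def vpc_cni_max_pods(enis: int, ips_per_eni: int) -> int:
    """Max pods per node under the VPC CNI: ENIs * (IPs per ENI - 1) + 2."""
    return enis * (ips_per_eni - 1) + 2


@dataclass
class Node:
    cpu_m: int      # remaining allocatable CPU, millicores
    mem_mi: int     # remaining allocatable memory, MiB
    pod_slots: int  # remaining pod IPs on this node
    pods: List[str] = field(default_factory=list)

    def fits(self, cpu_m: int, mem_mi: int) -> bool:
        """True if CPU, memory and a free pod IP are all available."""
        return cpu_m <= self.cpu_m and mem_mi <= self.mem_mi and self.pod_slots > 0

    def place(self, name: str, cpu_m: int, mem_mi: int) -> None:
        self.cpu_m -= cpu_m
        self.mem_mi -= mem_mi
        self.pod_slots -= 1
        self.pods.append(name)


def pack(pods: List[Tuple[str, int, int]], node_cpu_m: int,
         node_mem_mi: int, max_pods: int) -> List[Node]:
    """Place pods, largest CPU request first, on the first node that fits."""
    if max_pods < 1:
        raise ValueError("max_pods must be at least 1")
    nodes: List[Node] = []
    for name, cpu_m, mem_mi in sorted(pods, key=lambda p: p[1], reverse=True):
        target = next((n for n in nodes if n.fits(cpu_m, mem_mi)), None)
        if target is None:
            if cpu_m > node_cpu_m or mem_mi > node_mem_mi:
                raise ValueError(f"pod {name} does not fit on an empty node")
            target = Node(node_cpu_m, node_mem_mi, max_pods)
            nodes.append(target)
        target.place(name, cpu_m, mem_mi)
    return nodes


if __name__ == "__main__":
    # Hypothetical node: 3 ENIs x 10 IPv4 addresses, ~1.9 vCPU, ~6.8 GiB allocatable.
    max_pods = vpc_cni_max_pods(enis=3, ips_per_eni=10)
    services = [(f"svc-{i}", 500, 1024) for i in range(10)]  # one pod per service
    nodes = pack(services, node_cpu_m=1930, node_mem_mi=7000, max_pods=max_pods)
    print(f"{len(services)} services packed onto {len(nodes)} nodes "
          f"(max {max_pods} pods per node)")
```

With these made-up sizes, ten single-pod services fit on four nodes instead of ten dedicated instances; for many tiny pods, the IP-derived pod cap becomes the binding constraint instead of CPU or memory.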
Phase 3: Graviton on EKS
- Coinbase realized that moving services to EKS had eroded some of their earlier Graviton adoption, so they created a new statement of work to combine the two efforts.
- They updated their Graviton migration guides for EKS, focused on Go-based services, and organized the work by clusters rather than individual services.
- One challenge was that the cluster autoscaler prevented full Graviton utilization; they solved this by expanding their Graviton node pools.
- Final results included 10% compute savings, driven by Graviton instances being 20% cheaper, plus better scalability and improved resiliency.
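The summary does not spell out how a 20% per-instance discount turns into 10% compute savings. One simple reading, assumed here and not confirmed by the talk, is that the discount applies only to the share of compute spend actually running on Graviton, so the blended saving is that share times the discount.

```python
"""Hypothetical reading of "10% compute savings from 20% cheaper instances".

Assumption (not from the talk): the per-instance discount applies only to
the share of compute spend that actually runs on Graviton.
"""


def blended_savings(graviton_share: float, discount: float) -> float:
    """Overall saving when `graviton_share` of spend gets `discount` off."""
    return graviton_share * discount


def implied_graviton_share(overall_savings: float, discount: float) -> float:
    """Share of spend that must run on Graviton to reach `overall_savings`."""
    if discount <= 0:
        raise ValueError("discount must be positive")
    return overall_savings / discount


if __name__ == "__main__":
    share = implied_graviton_share(0.10, 0.20)
    print(f"share implied by 10% overall at 20% off: {share:.0%}")
    print(f"saving if 75% of spend ran on Graviton: {blended_savings(0.75, 0.20):.0%}")
```

Under that assumption, 10% overall savings at a 20% discount corresponds to roughly half of compute spend on Graviton; other factors, such as workloads left on x86 or performance differences, could also explain the gap.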
Overall Learnings
- ROI and financial measurement mattered, so they focused on high-ROI, low-effort, low-impact services first.
- Breaking the journey into phases allowed them to learn and carry forward improvements.
- Defining a single-threaded leader and pairing AWS Professional Services with internal teams helped bridge cultural and trust gaps.
- Overall, the 10x program delivered a large-scale EKS migration within 12 months, proved out the ROI-based funding model, achieved aggressive cost savings, and made the CCOE team experts in flexible workforce management.
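The "high-ROI, low-effort, low-impact first" rule can be expressed as a sort order over a migration backlog. This is a hypothetical sketch: the service names, fields, impact tiers, and savings figures are invented, and the talk does not describe a specific scoring formula.

```python
"""Hypothetical migration ordering: low impact first, then best ROI per effort.

Service names, tiers and figures are invented for illustration.
"""
from dataclasses import dataclass
from typing import List


@dataclass
class Service:
    name: str
    annual_savings: float  # estimated dollars saved per year once migrated
    effort_weeks: float    # estimated engineering effort to migrate
    impact_tier: int       # 1 = low blast radius ... 3 = business critical


def roi_per_week(s: Service) -> float:
    """Estimated annual savings per week of migration effort."""
    if s.effort_weeks <= 0:
        raise ValueError(f"{s.name}: effort must be positive")
    return s.annual_savings / s.effort_weeks


def prioritize(services: List[Service]) -> List[Service]:
    """Lowest impact tier first; within a tier, highest ROI per effort-week."""
    return sorted(services, key=lambda s: (s.impact_tier, -roi_per_week(s)))


if __name__ == "__main__":
    backlog = [
        Service("ledger", 500_000, 20, 3),
        Service("notifications", 120_000, 2, 1),
        Service("search", 200_000, 8, 1),
        Service("pricing", 90_000, 3, 2),
    ]
    for rank, svc in enumerate(prioritize(backlog), start=1):
        print(rank, svc.name, f"${roi_per_week(svc):,.0f} per effort-week")
```

Putting the impact tier first in the sort key means a risky service never jumps the queue on ROI alone; a team could instead fold risk into a single weighted score if it preferred trading the two off.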