AWS re:Invent 2025 - Zero-Downtime at Scale: Migrating Peacock's Global Streaming to EKS (IND3325)
Migrating Peacock's Global Streaming to Amazon EKS: A Zero-Downtime Approach
Overview
NBCUniversal Sky, a leading global streaming provider, migrated its Peacock streaming platform from a self-managed Kubernetes environment to Amazon Elastic Kubernetes Service (EKS).
The migration was completed in under 12 months, with zero downtime and minimal disruption to the 40+ million Peacock customers.
This case study highlights the key principles, strategies, and technical details that enabled this large-scale migration project.
Driving Factors and Objectives
Challenges with Self-Managed Kubernetes
The global streaming platform team spent 70% of their time on "toil" tasks, such as Kubernetes upgrades and security patches, leaving limited resources for feature development.
The platform supported hundreds of development teams across multiple time zones, with thousands of deployments per day, making it increasingly difficult to manage.
Migration Objectives
No Action for Development Teams: Maintain consistent interfaces and workflows for application teams, minimizing disruption.
Live Migration: Perform the migration without any downtime or service interruptions.
Zero Downtime: Ensure the streaming platform remains available and reliable throughout the transition.
12-Month Completion: Achieve the migration within a 12-month timeframe.
Unnoticed Transition: Ensure the migration is transparent to end-users, with no visible changes to the platform.
Migration Approach and Strategies
Partnership with AWS
The global streaming team worked closely with AWS Solutions Architects and technical specialists to validate the migration plan and apply proven architectural patterns.
AWS services like Amazon S3, Amazon EKS, Amazon CloudWatch, and AWS Lambda were used to enable the migration and provide observability.
Six-Stage Migration Process
Pre-flight Validation: Prepare the environment by reducing DNS TTLs and setting up end-to-end testing.
EKS Cluster Provisioning: Spin up the new EKS cluster, configure Velero for backup and restore, and deploy core services.
Workload Migration: Migrate application workloads from the self-managed Kubernetes cluster to EKS, handling complex scenarios like message queues.
Traffic Shift: Delegate the DNS zone to the new EKS cluster and gradually shift traffic.
Decommission: Shut down the self-managed Kubernetes cluster after verifying the successful migration.
Post-Migration Validation: Ensure the new EKS-based platform meets or exceeds the performance and reliability of the previous environment.
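The traffic-shift stage can be sketched as a stepwise plan with a health gate after each step. This is a minimal illustration, not the team's actual tooling: it assumes traffic is split between the two clusters with weighted DNS records, which the summary does not confirm. The names `ShiftStep`, `build_shift_plan`, and `run_shift`, and the `apply_step` and `is_healthy` callbacks, are hypothetical. The low DNS TTLs set during pre-flight matter here because they let resolvers pick up each weight change, or a rollback, quickly.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ShiftStep:
    """One step of a weighted traffic split between two clusters (assumed mechanism)."""
    legacy_weight: int
    eks_weight: int


def build_shift_plan(eks_percentages):
    """Turn strictly increasing EKS traffic percentages into weighted steps."""
    plan = []
    previous = -1
    for pct in eks_percentages:
        if not 0 <= pct <= 100:
            raise ValueError(f"percentage out of range: {pct}")
        if pct <= previous:
            raise ValueError("percentages must strictly increase")
        plan.append(ShiftStep(legacy_weight=100 - pct, eks_weight=pct))
        previous = pct
    return plan


def run_shift(plan, apply_step, is_healthy):
    """Apply each step in order and roll back one step if a health gate fails.

    apply_step and is_healthy are hypothetical callbacks: the first would
    update the DNS weights, the second would run the end-to-end checks.
    Returns the step that was left in place.
    """
    current = ShiftStep(legacy_weight=100, eks_weight=0)
    for step in plan:
        apply_step(step)
        if not is_healthy(step):
            apply_step(current)  # restore the last known-good split
            return current
        current = step
    return current


# Example: the health gate fails once EKS carries more than 50% of traffic.
plan = build_shift_plan([1, 10, 50, 100])
applied = []
final = run_shift(plan, applied.append, lambda step: step.eks_weight <= 50)
print(final)
```

Keeping the plan as plain data separates the decision about how fast to shift from the mechanism that applies each step, so the same plan could be dry-run against a test zone first.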
Comprehensive Testing and Observability
The team developed a robust testing suite, including synthetic load testing, zone validation, node and cluster health checks, and firewall monitoring.
Metrics and observability were critical; the team used tools such as VictoriaMetrics to scale monitoring to the required level.
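One way to turn synthetic load test results into a go/no-go signal is to compare the new cluster against the old one on error rate and tail latency. The sketch below is an assumption about how such a gate might look, not the team's test suite: the function names, the sample format (`status_codes`, `latencies_ms`), and the thresholds are all hypothetical.

```python
def error_rate(status_codes):
    """Fraction of responses with a 5xx status code."""
    if not status_codes:
        raise ValueError("no samples")
    failures = sum(1 for code in status_codes if code >= 500)
    return failures / len(status_codes)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list; pct is an integer in 1..100."""
    if not values:
        raise ValueError("no samples")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    rank = (pct * len(ordered) + 99) // 100  # integer ceiling of pct/100 * n
    return ordered[rank - 1]


def passes_gate(baseline, candidate, max_error_rate=0.01, latency_slack=1.2):
    """Pass if the candidate's error rate is within budget and its p95 latency
    is no more than latency_slack times the baseline's p95.

    Both arguments are dicts with "status_codes" and "latencies_ms" lists.
    The default thresholds are illustrative placeholders.
    """
    if error_rate(candidate["status_codes"]) > max_error_rate:
        return False
    baseline_p95 = percentile(baseline["latencies_ms"], 95)
    candidate_p95 = percentile(candidate["latencies_ms"], 95)
    return candidate_p95 <= baseline_p95 * latency_slack


# Example with made-up samples from the old (baseline) and new (candidate) clusters.
baseline = {"status_codes": [200] * 100, "latencies_ms": list(range(1, 101))}
candidate = {"status_codes": [200] * 99 + [503], "latencies_ms": list(range(1, 101))}
print(passes_gate(baseline, candidate))
```

Comparing against a baseline rather than a fixed latency target keeps the gate meaningful even when absolute latencies vary by region or time of day.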
Declarative Platform Approach
The global streaming team is now focused on building a declarative platform interface, allowing development teams to declare their desired state without needing to interact with specific technologies.
This approach enables the platform engineering team to evolve the underlying infrastructure, such as moving a workload from Kubernetes to a managed database service, without disrupting application teams.
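The idea of a declarative interface can be illustrated with a small sketch: application teams fill in a technology-neutral spec, and the platform team owns the code that renders it for whatever backend is in use. Everything named here (`ServiceSpec`, its fields, `render_kubernetes_deployment`, the example image) is hypothetical and not the team's actual interface; only the `apps/v1` Deployment layout is standard Kubernetes.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ServiceSpec:
    """What an application team declares; no Kubernetes concepts appear here."""
    name: str
    image: str
    replicas: int = 2
    port: int = 8080
    env: dict = field(default_factory=dict)


def render_kubernetes_deployment(spec):
    """Render a ServiceSpec as an apps/v1 Deployment manifest (a plain dict).

    The platform team could replace or extend this renderer without
    changing the specs that application teams maintain.
    """
    labels = {"app": spec.name}
    container = {
        "name": spec.name,
        "image": spec.image,
        "ports": [{"containerPort": spec.port}],
        "env": [{"name": k, "value": v} for k, v in sorted(spec.env.items())],
    }
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": spec.name, "labels": labels},
        "spec": {
            "replicas": spec.replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {"containers": [container]},
            },
        },
    }


# Example spec; the service name and image are placeholders.
spec = ServiceSpec(
    name="playback-api",
    image="registry.example/playback-api:1.0",
    env={"REGION": "us-east-1", "LOG_LEVEL": "info"},
)
manifest = render_kubernetes_deployment(spec)
print(manifest["metadata"]["name"], manifest["spec"]["replicas"])
```

Because the spec carries no backend-specific fields, a second renderer targeting a different runtime could consume the same `ServiceSpec` unchanged.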
Key Results and Benefits
The share of time spent on "toil" dropped from 30% to 10%, and 50,000 lines of code that were no longer needed were removed.
Kubernetes upgrades are now 6 times faster, allowing the team to accelerate the adoption of tools such as Istio, Karpenter, and Argo.
The migration enabled the global streaming platform to better support large-scale live events, such as the Super Bowl and Olympics, by improving resilience and reducing operational overhead.
The declarative platform approach is being expanded to manage additional services beyond Kubernetes, including databases and content delivery networks, further enhancing the team's agility and flexibility.
Conclusion
The successful migration of Peacock's global streaming platform to Amazon EKS demonstrates the power of a well-planned, automated, and observability-driven approach to large-scale infrastructure transformations. By partnering with AWS and leveraging the right tools and strategies, the global streaming team was able to achieve their objectives of zero downtime, minimal disruption, and accelerated innovation, setting the stage for continued growth and expansion.