AWS re:Invent 2025 - The future of Kubernetes on AWS (CNS205)
The Future of Kubernetes on AWS (CNS205)
Overview
This presentation provided a detailed overview of recent enhancements to Amazon Elastic Kubernetes Service (EKS) as well as a preview of upcoming capabilities.
The key focus areas included:
Enabling customers to use Kubernetes without having to manage the underlying infrastructure
Scaling EKS to support massive, enterprise-grade workloads like AI/ML training and inference
Integrating EKS more deeply with other AWS services to simplify platform building
The Rise of Kubernetes
Kubernetes Adoption Trends
According to the CNCF Annual Survey, the share of organizations running Kubernetes in production grew from 66% in 2023 to 80% in 2024.
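The survey figures describe a 14 percentage-point rise. As a share of the 2023 baseline, that is roughly a 21% relative increase:

```latex
80\% - 66\% = 14 \text{ percentage points}, \qquad
\frac{80 - 66}{66} = \frac{14}{66} \approx 0.212 \;\Rightarrow\; \approx 21\% \text{ relative growth}
```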
The main drivers for Kubernetes' popularity are:
Simplicity - Kubernetes provides a simple, declarative API to manage cloud infrastructure
Consistency - Kubernetes can run anywhere, from the cloud to on-premises data centers and the edge
Extensibility - Kubernetes can be customized to support diverse workload types
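The "simple, declarative API" point refers to how you work with Kubernetes: you declare a desired state, and controllers repeatedly converge the actual state toward it. The following is a minimal sketch of that reconcile loop. It assumes a toy in-memory "cluster" where each workload name maps to a replica count. The function name `reconcile` and the data shapes are hypothetical and are not part of any Kubernetes API.

```python
from typing import Dict, List, Tuple

def reconcile(desired: Dict[str, int], actual: Dict[str, int]) -> List[Tuple[str, str, int]]:
    """Compute the actions that make `actual` match `desired`, and apply them to `actual`.

    Keys are hypothetical workload names; values are replica counts.
    """
    actions: List[Tuple[str, str, int]] = []
    # Create workloads that are missing, and resize ones with the wrong replica count.
    for name, replicas in desired.items():
        current = actual.get(name)
        if current is None:
            actions.append(("create", name, replicas))
        elif current != replicas:
            actions.append(("scale", name, replicas))
        actual[name] = replicas
    # Delete workloads that still exist but are no longer declared.
    for name in [n for n in actual if n not in desired]:
        actions.append(("delete", name, 0))
        del actual[name]
    return actions

desired = {"web": 3, "worker": 2}
actual = {"web": 1, "legacy": 4}
print(reconcile(desired, actual))  # [('scale', 'web', 3), ('create', 'worker', 2), ('delete', 'legacy', 0)]
print(reconcile(desired, actual))  # [] because the state has already converged
```

The second call returning nothing shows the key property of this design. The loop compares whole states rather than replaying individual events, so running it repeatedly is safe.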
EKS Evolution
EKS was announced in 2017 and became generally available in 2018, starting as a managed Kubernetes control plane.
Over the years, EKS has expanded beyond the control plane to include add-ons, data plane management, and other capabilities.
The goal of EKS is to deliver the fundamental Kubernetes components (compute, networking, storage) in an AWS-native way, so that customers can focus on delivering business value rather than operating infrastructure.
Enhancing the EKS Experience
Upgrades and Observability
EKS has introduced several features to simplify Kubernetes upgrades, including:
Cluster Insights - Scanning clusters for issues that may impact upgrades
Kubernetes Version Support Acceleration - New versions available in EKS within 45 days of upstream release
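Upgrade-readiness checks like the ones above commonly look for objects that still use API versions the target Kubernetes release no longer serves. The following is a minimal sketch of that idea. It uses a small, hand-picked table of upstream removals, and the table is deliberately incomplete. The function names are hypothetical, and this is not how Cluster Insights is implemented.

```python
from typing import Dict, List, Tuple

# (apiVersion, kind) -> first Kubernetes minor version that no longer serves it.
# A small subset of upstream removals, not an exhaustive list.
REMOVED_IN: Dict[Tuple[str, str], str] = {
    ("extensions/v1beta1", "Ingress"): "1.22",
    ("networking.k8s.io/v1beta1", "Ingress"): "1.22",
    ("batch/v1beta1", "CronJob"): "1.25",
    ("policy/v1beta1", "PodDisruptionBudget"): "1.25",
    ("policy/v1beta1", "PodSecurityPolicy"): "1.25",
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): "1.26",
}

def parse_minor(version: str) -> Tuple[int, int]:
    """Turn '1.25' into (1, 25) so versions compare numerically ('1.9' < '1.25')."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)

def upgrade_blockers(objects: List[dict], target: str) -> List[str]:
    """List objects whose apiVersion/kind is no longer served in the `target` version."""
    blockers = []
    for obj in objects:
        removed = REMOVED_IN.get((obj["apiVersion"], obj["kind"]))
        if removed and parse_minor(target) >= parse_minor(removed):
            name = obj.get("metadata", {}).get("name", "<unnamed>")
            blockers.append(f'{obj["kind"]}/{name} uses {obj["apiVersion"]} (removed in {removed})')
    return blockers

manifests = [
    {"apiVersion": "batch/v1beta1", "kind": "CronJob", "metadata": {"name": "nightly"}},
    {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "web"}},
]
print(upgrade_blockers(manifests, "1.25"))  # ['CronJob/nightly uses batch/v1beta1 (removed in 1.25)']
```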
For observability, EKS provides:
EKS Global Dashboard - Centralized view of all clusters across accounts and regions
Enhanced Container Network Observability - Metrics and visualization for pod networking
EKS MCP Server - Hosted troubleshooting and runbook tool integrated with EKS
Cost Visibility and Security
EKS integrates with Kubecost and AWS Cost Allocation to provide granular cost visibility down to the Kubernetes resource level.
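Resource-level cost attribution generally works by splitting each node's cost across the pods running on it. The following is a minimal sketch of that idea under a simplified, assumed model: the hourly node price is split by each pod's share of requested CPU and memory, with the two shares weighted equally. This is an illustration only, not the exact allocation formula used by Kubecost or AWS. The prices, sizes, and names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class Pod:
    name: str
    cpu_request: float     # vCPU
    memory_request: float  # GiB

def allocate_node_cost(node_price: float, node_cpu: float, node_memory: float,
                       pods: List[Pod], cpu_weight: float = 0.5) -> Dict[str, float]:
    """Split an hourly node price across pods by their share of requested CPU and memory.

    Capacity that no pod requested is reported under "idle".
    """
    mem_weight = 1.0 - cpu_weight
    costs: Dict[str, float] = {}
    for pod in pods:
        share = (cpu_weight * pod.cpu_request / node_cpu
                 + mem_weight * pod.memory_request / node_memory)
        costs[pod.name] = round(node_price * share, 4)
    costs["idle"] = round(node_price - sum(costs.values()), 4)
    return costs

pods = [Pod("api", cpu_request=2, memory_request=4), Pod("batch", cpu_request=1, memory_request=8)]
print(allocate_node_cost(node_price=0.40, node_cpu=4, node_memory=16, pods=pods))
```

Reporting idle capacity separately makes over-provisioned nodes visible instead of hiding their cost inside per-pod figures.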
EKS also offers managed image signing through Amazon ECR, providing a secure, automated way to sign container images.
Extending EKS Capabilities
EKS Capabilities - Managed versions of popular Kubernetes projects like Argo CD and AWS Controllers for Kubernetes (ACK), simplifying platform building.
EKS Backup - Fully managed, agentless backup and restore for Kubernetes workloads.
EKS Everywhere - Support for running EKS on-premises, at the edge, and in hybrid environments.
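Argo CD, one of the projects named above, follows the GitOps model. It compares the manifests stored in Git with the live cluster objects and reports an application as `Synced` or `OutOfSync`. The following is a minimal sketch of that comparison. It assumes objects are plain dictionaries keyed by a hypothetical "Kind/namespace/name" string and compares only the fields declared in Git. Argo CD's real diffing is more involved; for example, it also normalizes ignored fields and tracks resources that need pruning.

```python
from typing import Any, Dict

def is_subset(desired: Any, live: Any) -> bool:
    """True if every field declared in `desired` has the same value in `live`.

    Extra fields in `live` (for example, defaults filled in by the API server) are ignored.
    """
    if isinstance(desired, dict):
        return isinstance(live, dict) and all(
            key in live and is_subset(value, live[key]) for key, value in desired.items()
        )
    return desired == live

def sync_status(git: Dict[str, dict], cluster: Dict[str, dict]) -> str:
    """Return 'Synced' if every object in Git exists live with matching declared fields."""
    for key, manifest in git.items():
        if key not in cluster or not is_subset(manifest, cluster[key]):
            return "OutOfSync"
    return "Synced"

git = {"Deployment/default/web": {"spec": {"replicas": 3}}}
live = {"Deployment/default/web": {"spec": {"replicas": 3, "revisionHistoryLimit": 10}}}
print(sync_status(git, live))  # Synced: the extra defaulted field is ignored
live["Deployment/default/web"]["spec"]["replicas"] = 1
print(sync_status(git, live))  # OutOfSync: the live replica count drifted from Git
```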
Scaling EKS for Enterprise Workloads
Challenges of Large-Scale Kubernetes
The rise of AI/ML workloads is driving the need for Kubernetes to support unprecedented scale.
Key challenges include:
Increasing model sizes, from millions to trillions of parameters
Running diverse workloads (training, inference, batch processing) in the same clusters
Maintaining consistent performance and reliability at massive scale
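Growth from millions to trillions of parameters matters at the cluster level because the model weights alone soon exceed one accelerator's memory. The following back-of-the-envelope sketch makes several assumptions:
- 2 bytes per parameter (16-bit weights)
- a hypothetical 80 GB of memory per GPU
- no memory for activations, optimizer state, or KV caches, which in practice add substantially more

```python
import math

def min_gpus_for_weights(parameters: float, bytes_per_param: int = 2,
                         gpu_memory_gb: float = 80) -> int:
    """Lower bound on GPUs needed just to hold the model weights (decimal GB)."""
    weight_bytes = parameters * bytes_per_param
    return math.ceil(weight_bytes / (gpu_memory_gb * 1e9))

for params in (100e6, 70e9, 1e12):
    print(f"{params:.0e} params -> {params * 2 / 1e9:,.1f} GB of weights, "
          f">= {min_gpus_for_weights(params)} GPU(s)")
```

Under these assumptions, a trillion-parameter model needs at least 25 GPUs just to hold its weights, before any training or serving work begins.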
EKS Ultra Clusters
EKS Ultra Clusters are designed to address these scale challenges, enabling:
Clusters up to 100,000 nodes and 800,000 GPUs
Concurrent management of diverse workload types (training, inference, batch)
Maintaining Kubernetes performance and reliability at scale
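The two headline limits are consistent with a cluster made up entirely of 8-GPU nodes. Eight GPUs per node matches AWS's largest NVIDIA GPU instance sizes, such as p5.48xlarge:

```latex
\frac{800{,}000 \text{ GPUs}}{100{,}000 \text{ nodes}} = 8 \text{ GPUs per node}
```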
Architectural Innovations
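Enhancements to the EKS control plane's etcd layer, including:
In-memory storage for the etcd database, for improved read/write performance
Partitioned key spaces, so that hot resource types are served by separate etcd clusters
Offloading consensus from etcd's Raft implementation to journal, an internal AWS system, for scalability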
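Partitioning the key space means writes for a hot resource type, such as Pods or Leases, land on their own store instead of contending with every other type. The following is a minimal sketch of the routing idea. It assumes kube-apiserver's default etcd key layout, `/registry/<resource>/...`, and uses hypothetical store names. Upstream Kubernetes offers a related knob, the `--etcd-servers-overrides` flag, which points specific resources at different etcd servers. The talk summary does not describe how EKS implements its partitioning.

```python
from typing import Dict

# Hypothetical mapping: hot resource types get their own backing store.
PARTITIONS: Dict[str, str] = {
    "pods": "store-pods",
    "leases": "store-leases",
    "events": "store-events",
}
DEFAULT_STORE = "store-default"

def store_for_key(key: str) -> str:
    """Pick the backing store for a key such as '/registry/pods/default/web-1'."""
    parts = key.strip("/").split("/")
    # After stripping slashes, parts[0] is 'registry' and parts[1] is the resource type.
    resource = parts[1] if len(parts) > 1 and parts[0] == "registry" else ""
    return PARTITIONS.get(resource, DEFAULT_STORE)

for k in ("/registry/pods/default/web-1",
          "/registry/leases/kube-node-lease/node-a",
          "/registry/configmaps/default/app-config"):
    print(k, "->", store_for_key(k))
```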
Improvements to the data plane, such as:
Multi-ENI support for 100Gbps network bandwidth
Concurrent image pull and unpack for faster container startup
Automated node repair for accelerated workload recovery
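Concurrent pull and unpack overlaps network and disk work. Instead of downloading every layer and then extracting the layers one by one, each layer is unpacked as soon as its download finishes. The following is a minimal sketch using Python threads, with `time.sleep` standing in for download and decompression time. The layer names and timings are hypothetical. Real container runtimes must also keep layers in the correct order when they assemble the root filesystem, which this sketch ignores.

```python
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Dict, List

def download(layer: str, seconds: float) -> str:
    time.sleep(seconds)  # stand-in for fetching the layer blob
    return layer

def unpack(layer: str) -> str:
    time.sleep(0.05)  # stand-in for decompressing and extracting the layer
    return layer

def pull_image(layers: Dict[str, float], workers: int = 4) -> List[str]:
    """Download layers concurrently and start unpacking each one as soon as it arrives.

    Returns layer names in the order their unpacking finished.
    """
    unpacked: List[str] = []
    with ThreadPoolExecutor(max_workers=workers) as downloads, \
         ThreadPoolExecutor(max_workers=workers) as unpackers:
        pending = [downloads.submit(download, name, secs) for name, secs in layers.items()]
        # Submit an unpack job the moment each download completes.
        unpack_jobs = [unpackers.submit(unpack, f.result()) for f in as_completed(pending)]
        for job in as_completed(unpack_jobs):
            unpacked.append(job.result())
    return unpacked

start = time.perf_counter()
done = pull_image({"base": 0.3, "runtime": 0.2, "app": 0.1})
print(done, f"{time.perf_counter() - start:.2f}s")
```

Run sequentially, these three layers would take about 0.6 s to download plus 0.15 s to unpack. Pipelined, the total is close to the slowest single download plus one unpack.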
The Future of EKS
Simplifying Platform Building
EKS aims to eliminate the need for large platform engineering teams by providing more managed capabilities.
Key priorities include:
Supporting any workload pattern at any scale
Deeper integrations with other AWS services
Meeting customers "where they are" - cloud, on-premises, edge
Accelerating innovation through open-source collaboration
Customer Spotlight: Netflix
Netflix, a major EKS customer, shared their journey of migrating their large-scale, highly dynamic container workloads to EKS.
Key highlights:
Netflix runs hundreds of thousands of containers across four primary regions
They required extremely high launch rates (70,000 containers in 5 minutes) to handle region failovers
The migration to EKS was completed in a single quarter, with a small engineering team
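The failover requirement corresponds to an average launch rate of roughly 233 containers per second:

```latex
\frac{70{,}000 \text{ containers}}{5 \text{ min} \times 60 \text{ s/min}} = \frac{70{,}000}{300 \text{ s}} \approx 233 \text{ containers/s}
```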
Conclusion
EKS continues to evolve to meet the growing demands of enterprise Kubernetes workloads. Its focus is on simplifying platform building, enabling massive scale, and integrating deeply with the broader AWS ecosystem. The Netflix case study illustrates EKS running diverse, mission-critical workloads at very large scale: a large, highly dynamic container platform was migrated in a single quarter by a small team.