AWS re:Invent 2025 - The future of Kubernetes on AWS (CNS205)

The Future of Kubernetes on AWS (CNS205)

Overview

  • This presentation provided a detailed overview of recent enhancements to Amazon Elastic Kubernetes Service (EKS) as well as a preview of upcoming capabilities.
  • The key focus areas included:
    • Enabling customers to use Kubernetes without having to manage the underlying infrastructure
    • Scaling EKS to support massive, enterprise-grade workloads like AI/ML training and inference
    • Integrating EKS more deeply with other AWS services to simplify platform building

The Rise of Kubernetes

Kubernetes Adoption Trends

  • According to the CNCF survey, Kubernetes adoption in production has grown from 66% in 2023 to 80% in 2024.
  • The main drivers for Kubernetes' popularity are:
    1. Simplicity - Kubernetes provides a simple, declarative API to manage cloud infrastructure
    2. Consistency - Kubernetes can run anywhere, from on-premises to the edge
    3. Extensibility - Kubernetes can be customized to support diverse workload types
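
The declarative model in the first point can be illustrated with a toy reconcile step: the user states the desired state, and a controller works out what must change. This is a minimal sketch, not the Kubernetes API; the `reconcile` function and the workload names are hypothetical and exist only for illustration.

```python
# Illustrative only: a toy reconcile step, not real Kubernetes controller code.
def reconcile(desired: dict[str, int], observed: dict[str, int]) -> list[str]:
    """Return the actions needed to move observed replica counts to the desired ones."""
    actions = []
    for name in sorted(desired.keys() | observed.keys()):
        want = desired.get(name, 0)   # a missing entry means "should not run"
        have = observed.get(name, 0)  # a missing entry means "not running"
        if want > have:
            actions.append(f"start {want - have} x {name}")
        elif want < have:
            actions.append(f"stop {have - want} x {name}")
    return actions


# Hypothetical workloads: the user declares the end state, not the steps.
desired = {"web": 3, "worker": 2}
observed = {"web": 1, "cron": 1}
plan = reconcile(desired, observed)
print(plan)
```

The caller never says how to get from one state to the other; that is the part Kubernetes controllers take over.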

EKS Evolution

  • EKS was first announced in 2017 (and became generally available in 2018), starting as a managed Kubernetes control plane.
  • Over the years, EKS has expanded beyond the control plane to include add-ons, data plane management, and other capabilities.
  • The goal of EKS is to deliver the fundamental Kubernetes components (compute, networking, storage) in a native way, allowing customers to focus on delivering business value rather than operating infrastructure.

Enhancing the EKS Experience

Upgrades and Observability

  • EKS has introduced several features to simplify Kubernetes upgrades, including:
    • Cluster Insights - Scanning clusters for issues that may impact upgrades
    • Kubernetes Version Support Acceleration - New versions available in EKS within 45 days of upstream release
  • For observability, EKS provides:
    • EKS Global Dashboard - Centralized view of all clusters across accounts and regions
    • Enhanced Container Network Observability - Metrics and visualization for pod networking
    • EKS MCP Server - Hosted troubleshooting and runbook tool integrated with EKS

Cost Visibility and Security

  • EKS integrates with Kubecost and AWS Cost Allocation to provide granular cost visibility down to the Kubernetes resource level.
  • EKS also offers managed image signing through Amazon ECR, providing a secure, automated way to sign container images.
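
To make "cost visibility down to the Kubernetes resource level" concrete, here is a simplified sketch of one way a node's cost could be divided among the pods running on it, in proportion to their CPU requests. This is an assumption for illustration only: the talk summary does not describe the actual allocation method, real tools also weigh memory and actual usage, and the function name, pod names, and prices are hypothetical.

```python
# Illustrative only: proportional cost split by CPU request, not a real billing model.
def allocate_node_cost(node_hourly_cost: float, cpu_requests: dict[str, float]) -> dict[str, float]:
    """Split one node's hourly cost across pods in proportion to requested CPU."""
    total = sum(cpu_requests.values())
    if total == 0:
        # Nothing was requested, so nothing can be attributed.
        return {pod: 0.0 for pod in cpu_requests}
    return {pod: node_hourly_cost * cpu / total for pod, cpu in cpu_requests.items()}


# Hypothetical node costing 1.00 per hour, with three pods requesting 4 vCPUs in total.
shares = allocate_node_cost(1.00, {"api": 2.0, "batch": 1.0, "cache": 1.0})
print(shares)
```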

Extending EKS Capabilities

  • EKS Capabilities - Managed versions of popular Kubernetes projects like Argo CD and AWS Controllers for Kubernetes (ACK), simplifying platform building.
  • EKS Backup - Fully managed, agentless backup and restore for Kubernetes workloads.
  • EKS Anywhere - Support for running EKS on-premises, at the edge, and in hybrid environments.

Scaling EKS for Enterprise Workloads

Challenges of Large-Scale Kubernetes

  • The rise of AI/ML workloads is driving the need for Kubernetes to support unprecedented scale.
  • Key challenges include:
    • Increasing model sizes, from millions to trillions of parameters
    • Running diverse workloads (training, inference, batch processing) in the same clusters
    • Maintaining consistent performance and reliability at massive scale

EKS Ultra Clusters

  • EKS Ultra Clusters are designed to address these scale challenges, enabling:
    • Clusters of up to 100,000 nodes and 800,000 GPUs
    • Concurrent management of diverse workload types (training, inference, batch)
    • Maintaining Kubernetes performance and reliability at scale
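
As a quick sanity check on the two headline limits, the arithmetic below derives the average GPU density they imply. The constants come from the figures above; the variable names are only illustrative.

```python
# Figures quoted in the summary above.
MAX_NODES = 100_000
MAX_GPUS = 800_000

# Average accelerators per node implied by the two limits.
gpus_per_node = MAX_GPUS // MAX_NODES
print(gpus_per_node)
```

The result, 8 GPUs per node, matches the common 8-GPU layout of large accelerated instances.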

Architectural Innovations

  • Enhancements to the EKS control plane, including:
    • In-memory database for improved read/write performance
    • Partitioned key spaces for hot resource types
    • Offloading consensus management to an internal AWS journal system for scalability
  • Improvements to the data plane, such as:
    • Multi-ENI support for 100 Gbps network bandwidth
    • Concurrent image pull and unpack for faster container startup
    • Automated node repair for accelerated workload recovery
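
The idea of partitioning key spaces for hot resource types can be sketched as a routing function that sends the busiest resource types to their own partitions. This is a hedged illustration, not the EKS implementation: the set of "hot" types and the `partition_for` helper are assumptions, and only the `/registry/<resource>/...` key layout reflects how Kubernetes conventionally stores objects in its key-value store.

```python
# Illustrative only: the hot-type set and the partition names are hypothetical.
HOT_TYPES = {"pods", "events", "leases"}


def partition_for(key: str) -> str:
    """Route a storage key to a dedicated partition if its resource type is hot."""
    parts = key.strip("/").split("/")
    # Keys are expected to look like /registry/<resource>/<namespace>/<name>.
    if len(parts) > 1 and parts[0] == "registry" and parts[1] in HOT_TYPES:
        return parts[1]
    return "default"


keys = [
    "/registry/pods/default/web-0",
    "/registry/events/default/web-0.17a",
    "/registry/configmaps/default/app-config",
]
routes = {key: partition_for(key) for key in keys}
print(routes)
```

Isolating high-churn types this way keeps their write traffic from competing with everything else that shares the default partition.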

The Future of EKS

Simplifying Platform Building

  • EKS aims to eliminate the need for large platform engineering teams by providing more managed capabilities.
  • Key priorities include:
    • Supporting any workload pattern at any scale
    • Deeper integrations with other AWS services
    • Meeting customers "where they are" - cloud, on-premises, edge
    • Accelerating innovation through open-source collaboration

Customer Spotlight: Netflix

  • Netflix, a major EKS customer, shared their journey of migrating their large-scale, highly dynamic container workloads to EKS.
  • Key highlights:
    • Netflix runs hundreds of thousands of containers across four primary regions
    • They required extremely high launch rates (70,000 containers in 5 minutes) to handle region failovers
    • The migration to EKS was completed in a single quarter, with a small engineering team
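
For a sense of what the failover launch requirement means in practice, the quoted figure can be converted into a sustained rate. The numbers are the ones above; the variable names are illustrative.

```python
# Figures quoted in the summary above.
containers = 70_000
window_seconds = 5 * 60

# Average launches per second needed to hit the target within the window.
rate_per_second = containers / window_seconds
print(f"{rate_per_second:.1f} containers per second")
```

That works out to roughly 233 container launches per second, sustained for the full five minutes.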

Conclusion

EKS is continuously evolving to meet the growing demands of enterprise Kubernetes workloads, with a focus on simplifying platform building, enabling massive scale, and deeply integrating with the broader AWS ecosystem. The platform's ability to support diverse, mission-critical workloads at unprecedented scale, as demonstrated by the Netflix case study, highlights the maturity and capabilities of EKS in the modern cloud-native landscape.
