AWS re:Invent 2025 - Generative and Agentic AI on Amazon EKS (CNS344)

Leveraging Kubernetes for Generative and Agentic AI

The AI Technology Revolution

  • We are in the midst of a rapid AI technology revolution that is transforming the world
  • This revolution is moving much faster than previous technology shifts, with new foundation models emerging roughly every two days
  • The pace of change is overwhelming, causing anxiety for many businesses and individuals

Running AI on Kubernetes

  • Customers are increasingly choosing Kubernetes (Amazon EKS) to run both AI agents and inference/fine-tuning workloads
  • Key reasons include:
    1. Control and optimization of underlying infrastructure
    2. Portability across clouds and on-premises
    3. Single platform for all workloads (business apps, AI agents, inference, fine-tuning)

Deploying AI Agents on EKS

  • AI agents can solve problems that require reasoning, something traditional deterministic software cannot do
  • Agentic frameworks like Strands Agents provide a Python-based approach to building AI agents
  • Agents interact with large language models (LLMs) and external APIs/data sources via "tools"
  • Agents can be containerized and deployed to EKS like any other service
  • Key capabilities include authentication, observability (logs, metrics, traces), and memory management
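The agent-tool loop described above (model call, tool invocation, model call again) can be sketched in plain Python. This is a stubbed illustration of the pattern, not the Strands API: `FakeModel`, `get_weather`, and the JSON tool-call convention are hypothetical stand-ins for a real LLM and real tools.

```python
import json

# Hypothetical tool: in a real agent this would call an external API or data source.
def get_weather(city: str) -> str:
    """Return a canned weather report for the given city."""
    return f"Sunny in {city}"

TOOLS = {"get_weather": get_weather}

class FakeModel:
    """Stand-in for an LLM: first asks for a tool call, then answers."""
    def __init__(self):
        self.turn = 0

    def chat(self, messages):
        self.turn += 1
        if self.turn == 1:
            # Model decides to call a tool (JSON tool-call convention, assumed here).
            return json.dumps({"tool": "get_weather", "args": {"city": "Las Vegas"}})
        # Model produces the final answer from the last tool result.
        return f"Answer: {messages[-1]['content']}"

def run_agent(model, prompt: str) -> str:
    """Minimal agent loop: call the model, execute any requested tool, repeat."""
    messages = [{"role": "user", "content": prompt}]
    while True:
        reply = model.chat(messages)
        try:
            call = json.loads(reply)
        except json.JSONDecodeError:
            return reply  # plain text means the model produced its final answer
        result = TOOLS[call["tool"]](**call["args"])
        messages.append({"role": "tool", "content": result})
```

Because the loop is just ordinary application code, the agent can be containerized and deployed to EKS like any other service, with logging, metrics, and tracing wrapped around `run_agent`.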

Running Inference and Fine-Tuning on EKS

  • Inference and fine-tuning workloads have different patterns than agents:
    • Inference is variable and bursty, while fine-tuning requires steady capacity
    • Both run on GPUs, requiring careful selection and optimization
  • Selecting the right GPU instance:
    • Determine model size and memory requirements
    • Use quantization techniques to reduce GPU memory needs
    • Choose from AWS GPU instance families (G5, G6) based on requirements
  • Provisioning GPUs in EKS:
    • Use EKS Auto Mode or the open-source Cluster Autoscaler
    • Leverage EKS AMIs with pre-configured GPU drivers and software
    • Optimize container pull and model loading times
  • Observing and managing GPU health:
    • Monitor GPU utilization, temperature, power
    • Use node health monitoring and auto-repair features
  • Scaling inference workloads:
    • Use custom metrics and the Horizontal Pod Autoscaler (HPA)
    • Leverage inference frameworks such as AIBrix, Ray, and Deta
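The sizing guidance above reduces to a back-of-envelope calculation: weights need roughly (parameters × bytes per parameter), and quantization shrinks the bytes per parameter. A rough sketch of that arithmetic follows; the 20% overhead factor for KV cache and activations is an assumption for illustration, not a figure from the talk.

```python
# Approximate bytes per parameter for common precisions.
BYTES_PER_PARAM = {"fp32": 4.0, "fp16": 2.0, "bf16": 2.0, "int8": 1.0, "int4": 0.5}

def estimate_gpu_memory_gb(params_billions: float, dtype: str = "fp16",
                           overhead: float = 0.2) -> float:
    """Rough serving-memory estimate in GB: model weights plus an assumed
    ~20% overhead for KV cache, activations, and framework buffers."""
    weights_gb = params_billions * BYTES_PER_PARAM[dtype]
    return weights_gb * (1.0 + overhead)

def fits_on_gpu(params_billions: float, dtype: str, gpu_memory_gb: float) -> bool:
    """Check whether the estimate fits in a single GPU's memory."""
    return estimate_gpu_memory_gb(params_billions, dtype) <= gpu_memory_gb

# A 70B-parameter model needs ~168 GB at FP16, but ~42 GB after INT4
# quantization, small enough for a single 48 GB GPU.
```

This kind of estimate is what drives the choice between GPU instance families: a quantized model that fits on one GPU can run on a smaller G-family instance instead of a multi-GPU node.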
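The HPA-based scaling mentioned above follows Kubernetes' documented rule: desiredReplicas = ceil(currentReplicas × currentMetricValue / targetMetricValue), clamped to the configured replica bounds. A small sketch (the function name and default bounds are illustrative):

```python
import math

def hpa_desired_replicas(current_replicas: int,
                         current_value: float,
                         target_value: float,
                         min_replicas: int = 1,
                         max_replicas: int = 20) -> int:
    """Core HPA scaling rule: ceil(current * currentValue / targetValue),
    clamped to [min_replicas, max_replicas]."""
    desired = math.ceil(current_replicas * current_value / target_value)
    return max(min_replicas, min(max_replicas, desired))

# e.g., 4 replicas averaging 150 in-flight requests against a target of 100
# scale out to ceil(4 * 1.5) = 6 replicas
```

For inference, the custom metric fed into this formula is typically something GPU-aware, such as in-flight requests or queue depth, rather than CPU utilization.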

Recent EKS Innovations for AI Workloads

  • New GPU instance types: GB200- and GB300-based instances, P6-B200, P6-B300, P54X
  • Dynamic Resource Allocation (DRA) for flexible GPU allocation
  • Fast container pulls with SOCI (Seekable OCI) for improved model loading
  • Enhancements to the Cluster Autoscaler and EKS Auto Mode
  • EKS Provisioned Control Plane for high-scale AI workloads
  • ALB Target Optimizer for efficient inference load balancing
  • Hosted EKS MCP Server for AI agent observability and control

The Future of AI on EKS

  • Continued focus on providing a reliable, optimized foundation for AI workloads
  • Extending existing tools and automation to support GPU-based AI
  • Introducing higher-level AI-specific capabilities out-of-the-box
  • Leveraging AI to make EKS itself more intelligent and automated

Getting Started

  • Attend AWS workshops on inference and agentic AI on EKS
  • Use the AI/ML EKS user guide and Terraform blueprints
  • Check out related talks at re:Invent for more details
