AWS re:Invent 2025 - Generative and Agentic AI on Amazon EKS (CNS344)
Leveraging Kubernetes for Generative and Agentic AI
The AI Technology Revolution
We are in the midst of a rapid AI technology revolution that is transforming the world
This revolution is happening much faster than previous technology shifts, with new foundation models emerging every couple of days
The pace of change is overwhelming, causing anxiety for many businesses and individuals
Running AI on Kubernetes
Customers are increasingly choosing Kubernetes (Amazon EKS) to run both AI agents and inference/fine-tuning workloads
Key reasons include:
Control and optimization of underlying infrastructure
Portability across clouds and on-premises
Single platform for all workloads (business apps, AI agents, inference, fine-tuning)
Deploying Agentic AI Agents on EKS
Agentic AI agents can solve problems requiring reasoning, unlike traditional software
Agentic frameworks like Strands Agents provide a Python-based approach to building AI agents
Agents interact with large language models (LLMs) and external APIs/data sources via "tools"
Agents can be containerized and deployed to EKS like any other service
Key capabilities include authentication, observability (logs, metrics, traces), and memory management
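The agent/tool loop described above can be sketched in plain Python. This is an illustrative stand-in, not the Strands API: the model call is stubbed, and the tool-request protocol (the `TOOL:`/`FINAL:` prefixes) is an assumption made for the example.

```python
# Minimal sketch of the agent pattern: the agent calls a model, the model may
# request a tool, the agent runs the tool and feeds the result back.
# All names and the TOOL:/FINAL: protocol are illustrative, not a real framework API.
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Agent:
    tools: dict[str, Callable[[str], str]]
    memory: list[str] = field(default_factory=list)  # simple memory management

    def call_model(self, prompt: str) -> str:
        # Stub: a real agent would call an LLM here (e.g. via a model endpoint).
        if "weather" in prompt:
            return "TOOL:get_weather:Seattle"  # model asks the agent to run a tool
        return "FINAL:It is rainy in Seattle."

    def run(self, user_input: str) -> str:
        self.memory.append(f"user: {user_input}")
        response = self.call_model(user_input)
        while response.startswith("TOOL:"):
            _, name, arg = response.split(":", 2)
            result = self.tools[name](arg)  # invoke the external API/data source
            self.memory.append(f"tool {name}: {result}")
            response = self.call_model(result)
        return response.removeprefix("FINAL:")

def get_weather(city: str) -> str:
    return f"rain in {city}"  # stand-in for a real API call

agent = Agent(tools={"get_weather": get_weather})
answer = agent.run("What is the weather?")
```

Because the agent is just a Python process, it can be containerized and deployed to EKS like any other service, with logs, metrics, and traces attached in the usual way.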
Running Inference and Fine-Tuning on EKS
Inference and fine-tuning workloads have different patterns than agents:
Inference is variable/bursty, fine-tuning requires steady capacity
Both run on GPUs, requiring careful selection and optimization
Selecting the right GPU instance:
Determine model size and memory requirements
Use quantization techniques to reduce GPU memory needs
Choose from AWS GPU instance families (G5, G6) based on requirements
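The sizing steps above reduce to simple arithmetic: weights take roughly parameters × bytes-per-parameter, plus headroom for KV cache and activations. The 20% overhead factor and the instance-class comments below are illustrative assumptions, not official sizing guidance.

```python
def gpu_memory_gb(params_billion: float, bytes_per_param: float,
                  overhead: float = 1.2) -> float:
    """Rough GPU memory (GB) to serve a model: weights plus ~20% headroom
    for KV cache/activations. The overhead factor is an illustrative assumption."""
    return params_billion * bytes_per_param * overhead

# An 8B-parameter model in FP16 (2 bytes/param) vs INT4 quantization (0.5 bytes/param):
fp16 = gpu_memory_gb(8, 2.0)   # 19.2 GB -> needs a ~24 GB GPU
int4 = gpu_memory_gb(8, 0.5)   # 4.8 GB  -> fits a much smaller GPU
```

This is why quantization matters for instance selection: the same model drops from a large-memory GPU requirement to something a smaller G-family instance can host.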
Provisioning GPUs in EKS:
Use Karpenter or the open-source Cluster Autoscaler
Leverage EKS AMIs with pre-configured GPU drivers and software
Optimize container pull and model loading times
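Scheduling a workload onto those GPU nodes comes down to requesting the `nvidia.com/gpu` resource in the pod spec. The sketch below expresses the manifest as a Python dict (equivalent to the YAML you would apply with kubectl); the image name and taint key are illustrative assumptions, though `nvidia.com/gpu` is the standard device-plugin resource name.

```python
# Pod spec requesting one GPU, as a Python dict mirroring the YAML manifest.
gpu_pod = {
    "apiVersion": "v1",
    "kind": "Pod",
    "metadata": {"name": "llm-inference"},
    "spec": {
        "containers": [{
            "name": "server",
            "image": "my-registry/inference-server:latest",  # hypothetical image
            # The GPU request is what steers the pod onto a GPU node
            "resources": {"limits": {"nvidia.com/gpu": 1}},
        }],
        # Tolerate the taint GPU node groups commonly carry
        "tolerations": [{
            "key": "nvidia.com/gpu",
            "operator": "Exists",
            "effect": "NoSchedule",
        }],
    },
}
```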
Observing and managing GPU health:
Monitor GPU utilization, temperature, power
Use node health monitoring and auto-repair features
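The health signals above feed simple rules: flag a GPU that runs too hot, or one that a workload holds but leaves idle (often a hung process). The thresholds and metric shape below are illustrative assumptions; in practice these values would come from an exporter such as NVIDIA DCGM.

```python
def unhealthy_gpus(samples, max_temp_c=85, min_util_pct=1):
    """Flag GPUs that are overheating, or allocated to a pod but sitting idle.
    Thresholds are illustrative, not vendor recommendations."""
    flagged = []
    for gpu in samples:
        if gpu["temp_c"] > max_temp_c:
            flagged.append((gpu["id"], "overheating"))
        elif gpu["allocated"] and gpu["util_pct"] < min_util_pct:
            flagged.append((gpu["id"], "allocated-but-idle"))
    return flagged

metrics = [
    {"id": 0, "temp_c": 72, "util_pct": 90, "allocated": True},
    {"id": 1, "temp_c": 91, "util_pct": 95, "allocated": True},
    {"id": 2, "temp_c": 60, "util_pct": 0,  "allocated": True},
]
bad = unhealthy_gpus(metrics)  # [(1, 'overheating'), (2, 'allocated-but-idle')]
```

Node auto-repair then acts on exactly this kind of signal: a node whose GPUs keep tripping these checks is cordoned and replaced.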
Scaling inference workloads:
Use custom metrics and the Horizontal Pod Autoscaler (HPA)
Leverage inference frameworks like AIBrix, Ray, and Deta
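The HPA's core scaling rule is worth seeing concretely: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). The "in-flight requests per pod" metric in the usage line is an illustrative choice of custom metric.

```python
import math

def desired_replicas(current_replicas: int, current_metric: float,
                     target_metric: float) -> int:
    """The Horizontal Pod Autoscaler's scaling formula:
    desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric)."""
    return math.ceil(current_replicas * current_metric / target_metric)

# 4 pods each seeing 30 in-flight requests, against a target of 10 per pod:
replicas = desired_replicas(4, 30, 10)  # -> 12 pods
```

For bursty inference traffic, a request-based custom metric like this reacts far faster than GPU utilization, which lags behind queue buildup.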
Recent EKS Innovations for AI Workloads
New GPU instance types: P6e-GB200, P6e-GB300, P6-B200, P6-B300, P5.4xlarge
Dynamic Resource Allocation (DRA) for flexible GPU allocation
Fast container pulls with SOCI (Seekable OCI) for improved model loading
Enhancements to the Cluster Autoscaler and Karpenter
EKS Provisioned Control Plane for high-scale AI workloads
ALB Target Optimizer for efficient inference load balancing
Hosted EKS MCP Server for AI agent observability and control
The Future of AI on EKS
Continued focus on providing a reliable, optimized foundation for AI workloads
Extending existing tools and automation to support GPU-based AI
Introducing higher-level AI-specific capabilities out-of-the-box
Leveraging AI to make EKS itself more intelligent and automated
Getting Started
Attend AWS workshops on inference and agentic AI on EKS
Use the AI/ML EKS user guide and Terraform blueprints
Check out related talks at re:Invent for more details