High-performance generative AI on Amazon EKS (KUB314)
Overview
Generative AI and its Use Cases
Generative AI can produce human-like content, reducing the time needed to build ML applications
Key use cases:
Enhancing customer experience
Boosting employee productivity
Content generation (images, videos)
Business operations (log analysis, developer onboarding)
Challenges of Running Generative AI Workloads
Organizational Challenges
Managing multiple models for different teams and use cases
Integrating and managing access to varied data sources
Scaling infrastructure to handle massive workloads
Data Scientist/ML Engineer Challenges
Needing readily available infrastructure to deploy and scale models
Avoiding boilerplate code/scripts to manage model lifecycle
How Amazon EKS Helps
Faster Deployment and Scaling
Leverage existing Kubernetes expertise and ecosystem of open-source tools
Native integration with AWS ML services for seamless scaling
Customization and Cost Optimization
Flexible configuration of the ML environment to suit specific needs
Automated instance selection and scaling with Karpenter for cost optimization
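The Karpenter point above can be illustrated with a configuration sketch. This is a hypothetical Karpenter (v1 API) NodePool for GPU inference; the pool name, instance-category values, GPU limit, and Spot preference are illustrative assumptions, not settings from the session.

```yaml
# Hypothetical NodePool: lets Karpenter pick cost-effective GPU instances.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-inference              # example name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["g", "p"]       # restrict to GPU instance families
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # allow Spot for cost savings
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # assumes an EC2NodeClass named "default"
  limits:
    nvidia.com/gpu: "8"            # cap total GPUs the pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # reclaim idle capacity
```

Leaving the instance type open (only the category is constrained) is what lets Karpenter choose the cheapest instance that satisfies pending pods' GPU requests.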
Customer Success Stories
Weviant Labs: Achieved 45% reduction in inference costs by using mixed CPU and GPU instances and optimizing GPU utilization.
Informatica: Built an LLM Ops platform on Amazon EKS, achieving 30% cost savings compared to managed services.
Zoom: Created a multi-model hosting platform on Amazon EKS to scale reliably and efficiently.
Hugging Face: Deployed their ML Hub platform on Amazon EKS to enable inference of millions of models with free-tier pricing.
Amazon EKS Features for Generative AI
Scalable Control Plane: Continuously enhanced for higher performance and scale.
Infrastructure Innovations: Easy integration of EFA (Elastic Fabric Adapter), Mountpoint for Amazon S3, and GPU-accelerated AMIs.
Cost-Effective Compute: Support for diverse EC2 instance types, including Graviton, Inferentia, and Trainium.
Monitoring and Observability: CloudWatch Container Insights with automatic support for GPU/Inferentia metrics.
Inference-Specific Capabilities:
Scaling to zero, fast scaling, and optimized container images.
Integration with open-source projects like Ray, KServe, and Triton Inference Server.
Karpenter for dynamic and cost-effective inference scaling.
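The inference capabilities above (scale to zero, KServe integration) can be sketched as a KServe InferenceService. This is a minimal hypothetical manifest; the service name, model format, storage URI, and replica counts are placeholder assumptions for illustration.

```yaml
# Hypothetical KServe InferenceService with scale-to-zero enabled.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo                                 # example name
spec:
  predictor:
    minReplicas: 0                               # scale to zero when idle
    maxReplicas: 4                               # fast scale-out under load
    model:
      modelFormat:
        name: huggingface                        # assumed model format
      storageUri: s3://example-bucket/models/llm # placeholder model location
      resources:
        limits:
          nvidia.com/gpu: "1"                    # one GPU per replica
```

With minReplicas set to 0, idle replicas are removed entirely and recreated on the first request, which pairs naturally with Karpenter reclaiming the underlying GPU nodes.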
Eli Lilly's Generative AI Platform on Amazon EKS
Developed a centralized "CATs" platform on Amazon EKS to accelerate generative AI adoption.
Key components:
Model library for hosting and managing various LLMs
Orchestration tools for prompt engineering and multi-agent workflows
Scaling, maintenance, and observability capabilities
Compliance and security layer for governance
Benefits:
Accelerated development and deployment of generative AI solutions
Enabled rapid scaling and global deployment
Provided security, compliance, and quality assurance
Resources and Next Steps
Explore the "Data on EKS" open-source project for generative AI patterns and blueprints.
Check out upcoming sessions on EKS infrastructure as code, S&P Global's generative AI use case, and the future of Kubernetes on AWS.
Continue learning about Amazon EKS through workshops, digital badges, and best practices guides.