High-performance generative AI on Amazon EKS (KUB314)
Overview
Generative AI and its Use Cases
Generative AI can produce human-like content, reducing the time needed to build ML applications
Key use cases:
Enhancing customer experience
Boosting employee productivity
Content generation (images, videos)
Business operations (log analysis, developer onboarding)
Challenges of Running Generative AI Workloads
Organizational Challenges
Managing multiple models for different teams and use cases
Integrating and managing access to varied data sources
Scaling infrastructure to handle massive workloads
Data Scientist/ML Engineer Challenges
Needing readily available infrastructure to deploy and scale models
Avoiding boilerplate code/scripts to manage model lifecycle
How Amazon EKS Helps
Faster Deployment and Scaling
Leverage existing Kubernetes expertise and ecosystem of open-source tools
Native integration with AWS ML services for seamless scaling
Customization and Cost Optimization
Flexible configuration of the ML environment to suit specific needs
Automated instance selection and scaling with Karpenter for cost optimization
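The Karpenter point above can be illustrated with a configuration sketch. This is a hypothetical Karpenter (v1 API) NodePool for GPU inference; the pool name, instance-category values, GPU limit, and Spot preference are illustrative assumptions, not settings from the session.

```yaml
# Hypothetical NodePool: lets Karpenter pick cost-effective GPU instances.
apiVersion: karpenter.sh/v1
kind: NodePool
metadata:
  name: gpu-inference              # example name
spec:
  template:
    spec:
      requirements:
        - key: karpenter.k8s.aws/instance-category
          operator: In
          values: ["g", "p"]       # restrict to GPU instance families
        - key: karpenter.sh/capacity-type
          operator: In
          values: ["spot", "on-demand"]  # allow Spot for cost savings
      nodeClassRef:
        group: karpenter.k8s.aws
        kind: EC2NodeClass
        name: default              # assumes an EC2NodeClass named "default"
  limits:
    nvidia.com/gpu: "8"            # cap total GPUs the pool may provision
  disruption:
    consolidationPolicy: WhenEmptyOrUnderutilized  # reclaim idle capacity
```

Leaving the instance type open (only the category is constrained) is what lets Karpenter choose the cheapest instance that satisfies pending pods' GPU requests.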
Customer Success Stories
Weviant Labs: Achieved 45% reduction in inference costs by using mixed CPU and GPU instances and optimizing GPU utilization.
Informatica: Built an LLM Ops platform on Amazon EKS, achieving 30% cost savings compared to managed services.
Zoom: Created a multi-model hosting platform on Amazon EKS to scale reliably and efficiently.
Hugging Face: Deployed their ML Hub platform on Amazon EKS to enable inference of millions of models with free-tier pricing.
Amazon EKS Features for Generative AI
Scalable Control Plane: Continuously enhanced for higher performance and scale.
Infrastructure Innovations: Easy integration of EFA (Elastic Fabric Adapter), Mountpoint for Amazon S3, and GPU-accelerated AMIs.
Cost-Effective Compute: Support for diverse EC2 instance types, including Graviton, Inferentia, and Trainium.
Monitoring and Observability: CloudWatch Container Insights with automatic support for GPU/Inferentia metrics.
Inference-Specific Capabilities:
Scaling to zero, fast scaling, and optimized container images.
Integration with open-source projects like Ray, KServe, and Triton Inference Server.
Karpenter for dynamic and cost-effective inference scaling.
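The inference capabilities above (scale to zero, KServe integration) can be sketched as a KServe InferenceService. This is a minimal hypothetical manifest; the service name, model format, storage URI, and replica counts are placeholder assumptions for illustration.

```yaml
# Hypothetical KServe InferenceService with scale-to-zero enabled.
apiVersion: serving.kserve.io/v1beta1
kind: InferenceService
metadata:
  name: llm-demo                                 # example name
spec:
  predictor:
    minReplicas: 0                               # scale to zero when idle
    maxReplicas: 4                               # fast scale-out under load
    model:
      modelFormat:
        name: huggingface                        # assumed model format
      storageUri: s3://example-bucket/models/llm # placeholder model location
      resources:
        limits:
          nvidia.com/gpu: "1"                    # one GPU per replica
```

With minReplicas set to 0, idle replicas are removed entirely and recreated on the first request, which pairs naturally with Karpenter reclaiming the underlying GPU nodes.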
Eli Lilly's Generative AI Platform on Amazon EKS
Developed a centralized "CATs" platform on Amazon EKS to accelerate generative AI adoption.
Key components:
Model library for hosting and managing various LLMs
Orchestration tools for prompt engineering and multi-agent workflows
Scaling, maintenance, and observability capabilities
Compliance and security layer for governance
Benefits:
Accelerated development and deployment of generative AI solutions
Enabled rapid scaling and global deployment
Provided security, compliance, and quality assurance
Resources and Next Steps
Explore the "Data on EKS" open-source project for generative AI patterns and blueprints.
Check out upcoming sessions on EKS infrastructure as code, S&P Global's generative AI use case, and the future of Kubernetes on AWS.
Continue learning about Amazon EKS through workshops, digital badges, and best practices guides.