High-performance generative AI on Amazon EKS (KUB314)

Generative AI on Amazon EKS

Overview

  • Generative AI and its Use Cases
    • Generative AI can produce human-like content, reducing time to build ML applications
    • Key use cases:
      • Enhancing customer experience
      • Boosting employee productivity
      • Content generation (images, videos)
      • Business operations (log analysis, developer onboarding)

Challenges of Running Generative AI Workloads

  • Organizational Challenges

    • Managing multiple models for different teams and use cases
    • Integrating and managing access to varied data sources
    • Scaling infrastructure to handle massive workloads
  • Data Scientist/ML Engineer Challenges

    • Needing readily available infrastructure to deploy and scale models
    • Avoiding boilerplate code and scripts to manage the model lifecycle

How Amazon EKS Helps

  • Faster Deployment and Scaling

    • Leverage existing Kubernetes expertise and ecosystem of open-source tools
    • Native integration with AWS ML services for seamless scaling
  • Customization and Cost Optimization

    • Flexible configuration of the ML environment to suit specific needs
    • Automated instance selection and scaling with Karpenter for cost optimization (see the sketch after this list)
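
As a concrete illustration of the Karpenter point above, here is a minimal sketch of creating a GPU-oriented NodePool with the Kubernetes Python client. It assumes Karpenter v1beta1 is installed on the cluster and that an EC2NodeClass named "gpu-nodes" already exists; the instance families, GPU limit, and field names are illustrative (and vary by Karpenter version) rather than taken from the session.

    # Hypothetical sketch: create a GPU-oriented Karpenter NodePool from Python.
    # Assumes Karpenter v1beta1 and an existing EC2NodeClass named "gpu-nodes";
    # names and values here are illustrative, not from the session.
    from kubernetes import client, config

    config.load_kube_config()  # use load_incluster_config() when running in-cluster

    node_pool = {
        "apiVersion": "karpenter.sh/v1beta1",
        "kind": "NodePool",
        "metadata": {"name": "gpu-inference"},
        "spec": {
            "template": {
                "spec": {
                    "requirements": [
                        # Constrain to GPU instance families; Karpenter picks the
                        # cheapest instance type that satisfies pending pods
                        {"key": "karpenter.k8s.aws/instance-family",
                         "operator": "In", "values": ["g5", "p4d"]},
                        {"key": "karpenter.sh/capacity-type",
                         "operator": "In", "values": ["spot", "on-demand"]},
                    ],
                    "nodeClassRef": {"name": "gpu-nodes"},
                }
            },
            "limits": {"nvidia.com/gpu": 8},  # cap total GPUs this pool may provision
            "disruption": {"consolidationPolicy": "WhenUnderutilized"},
        },
    }

    client.CustomObjectsApi().create_cluster_custom_object(
        group="karpenter.sh", version="v1beta1", plural="nodepools", body=node_pool
    )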

Customer Success Stories

  • Weviant Labs: Achieved 45% reduction in inference costs by using mixed CPU and GPU instances and optimizing GPU utilization.
  • Informatica: Built an LLM Ops platform on Amazon EKS, achieving 30% cost savings compared to managed services.
  • Zoom: Created a multi-model hosting platform on Amazon EKS to scale reliably and efficiently.
  • Hugging Face: Deployed their ML Hub platform on Amazon EKS to enable inference for millions of models with free-tier pricing.

Amazon EKS Features for Generative AI

  • Scalable Control Plane: Continuously enhanced for higher performance and scale.
  • Infrastructure Innovations: Easy integration of EFA, S3 mount, and accelerated AMIs.
  • Cost-Effective Compute: Support for diverse EC2 instance types, including Graviton, Inferentia, and Trainium.
  • Monitoring and Observability: CloudWatch Container Insights with automatic support for GPU/Inferentia metrics.
  • Inference-Specific Capabilities:
    • Scaling to zero, fast scaling, and optimized container images.
    • Integration with open-source projects like Ray, KServe, and Triton Inference Server (see the Ray Serve sketch after this list)
    • Karpenter for dynamic and cost-effective inference scaling.
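
To illustrate the scale-to-zero and Ray integration points above, below is a minimal Ray Serve sketch with an autoscaling deployment whose minimum replica count is zero. The deployment name, the gpt2 placeholder model, and the autoscaling values are assumptions for illustration (autoscaling key names differ slightly across Ray versions); on Amazon EKS such an application is typically deployed through the KubeRay operator rather than run directly.

    # Hypothetical Ray Serve deployment that scales to zero when idle.
    # The "TextGenerator" name and gpt2 placeholder model are illustrative.
    from ray import serve
    from starlette.requests import Request

    @serve.deployment(
        ray_actor_options={"num_gpus": 1},  # place each replica on a GPU node
        autoscaling_config={
            "min_replicas": 0,   # scale to zero between bursts of traffic
            "max_replicas": 4,
            "target_num_ongoing_requests_per_replica": 2,
        },
    )
    class TextGenerator:
        def __init__(self):
            from transformers import pipeline  # placeholder model load
            self.pipe = pipeline("text-generation", model="gpt2")

        async def __call__(self, request: Request) -> dict:
            prompt = (await request.json())["prompt"]
            out = self.pipe(prompt, max_new_tokens=64)[0]["generated_text"]
            return {"generated_text": out}

    app = TextGenerator.bind()
    # serve.run(app)  # local test; on EKS this is usually submitted via a KubeRay RayService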

Eli Lilly's Generative AI Platform on Amazon EKS

  • Developed a centralized "CATs" platform on Amazon EKS to accelerate generative AI adoption.

  • Key components:

    • Model library for hosting and managing various LLMs
    • Orchestration tools for prompt engineering and multi-agent workflows
    • Scaling, maintenance, and observability capabilities
    • Compliance and security layer for governance
  • Benefits:

    • Accelerated development and deployment of generative AI solutions
    • Enabled rapid scaling and global deployment
    • Provided security, compliance, and quality assurance

Resources and Next Steps

  • Explore the "Data on EKS" open-source project for generative AI patterns and blueprints.
  • Check out upcoming sessions on EKS infrastructure as code, S&P Global's generative AI use case, and the future of Kubernetes on AWS.
  • Continue learning about Amazon EKS through workshops, digital badges, and best practices guides.
