Here is a detailed summary of the key takeaways from the video transcription, formatted in Markdown:

# Enabling AI/ML Workloads on Amazon ECS

## Challenges and Considerations

- **Flexibility:**
  - Choice of models, customizations, runtimes, and ML toolkits/libraries
  - Reliability and consistency of results, and high availability
- **Performance:**
  - Rapid interactions for applications like chatbots
  - Control over compute infrastructure and desired accelerators
- **Scalability:**
  - Scale up the application and underlying model layer to support growing demand
  - Scale back down during periods of lower demand
- **Cost Optimization:**
  - Running a cost-optimal solution at scale
- **Observability:**
  - Monitoring, troubleshooting, and debugging capabilities
- **Security and Compliance:**
  - Building solutions in a secure and compliant manner

## Architectural Approach

- **Two-Layer Architecture:**
  - Decouple the customer-facing application from the model layer
  - Enables independent scaling, deployment, and technology choices
- **Hosting the Customer-Facing Application:**
  - Serverless technologies such as AWS Lambda are recommended
  - ECS can also be used as a "serverless control plane"
- **Hosting the Model Layer:**
  - Options include Amazon Bedrock, Amazon SageMaker, and self-hosting on ECS
  - Self-hosting on ECS provides full control, configurability, and flexibility

## ECS Compute Options and Considerations

- **Compute Options:**
  - AWS Fargate (serverless compute)
  - ECS on EC2 instances (access to accelerated compute options)
  - ECS Anywhere (hybrid/edge deployment)
- **Cost Optimization:**
  - Use of Spot Instances and Savings Plans (see the capacity provider sketch after this list)
  - Access to Graviton-based instances for better price-performance
- **Scalability:**
  - ECS Service Auto Scaling with various policies (e.g., target tracking, predictive scaling); see the scaling policy sketch below
  - ECS Capacity Providers for scaling the underlying EC2 infrastructure
- **Storage Options:**
  - Bundling model files into the container image
  - Amazon S3 for model storage
  - Amazon EFS for elastic file storage (see the task definition sketch below)
- **Observability:**
  - Amazon CloudWatch Container Insights (see the cluster settings sketch below)
  - AWS X-Ray for end-to-end tracing
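
To illustrate the Spot-based cost levers above, here is a minimal boto3 sketch that creates an ECS service spread across two hypothetical capacity providers (`gpu-ondemand-cp` and `gpu-spot-cp`, assumed to be backed by On-Demand and Spot Auto Scaling groups). The cluster, service, and task definition names are placeholders, not from the video:

```python
import boto3

ecs = boto3.client("ecs")

# Weighted capacity provider strategy: `base` pins one task to On-Demand
# capacity, and the weights route roughly 3 of every 4 remaining tasks
# to the Spot-backed provider for cost savings.
ecs.create_service(
    cluster="inference-cluster",        # hypothetical cluster name
    serviceName="model-inference",      # hypothetical service name
    taskDefinition="inference-task:1",  # hypothetical task definition
    desiredCount=4,
    capacityProviderStrategy=[
        {"capacityProvider": "gpu-ondemand-cp", "base": 1, "weight": 1},
        {"capacityProvider": "gpu-spot-cp", "weight": 3},
    ],
)
```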
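
ECS Service Auto Scaling is configured through the Application Auto Scaling API. A sketch, reusing the same hypothetical cluster and service names, that registers the service's desired count as a scalable target and attaches a CPU-based target tracking policy:

```python
import boto3

aas = boto3.client("application-autoscaling")

# Register the service's desired count as a scalable target.
aas.register_scalable_target(
    ServiceNamespace="ecs",
    ResourceId="service/inference-cluster/model-inference",  # hypothetical
    ScalableDimension="ecs:service:DesiredCount",
    MinCapacity=1,
    MaxCapacity=10,
)

# Target tracking: keep average service CPU utilization near 70%.
aas.put_scaling_policy(
    PolicyName="cpu-target-tracking",
    ServiceNamespace="ecs",
    ResourceId="service/inference-cluster/model-inference",
    ScalableDimension="ecs:service:DesiredCount",
    PolicyType="TargetTrackingScaling",
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 70.0,
        "PredefinedMetricSpecification": {
            "PredefinedMetricType": "ECSServiceAverageCPUUtilization"
        },
        "ScaleOutCooldown": 60,
        "ScaleInCooldown": 120,
    },
)
```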
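
Mounting an EFS file system into tasks keeps large model files off the container image. A sketch of a task definition with an EFS volume and a GPU requirement; the family, image URI, and file system ID are placeholders:

```python
import boto3

ecs = boto3.client("ecs")

ecs.register_task_definition(
    family="inference-task",  # hypothetical family name
    requiresCompatibilities=["EC2"],
    containerDefinitions=[{
        "name": "model-server",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/model-server:latest",  # placeholder
        "essential": True,
        "memory": 15360,
        # Request one GPU on the EC2 container instance.
        "resourceRequirements": [{"type": "GPU", "value": "1"}],
        # Mount the shared model store at /models inside the container.
        "mountPoints": [{"sourceVolume": "model-store", "containerPath": "/models"}],
    }],
    volumes=[{
        "name": "model-store",
        "efsVolumeConfiguration": {
            "fileSystemId": "fs-0123456789abcdef0",  # placeholder file system ID
            "transitEncryption": "ENABLED",
        },
    }],
)
```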
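
Container Insights is switched on per cluster; a one-call sketch using the hypothetical cluster name from above:

```python
import boto3

ecs = boto3.client("ecs")

# Enable CloudWatch Container Insights metrics for the cluster.
ecs.update_cluster_settings(
    cluster="inference-cluster",  # hypothetical cluster name
    settings=[{"name": "containerInsights", "value": "enabled"}],
)
```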

## Customer Success Stories

- Womo and Scenario used ECS and Fargate for faster time-to-market with their Gen AI workloads.
- Kepler used ECS Anywhere for a hybrid cloud and edge deployment of their ML applications.
- Amazon used ECS, EC2 instances with NVIDIA GPUs, and AWS Inferentia to build its Rufus ML tool.

## Demonstration: Building a Gen AI Inference Application on ECS

- **Architecture:**
  - Asynchronous architecture with a message broker (SQS) and a decoupled inference endpoint (see the worker loop sketch after this list)
  - Leverages AWS services such as API Gateway, Lambda, SNS, SQS, and ECS
- **Performance Optimization:**
  - GPU-optimized EC2 instances (G6 family)
  - Pre-warming instances using Auto Scaling group warm pools (see the warm pool sketch below)
  - Storing model files in Amazon EFS for fast loading
- **Scalability:**
  - Autoscaling based on a custom backlog-per-task metric (see the metric publisher sketch below)
  - ECS Capacity Providers and Spot Instances for cost optimization
- **Observability:**
  - AWS X-Ray for end-to-end tracing (see the tracing sketch below)
  - Integrating NVIDIA Data Center GPU Manager (DCGM) for GPU metrics
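
The video summary contains no code, so here is a minimal sketch of what the decoupled inference worker could look like: an ECS task that long-polls SQS, runs inference, and deletes the message on success. The queue URL and the `run_inference` stub are assumptions for illustration:

```python
import json
import boto3

sqs = boto3.client("sqs")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/inference-requests"  # hypothetical queue

def run_inference(prompt: str) -> dict:
    """Placeholder for the actual model call."""
    return {"text": f"response to: {prompt}"}

while True:
    # Long-poll to reduce empty receives and API cost.
    resp = sqs.receive_message(QueueUrl=QUEUE_URL, MaxNumberOfMessages=1, WaitTimeSeconds=20)
    for msg in resp.get("Messages", []):
        body = json.loads(msg["Body"])
        result = run_inference(body["prompt"])
        # Persist or publish the result (e.g., to S3 or SNS) before acknowledging; omitted here.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=msg["ReceiptHandle"])
```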
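
Pre-warming GPU capacity can be done with an Auto Scaling group warm pool; a one-call sketch with a hypothetical ASG name:

```python
import boto3

autoscaling = boto3.client("autoscaling")

# Keep two stopped, pre-initialized instances so scale-out skips the
# slow boot and GPU driver initialization phase.
autoscaling.put_warm_pool(
    AutoScalingGroupName="gpu-inference-asg",  # hypothetical ASG name
    MinSize=2,
    PoolState="Stopped",
)
```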
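
The backlog-per-task signal is typically computed as queue depth divided by the number of running tasks and published as a custom CloudWatch metric for a target tracking policy to consume. A sketch, reusing the hypothetical names from above:

```python
import boto3

sqs = boto3.client("sqs")
ecs = boto3.client("ecs")
cloudwatch = boto3.client("cloudwatch")

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/inference-requests"  # hypothetical

# Queue depth: messages waiting to be processed.
attrs = sqs.get_queue_attributes(
    QueueUrl=QUEUE_URL, AttributeNames=["ApproximateNumberOfMessages"]
)
backlog = int(attrs["Attributes"]["ApproximateNumberOfMessages"])

# Running tasks in the inference service (floor of 1 to avoid dividing by zero).
service = ecs.describe_services(
    cluster="inference-cluster", services=["model-inference"]
)["services"][0]
running = max(service["runningCount"], 1)

# Publish backlog-per-task so a scaling policy can act on it.
cloudwatch.put_metric_data(
    Namespace="GenAI/Inference",  # hypothetical namespace
    MetricData=[{
        "MetricName": "BacklogPerTask",
        "Value": backlog / running,
        "Unit": "Count",
    }],
)
```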
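
For tracing, the worker could use the AWS X-Ray SDK for Python; `patch_all()` instruments boto3 so SQS and other downstream calls appear as subsegments. The service name and annotation key are illustrative, not from the video:

```python
from aws_xray_sdk.core import xray_recorder, patch_all

# Instrument boto3/requests so downstream AWS calls become subsegments.
patch_all()
xray_recorder.configure(service="inference-worker")  # hypothetical service name

def handle_message(body: dict) -> None:
    # Wrap each inference request in its own trace segment.
    with xray_recorder.in_segment("inference-request") as segment:
        segment.put_annotation("model", body.get("model", "default"))
        # ... run inference and record the result ...
```
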
In summary, the video highlights how ECS can be leveraged to build reliable, performant, and scalable Gen AI applications, with the flexibility to choose the right compute options, storage, and observability tools to meet the unique requirements of these workloads.