TalksScale your high-traffic events to gen AI deployments with AWS Support (SUP308)
Scale your high-traffic events to gen AI deployments with AWS Support (SUP308)
Here is a detailed summary of the video transcription in markdown format, broken down into sections:
Introduction
The session covers how to scale high-traffic events to generative AI deployments with AWS support.
The presenters introduce themselves - Neil Sandes, a Principal Technical Account Manager at AWS, and MK, a Technical Account Manager at AWS. They are also joined by Manish Sinha, the Senior Director of Advanced Analytics and AI at Georgia Pacific.
Common Challenges in Migrating Prototypes to Production
50% of all migration and modernization initiatives on the cloud will be delayed by at least 2 years, as reported by Gartner.
The key reasons are:
Underestimating the effort to take prototypes and scale them into production-ready deployments.
Unprepared downtime and revenue loss (up to $100,000 per hour of downtime).
Constantly working on reactive support mode, even after scaling the workloads, due to technical debt.
The five common challenges discussed are:
Integration with existing infrastructure
Monitoring and observability
Security
Performance
Cost management
Additional Considerations for Generative AI Workloads
Generative AI brings additional complexities, such as:
Data preparation at scale:
Centralized data management
Cleansing and validating data
Sourcing ground truth data
Evaluating model outputs
MLOps:
Continuous training, deployment, and version control of models
Prompt consistency and library management
Real-time error monitoring and reaction
Security and governance:
Preventing training data poisoning and prompt injection
Securing and validating model outputs
Cost management:
Implementing financial controls
Experimenting with smaller models
Optimizing prompt engineering
AWS Well-Architected Framework
The Well-Architected Framework is a comprehensive guide to build secure, fault-tolerant, resilient, and efficient cloud infrastructure.
It provides design principles, best practices, and questions to assess the current architecture across six pillars: security, reliability, performance efficiency, cost optimization, operational excellence, and sustainability.
AWS Offerings to Support Generative AI Workloads
AWS Countdown
AWS Countdown is offered in two flavors: Standard and Premium.
Countdown Standard helps anticipate capacity needs and work with service teams to approve resource requests.
Countdown Premium is an engineering-led offering that supports the entire journey from initial architecture to production deployment.
Reference Use Case: Fashion Retailer Product Description Generation
The architecture involves a front-end web application, a serverless backend using AWS services (Lambda, API Gateway, DynamoDB, etc.), and a Step Functions workflow to generate product descriptions.
Key design decisions include:
Choosing the right Generative AI model (Bedrock)
Structuring the data strategy for input and output
Implementing security best practices
Enabling logging and cost optimization
Scaling the solution
Georgia Pacific's Generative AI Journey
Georgia Pacific is a large manufacturing company with 30-35,000 employees and $22 billion in revenue.
The key drivers for their AI and Generative AI initiatives are:
Labor scarcity and the need to transfer knowledge to the next generation of workers
Automating undesirable and repetitive tasks in their manufacturing operations
Improving overall equipment effectiveness (OEE) and reducing operating envelope gaps
The Operator Assistant use case:
Combines real-time sensor data with unstructured data (procedures, manuals) to provide prescriptive guidance to operators.
Went through an iterative process of deployment, feedback, and re-engineering to arrive at a scalable, production-ready solution.
Leveraged AWS Countdown Premium to fast-track the journey and optimize the architecture.
Key learnings and next steps:
Parameterize the solution and use standard databases for easy maintainability.
Optimize costs and performance, explore graph-based Retrieval Augmented Generation (RAG).
Investigate edge deployments for faster response times.
Aim for rapid deployment and iteration, while building a robust, long-term architecture.
Additional Resources
QR codes for Well-Architected Framework and AWS Countdown information
Upcoming AWS re:Invent sessions on managing security for Generative AI workloads
Recording of the MFG 2011 session on Georgia Pacific's Operator Assistant use case
These cookies are used to collect information about how you interact with this website and allow us to remember you. We use this information to improve and customize your browsing experience, as well as for analytics.
If you decline, your information won’t be tracked when you visit this website. A single cookie will be used in your browser to remember your preference.