Scale your high-traffic events to gen AI deployments with AWS Support (SUP308)

The following is a summary of the session recording, broken down into sections:

Introduction

  • The session covers how to scale workloads, from high-traffic events to generative AI deployments, with AWS Support.
  • The presenters are Neil Sandes, a Principal Technical Account Manager at AWS, and MK, a Technical Account Manager at AWS. They are joined by Manish Sinha, Senior Director of Advanced Analytics and AI at Georgia Pacific.

Common Challenges in Migrating Prototypes to Production

  • According to Gartner, 50% of cloud migration and modernization initiatives will be delayed by at least two years.
  • The key reasons are:
    • Underestimating the effort to take prototypes and scale them into production-ready deployments.
    • Unplanned downtime and the resulting revenue loss (up to $100,000 per hour of downtime).
    • Remaining in reactive support mode, even after scaling the workloads, because of technical debt.
  • The five common challenges discussed are:
    1. Integration with existing infrastructure
    2. Monitoring and observability
    3. Security
    4. Performance
    5. Cost management

Additional Considerations for Generative AI Workloads

  • Generative AI brings additional complexities, such as:
    1. Data preparation at scale:
      • Centralized data management
      • Cleansing and validating data
      • Sourcing ground truth data
      • Evaluating model outputs
    2. MLOps:
      • Continuous training, deployment, and version control of models
      • Prompt consistency and library management
      • Real-time error monitoring and reaction
    3. Security and governance:
      • Preventing training data poisoning and prompt injection
      • Securing and validating model outputs
    4. Cost management:
      • Implementing financial controls
      • Experimenting with smaller models
      • Optimizing prompt engineering
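
The cost levers listed above (smaller models, shorter prompts) can be compared with simple arithmetic before any workload is run. The following is a minimal sketch of that comparison. The `ModelPrice` type, the two price points, and the request volumes are hypothetical placeholders chosen for illustration; they are not real AWS pricing and were not given in the session.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """Hypothetical per-1,000-token prices in USD (placeholders, not real pricing)."""
    input_per_1k: float
    output_per_1k: float


def monthly_cost(price: ModelPrice, requests: int,
                 input_tokens: int, output_tokens: int) -> float:
    """Estimate monthly spend for a fixed request volume and per-request token counts."""
    per_request = ((input_tokens / 1000) * price.input_per_1k
                   + (output_tokens / 1000) * price.output_per_1k)
    return requests * per_request


# Illustrative numbers only: a "large" and a "small" model with assumed prices.
large = ModelPrice(input_per_1k=0.01, output_per_1k=0.03)
small = ModelPrice(input_per_1k=0.001, output_per_1k=0.003)

# Same traffic in all three scenarios; only the prompt length or the model changes.
baseline = monthly_cost(large, requests=1_000_000, input_tokens=1200, output_tokens=300)
trimmed_prompt = monthly_cost(large, requests=1_000_000, input_tokens=600, output_tokens=300)
smaller_model = monthly_cost(small, requests=1_000_000, input_tokens=1200, output_tokens=300)

for label, cost in [("baseline", baseline),
                    ("trimmed prompt", trimmed_prompt),
                    ("smaller model", smaller_model)]:
    print(f"{label}: ${cost:,.2f} per month")
```

A model of this kind only covers token charges. Whether a smaller model or a shorter prompt still produces acceptable output has to be checked separately, for example with the output evaluation step mentioned under data preparation.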

AWS Well-Architected Framework

  • The Well-Architected Framework is a comprehensive guide to building secure, fault-tolerant, resilient, and efficient cloud infrastructure.
  • It provides design principles, best practices, and questions to assess the current architecture across six pillars: security, reliability, performance efficiency, cost optimization, operational excellence, and sustainability.

AWS Offerings to Support Generative AI Workloads

AWS Countdown

  • AWS Countdown is offered in two flavors: Standard and Premium.
  • Countdown Standard helps anticipate capacity needs and work with service teams to approve resource requests.
  • Countdown Premium is an engineering-led offering that supports the entire journey from initial architecture to production deployment.

Reference Use Case: Fashion Retailer Product Description Generation

  • The architecture involves a front-end web application, a serverless backend using AWS services (Lambda, API Gateway, DynamoDB, etc.), and a Step Functions workflow to generate product descriptions.
  • Key design decisions include:
    • Choosing the right generative AI model (through Amazon Bedrock)
    • Structuring the data strategy for input and output
    • Implementing security best practices
    • Enabling logging and cost optimization
    • Scaling the solution
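
To make the description-generation step of this use case more concrete, here is a minimal sketch of what one step in such a workflow might look like. It is not the presenters' implementation: the function names, the prompt wording, the product fields, and the word limit are all assumptions. The model call is passed in as a plain callable (`generate`) so the sketch runs without AWS credentials.

```python
import json
from typing import Callable, Dict


def build_prompt(product: Dict[str, str], max_words: int = 60) -> str:
    """Turn structured product attributes into a single text prompt."""
    attributes = "\n".join(f"- {key}: {value}" for key, value in sorted(product.items()))
    return (
        f"Write a product description of at most {max_words} words "
        "for a fashion retail website. Use only the attributes listed.\n"
        f"{attributes}"
    )


def describe_product(product: Dict[str, str],
                     generate: Callable[[str], str],
                     max_words: int = 60) -> Dict[str, str]:
    """One workflow step: build the prompt, call the model, validate the output."""
    prompt = build_prompt(product, max_words)
    text = generate(prompt).strip()
    if not text:
        raise ValueError("model returned an empty description")
    words = text.split()
    if len(words) > max_words:
        # Truncate rather than fail; a real workflow might retry instead.
        text = " ".join(words[:max_words])
    return {"sku": product.get("sku", ""), "description": text}


def fake_model(prompt: str) -> str:
    """Hypothetical stand-in for the model call (no network access needed)."""
    return "A lightweight linen shirt in sky blue, cut for a relaxed fit."


record = describe_product(
    {"sku": "SKU-001", "name": "Linen shirt", "color": "sky blue", "fit": "relaxed"},
    generate=fake_model,
)
print(json.dumps(record))
```

In a deployment along the lines described above, `generate` would wrap the actual model invocation, and the returned record would be written to the data store by a later step. Keeping the prompt construction and output validation in separate functions makes each one easy to test and to log.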

Georgia Pacific's Generative AI Journey

  • Georgia Pacific is a large manufacturing company with 30,000 to 35,000 employees and $22 billion in revenue.
  • The key drivers for their AI and Generative AI initiatives are:
    • Labor scarcity and the need to transfer knowledge to the next generation of workers
    • Automating undesirable and repetitive tasks in their manufacturing operations
    • Improving overall equipment effectiveness (OEE) and reducing operating envelope gaps
  • The Operator Assistant use case:
    • Combines real-time sensor data with unstructured data (procedures, manuals) to provide prescriptive guidance to operators.
    • Went through an iterative process of deployment, feedback, and re-engineering to arrive at a scalable, production-ready solution.
    • Leveraged AWS Countdown Premium to fast-track the journey and optimize the architecture.
  • Key learnings and next steps:
    • Parameterize the solution and use standard databases for easy maintainability.
    • Optimize costs and performance, and explore graph-based Retrieval Augmented Generation (RAG).
    • Investigate edge deployments for faster response times.
    • Aim for rapid deployment and iteration, while building a robust, long-term architecture.
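
The Operator Assistant idea of combining real-time sensor data with procedures and manuals can be illustrated with a small sketch. This is not Georgia Pacific's implementation, which the summary does not describe in detail. The sensor names, limits, and documents are invented, and naive keyword overlap stands in for whatever retrieval method (vector-based or graph-based) a production system would use.

```python
from typing import Dict, List, Tuple


def find_out_of_range(readings: Dict[str, float],
                      limits: Dict[str, Tuple[float, float]]) -> List[str]:
    """Return a sorted list of sensors whose reading is outside its (low, high) limits."""
    flagged = []
    for sensor, value in readings.items():
        low, high = limits.get(sensor, (float("-inf"), float("inf")))
        if not low <= value <= high:
            flagged.append(sensor)
    return sorted(flagged)


def retrieve(query_terms: List[str], documents: Dict[str, str], top_k: int = 2) -> List[str]:
    """Rank document titles by keyword overlap with the query terms (stand-in retrieval)."""
    terms = {term.lower() for term in query_terms}
    scored = []
    for title, text in documents.items():
        words = set(text.lower().replace(".", " ").replace(",", " ").split())
        score = len(terms & words)
        if score > 0:
            scored.append((score, title))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [title for _, title in scored[:top_k]]


def build_operator_prompt(readings: Dict[str, float],
                          limits: Dict[str, Tuple[float, float]],
                          documents: Dict[str, str]) -> str:
    """Combine flagged sensor readings with matching procedure text into one prompt."""
    flagged = find_out_of_range(readings, limits)
    terms = [part for sensor in flagged for part in sensor.split("_")]
    titles = retrieve(terms, documents)
    lines = ["Sensor readings outside limits:"]
    lines += [f"- {sensor}: {readings[sensor]}" for sensor in flagged]
    lines.append("Relevant procedures:")
    lines += [f"- {title}: {documents[title]}" for title in titles]
    lines.append("Recommend the next action for the operator.")
    return "\n".join(lines)


# Hypothetical example data.
readings = {"dryer_temperature": 212.0, "line_speed": 48.0}
limits = {"dryer_temperature": (150.0, 200.0), "line_speed": (40.0, 60.0)}
documents = {
    "Dryer procedure": "If dryer temperature exceeds the limit, reduce steam pressure and check the valve.",
    "Winder manual": "Inspect the winder tension roll before each shift.",
}

prompt = build_operator_prompt(readings, limits, documents)
print(prompt)
```

The resulting prompt would then be sent to a model to produce the prescriptive guidance. Keeping limits and documents as plain data rather than hard-coded logic is in the spirit of the "parameterize the solution" learning above.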

Additional Resources

  • QR codes for Well-Architected Framework and AWS Countdown information
  • Upcoming AWS re:Invent sessions on managing security for Generative AI workloads
  • Recording of the MFG 2011 session on Georgia Pacific's Operator Assistant use case
