AWS re:Invent 2025 - Build generative and agentic AI applications on-premises & at the edge (HMC308)

Deploying Generative and Agentic AI at the Edge

Generative AI and Agentic AI Capabilities

  • Generative AI agents are more goal-oriented and can handle complex tasks with less human intervention
  • Agentic AI systems are fully autonomous, can interact with each other, and mimic human reasoning

Business Drivers for Edge Deployments

  1. Data Residency: Certain regulated workloads cannot be deployed in cloud regions due to data sovereignty requirements
  2. Corporate Policy: Some enterprises cannot deploy data and models outside their own on-premises facilities
  3. Low Latency: Some workloads require ultra-low latency that cannot be achieved by deploying in distant cloud regions

Key Use Cases for Edge AI

  1. Enhancing Customer Experience:

    • Chatbots using generative AI for natural conversations
    • Call center support with audio transcription and sentiment analysis
  2. Driving Productivity and Creativity:

    • Automated meeting summaries and action item generation
    • Content creation with generative AI for first drafts
  3. Optimizing Business Processes:

    • Automating the generation of critical business reports
    • Reducing the time and cost of manual processes

Optimizing Pre-Trained Models for the Edge

  1. Model Selection:

    • Evaluate models based on specific task requirements, not just size
    • Balance performance (tokens/sec) and accuracy for the use case
  2. Model Optimization:

    • Use the llama.cpp framework to compress (quantize) models, which the session reported can improve performance by up to 5x
    • Experiment with smaller models first before scaling up
  3. Prompt Engineering:

    • Carefully craft prompts to guide the model's behavior and output
    • Provide context, instructions, and constraints to the model
  4. Fine-Tuning:

    • Perform parameter-efficient fine-tuning to adapt pre-trained models to specific tasks
    • Update only a small subset of parameters, such as a few layers, so the model retains its pre-trained capabilities
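
The model selection guidance above (judge by task fit rather than size, balance tokens/sec against accuracy, try smaller models first) can be sketched as a simple filter. This is a minimal illustration, not the method shown in the session: the Candidate type, the pick_model helper, the model names, and all numbers are hypothetical placeholders, and real values would come from measurements on the target edge hardware and a task-specific evaluation set.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Candidate:
    name: str              # hypothetical model label
    params_b: float        # parameter count, in billions
    tokens_per_sec: float  # throughput measured on the target edge hardware
    accuracy: float        # score on a task-specific evaluation set, 0..1


def pick_model(
    candidates: List[Candidate],
    min_tokens_per_sec: float,
    min_accuracy: float,
) -> Optional[Candidate]:
    """Return the smallest candidate that meets both thresholds, or None."""
    viable = [
        c for c in candidates
        if c.tokens_per_sec >= min_tokens_per_sec and c.accuracy >= min_accuracy
    ]
    # Prefer the smallest viable model: try small first, scale up only if needed.
    return min(viable, key=lambda c: c.params_b, default=None)


# Placeholder numbers for illustration only, not benchmark results.
CANDIDATES = [
    Candidate("small-1b", 1.0, 90.0, 0.71),
    Candidate("medium-3b", 3.0, 45.0, 0.84),
    Candidate("large-8b", 8.0, 18.0, 0.88),
]

choice = pick_model(CANDIDATES, min_tokens_per_sec=30.0, min_accuracy=0.80)
print(choice.name if choice else "no candidate meets the thresholds")
```

With these placeholder values the smallest model misses the accuracy threshold and the largest misses the throughput threshold, so the middle candidate is chosen. The thresholds themselves are a per-use-case decision.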

Deploying Agentic AI at the Edge

  1. Agentic AI Architecture:

    • Leverage a semantic cache to avoid redundant processing
    • Allow agents to recursively call tools and other agents as needed
  2. Agentic AI Components:

    • Perception: Understand the input question or task
    • Reasoning: Plan a workflow to address the problem
    • Action: Execute the workflow, calling tools and other agents
    • Memory: Maintain short-term and long-term context
  3. Agentic AI Use Case:

    • Demonstrated a generative AI-powered chef assistant to plan a casual lunch menu
    • Utilized small language models running on edge infrastructure
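
The architecture and components above can be sketched as one small loop: check a semantic cache, then perceive, plan, act, and record memory. This is a rough sketch under stated assumptions, not the demo from the session. Every name here (embed, SemanticCache, Agent, the two tools) is hypothetical, the bag-of-words vector stands in for a real embedding model, a keyword rule stands in for model-driven reasoning, and recursive calls to other agents are left out for brevity.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


class SemanticCache:
    """Returns a stored answer when a new query is similar enough to an old one."""

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (vector, answer) pairs

    def get(self, query):
        vec = embed(query)
        best = max(self.entries, key=lambda e: cosine(vec, e[0]), default=None)
        if best is not None and cosine(vec, best[0]) >= self.threshold:
            return best[1]
        return None

    def put(self, query, answer):
        self.entries.append((embed(query), answer))


# Hypothetical tools; in practice these would call models or local services.
def suggest_menu(task):
    return "menu: soup, salad, sandwich"


def list_ingredients(task):
    return "ingredients: bread, greens, stock"


TOOLS = {"menu": suggest_menu, "ingredients": list_ingredients}


class Agent:
    def __init__(self):
        self.cache = SemanticCache()
        self.long_term = []  # memory kept across requests
        self.tool_calls = 0

    def perceive(self, request):
        # Perception: normalize the incoming request.
        return request.strip().lower()

    def plan(self, task):
        # Reasoning: a keyword rule standing in for a model-generated plan.
        return [name for name in TOOLS if name in task] or ["menu"]

    def act(self, task, steps):
        # Action: run each planned tool; short_term is memory for this request.
        short_term = []
        for step in steps:
            self.tool_calls += 1
            short_term.append(TOOLS[step](task))
        return "; ".join(short_term)

    def handle(self, request):
        task = self.perceive(request)
        cached = self.cache.get(task)
        if cached is not None:
            return cached  # similar request seen before: skip planning and tools
        answer = self.act(task, self.plan(task))
        self.cache.put(task, answer)
        self.long_term.append((task, answer))
        return answer


agent = Agent()
first = agent.handle("Plan a casual lunch menu")
second = agent.handle("plan a casual lunch menu please")
print(first)
print(agent.tool_calls)
```

The second request differs only by one word, so its similarity to the first clears the 0.8 threshold and the cached answer is returned without another tool call. The threshold is an assumption to tune: set too low, the cache returns answers to questions that only look similar.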

Key Takeaways

  • AWS provides a comprehensive AI continuum from cloud regions to the edge, enabling consistent experiences
  • Careful model selection, optimization, and prompt engineering are crucial for efficient edge deployments
  • Agentic AI systems can be effectively deployed at the edge using small language models and a modular architecture
  • Combining generative AI and agentic AI can drive innovative applications that automate complex workflows
