AWS re:Invent 2025 - Build generative and agentic AI applications on-premises & at the edge (HMC308)
Deploying Generative and Agentic AI at the Edge
Generative AI and Agentic AI Capabilities
Generative AI agents are more goal-oriented and can handle complex tasks with less human intervention
Agentic AI systems are fully autonomous, can interact with each other, and mimic human-like reasoning
Business Drivers for Edge Deployments
Data Residency : Certain regulated workloads cannot be deployed in cloud regions due to data sovereignty requirements
Corporate Policy : Some enterprises cannot deploy data and models outside their own on-premises facilities
Low Latency : Some workloads require ultra-low latency that cannot be achieved by deploying in distant cloud regions
Key Use Cases for Edge AI
Enhancing Customer Experience :
Chatbots using generative AI for natural conversations
Call center support with audio transcription and sentiment analysis
Driving Productivity and Creativity :
Automated meeting summaries and action item generation
Content creation with generative AI for first drafts
Optimizing Business Processes :
Automating the generation of critical business reports
Reducing the time and cost of manual processes
Optimizing Pre-Trained Models for the Edge
Model Selection :
Evaluate models based on specific task requirements, not just size
Balance performance (tokens/sec) and accuracy for the use case
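The trade-off between throughput and accuracy can be made concrete with a small selection helper. This is a minimal sketch, not the method shown in the talk: the `Candidate` type, the `pick_model` function, the model names, and all numbers are hypothetical and for illustration only.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    tokens_per_sec: float  # throughput measured on the target edge hardware
    accuracy: float        # task-specific evaluation score in [0, 1]


def pick_model(candidates, min_tokens_per_sec, min_accuracy):
    """Return the most accurate candidate that meets both thresholds, or None."""
    eligible = [
        c for c in candidates
        if c.tokens_per_sec >= min_tokens_per_sec and c.accuracy >= min_accuracy
    ]
    if not eligible:
        return None
    # Prefer accuracy; break ties with throughput.
    return max(eligible, key=lambda c: (c.accuracy, c.tokens_per_sec))


# Hypothetical figures, not measurements from the talk.
candidates = [
    Candidate("small-model", 45.0, 0.78),
    Candidate("medium-model", 22.0, 0.86),
    Candidate("large-model", 6.0, 0.91),
]

best = pick_model(candidates, min_tokens_per_sec=20.0, min_accuracy=0.75)
print(best.name if best else "no candidate meets the constraints")
```

With these made-up numbers the largest model is the most accurate but misses the throughput floor, so the mid-sized model is chosen. The thresholds should come from the use case, for example the response speed a chatbot needs.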
Model Optimization :
Use the llama.cpp framework to compress models, for example through quantization; the talk cited performance improvements of up to 5x
Experiment with smaller models first before scaling up
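One reason compression matters at the edge is memory: the space needed for weights scales with parameter count times bits per weight. The sketch below is back-of-the-envelope arithmetic only, under the assumption of a hypothetical 8-billion-parameter model. It ignores the KV cache and activations, and real quantized files carry extra scaling metadata, so actual sizes are somewhat larger. It does not reproduce the speedup figure quoted in the talk.

```python
def weight_memory_gib(num_params: float, bits_per_weight: float) -> float:
    """Approximate memory for model weights only, in GiB."""
    return num_params * bits_per_weight / 8 / (1024 ** 3)


# Hypothetical model size, chosen for illustration.
params = 8e9

fp16 = weight_memory_gib(params, 16)
int4 = weight_memory_gib(params, 4)

print(f"16-bit weights: {fp16:.1f} GiB")
print(f"4-bit weights:  {int4:.1f} GiB")
print(f"reduction:      {fp16 / int4:.0f}x")
```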
Prompt Engineering :
Carefully craft prompts to guide the model's behavior and output
Provide context, instructions, and constraints to the model
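A structured prompt can be assembled from the three ingredients listed above. The following is a minimal sketch; the `build_prompt` helper and the section labels are assumptions for illustration, not a format prescribed in the talk or required by any particular model.

```python
def build_prompt(context: str, instructions: str, constraints: list[str], question: str) -> str:
    """Assemble a prompt with explicit context, instructions, and constraints."""
    constraint_lines = "\n".join(f"- {c}" for c in constraints)
    return (
        f"Context:\n{context}\n\n"
        f"Instructions:\n{instructions}\n\n"
        f"Constraints:\n{constraint_lines}\n\n"
        f"Question:\n{question}\n"
    )


prompt = build_prompt(
    context="You assist a home cook who has a small kitchen.",
    instructions="Propose a menu and explain each choice in one sentence.",
    constraints=["At most three dishes", "No more than 45 minutes of cooking"],
    question="What should I serve for a casual lunch?",
)
print(prompt)
```

Keeping the sections separate makes it easier to tune one part, such as the constraints, without rewriting the whole prompt, which is useful when small models are sensitive to wording.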
Fine-Tuning :
Perform parameter-efficient fine-tuning to adapt pre-trained models to specific tasks
Retrain only a few layers, or a small set of added parameters, and keep the remaining weights frozen to preserve the pre-trained model's performance
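To give a sense of scale for parameter-efficient fine-tuning, the sketch below counts trainable parameters for low-rank adaptation (LoRA), one common technique in this family. The talk summary does not say which method was used, so LoRA is an assumption here, and the layer dimensions and rank are hypothetical.

```python
def lora_trainable_params(d_out: int, d_in: int, rank: int) -> int:
    """Parameters in a rank-r update B @ A, with B of shape (d_out, r) and A of shape (r, d_in)."""
    return rank * (d_out + d_in)


# Hypothetical dimensions for a single weight matrix.
d_out, d_in, rank = 4096, 4096, 8

full = d_out * d_in                              # parameters if the whole matrix were retrained
lora = lora_trainable_params(d_out, d_in, rank)  # parameters trained under LoRA

print(f"full matrix: {full:,} parameters")
print(f"LoRA update: {lora:,} parameters")
print(f"fraction trained: {lora / full:.4%}")
```

Because the original matrix stays frozen, the adapted model starts from the pre-trained behavior and only the small update is learned and stored.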
Deploying Agentic AI at the Edge
Agentic AI Architecture :
Leverage a semantic cache to avoid redundant processing
Allow agents to recursively call tools and other agents as needed
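A semantic cache returns a stored response when a new query is close enough in meaning to an earlier one, so the model is not called again. The sketch below illustrates the idea with a toy word-count embedding and cosine similarity. A real deployment would use an embedding model and a vector store; `embed`, `SemanticCache`, `slow_model`, and the 0.8 threshold are all hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class SemanticCache:
    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.entries = []  # list of (embedding, response) pairs

    def get(self, query):
        q = embed(query)
        best_score, best_response = 0.0, None
        for emb, response in self.entries:
            score = cosine(q, emb)
            if score > best_score:
                best_score, best_response = score, response
        return best_response if best_score >= self.threshold else None

    def put(self, query, response):
        self.entries.append((embed(query), response))


calls = []  # records every query that actually reached the model


def slow_model(query):
    """Placeholder for an expensive model call."""
    calls.append(query)
    return f"answer to: {query}"


def answer(cache, query):
    cached = cache.get(query)
    if cached is not None:
        return cached
    response = slow_model(query)
    cache.put(query, response)
    return response


cache = SemanticCache(threshold=0.8)
first = answer(cache, "What is a casual lunch menu?")
second = answer(cache, "what is a casual lunch menu")   # near-duplicate, served from cache
third = answer(cache, "How do I quantize a model?")     # unrelated, goes to the model
print(len(calls), "model calls for 3 queries")
```

The threshold controls the trade-off: set too low, unrelated questions receive stale answers; set too high, the cache rarely hits.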
Agentic AI Components :
Perception: Understand the input question or task
Reasoning: Plan a workflow to address the problem
Action: Execute the workflow, calling tools and other agents
Memory: Maintain short-term and long-term context
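The four components can be sketched as methods on a single agent class. This is an illustrative skeleton only, loosely themed on the chef assistant mentioned in the talk: fixed rules stand in for the language model, and the `Agent` class, the tool functions, and their outputs are hypothetical.

```python
# Hypothetical tools; a real deployment would call local services or other agents.
def suggest_dishes(occasion):
    menus = {"casual lunch": ["tomato soup", "grilled cheese sandwich"]}
    return menus.get(occasion, ["house salad"])


def shopping_list(dishes):
    return [f"ingredients for {dish}" for dish in dishes]


class Agent:
    def __init__(self, tools):
        self.tools = tools
        self.short_term = []  # steps taken during the current request
        self.long_term = {}   # finished results, kept across requests

    def perceive(self, request):
        # Perception: reduce the request to a known task (a model would do this).
        return "casual lunch" if "lunch" in request.lower() else "unknown"

    def reason(self, task):
        # Reasoning: a fixed two-step workflow stands in for model-generated planning.
        return ["suggest_dishes", "shopping_list"]

    def act(self, task, plan):
        # Action: run each tool, feeding its output into the next step.
        result = task
        for step in plan:
            result = self.tools[step](result)
            self.short_term.append(step)
        return result

    def run(self, request):
        self.short_term = []
        task = self.perceive(request)
        if task in self.long_term:  # Memory: reuse an earlier result
            return self.long_term[task]
        result = self.act(task, self.reason(task))
        self.long_term[task] = result
        return result


agent = Agent({"suggest_dishes": suggest_dishes, "shopping_list": shopping_list})
result = agent.run("Plan a casual lunch menu")
print(result)
```

Keeping each component behind its own method makes it possible to swap the rule-based stand-ins for a small language model without changing the control flow.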
Agentic AI Use Case :
Demonstrated a generative AI-powered chef assistant to plan a casual lunch menu
Utilized small language models running on edge infrastructure
Key Takeaways
AWS provides a comprehensive AI continuum from cloud regions to the edge, enabling consistent experiences
Careful model selection, optimization, and prompt engineering are crucial for efficient edge deployments
Agentic AI systems can be effectively deployed at the edge using small language models and a modular architecture
Combining generative AI and agentic AI can drive innovative applications that automate complex workflows