AWS re:Invent 2025 - Control humanoid robots and drones with voice and Agentic AI (DEV313)
Summary of AWS re:Invent 2025 Presentation: "Control Humanoid Robots and Drones with Voice and Agentic AI"
Introduction to Agentic AI
Overview of the evolution of generative AI, from rule-based low-agency systems to high-agency autonomous intelligent agents
Predictions from top investment institutions on the future "agent economy" - a global network of interconnected AI agents
Emergence of the "one-person unicorn" - companies operated by a single individual collaborating with AI agents
Need to adopt a "stochastic mindset" to thrive in the world of agentic AI
AWS Agentic AI Portfolio
Three-layer architecture: infrastructure, AI/agent development, and applications
Key services highlighted: Amazon Lex for voice interaction, Amazon Kendra for knowledge-driven development, and AWS IoT for device integration
AWS's involvement in industry standards like MCP and A2A to enable interoperability between AI agents and applications
Integrating Robots and Large Language Models
Project background: Student team from the Hong Kong Institute of Information Technology
Demonstration of integrating humanoid robots, robot dogs, and digital humans with voice and agentic AI
Architecture overview:
Serverless design using AWS Lambda, AWS App Runner, and DynamoDB
Integration of Amazon Nova Sonic for voice interaction and streaming
AWS IoT for device control and MCP server implementation in Lambda
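The MCP-server-in-Lambda piece of the architecture above can be sketched as a plain handler that routes tool calls and forwards robot commands over AWS IoT. The event shape, tool names, and MQTT topic here are assumptions for illustration, not the team's actual schema; a real deployment would publish via boto3's `iot-data` client.

```python
# Sketch of an MCP-style tool server inside an AWS Lambda handler.
# The event shape, "move" tool, and ROBOT_TOPIC are hypothetical.
import json

ROBOT_TOPIC = "robots/humanoid-01/commands"  # hypothetical MQTT topic

def publish_command(topic: str, payload: dict) -> None:
    # In production this would be:
    #   boto3.client("iot-data").publish(topic=topic, payload=json.dumps(payload))
    print(f"publish {topic}: {json.dumps(payload)}")

def lambda_handler(event, context):
    """Handle a tool call shaped like {'tool': 'move', 'args': {...}}."""
    tool = event.get("tool")
    args = event.get("args", {})
    if tool == "move":
        publish_command(ROBOT_TOPIC, {"action": "move", **args})
        return {"status": "ok", "detail": f"move issued with {args}"}
    # Unknown tools are refused explicitly rather than guessed at.
    return {"status": "error", "detail": f"unknown tool '{tool}'"}
```

Keeping the handler a pure dispatch function makes it easy to unit-test without any AWS infrastructure in the loop.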
Technical Approach and Challenges
Importance of using frameworks like Strands Agents to simplify development and handle complex tool orchestration
Challenges encountered:
Accurately mapping voice commands to robot actions
Ensuring robots execute commands reliably, without falsely reporting completion
Handling a large number of robot tools and capabilities
Solutions:
Leveraging Kendra to generate prompts and improve accuracy
Explicitly listing all available tools and handling "I can't do that" responses
Implementing parallel execution of robot commands using AWS IoT
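The parallel-execution solution above can be sketched with a thread pool fanning the same command out to several devices at once. The `publish` stub and device names are assumptions; the real version would publish MQTT messages through AWS IoT.

```python
# Sketch of fanning robot commands out in parallel, per the AWS IoT
# approach described. publish() is a stand-in for an MQTT publish;
# device names are hypothetical.
from concurrent.futures import ThreadPoolExecutor

def publish(device: str, command: str) -> str:
    # Real code: boto3.client("iot-data").publish(
    #     topic=f"robots/{device}/commands", payload=command)
    return f"{device}: {command} sent"

def broadcast(command: str, devices: list[str]) -> list[str]:
    """Send the same command to every device concurrently."""
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(lambda d: publish(d, command), devices))
```

Issuing commands concurrently keeps a slow or unresponsive robot from delaying the rest of the fleet.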
Key Takeaways
Embracing frameworks and serverless architecture can significantly simplify agentic AI development
Careful prompt engineering and handling of agent limitations are crucial for reliable robot control
Integrating voice, language models, and IoT devices can enable powerful real-world applications of agentic AI
Business Impact and Applications
Enabling a new generation of interactive, voice-controlled robotic assistants for homes, offices, and public spaces
Automating complex workflows and tasks by combining human-like language understanding with physical robot capabilities
Opportunities for developers to position themselves as critical nodes in the emerging "agent economy"
Conclusion
The presenters showcased an impressive integration of cutting-edge technologies, including large language models, voice interaction, and robotics
The project demonstrates the potential for agentic AI to transform how we interact with and control physical systems in the real world
As the underlying technologies continue to evolve, the opportunities for developers to build innovative, voice-controlled robotic applications will only grow.