AWS re:Invent 2025 - Control humanoid robots and drones with voice and Agentic AI (DEV313)

Summary of AWS re:Invent 2025 Presentation: "Control Humanoid Robots and Drones with Voice and Agentic AI"

Introduction to Agentic AI

  • Overview of the evolution of generative AI, from rule-based low-agency systems to high-agency autonomous intelligent agents
  • Predictions from top investment institutions on the future "agent economy" - a global network of interconnected AI agents
  • Emergence of the "one-person unicorn" - companies operated by a single individual collaborating with AI agents
  • Need to adopt a "stochastic mindset" to thrive in a world of agentic AI

AWS Agentic AI Portfolio

  • Three-layer architecture: infrastructure, AI/agent development, and applications
  • Key services highlighted: Amazon Lex for voice interaction, Amazon Kendra for knowledge retrieval, and AWS IoT for device integration
  • AWS's involvement in industry standards like MCP and A2A to enable interoperability between AI agents and applications
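
Standards like MCP describe each tool an agent can call with a name, a description, and a JSON Schema for its inputs, which is what makes agents and applications interoperable. The sketch below shows a hypothetical MCP-style descriptor for a robot action; the tool name and fields of `inputSchema` are illustrative assumptions, not from the talk.

```python
# Hypothetical MCP-style tool descriptor for a robot action.
# Field names follow the Model Context Protocol tool shape:
# name, description, and inputSchema (a JSON Schema object).
ROBOT_WAVE_TOOL = {
    "name": "robot_wave",
    "description": "Make the humanoid robot wave its right arm.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "duration_s": {
                "type": "number",
                "description": "How long to wave, in seconds",
            },
        },
        "required": ["duration_s"],
    },
}

def is_valid_tool(tool: dict) -> bool:
    """Minimal structural check a client could run over a tool listing."""
    return all(key in tool for key in ("name", "description", "inputSchema"))
```

Because every capability is declared up front, a client can enumerate exactly what the robot supports before the agent ever plans an action.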

Integrating Robots and Large Language Models

  • Project background: Student team from the Hong Kong Institute of Information Technology
  • Demonstration of integrating humanoid robots, robot dogs, and digital humans with voice and agentic AI
  • Architecture overview:
    • Serverless design using AWS Lambda, AWS App Runner, and Amazon DynamoDB
    • Integration of Amazon Nova Sonic for real-time voice interaction and streaming
    • AWS IoT for device control, with the MCP server implemented in Lambda
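
In this kind of architecture, the Lambda-hosted MCP server translates a parsed voice intent into an MQTT message for AWS IoT. A minimal sketch of that translation step is below; the topic convention `robots/<device_id>/commands` and the payload shape are assumptions for illustration, not details from the talk.

```python
import json

def build_robot_command(device_id: str, action: str, params: dict) -> tuple[str, str]:
    """Map a parsed voice intent to an (MQTT topic, JSON payload) pair.

    In a Lambda MCP server this pair would be handed to the AWS IoT
    Data plane, e.g.:
        boto3.client("iot-data").publish(topic=topic, qos=1, payload=payload)
    """
    topic = f"robots/{device_id}/commands"  # assumed topic naming convention
    payload = json.dumps({"action": action, "params": params})
    return topic, payload
```

Keeping this mapping pure (no AWS calls inside) makes it easy to unit-test the intent-to-command logic separately from the IoT integration.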

Technical Approach and Challenges

  • Importance of using frameworks like Strands Agents to simplify development and handle complex tool orchestration
  • Challenges encountered:
    • Accurately mapping voice commands to robot actions
    • Ensuring robots execute commands reliably, without the agent falsely reporting completion
    • Handling a large number of robot tools and capabilities
  • Solutions:
    • Leveraging Amazon Kendra to retrieve context for prompt generation and improve accuracy
    • Explicitly listing all available tools and handling "I can't do that" responses
    • Implementing parallel execution of robot commands using AWS IoT
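
Two of the solutions above can be sketched together: an explicit whitelist of robot capabilities (so the agent answers "I can't do that" instead of inventing a result) and concurrent dispatch of multiple commands. The action names and the thread-pool approach below are illustrative assumptions; the talk's implementation fans out over AWS IoT.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed capability whitelist: every action the robots actually support
# is listed explicitly, so anything else gets an honest refusal rather
# than a fabricated success report.
SUPPORTED_ACTIONS = {"wave", "sit", "walk_forward", "stop"}

def handle_command(action: str) -> str:
    """Execute one robot command, refusing unsupported actions."""
    if action not in SUPPORTED_ACTIONS:
        return f"I can't do that: '{action}' is not a supported action."
    # A real handler would publish to AWS IoT and wait for a device
    # acknowledgment before reporting completion.
    return f"ok: {action}"

def dispatch_parallel(actions: list[str]) -> list[str]:
    """Send several robot commands concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(handle_command, actions))
```

Waiting for a device-side acknowledgment before reporting success is what keeps the agent from "lying about completion": the confirmation comes from the robot, not the language model.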

Key Takeaways

  • Embracing frameworks and serverless architecture can significantly simplify agentic AI development
  • Careful prompt engineering and explicit handling of agent limitations are crucial for reliable robot control
  • Integrating voice, language models, and IoT devices can enable powerful real-world applications of agentic AI

Business Impact and Applications

  • Enabling a new generation of interactive, voice-controlled robotic assistants for homes, offices, and public spaces
  • Automating complex workflows and tasks by combining human-like language understanding with physical robot capabilities
  • Opportunities for developers to position themselves as critical nodes in the emerging "agent economy"

Conclusion

  • The presenters showcased an impressive integration of cutting-edge technologies, including large language models, voice interaction, and robotics
  • The project demonstrates the potential for agentic AI to transform how we interact with and control physical systems in the real world
  • As the underlying technologies continue to evolve, the opportunities for developers to build innovative, voice-controlled robotic applications will only grow
