AWS re:Invent 2025 - Monitor the quality and accuracy of your generative AI workloads (COP418)

Monitoring the Quality and Accuracy of Generative AI Workloads

Introduction to AI Agents

  • The presenters discuss the evolution of AI agents, from the early days of chatbots and prompt engineering in 2023, to the adoption of AI capabilities within applications in 2024, and the current state in 2025 where AI is ubiquitous across enterprises.
  • Building an AI agent is relatively straightforward, but observing and monitoring these agents at scale in production is a significant challenge.
  • Agents can be built with the Strands SDK from AWS or with other popular frameworks such as LangChain and Hugging Face, but customers have struggled to get a complete view of agent performance and behavior.

Observability with Amazon CloudWatch GenAI

  • To address the observability challenge, AWS introduced Amazon CloudWatch GenAI, a curated observability solution for generative AI workloads.
  • Key features of CloudWatch GenAI:
    • Provides out-of-the-box insights and metrics, such as invocation counts, latency, and token usage, to give an end-to-end view of agent workflows.
    • Leverages the OpenTelemetry standard for zero-code instrumentation, allowing agents to send telemetry data to CloudWatch.
    • Offers end-to-end prompt tracing to track the entire lifecycle of AI agent requests.
    • Includes built-in data protection policies to mask sensitive data and ensure compliance.
    • Introduces new evaluation metrics to monitor the quality and accuracy of AI agents.
  • CloudWatch GenAI can monitor AI agents regardless of where they run, including Bedrock AgentCore, EKS, ECS, and Lambda.
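
The data-protection feature listed above masks sensitive values in telemetry. The sketch below only illustrates the general idea of masking in plain Python. It is not how CloudWatch implements its policies. The two regular expressions and the function name `mask_sensitive` are assumptions made for this example.

```python
import re

# Hypothetical patterns, chosen only for illustration.
PATTERNS = {
    "EMAIL": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def mask_sensitive(text: str) -> str:
    """Replace every match of a known pattern with asterisks of equal length."""
    for pattern in PATTERNS.values():
        text = pattern.sub(lambda m: "*" * len(m.group()), text)
    return text


if __name__ == "__main__":
    # A prompt that might otherwise end up verbatim in a trace or log line.
    print(mask_sensitive("Contact jane@example.com, SSN 123-45-6789"))
```

Masking before the text leaves the process keeps the original value out of every downstream store. A managed policy applied at the log-group level achieves a similar result without code changes.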

Detailed Observability Concepts

  • CloudWatch GenAI observability is structured around the following key concepts:
    • Sessions: Represent the complete conversation between a user and an AI agent.
    • Traces: Track a single interaction within a session, from request to response.
    • Spans: Capture the individual tasks performed within a trace, such as parsing the user request or invoking the language model.
    • Sub-spans: Provide fine-grained details, like individual API calls or user interactions.
  • This hierarchical structure gives users a comprehensive view of their AI agent's performance and behavior.
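
The hierarchy above (session, trace, span, sub-span) can be pictured as a small tree. The following is a minimal sketch of that shape in plain Python, not the CloudWatch or OpenTelemetry data model. All class names, field names, and the `render` helper are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Tuple


@dataclass
class Span:
    name: str
    start_ms: int
    end_ms: int
    children: List["Span"] = field(default_factory=list)  # sub-spans

    @property
    def duration_ms(self) -> int:
        return self.end_ms - self.start_ms


@dataclass
class Trace:
    trace_id: str  # one request/response interaction
    spans: List[Span] = field(default_factory=list)


@dataclass
class Session:
    session_id: str  # the whole user/agent conversation
    traces: List[Trace] = field(default_factory=list)


def walk(span: Span, depth: int = 0) -> Iterator[Tuple[int, Span]]:
    """Yield (depth, span) for a span and all of its sub-spans, depth-first."""
    yield depth, span
    for child in span.children:
        yield from walk(child, depth + 1)


def render(session: Session) -> List[str]:
    """Return an indented text outline of the session."""
    lines = [f"session {session.session_id}"]
    for trace in session.traces:
        lines.append(f"  trace {trace.trace_id}")
        for root in trace.spans:
            for depth, span in walk(root):
                indent = "  " * (depth + 2)
                lines.append(f"{indent}span {span.name} ({span.duration_ms} ms)")
    return lines


if __name__ == "__main__":
    llm = Span("invoke_model", 10, 210, [Span("http_call", 20, 200)])
    demo = Session("s1", [Trace("t1", [Span("parse_request", 0, 10), llm])])
    print("\n".join(render(demo)))
```

Reading the outline top-down mirrors how one would drill from a conversation into a single slow call.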

Live Demo: Building and Observing an AI Agent

  • The presenters demonstrate building a "Wagle AI" agent using the Strands SDK and deploying it to the Bedrock AgentCore runtime.
  • They integrate the agent with microservices to fetch real-time data, such as pet and food details, and use the AWS Distro for OpenTelemetry SDK to instrument the agent and send telemetry data to CloudWatch.
  • The CloudWatch GenAI console is used to explore the agent's performance, including session traces, span details, latency, token usage, and error monitoring.
  • The presenters also showcase the new evaluation metrics feature, which allows users to monitor the accuracy and quality of their AI agents.
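
The figures explored in the demo (latency, token usage, errors) are aggregates over per-invocation records. As a rough illustration of how such a summary could be computed, here is a sketch in plain Python. The record fields (`latency_ms`, `input_tokens`, `output_tokens`, `error`) and the `summarize` function are assumptions for this example, not the schema CloudWatch uses.

```python
from statistics import median
from typing import Dict, List


def summarize(records: List[dict]) -> Dict[str, float]:
    """Aggregate hypothetical per-invocation records into headline numbers."""
    if not records:
        raise ValueError("need at least one record")
    latencies = [r["latency_ms"] for r in records]
    return {
        "invocations": len(records),
        "median_latency_ms": median(latencies),
        "max_latency_ms": max(latencies),
        "total_tokens": sum(r["input_tokens"] + r["output_tokens"] for r in records),
        "error_rate": sum(1 for r in records if r["error"]) / len(records),
    }


if __name__ == "__main__":
    sample = [
        {"latency_ms": 100, "input_tokens": 50, "output_tokens": 20, "error": False},
        {"latency_ms": 400, "input_tokens": 80, "output_tokens": 40, "error": True},
        {"latency_ms": 200, "input_tokens": 60, "output_tokens": 30, "error": False},
        {"latency_ms": 300, "input_tokens": 70, "output_tokens": 10, "error": False},
    ]
    print(summarize(sample))
```

The median is used instead of the mean so that a single slow model call does not dominate the headline latency.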

Key Takeaways and Resources

  • Building AI agents is relatively straightforward, but observing and monitoring them at scale is a significant challenge.
  • CloudWatch GenAI provides a comprehensive observability solution for generative AI workloads, offering insights, tracing, data protection, and evaluation metrics.
  • CloudWatch GenAI can monitor AI agents across various deployment platforms, including Bedrock AgentCore, EKS, ECS, and Lambda.
  • The presenters provide several resources, including code samples and documentation, to help users get started with building and observing their AI agents.
