TalksAWS re:Invent 2025 - Build observable AI agents with Strands, AgentCore, and Datadog (AIM233)

AWS re:Invent 2025 - Build observable AI agents with Strands, AgentCore, and Datadog (AIM233)

Building Observable AI Agents with Strands, AgentCore, and Datadog

Observability Challenges in Modern Architectures

  • Modern architectures have led to an explosion in complexity:
    • Diverse technologies, multiple clouds, open-source frameworks, SaaS providers
    • Increasing ephemeral compute (e.g., serverless functions, containers)
    • Rapid rate of change
  • Adding AI agents further multiplies this complexity:
    • Agents are compound systems with vector stores, models, evals, orchestration
    • Agents operate with autonomy and can be non-deterministic
    • Accountability is shared across models, frameworks, orchestration, and tool calls
  • Key challenges in running agents at scale:
    • Reliability: Ensuring agents don't "hallucinate" and operate with quality
    • Troubleshooting: Complexity makes it difficult to identify the root cause of issues
    • Cost: Every model interaction consumes tokens, which can escalate quickly
    • Security and safety: Enforcing guardrails and secure operation

Building and Deploying Agents with AWS Strands and AgentCore

  • Strands Agents: An open-source Python and TypeScript SDK for building agent-based applications
    • Model-agnostic, supports multiple LLMs (OpenAI, Anthropic, Amazon Bedrock)
    • Includes support for MCP (Multi-Agent Communication Protocol) and A2A (Agent-to-Agent)
  • Deploying Agents with AWS AgentCore
    • AgentCore is a fully managed service that provides runtime, memory, tool gateway, and observability
    • Supports any agent framework and model, with isolated and secure compute environments
    • Provides short-term and long-term memory management, enabling agents to maintain context across sessions

Operationalizing Agents with the AWS Well-Architected Framework

  • Applying the AWS Well-Architected Framework's Generative AI Lens:
    • Operational Excellence:
      • Collect metrics, user feedback, and functional performance data
      • Implement guardrails to detect policy violations, prompt injection, and PII exposure
      • Monitor success and latency of API/tool calls, and track costs per workflow, user, and model
      • Measure and report inference efficiencies to guide sustainable scaling
    • Security:
      • Enforce security policies and guardrails to ensure safe agent operation
      • Monitor for potential security threats, such as prompt injection or PII exposure
    • Reliability:
      • Ensure agents operate consistently and with high quality
      • Troubleshoot issues by analyzing agent reasoning paths and identifying root causes
    • Performance Efficiency:
      • Optimize agent performance by experimenting with different models and prompts
      • Monitor and manage agent resource utilization (e.g., token consumption)
    • Cost Optimization:
      • Track and manage the costs associated with agent operation, including model interactions and tool calls

Operationalizing Observability with Datadog

  • Integrating Strands Agents with Datadog for observability:
    • Strands Agents provide out-of-the-box telemetry (metrics, traces, logs) to Datadog
    • Datadog's AI Observability features enable:
      • Troubleshooting: Analyzing agent reasoning paths, identifying root causes of issues
      • Monitoring: Setting up alerts and automating actions based on key telemetry
      • Evaluation: Implementing pre-built and custom evaluations to assess agent quality
      • Experimentation: Comparing the performance of different models and prompts
  • Datadog's Trace Explorer provides visibility into agent behavior:
    • Observing input/output, tool calls, security/safety checks, and cost metrics
    • Identifying potential issues like PII exposure or prompt injection attempts
  • Datadog's Evaluation capabilities:
    • Pre-built evaluations for failure to answer, hallucination, input/output toxicity, etc.
    • Ability to create custom evaluations to assess brand voice, goal completion, and more
  • Datadog's Experimentation features:
    • Comparing the performance of different models and prompts
    • Measuring key metrics like information accuracy, user satisfaction, and cost

Key Takeaways

  • Observability is crucial for building reliable, secure, and cost-efficient AI agents at scale
  • AWS Strands Agents and AgentCore provide a framework for building and deploying agents
  • The AWS Well-Architected Framework's Generative AI Lens offers best practices for operationalizing agents
  • Datadog's AI Observability features enable comprehensive monitoring, troubleshooting, evaluation, and experimentation for agent-based applications

Your Digital Journey deserves a great story.

Build one with us.

Cookies Icon

These cookies are used to collect information about how you interact with this website and allow us to remember you. We use this information to improve and customize your browsing experience, as well as for analytics.

If you decline, your information won’t be tracked when you visit this website. A single cookie will be used in your browser to remember your preference.