AWS re:Invent 2025 - Elevate application and generative AI observability (COP326)

Elevating Application and Generative AI Observability

Challenges with Complex Application Monitoring

  • Lack of visibility and observability in today's complex, AI-powered applications
  • Difficulty understanding how systems are performing and responding to user demand
  • Need for comprehensive monitoring and observability to avoid "guessing" about application behavior

Comprehensive Application Monitoring with Amazon CloudWatch

  • CloudWatch provides native integration and a single pane of glass for monitoring across multiple layers:
    • Infrastructure (EC2, containers, serverless, on-premises)
    • Application (logs, traces, service-level objectives)
    • Database
    • User experience (real user monitoring, synthetic monitoring)
  • Leverages the three pillars of observability: metrics, logs, and traces
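One way metrics and logs converge in CloudWatch is the Embedded Metric Format (EMF), where a single structured log line also yields custom metrics. A minimal sketch of building an EMF record by hand (the namespace, dimension, and metric names here are invented for illustration):

```python
import json
import time

def emf_record(namespace: str, dimensions: dict, metrics: dict) -> str:
    """Build a CloudWatch Embedded Metric Format (EMF) log line.

    CloudWatch extracts the declared metrics from the log event, so one
    structured log line produces both a searchable log entry and metrics.
    """
    record = {
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions.keys())],
                "Metrics": [{"Name": n, "Unit": u} for n, (u, _) in metrics.items()],
            }],
        },
        **dimensions,
        **{n: v for n, (_, v) in metrics.items()},
    }
    return json.dumps(record)

# Hypothetical checkout service emitting latency and error metrics.
line = emf_record(
    namespace="MyApp",
    dimensions={"Service": "checkout"},
    metrics={"Latency": ("Milliseconds", 142), "Errors": ("Count", 1)},
)
print(line)
```

Writing this line to a log group ingested by CloudWatch would surface `Latency` and `Errors` as custom metrics without a separate `PutMetricData` call.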

Monitoring Application Health with Golden Signals

  • Key metrics to monitor:
    • Request volume
    • Latency
    • Errors and faults
  • Tying these technical metrics to business impact:
    • Revenue per minute
    • Page load time
    • API error codes
    • Session duration
  • Establishing service-level objectives (SLOs) to maintain optimal application health
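The golden signals above can be derived directly from raw request data. A minimal sketch (the sample numbers are invented) computing volume, error rate, p99 latency, and how much of a 99.9% SLO's error budget has been consumed:

```python
# Sample request log: (latency_ms, is_error). Values are illustrative only.
requests = [(120, False), (95, False), (310, True), (88, False)] * 250

total = len(requests)
errors = sum(1 for _, err in requests if err)
latencies = sorted(lat for lat, _ in requests)

# Golden signals: request volume, error rate, p99 latency.
error_rate = errors / total
p99 = latencies[int(0.99 * total) - 1]

# Error budget for a 99.9% availability SLO: 0.1% of requests may fail.
slo_target = 0.999
budget = 1 - slo_target
budget_consumed = error_rate / budget  # > 1.0 means the budget is exhausted

print(f"volume={total} error_rate={error_rate:.1%} p99={p99}ms "
      f"budget_consumed={budget_consumed:.0f}x")
```

Framing errors as budget consumption is what lets a technical SLO map onto a business-level SLA: a burn rate above 1.0 signals the agreement is at risk before customers escalate.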

Enhancing Observability with Amazon CloudWatch Application Insights

  • Automatically discovers applications and provides pre-built dashboards for key metrics
  • Enables easy root cause analysis for issues like HTTP errors and exceptions
  • Allows defining SLOs and tying them to business-level service-level agreements (SLAs)

Monitoring Generative AI Workloads

  • Challenges with observing AI-powered applications:
    • Non-deterministic agent behavior
    • Difficulty tracing and analyzing the sequence of AI model invocations
    • Assessing system health and quality of AI responses
  • Amazon CloudWatch generative AI observability capabilities:
    • 360-degree view of AI agents across different frameworks
    • Simple instrumentation using OpenTelemetry
    • End-to-end prompt tracing and data protection
    • Continuous evaluation of AI response quality
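The "continuous evaluation" bullet can be illustrated with a toy quality check. The scoring heuristic below is entirely invented (real evaluation pipelines use LLM-as-a-judge or embedding-based metrics); it only shows the shape of an evaluation loop that flags responses not grounded in the retrieved context:

```python
def groundedness(response: str, context: str) -> float:
    """Toy evaluator: fraction of response words that appear in the
    retrieved context. Illustrative only; a production evaluator would
    use an LLM judge or semantic similarity, not word overlap."""
    resp_words = [w.lower().strip(".,") for w in response.split()]
    ctx_words = {w.lower().strip(".,") for w in context.split()}
    if not resp_words:
        return 0.0
    return sum(w in ctx_words for w in resp_words) / len(resp_words)

context = "The refund window is 30 days from the delivery date."
good = "The refund window is 30 days."
bad = "Refunds are accepted within one year of purchase."

for resp in (good, bad):
    score = groundedness(resp, context)
    status = "ok" if score >= 0.8 else "flag for review"
    print(f"{score:.2f} {status} :: {resp}")
```

Running such checks on a sample of production traffic, and emitting the scores as metrics, is what turns response quality into something an alarm can watch.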

Integrating Observability for AI Agents on AWS

  • Leveraging AWS services like Amazon Bedrock and Amazon Bedrock AgentCore to build and deploy AI agents
  • Instrumenting AI workloads using OpenTelemetry to send telemetry data to Amazon CloudWatch
  • Utilizing CloudWatch's pre-built dashboards and capabilities to monitor AI agent performance, quality, and behavior

Key Takeaways

  • Comprehensive application monitoring is crucial for understanding complex, AI-powered systems
  • Amazon CloudWatch provides a unified observability platform to monitor applications across all layers
  • Establishing SLOs and tying them to business metrics enables proactive management of application health
  • Observability for generative AI workloads requires new capabilities to understand agent behavior, quality, and reasoning
  • AWS provides a full stack of services and tools to build, deploy, and observe AI-powered applications
