AWS re:Invent 2025 - Elevate application and generative AI observability (COP326)
Challenges with Complex Application Monitoring
Lack of visibility and observability in today's complex, AI-powered applications
Difficulty understanding how systems are performing and reacting to user dynamics
Need for comprehensive monitoring and observability to avoid "guessing" about application behavior
Comprehensive Application Monitoring with Amazon CloudWatch
CloudWatch provides native integration and a single pane of glass for monitoring across multiple layers:
Infrastructure (EC2, containers, serverless, on-premises)
Application (logs, traces, service-level objectives)
Database
User experience (real user monitoring, synthetic monitoring)
Leverages the three pillars of observability: metrics, logs, and traces
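One common way to feed the metrics pillar from application code is CloudWatch's Embedded Metric Format (EMF), where a structured JSON log line doubles as a metric. A minimal sketch in Python; the `Orders` namespace, `Latency` metric, and `checkout` service dimension are illustrative, not from the talk:

```python
import json
import time

def emf_record(namespace, metric, value, unit="Milliseconds", dimensions=None):
    """Build a CloudWatch Embedded Metric Format (EMF) log record.

    Writing this JSON to a log group lets CloudWatch extract the metric
    automatically, so one emitted line serves both the logs and metrics pillars.
    """
    dimensions = dimensions or {}
    return json.dumps({
        "_aws": {
            "Timestamp": int(time.time() * 1000),
            "CloudWatchMetrics": [{
                "Namespace": namespace,
                "Dimensions": [list(dimensions.keys())],
                "Metrics": [{"Name": metric, "Unit": unit}],
            }],
        },
        metric: value,
        **dimensions,  # dimension values live at the top level of the record
    })

# Example: record request latency for a hypothetical checkout service
print(emf_record("Orders", "Latency", 137.5, dimensions={"Service": "checkout"}))
```

In a Lambda or container workload, printing this record to stdout is enough for CloudWatch Logs to pick it up and surface `Latency` as a queryable metric.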
Monitoring Application Health with Golden Signals
Key metrics to monitor:
Request volume
Latency
Errors and faults
Tying these technical metrics to business impact:
Revenue per minute
Page load time
API error codes
Session duration
Establishing service-level objectives (SLOs) to maintain optimal application health
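An SLO is often operationalized as an error budget over a rolling window: the SLO target fixes how many requests may fail, and the remaining budget tells you how much headroom is left before the objective is breached. A minimal sketch with illustrative numbers (not from the talk):

```python
def error_budget_remaining(slo_target, total_requests, failed_requests):
    """Return the fraction of the error budget still unspent.

    slo_target: e.g. 0.999 means at most 0.1% of requests may fail.
    """
    allowed_failures = (1.0 - slo_target) * total_requests
    if allowed_failures == 0:
        return 0.0 if failed_requests else 1.0
    return max(0.0, 1.0 - failed_requests / allowed_failures)

# 99.9% availability SLO over 1,000,000 requests: budget = 1,000 failures.
# 250 failures so far leaves roughly 75% of the budget.
print(error_budget_remaining(0.999, 1_000_000, 250))
```

Tracking this number alongside a business metric such as revenue per minute is what lets a technical SLO breach be read directly as business impact.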
Enhancing Observability with Amazon CloudWatch Application Signals
Automatically discovers applications and provides pre-built dashboards for key metrics
Enables easy root cause analysis for issues like HTTP errors and exceptions
Allows defining SLOs and tying them to business-level service-level agreements (SLAs)
Monitoring Generative AI Workloads
Challenges with observing AI-powered applications:
Non-deterministic agent behavior
Difficulty tracing and analyzing the sequence of AI model invocations
Assessing system health and quality of AI responses
Amazon CloudWatch generative AI observability capabilities:
360-degree view of AI agents across different frameworks
Simple instrumentation using OpenTelemetry
End-to-end prompt tracing and data protection
Continuous evaluation of AI response quality
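The end-to-end tracing idea above, where each model or tool invocation becomes a span nested under the agent's request, can be sketched without any SDK using a toy span tree. This is illustrative only; a real deployment would use OpenTelemetry instrumentation exporting to CloudWatch, and the span names and attributes here are hypothetical:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    """Toy stand-in for an OpenTelemetry span: name, attributes, timing, children."""
    name: str
    attributes: dict = field(default_factory=dict)
    children: list = field(default_factory=list)
    start: float = 0.0
    end: float = 0.0

class Tracer:
    """Builds a tree of spans so the sequence of AI invocations is recoverable."""
    def __init__(self):
        self.root = None
        self._stack = []

    def start_span(self, name, **attributes):
        span = Span(name, attributes, start=time.monotonic())
        if self._stack:
            self._stack[-1].children.append(span)  # nest under the open span
        else:
            self.root = span
        self._stack.append(span)
        return span

    def end_span(self):
        self._stack.pop().end = time.monotonic()

# One agent request that fans out into a model call and a tool call:
tracer = Tracer()
tracer.start_span("agent.invoke", user_query="order status")
tracer.start_span("model.call", model_id="example-model")  # hypothetical model id
tracer.end_span()
tracer.start_span("tool.call", tool="lookup_order")        # hypothetical tool name
tracer.end_span()
tracer.end_span()
print([child.name for child in tracer.root.children])  # -> ['model.call', 'tool.call']
```

The resulting tree is exactly what makes non-deterministic agent behavior analyzable after the fact: the order, nesting, and duration of every invocation is preserved.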
Integrating Observability for AI Agents on AWS
Leveraging AWS services like Amazon Bedrock and Amazon Bedrock AgentCore to build and deploy AI agents
Instrumenting AI workloads using OpenTelemetry to send telemetry data to Amazon CloudWatch
Utilizing CloudWatch's pre-built dashboards and capabilities to monitor AI agent performance, quality, and behavior
Key Takeaways
Comprehensive application monitoring is crucial for understanding complex, AI-powered systems
Amazon CloudWatch provides a unified observability platform to monitor applications across all layers
Establishing SLOs and tying them to business metrics enables proactive management of application health
Observability for generative AI workloads requires new capabilities to understand agent behavior, quality, and reasoning
AWS provides a full stack of services and tools to build, deploy, and observe AI-powered applications