AWS re:Invent 2025 - Ops in the AI age: Innovating together for faster, more efficient operations
Innovating Together for Faster, More Efficient Cloud Operations
Embracing the Age of Agentic AI
AI has reshaped customer expectations: end users now expect faster, smarter experiences
AI adoption varies across industries, with 88% of organizations using AI in at least one business function
Compliance and security are critical for the adoption of generative AI
Enhancing Fan Experiences with AI at PGA Tour
PGA Tour is a massive, data-intensive operation with over 120 cameras and 36 radar trackers capturing 32,000+ shots per week
They use AI-powered shot commentary to provide context and relevance to fans watching the live events
This system is built on a cloud-based architecture with real-time data ingestion, analysis, and commentary generation
Operational dashboards and monitoring provide visibility into the health and performance of the system
Addressing the Challenges of Agentic AI
Trust and Safety: Visibility into AI agent behavior is critical to understand how and why decisions are made
Operational Complexity: Managing microservices, distributed agents, and event-driven architectures across multiple accounts and regions
Data Explosion: AI agents handle thousands of customer requests per hour, generating rapidly growing volumes of telemetry data that must be monitored and secured
Providing Observability for Agentic AI
Generative AI Observability in CloudWatch provides visibility into AI agent behavior, including latency, token usage, and performance
CloudWatch Application Map automatically discovers and organizes services, correlating metrics, logs, and traces to simplify root cause analysis
Developers can use CloudWatch MCP Servers to troubleshoot AI agents directly from their preferred IDEs and productivity tools
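The signals named above (latency, token usage, and performance per agent) can be illustrated with a small, self-contained sketch. This is not the CloudWatch API: the AgentInvocation record, its field names, and the summarize function are hypothetical, and the p95 uses a simple nearest-rank rule chosen for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentInvocation:
    """One hypothetical agent call record; field names are illustrative only."""
    agent: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    error: bool = False


def summarize(invocations: list[AgentInvocation]) -> dict[str, dict[str, float]]:
    """Group records by agent and compute latency, token, and error summaries."""
    by_agent: dict[str, list[AgentInvocation]] = {}
    for inv in invocations:
        by_agent.setdefault(inv.agent, []).append(inv)

    summary: dict[str, dict[str, float]] = {}
    for agent, records in by_agent.items():
        latencies = sorted(r.latency_ms for r in records)
        # Nearest-rank p95: rank = ceil(0.95 * n), computed with integer math.
        rank = (95 * len(latencies) + 99) // 100
        summary[agent] = {
            "calls": len(records),
            "p95_latency_ms": latencies[rank - 1],
            "total_tokens": sum(r.input_tokens + r.output_tokens for r in records),
            "error_rate": sum(r.error for r in records) / len(records),
        }
    return summary
```

A per-agent summary like this is roughly what a dashboard or alarm would consume; in practice the records would come from traces or logs emitted by the agent runtime rather than an in-memory list.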
Empowering Smarter Operations with AI
CloudWatch Investigations uses generative AI to automate root cause analysis and provide troubleshooting guidance
Interactive Incident Report Generation leverages the "Five Whys" framework to capture and share operational learnings
Application Observability for AWS GitHub Actions brings observability directly into developers' workflows, enabling them to diagnose production issues using live telemetry
Simplifying Overall Cloud Operations
Enhancements to CloudWatch Log Analytics, including natural language support and over 35 analytical commands, provide faster and more accessible insights
Expanded CloudWatch Real User Monitoring for mobile apps enables end-to-end visibility, from user behavior to underlying infrastructure
CloudTrail Aggregated Events simplify security monitoring by pre-aggregating high-volume API calls and identifying anomalies
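The idea of pre-aggregating high-volume API calls and then looking for anomalies can be sketched generically. The talk summary does not describe how CloudTrail implements this, so everything below is an assumption for illustration: the simplified event keys (epoch_s, event_name, principal) are not the CloudTrail record schema, the 5-minute window is arbitrary, and the anomaly rule is a plain z-score threshold.

```python
from collections import Counter
from statistics import mean, pstdev


def aggregate_calls(events: list[dict], window_s: int = 300) -> Counter:
    """Count events per (window start, event name, principal) bucket."""
    counts: Counter = Counter()
    for e in events:
        # Align each timestamp to the start of its fixed-size window.
        window_start = e["epoch_s"] - e["epoch_s"] % window_s
        counts[(window_start, e["event_name"], e["principal"])] += 1
    return counts


def flag_anomalies(counts: Counter, threshold: float = 3.0) -> list[tuple]:
    """Return buckets whose count is more than `threshold` std devs above the mean."""
    values = list(counts.values())
    if len(values) < 2:
        return []
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all buckets identical, nothing stands out
    return [key for key, v in counts.items() if (v - mu) / sigma > threshold]
```

Aggregating first means the anomaly check runs over a handful of buckets instead of every raw event, which is the cost saving the bullet above points to.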
Centralizing Observability Across Hybrid and Multi-Cloud Environments
Cross-account, cross-region log centralization in CloudWatch consolidates data from development, QA, and production environments
Database Insights extends these cross-account, cross-region capabilities to monitoring and troubleshooting of RDS and Aurora databases
Delivering Personalized Ads at Scale with Warner Bros. Discovery
Warner Bros. Discovery operates a massive ad platform across streaming services like HBO Max and Discovery+
They use a custom metric called "permits" to autoscale their infrastructure based on factors like content duration, DVR window, and number of ads
Predictive autoscaling, based on the rate of change in permits, helps them maintain a 90% ad fill rate with low latency
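Scaling on the rate of change of a custom metric can be sketched as a linear projection. The summary does not say how Warner Bros. Discovery computes permits or sizes its fleet, so this is only a minimal illustration under stated assumptions: permits arrive as (timestamp, value) samples, the trend is extrapolated from the last two samples, and permits_per_instance and lead_time_s are hypothetical tuning parameters.

```python
import math


def projected_permits(samples: list[tuple[float, float]], lead_time_s: float) -> float:
    """Extrapolate permits linearly from the two latest (timestamp_s, permits) samples."""
    if len(samples) < 2:
        raise ValueError("need at least two samples to estimate a rate of change")
    (t0, p0), (t1, p1) = samples[-2], samples[-1]
    if t1 <= t0:
        raise ValueError("timestamps must be strictly increasing")
    rate = (p1 - p0) / (t1 - t0)  # permits per second
    return max(0.0, p1 + rate * lead_time_s)


def desired_instances(
    samples: list[tuple[float, float]],
    lead_time_s: float,
    permits_per_instance: float,
    min_instances: int = 1,
) -> int:
    """Size the fleet for the projected load, never below the current load."""
    projected = projected_permits(samples, lead_time_s)
    # Scale out ahead of rising demand, but do not scale in on a falling trend
    # until the measured load has actually dropped.
    needed = max(projected, samples[-1][1])
    return max(min_instances, math.ceil(needed / permits_per_instance))
```

For example, with permits rising from 1000 to 1600 over 60 seconds, a 120-second lead time projects 2800 permits, and at an assumed 500 permits per instance the function asks for 6 instances rather than the 4 the current load alone would need.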
Key Takeaways
AWS is committed to simplifying cloud operations for organizations, whether they are using agentic AI or traditional microservices
Observability is the control plane for trust, safety, and accountability in the age of agentic AI
AI-powered tools like CloudWatch Investigations and Incident Report Generation can automate root cause analysis and capture operational learnings
Centralized observability across hybrid and multi-cloud environments is crucial for managing complex, data-intensive workloads
Innovative approaches to autoscaling and infrastructure management can enable the delivery of personalized, low-latency experiences at scale