AWS re:Invent 2025 - Ops in the AI age: Innovating together for faster, more efficient operations

Innovating Together for Faster, More Efficient Cloud Operations

Embracing the Age of Agentic AI

  • AI has reshaped customer expectations: end users now expect faster, smarter experiences powered by AI
  • AI adoption varies across industries, with 88% of organizations using AI in at least one business function
  • Compliance and security are critical for the adoption of generative AI

Enhancing Fan Experiences with AI at PGA Tour

  • PGA Tour is a massive, data-intensive operation with over 120 cameras and 36 radar trackers capturing 32,000+ shots per week
  • They use AI-powered shot commentary to provide context and relevance to fans watching the live events
  • This system is built on a cloud-based architecture with real-time data ingestion, analysis, and commentary generation
  • Operational dashboards and monitoring provide visibility into the health and performance of the system

Addressing the Challenges of Agentic AI

  1. Trust and Safety: Visibility into AI agent behavior is critical to understand how and why decisions are made
  2. Operational Complexity: Managing microservices, distributed agents, and event-driven architectures across multiple accounts and regions
  3. Data Explosion: AI agents handle thousands of customer requests per hour, generating rapidly growing volumes of telemetry data that must be monitored and secured

Providing Observability for Agentic AI

  • Generative AI Observability in CloudWatch provides visibility into AI agent behavior, including latency, token usage, and performance
  • CloudWatch Application Map automatically discovers and organizes services, correlating metrics, logs, and traces to simplify root cause analysis
  • Developers can use CloudWatch MCP Servers to troubleshoot AI agents directly from their preferred IDEs and productivity tools
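
As a rough illustration of the kind of signals listed above (latency and token usage), the sketch below builds a log line in CloudWatch Embedded Metric Format (EMF), a general mechanism for publishing custom metrics through logs. This is not how the Generative AI Observability feature instruments agents; the namespace, dimension, metric names, and the agent name are all hypothetical choices made for the example.

```python
import json
import time


def build_emf_record(agent_name, latency_ms, input_tokens, output_tokens,
                     namespace="Demo/AgentTelemetry", timestamp_ms=None):
    """Return one EMF log line (JSON string) for a single agent invocation.

    The namespace and metric names are illustrative assumptions.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # EMF expects epoch milliseconds
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,
            "CloudWatchMetrics": [
                {
                    "Namespace": namespace,
                    # One dimension set, keyed by the agent's name.
                    "Dimensions": [["AgentName"]],
                    "Metrics": [
                        {"Name": "LatencyMs", "Unit": "Milliseconds"},
                        {"Name": "InputTokens", "Unit": "Count"},
                        {"Name": "OutputTokens", "Unit": "Count"},
                    ],
                }
            ],
        },
        # Dimension and metric values live at the top level of the record.
        "AgentName": agent_name,
        "LatencyMs": latency_ms,
        "InputTokens": input_tokens,
        "OutputTokens": output_tokens,
    }
    return json.dumps(record)


if __name__ == "__main__":
    # Printing the line to stdout is enough in environments that forward
    # stdout to CloudWatch Logs (for example, AWS Lambda).
    print(build_emf_record("demo-agent", 412.5, 1800, 240))
```

Emitting metrics through structured logs keeps the example free of SDK calls; whether that suits a given agent runtime depends on how its logs are shipped.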

Empowering Smarter Operations with AI

  • CloudWatch Investigations uses generative AI to automate root cause analysis and provide troubleshooting guidance
  • Interactive Incident Report Generation leverages the "Five Whys" framework to capture and share operational learnings
  • Application Observability for AWS GitHub Actions brings observability directly into developers' workflows, enabling them to diagnose production issues using live telemetry

Simplifying Overall Cloud Operations

  • Enhancements to CloudWatch Log Analytics, including natural language support and over 35 analytical commands, provide faster and more accessible insights
  • Expanded CloudWatch Real User Monitoring for mobile apps enables end-to-end visibility, from user behavior to underlying infrastructure
  • CloudTrail Aggregated Events simplify security monitoring by pre-aggregating high-volume API calls and identifying anomalies
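
The idea of pre-aggregating high-volume API calls can be sketched in a few lines. The example below is a generic illustration and not the CloudTrail implementation: the event fields (`epoch_seconds`, `event_name`, `principal`), the five-minute window, and the "more than five times the median" spike rule are all assumptions chosen for the sketch.

```python
from collections import Counter, defaultdict
from statistics import median


def aggregate_events(events, window_seconds=300):
    """Count events per (window start, event name, principal)."""
    counts = Counter()
    for event in events:
        # Align each timestamp to the start of its fixed-size window.
        window_start = event["epoch_seconds"] // window_seconds * window_seconds
        counts[(window_start, event["event_name"], event["principal"])] += 1
    return counts


def flag_spikes(counts, factor=5.0):
    """Return buckets whose count exceeds factor x the median of their series."""
    series = defaultdict(list)
    for (window_start, name, principal), n in counts.items():
        series[(name, principal)].append((window_start, n))

    flagged = []
    for key, points in series.items():
        baseline = median(n for _, n in points)
        for window_start, n in points:
            if n > factor * baseline:
                flagged.append((window_start, *key, n))
    return sorted(flagged)


if __name__ == "__main__":
    # Synthetic data: two quiet windows, then a burst of calls.
    events = []
    for start, n in [(0, 2), (300, 2), (600, 50)]:
        events += [
            {"epoch_seconds": start + i, "event_name": "GetObject", "principal": "alice"}
            for i in range(n)
        ]
    print(flag_spikes(aggregate_events(events)))
```

Working on per-window counts instead of raw events keeps the data volume proportional to the number of distinct (event, principal) pairs, which is the property that makes aggregation attractive for high-volume APIs.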

Centralizing Observability Across Hybrid and Multi-Cloud Environments

  • Cross-account, cross-region log centralization in CloudWatch consolidates data from development, QA, and production environments
  • Database Insights extends these cross-account, cross-region capabilities to monitoring and troubleshooting of RDS and Aurora databases

Delivering Personalized Ads at Scale with Warner Bros. Discovery

  • Warner Bros. Discovery operates a massive ad platform across streaming services like HBO Max and Discovery+
  • They use a custom metric called "permits" to autoscale their infrastructure based on factors like content duration, DVR window, and number of ads
  • Predictive autoscaling, based on the rate of change in permits, helps them maintain a 90% ad fill rate with low latency
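
A minimal sketch of scaling on the rate of change of a custom metric follows. It treats the permits value as an opaque input, since the summary does not say how content duration, DVR window, and number of ads are combined. The linear extrapolation, the lead time, and the `permits_per_instance` capacity figure are assumptions for illustration, not Warner Bros. Discovery's actual method.

```python
import math


def rate_of_change(samples):
    """Permits per second between the two most recent (timestamp, permits) samples."""
    (t0, p0), (t1, p1) = samples[-2], samples[-1]
    if t1 <= t0:
        raise ValueError("samples must have increasing timestamps")
    return (p1 - p0) / (t1 - t0)


def desired_instances(samples, lead_seconds, permits_per_instance, min_instances=1):
    """Project permits lead_seconds ahead and size the fleet for that load.

    lead_seconds approximates how long new capacity takes to become useful;
    permits_per_instance is a hypothetical per-instance capacity.
    """
    current = samples[-1][1]
    projected = current + rate_of_change(samples) * lead_seconds
    # Scale up ahead of rising demand, but never size below the current load.
    target = max(current, projected)
    return max(min_instances, math.ceil(target / permits_per_instance))


if __name__ == "__main__":
    rising = [(0, 1000), (60, 1600)]    # +10 permits per second
    falling = [(0, 1600), (60, 1000)]   # -10 permits per second
    print(desired_instances(rising, lead_seconds=120, permits_per_instance=500))
    print(desired_instances(falling, lead_seconds=120, permits_per_instance=500))
```

Using only the last two samples makes the projection sensitive to noise; a production system would more plausibly smooth the series or fit a trend over a longer window.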

Key Takeaways

  • AWS is committed to simplifying cloud operations for organizations, whether they are using agentic AI or traditional microservices
  • Observability is the control plane for trust, safety, and accountability in the age of agentic AI
  • AI-powered tools like CloudWatch Investigations and Incident Report Generation can automate root cause analysis and capture operational learnings
  • Centralized observability across hybrid and multi-cloud environments is crucial for managing complex, data-intensive workloads
  • Innovative approaches to autoscaling and infrastructure management can enable the delivery of personalized, low-latency experiences at scale
