AWS re:Invent 2025 - AIOps Revolution: How iHeart slashed incident response time by 60% with Bedrock

Transforming IT Incident Response with Agentic AI: iHeart's Journey

The Challenge of Modern IT Incidents

  • Large digital organizations like iHeart Media face complex, distributed IT systems that make it difficult to quickly identify and resolve incidents
  • Incident response often involves a "seven circles of on-call hell": logging in, hunting for information, relying on tribal knowledge, diagnosing manually, and more, all of which waste precious time
  • Traditional monitoring systems generate too much noise, making it hard to pinpoint root causes

iHeart Media's Scale and Operations

  • iHeart Media is a massive media company with:
    • 850+ AM/FM radio stations
    • 250 million monthly digital users
    • 150 million monthly podcast downloads
    • 5-7 billion monthly digital requests
    • 70+ AWS services powering its architecture
  • The company's digital platform is mission-critical, requiring 24/7 uptime and fast incident resolution to avoid major business impacts
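
To put the request volume in perspective, the 5-7 billion monthly figure can be converted into an average per-second rate. This is a back-of-envelope sketch that assumes a 30-day month and perfectly uniform traffic; both are simplifications, and real peaks would sit well above the average.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # assumption: a 30-day month


def average_rps(monthly_requests: float) -> float:
    """Average requests per second, assuming perfectly uniform traffic."""
    return monthly_requests / SECONDS_PER_MONTH


low = average_rps(5e9)
high = average_rps(7e9)
print(f"{low:,.0f} to {high:,.0f} requests/second on average")
```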

Introducing Agentic AI for IT Operations (AIOps)

  • iHeart built a multi-agent AI system to automate incident response and remediation
  • Key components:
    • Slack bot interface for human interaction
    • Orchestrator agent to delegate tasks to specialized sub-agents
    • Sub-agents for monitoring, logs, Kubernetes, knowledge base, etc.
    • Amazon Bedrock AgentCore for secure, scalable agent deployment
  • Agents work together to quickly triage incidents, diagnose root causes, and recommend remediation steps
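
The component list above can be pictured as a small routing table. Everything in this sketch is hypothetical: the `SubAgent` type, the keyword lists, and the `route` function are illustrative assumptions, and the summary does not describe how iHeart's orchestrator actually picks sub-agents. In practice a language model would likely make that decision rather than keyword matching.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubAgent:
    name: str
    keywords: tuple[str, ...]  # hypothetical trigger words, not from the talk


# Sub-agent domains come from the summary; the keywords are illustrative guesses.
SUB_AGENTS = (
    SubAgent("monitoring", ("latency", "alarm", "error rate", "cpu")),
    SubAgent("logs", ("exception", "stack trace", "log", "timeout")),
    SubAgent("kubernetes", ("pod", "node", "deployment", "crashloop")),
    SubAgent("knowledge_base", ("runbook", "previous incident", "postmortem")),
)


def route(incident_text: str) -> list[str]:
    """Return the names of sub-agents whose keywords appear in the message."""
    text = incident_text.lower()
    selected = [a.name for a in SUB_AGENTS if any(k in text for k in a.keywords)]
    # Fall back to every sub-agent when nothing matches, so no incident is dropped.
    return selected or [a.name for a in SUB_AGENTS]


print(route("Pods in crashloop after deploy, seeing timeout exceptions"))
```

A message that matches no keyword is sent to all sub-agents, which trades some extra work for not silently ignoring an unfamiliar incident.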

Benefits of the Agentic AI Approach

  • 60% reduction in incident response time by automating triage and diagnosis
  • Improved operational efficiency and reduced toil for on-call engineers
  • Increased consistency and reliability in incident response
  • Preservation of institutional knowledge for faster future incident resolution

Implementing the Agentic AI Solution

  • Slack bot interface allows simple, natural language interaction to trigger incident investigation
  • Orchestrator agent delegates tasks to specialized sub-agents, each with its own context window
    • Prevents sub-agents from overloading the main context with unnecessary data
    • Allows parallel, targeted investigations across monitoring, logs, Kubernetes, etc.
  • Amazon Bedrock AgentCore provides a secure, scalable runtime to deploy and manage the multi-agent system
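
The delegation pattern above can be sketched with `asyncio`. This is a minimal illustration under stated assumptions, not iHeart's implementation: `run_sub_agent` is a hypothetical stand-in for a real model and tool call, and the 200-character summary cap is an arbitrary choice. What it shows is that each sub-agent accumulates raw data in its own context, and only a short summary flows back to the orchestrator.

```python
import asyncio

MAX_SUMMARY_CHARS = 200  # arbitrary cap chosen for this sketch


async def run_sub_agent(name: str, task: str) -> str:
    """Hypothetical sub-agent: gathers raw data privately, returns a summary."""
    private_context = [f"{name} raw record {i} for: {task}" for i in range(1000)]
    await asyncio.sleep(0)  # stands in for tool and model calls
    summary = f"{name}: examined {len(private_context)} records for '{task}'"
    return summary[:MAX_SUMMARY_CHARS]  # raw records never leave this function


async def orchestrate(task: str, agents: list[str]) -> list[str]:
    """Fan the task out to all sub-agents concurrently; collect summaries only."""
    return await asyncio.gather(*(run_sub_agent(a, task) for a in agents))


summaries = asyncio.run(orchestrate("API 5xx spike", ["monitoring", "logs", "kubernetes"]))
orchestrator_context = "\n".join(summaries)
print(orchestrator_context)
```

The orchestrator's context ends up holding three short lines rather than three thousand raw records, which is the point of giving each sub-agent a separate context window.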

Lessons and Next Steps

  • Quality of context data is critical - "garbage in, garbage out" for AI agents
  • Gradual adoption approach: start with read-only, low-risk tasks before expanding to high-stakes actions
  • Build a robust evaluation environment to continuously test and validate agent performance
  • Future goals include expanding agent capabilities, integrating more data sources, and enabling proactive incident prevention
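
One way to picture the gradual adoption lesson is a tool registry in which every tool declares whether it is read-only, with a policy gate that blocks mutating tools until they are explicitly enabled. The tool names, the `Tool` type, and the `allow_writes` flag are all hypothetical; the summary does not say how iHeart enforces this.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Tool:
    name: str
    read_only: bool
    run: Callable[[str], str]


class WriteBlockedError(PermissionError):
    """Raised when a mutating tool is called before writes are enabled."""


# Hypothetical tools; the lambdas stand in for real integrations.
TOOLS = {
    "get_pod_logs": Tool("get_pod_logs", True, lambda target: f"logs for {target}"),
    "restart_deployment": Tool("restart_deployment", False, lambda target: f"restarted {target}"),
}


def call_tool(name: str, target: str, allow_writes: bool = False) -> str:
    """Run a tool, refusing mutating tools unless allow_writes is set."""
    tool = TOOLS[name]
    if not tool.read_only and not allow_writes:
        raise WriteBlockedError(f"{name} is not read-only; recommend it to a human instead")
    return tool.run(target)


print(call_tool("get_pod_logs", "api-gateway"))
try:
    call_tool("restart_deployment", "api-gateway")
except WriteBlockedError as err:
    print(f"blocked: {err}")
```

Defaulting `allow_writes` to `False` means a new tool is only as risky as its `read_only` flag says, and expanding to higher-stakes actions becomes a deliberate, reviewable change.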

Key Takeaways

  • Agentic AI can substantially speed up IT incident response by automating triage and diagnosis and by recommending remediation steps
  • iHeart cut incident response time by 60%, reduced toil, and preserved institutional knowledge
  • Implementing a multi-agent architecture with specialized sub-agents is key to managing complexity
  • Careful attention to context data quality and gradual adoption are critical for success
  • The agentic AI revolution is here, and Amazon Bedrock provides the tools to join it
