AWS re:Invent 2025-The AI revolution in customer support: Building predictive service systems-SPS315

Transforming AWS Support with AI: Building Predictive Service Systems

The Need for Transformation

  • AWS Support has traditionally operated in a reactive, "firefighting" mode, responding to customer issues as they arise
  • This model does not scale as cloud complexity and customer demands increase
  • Customers want near-zero downtime, maximum resilience, and zero-impact maintenance
  • Reactive support is expensive and time-consuming, leading AWS to shift towards a more proactive, preventative approach

Evolving AWS Support with AI

  • AWS is leveraging AI and automation to transform its support operations and deliver better customer experiences
  • Key goals:
    • Detect issues before they impact production
    • Partner with customers to build resilient architectures
    • Use AI for accurate, contextual troubleshooting and faster incident response
    • Provide proactive guidance and security monitoring

Unified Operations: Delivering Mission-Critical Support

  • Unified Operations is AWS's new top-tier support plan for business-critical workloads
    • Provides a dedicated team of engineers, TAMs, and incident managers
    • 5-minute response time for critical incidents
    • Leverages context about customer environments and workloads
  • Focuses on:
    • Proactive risk management and resilience testing
    • Tabletop exercises and game days to identify gaps
    • Contextual incident management and retrospectives

Whoop's Journey with Unified Operations

  • Whoop, a health/fitness wearable company, partnered with Unified Operations to prepare for a major product launch
  • Engaged in a risk assessment exercise to identify potential issues
    • Allowed Whoop to scale up and stress-test their systems
    • Provided access to AWS experts for real-time guidance and troubleshooting
  • Achieved 100% availability during the launch and subsequent peak periods
    • Reduced downtime by 70% in Q3 through faster incident response

Leveraging AI and Automation

  • AWS has developed a multi-tiered approach to AI-powered support:
    • Chatbots and knowledge bases for basic assistance
    • AI-augmented human experts for complex troubleshooting
    • Autonomous AI agents for proactive monitoring and recommendations
  • Key technologies:
    • Structured runbooks and workflows for deterministic automation
    • Graph-based knowledge retrieval and fine-tuned models for contextual insights
    • Orchestration layer to coordinate multiple AI agents and human experts
    • Automated reasoning checks for safety and compliance
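
As a rough illustration of how a tiered approach with deterministic runbooks and human escalation could fit together, here is a minimal Python sketch. It is not AWS's implementation: the `Case` type, the `RUNBOOKS` table, the severity scale, and every tier function are hypothetical names invented for this example, and the "assistant" tier is a plain stand-in rather than a real model call.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Case:
    summary: str
    severity: int  # hypothetical scale: 1 = critical ... 5 = low


# Hypothetical runbook registry: keyword -> deterministic remediation step.
RUNBOOKS = {
    "disk full": "Run the disk-cleanup runbook",
    "certificate expired": "Run the certificate-rotation runbook",
}


def runbook_tier(case: Case) -> Optional[str]:
    """Deterministic tier: match known issues to a structured runbook."""
    text = case.summary.lower()
    for keyword, action in RUNBOOKS.items():
        if keyword in text:
            return action
    return None


def assistant_tier(case: Case) -> Optional[str]:
    """Stand-in for basic automated assistance; only takes low-severity cases."""
    if case.severity >= 4:
        return "Suggest knowledge-base article for: " + case.summary
    return None


def human_tier(case: Case) -> Optional[str]:
    """Final tier: always hands the case to a human expert."""
    return "Escalate to human engineer: " + case.summary


# The orchestration order is an assumption: cheapest, most predictable tier first.
TIERS: List[Callable[[Case], Optional[str]]] = [
    runbook_tier,
    assistant_tier,
    human_tier,
]


def route(case: Case) -> str:
    """Return the response of the first tier that can handle the case."""
    for tier in TIERS:
        result = tier(case)
        if result is not None:
            return result
    raise RuntimeError("no tier handled the case")


if __name__ == "__main__":
    print(route(Case("Disk full on db host", 2)))
    print(route(Case("How do I tag resources?", 5)))
    print(route(Case("Intermittent 5xx errors", 1)))
```

Placing the deterministic tier first keeps known issues on a predictable path, and ending the chain with a tier that always answers means no case is dropped.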

The AWS DevOps Agent

  • Newly launched tool that integrates with AWS Support
  • Extracts application topology and relevant signals to provide root cause analysis
  • Leverages insights from past issues to proactively recommend preventative actions
  • Allows seamless escalation to human experts when needed
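
To make the idea of combining application topology with signals more concrete, the sketch below narrows a set of alarming components down to likely root-cause candidates. It is an illustrative heuristic under stated assumptions, not how the AWS DevOps Agent works: the `depends_on` map, the component names, and the `root_cause_candidates` function are all hypothetical, and only direct dependencies are checked.

```python
from typing import Dict, List, Set


def root_cause_candidates(
    depends_on: Dict[str, List[str]], alarming: Set[str]
) -> List[str]:
    """Return alarming components whose direct dependencies are all healthy.

    Heuristic (an assumption of this sketch): if a component is alarming and
    one of its dependencies is also alarming, the dependency is the more
    likely origin, so the dependent component is filtered out.
    """
    candidates = []
    for node in sorted(alarming):
        deps = depends_on.get(node, [])
        if not any(dep in alarming for dep in deps):
            candidates.append(node)
    return candidates


if __name__ == "__main__":
    # Hypothetical topology: web -> api -> (db, cache)
    topology = {
        "web": ["api"],
        "api": ["db", "cache"],
        "db": [],
        "cache": [],
    }
    print(root_cause_candidates(topology, {"web", "api", "db"}))
```

When no candidate stands out, or several unrelated ones remain, that is a natural point to hand the case to a human expert.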

Key Takeaways

  • Transforming support operations requires a shift from reactive to proactive, preventative approaches
  • Leveraging AI and automation is crucial to scale support and deliver better customer experiences
  • Context and historical data are key to providing accurate, actionable insights
  • Partnering with customers and providing dedicated, expert-led support can drive significant business impact
  • Continuous improvement and learning from past incidents are essential for optimizing support systems
