TalksAWS re:Invent 2025 - Building custom agents for intelligent AWS patch automation (COP407)

AWS re:Invent 2025 - Building custom agents for intelligent AWS patch automation (COP407)

Building Custom Agents for Intelligent AWS Patch Automation

Overview

The presentation showcased a solution called "Patchy" - an agentic AI system that automates and intelligently manages the AWS patch process. The key highlights include:

Motivation

  • Experiences from the Log4j vulnerability event, where teams struggled to quickly assess exposure and coordinate patching at scale
  • The need for better automation and intelligence to handle critical vulnerabilities and compliance requirements

Solution Architecture

  • Supervisor-Specialist Agent Pattern
    • Supervisor agent orchestrates the end-to-end workflow
    • Specialized agents (Vulnerability Analyst, Patch Manager, Compliance Analyst) handle specific tasks
  • Leverages Large Language Models (LLMs) running on Amazon Bedrock for intent understanding and reasoning
  • Integrates with AWS services like Inspector, Config, Systems Manager for vulnerability detection, compliance, and patching
  • Stores compliance history in S3 for reporting

Key Capabilities

  1. Vulnerability Assessment: Quickly identifies affected instances, critical CVEs, and compliance risks
  2. Patching Decision: Analyzes SLA requirements, maintenance windows, and business impact to determine optimal patching schedule
  3. Phased Rollout: Applies patches in a controlled, multi-environment rollout with health checks
  4. Compliance Reporting: Tracks patching history and SLA violations for auditing and continuous improvement

Technical Details

  • Uses Strands SDK to build agents that can interact with LLMs and AWS services in an "agentic loop"
  • Leverages Agent Core for scalable and secure agent deployment and management
  • Prompts and tools are designed for deterministic behavior, reduced token consumption, and easy maintainability

Business Impact

  • Enables faster vulnerability assessment and patching, avoiding SLA violations and compliance issues
  • Frees up engineering teams from manual data gathering and decision-making, allowing them to focus on higher-value work
  • Provides visibility and reporting on patching history and compliance, improving governance and audit readiness
  • Increases overall system resilience and reliability by automating the patch management process

Example Walkthrough

The presenters demonstrated the Patchy solution in action, showing how it:

  1. Summarized the production environment and compliance requirements
  2. Identified critical vulnerabilities and affected instances
  3. Determined the optimal patching timeline based on SLA and maintenance windows
  4. Executed the patching process in a phased, controlled manner across dev, staging, and production
  5. Reported on past compliance breaches and SLA violations

Key Takeaways

  1. Tools vs. Prompts: Tools provide more deterministic behavior, while prompts allow for more creative reasoning by the LLM.
  2. Augment, not Replace: AI agents are meant to augment human teams, not replace them entirely. Governance and oversight are still crucial.
  3. Start Small, Test, and Iterate: Begin with a simple MVP, gather feedback, and gradually build out the solution's capabilities.

Resources

  • Bedrock Agent Core code samples: [QR Code]
  • Strands Agent code samples: [QR Code]

Your Digital Journey deserves a great story.

Build one with us.

Cookies Icon

These cookies are used to collect information about how you interact with this website and allow us to remember you. We use this information to improve and customize your browsing experience, as well as for analytics.

If you decline, your information won’t be tracked when you visit this website. A single cookie will be used in your browser to remember your preference.