TalksAWS re:Invent 2025 - AI agents for cloud ops: Automating infrastructure management (AIM340)

AWS re:Invent 2025 - AI agents for cloud ops: Automating infrastructure management (AIM340)

Automating Infrastructure Management with AI Agents

Introduction

  • Presentation on using AI agents to automate cloud operations and security management
  • Covers core components of AI agents, integrations, tools, and fundamental concepts
  • Demonstrates a real-world example of an "Agentic" application built using the Strands framework

Challenges with Manual Cloud Operations

  • Security engineers often struggle with manual processes to analyze disparate log data from various sources (VPC flow logs, firewall logs, API logs, CloudTrail, etc.)
  • Piecing together fragments from different systems is time-consuming and error-prone
  • Difficulty identifying suspicious activity and correlating events across systems

Benefits of AI for Security Operations

  • AI can analyze vast, complex data sets in seconds to detect patterns and anomalies
  • Automates repetitive tasks like triage and log correlation
  • Surfaces key insights to focus on real threats, not just grunt work

Comparing LLMs and Agentic AI

  • LLMs provide simple, concise answers based on existing knowledge
  • Agentic AI can think independently, iterate, and reason to provide valuable responses
  • Agentic AI leverages specialized tools and data sources to enhance its capabilities

Centralizing Security Data with Amazon Security Lake

  • Normalizes and formats data from various sources (CloudTrail, Security Hub, VPC flow logs, etc.) into a central data lake
  • Provides a unified, easy-to-query data format (OCSF) for AI agents to leverage

Multi-Agent Architecture

  • Employs a team of specialized agents, each with their own skills and knowledge domains
  • Allows agents to work together to coordinate tasks and synthesize insights
  • Supports both event-driven and interactive usage models

Demonstrating the Metadata Agent

  • Retrieves business context metadata (account ID, business unit, criticality, compliance scope, etc.) from a DynamoDB table
  • Provides crucial context about the security incident beyond just technical details
  • Enables security teams to quickly prioritize and triage issues

Demonstrating the Security Lake Agent

  • Leverages Amazon Athena to query the centralized security data lake
  • Dynamically generates SQL queries based on natural language prompts
  • Correlates data from various sources (CloudTrail, VPC flow logs, Security Hub) to surface insights

Implementing the Supervisor Agent

  • Orchestrates the execution of multiple specialized agents to complete complex tasks
  • Integrates with native Strands tools (file writes, current time) and other agent-based tools
  • Generates a comprehensive incident report with account context, risk assessment, and recommended actions

Deploying Agents in Production

  • Strands Agent Core Runtime provides a managed service to run AI agents at scale
  • Supports various model providers and frameworks (Strands, LangChain, etc.)
  • Enables authentication, observability, and integration with cloud management protocols (MCP)

Key Takeaways

  • AI agents can automate the grunt work of security operations, freeing up teams to focus on high-impact tasks
  • Centralizing security data in a normalized format (e.g., Amazon Security Lake) is crucial for effective AI-powered analysis
  • A multi-agent architecture allows for specialized capabilities and coordinated problem-solving
  • Deploying agents in production requires considerations around security, reliability, and human oversight

Real-World Impact

  • Demonstrated agents can quickly retrieve business context, analyze security data, and generate actionable insights
  • Reduces the time and effort required to triage security incidents, enabling faster response and remediation
  • Enables security teams to focus on higher-level strategic tasks rather than manual data analysis

Examples

  • Agents can correlate unusual API activity with network anomalies and recommend containment actions
  • Agents can analyze security group changes, unusual traffic patterns, and other events to identify and mitigate threats

Your Digital Journey deserves a great story.

Build one with us.

Cookies Icon

These cookies are used to collect information about how you interact with this website and allow us to remember you. We use this information to improve and customize your browsing experience, as well as for analytics.

If you decline, your information won’t be tracked when you visit this website. A single cookie will be used in your browser to remember your preference.