TalksAWS re:Invent 2025 - AI agents for cloud ops: Automating infrastructure management (AIM340)

AWS re:Invent 2025 - AI agents for cloud ops: Automating infrastructure management (AIM340)

Automating Infrastructure Management with AI Agents

Introduction

Presentation on using AI agents to automate cloud operations and security management
Covers core components of AI agents, integrations, tools, and fundamental concepts
Demonstrates a real-world example of an "Agentic" application built using the Strands framework

Challenges with Manual Cloud Operations

Security engineers often struggle with manual processes to analyze disparate log data from various sources (VPC flow logs, firewall logs, API logs, CloudTrail, etc.)
Piecing together fragments from different systems is time-consuming and error-prone
Difficulty identifying suspicious activity and correlating events across systems

Benefits of AI for Security Operations

AI can analyze vast, complex data sets in seconds to detect patterns and anomalies
Automates repetitive tasks like triage and log correlation
Surfaces key insights to focus on real threats, not just grunt work

Comparing LLMs and Agentic AI

LLMs provide simple, concise answers based on existing knowledge
Agentic AI can think independently, iterate, and reason to provide valuable responses
Agentic AI leverages specialized tools and data sources to enhance its capabilities

Centralizing Security Data with Amazon Security Lake

Normalizes and formats data from various sources (CloudTrail, Security Hub, VPC flow logs, etc.) into a central data lake
Provides a unified, easy-to-query data format (OCSF) for AI agents to leverage

Multi-Agent Architecture

Employs a team of specialized agents, each with their own skills and knowledge domains
Allows agents to work together to coordinate tasks and synthesize insights
Supports both event-driven and interactive usage models

Demonstrating the Metadata Agent

Retrieves business context metadata (account ID, business unit, criticality, compliance scope, etc.) from a DynamoDB table
Provides crucial context about the security incident beyond just technical details
Enables security teams to quickly prioritize and triage issues

Demonstrating the Security Lake Agent

Leverages Amazon Athena to query the centralized security data lake
Dynamically generates SQL queries based on natural language prompts
Correlates data from various sources (CloudTrail, VPC flow logs, Security Hub) to surface insights

Implementing the Supervisor Agent

Orchestrates the execution of multiple specialized agents to complete complex tasks
Integrates with native Strands tools (file writes, current time) and other agent-based tools
Generates a comprehensive incident report with account context, risk assessment, and recommended actions

Deploying Agents in Production

Strands Agent Core Runtime provides a managed service to run AI agents at scale
Supports various model providers and frameworks (Strands, LangChain, etc.)
Enables authentication, observability, and integration with cloud management protocols (MCP)

Key Takeaways

AI agents can automate the grunt work of security operations, freeing up teams to focus on high-impact tasks
Centralizing security data in a normalized format (e.g., Amazon Security Lake) is crucial for effective AI-powered analysis
A multi-agent architecture allows for specialized capabilities and coordinated problem-solving
Deploying agents in production requires considerations around security, reliability, and human oversight

Real-World Impact

Demonstrated agents can quickly retrieve business context, analyze security data, and generate actionable insights
Reduces the time and effort required to triage security incidents, enabling faster response and remediation
Enables security teams to focus on higher-level strategic tasks rather than manual data analysis

Examples

Agents can correlate unusual API activity with network anomalies and recommend containment actions
Agents can analyze security group changes, unusual traffic patterns, and other events to identify and mitigate threats

Your Digital Journey deserves a great story.

Build one with us.

These cookies are used to collect information about how you interact with this website and allow us to remember you. We use this information to improve and customize your browsing experience, as well as for analytics.

If you decline, your information won’t be tracked when you visit this website. A single cookie will be used in your browser to remember your preference.

AWS re:Invent 2025 - AI agents for cloud ops: Automating infrastructure management (AIM340)

Automating Infrastructure Management with AI Agents

Introduction

Challenges with Manual Cloud Operations

Benefits of AI for Security Operations

Comparing LLMs and Agentic AI

Centralizing Security Data with Amazon Security Lake

Multi-Agent Architecture

Demonstrating the Metadata Agent

Demonstrating the Security Lake Agent

Implementing the Supervisor Agent

Deploying Agents in Production

Key Takeaways

Real-World Impact

Examples

Your Digital Journey deserves a great story.

Build one with us.

Headquarters

Delivery Centre

AWS re:Invent 2025 - AI agents for cloud ops: Automating infrastructure management (AIM340)

Automating Infrastructure Management with AI Agents

Introduction

Challenges with Manual Cloud Operations

Benefits of AI for Security Operations

Comparing LLMs and Agentic AI

Centralizing Security Data with Amazon Security Lake

Multi-Agent Architecture

Demonstrating the Metadata Agent

Demonstrating the Security Lake Agent

Implementing the Supervisor Agent

Deploying Agents in Production

Key Takeaways

Real-World Impact

Examples

Your Digital Journey deserves a great story.

Build one with us.

This website stores cookies on your computer.