AWS re:Invent 2025 - AI agents for cloud ops: Automating infrastructure management (AIM340)
Automating Infrastructure Management with AI Agents
Introduction
Presentation on using AI agents to automate cloud operations and security management
Covers core components of AI agents, integrations, tools, and fundamental concepts
Demonstrates a real-world example of an agentic application built with the Strands framework
Challenges with Manual Cloud Operations
Security engineers often struggle with manual processes to analyze disparate log data from various sources (VPC flow logs, firewall logs, API logs, CloudTrail, etc.)
Piecing together fragments from different systems is time-consuming and error-prone
Difficulty identifying suspicious activity and correlating events across systems
Benefits of AI for Security Operations
AI can analyze vast, complex data sets in seconds to detect patterns and anomalies
Automates repetitive tasks like triage and log correlation
Surfaces key insights so analysts can focus on real threats rather than grunt work
Comparing LLMs and Agentic AI
LLMs on their own answer a prompt in a single pass, drawing only on what they already know
Agentic AI plans, iterates, and reasons over multiple steps to arrive at a more useful response
Agentic AI leverages specialized tools and data sources to enhance its capabilities
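The contrast above can be sketched as a loop: instead of answering once, an agent repeatedly picks a tool, observes the result, and stops when it has enough to answer. The Python below is a minimal sketch under stated assumptions. `scripted_model` is a hypothetical stand-in for a real LLM call, and the two tools and their data are invented for illustration; none of this is the Strands API.

```python
from typing import Callable

# Hypothetical tools; a real agent would wrap API calls or queries here.
def lookup_owner(account_id: str) -> str:
    return {"111122223333": "payments-team"}.get(account_id, "unknown")

def count_failed_logins(account_id: str) -> str:
    return {"111122223333": "42"}.get(account_id, "0")

TOOLS: dict[str, Callable[[str], str]] = {
    "lookup_owner": lookup_owner,
    "count_failed_logins": count_failed_logins,
}

def scripted_model(question: str, observations: list[str]) -> dict:
    """Stand-in for an LLM: chooses the next action from what it has seen so far."""
    if len(observations) == 0:
        return {"action": "lookup_owner", "input": "111122223333"}
    if len(observations) == 1:
        return {"action": "count_failed_logins", "input": "111122223333"}
    summary = f"Owner {observations[0]}, failed logins {observations[1]}"
    return {"action": "final", "input": summary}

def run_agent(question: str, model=scripted_model, max_steps: int = 5) -> str:
    """Decide, act, observe, repeat; stop on a final answer or the step limit."""
    observations: list[str] = []
    for _ in range(max_steps):
        decision = model(question, observations)
        if decision["action"] == "final":
            return decision["input"]
        tool = TOOLS[decision["action"]]
        observations.append(tool(decision["input"]))
    return "Stopped: step limit reached"

answer = run_agent("Is account 111122223333 seeing unusual logins?")
print(answer)
```

The step limit is a deliberate choice: it keeps a confused model from looping on tool calls indefinitely.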
Centralizing Security Data with Amazon Security Lake
Normalizes and formats data from various sources (CloudTrail, Security Hub, VPC flow logs, etc.) into a central data lake
Provides a unified, easy-to-query data format (OCSF, the Open Cybersecurity Schema Framework) for AI agents to leverage
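To illustrate why a normalized format helps, the sketch below maps two differently shaped records into one common shape so a single filter works across both. This is not the real OCSF schema: the flat output keys (`time`, `source`, `account_uid`, `src_ip`, `activity`) are simplified assumptions, and Security Lake performs the actual normalization for you. The input keys follow the usual CloudTrail and VPC flow log field names, and the sample values are invented.

```python
from datetime import datetime, timezone

def from_cloudtrail(event: dict) -> dict:
    """Map a raw CloudTrail record to the simplified common shape."""
    ts = datetime.strptime(event["eventTime"], "%Y-%m-%dT%H:%M:%SZ")
    ts = ts.replace(tzinfo=timezone.utc)
    return {
        "time": int(ts.timestamp() * 1000),  # epoch milliseconds
        "source": "cloudtrail",
        "account_uid": event["recipientAccountId"],
        "src_ip": event["sourceIPAddress"],
        "activity": event["eventName"],
    }

def from_vpc_flow(record: dict) -> dict:
    """Map a parsed VPC flow log record to the same shape."""
    return {
        "time": int(record["start"]) * 1000,  # flow logs report epoch seconds
        "source": "vpc_flow",
        "account_uid": record["account_id"],
        "src_ip": record["srcaddr"],
        "activity": record["action"],  # ACCEPT or REJECT
    }

def activity_for_ip(events: list[dict], ip: str) -> list[str]:
    """One filter now works across every source."""
    return [e["activity"] for e in events if e["src_ip"] == ip]

# Invented sample records for illustration only.
raw_trail = {
    "eventTime": "2025-12-01T10:00:00Z",
    "recipientAccountId": "111122223333",
    "sourceIPAddress": "203.0.113.10",
    "eventName": "ConsoleLogin",
}
raw_flow = {
    "start": "1764583260",
    "account_id": "111122223333",
    "srcaddr": "203.0.113.10",
    "action": "REJECT",
}

timeline = sorted(
    [from_vpc_flow(raw_flow), from_cloudtrail(raw_trail)],
    key=lambda e: e["time"],
)
print(activity_for_ip(timeline, "203.0.113.10"))
```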
Multi-Agent Architecture
Employs a team of specialized agents, each with its own skills and knowledge domain
Allows agents to work together to coordinate tasks and synthesize insights
Supports both event-driven and interactive usage models
Demonstrating the Metadata Agent
Retrieves business context metadata (account ID, business unit, criticality, compliance scope, etc.) from a DynamoDB table
Provides crucial context about the security incident beyond just technical details
Enables security teams to quickly prioritize and triage issues
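A lookup like the one the Metadata Agent performs could be sketched as follows. The function expects an object that behaves like a boto3 DynamoDB `Table` resource, whose `get_item(Key=...)` call returns a dict with an optional `"Item"` entry. The table name, the `account_id` key, the attribute names, and the priority rule are all assumptions for illustration; the talk does not spell out the actual schema.

```python
def get_account_context(table, account_id: str) -> dict:
    """Fetch business context for an account.

    `table` should behave like a boto3 DynamoDB Table resource, for example
    boto3.resource("dynamodb").Table("account-metadata")  # hypothetical name
    """
    response = table.get_item(Key={"account_id": account_id})
    item = response.get("Item")  # "Item" is absent when the key does not exist
    if item is None:
        return {"account_id": account_id, "found": False}
    return {"found": True, **item}

def triage_priority(context: dict) -> str:
    """Hypothetical rule: critical or compliance-scoped accounts come first."""
    if not context.get("found"):
        return "unknown"
    if context.get("criticality") == "high" or context.get("compliance_scope"):
        return "P1"
    return "P3"

class FakeTable:
    """In-memory stand-in so the sketch runs without AWS credentials."""
    def __init__(self, items: dict):
        self._items = items

    def get_item(self, Key: dict) -> dict:
        item = self._items.get(Key["account_id"])
        return {"Item": item} if item else {}

table = FakeTable({
    "111122223333": {
        "account_id": "111122223333",
        "business_unit": "payments",
        "criticality": "high",
        "compliance_scope": "PCI",
    }
})

context = get_account_context(table, "111122223333")
print(context["business_unit"], triage_priority(context))
```

Passing the table in as a parameter keeps the lookup testable and lets the same function run against a real table in production.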
Demonstrating the Security Lake Agent
Leverages Amazon Athena to query the centralized security data lake
Dynamically generates SQL queries based on natural language prompts
Correlates data from various sources (CloudTrail, VPC flow logs, Security Hub) to surface insights
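The query path can be sketched with the standard Athena calls in boto3: `start_query_execution`, `get_query_execution`, and `get_query_results`. In the talk the SQL is generated by the model from a natural language prompt; here a fixed template stands in for that step. The table name is passed in by the caller, and the column names (`time_dt`, `accountid`, `api.operation`, `src_endpoint.ip`) are assumptions about the Security Lake layout that should be checked against your own tables.

```python
import time

def build_query(table: str, account_id: str, hours: int = 24) -> str:
    """Build a recent-API-activity query; rejects malformed account IDs."""
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError("account_id must be a 12-digit string")
    # `table` comes from trusted configuration, never from user input.
    return (
        "SELECT time_dt, api.operation AS operation, src_endpoint.ip AS src_ip "
        f"FROM {table} "
        f"WHERE accountid = '{account_id}' "
        f"AND time_dt > current_timestamp - interval '{int(hours)}' hour "
        "ORDER BY time_dt DESC LIMIT 100"
    )

def run_query(athena, sql: str, database: str, output_location: str,
              poll_seconds: float = 1.0) -> list[dict]:
    """Run a query and return the first page of rows as dicts.

    `athena` should behave like boto3.client("athena").
    """
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Athena is asynchronous, so poll until the query reaches a terminal state.
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)
        state = execution["QueryExecution"]["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)
    if state != "SUCCEEDED":
        raise RuntimeError(f"Athena query ended in state {state}")

    rows = athena.get_query_results(QueryExecutionId=query_id)["ResultSet"]["Rows"]
    if not rows:
        return []
    # For SELECT queries the first row holds the column names.
    header = [cell.get("VarCharValue") for cell in rows[0]["Data"]]
    return [
        dict(zip(header, [cell.get("VarCharValue") for cell in row["Data"]]))
        for row in rows[1:]
    ]
```

Only the first page of results is read here; larger result sets would need pagination. Generated SQL also deserves validation before execution, which is one reason the account ID is checked rather than interpolated blindly.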
Implementing the Supervisor Agent
Orchestrates the execution of multiple specialized agents to complete complex tasks
Integrates with native Strands tools (file writes, current time) and other agent-based tools
Generates a comprehensive incident report with account context, risk assessment, and recommended actions
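The orchestration pattern can be sketched in plain Python: the supervisor calls each specialist, combines their outputs, and writes a report. This is a sketch under stated assumptions, not the Strands API. The specialists are passed in as ordinary callables, the risk rule and recommended actions are invented for illustration, and the standard library stands in for the current-time and file-write tools mentioned above.

```python
from datetime import datetime, timezone
from pathlib import Path
from typing import Callable, Optional

# Hypothetical mapping from risk level to a recommended next step.
ACTIONS = {
    "HIGH": "Isolate affected resources and page the on-call responder",
    "MEDIUM": "Open a ticket and review the findings within one business day",
    "LOW": "No action needed; continue monitoring",
}

def assess_risk(context: dict, findings: list[str]) -> str:
    """Hypothetical rule: findings on a high-criticality account are HIGH."""
    if not findings:
        return "LOW"
    return "HIGH" if context.get("criticality") == "high" else "MEDIUM"

def build_incident_report(
    account_id: str,
    metadata_agent: Callable[[str], dict],
    security_lake_agent: Callable[[str], list],
    now: Optional[datetime] = None,
) -> str:
    """Call each specialist, then synthesize one report."""
    now = now or datetime.now(timezone.utc)
    context = metadata_agent(account_id)
    findings = security_lake_agent(account_id)
    risk = assess_risk(context, findings)
    lines = [
        f"Incident report for account {account_id}",
        f"Generated: {now.isoformat()}",
        f"Business unit: {context.get('business_unit', 'unknown')}",
        f"Risk: {risk}",
        "Findings:",
    ]
    lines += [f"  - {finding}" for finding in findings] or ["  - none"]
    lines.append(f"Recommended action: {ACTIONS[risk]}")
    return "\n".join(lines)

def save_report(report: str, path: str) -> None:
    """Stand-in for a file-write tool."""
    Path(path).write_text(report, encoding="utf-8")

# Stub specialists so the sketch runs on its own.
report = build_incident_report(
    "111122223333",
    metadata_agent=lambda a: {"business_unit": "payments", "criticality": "high"},
    security_lake_agent=lambda a: ["Security group opened to 0.0.0.0/0"],
)
print(report)
```

Keeping the specialists behind simple callables means each one can be swapped for a real agent or a stub without changing the supervisor.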
Deploying Agents in Production
Amazon Bedrock AgentCore Runtime provides a managed runtime for running AI agents at scale
Supports various model providers and frameworks (Strands, LangChain, etc.)
Enables authentication, observability, and tool integration through the Model Context Protocol (MCP)
Key Takeaways
AI agents can automate the grunt work of security operations, freeing up teams to focus on high-impact tasks
Centralizing security data in a normalized format (e.g., Amazon Security Lake) is crucial for effective AI-powered analysis
A multi-agent architecture allows for specialized capabilities and coordinated problem-solving
Deploying agents in production requires considerations around security, reliability, and human oversight
Real-World Impact
Demonstrated agents can quickly retrieve business context, analyze security data, and generate actionable insights
Reduces the time and effort required to triage security incidents, enabling faster response and remediation
Enables security teams to focus on higher-level strategic tasks rather than manual data analysis
Examples
Agents can correlate unusual API activity with network anomalies and recommend containment actions
Agents can analyze security group changes, unusual traffic patterns, and other events to identify and mitigate threats