Running and Scaling Agentic AI Applications in Production
Overview
This presentation from HashiCorp discussed the challenges and best practices for running and scaling "agentic AI" applications in production environments. The speakers, Cole Morrison and Kyle Ruddy, provided a detailed technical overview of the key considerations and solutions for managing AI-powered workloads alongside traditional infrastructure.
The Two Planes: Model and Ops
The presenters emphasized the importance of differentiating between the "model plane" (the AI/ML components) and the "ops plane" (the underlying infrastructure). This distinction is crucial for understanding where the complexity lies and how to effectively manage each aspect.
Model Plane
The "model plane" encompasses the core AI/ML components:
- Prompts: The inputs provided to the AI model
- Model: The AI/ML model itself, which the speakers described as a "glorified autocomplete"
- Reasoning: The heuristics and decision-making logic of the agent
- Memory: The context and state maintained by the agent
- Supervisor pattern: The hierarchical structure of specialized agents
- Tools and APIs: The external services and capabilities available to the agent
These model-level components represent the "thinking" part of an agentic AI application.
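The components above can be pictured as a single agent loop: a prompt flows through the model, a reasoning step decides whether to use a tool, and memory carries context between turns. The sketch below is purely illustrative (every name, including `fake_model` and `get_weather`, is made up; a real LLM call would replace the stand-in model):

```python
# Minimal agent loop sketch: prompt -> model -> reasoning -> tool use -> memory.
# All names here are illustrative; `fake_model` stands in for the
# "glorified autocomplete" an actual model call would provide.

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call: "reasons" about whether a tool is needed.
    if "weather" in prompt.lower():
        return "CALL_TOOL:get_weather"
    return "FINAL:I don't need a tool for that."

# Tools/APIs available to the agent (a restricted registry).
TOOLS = {"get_weather": lambda: "sunny, 22C"}

def run_agent(user_input: str, memory: list) -> str:
    memory.append(f"user: {user_input}")      # memory: context kept across turns
    action = fake_model("\n".join(memory))    # model + reasoning step
    if action.startswith("CALL_TOOL:"):
        tool_name = action.split(":", 1)[1]
        result = TOOLS[tool_name]()           # call out to an external capability
        memory.append(f"tool({tool_name}): {result}")
        return f"The weather is {result}."
    return action.split(":", 1)[1]

memory: list = []
print(run_agent("What's the weather?", memory))
```

A supervisor pattern would sit one level above this loop, routing work to several such specialized agents.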
Ops Plane
The "ops plane" refers to the underlying infrastructure and services required to run the AI/ML workloads:
- Compute resources (CPUs, GPUs)
- Networking and connectivity
- Storage (databases, vector databases)
- Identity and access management
These ops-level components represent the "execution" part of an agentic AI application.
Deployment Considerations
The presenters discussed several key factors to consider when deploying and managing agentic AI applications in production:
Model Serving Options
- Hosting the model within the same cluster as the application (for low latency)
- Hosting the model in a separate runtime (for better separation of concerns)
- Using edge devices for small, specialized models (for low-cost, low-power inference)
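These three serving options can be captured in a small routing sketch. The endpoint URLs, thresholds, and function names below are hypothetical placeholders, not anything shown in the talk:

```python
# Route inference requests to a serving backend based on the trade-offs above.
# Endpoints and the 500 MB threshold are hypothetical placeholders.

SERVING_BACKENDS = {
    "in_cluster": "http://model-svc.default.svc.cluster.local:8080",  # same cluster, low latency
    "separate":   "https://inference.example.com/v1",                 # dedicated runtime, clean separation
    "edge":       "http://edge-device.local:5000",                    # small model, low cost/power
}

def pick_backend(latency_sensitive: bool, model_size_mb: int) -> str:
    if model_size_mb < 500:
        return "edge"        # small specialized models can run on edge devices
    if latency_sensitive:
        return "in_cluster"  # co-locate with the app to cut network hops
    return "separate"        # otherwise keep serving concerns isolated
```

The point of the sketch is that the choice is driven by the latency and cost requirements of each model, not by a single default.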
Challenges
- Speed: Ensuring low-latency access to the AI/ML models and resources
- Cost: Managing the potentially high costs of GPU-powered AI/ML workloads
- Risk: Mitigating the security and access risks posed by agentic AI agents
Operational Patterns and Solutions
The presenters outlined several operational patterns and solutions for managing agentic AI applications:
Pacing the Front Door
- Implementing rate limiting and queueing mechanisms to control the influx of requests
- Using Terraform modules and policies to enforce guardrails and best practices
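"Pacing the front door" can be sketched as a token-bucket rate limiter in front of a request queue. This is a minimal illustration of the pattern, with made-up limits, not HashiCorp's implementation:

```python
import time
from collections import deque

class TokenBucket:
    """Token-bucket rate limiter: refills `rate` tokens/second up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2.0, capacity=5)  # illustrative limits
queue: deque = deque()

def front_door(request: str) -> str:
    if bucket.allow():
        queue.append(request)  # admitted: queued for an agent worker to pick up
        return "queued"
    return "throttled"         # over budget: reject or ask the caller to retry
```

The queue decouples admission from execution, which also gives the autoscaler (below in the talk, queue-length-based) a clean signal to scale on.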
Controlling Reach
- Assigning dedicated identities to agents to enable better auditing and access control
- Restricting agent access to only the necessary tools and services
- Consolidating exit points and applying network-level restrictions
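A minimal sketch of the first two ideas, per-agent identity plus a tool allowlist with an audit trail, looks like this (agent and tool names are invented for illustration):

```python
# Each agent gets its own identity with an explicit tool allowlist, so every
# call can be attributed, audited, and denied by default. Names are illustrative.

AGENT_POLICIES = {
    "billing-agent": {"read_invoices"},
    "support-agent": {"search_docs", "create_ticket"},
}

AUDIT_LOG: list = []  # (agent_id, tool, verdict) tuples

def call_tool(agent_id: str, tool: str) -> str:
    allowed = AGENT_POLICIES.get(agent_id, set())  # unknown identity -> empty allowlist
    verdict = "allow" if tool in allowed else "deny"
    AUDIT_LOG.append((agent_id, tool, verdict))    # dedicated identity -> auditable trail
    if verdict == "deny":
        raise PermissionError(f"{agent_id} may not call {tool}")
    return f"{tool} executed for {agent_id}"
```

Consolidating exit points would then apply the same deny-by-default stance at the network layer, so an agent cannot reach services outside its policy even if the application code tries.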
Managing Resources
- Isolating agent workloads into dedicated node pools
- Using signal-based autoscaling driven by queue length rather than resource utilization
- Implementing resource throttling and cost controls
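The queue-length scaling signal can be sketched in a few lines. The target of 10 queued requests per replica and the replica bounds are illustrative assumptions, not figures from the talk:

```python
import math

def desired_replicas(queue_length: int, target_per_replica: int = 10,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Scale agent workers from queue depth rather than CPU/GPU utilization.

    Queue depth reflects actual demand; GPU utilization can sit near 100%
    whether the backlog is 5 requests or 5,000, making it a poor signal.
    """
    wanted = math.ceil(queue_length / target_per_replica)
    # Clamp to bounds: min_replicas keeps warm capacity, max_replicas caps cost.
    return max(min_replicas, min(max_replicas, wanted))
```

The `max_replicas` bound doubles as a blunt cost control: no matter how deep the queue gets, spend is capped, and the front-door throttling keeps the backlog from growing without limit.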
HashiCorp Cloud Platform
The presenters highlighted how the HashiCorp Cloud Platform (HCP) can be leveraged to address many of the challenges in managing agentic AI applications:
- HCP Terraform provides a centralized, policy-driven approach to infrastructure as code
- HCP supports the deployment and management of Kubernetes clusters for running AI/ML workloads
- HCP's policy engine and private module registry enable the enforcement of security and best practices
Key Takeaways
- Differentiate between the "model plane" (AI/ML components) and the "ops plane" (underlying infrastructure) to better understand and manage the complexity.
- Address the unique challenges of speed, cost, and risk when running agentic AI applications in production.
- Leverage operational patterns like pacing, controlling reach, and managing resources to maintain control and stability.
- Utilize platforms like the HashiCorp Cloud Platform to centralize infrastructure management and enforce policies for AI/ML workloads.
Real-world Examples and Use Cases
The presenters did not walk through specific real-world examples or use cases. They did, however, note that agentic AI applications could apply across a variety of industries and scenarios, such as:
- Automating and optimizing complex workflows
- Handling "messy input" and long-tail questions
- Providing personalized experiences and recommendations
- Leveraging large amounts of data and complex decision-making
The focus of the presentation was on the technical challenges and operational best practices for running these types of AI-powered applications in production environments.