AWS re:Invent 2025 - Run and Scale Agentic AI Applications in Production (AIM243)

Running and Scaling Agentic AI Applications in Production

Overview

This presentation from HashiCorp discussed the challenges and best practices for running and scaling "agentic AI" applications in production environments. The speakers, Cole Morrison and Kyle Ruddy, provided a detailed technical overview of the key considerations and solutions for managing AI-powered workloads alongside traditional infrastructure.

The Two Planes: Model and Ops

The presenters emphasized the importance of differentiating between the "model plane" (the AI/ML components) and the "ops plane" (the underlying infrastructure). This distinction is crucial for understanding where the complexity lies and how to effectively manage each aspect.

Model Plane

  • The "model plane" encompasses the core AI/ML components, including:
    • Prompts: The inputs provided to the AI model
    • Model: The AI/ML model itself, which the speakers described as a "glorified autocomplete"
    • Reasoning: The heuristics and decision-making logic of the agent
    • Memory: The context and state maintained by the agent
    • Supervisor pattern: The hierarchical structure of specialized agents
    • Tools and APIs: The external services and capabilities available to the agent
  • These model-level components represent the "thinking" part of the agentic AI application.
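The supervisor pattern above can be sketched in a few lines. This is a minimal, hypothetical illustration (not code from the talk): a supervisor routes each task to a specialized agent, and each agent keeps its own memory and a restricted set of tools.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    tools: dict                      # tool name -> callable
    memory: list = field(default_factory=list)

    def handle(self, task: str) -> str:
        self.memory.append(task)     # retain context across calls
        tool = self.tools.get(task.split(":")[0])
        return tool(task) if tool else f"{self.name}: no tool for {task!r}"

class Supervisor:
    def __init__(self, agents: dict):
        self.agents = agents         # route key -> specialized agent

    def dispatch(self, route: str, task: str) -> str:
        return self.agents[route].handle(task)

# Usage: a "billing" agent that only exposes a refund tool.
billing = Agent("billing", {"refund": lambda t: f"processed {t}"})
sup = Supervisor({"billing": billing})
print(sup.dispatch("billing", "refund:order-42"))   # processed refund:order-42
```

In a real deployment the `handle` step would call a model; the point here is the hierarchy: the supervisor owns routing, while each specialized agent owns its memory and tool surface.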

Ops Plane

  • The "ops plane" refers to the underlying infrastructure and services required to run the AI/ML workloads, including:
    • Compute resources (CPUs, GPUs)
    • Networking and connectivity
    • Storage (databases, vector databases)
    • Identity and access management
  • These ops-level components represent the "execution" part of the agentic AI application.

Deployment Considerations

The presenters discussed several key factors to consider when deploying and managing agentic AI applications in production:

Model Serving Options

  • Hosting the model within the same cluster as the application (for low latency)
  • Hosting the model in a separate runtime (for better separation of concerns)
  • Using edge devices for small, specialized models (for low-cost, low-power inference)

Challenges

  1. Speed: Ensuring low-latency access to the AI/ML models and resources
  2. Cost: Managing the potentially high costs of GPU-powered AI/ML workloads
  3. Risk: Mitigating the security and access risks posed by agentic AI agents

Operational Patterns and Solutions

The presenters outlined several operational patterns and solutions for managing agentic AI applications:

Pacing the Front Door

  • Implementing rate limiting and queueing mechanisms to control the influx of requests
  • Using Terraform modules and policies to enforce guardrails and best practices
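The "pacing the front door" pattern can be illustrated with a hedged sketch (not from the talk): a token-bucket rate limiter paired with a bounded queue. Requests beyond the rate are queued, and requests beyond the queue bound are shed, pushing back-pressure to callers.

```python
import time
from collections import deque

class FrontDoor:
    def __init__(self, rate: float, burst: int, queue_size: int):
        self.rate, self.burst = rate, burst        # tokens/sec, bucket cap
        self.tokens = float(burst)
        self.last = time.monotonic()
        self.queue = deque(maxlen=queue_size)      # bounded backlog

    def _refill(self):
        now = time.monotonic()
        self.tokens = min(self.burst, self.tokens + (now - self.last) * self.rate)
        self.last = now

    def admit(self, request: str) -> str:
        self._refill()
        if self.tokens >= 1:
            self.tokens -= 1
            return "served"
        if len(self.queue) < (self.queue.maxlen or 0):
            self.queue.append(request)
            return "queued"
        return "shed"                              # back-pressure to callers

door = FrontDoor(rate=1.0, burst=2, queue_size=1)
print([door.admit(f"r{i}") for i in range(4)])    # ['served', 'served', 'queued', 'shed']
```

A production front door would drain the queue as tokens refill; the sketch only shows the admission decision.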

Controlling Reach

  • Assigning dedicated identities to agents to enable better auditing and access control
  • Restricting agent access to only the necessary tools and services
  • Consolidating exit points and applying network-level restrictions
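A minimal sketch of "controlling reach" (hypothetical, not from the talk): each agent identity carries an explicit allowlist of tools, and every call is checked against it and audit-logged under that identity.

```python
audit_log = []

# Per-identity tool allowlists: an agent can reach only what it is granted.
ALLOWLIST = {
    "support-agent": {"search_docs", "create_ticket"},
    "billing-agent": {"issue_refund"},
}

def call_tool(identity: str, tool: str) -> bool:
    """Return whether this identity may use this tool, recording the decision."""
    allowed = tool in ALLOWLIST.get(identity, set())
    audit_log.append((identity, tool, "allow" if allowed else "deny"))
    return allowed

print(call_tool("support-agent", "search_docs"))   # True
print(call_tool("support-agent", "issue_refund"))  # False -> denied and logged
```

Because the identity is dedicated to the agent (rather than borrowed from a human user), the audit log attributes every tool call to the agent that made it.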

Managing Resources

  • Isolating agent workloads into dedicated node pools
  • Using signal-based autoscaling based on queue length rather than resource utilization
  • Implementing resource throttling and cost controls
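The signal-based autoscaling idea can be sketched as follows (a hedged illustration, with `per_replica` as an assumed tuning parameter): desired replicas are derived from queue depth, the work actually waiting, rather than from CPU/GPU utilization, which lags behind demand for bursty agent workloads.

```python
import math

def desired_replicas(queue_length: int, per_replica: int,
                     min_r: int = 1, max_r: int = 10) -> int:
    """Scale so each replica handles at most `per_replica` queued tasks,
    clamped between a floor and a cost-control ceiling."""
    want = math.ceil(queue_length / per_replica)
    return max(min_r, min(max_r, want))

print(desired_replicas(0,  per_replica=5))    # 1  (floor at min replicas)
print(desired_replicas(23, per_replica=5))    # 5  (ceil(23/5))
print(desired_replicas(80, per_replica=5))    # 10 (capped at max)
```

The `max_r` ceiling doubles as a cost control: even under a request flood, GPU-backed node pools cannot grow past the cap.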

HashiCorp Cloud Platform

The presenters highlighted how the HashiCorp Cloud Platform (HCP) can be leveraged to address many of the challenges in managing agentic AI applications:

  • HCP Terraform provides a centralized, policy-driven approach to infrastructure as code
  • HCP supports the deployment and management of Kubernetes clusters for running AI/ML workloads
  • HCP's policy engine and private module registry enable the enforcement of security and best practices

Key Takeaways

  1. Differentiate between the "model plane" (AI/ML components) and the "ops plane" (underlying infrastructure) to better understand and manage the complexity.
  2. Address the unique challenges of speed, cost, and risk when running agentic AI applications in production.
  3. Leverage operational patterns like pacing, controlling reach, and managing resources to maintain control and stability.
  4. Utilize platforms like the HashiCorp Cloud Platform to centralize infrastructure management and enforce policies for AI/ML workloads.

Real-world Examples and Use Cases

The presenters did not provide specific real-world examples or use cases during the presentation. However, they did mention the potential for agentic AI applications to be used in a variety of industries and scenarios, such as:

  • Automating and optimizing complex workflows
  • Handling "messy input" and long-tail questions
  • Providing personalized experiences and recommendations
  • Leveraging large amounts of data and complex decision-making

The focus of the presentation was on the technical challenges and operational best practices for running these types of AI-powered applications in production environments.
