Running and Scaling Agentic AI Applications in Production
Overview
This presentation from HashiCorp discussed the challenges and best practices for running and scaling "agentic AI" applications in production environments. The speakers, Cole Morrison and Kyle Ruddy, provided a detailed technical overview of the key considerations and solutions for managing AI-powered workloads alongside traditional infrastructure.
The Two Planes: Model and Ops
The presenters emphasized the importance of differentiating between the "model plane" (the AI/ML components) and the "ops plane" (the underlying infrastructure). This distinction is crucial for understanding where the complexity lies and how to effectively manage each aspect.
Model Plane
The "model plane" encompasses the core AI/ML components:
- Prompts: The inputs provided to the AI model
- Model: The AI/ML model itself, which the speakers described as a "glorified autocomplete"
- Reasoning: The heuristics and decision-making logic of the agent
- Memory: The context and state maintained by the agent
- Supervisor pattern: The hierarchical structure of specialized agents
- Tools and APIs: The external services and capabilities available to the agent
These model-level components represent the "thinking" part of an agentic AI application.
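The components above can be pictured as a single agent loop: a prompt flows through the model, a reasoning step decides whether to use a tool, and memory carries context between turns. The sketch below is purely illustrative (every name, including `fake_model` and `get_weather`, is made up; a real LLM call would replace the stand-in model):

```python
# Minimal agent loop sketch: prompt -> model -> reasoning -> tool use -> memory.
# All names here are illustrative; `fake_model` stands in for the
# "glorified autocomplete" an actual model call would provide.

def fake_model(prompt: str) -> str:
    # Stand-in for an LLM call: "reasons" about whether a tool is needed.
    if "weather" in prompt.lower():
        return "CALL_TOOL:get_weather"
    return "FINAL:I don't need a tool for that."

# Tools/APIs available to the agent (a restricted registry).
TOOLS = {"get_weather": lambda: "sunny, 22C"}

def run_agent(user_input: str, memory: list) -> str:
    memory.append(f"user: {user_input}")      # memory: context kept across turns
    action = fake_model("\n".join(memory))    # model + reasoning step
    if action.startswith("CALL_TOOL:"):
        tool_name = action.split(":", 1)[1]
        result = TOOLS[tool_name]()           # call out to an external capability
        memory.append(f"tool({tool_name}): {result}")
        return f"The weather is {result}."
    return action.split(":", 1)[1]

memory: list = []
print(run_agent("What's the weather?", memory))
```

A supervisor pattern would sit one level above this loop, routing work to several such specialized agents.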
Ops Plane
The "ops plane" refers to the underlying infrastructure and services required to run the AI/ML workloads:
- Compute resources (CPUs, GPUs)
- Networking and connectivity
- Storage (databases, vector databases)
- Identity and access management
These ops-level components represent the "execution" part of an agentic AI application.
Deployment Considerations
The presenters discussed several key factors to consider when deploying and managing agentic AI applications in production:
Model Serving Options
- Hosting the model within the same cluster as the application (for low latency)
- Hosting the model in a separate runtime (for better separation of concerns)
- Using edge devices for small, specialized models (for low-cost, low-power inference)
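These three serving options can be captured in a small routing sketch. The endpoint URLs, thresholds, and function names below are hypothetical placeholders, not anything shown in the talk:

```python
# Route inference requests to a serving backend based on the trade-offs above.
# Endpoints and the 500 MB threshold are hypothetical placeholders.

SERVING_BACKENDS = {
    "in_cluster": "http://model-svc.default.svc.cluster.local:8080",  # same cluster, low latency
    "separate":   "https://inference.example.com/v1",                 # dedicated runtime, clean separation
    "edge":       "http://edge-device.local:5000",                    # small model, low cost/power
}

def pick_backend(latency_sensitive: bool, model_size_mb: int) -> str:
    if model_size_mb < 500:
        return "edge"        # small specialized models can run on edge devices
    if latency_sensitive:
        return "in_cluster"  # co-locate with the app to cut network hops
    return "separate"        # otherwise keep serving concerns isolated
```

The point of the sketch is that the choice is driven by the latency and cost requirements of each model, not by a single default.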
Challenges
- Speed: Ensuring low-latency access to the AI/ML models and resources
- Cost: Managing the potentially high costs of GPU-powered AI/ML workloads
- Risk: Mitigating the security and access risks posed by agentic AI agents
Operational Patterns and Solutions
The presenters outlined several operational patterns and solutions for managing agentic AI applications:
Pacing the Front Door
- Implementing rate limiting and queueing mechanisms to control the influx of requests
- Using Terraform modules and policies to enforce guardrails and best practices
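"Pacing the front door" can be sketched as a token-bucket rate limiter in front of a request queue. This is a minimal illustration of the pattern, with made-up limits, not HashiCorp's implementation:

```python
import time
from collections import deque

class TokenBucket:
    """Token-bucket rate limiter: refills `rate` tokens/second up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate, self.capacity = rate, capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill tokens in proportion to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(rate=2.0, capacity=5)  # illustrative limits
queue: deque = deque()

def front_door(request: str) -> str:
    if bucket.allow():
        queue.append(request)  # admitted: queued for an agent worker to pick up
        return "queued"
    return "throttled"         # over budget: reject or ask the caller to retry
```

The queue decouples admission from execution, which also gives the autoscaler (below in the talk, queue-length-based) a clean signal to scale on.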
Controlling Reach
- Assigning dedicated identities to agents to enable better auditing and access control
- Restricting agent access to only the necessary tools and services
- Consolidating exit points and applying network-level restrictions
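A minimal sketch of the first two ideas, per-agent identity plus a tool allowlist with an audit trail, looks like this (agent and tool names are invented for illustration):

```python
# Each agent gets its own identity with an explicit tool allowlist, so every
# call can be attributed, audited, and denied by default. Names are illustrative.

AGENT_POLICIES = {
    "billing-agent": {"read_invoices"},
    "support-agent": {"search_docs", "create_ticket"},
}

AUDIT_LOG: list = []  # (agent_id, tool, verdict) tuples

def call_tool(agent_id: str, tool: str) -> str:
    allowed = AGENT_POLICIES.get(agent_id, set())  # unknown identity -> empty allowlist
    verdict = "allow" if tool in allowed else "deny"
    AUDIT_LOG.append((agent_id, tool, verdict))    # dedicated identity -> auditable trail
    if verdict == "deny":
        raise PermissionError(f"{agent_id} may not call {tool}")
    return f"{tool} executed for {agent_id}"
```

Consolidating exit points would then apply the same deny-by-default stance at the network layer, so an agent cannot reach services outside its policy even if the application code tries.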
Managing Resources
- Isolating agent workloads into dedicated node pools
- Using signal-based autoscaling driven by queue length rather than resource utilization
- Implementing resource throttling and cost controls
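The queue-length scaling signal can be sketched in a few lines. The target of 10 queued requests per replica and the replica bounds are illustrative assumptions, not figures from the talk:

```python
import math

def desired_replicas(queue_length: int, target_per_replica: int = 10,
                     min_replicas: int = 1, max_replicas: int = 20) -> int:
    """Scale agent workers from queue depth rather than CPU/GPU utilization.

    Queue depth reflects actual demand; GPU utilization can sit near 100%
    whether the backlog is 5 requests or 5,000, making it a poor signal.
    """
    wanted = math.ceil(queue_length / target_per_replica)
    # Clamp to bounds: min_replicas keeps warm capacity, max_replicas caps cost.
    return max(min_replicas, min(max_replicas, wanted))
```

The `max_replicas` bound doubles as a blunt cost control: no matter how deep the queue gets, spend is capped, and the front-door throttling keeps the backlog from growing without limit.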
HashiCorp Cloud Platform
The presenters highlighted how the HashiCorp Cloud Platform (HCP) can be leveraged to address many of the challenges in managing agentic AI applications:
- HCP Terraform provides a centralized, policy-driven approach to infrastructure as code
- HCP supports the deployment and management of Kubernetes clusters for running AI/ML workloads
- HCP's policy engine and private module registry enable the enforcement of security and best practices
Key Takeaways
- Differentiate between the "model plane" (AI/ML components) and the "ops plane" (underlying infrastructure) to better understand and manage the complexity.
- Address the unique challenges of speed, cost, and risk when running agentic AI applications in production.
- Leverage operational patterns like pacing, controlling reach, and managing resources to maintain control and stability.
- Utilize platforms like the HashiCorp Cloud Platform to centralize infrastructure management and enforce policies for AI/ML workloads.
Real-world Examples and Use Cases
The presenters did not walk through specific real-world examples or use cases. They did, however, note that agentic AI applications could apply across a variety of industries and scenarios, such as:
- Automating and optimizing complex workflows
- Handling "messy input" and long-tail questions
- Providing personalized experiences and recommendations
- Leveraging large amounts of data and complex decision-making
The focus of the presentation was on the technical challenges and operational best practices for running these types of AI-powered applications in production environments.