AWS re:Invent 2025 - Agents in the enterprise: Best practices with Amazon Bedrock AgentCore (AIM3310)

Scaling Agents in the Enterprise: Best Practices with Amazon Bedrock AgentCore

Introduction

  • Presenters: Costivasakis (Product Management Lead on AgentCore) and Lera Tankke (Tech Lead on the Agentic AI team)
  • Objective: Discuss best practices for taking agent-based applications from proof-of-concept to production at scale

The Challenge of Moving from Proof-of-Concept to Production

  • Customers describe a "PoC to production chasm": it is difficult to go from a demo to a production application that scales across users and provides the necessary governance
  • Key capabilities required:
    1. Accuracy: Agents need to work well with real users, whose behavior may differ from developer expectations
    2. Scalability: Agents must scale across users and domains while maintaining personalization
    3. Secure Memory: Agents must securely handle memory across users and sessions
    4. Cost Control: Hosting infrastructure and token usage for agents can be expensive, requiring cost observability
    5. Observability: Detailed observability is needed to understand agent behavior and performance
    6. Monitoring: Continuous monitoring is required to detect and address agent drift over time
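
The cost-control and observability items above can be illustrated with a small per-user token ledger. This is a minimal sketch, not an AgentCore API: the `UsageLedger` class, the model names, and the per-1,000-token prices are hypothetical placeholders chosen only for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass, field

# Hypothetical prices in USD per 1,000 tokens as (input, output); not real pricing.
PRICES_PER_1K = {
    "small-model": (0.001, 0.002),
    "large-model": (0.010, 0.030),
}


@dataclass
class UsageLedger:
    """Accumulates estimated spend per user so cost can be monitored."""
    spend_by_user: dict = field(default_factory=lambda: defaultdict(float))

    def record(self, user_id: str, model: str, input_tokens: int, output_tokens: int) -> float:
        """Record one model call and return its estimated cost in USD."""
        in_price, out_price = PRICES_PER_1K[model]
        cost = input_tokens / 1000 * in_price + output_tokens / 1000 * out_price
        self.spend_by_user[user_id] += cost
        return cost

    def over_budget(self, budget_usd: float) -> list:
        """Return the users whose accumulated spend exceeds the budget."""
        return sorted(u for u, spent in self.spend_by_user.items() if spent > budget_usd)


ledger = UsageLedger()
ledger.record("alice", "large-model", input_tokens=2000, output_tokens=1000)
ledger.record("bob", "small-model", input_tokens=2000, output_tokens=1000)
print(ledger.over_budget(budget_usd=0.01))
```

In a real deployment the token counts would come from the model provider's usage metadata and the ledger would be emitted as metrics rather than held in process memory.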

Overview of Amazon Bedrock AgentCore

  • Runtime: Secure, serverless hosting engine for tools and agents, supporting real-time and long-running use cases
  • Memory: Provides short-term and long-term memory capabilities to maintain context across user sessions
  • Gateway: Exposes internal APIs and services to agents, with identity and access control
  • Identity: Integrates with existing identity providers (e.g. Okta, Amazon Cognito) so that workforce credentials manage access to agents and tools
  • Policy: Allows defining rules to control access and actions for agents and tools
  • Tools: Provides pre-built components like a browser, code interpreter, and observability dashboards
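
To make the Memory component's short-term versus long-term distinction concrete, here is a toy in-process sketch. It assumes short-term memory is scoped to one user session and long-term memory to the user across sessions. The `MemoryStore` class and its method names are hypothetical and do not reflect the AgentCore Memory API.

```python
from collections import defaultdict


class MemoryStore:
    """Toy in-process store; a hypothetical stand-in for a managed memory service."""

    def __init__(self):
        self._short_term = defaultdict(list)  # keyed by (user_id, session_id)
        self._long_term = defaultdict(dict)   # keyed by user_id

    def add_turn(self, user_id, session_id, text):
        """Append one conversation turn to the session's short-term memory."""
        self._short_term[(user_id, session_id)].append(text)

    def remember(self, user_id, key, value):
        """Persist a fact about the user that should survive across sessions."""
        self._long_term[user_id][key] = value

    def context(self, user_id, session_id, last_n=5):
        """Build the context for one request; only this user's data is read."""
        turns = self._short_term.get((user_id, session_id), [])
        return {
            "recent_turns": list(turns)[-last_n:],
            "preferences": dict(self._long_term.get(user_id, {})),
        }


store = MemoryStore()
store.add_turn("alice", "s1", "Show my Q3 report")
store.remember("alice", "currency", "EUR")
store.add_turn("bob", "s1", "Hello")

# A new session for alice starts with no turns but keeps her long-term preference.
print(store.context("alice", "s2"))
```

Keying every read by user ID (and session ID for short-term data) is what keeps one user's context from leaking into another's.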

Best Practices for Scaling Agents

  1. Start Small, Think Big: Define a specific use case, create a proof-of-concept, and iterate quickly to validate what works
  2. Implement Observability from the Start: Use OpenTelemetry-compatible traces to understand agent behavior, with dashboards for monitoring
  3. Expose Tools and APIs to Agents: Provide clear descriptions and parameters for tools, handle errors and retries, and reuse existing MCP servers
  4. Leverage Evaluations to Improve Agents: Define success metrics (both technical and business-oriented) and continuously evaluate agent performance
  5. Adopt a Multi-Agent Architecture: Break down complex agents into specialized components to improve accuracy, speed, and cost-effectiveness
  6. Scale Agents Securely and with Personalization: Isolate user contexts and sessions, use per-user memory, and enforce access policies
  7. Leverage Code for Deterministic Tasks: Use code for calculations, validations, and other deterministic logic, reserving agents for reasoning and orchestration
  8. Test, Test, and Test Again: Implement continuous testing pipelines, use A/B testing, and monitor for performance drift in production
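
Practices 3 and 7 above can be sketched together: a deterministic calculation is exposed as a tool with a clear description and typed parameters, and calls are wrapped in a retry helper for transient failures. This is a minimal illustration under stated assumptions. The tool-description dictionary uses a generic JSON-schema-like shape rather than any specific SDK format, and `compound_interest` and `call_with_retries` are hypothetical names.

```python
import time

# Hypothetical tool description in a generic schema style; not a specific SDK format.
COMPOUND_INTEREST_TOOL = {
    "name": "compound_interest",
    "description": "Compute the final balance for a principal compounded annually. "
                   "Use this instead of doing arithmetic in the model.",
    "parameters": {
        "principal": {"type": "number", "description": "Starting amount, >= 0"},
        "annual_rate": {"type": "number", "description": "Rate as a fraction, e.g. 0.05"},
        "years": {"type": "integer", "description": "Whole years, >= 0"},
    },
}


def compound_interest(principal: float, annual_rate: float, years: int) -> float:
    """Deterministic calculation: the same inputs always give the same output."""
    if principal < 0 or years < 0:
        raise ValueError("principal and years must be non-negative")
    return round(principal * (1 + annual_rate) ** years, 2)


def call_with_retries(func, *args, attempts=3, base_delay=0.1,
                      retry_on=(ConnectionError, TimeoutError)):
    """Retry transient failures with exponential backoff; other errors surface at once."""
    for attempt in range(1, attempts + 1):
        try:
            return func(*args)
        except retry_on:
            if attempt == attempts:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))


print(call_with_retries(compound_interest, 1000.0, 0.05, 2))
```

Validation errors such as a negative principal are deliberately not retried, since repeating the call cannot fix bad input. Returning them to the agent lets it correct its arguments instead.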

Clearwater Analytics' Experience with AgentCore

  • Clearwater Analytics is a public fintech company providing financial accounting and reporting for institutional investors
  • They were early adopters of agent-based solutions, starting in 2023
  • Key use cases:
    • Internal knowledge base and SOP assistance
    • Salesforce ticket support
    • Accounting data analysis, anomaly detection, and visualization
    • Automated coding and code review
    • Financial data intake from PDFs
  • Challenges they faced:
    • Scalability, zero-downtime deployments, and avoiding "noisy neighbors"
    • Maintaining rapid follow-ups and context
    • Preserving existing custom features and integrations
  • Why they chose AgentCore:
    • Zero-downtime deployments and flexible technology stack
    • Isolated sessions and memory management
    • Ease of creating MCP servers for data access
  • Best Practices Learned:
    1. Context is King: Ensure agents have unambiguous context to avoid hallucinations
    2. Manage User Interactions: Use clarification in chat, and output confidence/rationale in automated workflows
    3. Roll Out Strategically: Identify user pain points, build narrow use cases, and continuously monitor and iterate
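
The second lesson above (clarify in chat, output confidence and rationale in automated workflows) could be wired up roughly as follows. This is a sketch under assumptions, not Clearwater's implementation: the `AgentResult` structure, the `route` function, and the 0.8 threshold are hypothetical. A self-reported confidence score is also only a rough signal that would need checking against real outcomes.

```python
from dataclasses import dataclass


@dataclass
class AgentResult:
    """Hypothetical structured output an agent is asked to produce."""
    answer: str
    confidence: float  # self-reported score between 0.0 and 1.0
    rationale: str


def route(result: AgentResult, interactive: bool, threshold: float = 0.8) -> str:
    """Decide what to do with a result; the 0.8 threshold is an arbitrary example."""
    if result.confidence >= threshold:
        return "accept"
    # In chat, a low-confidence result becomes a clarifying question to the user.
    # In an automated workflow there is no one to ask, so escalate for human review.
    return "ask_clarification" if interactive else "human_review"


result = AgentResult(
    answer="Trade 1182 is a duplicate",
    confidence=0.55,
    rationale="Two records share amount and date but have different counterparties",
)
print(route(result, interactive=True))
print(route(result, interactive=False))
```

Keeping the rationale alongside the score gives a human reviewer something to act on when a result is escalated.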

Key Takeaways

  • Agents require a robust infrastructure to scale effectively in the enterprise, addressing accuracy, scalability, security, cost, observability, and monitoring
  • Amazon Bedrock AgentCore provides a modular, managed platform to host and operate agent-based applications at scale
  • Best practices include starting small, implementing observability, exposing tools, using evaluations, adopting multi-agent architectures, scaling securely, leveraging code, and continuous testing
  • Clearwater Analytics' experience demonstrates the real-world application of these principles, highlighting the importance of context, user interaction management, and strategic rollout
