AWS re:Invent 2025 - Live hallucination detection in prod with LaunchDarkly’s AI Configs (AIM258)

Comprehensive Summary: Live Hallucination Detection in Production with LaunchDarkly's AI Configs

Overview

  • Presentation by Scarlett Attensil (Senior Developer Educator) and Mark Pollock (Head of LaunchDarkly) on using LaunchDarkly's AI Configs to address the challenge of hallucinations in production AI systems.
  • Key focus areas:
    1. Real-time control and configuration of AI agents
    2. Experimentation and optimization of prompt strategies
    3. Building self-healing AI systems to detect and mitigate hallucinations

The Challenge of Hallucinations in Production AI

  • Hallucinations are a major problem for companies deploying generative AI models in production, leading to:
    • Increased costs
    • Wasted engineering time
    • Reduced confidence in system reliability
    • Brand equity damage
  • Hallucinations are inherent to the nature of language models and cannot be solved solely by brute-force approaches.
  • The goal is to build systems that are tolerant to accuracy lapses and can automatically heal from interruptions.

Real-time Control and Configuration of AI Agents

  • Traditional software engineering principles of determinism, reproducibility, and traceability do not carry over cleanly to probabilistic AI systems, where the same input can produce different outputs.
  • LaunchDarkly's AI Configs allow dynamically configuring and modulating AI agent components (prompts, tools, models, hyperparameters) in real-time without redeploying the underlying application.
  • This provides the ability to:
    • Rapidly test and iterate on agent configurations
    • Dynamically serve different agent variations based on user context
    • Maintain version control and approval workflows for AI configurations
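The idea of serving different agent variations by user context, and changing them at runtime, can be sketched roughly as follows. This is a plain-Python illustration, not the LaunchDarkly SDK: `AgentVariation`, `ConfigStore`, `resolve`, the `plan` attribute, and the model and prompt values are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class AgentVariation:
    """One servable agent configuration (all fields are illustrative)."""
    name: str
    model: str
    prompt: str
    temperature: float = 0.2


@dataclass
class ConfigStore:
    """Hypothetical in-memory stand-in for a remotely managed config service."""
    variations: dict[str, AgentVariation]
    default: str
    # Ordered targeting rules: (context attribute, expected value, variation name).
    rules: list[tuple[str, str, str]] = field(default_factory=list)

    def resolve(self, context: dict[str, str]) -> AgentVariation:
        """Return the variation of the first matching rule, else the default."""
        for attribute, expected, variation_name in self.rules:
            if context.get(attribute) == expected:
                return self.variations[variation_name]
        return self.variations[self.default]


store = ConfigStore(
    variations={
        "concise": AgentVariation("concise", "model-a", "Answer briefly."),
        "detailed": AgentVariation("detailed", "model-b", "Explain step by step.", 0.5),
    },
    default="concise",
    rules=[("plan", "enterprise", "detailed")],
)

free_user = store.resolve({"plan": "free"})
enterprise_user = store.resolve({"plan": "enterprise"})

# Updating the rules at runtime changes what the next call serves,
# without touching the application code that calls resolve().
store.rules = [("plan", "free", "detailed")]
free_user_after_update = store.resolve({"plan": "free"})
```

In this sketch the application only ever calls `resolve()`, so prompts, models, and parameters can change independently of a deploy. A production setup would presumably fetch the rules from a managed service rather than hold them in memory.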

Experimentation and Optimization of Prompt Strategies

  • Intuition-driven prompt engineering often fails; the best prompt strategy must be determined empirically.
  • LaunchDarkly enables running A/B experiments on prompt variations, model choices, and other agent parameters using real user traffic.
  • Example experiment results:
    • Switching the model from Sonnet 4 to Llama 4 improved accuracy by 2.89%, reduced token usage by 24%, and decreased cost by 24%.
    • A concise prompt outperformed a more systematic prompt, reducing response time by 34% and negative feedback by 72%.
  • Ability to measure key business metrics (cost, accuracy, customer satisfaction) to optimize for desired outcomes.
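As a rough illustration of how such an experiment might be wired up, the sketch below buckets users deterministically into variations and compares per-variation metrics. It is not LaunchDarkly's implementation: the function names, the experiment key, and the observation data are all made up for this example, and the numbers are not the results reported in the talk.

```python
import hashlib
from statistics import mean


def assign_variation(user_key: str, experiment: str, variations: list[str]) -> str:
    """Deterministically bucket a user so they always see the same variation."""
    digest = hashlib.sha256(f"{experiment}:{user_key}".encode()).hexdigest()
    return variations[int(digest, 16) % len(variations)]


def relative_change(control: float, treatment: float) -> float:
    """Percent change of treatment relative to control (negative = reduction)."""
    return (treatment - control) / control * 100.0


# Made-up observations: (variation, tokens used, negative-feedback flag).
observations = [
    ("systematic", 1000, 1), ("systematic", 1200, 0), ("systematic", 1100, 1),
    ("concise", 800, 0), ("concise", 900, 0), ("concise", 700, 1),
]


def summarize(variation: str) -> dict[str, float]:
    """Aggregate the metrics recorded for one variation."""
    rows = [row for row in observations if row[0] == variation]
    return {
        "avg_tokens": mean(row[1] for row in rows),
        "negative_rate": mean(row[2] for row in rows),
    }


control = summarize("systematic")
treatment = summarize("concise")
token_change = relative_change(control["avg_tokens"], treatment["avg_tokens"])
bucket = assign_variation("user-123", "prompt-style", ["systematic", "concise"])
```

Hashing the experiment name together with the user key keeps assignments stable across requests while letting separate experiments bucket the same user independently. With real traffic, a difference like `token_change` would also need a significance test before acting on it.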

Building Self-Healing AI Systems

  • Continuous monitoring and evaluation of AI agents is critical, but traditional approaches rely on delayed, passive monitoring.
  • LaunchDarkly's approach injects real-time evaluators that detect hallucinations and automatically mitigate them before they reach customers. When an evaluator flags a response, the system can:
    • Trigger a fallback to a known-good agent configuration
    • Pass the evaluator's feedback back to the agent so it can correct the response
  • Provides full observability and tracing to diagnose and resolve issues.
  • Enables guarded rollouts, automatically rolling back changes that degrade key performance metrics.
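The evaluate, retry-with-feedback, then fall-back loop described above might look roughly like the sketch below. Everything here is an assumption made for illustration: the evaluator is a toy check that flags numbers not present in the source text, and `answer_with_healing` and the stub agents are hypothetical stand-ins for real model calls, not LaunchDarkly APIs.

```python
import re
from typing import Callable, Optional

# An agent takes (question, evaluator feedback or None) and returns an answer.
Agent = Callable[[str, Optional[str]], str]

NUMBER = r"\d+(?:\.\d+)?"


def ungrounded_numbers(answer: str, source: str) -> list[str]:
    """Toy evaluator: numbers in the answer that never appear in the source."""
    source_numbers = set(re.findall(NUMBER, source))
    return [n for n in re.findall(NUMBER, answer) if n not in source_numbers]


def answer_with_healing(question: str, source: str, primary: Agent,
                        fallback: Agent, max_retries: int = 1) -> tuple[str, str]:
    """Retry the primary agent with feedback, then fall back to a known-good one."""
    feedback: Optional[str] = None
    for _ in range(max_retries + 1):
        answer = primary(question, feedback)
        issues = ungrounded_numbers(answer, source)
        if not issues:
            return answer, "primary"
        feedback = f"Unsupported figures: {', '.join(issues)}. Use only the source."
    # The primary agent kept failing evaluation: serve the known-good configuration.
    return fallback(question, None), "fallback"


SOURCE = "The plan includes 5 seats and costs 20 dollars per month."


def flaky_agent(question: str, feedback: Optional[str]) -> str:
    # Stub: hallucinates a price until it receives evaluator feedback.
    if feedback is None:
        return "The plan costs 25 dollars per month."
    return "The plan costs 20 dollars per month."


def always_wrong_agent(question: str, feedback: Optional[str]) -> str:
    return "The plan costs 99 dollars per month."


def known_good_agent(question: str, feedback: Optional[str]) -> str:
    return "Please see the pricing page for current plan details."


healed = answer_with_healing("How much is the plan?", SOURCE, flaky_agent, known_good_agent)
fell_back = answer_with_healing("How much is the plan?", SOURCE, always_wrong_agent, known_good_agent)
```

The key design point is that the check runs inline, before the response is returned, rather than in a later offline review. A real evaluator would likely be far more sophisticated than a number match, and this sketch does not re-evaluate the fallback's answer.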

Real-World Examples

  • Relay Networks used LaunchDarkly to roll out a secure, HIPAA-compliant AI-powered healthcare communication solution in 3 weeks.
  • Hireology's HR tech chatbot leveraged LaunchDarkly to iterate on prompts hourly, test different models per user segment, and automatically monitor quality.

Key Takeaways

  • Traditional software deployment strategies fail for probabilistic AI systems, so a new approach is needed.
  • LaunchDarkly's AI Configs provide the ability to dynamically configure, experiment on, and self-heal AI agents in production.
  • Empirical optimization of prompt strategies and model choices can yield significant improvements in accuracy, cost, and user satisfaction.
  • Continuous, real-time monitoring and automatic mitigation of hallucinations are critical for deploying AI in mission-critical applications.
