Best practices for generative AI observability (COP404)

Introduction

  • Dennis Bov and Greg Apple presented a talk on observability for generative AI (Gen) systems.
  • They discussed the challenges of moving Gen proofs-of-concept (POCs) to production and the importance of observability in addressing these challenges.
  • The talk covered the key components of a Gen platform and the observability concerns related to each component.

Challenges of Moving Gen POCs to Production

  • Traditional machine learning has focused on model accuracy; with Gen, the overall quality of the system's output matters more.
  • Relevance, factual correctness, and robustness are critical but difficult to achieve with current state-of-the-art Gen models.
  • There have been several high-profile incidents of Gen models producing inaccurate or harmful outputs, highlighting the need for better observability.
  • Performance and cost are also major concerns: Gen models often have much higher latency and cost than traditional systems.

Key Components of a Gen Platform

  1. Foundation Model Hub: Provides access to various Gen models and includes features like access control, cost tracking, and failover.
  2. Retrieval Augmented Generation (RAG): Combines Gen models with information retrieval to provide context-aware responses.
  3. Orchestration: Handles application-level coordination, versioning, and governance.
  4. Underlying Infrastructure: The IT infrastructure that the Gen platform runs on.
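
To make the Foundation Model Hub responsibilities above more concrete, here is a minimal sketch of a hub that records latency, token counts, and estimated cost per invocation and fails over to the next model on error. It is an illustration under stated assumptions, not the speakers' implementation: the `ModelHub` class, the model IDs, the prices, and the client signature (a function returning text plus input and output token counts) are all hypothetical.

```python
import time
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

# Hypothetical prices in USD per 1,000 (input, output) tokens.
# Real prices depend on the model and provider.
PRICE_PER_1K_TOKENS: Dict[str, Tuple[float, float]] = {
    "primary-model": (0.003, 0.015),
    "fallback-model": (0.001, 0.005),
}

# Assumed client shape: prompt -> (text, input_tokens, output_tokens).
ModelClient = Callable[[str], Tuple[str, int, int]]


def estimate_cost(model_id: str, tokens_in: int, tokens_out: int) -> float:
    """Estimate the cost of one call from its token counts."""
    price_in, price_out = PRICE_PER_1K_TOKENS[model_id]
    return tokens_in / 1000 * price_in + tokens_out / 1000 * price_out


@dataclass
class InvocationRecord:
    model_id: str
    latency_ms: float
    input_tokens: int
    output_tokens: int
    cost_usd: float
    succeeded: bool


@dataclass
class ModelHub:
    clients: Dict[str, ModelClient]  # tried in insertion order
    records: List[InvocationRecord] = field(default_factory=list)

    def invoke(self, prompt: str) -> str:
        last_error = None
        for model_id, client in self.clients.items():
            start = time.perf_counter()
            try:
                text, tokens_in, tokens_out = client(prompt)
            except Exception as exc:
                # Record the failed attempt, then fail over to the next model.
                latency_ms = (time.perf_counter() - start) * 1000
                self.records.append(
                    InvocationRecord(model_id, latency_ms, 0, 0, 0.0, False)
                )
                last_error = exc
                continue
            latency_ms = (time.perf_counter() - start) * 1000
            cost = estimate_cost(model_id, tokens_in, tokens_out)
            self.records.append(
                InvocationRecord(model_id, latency_ms, tokens_in, tokens_out, cost, True)
            )
            return text
        raise RuntimeError("all models failed") from last_error


# Stand-in clients for demonstration only.
def failing_client(prompt: str) -> Tuple[str, int, int]:
    raise TimeoutError("simulated outage")


def echo_client(prompt: str) -> Tuple[str, int, int]:
    return f"echo: {prompt}", 1000, 2000


hub = ModelHub(clients={"primary-model": failing_client, "fallback-model": echo_client})
reply = hub.invoke("hello")
for record in hub.records:
    print(record)
```

In a real platform the records would be emitted as metrics and logs rather than kept in memory, and access control and throttling would sit in front of `invoke`.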

Observability for Gen Platforms

  1. Foundation Model Hub:

    • Observe API invocations, latency, token consumption, and other model-specific metrics.
    • Implement cost tracking and throttling.
  2. Retrieval Augmented Generation:

    • Observe the indexing and embedding processes.
    • Analyze the quality of the retrieved context and its relevance to the prompts.
  3. Orchestration:

    • Trace the execution of the Gen-based workflows.
    • Observe the performance and cost of the orchestration components.
  4. Observability Tooling:

    • Leverage CloudWatch for metrics, logs, and traces.
    • Utilize features like alarm hierarchies, log pattern analysis, and embedding drift detection.
  5. End-User Feedback:

    • Capture user sentiment (e.g., thumbs up/down) and associate it with the Gen system's outputs.
    • Use CloudWatch Embedded Metric Format to correlate logs and metrics for this feedback.
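
As an illustration of the end-user feedback item above, the following sketch builds a log line in CloudWatch Embedded Metric Format (EMF) for a thumbs up/down event. The `_aws` metadata structure follows the EMF specification; the namespace `GenAI/Feedback`, the metric names, the `ModelId` dimension, and the `RequestId` property are assumptions chosen for this example, not names from the talk.

```python
import json
import time
from typing import Optional


def feedback_emf_record(
    request_id: str,
    model_id: str,
    thumbs_up: bool,
    timestamp_ms: Optional[int] = None,
) -> str:
    """Build one EMF-formatted JSON log line for a user feedback event."""
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)
    record = {
        "_aws": {
            "Timestamp": timestamp_ms,  # milliseconds since the Unix epoch
            "CloudWatchMetrics": [
                {
                    "Namespace": "GenAI/Feedback",  # hypothetical namespace
                    "Dimensions": [["ModelId"]],
                    "Metrics": [
                        {"Name": "ThumbsUp", "Unit": "Count"},
                        {"Name": "ThumbsDown", "Unit": "Count"},
                    ],
                }
            ],
        },
        # Dimension value and metric values live at the top level of the record.
        "ModelId": model_id,
        "ThumbsUp": 1 if thumbs_up else 0,
        "ThumbsDown": 0 if thumbs_up else 1,
        # Not a dimension: kept only as a log property so a feedback event
        # can be traced back to the request that produced the output.
        "RequestId": request_id,
    }
    return json.dumps(record)


line = feedback_emf_record(
    "req-123", "example-model", thumbs_up=False, timestamp_ms=1700000000000
)
print(line)
```

Keeping `RequestId` out of the dimensions avoids creating one metric per request while still letting a log query join negative feedback to the corresponding invocation. The line only becomes a metric once it reaches CloudWatch Logs, for example through standard output in AWS Lambda or through the CloudWatch agent.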

Conclusion

  • Observability is critical for the successful adoption and operation of Gen systems in production.
  • A layered approach to observability, leveraging CloudWatch's capabilities, can help address the challenges of Gen platforms.
  • Combining metrics, logs, and traces, along with advanced features like guardrails and end-user feedback, can provide a comprehensive observability solution for Gen systems.
