AWS re:Invent 2025 - Scaling open source observability stack feat. Warner Bros Discovery (COP333)

Scaling Open Source Observability at Enterprise Scale

Observability Fundamentals

  • Metrics, logs, and traces are the key building blocks of observability
  • Metrics provide application health and performance data
  • Logs capture detailed event information with timestamps
  • Traces track end-to-end request flows through microservices

Challenges of Large-Scale Observability

  • Correlating metrics, logs, and traces to identify root causes is difficult at scale
  • Too many dashboards and alerts make it hard to find issues quickly
  • Without standardization and shared metadata, it is hard to connect data sources

Layered Observability Approach

  1. Start with basic alerts and monitoring
  2. Dive into logs to understand errors and issues
  3. Enhance with trace data to see end-to-end request flows
  4. Correlate all telemetry data using shared metadata
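The four steps above can be sketched as a small in-memory walk from an alert to error logs to trace spans. This is only an illustration under stated assumptions: the record shapes and field names (`service`, `trace_id`, `level`, `duration_ms`) are hypothetical and are not taken from the talk. In practice the data would live in separate metric, log, and trace backends.

```python
from collections import defaultdict

# Hypothetical telemetry records; all field names are assumptions for illustration.
alert = {"service": "checkout", "metric": "http_5xx_rate", "value": 0.12}
logs = [
    {"service": "checkout", "trace_id": "t1", "level": "ERROR", "message": "payment timeout"},
    {"service": "checkout", "trace_id": "t2", "level": "INFO", "message": "order placed"},
    {"service": "search", "trace_id": "t3", "level": "ERROR", "message": "index unavailable"},
]
spans = [
    {"trace_id": "t1", "service": "checkout", "duration_ms": 40},
    {"trace_id": "t1", "service": "payments", "duration_ms": 5000},
    {"trace_id": "t2", "service": "checkout", "duration_ms": 35},
]


def investigate(alert, logs, spans):
    """Follow an alert to error logs, then to the slowest span of each trace."""
    # Step 2: error logs emitted by the service that raised the alert.
    error_logs = [
        log for log in logs
        if log["service"] == alert["service"] and log["level"] == "ERROR"
    ]

    # Step 3: group spans by trace ID so each log can be joined to its trace.
    spans_by_trace = defaultdict(list)
    for span in spans:
        spans_by_trace[span["trace_id"]].append(span)

    # Step 4: correlate via the shared trace_id and report the slowest hop.
    findings = []
    for log in error_logs:
        trace_spans = spans_by_trace.get(log["trace_id"], [])
        slowest = max(trace_spans, key=lambda s: s["duration_ms"], default=None)
        findings.append({
            "message": log["message"],
            "trace_id": log["trace_id"],
            "slowest_service": slowest["service"] if slowest else None,
        })
    return findings


findings = investigate(alert, logs, spans)
```

The join only works because every record carries the same identifiers, which is the point of step 4: without shared metadata there is nothing to join on.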

AWS Managed Open Source Observability Services

  • Amazon Managed Service for Prometheus for metrics storage and analysis
  • Amazon OpenSearch Service for logs and trace data
  • Amazon Managed Grafana for visualization

Benefits of Managed Open Source Observability

  • End-to-end solutions that integrate metrics, logs, and traces
  • Scalability, reliability, and security of AWS-managed services
  • Cost controls and transparency into observability costs
  • Seamless integration with other AWS services

Warner Bros. Discovery's Observability Journey

  • Faced challenges with multiple observability tools and lack of data standardization
  • Adopted an open source approach to avoid vendor lock-in and gain control over data
  • Implemented a three-part strategy:

1. Data Organization

  • Established an Operational Metadata (OMD) hierarchy to tag and track data
  • Implemented a unified event schema across logs and traces
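As a rough sketch of what a metadata hierarchy plus a shared event schema could look like, the snippet below attaches the same metadata object to both a log and a span. The summary does not spell out the actual OMD fields or schema, so everything here is an assumption: the levels (`business_service`, `service`, `component`) are borrowed from the sharding bullets further down, and `region`, `kind`, `trace_id`, and `body` are hypothetical names.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class OperationalMetadata:
    """Hypothetical OMD tag set; the real hierarchy may differ."""
    business_service: str
    service: str
    component: str
    region: str

    def path(self) -> str:
        # Hierarchical identifier that can be used for grouping or routing.
        return f"{self.business_service}/{self.service}/{self.component}"


@dataclass
class Event:
    """One envelope shared by logs and trace spans (assumed shape)."""
    timestamp: str
    kind: str  # "log" or "span"
    omd: OperationalMetadata
    trace_id: str
    body: dict

    def to_json(self) -> str:
        # asdict() recurses into the nested OperationalMetadata dataclass.
        return json.dumps(asdict(self), sort_keys=True)


omd = OperationalMetadata("streaming", "playback", "manifest-api", "us-east-1")
log_event = Event("2025-12-01T10:00:00Z", "log", omd, "t1",
                  {"level": "ERROR", "message": "manifest fetch failed"})
span_event = Event("2025-12-01T10:00:00Z", "span", omd, "t1",
                   {"name": "GET /manifest", "duration_ms": 5000})
```

Because both event kinds share the same `omd` and `trace_id` fields, a query written against one can be reused against the other.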

2. Scaling through Sharding

  • Geo-sharding by region for metrics, logs, and traces
  • Logical sharding by business service, service, and component
  • Abstraction layers (proxy, cross-cluster search) to simplify querying
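The sharding idea can be illustrated with a toy in-memory store: writes are routed to a shard keyed by region and business service, and a single query method fans out across the matching shards so callers never address a shard directly. This is a sketch under assumptions, not the architecture described in the talk; the class name `ShardedStore`, the shard key, and the record fields are all hypothetical, and a real deployment would place a proxy or cross-cluster search in front of separate backend clusters.

```python
from collections import defaultdict


class ShardedStore:
    """In-memory stand-in for per-shard backends (hypothetical)."""

    def __init__(self):
        # Shard key -> records. Each key stands for one physical cluster.
        self._shards = defaultdict(list)

    @staticmethod
    def shard_key(region, business_service):
        # Geo-shard first, then logical shard by business service.
        return (region, business_service)

    def write(self, record):
        key = self.shard_key(record["region"], record["business_service"])
        self._shards[key].append(record)

    def shard_count(self):
        return len(self._shards)

    def query(self, region=None, business_service=None, service=None, component=None):
        """Abstraction layer: fan out only to shards that can match."""
        results = []
        for (shard_region, shard_bs), records in self._shards.items():
            if region is not None and shard_region != region:
                continue  # Skip whole shards in other regions.
            if business_service is not None and shard_bs != business_service:
                continue  # Skip shards for other business services.
            for record in records:
                if service is not None and record["service"] != service:
                    continue
                if component is not None and record["component"] != component:
                    continue
                results.append(record)
        return results


store = ShardedStore()
store.write({"region": "us-east-1", "business_service": "streaming",
             "service": "playback", "component": "manifest-api", "msg": "a"})
store.write({"region": "us-east-1", "business_service": "commerce",
             "service": "checkout", "component": "cart", "msg": "b"})
store.write({"region": "eu-west-1", "business_service": "streaming",
             "service": "playback", "component": "manifest-api", "msg": "c"})

us_records = store.query(region="us-east-1")
playback_everywhere = store.query(service="playback")
```

Filtering on the shard key prunes whole shards before any records are scanned, while a query without a region still works by fanning out everywhere.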

3. Cost Metering and Accountability

  • Triangulated cost data, usage metrics, and allocation rules
  • Built a chargeback framework to attribute costs to specific teams and services
  • Drove a culture of cost awareness and optimization
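One simple way to combine cost data, usage metrics, and an allocation rule is proportional allocation: each team pays a share of a signal's cost equal to its share of that signal's usage. The summary does not describe the actual rules used, so the function below is a minimal sketch with made-up numbers; `chargeback`, the team names, and the "unallocated" bucket are assumptions.

```python
def chargeback(costs, usage):
    """Split each signal's cost across teams in proportion to their usage.

    costs: {signal: total cost for the period}
    usage: {signal: {team: usage units, e.g. ingested samples or bytes}}
    """
    bills = {}
    for signal, total_cost in costs.items():
        signal_usage = usage.get(signal, {})
        total_usage = sum(signal_usage.values())
        if total_usage == 0:
            # No usage data: keep the cost visible instead of dropping it.
            bills["unallocated"] = bills.get("unallocated", 0.0) + total_cost
            continue
        for team, amount in signal_usage.items():
            share = total_cost * amount / total_usage
            bills[team] = bills.get(team, 0.0) + share
    return {team: round(total, 2) for team, total in bills.items()}


# Illustrative inputs only; these are not figures from the talk.
costs = {"metrics": 1000.0, "logs": 3000.0}
usage = {
    "metrics": {"team-a": 600, "team-b": 400},
    "logs": {"team-a": 100, "team-b": 300},
}
bills = chargeback(costs, usage)
```

With these inputs team-a is billed 600 for metrics plus 750 for logs, and team-b is billed 400 plus 2250, so the bills add up to the total spend of 4000.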

Key Takeaways

  • Organizing observability data with metadata and schemas is critical for scale
  • Intelligent sharding strategies and abstraction layers enable seamless scalability
  • Cost metering and chargeback frameworks drive accountability and optimization
  • Managed open source services provide the scalability, reliability, and cost controls needed for enterprise-grade observability
