AWS re:Invent 2025 - Scaling open source observability stack feat. Warner Bros Discovery (COP333)
Scaling Open Source Observability at Enterprise Scale
Observability Fundamentals
Metrics, logs, and traces are the key building blocks of observability
Metrics provide application health and performance data
Logs capture detailed event information with timestamps
Traces track end-to-end request flows through microservices
Challenges of Large-Scale Observability
Correlating metrics, logs, and traces to identify root causes is difficult at scale
Overwhelming number of dashboards and alerts makes it hard to find issues quickly
Lack of standardization and metadata makes it hard to connect data sources
Layered Observability Approach
Start with basic alerts and monitoring
Dive into logs to understand errors and issues
Enhance with trace data to see end-to-end request flows
Correlate all telemetry data using shared metadata
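The last step, correlating telemetry through shared metadata, can be sketched roughly as below. This is a minimal illustration and not the tooling shown in the talk: the `correlate` function, the `service` and `region` keys, and all sample records are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Correlated:
    """All telemetry that shares one (service, region) metadata key."""
    metrics: list = field(default_factory=list)
    logs: list = field(default_factory=list)
    traces: list = field(default_factory=list)


def correlate(metrics, logs, traces, keys=("service", "region")):
    """Group records from the three signal types by shared metadata keys."""
    grouped = defaultdict(Correlated)
    for kind, records in (("metrics", metrics), ("logs", logs), ("traces", traces)):
        for record in records:
            # Records missing any shared key cannot be correlated; skip them.
            if not all(k in record for k in keys):
                continue
            key = tuple(record[k] for k in keys)
            getattr(grouped[key], kind).append(record)
    return dict(grouped)


# Hypothetical sample telemetry, not taken from the talk.
metrics = [{"service": "playback", "region": "us-east-1", "name": "http_5xx_rate", "value": 0.07}]
logs = [
    {"service": "playback", "region": "us-east-1", "level": "ERROR",
     "message": "upstream timeout", "trace_id": "abc123"},
    {"service": "search", "region": "eu-west-1", "level": "INFO", "message": "ok"},
    {"level": "WARN", "message": "untagged record"},
]
traces = [{"service": "playback", "region": "us-east-1", "trace_id": "abc123", "duration_ms": 5021}]

result = correlate(metrics, logs, traces)
hit = result[("playback", "us-east-1")]
print(len(hit.metrics), len(hit.logs), len(hit.traces))
```

The untagged log record is dropped from the result. This is the standardization problem in miniature: data without shared metadata cannot be joined to anything else.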
AWS Managed Open Source Observability Services
Amazon Managed Service for Prometheus for metrics storage and analysis
Amazon OpenSearch Service for logs and trace data
Amazon Managed Grafana for visualization
Benefits of Managed Open Source Observability
End-to-end solutions that integrate metrics, logs, and traces
Scalability, reliability, and security of AWS-managed services
Cost controls and transparency into observability costs
Seamless integration with other AWS services
Warner Bros. Discovery's Observability Journey
Faced challenges with multiple observability tools and lack of data standardization
Adopted an open source approach to avoid vendor lock-in and gain control over data
Implemented a three-part strategy:
1. Data Organization
Established an Operational Metadata (OMD) hierarchy to tag and track data
Implemented a unified event schema across logs and traces
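A metadata hierarchy plus a shared event envelope might look something like the sketch below. The summary does not list the actual OMD fields or schema, so everything here is an assumption: the three levels (business service, service, component) are borrowed from the sharding description, and the `Event` fields are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class OperationalMetadata:
    # Assumed levels; the real OMD hierarchy may differ.
    business_service: str
    service: str
    component: str

    def path(self) -> str:
        return f"{self.business_service}/{self.service}/{self.component}"


@dataclass(frozen=True)
class Event:
    """One envelope shared by log records and trace spans (hypothetical fields)."""
    kind: str        # "log" or "span"
    timestamp: str   # ISO 8601, UTC
    region: str
    omd: OperationalMetadata
    body: dict       # signal-specific payload

    def __post_init__(self):
        if self.kind not in ("log", "span"):
            raise ValueError(f"unknown event kind: {self.kind!r}")

    def to_record(self) -> dict:
        # Flatten to a plain dict and add a single tag that is easy to filter on.
        record = asdict(self)
        record["omd_path"] = self.omd.path()
        return record


# Hypothetical sample values.
omd = OperationalMetadata("streaming", "playback", "manifest-api")
log = Event("log", "2025-12-01T10:00:00Z", "us-east-1", omd,
            {"level": "ERROR", "message": "timeout"})
span = Event("span", "2025-12-01T10:00:00Z", "us-east-1", omd,
             {"trace_id": "abc123", "duration_ms": 5021})
print(log.to_record()["omd_path"])
```

Because logs and spans carry the same envelope, a query written against `omd_path` or `region` works for both signal types.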
2. Scaling through Sharding
Geo-sharding by region for metrics, logs, and traces
Logical sharding by business service, service, and component
Abstraction layers (proxy, cross-cluster search) to simplify querying
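As a rough illustration of how an abstraction layer can hide shard layout from users, the sketch below routes a query to shards by region and business service. It is a toy model only: `Shard`, `ShardRouter`, and the endpoint URLs are hypothetical, and a real setup would delegate to a query proxy or cross-cluster search rather than an in-memory table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Shard:
    region: str
    business_service: str
    endpoint: str


class ShardRouter:
    """Toy query proxy: callers name a region and business service, not a cluster."""

    def __init__(self, shards):
        self._by_key = {(s.region, s.business_service): s for s in shards}

    def resolve(self, region=None, business_service=None):
        """Return every shard matching the filters; None means 'all' (fan-out)."""
        return [
            shard
            for (r, b), shard in sorted(self._by_key.items())
            if region in (None, r) and business_service in (None, b)
        ]


# Hypothetical shard layout.
router = ShardRouter([
    Shard("us-east-1", "streaming", "https://use1-streaming.example.internal"),
    Shard("us-east-1", "news", "https://use1-news.example.internal"),
    Shard("eu-west-1", "streaming", "https://euw1-streaming.example.internal"),
])

single = router.resolve("us-east-1", "streaming")       # one shard
fan_out = router.resolve(business_service="streaming")  # one shard per region
print([s.endpoint for s in fan_out])
```

A fully qualified query touches a single shard, while a query that omits the region fans out across regions without the caller needing to know how many clusters exist.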
3. Cost Metering and Accountability
Triangulated cost data, usage metrics, and allocation rules to work out each workload's share of spend
Built a chargeback framework to attribute costs to specific teams and services
Drove a culture of cost awareness and optimization
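One simple way to combine cost data, usage metrics, and an allocation rule is a proportional split, sketched below. The proportional rule, the `chargeback` function, the resource names, and all figures are assumptions for illustration; the summary does not describe the actual allocation rules used.

```python
from collections import defaultdict


def chargeback(costs, usage):
    """Split each resource's cost across teams in proportion to their usage.

    costs: {resource: dollars}
    usage: {resource: {team: units}}, e.g. ingested GB or active series
    """
    bill = defaultdict(float)
    for resource, dollars in costs.items():
        per_team = usage.get(resource, {})
        total = sum(per_team.values())
        if total == 0:
            # No usage signal for this resource: keep the cost visible
            # instead of silently dropping it.
            bill["unallocated"] += dollars
            continue
        for team, units in per_team.items():
            bill[team] += dollars * units / total
    return {team: round(amount, 2) for team, amount in bill.items()}


# Hypothetical monthly figures.
costs = {"metrics-use1": 1200.0, "logs-use1": 3000.0, "traces-use1": 500.0}
usage = {
    "metrics-use1": {"playback": 300, "search": 100},
    "logs-use1": {"playback": 200, "search": 600},
}

bill = chargeback(costs, usage)
print(bill)
```

Keeping an explicit `unallocated` bucket means the bill always sums to the total spend, so gaps in usage metadata show up as a line item rather than disappearing.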
Key Takeaways
Organizing observability data with metadata and schemas is critical for scale
Geo- and logical sharding, fronted by abstraction layers such as a query proxy and cross-cluster search, lets the stack scale without complicating queries
Cost metering and chargeback frameworks drive accountability and optimization
Managed open source services provide the scalability, reliability, and cost controls needed for enterprise-grade observability