Control observability data cost & complexity for Amazon EKS workloads (KUB101)
Managing Observability in EKS Environments
Introduction
The speaker, Jesse Rodriguez, is a Technical Account Manager at Chronosphere.
The talk focuses on managing the cost and complexity of observability in Kubernetes (EKS) environments.
Observability Challenges in Kubernetes
In the VM era, observability data patterns were more predictable and telemetry volumes were manageable.
The Kubernetes era presents new challenges:
Modern Kubernetes environments generate orders of magnitude more data, with a 10-100x increase in time series data.
Exponential growth in metric cardinality, plus a 250% year-over-year increase in log volume.
Impact on Cost and Productivity
Factors influencing observability costs:
Number of containers, high metric granularity, log verbosity, and retention policies.
Cardinality explosion, leading to exponential increase in query complexity and processing costs.
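To make the cardinality point concrete, here is a small illustrative calculation (the label names and counts are assumptions, not figures from the talk): the number of time series for one metric name is the product of the cardinalities of its labels, so pod churn multiplies everything else.

```python
# Hypothetical illustration: series count = product of label cardinalities.
from math import prod

# Assumed label cardinalities for a single container CPU metric.
label_cardinalities = {
    "namespace": 20,
    "pod": 500,        # pods churn, so this grows over time
    "container": 3,
    "node": 50,
}

series = prod(label_cardinalities.values())
print(series)  # 20 * 500 * 3 * 50 = 1,500,000 series for ONE metric name
```

Doubling any single label's cardinality doubles the total, which is why unbounded labels (pod IDs, request IDs) dominate observability bills.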
Impact on developer productivity:
87% of engineers say cloud-native architecture has increased the complexity of incident discovery and troubleshooting.
Engineers spend an average of 10 hours per week (25% of their work week) trying to triage and understand incidents.
88% report that the time spent on issues negatively impacts their careers, leading to burnout.
Strategies for Reducing Cost and Noise
Low-Hanging Fruit:
Metrics: Drop unnecessary metrics from tools like cAdvisor and kube-state-metrics.
Logs: Reroute seldom-used data to object storage and sample information-level logs.
Traces: Set global head and/or tail sampling to capture only interesting traces.
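The metric-dropping idea above can be sketched as a simple name filter applied before storage. This is a hedged illustration, not Chronosphere's actual implementation; the prefixes are real cAdvisor and kube-state-metrics metric names, but which ones count as "unnecessary" depends on your own dashboards and alerts.

```python
# Sketch: drop high-volume, seldom-queried metric families at ingest.
DROP_PREFIXES = (
    "container_network_tcp_usage_total",   # cAdvisor, often unused
    "container_memory_failures_total",     # cAdvisor
    "kube_pod_status_qos_class",           # kube-state-metrics
)

def keep(metric_name: str) -> bool:
    """Return True if the metric should be ingested and stored."""
    return not metric_name.startswith(DROP_PREFIXES)

scraped = [
    "container_cpu_usage_seconds_total",
    "container_network_tcp_usage_total",
    "kube_pod_status_qos_class",
]
filtered = [m for m in scraped if keep(m)]
print(filtered)  # only the CPU metric survives
```

In practice this filtering is usually expressed as Prometheus `metric_relabel_configs` or a collector processor rather than application code.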
Advanced Solutions:
Aggregation:
Remove unused metric dimensions.
Create rollup metrics that trade temporal resolution for cheaper long-term trending.
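A rollup can be sketched as follows, assuming raw samples arrive as (timestamp, value) pairs: average fine-grained samples into coarse windows, keeping the trend while storing far fewer points (the 5-minute window is an assumed choice, not one stated in the talk).

```python
# Sketch: roll 15-second samples up into 5-minute averages.
from collections import defaultdict

def rollup(samples: list[tuple[int, float]], window: int = 300) -> dict[int, float]:
    """Average (timestamp, value) samples into fixed windows."""
    buckets: dict[int, list[float]] = defaultdict(list)
    for ts, value in samples:
        buckets[ts - ts % window].append(value)   # align to window start
    return {ts: sum(vs) / len(vs) for ts, vs in buckets.items()}

raw = [(0, 1.0), (15, 3.0), (300, 5.0)]           # 15s-resolution samples
rolled = rollup(raw)
print(rolled)  # {0: 2.0, 300: 5.0} — two 5-minute points instead of three raw ones
```

The raw series can then be retained briefly for debugging while only the rollup is kept for long-term trending.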
Logs-to-Metrics:
Summarize logs into metrics without ingesting raw data.
Convert detailed error logs into error rate metrics.
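The logs-to-metrics conversion above can be sketched like this (the log format, regex, and counter name are illustrative assumptions): count matching lines into a counter metric and drop the raw bodies before they are ever stored.

```python
# Sketch: summarize error log lines into an error-count metric
# instead of ingesting the raw log data.
import re
from collections import Counter

ERROR_RE = re.compile(r"\blevel=error\b")

def logs_to_metrics(lines: list[str]) -> Counter:
    counts: Counter = Counter()
    for line in lines:
        if ERROR_RE.search(line):
            counts["app_errors_total"] += 1   # emit a counter, discard the line
    return counts

logs = [
    'ts=1 level=info msg="request ok"',
    'ts=2 level=error msg="db timeout"',
    'ts=3 level=error msg="db timeout"',
]
print(logs_to_metrics(logs))  # Counter({'app_errors_total': 2})
```

The resulting counter supports alerting on error rate at a fraction of the cost of indexing every log line.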
Tiered Sampling in Traces:
Capture a higher percentage of traces for revenue-critical paths.
Reduce sampling rates for lower-priority flows.
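Tiered sampling can be sketched as a per-route head-sampling decision (the routes and rates below are illustrative assumptions, not numbers from the talk): revenue-critical paths keep a high fraction of traces while low-priority flows keep very few.

```python
# Sketch of tiered head sampling with per-route rates.
import random

SAMPLE_RATES = {
    "/checkout": 0.50,    # revenue-critical: keep half of all traces
    "/healthz": 0.01,     # low-priority: keep 1%
}
DEFAULT_RATE = 0.10       # fallback for unlisted routes

def should_sample(route: str, rng: random.Random) -> bool:
    return rng.random() < SAMPLE_RATES.get(route, DEFAULT_RATE)

rng = random.Random(42)   # seeded for reproducibility
kept = sum(should_sample("/checkout", rng) for _ in range(10_000))
print(kept)               # roughly 5,000 of 10,000 checkout traces kept
```

Tail sampling works the same way but decides after the trace completes, so it can also key on outcome (errors, high latency) rather than route alone.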
Chronosphere's Approach
Chronosphere's observability platform and Telemetry pipeline provide control and optimization capabilities:
The control plane analyzes the value of data and allows teams to optimize before storage.
The Telemetry pipeline processes data in-flight to reduce, transform, and enrich logs.
Chronosphere has helped customers save 60% on observability costs and 30% on logging costs.
Case Study: Affirm
Affirm, a leading buy-now-pay-later company, faced challenges with observability during high-traffic events like Black Friday.
Chronosphere's aggregation and filtering capabilities allowed Affirm to control costs while maintaining high data quality for developers.
Chronosphere's platform demonstrated robust capabilities, with 99.9% availability, and enabled Affirm to increase data ingestion and achieve significant cost savings.