Here is a detailed summary of the key takeaways from the video transcription:
Introduction to Observability
- Observability gives visibility into a system, allowing for real-time troubleshooting and better customer experience.
- Observability goes beyond just monitoring IT infrastructure and applications - it's about observing the entire business.
- The three pillars of observability are logs, traces, and metrics. This talk focuses on metrics and time series data.
Understanding Prometheus
- Prometheus is a multi-dimensional time series database used for real-time visualization, alerting, and integration with various systems.
- Prometheus is designed for operational metrics and prioritizes availability and data freshness over consistency.
- Prometheus can be used for a variety of use cases beyond just IT infrastructure monitoring, such as IoT, manufacturing, and telecommunications.
- Prometheus supports two main ways of ingesting data: pull-based scraping and push-based remote write.
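The remote write path carries batches of time series, where each series is identified by a sorted label set (the metric name travels as the reserved `__name__` label) and samples must arrive in timestamp order. A minimal Java sketch of that data shape, with illustrative class and method names (not from any Prometheus client library):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeMap;

// Minimal model of one time series in a Prometheus remote-write request.
// Per the remote-write spec, the metric name travels as the reserved
// "__name__" label, label pairs are sorted by name, and samples must be
// in timestamp order. Names here are illustrative, not a real client API.
public class RemoteWriteSeries {
    final TreeMap<String, String> labels = new TreeMap<>(); // sorted by label name
    final List<double[]> samples = new ArrayList<>();       // {timestampMillis, value}

    RemoteWriteSeries(String metricName) {
        labels.put("__name__", metricName);
    }

    RemoteWriteSeries label(String name, String value) {
        labels.put(name, value);
        return this;
    }

    // Samples must be appended in non-decreasing timestamp order.
    RemoteWriteSeries sample(long timestampMillis, double value) {
        if (!samples.isEmpty() && samples.get(samples.size() - 1)[0] > timestampMillis) {
            throw new IllegalArgumentException("samples out of order");
        }
        samples.add(new double[] {timestampMillis, value});
        return this;
    }

    // The sorted label set uniquely identifies the series.
    String seriesKey() {
        return labels.toString();
    }
}
```

Every distinct label combination is a distinct series, which is why label choices drive cardinality, as discussed next.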
Challenges with Prometheus for Observability
- When dealing with high-cardinality and high-frequency data (e.g., IoT devices), Prometheus may face performance challenges storing and querying the data.
- There may be a need for pre-processing and enrichment of the raw data before writing to Prometheus to reduce cardinality and improve query performance.
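One way to picture the cardinality reduction described above: raw samples keyed per device collapse into far fewer series when re-keyed on coarser dimensions. The field names and the average aggregation below are assumptions chosen for illustration:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch of cardinality reduction: per-vehicle samples are
// re-keyed to coarser (model, region) series before writing to Prometheus.
// Field names and the average aggregation are assumptions for the example.
public class CardinalityReducer {
    record RawSample(String vehicleId, String model, String region, double value) {}

    // Average per (model, region): many per-vehicle series collapse into
    // one series per model/region combination.
    static Map<String, Double> aggregate(List<RawSample> raw) {
        Map<String, double[]> acc = new TreeMap<>(); // key -> {sum, count}
        for (RawSample s : raw) {
            String key = s.model() + "|" + s.region();
            double[] a = acc.computeIfAbsent(key, k -> new double[2]);
            a[0] += s.value();
            a[1] += 1;
        }
        Map<String, Double> out = new TreeMap<>();
        acc.forEach((k, a) -> out.put(k, a[0] / a[1]));
        return out;
    }
}
```

With three vehicles across two model/region combinations, three potential series become two, and the effect compounds at fleet scale.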
Introducing Apache Flink
- Apache Flink is a framework and distributed processing engine for stateful computation over unbounded and bounded data streams.
- Flink provides a unified API for processing both bounded and unbounded data, making it well-suited for stream processing use cases.
- Flink has a rich ecosystem of connectors that allow reading from and writing to various systems, including databases, message queues, and file systems.
Combining Flink and Prometheus
- The built-in Flink Prometheus reporter is not suitable for high-scale observability use cases, as it is designed to monitor the Flink application itself, not to process external observability data.
- Implementing a custom Prometheus remote write integration with Flink is possible but requires significant effort to handle batching, error handling, and other complexities.
The Flink Prometheus Connector
- The Flink Prometheus connector is a new addition to the Flink ecosystem that simplifies the integration between Flink and Prometheus.
- The connector fully implements the Prometheus remote write specification, optimizing for high-throughput writes and horizontal scalability.
- The connector handles batching, retrying, and ordering of the data written to Prometheus, making it a suitable solution for high-scale observability use cases.
Demo: Connected Vehicles Use Case
- The demo showcases a use case of processing observability data from a fleet of connected vehicles using Flink and Prometheus.
- The pre-processor Flink application performs data enrichment, aggregation, and cardinality reduction before writing the processed metrics to Prometheus.
- Compared to the raw event writer approach that directly writes to Prometheus, the pre-processor approach provides better performance and cost-efficiency when querying the data in Prometheus.
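The pre-processor's aggregation step can be sketched as a tumbling-window reduction: events are bucketed into fixed time windows per vehicle and reduced to one value per window, so Prometheus receives one sample per window instead of every raw event. The window size and the max() reduction are assumptions for this example, not details from the demo:

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch of windowed pre-aggregation: events fall into fixed (tumbling)
// windows per vehicle and are reduced to one max value per window, so
// far fewer samples reach Prometheus than with the raw event writer.
// Window size and the max() reduction are assumptions for the example.
public class WindowedMax {
    record Event(String vehicleId, long timestampMillis, double value) {}

    // Returns (vehicleId @ windowStart) -> max value in that window.
    static Map<String, Double> maxPerWindow(List<Event> events, long windowMillis) {
        Map<String, Double> out = new TreeMap<>();
        for (Event e : events) {
            long windowStart = (e.timestampMillis() / windowMillis) * windowMillis;
            String key = e.vehicleId() + "@" + windowStart;
            out.merge(key, e.value(), Math::max);
        }
        return out;
    }
}
```

Three raw events in two windows yield only two output samples; at fleet scale this is the difference in write volume and query cost between the two approaches.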
Conclusion
- Combining Flink and Prometheus, enabled by the Flink Prometheus connector, unlocks the ability to observe and monitor widely distributed resources at scale, such as IoT devices, vehicles, or other systems.
- The Flink Prometheus connector allows for efficient pre-processing and enrichment of observability data before writing to Prometheus, improving query performance and cost-effectiveness.
- The resources provided (documentation, demo code, and managed service links) can help developers get started with this solution.