Metrics rollup processes

The following diagram illustrates how Kloudfuse handles metrics, from live stream processing to queries from dashboards and alerts. It describes the stages of ingestion, processing, calculation, storage, and query processing.

Metrics ingestion, processing, calculation, storage, and queries

Metrics ingestion, processing, and storage

Refer to the upper part of the Metrics ingestion, processing, calculation, storage, and queries diagram that illustrates the rollup workflow. The numbers in light blue circles correspond to these steps:

Kloudfuse gets time series data from your environment, either through agents or from cloud sources.
The Ingester Service pre-processes the data stream, and routes it to Kafka as kf_metrics_topic.
Kafka handles the same data stream in two parallel processes:
- Raw metrics: Kafka forwards kf_metrics_topic directly to Pinot.
- Rollup metrics: Kafka uses kf_metrics_topic to extract rollup metrics:
  - Kafka sends kf_metrics_topic to the Metrics Transformer.
  - The Metrics Transformer calculates aggregations and markers for each configured rollup resolution (by default: 5 min, 10 min, 30 min, 1 hour, and 4 hours), and sends them back to Kafka as kf_metrics_rollup_topic.
  - Kafka forwards kf_metrics_rollup_topic to Pinot.

Pinot handles the topics in the following manner:
- Raw metrics: The Metrics Decoder receives kf_metrics_topic, performs necessary calculations, and writes the results to the table kf_metrics.
  The table columns are name (of metric), timestamp, labels, value, and le.
- Rollup metrics: The Metrics Rollup Decoder receives kf_metrics_rollup_topic, performs necessary calculations and aggregations, and writes the results to the table kf_metrics_rollup.
  The table columns are name (of metric), timestamp, labels, sum, count, min, max, counter, first, first_ts, and le.
  Kloudfuse calculates the aggregations sum, count, min, and max over the same raw values that are stored in the kf_metrics table.
  Kloudfuse uses counter (last counter value that accounts for resets within the rollup window), first (first value encountered in the bucket), and first_ts (timestamp of first) to ensure data integrity.
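
The rollup columns described above can be illustrated with a minimal sketch. This is not Kloudfuse's implementation: the `RollupRow` class and the `rollup` function are hypothetical names, the `labels` and `le` columns are omitted for brevity, and the reset handling for `counter` (adding the last value seen before each reset) is an assumed interpretation of "accounts for resets within the rollup window".

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass
class RollupRow:
    """One row per metric and rollup bucket (hypothetical shape)."""
    name: str
    timestamp: int   # start of the rollup bucket, epoch seconds
    sum: float
    count: int
    min: float
    max: float
    counter: float   # last value plus values lost to resets (assumed meaning)
    first: float     # first value encountered in the bucket
    first_ts: int    # timestamp of that first value


def rollup(name: str, samples: Iterable[Tuple[int, float]],
           resolution_s: int = 300) -> List[RollupRow]:
    """Aggregate (timestamp, value) samples into fixed-size buckets."""
    rows: List[RollupRow] = []
    prev = 0.0     # previous raw value in the current bucket
    offset = 0.0   # accumulated value lost to counter resets in the bucket
    for ts, value in sorted(samples):
        bucket = ts - ts % resolution_s
        if not rows or rows[-1].timestamp != bucket:
            # First sample of a new bucket initializes every column.
            rows.append(RollupRow(name, bucket, value, 1,
                                  value, value, value, value, ts))
            prev, offset = value, 0.0
            continue
        row = rows[-1]
        row.sum += value
        row.count += 1
        row.min = min(row.min, value)
        row.max = max(row.max, value)
        if value < prev:
            # A drop is treated as a counter reset (assumption).
            offset += prev
        row.counter = value + offset
        prev = value
    return rows
```

For example, the samples `[(0, 10.0), (60, 15.0), (120, 3.0), (299, 5.0)]` fall into one 5-minute bucket with `sum` 33.0, `count` 4, `min` 3.0, `max` 15.0, `first` 10.0, and `first_ts` 0. Under the assumed reset handling, `counter` is 20.0, because the drop from 15.0 to 3.0 is treated as a reset.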

Metrics queries

Refer to the lower part of the diagram that illustrates the query workflow.

The numbers in the dark blue circles correspond to these steps:

Kloudfuse gets a query request from a user interface.

This may be triggered by starting the Metrics interface, loading dashboards, changing and reloading dashboards and reports, changing the time picker values, and so on.
The Query Service selects the most appropriate rollup resolution based on the query’s step size and time range. For queries with a small step size or short time range, it reads from the raw table. For longer time ranges, it selects a coarser rollup resolution to reduce the amount of data scanned.
The Query Service receives results from:
- Raw metrics: table kf_metrics
- Rollup metrics: table kf_metrics_rollup

If the query spans a time range where only part of the data has rollup coverage, the Query Service automatically splits the query: it uses the rollup table for the newer portion and the raw table for the older portion.
The Query Service combines the results and returns them to the requesting UI.
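
The resolution selection and query splitting described in the steps above could be sketched as follows. The function names, the one-hour "short range" threshold, and the rule of picking the coarsest resolution that does not exceed the step size are illustrative assumptions, not Kloudfuse's actual logic. Only the default resolutions and the table names come from this page.

```python
from typing import List, Optional, Tuple

# Default rollup resolutions from this page, in seconds:
# 5 min, 10 min, 30 min, 1 hour, and 4 hours.
ROLLUP_RESOLUTIONS_S = [300, 600, 1800, 3600, 14400]


def select_resolution(step_s: int, range_s: int,
                      short_range_s: int = 3600) -> Optional[int]:
    """Return a rollup resolution in seconds, or None to read the raw table.

    short_range_s is a hypothetical threshold for "short time range".
    """
    if range_s <= short_range_s:
        return None
    # Coarsest resolution that still fits within one query step (assumption).
    candidates = [r for r in ROLLUP_RESOLUTIONS_S if r <= step_s]
    return max(candidates) if candidates else None


def split_query(start_s: int, end_s: int,
                rollup_coverage_start_s: int) -> List[Tuple[str, int, int]]:
    """Split [start_s, end_s) into per-table sub-ranges.

    Data older than rollup_coverage_start_s is assumed to exist only in
    the raw table.
    """
    if rollup_coverage_start_s <= start_s:
        return [("kf_metrics_rollup", start_s, end_s)]
    if rollup_coverage_start_s >= end_s:
        return [("kf_metrics", start_s, end_s)]
    return [
        ("kf_metrics", start_s, rollup_coverage_start_s),          # older portion
        ("kf_metrics_rollup", rollup_coverage_start_s, end_s),     # newer portion
    ]
```

With these assumptions, a 7-day query with a 15-minute step would read the 10-minute rollup, and a query whose range starts before rollup coverage begins would be split into one raw sub-range and one rollup sub-range.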