Metrics rollup processes

The following diagram illustrates how Kloudfuse handles metrics, from live stream processing to queries from dashboards and alerts. It describes the stages of ingestion, processing, calculation, storage, and query processing.

Metrics ingestion, processing, calculation, storage, and queries

Metrics ingestion, processing, and storage

Refer to the upper part of the Metrics ingestion, processing, calculation, storage, and queries diagram that illustrates the rollup workflow. The numbers in light blue circles correspond to these steps:

  1. Kloudfuse gets time series data from your environment, either through agents or from cloud sources.

  2. The Ingester Service pre-processes the data stream, and routes it to Kafka as kf_metrics_topic.

  3. Kafka handles the same data stream in two parallel processes:

    • Raw metrics

      Kafka forwards kf_metrics_topic directly to Pinot.

    • Rollup metrics

      Kafka uses kf_metrics_topic to extract rollup metrics:

      1. It sends kf_metrics_topic to the Metrics Transformer.

      2. The Metrics Transformer calculates aggregations and markers for the specified intervals (default: 5 minutes), creates kf_metrics_rollup_topic from them, and sends it back to Kafka.

      3. Kafka forwards kf_metrics_rollup_topic to Pinot.

  4. Pinot handles the topics in the following manner:

    • Raw metrics

      The Metrics Decoder receives kf_metrics_topic, performs the necessary calculations, and writes the results to the table kf_metrics.

      The table columns are name (of metric), timestamp, labels, value, and le.

    • Rollup metrics

      The Metrics Rollup Decoder receives kf_metrics_rollup_topic, performs the necessary calculations and aggregations, and writes the results to the table kf_metrics_rollup.

      The table columns are name (of metric), timestamp, labels, sum, count, min, max, counter, first, first_ts, and le.

      Kloudfuse calculates the aggregations sum, count, min, and max over the raw values found in the kf_metrics table.

      Kloudfuse uses counter (the last counter value, adjusted for resets within the rollup window), first (the first value encountered in the bucket), and first_ts (the timestamp of first) to ensure data integrity.
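To make the rollup columns more concrete, the following Python sketch groups raw samples into 5-minute buckets and derives sum, count, min, max, counter, first, and first_ts for each bucket. It is an illustration only, not the Metrics Transformer's actual code. The names RollupRow, rollup, and last_raw are hypothetical, the bucket timestamp is assumed to be the start of the interval, and the reset handling for counter (carrying the pre-reset total forward when a value drops) is one plausible reading of "accounts for resets". The le column is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple

ROLLUP_INTERVAL_S = 300  # default rollup interval from the text: 5 minutes

Labels = Tuple[Tuple[str, str], ...]
Sample = Tuple[str, Labels, int, float]  # (name, labels, timestamp_s, value)


@dataclass
class RollupRow:
    """Hypothetical in-memory shape of one kf_metrics_rollup row (without le)."""
    name: str
    labels: Labels
    timestamp: int   # bucket start in epoch seconds (assumed convention)
    sum: float
    count: int
    min: float
    max: float
    counter: float   # last value plus totals carried over resets (assumed)
    first: float     # first value seen in the bucket
    first_ts: int    # timestamp of that first value
    last_raw: float  # helper for reset detection, not a table column


def rollup(samples: Iterable[Sample],
           interval_s: int = ROLLUP_INTERVAL_S) -> List[RollupRow]:
    """Aggregate raw samples into one row per (name, labels, bucket)."""
    rows: Dict[Tuple[str, Labels, int], RollupRow] = {}
    # Process in timestamp order so first/first_ts and reset detection hold.
    for name, labels, ts, value in sorted(samples, key=lambda s: s[2]):
        bucket = ts - ts % interval_s
        key = (name, labels, bucket)
        row = rows.get(key)
        if row is None:
            rows[key] = RollupRow(name, labels, bucket, value, 1,
                                  value, value, value, value, ts, value)
            continue
        row.sum += value
        row.count += 1
        row.min = min(row.min, value)
        row.max = max(row.max, value)
        if value < row.last_raw:
            # A drop is treated as a counter reset: keep the total so far
            # and add the new value on top of it.
            row.counter += value
        else:
            row.counter += value - row.last_raw
        row.last_raw = value
    return list(rows.values())
```

For example, the raw values 10, 15, 3, 7 within one bucket yield sum 35, count 4, min 3, max 15, first 10, and, under the reset assumption above, counter 22 (15 before the reset plus 7 after it).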

Metrics queries

Refer to the upper part of the diagram that illustrates the rollup workflow.

The numbers in the dark blue circles correspond to these steps:

  1. Kloudfuse gets a query request from a user interface.

    This may be triggered by starting the Metrics interface, loading dashboards, changing and reloading dashboards and reports, changing the time picker values, and so on.

  2. The Query Service determines the source table for reading the metrics and issues the appropriate read requests. The choice depends on whether the query's time interval is longer or shorter than 2 days, or whether its step size is larger or smaller than 5 minutes.

  3. The Query Service receives results for all queries from:

    • Raw metrics: table kf_metrics

    • Rollup metrics: table kf_metrics_rollup

  4. The Query Service combines the results and forwards them to the requesting UI.
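The table selection in step 2 can be sketched as a small decision function. This is a minimal illustration, not the Query Service's implementation. The function name choose_table is hypothetical, and the direction of the rule is an assumption: the text gives the 2-day and 5-minute thresholds but does not say which side maps to which table, so the sketch assumes that long time intervals or coarse step sizes read from the rollup table. The behavior exactly at each threshold is likewise assumed.

```python
from datetime import timedelta

RAW_TABLE = "kf_metrics"
ROLLUP_TABLE = "kf_metrics_rollup"

# Thresholds named in the text.
RANGE_THRESHOLD = timedelta(days=2)
STEP_THRESHOLD = timedelta(minutes=5)


def choose_table(time_range: timedelta, step: timedelta) -> str:
    """Pick the source table for a query (assumed rule, see lead-in)."""
    # Assumption: a query spanning more than 2 days, or stepping at least
    # as coarsely as the 5-minute rollup interval, loses nothing by
    # reading pre-aggregated rows.
    if time_range > RANGE_THRESHOLD or step >= STEP_THRESHOLD:
        return ROLLUP_TABLE
    return RAW_TABLE
```

Under these assumptions, a one-hour dashboard panel with a 30-second step reads from kf_metrics, while a seven-day panel reads from kf_metrics_rollup regardless of its step.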