Metrics roll up

Kloudfuse now supports roll up of metrics data, computed directly from the data stream.

To improve query performance and reduce loading times, Kloudfuse aggregates metrics data over fixed intervals, directly from the data stream. Depending on the time span of the query, Kloudfuse calculates results either from raw data or from rolled up data. For shorter time spans, Kloudfuse continues to use raw metrics, because aggregation can smooth out the data and hide important signals, such as outliers.
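The smoothing effect can be illustrated with a minimal sketch. It is not Kloudfuse's implementation: the function names, the use of an average as the aggregate, and the `threshold_s` cut-off between raw and rolled up data are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def roll_up(samples, interval_s):
    """Group (timestamp_s, value) samples into fixed buckets and average each bucket."""
    buckets = defaultdict(list)
    for ts, value in samples:
        bucket_start = ts - (ts % interval_s)  # start of the bucket this sample falls in
        buckets[bucket_start].append(value)
    return [(start, mean(values)) for start, values in sorted(buckets.items())]


def pick_source(query_span_s, threshold_s):
    """Hypothetical selection rule: short queries read raw data, long ones read roll ups."""
    return "raw" if query_span_s <= threshold_s else "rolled_up"


# Hypothetical raw stream: one sample every 15 s for 10 minutes, with a single spike.
raw = [(t, 100.0 if t == 150 else 10.0) for t in range(0, 600, 15)]
rolled = roll_up(raw, interval_s=300)  # 5-minute buckets

print(max(v for _, v in raw))     # the spike is visible in the raw data
print(max(v for _, v in rolled))  # the spike is averaged down in the rolled up data
```

In this sketch the single spike of 100 disappears into a bucket average, which is why short time spans are better served by raw data.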

For a more general discussion of this feature, read the following sections.

Benefits

The primary benefit of this approach is a reduced I/O cost, because Kloudfuse reads aggregated metrics instead of raw values. These pre-calculated aggregates improve query performance, and quicker calculation means that dashboards and graphs load faster. Additionally, it is relatively inexpensive to increase retention times for aggregated metrics.

Consider the number of metrics that your system processes regularly. The following image is a plot of select monitored metrics as they appear in the Kloudfuse plane:

Select metrics from the Kloudfuse plane

When the raw data stream arrives at 15- or 30-second intervals, compare the number of records that each query processes with the number it processes when using pre-aggregated metrics data at 5-minute and 10-minute intervals. With a 15- or 30-second data stream, rolled up (pre-aggregated) metrics at 5 minutes reduce the number of records that a query must retrieve by a factor of 20 or 10, respectively. With a roll up interval of 10 minutes, the reduction is by a factor of 40 or 20, respectively.
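The reduction factors follow from simple interval arithmetic. The sketch below assumes that each metric stores exactly one record per interval; the function names are hypothetical and chosen only for this illustration.

```python
def record_count(span_s, interval_s):
    """Records stored for one metric over a time span, at one record per interval."""
    return span_s // interval_s


def reduction_factor(raw_interval_s, rollup_interval_s):
    """How many raw records one rolled up record replaces."""
    return rollup_interval_s // raw_interval_s


for raw_s in (15, 30):
    for rollup_min in (5, 10):
        factor = reduction_factor(raw_s, rollup_min * 60)
        print(f"raw {raw_s}s, roll up {rollup_min} min: factor {factor}")

# Raw records for one metric over a 6-hour query at a 15-second interval.
print(record_count(6 * 3600, 15))
```

Multiplying the result of `record_count` by the number of metrics in a query gives the total for that query.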

Disk access counts for raw metrics vs. rolled up metrics

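Number of stored records by query duration, for 1 metric and for 200 metrics:

| Query duration | 1 metric, raw (15s) | 1 metric, raw (30s) | 1 metric, rolled up (5 min) | 1 metric, rolled up (10 min) | 200 metrics, raw (15s) | 200 metrics, raw (30s) | 200 metrics, rolled up (5 min) | 200 metrics, rolled up (10 min) |
|---|---|---|---|---|---|---|---|---|
| 6h | 1,440 | 720 | 72 | 36 | 288 K | 144 K | 14.4 K | 7.2 K |
| 2d | 11,520 | 5,760 | 576 | 288 | 2,304 K | 1,152 K | 115.2 K | 57.6 K |
| 7d | 40,320 | 20,160 | 2,016 | 1,008 | 8,064 K | 4,032 K | 403.2 K | 201.6 K |
| 2w | 80,640 | 40,320 | 4,032 | 2,016 | 16,128 K | 8,064 K | 806.4 K | 403.2 K |
| 1mo (30d) | 172,800 | 86,400 | 8,640 | 4,320 | 34,560 K | 17,280 K | 1,728 K | 864 K |
| 1y (365.25d) | 2,104 K | 1,052 K | 105 K | 53 K | 420,768 K | 210,384 K | 21,038 K | 10,519 K |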

Drawbacks

Metrics roll up adds some storage overhead, which may require additional disks if you plan to retain large amounts of historical data. It also uses more memory than working with raw metrics alone.