Rolling quantile

The rolling quantile advanced function is built on the quantile_over_time function of the Prometheus engine.

To determine the upper and lower bounds, we widen the quantiles by a multiple of the standard deviation. Instead of computing the standard deviation over the entire time range, we compute it over a specific window of past data.

We calculate the standard deviation over the original series, not over the quantile bands, and apply it to future events. Quantiles are generally stable metrics, so the standard deviation of the predicted bands themselves is extremely close to zero and would add almost no margin.

Calculating the lower band
lower = avg_over_time(
  (quantile_over_time(0.16, $Query[5m:])
  -$Bound*stddev_over_time($Query[5m:]))[5m:])
Calculating the upper band
upper = avg_over_time(
  (quantile_over_time(0.84, $Query[5m:])
  +$Bound*stddev_over_time($Query[5m:]))[5m:])
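The two band expressions can be reproduced offline to check what they do to a sample series. The following Python sketch is a simplification under stated assumptions, not the engine's implementation: samples are evenly spaced, the 5m window is expressed as a sample count (`size`), the quantile uses linear interpolation between ranks, and the standard deviation is the population form. The helper names `quantile`, `raw_bands`, and `rolling_bands` are hypothetical.

```python
import math
import statistics


def quantile(values, q):
    """Quantile of a non-empty list, linearly interpolated between ranks."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lo = math.floor(rank)
    hi = min(len(ordered) - 1, lo + 1)
    weight = rank - lo
    return ordered[lo] * (1 - weight) + ordered[hi] * weight


def raw_bands(window, bound):
    """Quantile -/+ bound * stddev, both taken over the original samples."""
    spread = bound * statistics.pstdev(window)
    return quantile(window, 0.16) - spread, quantile(window, 0.84) + spread


def rolling_bands(samples, size, bound):
    """Smoothed (lower, upper) pairs, mirroring the outer avg_over_time."""
    # Inner step: one raw band per trailing window of `size` samples.
    raw = [
        raw_bands(samples[i - size + 1 : i + 1], bound)
        for i in range(size - 1, len(samples))
    ]
    # Outer step: average the raw bands over a trailing window of the same size.
    smoothed = []
    for i in range(size - 1, len(raw)):
        chunk = raw[i - size + 1 : i + 1]
        lower = sum(low for low, _ in chunk) / size
        upper = sum(up for _, up in chunk) / size
        smoothed.append((lower, upper))
    return smoothed


if __name__ == "__main__":
    series = [1, 4, 2, 8, 5, 7, 3, 6, 9, 2]
    for lower, upper in rolling_bands(series, size=4, bound=1):
        print(f"lower={lower:.2f} upper={upper:.2f}")
```

Because the spread comes from the original samples rather than from the quantile series, a noisy input widens the bands even when the quantiles themselves barely move.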

In Dashboards

To use the kf_rolling_quantile operator in a dashboard, apply the following function:

kf_rolling_quantile( \
  ${promql}, \ (1)
  ${window}, \ (2)
  ${bound}, \ (3)
  ${band} \ (4)
)
1 ${promql}: PromQL query to evaluate
2 ${window}: Number of milliseconds in the window (1,800,000 ms for a 30-minute window)
3 ${bound}: Number of standard deviations (stdv): 1, 2, or 3
4 ${band}: 4 = lower band, 5 = upper band, 6 = both upper and lower bands
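
When the expression is generated by a script, the window conversion and the band codes are easy to get wrong. The sketch below shows one way to assemble the call from the four parameters listed above. The helper `rolling_quantile_expr`, the `BAND_CODES` mapping, the single-line output form, and the sample query are all assumptions made for illustration; they are not part of the product.

```python
from datetime import timedelta

# Band codes as listed above: 4 = lower, 5 = upper, 6 = both.
BAND_CODES = {"lower": 4, "upper": 5, "both": 6}


def rolling_quantile_expr(promql, window, bound, band):
    """Build a kf_rolling_quantile call from a query, a timedelta window,
    a bound of 1-3 standard deviations, and a band name."""
    if bound not in (1, 2, 3):
        raise ValueError("bound must be 1, 2, or 3 standard deviations")
    if band not in BAND_CODES:
        raise ValueError(f"band must be one of {sorted(BAND_CODES)}")
    # The operator expects the window in milliseconds.
    window_ms = window // timedelta(milliseconds=1)
    return f"kf_rolling_quantile({promql}, {window_ms}, {bound}, {BAND_CODES[band]})"


if __name__ == "__main__":
    # Hypothetical query; 30 minutes becomes 1,800,000 ms.
    print(rolling_quantile_expr(
        "rate(http_requests_total[5m])", timedelta(minutes=30), 2, "both"
    ))
```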

Limitations

Because the operator evaluates alert rules against quantile-based bands, the alert threshold is triggered infrequently. This makes the rolling quantile less useful for alerts with a short window. A longer time window can mitigate this.

Next steps

For an in-depth discussion of the rolling quantile functions, see these external resources: