Rolling quantile
To compute the rolling quantile advanced function, we use the quantile_over_time function of the Prometheus engine.
To determine the upper and lower bounds, we widen the quantiles by a multiple of the standard deviation. Instead of computing the standard deviation over the entire time range, we compute it over a specific window of past data.
We calculate the deviations over the original series, not over the predicted bands, and apply them to future events. This is because the standard deviations of the predicted bands themselves are extremely close to zero and would barely widen the bounds. We chose this approach because quantiles are generally stable metrics.
lower = avg_over_time(
  (quantile_over_time(0.16, $Query[5m:])
    - $Bound * stddev_over_time($Query[5m:]))[5m:])
upper = avg_over_time(
  (quantile_over_time(0.84, $Query[5m:])
    + $Bound * stddev_over_time($Query[5m:]))[5m:])
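The two expressions above can be mirrored in a few lines of Python to see how the bands behave on a plain list of samples. This is a minimal sketch, not the operator's implementation. It assumes quantile_over_time interpolates linearly between ranks and that stddev_over_time is the population standard deviation, and it measures the window in samples instead of time. The names quantile, trailing_mean, and rolling_bands are hypothetical.

```python
import math
import statistics


def quantile(values, q):
    """Quantile with linear interpolation between ranks (assumed semantics)."""
    ordered = sorted(values)
    rank = q * (len(ordered) - 1)
    lo = int(math.floor(rank))
    hi = min(len(ordered) - 1, lo + 1)
    weight = rank - lo
    return ordered[lo] * (1 - weight) + ordered[hi] * weight


def trailing_mean(xs, window):
    """Average of the last `window` values at each position (outer avg_over_time)."""
    out = []
    for i in range(len(xs)):
        chunk = xs[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out


def rolling_bands(series, window, bound):
    """Return (lower, upper) bands for every position with a full window."""
    raw_lower, raw_upper = [], []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        # Deviation is taken over the original samples, as described above.
        spread = bound * statistics.pstdev(chunk)
        raw_lower.append(quantile(chunk, 0.16) - spread)
        raw_upper.append(quantile(chunk, 0.84) + spread)
    return trailing_mean(raw_lower, window), trailing_mean(raw_upper, window)


if __name__ == "__main__":
    lower, upper = rolling_bands([1, 2, 3, 4, 5, 6, 7, 8], window=4, bound=1)
    for lo_value, hi_value in zip(lower, upper):
        print(f"{lo_value:.3f} .. {hi_value:.3f}")
```

On a constant series the deviation is zero and both bands collapse onto the value itself. On a noisy series the gap between the bands grows with the bound.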
In Dashboards
To use the kf_rolling_quantile operator in a dashboard, apply the following function:
kf_rolling_quantile( \
${promql}, \ (1)
${window}, \ (2)
${bound}, \ (3)
${band} \ (4)
)
(1) ${promql}: PromQL query to evaluate
(2) ${window}: Number of milliseconds in the window (1,800,000 ms for a 30-minute window)
(3) ${bound}: Number of standard deviations: 1, 2, or 3
(4) ${band}: 4 = lower band, 5 = upper band, 6 = both upper and lower bands
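Because the window is given in milliseconds and the band is a numeric code, it is easy to get an argument wrong when filling in the template by hand. The following sketch shows one way to assemble the call from friendlier inputs. The helper rolling_quantile_call and the BAND_CODES mapping are hypothetical and not part of the product. The single-line output format is an assumption based on the template above, and the example query is only a placeholder.

```python
# Band codes as listed in the parameter descriptions above.
BAND_CODES = {"lower": 4, "upper": 5, "both": 6}


def rolling_quantile_call(promql, window_minutes, bound, band):
    """Build a kf_rolling_quantile(...) expression string (hypothetical helper)."""
    if bound not in (1, 2, 3):
        raise ValueError("bound must be 1, 2, or 3 standard deviations")
    if band not in BAND_CODES:
        raise ValueError("band must be 'lower', 'upper', or 'both'")
    window_ms = int(window_minutes * 60 * 1000)  # minutes -> milliseconds
    return f"kf_rolling_quantile({promql}, {window_ms}, {bound}, {BAND_CODES[band]})"


if __name__ == "__main__":
    # 30-minute window, 2 standard deviations, both bands.
    print(rolling_quantile_call("rate(http_requests_total[5m])", 30, 2, "both"))
```

For a 30-minute window this produces a window argument of 1800000, which matches the value given in the parameter description.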
Limitations
Because the operator uses quantiles to evaluate alert rules, the alert threshold is triggered infrequently. This makes the rolling quantile less useful for alerts with a short window. A longer time window can mitigate this.
Next steps
For an in-depth discussion of the rolling quantile functions, see these external resources:
- Quantile regression in StatsModels
- histogram_quantile() in Prometheus querying functions