Topk operator

The topk operator keeps the highest-ranked rows of an aggregation result. Use it to answer questions like "the top 5 hosts by error count" or "the top 3 endpoints by latency per service".

topk runs after an aggregation. It sorts the aggregated rows in descending order by a ranking field, keeps at most k rows, and adds a rank column starting at 1. With a by clause, topk ranks within each group independently and returns up to k rows for every group.
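
The steps above can be modeled in a few lines of Python. This is a sketch of the described semantics only, not the operator's implementation. The function name topk, the dict-per-row representation, and the tie-breaking rule (ties keep input order) are assumptions made for illustration. It also assumes every ranking value is present and numeric.

```python
def topk(rows, k, ranking_field):
    """Illustrative model of topk without a by clause."""
    if k < 1:
        raise ValueError("k must be >= 1")
    # Sort descending by the ranking field. sorted() is stable even with
    # reverse=True, so tied rows keep their input order (an assumption here;
    # the operator's tie-breaking is not specified).
    ordered = sorted(rows, key=lambda r: r[ranking_field], reverse=True)
    # Keep at most k rows and append a 1-based _rank column.
    return [{**row, "_rank": i} for i, row in enumerate(ordered[:k], start=1)]


rows = [
    {"source": "api", "_count": 40},
    {"source": "db", "_count": 75},
    {"source": "web", "_count": 12},
]
top2 = topk(rows, 2, "_count")
all_rows = topk(rows, 10, "_count")  # k larger than the input: all rows returned
```

Here top2 holds the db row with _rank 1 followed by the api row with _rank 2, and all_rows holds all three rows.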

Syntax

| topk(<k>, <ranking_field>)
| topk(<k>, <ranking_field>) by <field1>, <field2>, ...

<k>

A positive integer (>= 1): the maximum number of rows to keep. If the input has fewer than k rows, all rows are returned.

<ranking_field>

The field used to rank rows in descending order. This is typically an aggregation output column such as _count, or an aliased aggregation like errors from count as errors.

by <field1>, <field2>, ...

Optional grouping fields. When present, topk partitions input rows by the listed fields and returns the top k rows within each group. Group order in the output follows the order in which each group is first seen in the input.

Output

topk appends a _rank column to the schema. _rank is 1 for the highest-ranked row, 2 for the next, and so on. With a by clause, _rank restarts at 1 in each group.

Rows are ordered:

  • Globally (no by clause): descending by <ranking_field>.

  • Per group (by clause): groups appear in input order; within each group, rows are descending by <ranking_field>.
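
The per-group ordering can be sketched in Python as follows. This models the rules described above under stated assumptions: the function name topk_by and the dict-per-row representation are hypothetical, ranking values are assumed to be present and numeric, and ties keep input order.

```python
def topk_by(rows, k, ranking_field, by):
    """Illustrative model of topk with a by clause."""
    # A dict preserves insertion order, so groups come out in the order
    # in which each group is first seen in the input.
    groups = {}
    for row in rows:
        key = tuple(row[field] for field in by)
        groups.setdefault(key, []).append(row)

    out = []
    for members in groups.values():
        # Rank each group independently, descending by the ranking field.
        ordered = sorted(members, key=lambda r: r[ranking_field], reverse=True)
        for rank, row in enumerate(ordered[:k], start=1):
            out.append({**row, "_rank": rank})  # _rank restarts at 1 per group
    return out


rows = [
    {"_timeslice": "10:05", "source": "api", "_count": 3},
    {"_timeslice": "10:00", "source": "db", "_count": 9},
    {"_timeslice": "10:05", "source": "web", "_count": 8},
    {"_timeslice": "10:00", "source": "api", "_count": 14},
    {"_timeslice": "10:00", "source": "web", "_count": 1},
]
result = topk_by(rows, 2, "_count", by=["_timeslice"])
summary = [(r["_timeslice"], r["source"], r["_rank"]) for r in result]
```

In this sample the 10:05 group appears first because its first row precedes any 10:00 row in the input, even though 10:00 is the earlier time.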

Behavior

  • topk must appear after an aggregation operator such as count, sum, avg, or percentiles. Using topk before an aggregation returns an error.

  • Comparison is numeric when both values being compared are numeric (including mixed int and float values); otherwise the values are compared as strings.

  • Missing or null ranking values sort to the end, so rows with no value never appear in the top results unless k exceeds the number of non-null rows.

  • If <k> is larger than the number of input rows, topk returns all input rows; this is not an error.
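
The comparison rules in this list can be expressed as a comparator. The Python below is a sketch under assumptions, not the operator's actual code: the helper names are hypothetical, None stands in for a missing or null value, and string comparison is assumed to be plain lexicographic comparison of the values' string forms.

```python
from functools import cmp_to_key


def _is_number(value):
    # bool is a subclass of int in Python; exclude it so that True/False
    # are not treated as numeric values.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def compare_desc(a, b):
    """Return a negative number if a ranks ahead of b, positive if after."""
    if a is None or b is None:
        # Missing values sort to the end; two missing values tie.
        return (a is None) - (b is None)
    if _is_number(a) and _is_number(b):
        # Numeric comparison, including mixed int and float.
        return (b > a) - (b < a)
    # Otherwise compare as strings (assumed lexicographic).
    sa, sb = str(a), str(b)
    return (sb > sa) - (sb < sa)


numeric = sorted([3, None, 10.5, 7], key=cmp_to_key(compare_desc))
textual = sorted(["10", "9", "100"], key=cmp_to_key(compare_desc))
```

With these inputs, numeric is [10.5, 7, 3, None] and textual is ["9", "100", "10"]. The second result shows why a ranking field that holds numbers as strings can produce a surprising order. The example inputs are deliberately homogeneous (all numbers, or all strings).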

Example: Top N across all results

Return the 5 sources producing the most logs:

* | count by source | topk(5, _count)

The result has at most 5 rows, ordered by _count descending, with _rank values starting at 1 (1 through 5 when at least 5 sources exist).

Example: Top N per group

Return the 3 most frequent error sources in each 5-minute window:

level="error"
| timeslice 5m
| count by (_timeslice, source)
| topk(3, _count) by _timeslice

Each _timeslice bucket has up to 3 rows, each carrying a _rank of 1, 2, or 3 for that bucket.

Example: Ranking by an aliased aggregation

topk can rank by any output column, including an aliased aggregation:

* | avg(durationMs) as p_avg by endpoint | topk(10, p_avg)

This returns the 10 slowest endpoints by average duration.