Topk operator
The topk operator keeps the highest-ranked rows of an aggregation result. Use it to answer questions like "the top 5 hosts by error count" or "the top 3 endpoints by latency per service".
topk runs after an aggregation. It sorts the aggregated rows in descending order by a ranking field, keeps at most k rows, and adds a _rank column that starts at 1. With a by clause, topk ranks within each group independently and returns up to k rows for every group.
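The steps above (sort descending, keep at most k rows, number them from 1) can be modelled in a few lines of Python. This is only an illustrative sketch of the semantics, not the operator's implementation: the function name topk_rows, the dict-based row representation, and the stable tie-breaking are assumptions made for the example.

```python
def topk_rows(rows, k, ranking_field):
    """Hypothetical model of topk without a by clause."""
    if k < 1:
        raise ValueError("k must be >= 1")
    # sorted() is stable, so rows with equal values keep their input order.
    # Tie-breaking is an assumption; the text above does not specify it.
    ordered = sorted(rows, key=lambda r: r[ranking_field], reverse=True)
    # Slicing past the end is safe, so fewer than k rows are returned as-is.
    return [{**row, "_rank": i} for i, row in enumerate(ordered[:k], start=1)]


rows = [
    {"source": "a", "_count": 10},
    {"source": "b", "_count": 42},
    {"source": "c", "_count": 7},
]
top2 = topk_rows(rows, 2, "_count")
# top2 holds source "b" with _rank 1, then source "a" with _rank 2
```

Asking for more rows than exist, for example topk_rows(rows, 10, "_count"), returns all three rows ranked 1 through 3.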
Syntax
| topk(<k>, <ranking_field>)
| topk(<k>, <ranking_field>) by <field1>, <field2>, ...
- <k>: Positive integer. The maximum number of rows to keep. Must be >= 1. If fewer than k rows exist, all rows are returned.
- <ranking_field>: The field used to rank rows in descending order. This is typically an aggregation output column such as _count, or an aliased aggregation such as errors from count as errors.
- by <field1>, <field2>, ...: Optional grouping fields. When present, topk partitions input rows by the listed fields and returns the top k rows within each group. Group order in the output follows the order in which each group is first seen in the input.
Output
topk appends a _rank column to the schema. _rank is 1 for the highest-ranked row, 2 for the next, and so on. With a by clause, _rank restarts at 1 in each group.
Rows are ordered:
- Globally (no by clause): descending by <ranking_field>.
- Per group (by clause): groups appear in input order; within each group, rows are descending by <ranking_field>.
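The per-group ordering can be sketched in Python as follows. This is a hedged model under stated assumptions, not the operator's actual code: the name topk_by, the dict-based rows, and the sample service and endpoint values are all hypothetical.

```python
def topk_by(rows, k, ranking_field, by):
    """Hypothetical model of topk with a by clause."""
    groups = {}  # dicts preserve insertion order, so groups stay in first-seen order
    for row in rows:
        key = tuple(row[field] for field in by)
        groups.setdefault(key, []).append(row)

    out = []
    for members in groups.values():
        # Rank each group independently, descending by the ranking field.
        ordered = sorted(members, key=lambda r: r[ranking_field], reverse=True)
        # _rank restarts at 1 for every group.
        out.extend({**row, "_rank": i} for i, row in enumerate(ordered[:k], start=1))
    return out


rows = [
    {"service": "api", "endpoint": "/a", "_count": 3},
    {"service": "web", "endpoint": "/x", "_count": 9},
    {"service": "api", "endpoint": "/b", "_count": 8},
    {"service": "api", "endpoint": "/c", "_count": 5},
    {"service": "web", "endpoint": "/y", "_count": 1},
]
grouped = topk_by(rows, 2, "_count", ["service"])
summary = [(r["service"], r["endpoint"], r["_rank"]) for r in grouped]
# summary lists the api rows first (api was seen first), then the web rows
```

Note that the api group comes first even though web has the single largest count: group order follows first appearance in the input, and only rows inside a group are sorted by the ranking field.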
Behavior
- topk must appear after an aggregation operator such as count, sum, avg, or percentiles. Using topk before an aggregation returns an error.
- Comparison is numeric when both values being compared are numeric (including mixed int and float values); otherwise the values are compared as strings.
- Missing or null ranking values sort to the end, so rows with no value never appear in the top results unless k exceeds the number of non-null rows.
- A <k> larger than the number of input rows is not an error; topk returns all input rows.
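The comparison rules in this list can be illustrated with a small Python comparator. This is a sketch of the rules as written, not the operator's implementation: the names compare_desc and _is_number are hypothetical, and None stands in for a missing or null value.

```python
from functools import cmp_to_key
from numbers import Number


def _is_number(value):
    # bool is a subclass of int in Python; exclude it from "numeric" here.
    return isinstance(value, Number) and not isinstance(value, bool)


def compare_desc(a, b):
    """Return a negative number when a should rank before b."""
    if a is None or b is None:
        # Missing values sort to the end; two missing values tie.
        return (a is None) - (b is None)
    if _is_number(a) and _is_number(b):
        # Numeric comparison, including mixed int and float.
        return (b > a) - (b < a)
    # Otherwise compare the string forms.
    sa, sb = str(a), str(b)
    return (sb > sa) - (sb < sa)


numeric = sorted([2, None, 10, 7.5], key=cmp_to_key(compare_desc))
textual = sorted(["10", "9", "2"], key=cmp_to_key(compare_desc))
# numeric puts 10 first and None last; textual puts "9" first and "10" last
```

The second result shows why the distinction matters: a ranking field that arrives as text is ordered lexically, so "9" outranks "10". Because the rule is applied pairwise, a column that mixes numbers with non-numeric strings may not have a single consistent order in this sketch; how the operator resolves that case is not stated above.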
Example: Top N across all results
Return the 5 sources producing the most logs:
* | count by source | topk(5, _count)
The result has at most 5 rows ranked by _count descending, with _rank values from 1 up to the number of rows returned.