Topk operator

Table of Contents

Syntax
Output
Behavior
Example: Top N across all results
Example: Top N per group
Example: Ranking by an aliased aggregation

The topk operator keeps the highest-ranked rows of an aggregation result. Use it to answer questions like "the top 5 hosts by error count" or "the top 3 endpoints by latency per service".

topk runs after an aggregation. It sorts the aggregated rows in descending order by a ranking field, keeps at most k rows, and adds a _rank column starting at 1. With a by clause, topk ranks within each group independently and returns the top k rows for every group.
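The sort, truncate, and rank steps can be modeled in a few lines of Python. This is a minimal sketch of the semantics described above, not the query engine's implementation. The function name and the list-of-dicts row representation are assumptions for illustration, and it assumes every ranking value is a non-null number.

```python
from typing import Any


def topk(rows: list[dict[str, Any]], k: int, ranking_field: str) -> list[dict[str, Any]]:
    """Model of a global topk: sort descending, keep at most k rows, add _rank."""
    if k < 1:
        raise ValueError("k must be >= 1")
    # sorted() is stable, so tied rows keep their input order. Tie-breaking is
    # an assumption here; the operator's documentation does not specify it.
    ordered = sorted(rows, key=lambda r: r[ranking_field], reverse=True)
    # Slicing past the end is safe, so a k larger than the input keeps all rows.
    return [{**row, "_rank": i} for i, row in enumerate(ordered[:k], start=1)]


# Hypothetical aggregation output, shaped like the result of "count by source".
counts = [
    {"source": "api", "_count": 40},
    {"source": "web", "_count": 75},
    {"source": "db", "_count": 12},
]
top2 = topk(counts, 2, "_count")
all_rows = topk(counts, 10, "_count")
```

Here top2 holds the web row with _rank 1 followed by the api row with _rank 2, and all_rows holds all three input rows because k exceeds the input size.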

Syntax

| topk(<k>, <ranking_field>)
| topk(<k>, <ranking_field>) by <field1>, <field2>, ...


<k>: Positive integer (>= 1). The maximum number of rows to keep, or the maximum per group when a by clause is present. If fewer rows exist than k, all rows are returned.
<ranking_field>: The field used to rank rows in descending order. This is typically an aggregation output column such as _count, or an aliased aggregation like errors from count as errors.
by <field1>, <field2>, …: Optional grouping fields. When present, topk partitions input rows by the listed fields and returns the top k rows within each group. Group order in the output follows the order in which each group is first seen in the input.

Output

topk appends a _rank column to the schema. _rank is 1 for the highest-ranked row, 2 for the next, and so on. With a by clause, _rank restarts at 1 in each group.

Rows are ordered:

Globally (no by clause): descending by <ranking_field>.
Per group (by clause): groups appear in input order; within each group, rows are descending by <ranking_field>.
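The per-group ordering can be illustrated with a small Python model. This is a sketch under stated assumptions, not the engine's implementation: the function name topk_by and the sample rows are hypothetical, and ranking values are assumed to be non-null numbers. A dict keeps keys in insertion order, which gives the "first seen in the input" group order described above.

```python
from typing import Any


def topk_by(rows: list[dict[str, Any]], k: int, ranking_field: str,
            by: list[str]) -> list[dict[str, Any]]:
    """Model of topk with a by clause: rank independently inside each group."""
    groups: dict[tuple, list[dict[str, Any]]] = {}
    for row in rows:
        key = tuple(row[f] for f in by)
        # Insertion order of the dict records the order groups are first seen.
        groups.setdefault(key, []).append(row)

    out: list[dict[str, Any]] = []
    for members in groups.values():
        ordered = sorted(members, key=lambda r: r[ranking_field], reverse=True)
        # _rank restarts at 1 for every group.
        out.extend({**row, "_rank": i} for i, row in enumerate(ordered[:k], start=1))
    return out


# Hypothetical aggregated rows; the 10:05 bucket is seen first in the input.
rows = [
    {"_timeslice": "10:05", "source": "api", "_count": 3},
    {"_timeslice": "10:00", "source": "web", "_count": 9},
    {"_timeslice": "10:05", "source": "db", "_count": 8},
    {"_timeslice": "10:00", "source": "api", "_count": 4},
    {"_timeslice": "10:00", "source": "db", "_count": 6},
]
result = topk_by(rows, 2, "_count", ["_timeslice"])
```

In result, both 10:05 rows come first (db, then api), followed by the top two 10:00 rows (web, then db). The 10:00 api row is dropped because it ranks third in its group.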

Behavior

topk must appear after an aggregation operator such as count, sum, avg, or percentiles. Using topk before an aggregation returns an error.
Comparison is numeric when both values being compared are numeric (including mixed int and float values); otherwise the values are compared as strings.
Missing or null ranking values sort to the end, so rows with no value never appear in the top results unless k exceeds the number of non-null rows.
A <k> larger than the number of input rows is not an error; topk returns all input rows.
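The comparison and null-handling rules above can be sketched as a Python comparator. This models the described behavior only. The name compare_desc is hypothetical, and using Python's str() for the string form of a value is an assumption, since the operator's exact string conversion is not documented here.

```python
from functools import cmp_to_key
from numbers import Real
from typing import Any


def _is_number(value: Any) -> bool:
    # bool is a subclass of int in Python; exclude it from numeric ranking.
    return isinstance(value, Real) and not isinstance(value, bool)


def compare_desc(a: Any, b: Any) -> int:
    """Return a negative number when a ranks before b in descending order."""
    if a is None and b is None:
        return 0
    if a is None:
        return 1   # null sorts to the end
    if b is None:
        return -1
    if _is_number(a) and _is_number(b):
        # Numeric comparison, including mixed int and float values.
        return (b > a) - (b < a)
    # Otherwise compare as strings (assumption: str() gives the string form).
    sa, sb = str(a), str(b)
    return (sb > sa) - (sb < sa)


values = [3, None, 10.5, 7]
ranked = sorted(values, key=cmp_to_key(compare_desc))
```

With these inputs, ranked is [10.5, 7, 3, None]. The string fallback orders values lexicographically, so the string "9" ranks ahead of the string "10" under this model. A ranking field that holds numbers as text can therefore rank differently from the same numbers stored numerically.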

Example: Top N across all results

Return the 5 sources producing the most logs:

* | count by source | topk(5, _count)


The result has at most 5 rows, ordered by _count descending, with _rank values starting at 1. If fewer than 5 sources exist, the highest _rank equals the number of rows returned.

Example: Top N per group

Return the 3 most frequent error sources in each 5-minute window:

level="error"
| timeslice 5m
| count by (_timeslice, source)
| topk(3, _count) by _timeslice


Each _timeslice bucket has up to 3 rows, each carrying a _rank of 1, 2, or 3 for that bucket.

Example: Ranking by an aliased aggregation

topk can rank by any output column, including an aliased aggregation:

* | avg(durationMs) as p_avg by endpoint | topk(10, p_avg)


This returns the 10 slowest endpoints by average duration.