Derive log facets

You can add a tokenizer to derive log facets during ingestion into the Kloudfuse observability platform.

In addition to auto-extracted log facets, the system can derive log facets during ingestion through a customized tokenizer. This is typically useful when you must capture a string in the log event as a named log facet that Kloudfuse does not derive automatically.

Use a tokenizer to derive log facets at ingestion

When you parse this log line:

10.12.0.35 - - [26/May/2021:18:59:10 +0000] "GET /unavailable HTTP/1.1" 503 21 "-" "hey/0.0.1"

Using this tokenizer:

'%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" %{responseCode} %{contentLength}'

Kloudfuse generates these log facets: sourceIp: 10.12.0.35, requestMethod: GET, responseCode: 503, and contentLength: 21.
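To make the matching concrete, the following Python sketch emulates a dissect-style tokenizer. It is not the Kloudfuse implementation: the `dissect` helper is hypothetical, and it assumes that literal text must match exactly, that each `%{name}` captures up to the next literal, that `%{_}` discards its capture, and that a trailing key stops at the next whitespace (chosen so that `contentLength` yields `21` as in the example above). The sketch returns every named key, so `timestamp` and `uri` also appear in its output.

```python
import re


def dissect(tokenizer: str, line: str) -> dict:
    """Hypothetical emulation of a dissect-style tokenizer (not Kloudfuse code)."""
    # re.split with a capturing group alternates: literal, key, literal, key, ..., literal
    parts = re.split(r"%\{(\w+)\}", tokenizer)
    pattern = ""
    names = []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            # Literal text between keys must match exactly.
            pattern += re.escape(part)
        else:
            names.append(part)
            # Assumption: a key at the very end of the tokenizer stops at whitespace;
            # every other key captures lazily up to the next literal.
            is_trailing = i == len(parts) - 2 and parts[-1] == ""
            pattern += r"(\S+)" if is_trailing else r"(.*?)"
    match = re.match(pattern, line)
    if match is None:
        return {}
    # Keys named "_" are placeholders and are dropped from the result.
    return {name: value for name, value in zip(names, match.groups()) if name != "_"}


TOKENIZER = '%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" %{responseCode} %{contentLength}'
LINE = '10.12.0.35 - - [26/May/2021:18:59:10 +0000] "GET /unavailable HTTP/1.1" 503 21 "-" "hey/0.0.1"'

facets = dissect(TOKENIZER, LINE)
for name, value in facets.items():
    print(f"{name}: {value}")
```

Trying a candidate pattern against a sample line in this way can help catch quoting mistakes before the tokenizer goes into values.yaml, although the real parser may treat edge cases differently.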

Apply tokenizer

You can apply the tokenizer to incoming log lines based on source and line filters by configuring the logs-parser values.yaml file, and then performing a helm upgrade.

This values.yaml demonstrates how to apply a conditional pattern to an incoming log event. Add the pipeline: section from this example to the logs-parser section, and keep any values that already exist there.

Apply conditional pattern to log events
logs-parser: (1)
  pipeline: (2)
    configPath: "/conf" (3)
    config: |- (4)
      - nginx: (5)
        - pipeline: (6)
          - func: dissect (7)
            params:
              - tokenizer: '%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" %{responseCode} %{contentLength}' (8)
      - pinot: (9)
        - pipeline:
          - if: 'msg contains "LLRealtimeSegmentDataManager_"' #  (10)
            then:
              - func: dissect
                params:
                  - tokenizer: '%{timestamp} %{level} [LLRealtimeSegmentDataManager_%{segment_name}]'
1 logs-parser specifies the values for logs-parser helm chart, a sub-chart of the Kloudfuse stack.
2 pipeline represents the values for the logs-parser pipeline definition. This pipeline definition holds across all sources. A pipeline is a sequence of functions that are applied to an incoming log event to extract and process the log event.
3 configPath represents the path where the pipeline file is loaded into the logs-parser pod.
4 config represents the YAML that is written to the pipeline file at configPath.
5 nginx represents the pipeline applied to the events with label source=nginx.
6 pipeline represents the pipeline definition for the given source; in this case, nginx.
7 func: dissect instructs the logs-parser to apply the dissect function to the incoming log line. It applies the user-provided pattern to the incoming log line.
8 tokenizer: 'pattern' is a required argument to func: dissect. It defines the pattern.
9 pinot is another source-specific pipeline, applied to the events with label source=pinot.
10 if: 'msg contains line-filter' is an optional line filter. When it is present, the tokenizer under then: applies only to log lines that contain the given string.
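The control flow that the callouts describe can be sketched in Python as follows. This only illustrates the routing logic and is not how logs-parser is implemented: the `PIPELINES` structure, the `if_contains` key, the `run_pipeline` and `dissect` helpers, and the sample pinot log line are all hypothetical.

```python
import re


def dissect(tokenizer: str, line: str) -> dict:
    """Hypothetical dissect emulation: literals match exactly, %{name} captures."""
    parts = re.split(r"%\{(\w+)\}", tokenizer)
    pattern, names = "", []
    for i, part in enumerate(parts):
        if i % 2 == 0:
            pattern += re.escape(part)
        else:
            names.append(part)
            # Assumption: a trailing key stops at the next whitespace.
            is_trailing = i == len(parts) - 2 and parts[-1] == ""
            pattern += r"(\S+)" if is_trailing else r"(.*?)"
    match = re.match(pattern, line)
    if match is None:
        return {}
    return {n: v for n, v in zip(names, match.groups()) if n != "_"}


# Hypothetical in-memory mirror of the config: one pipeline per source label.
PIPELINES = {
    "nginx": [
        {"func": "dissect",
         "tokenizer": '%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" '
                      "%{responseCode} %{contentLength}"},
    ],
    "pinot": [
        {"if_contains": "LLRealtimeSegmentDataManager_",
         "then": [
             {"func": "dissect",
              "tokenizer": "%{timestamp} %{level} [LLRealtimeSegmentDataManager_%{segment_name}]"},
         ]},
    ],
}


def run_pipeline(source: str, msg: str) -> dict:
    """Apply the pipeline registered for `source` to one log line."""
    facets = {}
    for step in PIPELINES.get(source, []):
        line_filter = step.get("if_contains")
        if line_filter is None:
            functions = [step]          # unconditional step
        elif line_filter in msg:
            functions = step["then"]    # line filter matched
        else:
            continue                    # line filter did not match: skip the step
        for function in functions:
            if function["func"] == "dissect":
                facets.update(dissect(function["tokenizer"], msg))
    return facets


# Made-up pinot log lines, used only to exercise the line filter.
matched = run_pipeline(
    "pinot",
    "2021-05-26T18:59:10.123Z INFO [LLRealtimeSegmentDataManager_myTable__0__12] Starting consumption",
)
skipped = run_pipeline("pinot", "2021-05-26T18:59:11.000Z INFO [Other] Unrelated message")
print(matched)
print(skipped)
```

In this sketch, the first line passes the filter and yields the `timestamp`, `level`, and `segment_name` keys, while the second line fails the filter and yields nothing.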