Derive log facets
You can add a tokenizer to derive log facets during ingestion into the Kloudfuse observability platform.
In addition to the log facets that it extracts automatically, Kloudfuse can derive log facets during ingestion through a custom tokenizer. This is typically useful when you must capture a string in the log event as a named log facet that is not derived automatically.
When you parse this log line:
10.12.0.35 - - [26/May/2021:18:59:10 +0000] "GET /unavailable HTTP/1.1" 503 21 "-" "hey/0.0.1"
using this tokenizer:
'%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" %{responseCode} %{contentLength}'
Kloudfuse generates these log facets: sourceIp: 10.12.0.35, requestMethod: GET, responseCode: 503, and contentLength: 21.
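To check a tokenizer against a sample line before deploying it, the matching can be approximated locally. The following Python sketch is not the Kloudfuse implementation; the function names `tokenizer_to_regex` and `dissect` are hypothetical. It assumes that each `%{name}` field captures text up to the next literal delimiter, that `%{_}` matches text without storing it, and that a field at the very end of the tokenizer stops at the next whitespace. That last assumption is chosen only because it reproduces the contentLength value listed above; the real parser may behave differently.

```python
import re

# Matches a %{name} placeholder in a dissect-style tokenizer.
FIELD = re.compile(r"%\{(\w+)\}")


def tokenizer_to_regex(tokenizer: str) -> re.Pattern:
    """Translate a dissect-style tokenizer into a regex (illustrative only)."""
    pattern = []
    pos = 0
    for m in FIELD.finditer(tokenizer):
        # Literal text between placeholders must match exactly.
        pattern.append(re.escape(tokenizer[pos:m.start()]))
        # Assumption: a trailing field ends at whitespace; other fields
        # end at the next literal delimiter (lazy match).
        is_last = m.end() == len(tokenizer)
        body = r"\S+" if is_last else r".+?"
        name = m.group(1)
        # %{_} consumes text but is not kept as a facet.
        pattern.append(f"(?:{body})" if name == "_" else f"(?P<{name}>{body})")
        pos = m.end()
    pattern.append(re.escape(tokenizer[pos:]))
    return re.compile("".join(pattern))


def dissect(tokenizer: str, line: str) -> dict:
    """Return the named fields captured from line, or {} if it does not match."""
    m = tokenizer_to_regex(tokenizer).match(line)
    return m.groupdict() if m else {}


TOKENIZER = '%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" %{responseCode} %{contentLength}'
LINE = '10.12.0.35 - - [26/May/2021:18:59:10 +0000] "GET /unavailable HTTP/1.1" 503 21 "-" "hey/0.0.1"'

print(dissect(TOKENIZER, LINE))
```

The sketch returns every named field in the tokenizer, so its result also includes timestamp and uri alongside the four facets listed above.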
Apply tokenizer
You can apply the tokenizer to incoming log lines based on source and line filters by configuring the logs-parser section of the values.yaml file, and then running a helm upgrade.
This values.yaml demonstrates how to apply a conditional pattern to an incoming log event. Add the pipeline: section from this example alongside any values that already exist in the logs-parser section.
logs-parser: # (1)
  pipeline: # (2)
    configPath: "/conf" # (3)
    config: |- # (4)
      - nginx: # (5)
          - pipeline: # (6)
              - func: dissect # (7)
                params:
                  - tokenizer: '%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" %{responseCode} %{contentLength}' # (8)
      - pinot: # (9)
          - pipeline:
              - if: 'msg contains "LLRealtimeSegmentDataManager_"' # (10)
                then:
                  - func: dissect
                    params:
                      - tokenizer: '%{timestamp} %{level} [LLRealtimeSegmentDataManager_%{segment_name}]'
| 1 | logs-parser specifies the values for the logs-parser Helm chart, a sub-chart of the Kloudfuse stack. |
| 2 | pipeline represents the values for the logs-parser pipeline definition. This pipeline definition applies across all sources. A pipeline is a sequence of functions that are applied to an incoming log event to extract and process its contents. |
| 3 | configPath represents the path where the pipeline file is loaded into the logs-parser pod. |
| 4 | config represents the YAML that is written into the pipeline file at configPath. |
| 5 | nginx represents the pipeline applied to the events with label source=nginx. |
| 6 | pipeline represents the pipeline definition for the given source; in this case, nginx. |
| 7 | func: dissect instructs the logs-parser to apply the dissect function, which matches the user-provided pattern against the incoming log line. |
| 8 | tokenizer: 'pattern' is a required argument to func: dissect. It defines the pattern. |
| 9 | pinot is another source-specific pipeline, applied to events with label source=pinot. |
| 10 | if: 'msg contains line-filter' is an optional line filter. When it is present, the tokenizer applies only to log lines that contain the given string. |
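The way the source label and the optional line filter select a tokenizer can be sketched in a few lines of Python. This is a hedged illustration of the behavior described in the callouts, not the logs-parser code. The `PIPELINES` dictionary, its `contains` and `tokenizer` keys, the `derive_facets` function, and the sample pinot log lines are all hypothetical, and the `dissect` helper makes the same simplifying assumptions about field boundaries as any regex-based approximation (a trailing field is assumed to stop at whitespace).

```python
import re

FIELD = re.compile(r"%\{(\w+)\}")


def dissect(tokenizer: str, line: str) -> dict:
    """Approximate a dissect-style match with a regex (illustrative only)."""
    pattern, pos = [], 0
    for m in FIELD.finditer(tokenizer):
        pattern.append(re.escape(tokenizer[pos:m.start()]))
        # Assumption: a trailing field ends at whitespace, others at the next literal.
        body = r"\S+" if m.end() == len(tokenizer) else r".+?"
        name = m.group(1)
        pattern.append(f"(?:{body})" if name == "_" else f"(?P<{name}>{body})")
        pos = m.end()
    pattern.append(re.escape(tokenizer[pos:]))
    match = re.match("".join(pattern), line)
    return match.groupdict() if match else {}


# Hypothetical in-memory mirror of the pipeline config, keyed by source label.
PIPELINES = {
    "nginx": [
        {"tokenizer": '%{sourceIp} - - [%{timestamp}] "%{requestMethod} %{uri} %{_}" %{responseCode} %{contentLength}'},
    ],
    "pinot": [
        {
            "contains": "LLRealtimeSegmentDataManager_",  # optional line filter
            "tokenizer": "%{timestamp} %{level} [LLRealtimeSegmentDataManager_%{segment_name}]",
        },
    ],
}


def derive_facets(source: str, msg: str) -> dict:
    """Apply the pipeline for the given source, honoring optional line filters."""
    facets = {}
    for step in PIPELINES.get(source, []):
        needle = step.get("contains")
        if needle is not None and needle not in msg:
            continue  # the line filter did not match, so skip this tokenizer
        facets.update(dissect(step["tokenizer"], msg))
    return facets


# Made-up pinot log lines, used only to exercise the line filter.
MATCHING = "2021-05-26T18:59:10Z INFO [LLRealtimeSegmentDataManager_events__0__12] Consuming from offset 100"
OTHER = "2021-05-26T18:59:11Z INFO [BrokerRequestHandler] Query completed"

print(derive_facets("pinot", MATCHING))
print(derive_facets("pinot", OTHER))
```

Only the first pinot line contains the filter string, so only that line yields facets; events from a source with no configured pipeline pass through without derived facets.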