Reading multiline log files using OpenTelemetry Collector’s filelogreceiver

Imagine a log file with the following content, which we want to ship to Grafana Loki, OpenSearch, or a similar tool using OpenTelemetry so that we can start analyzing the logs in near real time:

[2024-10-29 20:50:10.687] DEBUG Func process
[2024-10-29 20:52:10.687] DEBUG Func process
[2024-10-29 20:57:10.687] DEBUG Func analyzing
Function() row 6, col 12
http GET, 200

To read this file, we set up our OpenTelemetry Collector to use the filelogreceiver (configured under the filelog key) like this:

receivers:
  filelog:
    include: [ 'mylogfiles*' ]
    multiline:
      line_start_pattern: '^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\]'
    operators:
      - type: regex_parser
        regex: '\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\] (?P<sev>[A-Z]*) (?P<msg>[\s\S]*?\z)'
        timestamp:
          parse_from: attributes.time
          layout: '%Y-%m-%d %H:%M:%S.%L'
        severity:
          parse_from: attributes.sev

The multiline setting tells the OTel Collector that a log record can span several lines and that every record starts with a date and time wrapped in square brackets. Lines that do not match line_start_pattern are appended to the record before them. Each record is then passed to a regex_parser operator, which splits it into a timestamp, a severity, and a message.
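
To make the grouping concrete, here is what the two patterns yield for the sample file above when they are applied by hand. The YAML layout is only an illustration chosen for this post, not the Collector's actual output format, and it assumes the receiver's default whitespace trimming:

```yaml
# Illustration only: one entry per record found by line_start_pattern,
# with the named groups captured by the regex_parser.
- time: '2024-10-29 20:50:10.687'
  sev: DEBUG
  msg: 'Func process'
- time: '2024-10-29 20:52:10.687'
  sev: DEBUG
  msg: 'Func process'
- time: '2024-10-29 20:57:10.687'
  sev: DEBUG
  # The two continuation lines are part of this record's message.
  msg: |-
    Func analyzing
    Function() row 6, col 12
    http GET, 200
```

The last record has no following start line to close it, so the receiver should emit it only after its force_flush_period (500ms by default, as far as I know) has passed without new data in the file.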

Because the regex uses named groups, such as ?P<time>, each captured value is stored as an attribute under the group's name. It can then be referenced through parse_from, for example as attributes.time.
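
Other operators can refer to the captured attributes in the same way. As a minimal sketch, assuming we want the record body to hold only the message text instead of the raw line, a move operator could be appended after the regex_parser. This step is optional and not part of the original setup:

```yaml
    operators:
      # ... the regex_parser from the config above goes here ...
      # Optional: replace the raw body with the parsed message text.
      - type: move
        from: attributes.msg
        to: body
```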

If everything works as it should, the OTel Collector will pick up all messages from our log files, and we can get them into Grafana or similar tools.
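
The receiver only takes effect once it is wired into a logs pipeline together with at least one exporter. Below is a minimal sketch of a complete config. The debug exporter is meant for a first test, and http://loki:3100/otlp is a hypothetical OTLP/HTTP endpoint standing in for your own backend. The start_at setting is also an assumption, added so that lines already in the file are read:

```yaml
receivers:
  filelog:
    include: [ 'mylogfiles*' ]
    # Assumption: read existing content too; the default (end) only tails new lines.
    start_at: beginning
    multiline:
      line_start_pattern: '^\[\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3}\]'
    operators:
      - type: regex_parser
        regex: '\[(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}\.\d{3})\] (?P<sev>[A-Z]*) (?P<msg>[\s\S]*?\z)'
        timestamp:
          parse_from: attributes.time
          # %L is the millisecond directive in the Collector's strptime layout.
          layout: '%Y-%m-%d %H:%M:%S.%L'
        severity:
          parse_from: attributes.sev

exporters:
  # Prints each log record to the Collector's own output, useful for checking the parsing.
  debug:
    verbosity: detailed
  # Hypothetical endpoint; replace it with your backend's OTLP/HTTP address.
  otlphttp:
    endpoint: http://loki:3100/otlp

service:
  pipelines:
    logs:
      receivers: [filelog]
      exporters: [debug, otlphttp]
```

The filelog receiver ships with the contrib distribution, so a config like this would typically be started with something like otelcol-contrib --config config.yaml.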