Filelog receiver does not support mixed compression use case #37772

Closed
@verejoel

Description


Component(s)

receiver/filelog

What happened?

Description

A common use-case is to read a live uncompressed log, which is typically rotated on a daily basis into a compressed version with some date suffix.

Now, to support outages of the collector (e.g. let's say the service was stopped for 3 days) we'd like to have a wildcard entry on the include statement to pick up all the logs, and backfill them across the duration of the outage.

As far as I can tell, it is not currently possible to support this use-case in the filelog receiver. We could potentially support it with two separate filelog receivers. However, this is a bad approach: every time the file is rotated, the second receiver ships the entire batch of logs again.

Steps to Reproduce

filelog/frr:
    # wildcard to scoop up raw log `test.log` and rotated logs `test.log.0.gz`
    include: [ "/var/logs/test.log*" ]
    start_at: beginning
    # persistent storage for positions file
    storage: file_storage/frr

Set up a receiver like this and observe that it happily ingests test.log. But when we rotate the log (mv to test.log.0, then gzip), the rotated file is shipped with binary content in the string. It is recognized as a new file, so the data is shipped twice (the second time as gzipped data).
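
A likely explanation for the rotated file being treated as new: as far as I understand, the receiver identifies files by a fingerprint of their leading bytes, and gzip rewrites those bytes. The Python sketch below only illustrates that effect. The `fingerprint` helper, the hash, and the 1000-byte size are assumptions for illustration, not the receiver's actual code.

```python
import gzip
import hashlib

FINGERPRINT_SIZE = 1000  # assumed size, for illustration only


def fingerprint(data: bytes, size: int = FINGERPRINT_SIZE) -> str:
    """Hash the leading bytes of a file's content (hypothetical helper)."""
    return hashlib.sha256(data[:size]).hexdigest()


# Stand-in for the live log content before rotation.
raw = b"".join(b"2025-02-07 line %d\n" % i for i in range(200))

# Stand-in for what `gzip test.log.0` leaves on disk after rotation.
rotated = gzip.compress(raw)

# The leading bytes changed, so a byte-level fingerprint no longer matches.
same_file = fingerprint(raw) == fingerprint(rotated)
print(same_file)
```

Under these assumptions the comparison fails, which matches the observed behaviour: the rotated file looks like a file that has never been read.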

Setting compression: gzip results in the raw log not being ingested, and the following errors:

{"level":"error","ts":"2025-02-07T14:50:45.977+0100","caller":"reader/reader.go:82","msg":"failed to create gzip reader","service.name":"otelcol-osag","kind":"receiver","name":"filelog/test","data_type":"logs","component":"fileconsumer","path":"/tmp/test.log","error":"gzip: invalid header"}

Expected Result

I would expect this use-case to be supported. When the log is rotated and compressed, the compressed data should not be shipped again.

Actual Result

We cannot watch compressed and uncompressed logs at the same time.

Collector version

0.118.0

Environment information

Environment

Running on K8s 1.30

OpenTelemetry Collector configuration

Log output

Additional context

This could probably be solved by inspecting the file MIME type, and inspecting the uncompressed fingerprint to determine if the file has been seen before.
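
A rough sketch of that idea in Python, not a proposal for the actual implementation. It sniffs the gzip magic number (a simpler stand-in for a full MIME type lookup) and fingerprints the decompressed leading bytes. The names `is_gzip` and `content_fingerprint`, the hash, and the 1000-byte size are all hypothetical.

```python
import gzip
import hashlib
import io

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream (RFC 1952)
FINGERPRINT_SIZE = 1000   # assumed size, for illustration only


def is_gzip(head: bytes) -> bool:
    """Return True if the leading bytes look like a gzip stream."""
    return head.startswith(GZIP_MAGIC)


def content_fingerprint(stream, size: int = FINGERPRINT_SIZE) -> str:
    """Fingerprint the first `size` bytes of the *uncompressed* content."""
    head = stream.read(len(GZIP_MAGIC))
    stream.seek(0)
    if is_gzip(head):
        # Decompress only as much as is needed for the fingerprint.
        with gzip.GzipFile(fileobj=stream) as gz:
            data = gz.read(size)
    else:
        data = stream.read(size)
    return hashlib.sha256(data).hexdigest()


# The live log and its rotated, compressed copy yield the same fingerprint.
raw = b"".join(b"2025-02-07 line %d\n" % i for i in range(200))
live = content_fingerprint(io.BytesIO(raw))
rotated = content_fingerprint(io.BytesIO(gzip.compress(raw)))
print(live == rotated)
```

With a check like this, the rotated file could be matched against the stored position of the live log instead of being read again from the start. One caveat: a plain-text log that happens to begin with the bytes `1f 8b` would be misdetected, so a real implementation would need a fallback when decompression fails.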
