Description
Component(s)
receiver/filelog
What happened?
Description
A common use case is to read a live, uncompressed log that is rotated (typically daily) into a compressed file with a date suffix.
To survive collector outages (say the service was stopped for 3 days), we'd like a wildcard entry in the include statement that picks up all of these logs and backfills them across the duration of the outage.
As far as I can tell, the filelog receiver cannot currently support this use case. It could potentially be approximated with two separate filelog receivers, but that is a bad approach: every time the file is rotated, the entire batch of logs is shipped again by the second receiver.
Steps to Reproduce
filelog/frr:
  # wildcard to scoop up raw log `test.log` and rotated logs `test.log.0.gz`
  include: [ "/var/logs/test.log*" ]
  start_at: beginning
  # persistent storage for positions file
  storage: file_storage/frr
Set up a receiver like this and observe that it happily ingests test.log. But when we rotate the log (mv to test.log.0, then gzip), the rotated log is shipped as well, with binary content in the string. So it is recognized as a new file and its contents are shipped twice (the second time as gzipped data).
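To illustrate why the rotated file looks like a new one, here is a minimal Python sketch. It assumes files are identified by a fingerprint over their first raw bytes; the 1000-byte size and the `raw_fingerprint` helper are assumptions made for illustration, not the receiver's actual code.

```python
import gzip

# Assumed fingerprint size, for illustration only.
FINGERPRINT_SIZE = 1000


def raw_fingerprint(data: bytes, size: int = FINGERPRINT_SIZE) -> bytes:
    """Hypothetical helper: fingerprint = the first `size` raw bytes on disk."""
    return data[:size]


# Contents of a live log, and the same contents after rotation + gzip.
log = b"".join(b"2025-02-07T14:50:%02d line %d\n" % (i % 60, i) for i in range(100))
rotated = gzip.compress(log)

# The raw bytes of the gzipped file do not match the original log,
# so a byte-level fingerprint treats it as an unseen file.
print(raw_fingerprint(log) == raw_fingerprint(rotated))  # False

# The gzipped file starts with the gzip magic number instead of log text.
print(rotated[:2] == b"\x1f\x8b")  # True
```

The data is identical once decompressed, which is why comparing fingerprints of the uncompressed content would recognize the file.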
Setting compression: gzip
results in the raw log not being ingested, and in the following errors:
{"level":"error","ts":"2025-02-07T14:50:45.977+0100","caller":"reader/reader.go:82","msg":"failed to create gzip reader","service.name":"otelcol-osag","kind":"receiver","name":"filelog/test","data_type":"logs","component":"fileconsumer","path":"/tmp/test.log","error":"gzip: invalid header"}
Expected Result
I would expect this use-case to be supported. When the log is rotated and compressed, the compressed data should not be shipped again.
Actual Result
We cannot watch compressed and uncompressed logs at the same time.
Collector version
0.118.0
Environment information
Environment
Running on Kubernetes 1.30.
OpenTelemetry Collector configuration
Log output
Additional context
This could probably be solved by inspecting the file's MIME type and fingerprinting the uncompressed content to determine whether the file has been seen before.
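A rough sketch of that idea in Python, offered as an illustration rather than a proposed implementation. It detects gzip by its two-byte magic number (a stand-in for a full MIME check) and fingerprints the decompressed content. The names `is_gzip`, `content_fingerprint`, and `demo`, the 1000-byte fingerprint size, and the exact-match lookup are all assumptions.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"   # first two bytes of any gzip stream
FINGERPRINT_SIZE = 1000    # assumed size, for illustration only


def is_gzip(path: str) -> bool:
    """Detect gzip by content (magic number), not by file extension."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


def content_fingerprint(path: str, size: int = FINGERPRINT_SIZE) -> bytes:
    """Fingerprint the first `size` bytes of the *uncompressed* content."""
    opener = gzip.open if is_gzip(path) else open
    with opener(path, "rb") as f:
        return f.read(size)


def demo() -> bool:
    """Ingest a live log, rotate + gzip it, then check if it is recognized."""
    seen = set()
    with tempfile.TemporaryDirectory() as d:
        live = os.path.join(d, "test.log")
        with open(live, "wb") as f:
            f.write(b"first line\nsecond line\n")
        seen.add(content_fingerprint(live))  # live file is ingested

        # Simulate rotation: compress to test.log.0.gz and remove the original.
        rotated = os.path.join(d, "test.log.0.gz")
        with open(live, "rb") as src, gzip.open(rotated, "wb") as dst:
            dst.write(src.read())
        os.remove(live)

        # The rotated file matches a known fingerprint, so it can be skipped.
        return content_fingerprint(rotated) in seen


print(demo())  # True
```

The sketch uses exact matching on a file that no longer grows. A real implementation would also need to handle files that grew between the last read and rotation, for example by comparing fingerprint prefixes and resuming from the stored offset in the uncompressed stream.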