Description
Component(s)
receiver/filelog
Is your feature request related to a problem? Please describe.
We've had a few users running the filelog receiver in environments where logs don't really "roll" traditionally.
Instead, these users are running in environments where a file is written to until some threshold (time, size), then a completely new file is created and written to instead. In these environments, the old files won't be cleaned up for a long time, and also won't be written to again.
When many old files exist in a directory, we see high CPU utilization. It seems we could reduce this if, instead of reading each file and comparing its fingerprint every poll cycle, we only paid attention to files that grew in size or whose modification time changed.
In benchmarks, I've seen a 25% increase in performance from stat'ing files for mtime and size instead of reading a fingerprint.
Describe the solution you'd like
There are a couple of solutions that have been discussed:
- Extend the sorting logic to allow ordering by the file's modification time directly, instead of only by a regex on the file name.
- Allow the fingerprinting logic to have configurable implementations. The current behavior (first N bytes) would remain the default, but users could opt into a different implementation that identifies a file by its filename and only reads the file when its size has changed.
Describe alternatives you've considered
The existing sorting logic works here for the most part, but it is:
- Difficult for users to configure (requires knowledge of filename pattern, specifying a regex, then specifying a date format)
- Not possible to use in instances where the timestamp is not in the file name
Additional context
No response