
Unable to remove items from persistent queue storage if the storage is full #7198

@swiatekm


Describe the bug
The persistent queue removes items from storage after they're successfully exported. This removal happens in a transaction which also updates the list of currently dispatched items. Depending on the implementation details of the underlying storage, this transaction may fail if the storage device is full.

As a result, items can be taken out of the queue but are never actually removed from storage, so the storage stays full and no new items can be put in.
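To make the failure mode concrete, here is a minimal Go sketch under stated assumptions. `fakeStorage`, `op`, `removeItem` and the key names (`item0`, `dispatched`, `other`) are hypothetical and are not the collector's storage API. The store is assumed to journal every operation in a transaction (key plus new value) before committing, which is the behavior suspected of transactional engines in general. The scenario removes an exported item together with a dispatched-list update in one transaction after a second writer on the same volume has used up the remaining space.

```go
package main

import (
	"errors"
	"fmt"
	"strings"
)

// op is one operation in a transaction: a set, or a delete when del is true.
type op struct {
	key   string
	value []byte
	del   bool
}

// fakeStorage is a hypothetical transactional key-value store with a fixed
// byte budget. Before committing, it must journal every operation (key plus
// new value), so even a transaction that only frees data needs free space.
type fakeStorage struct {
	capacity int
	data     map[string][]byte
}

// batch applies all ops atomically, or none of them if the journal won't fit.
func (s *fakeStorage) batch(ops ...op) error {
	need := 0 // committed bytes plus this transaction's journal
	for k, v := range s.data {
		need += len(k) + len(v)
	}
	for _, o := range ops {
		need += len(o.key) + len(o.value)
	}
	if need > s.capacity {
		return errors.New("storage full")
	}
	for _, o := range ops {
		if o.del {
			delete(s.data, o.key)
		} else {
			s.data[o.key] = o.value
		}
	}
	return nil
}

// removeItem deletes an exported item and rewrites the dispatched list in a
// single transaction, as described in the issue. The key names are made up.
func removeItem(s *fakeStorage, key string, stillDispatched []string) error {
	return s.batch(
		op{key: key, del: true},
		op{key: "dispatched", value: []byte(strings.Join(stillDispatched, ","))},
	)
}

// must panics on setup errors, which this sketch does not expect.
func must(err error) {
	if err != nil {
		panic(err)
	}
}

func main() {
	s := &fakeStorage{capacity: 64, data: map[string][]byte{}}
	must(s.batch(op{key: "item0", value: make([]byte, 20)}))     // enqueue: 25 of 64 bytes used
	must(s.batch(op{key: "dispatched", value: []byte("item0")})) // hand to exporter: 40 used
	must(s.batch(op{key: "other", value: make([]byte, 19)}))     // another queue fills the rest: 64 used

	fmt.Println(removeItem(s, "item0", nil))                       // storage full
	fmt.Println(s.batch(op{key: "item1", value: make([]byte, 1)})) // storage full
	_, stillThere := s.data["item0"]
	fmt.Println(stillThere) // true
}
```

Because the delete and the dispatched-list update share one transaction, the whole removal is rejected: the exported item stays on disk, and the next put fails for the same reason.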

Steps to reproduce
See the unit test in the linked PR.

Additional context
I've confirmed that filestorage can behave this way via the following test: open-telemetry/opentelemetry-collector-contrib@dbe3105. I suspect that this will be true of any transactional storage engine, as some amount of transaction data needs to be persisted to disk before it can be committed.
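Log-structured engines typically show the same effect, because a delete is written as a new tombstone record rather than by freeing bytes in place. The Go sketch below illustrates that idea with a hypothetical `logStore` type; it is not filestorage's actual engine, the one-byte record header is an invented cost, and compaction (which would eventually reclaim space) is omitted.

```go
package main

import (
	"errors"
	"fmt"
)

// record is one entry in a hypothetical append-only log; a delete is stored
// as a tombstone record instead of erasing earlier bytes.
type record struct {
	key       string
	value     []byte
	tombstone bool
}

// logStore is a toy log-structured store with a fixed byte budget.
// Compaction is omitted, so used space only ever grows.
type logStore struct {
	capacity, used int
	log            []record
}

var errNoSpace = errors.New("no space left for log record")

// appendRecord writes r to the log, failing if it does not fit.
func (s *logStore) appendRecord(r record) error {
	size := len(r.key) + len(r.value) + 1 // +1: made-up per-record header byte
	if s.used+size > s.capacity {
		return errNoSpace
	}
	s.log = append(s.log, r)
	s.used += size
	return nil
}

func (s *logStore) Set(k string, v []byte) error { return s.appendRecord(record{key: k, value: v}) }
func (s *logStore) Delete(k string) error        { return s.appendRecord(record{key: k, tombstone: true}) }

// Get scans the log backwards for the newest record for k.
func (s *logStore) Get(k string) ([]byte, bool) {
	for i := len(s.log) - 1; i >= 0; i-- {
		if s.log[i].key == k {
			if s.log[i].tombstone {
				return nil, false
			}
			return s.log[i].value, true
		}
	}
	return nil, false
}

func main() {
	s := &logStore{capacity: 16}
	fmt.Println(s.Set("a", make([]byte, 14))) // <nil>: 1+14+1 = 16 bytes used
	fmt.Println(s.Delete("a"))                // no space left for log record
	_, ok := s.Get("a")
	fmt.Println(ok) // true: the delete was never recorded
}
```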

How often this can happen in practice is difficult to estimate. It depends heavily on how the size of queue items aligns with available disk space. Anecdotally, I've seen it happen during an incident, on a volume with multiple queues sharing space.

Labels: bug (Something isn't working)

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions