Prometheus Remote Write Exporter not sending data for 15min at a time #39987

Open
@obs-gh-mattcotter

Component(s)

exporter/prometheusremotewrite

Describe the issue you're reporting

We had a recurring issue with the Prometheus Remote Write Exporter (PRWE): all metrics would fail to send for 15 minutes at a time, and then the exporter would recover on its own. The stats published by the collector's internal telemetry showed otelcol_exporter_sent_metric_points_total drop to 0, while the queue size (otelcol_exporter_queue_size) remained at 0. The HTTP endpoint that the PRWE was configured to send to did not receive any traffic. We enabled debug logging, but the PRWE did not appear to write any debug logs. The error message we saw was:

error	internal/queue_sender.go:46	Exporting failed. Dropping data.	{"otelcol.component.id": "prometheusremotewrite/observe", "otelcol.component.kind": "Exporter", "otelcol.signal": "metrics", "error": "Permanent error: Permanent error: context deadline exceeded; Permanent error: Permanent error: context deadline exceeded", "errorCauses": [{"error": "Permanent error: Permanent error: context deadline exceeded"}, {"error": "Permanent error: Permanent error: context deadline exceeded"}], "dropped_items": 638}
	/home/runner/work/observe-agent/observe-agent/vendor/go.opentelemetry.io/collector/exporter/exporterqueue/async_queue.go:47
go.opentelemetry.io/collector/exporter/exporterqueue.(*asyncQueue[...]).Start.func1
	/home/runner/work/observe-agent/observe-agent/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/internal/batcher/disabled_batcher.go:23
go.opentelemetry.io/collector/exporter/exporterhelper/internal/batcher.(*disabledBatcher[...]).Consume
	/home/runner/work/observe-agent/observe-agent/vendor/go.opentelemetry.io/collector/exporter/exporterhelper/internal/queue_sender.go:46
go.opentelemetry.io/collector/exporter/exporterhelper/internal.NewQueueSender.func1

This issue was present in v0.118.0; after we upgraded to v0.124.0, it appears to be fixed. I would like to better understand the root cause, since it was never listed as a bug fix under the PRWE (my guess is that it was related to an exporter helper fix). Can anyone point me to what the fix might have been? Thank you!
