Skip to content

[receiver/prometheusreceiver] fix prom receiver config reload #40780

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 5 commits into
base: main
Choose a base branch
from

Conversation

erikburt
Copy link

@erikburt erikburt commented Jun 17, 2025

Description

Fixing a bug where reloading the Prometheus config, caused config entries of type Secret to be replaced with the literal value of <secret>.

Link to tracking issue

Fixes #40520

Testing

Documentation

N/A

@erikburt erikburt requested review from dashpole, ArthurSens and a team as code owners June 17, 2025 18:14
Copy link

linux-foundation-easycla bot commented Jun 17, 2025

CLA Signed

The committers listed above are authorized under a signed CLA.

@github-actions github-actions bot added the receiver/prometheus Prometheus receiver label Jun 17, 2025
Comment on lines 75 to 80
if cfg.ContainsScrapeConfigs() {
return nil
}

// only reload if there are no scrape configs, implying that there is a target_allocator.
// reloading existing scrape configs corrupts authentication configurations
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if cfg.ContainsScrapeConfigs() {
return nil
}
// only reload if there are no scrape configs, implying that there is a target_allocator.
// reloading existing scrape configs corrupts authentication configurations

TL;DR;

Add these lines to reloadPromConfig before Marshal call instead at the top:

origMarshalSecretValue := commonconfig.MarshalSecretValue
commonconfig.MarshalSecretValue = true
defer func() {
	commonconfig.MarshalSecretValue = origMarshalSecretValue
}()

Longer:

I've tried to understand what's going on here and in the failing simplreprometheusreceiver.

The reason why reloadPromConfig fails on reload is that it's taking the already UnMarshaled Prometheus config, Marshals it and that Marshal replaces sensitive information with <secret>. So when it then loads the config again, the secret is wrong. So by commenting out this line we fix that.

And I tested that start/reload of the collector does work even if I have static scrape configs, since start/reload unmarshalls the whole collector config into a more generic collector config, which unmarshalls and loads the Prometheus config in it in a generic way. And the difference is that in that case reloadPromConfig takes a generic map[string]any as input , not the Prometheus config type, so the Prometheus marshalling doesn't kick in, meaning the secrets aren't covered up.

In the simpleprometheusreceiver case we are giving Prometheus config to the receiver directly without it being loaded from marshalled text, so it would be reloadPromConfig's job to load it, but that never happens.

The most dumb thing I can think of is to not create the Prometheus config in the simpleprometheusreceiver as a struct, but as a string and load it with Prometheus config load. Basically avoid having to use Marshal on the Prometheus config since that scrambles the secrets.

By the way, this means that currently secrets don't work at all in the simpleprometheusreceiver.

However I then realized that there's a global variable you can set to turn off marshalling secrets to secret

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure if we can have multiple receivers at the same time, maybe it's simpler to just set that variable to true all the time.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've discussed this in the OTEL Prometheus WG meeting.
The problem with using the global variable to not scramble the secrets is that concurrent goroutines might expose the secrets by mistake (e.g. /api/v1/status/config of Prometheus API). So we need to come up with a different solution here or in the simpleprometheusreceiver.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tested the current solution in the PR (early return) with the simpleprometheusreceiver for real (not in unit test), and I get the same error I we get in CI:

2025/06/19 14:42:21 collector server run finished with error: cannot start pipelines: failed to start "prometheus_simple" receiver: could not get scrape configs: scrape config cannot be fetched, main config was not validated and loaded correctly; should not happen

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've started to implement rendering YAML via basic types in the simpleprometheusreceiver so that I can use prometheus/config.Load on it, but it becomes complicated very fast when it comes to http config:

// Make a dumb representation so we can render the Prometheus
	// config into YAML without secrets being replaced with
	// <secret>. After that we can use the Prometheus code to load
	// the YAML properly.
	rawConfig := map[string]any{}
	rawConfig["global"] = config.DefaultGlobalConfig

	rawScrapeConfig := map[string]any{}
	rawScrapeConfig["scrape_interval"] = model.Duration(cfg.CollectionInterval)
	rawScrapeConfig["scrape_timeout"] = model.Duration(cfg.CollectionInterval)
	rawScrapeConfig["job_name"] = jobName
	rawScrapeConfig["honor_timestamps"] = true
	rawScrapeConfig["scheme"] = scheme
	rawScrapeConfig["metrics_path"] = cfg.MetricsPath
	rawScrapeConfig["params"] = cfg.Params
	rawScrapeConfig["static_configs"] = []map[string]any{
		{
			"targets": []string{cfg.Endpoint},
			"labels": labels,
		},
	}

	rawConfig["scrape_configs"] = []any{rawScrapeConfig}
	renderedConfig, err := yaml.Marshal(rawConfig)
	if err != nil {
		return nil, err
	}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Created issue in Prometheus for a long term solution: prometheus/prometheus#16756

Copy link
Author

@erikburt erikburt Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for taking the lead on this, I am well out of my depth here.

Feel free to close this PR.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From what I understood about the problem which could very well be wrong:

The issue is when marshalling the config, the fields of type Secret are replaced with a placeholder value, so they're not leaked.

The Reload() method was responsible for marshalling the config and then loading it again. This seems to not effect configs if they didn't contain credentials backed as type Secret. If the initial initial load works fine, is it possible to detect whether a config will obfuscate credentials during marshalling, and skip reloading/marshalling in those cases?

Copy link
Author

@erikburt erikburt Jun 19, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Attempted this solution in c8322a6

(It does fix the simpleprometheusreceiver unit tests)

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your workaround seems more elegant than mine, but I'm not sure if there's more usages of type Secret that haven't been accounted for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
receiver/prometheus Prometheus receiver
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[receiver/prometheusreceiver] basic authentication for urls stopped working
3 participants