-
Notifications
You must be signed in to change notification settings - Fork 2.8k
[receiver/prometheusreceiver] fix prom receiver config reload #40780
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
if cfg.ContainsScrapeConfigs() { | ||
return nil | ||
} | ||
|
||
// only reload if there are no scrape configs, implying that there is a target_allocator. | ||
// reloading existing scrape configs corrupts authentication configurations |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if cfg.ContainsScrapeConfigs() { | |
return nil | |
} | |
// only reload if there are no scrape configs, implying that there is a target_allocator. | |
// reloading existing scrape configs corrupts authentication configurations |
TL;DR;
Add these lines to reloadPromConfig before Marshal call instead at the top:
origMarshalSecretValue := commonconfig.MarshalSecretValue
commonconfig.MarshalSecretValue = true
defer func() {
commonconfig.MarshalSecretValue = origMarshalSecretValue
}()
Longer:
I've tried to understand what's going on here and in the failing simplreprometheusreceiver.
The reason why reloadPromConfig fails on reload is that it's taking the already UnMarshaled Prometheus config, Marshals it and that Marshal replaces sensitive information with <secret>
. So when it then loads the config again, the secret is wrong. So by commenting out this line we fix that.
And I tested that start/reload of the collector does work even if I have static scrape configs, since start/reload unmarshalls the whole collector config into a more generic collector config, which unmarshalls and loads the Prometheus config in it in a generic way. And the difference is that in that case reloadPromConfig takes a generic map[string]any
as input , not the Prometheus config type, so the Prometheus marshalling doesn't kick in, meaning the secrets aren't covered up.
In the simpleprometheusreceiver case we are giving Prometheus config to the receiver directly without it being loaded from marshalled text, so it would be reloadPromConfig's job to load it, but that never happens.
The most dumb thing I can think of is to not create the Prometheus config in the simpleprometheusreceiver as a struct, but as a string and load it with Prometheus config load. Basically avoid having to use Marshal on the Prometheus config since that scrambles the secrets.
By the way, this means that currently secrets don't work at all in the simpleprometheusreceiver.
However I then realized that there's a global variable you can set to turn off marshalling secrets to secret
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure if we can have multiple receivers at the same time, maybe it's simpler to just set that variable to true all the time.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've discussed this in the OTEL Prometheus WG meeting.
The problem with using the global variable to not scramble the secrets is that concurrent goroutines might expose the secrets by mistake (e.g. /api/v1/status/config of Prometheus API). So we need to come up with a different solution here or in the simpleprometheusreceiver.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've tested the current solution in the PR (early return) with the simpleprometheusreceiver for real (not in unit test), and I get the same error I we get in CI:
2025/06/19 14:42:21 collector server run finished with error: cannot start pipelines: failed to start "prometheus_simple" receiver: could not get scrape configs: scrape config cannot be fetched, main config was not validated and loaded correctly; should not happen
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've started to implement rendering YAML via basic types in the simpleprometheusreceiver so that I can use prometheus/config.Load on it, but it becomes complicated very fast when it comes to http config:
// Make a dumb representation so we can render the Prometheus
// config into YAML without secrets being replaced with
// <secret>. After that we can use the Prometheus code to load
// the YAML properly.
rawConfig := map[string]any{}
rawConfig["global"] = config.DefaultGlobalConfig
rawScrapeConfig := map[string]any{}
rawScrapeConfig["scrape_interval"] = model.Duration(cfg.CollectionInterval)
rawScrapeConfig["scrape_timeout"] = model.Duration(cfg.CollectionInterval)
rawScrapeConfig["job_name"] = jobName
rawScrapeConfig["honor_timestamps"] = true
rawScrapeConfig["scheme"] = scheme
rawScrapeConfig["metrics_path"] = cfg.MetricsPath
rawScrapeConfig["params"] = cfg.Params
rawScrapeConfig["static_configs"] = []map[string]any{
{
"targets": []string{cfg.Endpoint},
"labels": labels,
},
}
rawConfig["scrape_configs"] = []any{rawScrapeConfig}
renderedConfig, err := yaml.Marshal(rawConfig)
if err != nil {
return nil, err
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Created issue in Prometheus for a long term solution: prometheus/prometheus#16756
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for taking the lead on this, I am well out of my depth here.
Feel free to close this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
From what I understood about the problem which could very well be wrong:
The issue is when marshalling the config, the fields of type Secret
are replaced with a placeholder value, so they're not leaked.
The Reload()
method was responsible for marshalling the config and then loading it again. This seems to not effect configs if they didn't contain credentials backed as type Secret
. If the initial initial load works fine, is it possible to detect whether a config will obfuscate credentials during marshalling, and skip reloading/marshalling in those cases?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Attempted this solution in c8322a6
(It does fix the simpleprometheusreceiver unit tests)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Your workaround seems more elegant than mine, but I'm not sure if there's more usages of type Secret
that haven't been accounted for.
Description
Fixing a bug where reloading the Prometheus config, caused config entries of type Secret to be replaced with the literal value of
<secret>
.Link to tracking issue
Fixes #40520
Testing
Reload()
method doesn't mask secrets for existingscrape_configs
target_allocator
config, which was the original reason for theReload()
method, which introduced the bug (https://github.com/open-telemetry/opentelemetry-collector-contrib/pull/40103/files#r2097520578)Error: cannot start pipelines: failed to start "prometheus" receiver: Get "http://my-targetallocator-service/scrape_configs": dial tcp: lookup my-targetallocator-service: no such host
which matches existing behaviour for thetarget_allocator
configDocumentation
N/A