Description
Component(s)
cmd/opampsupervisor
What happened?
Description
While writing some tests that run the Supervisor with race detection on I got this report:
==================
WARNING: DATA RACE
Read at 0x00c00038bf18 by goroutine 466:
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor.(*Supervisor).runAgentProcess()
/Users/douglas/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/cmd/[email protected]/supervisor/supervisor.go:1392 +0xe80
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor.(*Supervisor).Start.func1()
/Users/douglas/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/cmd/[email protected]/supervisor/supervisor.go:344 +0x78
Previous write at 0x00c00038bf18 by goroutine 497:
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor.(*Supervisor).handleAgentOpAMPMessage()
/Users/douglas/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/cmd/[email protected]/supervisor/supervisor.go:774 +0x720
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor.(*Supervisor).handleAgentOpAMPMessage-fm()
<autogenerated>:1 +0x4c
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor.flattenedSettings.OnMessage()
/Users/douglas/go/pkg/mod/github.com/open-telemetry/opentelemetry-collector-contrib/cmd/[email protected]/supervisor/server.go:57 +0x6c
github.com/open-telemetry/opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor.flattenedSettings.OnMessage-fm()
<autogenerated>:1 +0x30
github.com/open-telemetry/opamp-go/server.(*server).handleWSConnection()
/Users/douglas/go/pkg/mod/github.com/open-telemetry/[email protected]/server/serverimpl.go:253 +0x3cc
github.com/open-telemetry/opamp-go/server.(*server).httpHandler.gowrap1()
/Users/douglas/go/pkg/mod/github.com/open-telemetry/[email protected]/server/serverimpl.go:204 +0x64
This is pointing to an access to Supervisor.lastHealthFromClient
from inside Supervisor.runAgentProcess()
, which runs in its own go-routine:
opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor/supervisor.go
Lines 1391 to 1396 in d71d656
And a concurrent access also here, from a different go-routine:
opentelemetry-collector-contrib/cmd/opampsupervisor/supervisor/supervisor.go
Lines 772 to 779 in d71d656
This happens without any synchronization, so a data race is possible.
Steps to Reproduce
Set up a Supervisor in a test, start an OpAMP server and let them talk. I think this should be enough. Here's a small snippet that I am using to start the Supervisor:
func setupSupervisor(t *testing.T, opampServerHostPort string) (*supervisor.Supervisor, string) {
t.Helper()
supervisorStorageDir := t.TempDir()
otelcolBinName := "otelcol-contrib"
otelcolBinPath := filepath.Join("..", "..", "bin", otelcolBinName)
if _, err := os.Stat(otelcolBinPath); err != nil {
t.Fatalf("failed to setup supervisor: %v", err)
}
supervisorConfig := supervisorconfig.Supervisor{
Server: supervisorconfig.OpAMPServer{
Endpoint: "http://" + opampServerHostPort + "/v1/opamp",
},
Capabilities: supervisorconfig.Capabilities{
AcceptsRemoteConfig: true,
AcceptsRestartCommand: true,
AcceptsOpAMPConnectionSettings: true,
ReportsEffectiveConfig: true,
ReportsOwnMetrics: true,
ReportsOwnLogs: true,
ReportsOwnTraces: true,
ReportsHealth: true,
ReportsRemoteConfig: true,
ReportsAvailableComponents: true,
},
Agent: supervisorconfig.Agent{
Executable: otelcolBinPath,
ConfigApplyTimeout: 3 * time.Second,
OrphanDetectionInterval: 3 * time.Second,
BootstrapTimeout: 3 * time.Second,
Arguments: []string{"--feature-gates=+service.AllowNoPipelines"},
},
Storage: supervisorconfig.Storage{
Directory: supervisorStorageDir,
},
}
logger, err := zap.NewDevelopment()
if err != nil {
t.Fatalf("failed to create logger: %v", err)
}
sup, err := supervisor.NewSupervisor(logger, supervisorConfig)
if err != nil {
t.Fatalf("failed to create supervisor: %v", err)
}
t.Cleanup(sup.Shutdown)
return sup, supervisorStorageDir
}
Expected Result
No data race. Any data potentially accessed concurrently by different go-routines is protected by mutexes or through the use of atomics.
Actual Result
A data race happens :(
Collector version
Environment information
Environment
OS: macOS
Compiler(if manually compiled): go version go1.24.3 darwin/arm64
OpenTelemetry Collector configuration
Irrelevant.
Log output
Additional context
No response