Description
Component(s)
processor/resourcedetection, processor/resourcedetection/internal/system
What happened?
Description
After the introduction of the host cpuinfo attributes in #26533, system resource detection can fail catastrophically on Windows hosts. When it does, ALL configured system resource attributes (host name, host ID, OS type, OS description, ...) become unavailable in every pipeline that uses that instance of the resourcedetection processor.
The cause is a combination of:
- cpuinfo attribute collection ALWAYS runs in the processor's Start() phase, regardless of whether the cpuinfo attributes are configured to be added to the resource attributes.
- The newly included external dependency introduced by the cpuinfo work in "[processor/resourcedetection] Add support for host cpuinfo attributes" (#26533) uses a mechanism (WMI) for retrieving the CPU info that can often fail with a timeout (hence the context deadline exceeded error).
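The behavior one would expect instead can be sketched as follows. This is only an illustration under stated assumptions, not the processor's actual code: the `Config` struct, the `provider` type, and the `hostID`, `cpuModel`, and `detect` functions are hypothetical names invented for this example. The sketch shows each lookup being gated on its own setting, and a failure in one lookup being recorded without discarding attributes that were already collected.

```go
package main

import (
	"errors"
	"fmt"
)

// Config is a hypothetical stand-in for the detector's resource_attributes settings.
type Config struct {
	HostID  bool
	CPUInfo bool
}

// provider is a hypothetical lookup function for a single attribute value.
type provider func() (string, error)

// hostID simulates a lookup that succeeds.
func hostID() (string, error) { return "example-host-id", nil }

// cpuModel simulates a WMI-backed lookup that fails, e.g. because winmgmt is unavailable.
func cpuModel() (string, error) { return "", errors.New("context deadline exceeded") }

// detect runs only the lookups that are enabled in cfg. A failed lookup is
// reported in the returned error slice but does not drop other attributes.
func detect(cfg Config, getHostID, getCPUModel provider) (map[string]string, []error) {
	attrs := map[string]string{}
	var errs []error

	if cfg.HostID {
		if v, err := getHostID(); err != nil {
			errs = append(errs, fmt.Errorf("failed getting host ID: %w", err))
		} else {
			attrs["host.id"] = v
		}
	}

	// The CPU lookup is skipped entirely unless it was requested.
	if cfg.CPUInfo {
		if v, err := getCPUModel(); err != nil {
			errs = append(errs, fmt.Errorf("failed getting host cpuinfo: %w", err))
		} else {
			attrs["host.cpu.model.name"] = v
		}
	}

	return attrs, errs
}

func main() {
	// CPU info not requested: the failing lookup is never called.
	attrs, errs := detect(Config{HostID: true}, hostID, cpuModel)
	fmt.Println(attrs, errs)

	// CPU info requested but failing: host.id is still reported alongside the error.
	attrs, errs = detect(Config{HostID: true, CPUInfo: true}, hostID, cpuModel)
	fmt.Println(attrs, errs)
}
```

With the configuration from this report (only host.id enabled), a detector structured this way would never touch the CPU lookup, so an unavailable winmgmt service would have no effect on the result.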
The issue is more likely to occur when the Otel Collector starts during host boot (e.g. as a service launched by a service manager) than when it is launched on demand after the Windows host is already running.
This is due to the parallelization of startup tasks (services) in the operating system.
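To make the origin of the context deadline exceeded message concrete, here is a minimal sketch of a deadline-bounded query in Go. It does not reproduce the WMI dependency: `slowQuery`, `queryWithTimeout`, and the two duration constants are hypothetical names chosen for illustration. `slowQuery` stands in for a lookup against a service that is not yet responsive, as may happen while other services are still starting at boot.

```go
package main

import (
	"context"
	"errors"
	"fmt"
	"time"
)

// Hypothetical durations chosen only for this demonstration.
const (
	shortTimeout = 10 * time.Millisecond
	slowLatency  = 200 * time.Millisecond
)

// slowQuery is a hypothetical stand-in for a WMI query. It returns after
// latency has elapsed, or earlier with the context's error if ctx expires.
func slowQuery(ctx context.Context, latency time.Duration) (string, error) {
	select {
	case <-time.After(latency):
		return "cpu-info", nil
	case <-ctx.Done():
		return "", ctx.Err()
	}
}

// queryWithTimeout bounds the query with a deadline and wraps any failure
// with the same prefix that appears in the logs of this report.
func queryWithTimeout(timeout, latency time.Duration) (string, error) {
	ctx, cancel := context.WithTimeout(context.Background(), timeout)
	defer cancel()

	v, err := slowQuery(ctx, latency)
	if err != nil {
		return "", fmt.Errorf("failed getting host cpuinfo: %w", err)
	}
	return v, nil
}

func main() {
	// The query takes longer than the deadline allows, so it fails.
	_, err := queryWithTimeout(shortTimeout, slowLatency)
	fmt.Println(err)
	fmt.Println(errors.Is(err, context.DeadlineExceeded))
}
```

When the simulated service answers within the deadline, for example `queryWithTimeout(slowLatency, 0)`, the same call succeeds, which matches the observation that the problem depends on how busy the host is at startup.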
Steps to Reproduce
- Stop and disable the winmgmt Windows service to simulate the failure condition of not being able to collect the CPU info: sc config winmgmt start= disabled and net stop winmgmt
- Run Otel Collector with a config that includes the system resourcedetection detector with at least one of the configs enabled (e.g. Host ID) in a pipeline.
- Observe Otel Collector's logs
Expected Result
Otel Collector's logs contain the requested attribute (e.g. Host ID) in the resourcedetection processor's log entries
e.g.
2024-06-26T13:31:37.391+1000 info internal/resourcedetection.go:125 began detecting resource information {"kind": "processor", "name": "resourcedetection", "pipeline": "traces"}
2024-06-26T13:31:38.698+1000 info internal/resourcedetection.go:139 detected resource information {"kind": "processor", "name": "resourcedetection", "pipeline": "traces", "resource": {"host.id":"5ac27508-7835-40fd-a8e3-541bb69b8f70","host.name":"DESKTOP-RHABMHR","os.type":"windows","service.name":"windows-dev"}}
Actual Result
Otel Collector's logs contain the error message 'failed getting host cpuinfo:'
(NOTE: the simulated failure condition of completely disabling winmgmt produces a slightly different exception instead of the 'context deadline exceeded' error seen on a production system)
The requested attributes (e.g. Host ID) are missing from the resourcedetection processor's logs
e.g.
2024-06-26T13:38:26.012+1000 warn internal/resourcedetection.go:130 failed to detect resource {"kind": "processor", "name": "resourcedetection", "pipeline": "traces", "error": "failed getting host cpuinfo: Exception occurred. (The service cannot be started, either because it is disabled or because it has no enabled devices associated with it. )"}
2024-06-26T13:38:26.012+1000 info internal/resourcedetection.go:139 detected resource information {"kind": "processor", "name": "resourcedetection", "pipeline": "traces", "resource": {"service.name":"windows-dev"}}
Collector version
v0.103.1
Environment information
Environment
OS: Windows (10, 11, 2019, 2022)
OpenTelemetry Collector configuration
receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  resourcedetection:
    detectors:
      - system
      - env
    system:
      resource_attributes:
        host.id:
          enabled: true
exporters:
  logging:
service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors:
        - resourcedetection
      exporters:
        - logging
Log output
2024-06-26T13:38:26.012+1000 warn internal/resourcedetection.go:130 failed to detect resource {"kind": "processor", "name": "resourcedetection", "pipeline": "traces", "error": "failed getting host cpuinfo:
Additional context
There should have been a Breaking Change note in the ChangeLog that makes all users of the resourcedetection processor aware of the newly introduced hard dependency on the winmgmt service.