
[resourcedetection] windows: Error 'failed getting host cpuinfo: context deadline exceeded' #33768

@cwegener

Component(s)

processor/resourcedetection, processor/resourcedetection/internal/system

What happened?

Description

After the introduction of the host cpuinfo attributes in #26533, system resource detection can fail catastrophically on Windows hosts. When it does, ALL configured system resource attributes (host name, host ID, OS type, OS description, ...) become unavailable in every pipeline that uses the affected instance of the resourcedetection processor.

The cause is a combination of:

  1. cpuinfo attribute collection ALWAYS runs during the processor's Start() phase, regardless of whether any cpuinfo attributes are configured to be added to the resource attributes.
  2. The external dependency introduced by the cpuinfo work in [processor/resourcedetection] Add support for host cpuinfo attributes #26533 retrieves the CPU info via WMI [1], a mechanism that can often fail with a timeout (hence the 'context deadline exceeded' error).

The issue is more likely to occur when the Otel collector starts during host boot (e.g. as a service launched by a service manager) than when it is launched on demand after the Windows host is already running.
This is due to the parallelization of startup tasks (services) in the operating system.

Steps to Reproduce

  1. Stop and disable the winmgmt Windows service to simulate the failure condition of not being able to collect the CPU info: sc config winmgmt start= disabled, then net stop winmgmt
  2. Run the Otel Collector with a config that includes the system detector of the resourcedetection processor in a pipeline, with at least one of its resource attributes enabled (e.g. host.id).
  3. Observe the Otel Collector's logs.

Expected Result

The Otel Collector's logs contain the requested attributes (e.g. Host ID) in the resourcedetection processor's "detected resource information" entry.

e.g.

2024-06-26T13:31:37.391+1000    info    internal/resourcedetection.go:125       began detecting resource information   {"kind": "processor", "name": "resourcedetection", "pipeline": "traces"}
2024-06-26T13:31:38.698+1000    info    internal/resourcedetection.go:139       detected resource information   {"kind": "processor", "name": "resourcedetection", "pipeline": "traces", "resource": {"host.id":"5ac27508-7835-40fd-a8e3-541bb69b8f70","host.name":"DESKTOP-RHABMHR","os.type":"windows","service.name":"windows-dev"}}

Actual Result

The Otel Collector's logs contain an error message beginning with 'failed getting host cpuinfo:'.

(NOTE: the simulated failure condition of completely disabling winmgmt produces a slightly different exception than the 'context deadline exceeded' error seen on production systems.)

The requested attributes (e.g. Host ID) are missing from the resourcedetection processor's logs.

e.g.

2024-06-26T13:38:26.012+1000    warn    internal/resourcedetection.go:130       failed to detect resource       {"kind": "processor", "name": "resourcedetection", "pipeline": "traces", "error": "failed getting host cpuinfo: Exception occurred. (The service cannot be started, either because it is disabled or because it has no enabled devices associated with it. )"}
2024-06-26T13:38:26.012+1000    info    internal/resourcedetection.go:139       detected resource information   {"kind": "processor", "name": "resourcedetection", "pipeline": "traces", "resource": {"service.name":"windows-dev"}}

Collector version

v0.103.1

Environment information

Environment

OS: Windows (10, 11, 2019, 2022)

OpenTelemetry Collector configuration

receivers:
  otlp:
    protocols:
      grpc:
      http:
processors:
  resourcedetection:
    detectors:
      - system
      - env
    system:
      resource_attributes:
        host.id:
          enabled: true
exporters:
  logging:
service:
  pipelines:
    traces:
      receivers:
        - otlp
      processors:
        - resourcedetection
      exporters:
        - logging

Log output

2024-06-26T13:38:26.012+1000    warn    internal/resourcedetection.go:130       failed to detect resource       {"kind": "processor", "name": "resourcedetection", "pipeline": "traces", "error": "failed getting host cpuinfo:

Additional context

There should have been a Breaking Change note in the ChangeLog that makes all users of the resourcedetection processor aware of the newly introduced hard dependency on the winmgmt service.

Footnotes

  1. https://github.com/shirou/gopsutil/blob/e74324b6a726997ce756b8f79dbbd7a3a0999ba0/cpu/cpu_windows.go#L98-L127
