[receiver/windowsperfcounters] When collecting instances with multiple matches, data is lost #32319

Closed
@alxbl

Component(s)

receiver/windowsperfcounters

What happened?

Description

Whenever a multi-instance counter is scraped and several instances share the same name (e.g. Process\ID Process with more than one notepad.exe running), the receiver scrapes all of them but sets the exact same value for the instance attribute on every data point. This is incompatible with most backends: the data points are treated as a single time series, and they are either aggregated or only the last data point is kept.

The behavior also does not match what PerfMon shows, which would be notepad and notepad#1 in my example above.
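
To illustrate why identical attribute sets lose data, here is a minimal Go sketch of a backend that identifies a time series by metric name plus sorted attributes and keeps the last value written per series. The names `dataPoint`, `seriesKey` and `lastValuePerSeries` are hypothetical and exist only for this illustration; they are not part of the collector or of any real backend, and real backends may aggregate instead of overwriting.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// dataPoint is a simplified, hypothetical stand-in for a metric data point.
type dataPoint struct {
	attrs map[string]string
	value float64
}

// seriesKey builds a series identity from the metric name and its
// attributes, sorted by attribute name so the key is deterministic.
func seriesKey(metric string, attrs map[string]string) string {
	keys := make([]string, 0, len(attrs))
	for k := range attrs {
		keys = append(keys, k)
	}
	sort.Strings(keys)

	var b strings.Builder
	b.WriteString(metric)
	for _, k := range keys {
		fmt.Fprintf(&b, ",%s=%s", k, attrs[k])
	}
	return b.String()
}

// lastValuePerSeries models a "last write wins" backend: points that map
// to the same series key overwrite each other.
func lastValuePerSeries(metric string, points []dataPoint) map[string]float64 {
	series := make(map[string]float64)
	for _, p := range points {
		series[seriesKey(metric, p.attrs)] = p.value
	}
	return series
}

func main() {
	// The two data points from the log output in this issue.
	points := []dataPoint{
		{attrs: map[string]string{"instance": "Notepad"}, value: 16660},
		{attrs: map[string]string{"instance": "Notepad"}, value: 21988},
	}
	series := lastValuePerSeries("process.pid", points)
	fmt.Println(len(series), series)
}
```

Both points produce the same key, so only one series survives and the first PID is silently dropped.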

Steps to Reproduce

  • Start Notepad.exe as your normal user
  • Start Notepad.exe as an administrator (to ensure you have 2 different Notepad.exe PIDs on Windows 11)
  • Use the provided configuration file (modify as needed)
    • Mimir is optional; I was only using it to test another issue with prometheusremotewrite

Expected Result

Windows Performance Monitor handles this by concatenating the instance name with its index when there are multiple occurrences of the same instance (usually when multiple instances of a process are running):

[image: Windows Performance Monitor screenshot]

  • Metrics for instances notepad and notepad#1 as shown in Windows Performance Monitor
  • Two time series with each PID value

Actual Result

  • Two data points for instance notepad combined in the same time series (see the log output below).

Collector version

0.97

Environment information

Environment

Windows 11
go 1.22 on Ubuntu 22.04 (GOOS=windows)

OpenTelemetry Collector configuration

receivers:
  windowsperfcounters:
    metrics:
      process.pid:
        gauge:
    collection_interval: 5s
    perfcounters:
      - object: Process
        instances: "note*"
        counters:
          - name: "ID Process"
            metric: process.pid
processors:
  batch:
  memory_limiter:
    check_interval: 1s
    limit_mib: 500
    spike_limit_mib: 100
extensions:
exporters:
  prometheusremotewrite:
    endpoint: http://mimir:9009/api/v1/push
    tls:
      insecure: true

  debug:
    verbosity: detailed

service:
  extensions: []
  pipelines:
    metrics:
      receivers: [windowsperfcounters]
      processors: []
      exporters: [debug, prometheusremotewrite]

Log output

2024-04-11T07:05:15.889-0400	info	MetricsExporter	{"kind": "exporter", "data_type": "metrics", "name": "debug", "resource metrics": 1, "metrics": 1, "data points": 2} // <------ Two data points
2024-04-11T07:05:15.889-0400	info	ResourceMetrics #0
Resource SchemaURL: 
ScopeMetrics #0
ScopeMetrics SchemaURL: 
InstrumentationScope  
Metric #0
Descriptor:
     -> Name: process.pid
     -> Description: 
     -> Unit: 
     -> DataType: Gauge
NumberDataPoints #0
Data point attributes:
     -> instance: Str(Notepad) // <--------- 
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-04-11 11:03:44.8430194 +0000 UTC
Value: 16660.000000
NumberDataPoints #1
Data point attributes:
     -> instance: Str(Notepad) // <-------
StartTimestamp: 1970-01-01 00:00:00 +0000 UTC
Timestamp: 2024-04-11 11:03:44.8430194 +0000 UTC
Value: 21988.000000
	{"kind": "exporter", "data_type": "metrics", "name": "debug"}

Additional context

I already have a PR that I can submit for this. I understand that this might cause cardinality problems, so I am open to gating this behavior behind a config option for the receiver.
