Component(s)
connector/servicegraph
What happened?
Description
We are currently observing a high volume of metrics generated by the servicegraph connector, which we use to build a service graph. We have deployed a layer of Collectors containing the load-balancing exporter in front of the traces Collectors that run the span metrics and service graph connector processing. The load-balancing exporter hashes the trace ID consistently to determine which Collector backend receives the spans for that trace. The servicegraph metrics are exported to VictoriaMetrics with the prometheusremotewrite exporter. To illustrate the issue: with a mean received span rate of approximately 6.95K, the servicegraph connector produces close to 18K metric data points.
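For context, the front-tier (load-balancing) Collector configuration is not included below; a minimal sketch of that tier, with placeholder backend hostnames and an illustrative OTLP receiver endpoint (not our exact setup), looks roughly like this:
exporters:
  loadbalancing:
    # Hash spans by trace ID so all spans of a trace land on the same backend
    routing_key: "traceID"
    protocol:
      otlp:
        tls:
          insecure: true
    resolver:
      static:
        # Placeholder hostnames for the backend traces Collectors
        hostnames:
          - traces-collector-0:4317
          - traces-collector-1:4317
receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
service:
  pipelines:
    traces:
      receivers: [otlp]
      exporters: [loadbalancing]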
Steps to Reproduce
Expected Result
The number of generated metrics should be lower, or at least match what the service graph connector is actually expected to produce.
Actual Result
Span rate received: approximately 6.95K (mean)
Metric point rate: approximately 18K
Collector version
0.104.0
Environment information
No response
OpenTelemetry Collector configuration
config:
  exporters:
    prometheusremotewrite/mimir-default-processor-spanmetrics:
      endpoint:
      headers:
        x-scope-orgid:
      resource_to_telemetry_conversion:
        enabled: true
      timeout: 30s
      tls:
        insecure: true
      remote_write_queue:
        enabled: true
        queue_size: 100000
        num_consumers: 500
    prometheusremotewrite/mimir-default-servicegraph:
      endpoint:
      headers:
        x-scope-orgid:
      resource_to_telemetry_conversion:
        enabled: true
      timeout: 30s
      tls:
        insecure: true
      remote_write_queue:
        enabled: true
        queue_size: 100000
        num_consumers: 500
  connectors:
    spanmetrics:
      histogram:
        explicit:
          buckets: [100ms, 500ms, 2s, 5s, 10s, 20s, 30s]
      aggregation_temporality: "AGGREGATION_TEMPORALITY_CUMULATIVE"
      metrics_flush_interval: 15s
      metrics_expiration: 5m
      exemplars:
        enabled: false
      dimensions:
        - name: http.method
        - name: http.status_code
        - name: cluster
        - name: collector.hostname
      events:
        enabled: true
        dimensions:
          - name: exception.type
      resource_metrics_key_attributes:
        - service.name
        - telemetry.sdk.language
        - telemetry.sdk.name
    servicegraph:
      latency_histogram_buckets: [100ms, 250ms, 1s, 5s, 10s]
      store:
        ttl: 2s
        max_items: 10
  receivers:
    otlp:
      protocols:
        http:
          endpoint: ${env:MY_POD_IP}:4318
        grpc:
          endpoint: ${env:MY_POD_IP}:4317
  service:
    pipelines:
      traces/connector-pipeline:
        exporters:
          - otlphttp/tempo-processor-default
          - spanmetrics
          - servicegraph
        processors:
          - batch
          - memory_limiter
        receivers:
          - otlp
      metrics/spanmetrics:
        exporters:
          - debug
          - prometheusremotewrite/mimir-default-processor-spanmetrics
        processors:
          - batch
          - memory_limiter
        receivers:
          - spanmetrics
      metrics/servicegraph:
        exporters:
          - debug
          - prometheusremotewrite/mimir-default-servicegraph
        processors:
          - batch
          - memory_limiter
        receivers:
          - servicegraph
Log output
No response
Additional context
No response