Skip to content

[connector/spanmetricsconnector] add separate additional dimensions for calls and duration metrics #39134

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 3 commits into from
May 2, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
29 changes: 29 additions & 0 deletions .chloggen/fix-36805.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,29 @@
# Use this changelog template to create an entry for release notes.

# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: enhancement

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: spanmetricsconnector

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Separate Dimensions for calls and duration metrics

# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
issues: [36805]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext: |
Add two new fields to the settings: `histogram.dimensions` and `calls_dimensions`.
Use them to add independent dimensions to the duration and calls metrics.

# If your change doesn't affect end users or the exported elements of any package,
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
# Optional: The change log or logs in which this entry should be included.
# e.g. '[user]' or '[user, api]'
# Include 'user' if the change is relevant to end users.
# Include 'api' if there is a change to a library API.
# Default: '[user]'
change_logs: [user]
17 changes: 13 additions & 4 deletions connector/spanmetricsconnector/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -92,19 +92,22 @@ The following settings can be optionally configured:
- `disable` (default: `false`): Disable all histogram metrics.
- `unit` (default: `ms`): The time unit for recording duration measurements.
calculated from spans duration measurements. One of either: `ms` or `s`.
- `dimensions`: additional attributes to add as dimensions to the `traces.span.metrics.duration` metric,
which will be included _on top of_ the common and configured `dimensions` for span attributes and resource attributes.
- `explicit`:
- `buckets`: the list of durations defining the duration histogram time buckets. Default
buckets: `[2ms, 4ms, 6ms, 8ms, 10ms, 50ms, 100ms, 200ms, 400ms, 800ms, 1s, 1400ms, 2s, 5s, 10s, 15s]`
- `exponential`:
- `max_size` (default: `160`) the maximum number of buckets per positive or negative number range.
- `dimensions`: the list of dimensions to add together with the default dimensions defined above.

- `dimensions`: the list of dimensions to add to `traces.span.metrics.calls`, `traces.span.metrics.duration` and `traces.span.metrics.event` metrics with the default dimensions defined above.
Each additional dimension is defined with a `name` which is looked up in the span's collection of attributes or
resource attributes (AKA process tags) such as `ip`, `host.name` or `region`.

If the `name`d attribute is missing in the span, the optional provided `default` is used.

If no `default` is provided, this dimension will be **omitted** from the metric.
- `calls_dimensions`: additional attributes to add as dimensions to the `traces.span.metrics.calls` metric,
which will be included _on top of_ the common and configured `dimensions` for span attributes and resource attributes.
- `exclude_dimensions`: the list of dimensions to be excluded from the default set of dimensions. Use to exclude unneeded data from metrics.
- `dimensions_cache_size`: this setting is deprecated, please use aggregation_cardinality_limit instead.
- `include_instrumentation_scope`: a list of instrumentation scope names to include from the traces.
Expand All @@ -120,7 +123,7 @@ The following settings can be optionally configured:
- `enabled` (default: `false`): enabling will add spans as Exemplars to all metrics. Exemplars are only kept for one flush interval.rom the cache, its next data point will indicate a "reset" in the series. Downstream components converting from delta to cumulative, like `prometheusexporter`, may handle these resets by setting cumulative counters back to 0.
- `events`: Use to configure the events metric.
- `enabled`: (default: `false`): enabling will add the events metric.
- `dimensions`: (mandatory if `enabled`) the list of the span's event attributes to add as dimensions to the events metric, which will be included _on top of_ the common and configured `dimensions` for span and resource attributes.
- `dimensions`: (mandatory if `enabled`) the list of the span's event attributes to add as dimensions to the `traces.span.metrics.events` metric, which will be included _on top of_ the common and configured `dimensions` for span attributes and resource attributes.
- `resource_metrics_key_attributes`: Filter the resource attributes used to produce the resource metrics key map hash. Use this in case changing resource attributes (e.g. process id) are breaking counter metrics.
- `aggregation_cardinality_limit` (default: `0`): Defines the maximum number of unique combinations of dimensions that will be tracked for metrics aggregation. When the limit is reached, additional unique combinations will be dropped but registered under a new entry with `otel.metric.overflow="true"`. A value of `0` means no limit is applied.

Expand All @@ -144,12 +147,18 @@ exporters:
connectors:
spanmetrics:
histogram:
dimensions:
- name: url.scheme
default: https
explicit:
buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
buckets: [100us, 1ms, 2ms, 6ms, 10ms, 100ms, 250ms]
dimensions:
- name: http.method
default: GET
- name: http.status_code
calls_dimensions:
- name: http.url
default: /ping
exemplars:
enabled: true
exclude_dimensions: ['status.code']
Expand Down
2 changes: 2 additions & 0 deletions connector/spanmetricsconnector/config.go
Original file line number Diff line number Diff line change
Expand Up @@ -43,6 +43,7 @@ type Config struct {
// The dimensions will be fetched from the span's attributes. Examples of some conventionally used attributes:
// https://github.com/open-telemetry/opentelemetry-collector/blob/main/model/semconv/opentelemetry.go.
Dimensions []Dimension `mapstructure:"dimensions"`
CallsDimensions []Dimension `mapstructure:"calls_dimensions"`
ExcludeDimensions []string `mapstructure:"exclude_dimensions"`

// DimensionsCacheSize defines the size of cache for storing Dimensions, which helps to avoid cache memory growing
Expand Down Expand Up @@ -98,6 +99,7 @@ type HistogramConfig struct {
Unit metrics.Unit `mapstructure:"unit"`
Exponential *ExponentialHistogramConfig `mapstructure:"exponential"`
Explicit *ExplicitHistogramConfig `mapstructure:"explicit"`
Dimensions []Dimension `mapstructure:"dimensions"`
// prevent unkeyed literal initialization
_ struct{}
}
Expand Down
16 changes: 16 additions & 0 deletions connector/spanmetricsconnector/config_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,22 @@ func TestLoadConfig(t *testing.T) {
id: component.NewIDWithName(metadata.Type, "invalid_delta_timestamp_cache_size"),
errorMessage: "invalid delta timestamp cache size: 0, the maximum number of the items in the cache should be positive",
},
{
id: component.NewIDWithName(metadata.Type, "separate_calls_and_duration_dimensions"),
expected: &Config{
AggregationTemporality: "AGGREGATION_TEMPORALITY_CUMULATIVE",
Histogram: HistogramConfig{Disable: false, Unit: defaultUnit, Dimensions: []Dimension{{Name: "http.status_code", Default: (*string)(nil)}}},
Dimensions: []Dimension{
{Name: "http.method", Default: &defaultMethod},
},
CallsDimensions: []Dimension{
{Name: "http.url", Default: (*string)(nil)},
},
ResourceMetricsCacheSize: defaultResourceMetricsCacheSize,
MetricsFlushInterval: 60 * time.Second,
Namespace: DefaultNamespace,
},
},
}

for _, tt := range tests {
Expand Down
26 changes: 21 additions & 5 deletions connector/spanmetricsconnector/connector.go
Original file line number Diff line number Diff line change
Expand Up @@ -74,6 +74,12 @@ type connectorImp struct {
// Event dimensions to add to the events metric.
eDimensions []utilattri.Dimension

// Calls dimensions to add to the events metric.
callsDimensions []utilattri.Dimension

// duration dimensions to add to the events metric.
durationDimensions []utilattri.Dimension

events EventsConfig

// Tracks the last TimestampUnixNano for delta metrics so that they represent an uninterrupted series. Unused for cumulative span metrics.
Expand Down Expand Up @@ -144,6 +150,8 @@ func newConnector(logger *zap.Logger, config component.Config, clock clockwork.C
ticker: clock.NewTicker(cfg.MetricsFlushInterval),
done: make(chan struct{}),
eDimensions: newDimensions(cfg.Events.Dimensions),
callsDimensions: newDimensions(cfg.CallsDimensions),
durationDimensions: newDimensions(cfg.Histogram.Dimensions),
events: cfg.Events,
}, nil
}
Expand Down Expand Up @@ -390,9 +398,11 @@ func (p *connectorImp) aggregateMetrics(traces ptrace.Traces) {
duration = float64(endTime-startTime) / float64(unitDivider)
}

key := p.buildKey(serviceName, span, p.dimensions, resourceAttr)
callsDimensions := p.dimensions
callsDimensions = append(callsDimensions, p.callsDimensions...)
key := p.buildKey(serviceName, span, callsDimensions, resourceAttr)
attributesFun := func() pcommon.Map {
return p.buildAttributes(serviceName, span, resourceAttr, p.dimensions, ils.Scope())
return p.buildAttributes(serviceName, span, resourceAttr, callsDimensions, ils.Scope())
}

// aggregate sums metrics
Expand All @@ -404,7 +414,13 @@ func (p *connectorImp) aggregateMetrics(traces ptrace.Traces) {

// aggregate histogram metrics
if !p.config.Histogram.Disable {
h, durationLimitReached := histograms.GetOrCreate(key, attributesFun, startTimestamp)
durationDimensions := p.dimensions
durationDimensions = append(durationDimensions, p.durationDimensions...)
durationKey := p.buildKey(serviceName, span, durationDimensions, resourceAttr)
attributesFun = func() pcommon.Map {
return p.buildAttributes(serviceName, span, resourceAttr, durationDimensions, ils.Scope())
}
h, durationLimitReached := histograms.GetOrCreate(durationKey, attributesFun, startTimestamp)
if !durationLimitReached && p.config.Exemplars.Enabled && !span.TraceID().IsEmpty() {
p.addExemplar(span, duration, h)
}
Expand All @@ -422,9 +438,9 @@ func (p *connectorImp) aggregateMetrics(traces ptrace.Traces) {

rscAndEventAttrs.EnsureCapacity(resourceAttr.Len() + event.Attributes().Len())
resourceAttr.CopyTo(rscAndEventAttrs)
// We cannot use CopyTo because it overrides the existing keys.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we should keep this comment because what cannot be done is to do event.Attributes().CopyTo(rscAdnEventAttrs)

Copy link
Contributor Author

@Frapschen Frapschen Apr 30, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note that event.Attributes().CopyTo(rscAdnEventAttrs) will overwrite the existing data in rscAdnEventAttrs.

On the other hand, using rscAndEventAttrs.PutStr(k, v.Str()) may lead to loss of type information due to v.Str().

A better alternative is to use v.CopyTo(rscAndEventAttrs.PutEmpty(k)), which preserves the type information, making the comment unnecessary.

For a similar use case:

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree with your comments and fine changing this from rscAndEventAttrs.PutStr to v.CopyTo(rscAndEventAttrs.PutEmpty(k)) (didn't know we can do this) I changed from event.Attributes().CopyTo(rscAdnEventAttrs) to the current code in my previous PR becuase of the issue you mentioned.

The reason why I would like to preserve the comment is because I don't want somebody in the future replace this:

event.Attributes().Range(func(k string, v pcommon.Value) bool {
							v.CopyTo(rscAndEventAttrs.PutEmpty(k))
							return true
						})

with this:

event.Attributes().CopyTo(rscAdnEventAttrs)

and introduce a regression. Because it is not obvious that .CopyTo replaces evertyhing.

// We cannot use event.Attributes().CopyTo(rscAdnEventAttrs) because it overrides the existing keys.
event.Attributes().Range(func(k string, v pcommon.Value) bool {
rscAndEventAttrs.PutStr(k, v.Str())
v.CopyTo(rscAndEventAttrs.PutEmpty(k))
return true
})

Expand Down
44 changes: 42 additions & 2 deletions connector/spanmetricsconnector/connector_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -400,8 +400,8 @@ func initSpan(span span, s ptrace.Span) {
s.Attributes().PutEmpty(nullAttrName)
s.Attributes().PutEmptyMap(mapAttrName)
s.Attributes().PutEmptySlice(arrayAttrName)
s.SetTraceID(pcommon.TraceID(span.traceID))
s.SetSpanID(pcommon.SpanID(span.spanID))
s.SetTraceID(span.traceID)
s.SetSpanID(span.spanID)

e := s.Events().AppendEmpty()
e.SetName("exception")
Expand Down Expand Up @@ -1871,6 +1871,46 @@ func TestDeltaTimestampCacheExpiry(t *testing.T) {
assert.Greater(t, serviceAStartTimestamp2, serviceATimestamp1) // These would be the same if nothing was evicted from the cache
}

func TestSeparateDimensions(t *testing.T) {
factory := NewFactory()
cfg := factory.CreateDefaultConfig().(*Config)
cfg.Namespace = ""
cfg.Dimensions = []Dimension{{Name: stringAttrName, Default: nil}}
cfg.CallsDimensions = []Dimension{{Name: intAttrName, Default: stringp("0")}}
cfg.Histogram.Dimensions = []Dimension{{Name: doubleAttrName, Default: stringp("0.0")}}
c, err := newConnector(zaptest.NewLogger(t), cfg, clockwork.NewFakeClock())
require.NoError(t, err)
err = c.ConsumeTraces(context.Background(), buildSampleTrace())
require.NoError(t, err)
metrics := c.buildMetrics()
for i := 0; i < metrics.ResourceMetrics().Len(); i++ {
rm := metrics.ResourceMetrics().At(i)
ism := rm.ScopeMetrics()
for ilmC := 0; ilmC < ism.Len(); ilmC++ {
m := ism.At(ilmC).Metrics()
for mC := 0; mC < m.Len(); mC++ {
metric := m.At(mC)
if metric.Name() == metricNameCalls {
assert.Equal(t, pmetric.MetricTypeSum, metric.Type())
for idp := 0; idp < metric.Sum().DataPoints().Len(); idp++ {
attrs := metric.Sum().DataPoints().At(idp).Attributes()
assert.Contains(t, attrs.AsRaw(), stringAttrName)
assert.Contains(t, attrs.AsRaw(), intAttrName) // only in traces.span.metrics.calls metric
}
}
if metric.Name() == metricNameDuration {
assert.Equal(t, pmetric.MetricTypeHistogram, metric.Type())
for idp := 0; idp < metric.Histogram().DataPoints().Len(); idp++ {
attrs := metric.Histogram().DataPoints().At(idp).Attributes()
assert.Contains(t, attrs.AsRaw(), stringAttrName)
assert.Contains(t, attrs.AsRaw(), doubleAttrName) // only in traces.span.metrics.duration
}
}
}
}
}
}

// Clock where Now() always returns a greater value than the previous return value
type alwaysIncreasingClock struct {
clockwork.Clock
Expand Down
10 changes: 10 additions & 0 deletions connector/spanmetricsconnector/testdata/config.yaml
Original file line number Diff line number Diff line change
Expand Up @@ -88,3 +88,13 @@ spanmetrics/invalid_delta_timestamp_cache_size:

spanmetrics/default_delta_timestamp_cache_size:
aggregation_temporality: "AGGREGATION_TEMPORALITY_DELTA"

spanmetrics/separate_calls_and_duration_dimensions:
histogram:
dimensions:
- name: http.status_code
dimensions:
- name: http.method
default: GET
calls_dimensions:
- name: http.url
Loading