
Commit 154ba2b

Fiery-Fenix, ChrsMark, and jinja2 authored
[receiver/kubeletstats] Collect network metrics from all interfaces (#38737)
#### Description
Revives #34287 by picking its changes. Also changed the approach from a feature gate to configuration parameters (disabled by default). For network metrics, the component's default behavior hasn't changed, so I consider this a non-breaking change.

#### Link to tracking issue
Fixes #30196

#### Testing
Unit tests were updated to cover the new functionality.

#### Documentation
Documentation was updated to cover the new opt-in functionality.

Original authors are included in the commit:
Co-authored-by: ChrsMark <[email protected]>
Co-authored-by: jinja2 <[email protected]>

---------

Co-authored-by: Christos Markou <[email protected]>
Co-authored-by: Jina Jain <[email protected]>
1 parent a2d3ab7 commit 154ba2b


12 files changed

+3620
-15
lines changed

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
1+
# Use this changelog template to create an entry for release notes.
2+
3+
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
4+
change_type: enhancement
5+
6+
# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
7+
component: kubeletstatsreceiver
8+
9+
# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
10+
note: Adds support for collecting Node and Pod network IO/error metrics for all network interfaces
11+
12+
# Mandatory: One or more tracking issues related to the change. You can use the PR number here if no issue exists.
13+
issues: [30196]
14+
15+
# (Optional) One or more lines of additional information to render under the primary note.
16+
# These lines will be padded with 2 spaces and then inserted directly into the document.
17+
# Use pipe (|) for multiline entries.
18+
subtext:
19+
20+
# If your change doesn't affect end users or the exported elements of any package,
21+
# you should instead start your pull request title with [chore] or use the "Skip Changelog" label.
22+
# Optional: The change log or logs in which this entry should be included.
23+
# e.g. '[user]' or '[user, api]'
24+
# Include 'user' if the change is relevant to end users.
25+
# Include 'api' if there is a change to a library API.
26+
# Default: '[user]'
27+
change_logs: [user]

receiver/kubeletstatsreceiver/README.md

Lines changed: 26 additions & 7 deletions
@@ -32,7 +32,7 @@ to connect and authenticate to the API server and how often to collect data
3232
and send it to the next consumer.
3333

3434
Kubelet Stats Receiver supports both secure Kubelet endpoint exposed at port 10250 by default and read-only
35-
Kubelet endpoint exposed at port 10255. If `auth_type` set to `none`, the read-only endpoint will be used. The secure
35+
Kubelet endpoint exposed at port 10255. If `auth_type` set to `none`, the read-only endpoint will be used. The secure
3636
endpoint will be used if `auth_type` set to any of the following values:
3737

3838
- `tls` tells the receiver to use TLS for auth and requires that the fields
@@ -109,14 +109,14 @@ node's network namespace.
109109

110110
#### Custom CA
111111

112-
The service account client, by default, uses the CA certificate located at
113-
`/var/run/secrets/kubernetes.io/serviceaccount/ca.crt` to validate the kubelet certificate.
114-
If the kubelet server uses a certificate issued by a different CA,
112+
The service account client, by default, uses the CA certificate located at
113+
`/var/run/secrets/kubernetes.io/serviceaccount/ca.crt` to validate the kubelet certificate.
114+
If the kubelet server uses a certificate issued by a different CA,
115115
specify the custom CA certificate path using the `ca_file` option.
116116

117117
##### AKS Custom CA example
118118

119-
This use case applies to AKS cluster, where the kubelet certificate is issued by
119+
This use case applies to AKS cluster, where the kubelet certificate is issued by
120120
`/etc/kubernetes/certs/kubeletserver.crt`
121121

122122
```yaml
@@ -169,6 +169,7 @@ service:
169169
receivers: [kubeletstats]
170170
exporters: [file]
171171
```
172+
172173
Note that using `auth_type` `kubeConfig`, the endpoint should only be the node name as the communication to the kubelet is proxied by the API server configured in the `kubeConfig`.
173174
`insecure_skip_verify` still applies by overriding the `kubeConfig` settings.
174175
If no `context` is specified, the current context or the default context is used.
@@ -244,6 +245,23 @@ receivers:
244245
- pod
245246
```
246247

248+
### Network metrics from all interfaces for Node and Pod
249+
250+
By default, `k8s.[node|pod].network.*` metrics are collected only for the default network interface (e.g. `eth0`). To collect network IO/error metrics from all available interfaces at the Node and/or Pod level, use the `collect_all_network_interfaces` configuration parameters. Be aware that enabling these options increases both the number of network data points produced and the cardinality of the network metrics, because a separate series is emitted for each value of the `interface` attribute.
251+
For example, to collect network IO/error metrics from all network interfaces at both the Pod and Node levels, use the following configuration:
252+
253+
```yaml
254+
receivers:
255+
  kubeletstats:
256+
    collection_interval: 10s
257+
    auth_type: "serviceAccount"
258+
    endpoint: "${env:K8S_NODE_NAME}:10250"
259+
    insecure_skip_verify: true
260+
    collect_all_network_interfaces:
261+
      pod: true
262+
      node: true
263+
```
264+
247265
### Collect `k8s.{container,pod}.{cpu,memory}.node.utilization` as ratio of total node's capacity
248266

249267
In order to calculate the `k8s.container.cpu.node.utilization`, `k8s.pod.cpu.node.utilization`,
@@ -259,6 +277,7 @@ env:
259277
fieldRef:
260278
fieldPath: spec.nodeName
261279
```
280+
262281
Then set `node` value to `${env:K8S_NODE_NAME}` in the receiver's configuration:
263282

264283
```yaml
@@ -304,7 +323,7 @@ rules:
304323
- apiGroups: [""]
305324
resources: ["nodes/stats"]
306325
verbs: ["get"]
307-
326+
308327
# Only needed if you are using extra_metadata_labels or
309328
# are collecting the request/limit utilization metrics
310329
- apiGroups: [""]
@@ -333,4 +352,4 @@ You can enable the usage of the deprecated metrics by disabling the `receiver.ku
333352
- removed three releases after stable.
334353

335354
More information about the deprecation plan and
336-
the background reasoning can be found at https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27885.
355+
the background reasoning can be found at <https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/27885>.
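To make the cardinality warning in the new README section concrete, here is a rough per-scrape estimate written as a small Go sketch. It assumes that every interface reports all four counters (rx/tx bytes and rx/tx errors) and that each counter becomes one data point of `k8s.[node|pod].network.io` or `k8s.[node|pod].network.errors`; the helper `networkDataPointsPerScrape` is hypothetical and not part of the receiver.

```go
package main

import "fmt"

// networkDataPointsPerScrape is a hypothetical helper that estimates how many
// k8s.{node,pod}.network.* data points one node or pod produces per scrape.
// It assumes the kubelet reports every rx/tx counter for every interface.
func networkDataPointsPerScrape(interfaces int, collectAll bool) int {
	const metrics = 2    // network.io and network.errors
	const directions = 2 // receive and transmit
	if !collectAll {
		return metrics * directions // only the default interface is reported
	}
	return metrics * directions * interfaces
}

func main() {
	// A hypothetical pod that reports three network interfaces.
	fmt.Println(networkDataPointsPerScrape(3, false)) // 4
	fmt.Println(networkDataPointsPerScrape(3, true))  // 12
}
```

With `collect_all_network_interfaces` enabled, the network data point count for each node or pod therefore grows linearly with its number of interfaces instead of staying fixed at one interface's worth.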

receiver/kubeletstatsreceiver/config.go

Lines changed: 15 additions & 0 deletions
@@ -55,6 +55,15 @@ type Config struct {
5555

5656
// MetricsBuilderConfig allows customizing scraped metrics/attributes representation.
5757
metadata.MetricsBuilderConfig `mapstructure:",squash"`
58+
59+
// NetworkCollectAllInterfaces enables collecting network metrics from all network interfaces instead of only the default one.
60+
// It can be set separately for Pod and Node network metrics.
61+
NetworkCollectAllInterfaces NetworkInterfacesEnablerConfig `mapstructure:"collect_all_network_interfaces"`
62+
}
63+
64+
type NetworkInterfacesEnablerConfig struct {
65+
PodMetrics bool `mapstructure:"pod"`
66+
NodeMetrics bool `mapstructure:"node"`
5867
}
5968

6069
// getReceiverOptions returns scraperOptions is the config is valid,
@@ -70,6 +79,11 @@ func (cfg *Config) getReceiverOptions() (*scraperOptions, error) {
7079
return nil, err
7180
}
7281

82+
ifaces := map[kubelet.MetricGroup]bool{
83+
kubelet.NodeMetricGroup: cfg.NetworkCollectAllInterfaces.NodeMetrics,
84+
kubelet.PodMetricGroup: cfg.NetworkCollectAllInterfaces.PodMetrics,
85+
}
86+
7387
var k8sAPIClient kubernetes.Interface
7488
if cfg.K8sAPIConfig != nil {
7589
k8sAPIClient, err = k8sconfig.MakeClient(*cfg.K8sAPIConfig)
@@ -82,6 +96,7 @@ func (cfg *Config) getReceiverOptions() (*scraperOptions, error) {
8296
collectionInterval: cfg.CollectionInterval,
8397
extraMetadataLabels: cfg.ExtraMetadataLabels,
8498
metricGroupsToCollect: mgs,
99+
allNetworkInterfaces: ifaces,
85100
k8sAPIClient: k8sAPIClient,
86101
}, nil
87102
}

receiver/kubeletstatsreceiver/config_test.go

Lines changed: 4 additions & 0 deletions
@@ -339,6 +339,10 @@ func TestGetReceiverOptions(t *testing.T) {
339339
kubelet.NodeMetricGroup: true,
340340
kubelet.PodMetricGroup: true,
341341
},
342+
allNetworkInterfaces: map[kubelet.MetricGroup]bool{
343+
kubelet.NodeMetricGroup: false,
344+
kubelet.PodMetricGroup: false,
345+
},
342346
collectionInterval: 10 * time.Second,
343347
},
344348
},

receiver/kubeletstatsreceiver/internal/kubelet/accumulator.go

Lines changed: 3 additions & 2 deletions
@@ -38,6 +38,7 @@ type metricDataAccumulator struct {
3838
metadata Metadata
3939
logger *zap.Logger
4040
metricGroupsToCollect map[MetricGroup]bool
41+
allNetworkInterfaces map[MetricGroup]bool
4142
time time.Time
4243
mbs *metadata.MetricsBuilders
4344
}
@@ -59,7 +60,7 @@ func (a *metricDataAccumulator) nodeStats(s stats.NodeStats) {
5960
addCPUMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeCPUMetrics, s.CPU, currentTime, resources{}, 0)
6061
addMemoryMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeMemoryMetrics, s.Memory, currentTime, resources{}, 0)
6162
addFilesystemMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeFilesystemMetrics, s.Fs, currentTime)
62-
addNetworkMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeNetworkMetrics, s.Network, currentTime)
63+
addNetworkMetrics(a.mbs.NodeMetricsBuilder, metadata.NodeNetworkMetrics, s.Network, currentTime, a.allNetworkInterfaces[NodeMetricGroup])
6364
// todo s.Runtime.ImageFs
6465
rb := a.mbs.NodeMetricsBuilder.NewResourceBuilder()
6566
rb.SetK8sNodeName(s.NodeName)
@@ -79,7 +80,7 @@ func (a *metricDataAccumulator) podStats(s stats.PodStats) {
7980
addCPUMetrics(a.mbs.PodMetricsBuilder, metadata.PodCPUMetrics, s.CPU, currentTime, a.metadata.podResources[s.PodRef.UID], a.metadata.nodeInfo.CPUCapacity)
8081
addMemoryMetrics(a.mbs.PodMetricsBuilder, metadata.PodMemoryMetrics, s.Memory, currentTime, a.metadata.podResources[s.PodRef.UID], a.metadata.nodeInfo.MemoryCapacity)
8182
addFilesystemMetrics(a.mbs.PodMetricsBuilder, metadata.PodFilesystemMetrics, s.EphemeralStorage, currentTime)
82-
addNetworkMetrics(a.mbs.PodMetricsBuilder, metadata.PodNetworkMetrics, s.Network, currentTime)
83+
addNetworkMetrics(a.mbs.PodMetricsBuilder, metadata.PodNetworkMetrics, s.Network, currentTime, a.allNetworkInterfaces[PodMetricGroup])
8384

8485
rb := a.mbs.PodMetricsBuilder.NewResourceBuilder()
8586
rb.SetK8sPodUID(s.PodRef.UID)
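The accumulator reads the flag with a plain map index, `a.allNetworkInterfaces[NodeMetricGroup]`. In Go, indexing a nil or empty map returns the zero value, so a caller that never populates `allNetworkInterfaces` falls back to the default-interface behavior instead of panicking. A minimal illustration, with `MetricGroup` redefined locally and `collectAll` a hypothetical helper:

```go
package main

import "fmt"

// MetricGroup is a local stand-in for kubelet.MetricGroup.
type MetricGroup string

const (
	NodeMetricGroup MetricGroup = "node"
	PodMetricGroup  MetricGroup = "pod"
)

// collectAll reports whether all interfaces should be used for a group.
// Indexing a nil or empty map is legal in Go and yields false.
func collectAll(allNetworkInterfaces map[MetricGroup]bool, g MetricGroup) bool {
	return allNetworkInterfaces[g]
}

func main() {
	var unset map[MetricGroup]bool // nil map: the option was never populated
	fmt.Println(collectAll(unset, NodeMetricGroup))                                     // false
	fmt.Println(collectAll(map[MetricGroup]bool{}, PodMetricGroup))                     // false
	fmt.Println(collectAll(map[MetricGroup]bool{PodMetricGroup: true}, PodMetricGroup)) // true
}
```

This is also what makes it safe to pass an empty map for the new `MetricsData` argument, as `TestMetricAccumulator` does.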

receiver/kubeletstatsreceiver/internal/kubelet/metrics.go

Lines changed: 2 additions & 0 deletions
@@ -17,12 +17,14 @@ func MetricsData(
1717
logger *zap.Logger, summary *stats.Summary,
1818
metadata Metadata,
1919
metricGroupsToCollect map[MetricGroup]bool,
20+
allNetworkInterfaces map[MetricGroup]bool,
2021
mbs *metadata.MetricsBuilders,
2122
) []pmetric.Metrics {
2223
acc := &metricDataAccumulator{
2324
metadata: metadata,
2425
logger: logger,
2526
metricGroupsToCollect: metricGroupsToCollect,
27+
allNetworkInterfaces: allNetworkInterfaces,
2628
time: time.Now(),
2729
mbs: mbs,
2830
}

receiver/kubeletstatsreceiver/internal/kubelet/metrics_test.go

Lines changed: 16 additions & 4 deletions
@@ -39,12 +39,16 @@ func TestMetricAccumulator(t *testing.T) {
3939
ContainerMetricsBuilder: metadata.NewMetricsBuilder(metadata.DefaultMetricsBuilderConfig(), receivertest.NewNopSettings(metadata.Type)),
4040
OtherMetricsBuilder: metadata.NewMetricsBuilder(metadata.DefaultMetricsBuilderConfig(), receivertest.NewNopSettings(metadata.Type)),
4141
}
42-
requireMetricsOk(t, MetricsData(zap.NewNop(), summary, k8sMetadata, ValidMetricGroups, mbs))
42+
ifaces := map[MetricGroup]bool{
43+
NodeMetricGroup: true,
44+
PodMetricGroup: true,
45+
}
46+
requireMetricsOk(t, MetricsData(zap.NewNop(), summary, k8sMetadata, ValidMetricGroups, ifaces, mbs))
4347
// Disable all groups
4448
mbs.NodeMetricsBuilder.Reset()
4549
mbs.PodMetricsBuilder.Reset()
4650
mbs.OtherMetricsBuilder.Reset()
47-
require.Empty(t, MetricsData(zap.NewNop(), summary, k8sMetadata, map[MetricGroup]bool{}, mbs))
51+
require.Empty(t, MetricsData(zap.NewNop(), summary, k8sMetadata, map[MetricGroup]bool{}, map[MetricGroup]bool{}, mbs))
4852
}
4953

5054
func requireMetricsOk(t *testing.T, mds []pmetric.Metrics) {
@@ -142,6 +146,10 @@ func TestUptime(t *testing.T) {
142146
PodMetricGroup: true,
143147
NodeMetricGroup: true,
144148
}
149+
ifaces := map[MetricGroup]bool{
150+
NodeMetricGroup: true,
151+
PodMetricGroup: true,
152+
}
145153

146154
cfg := metadata.DefaultMetricsBuilderConfig()
147155
cfg.Metrics.K8sNodeUptime.Enabled = true
@@ -154,7 +162,7 @@ func TestUptime(t *testing.T) {
154162
ContainerMetricsBuilder: metadata.NewMetricsBuilder(cfg, receivertest.NewNopSettings(metadata.Type)),
155163
}
156164

157-
metrics := indexedFakeMetrics(MetricsData(zap.NewNop(), summary, Metadata{}, mgs, mbs))
165+
metrics := indexedFakeMetrics(MetricsData(zap.NewNop(), summary, Metadata{}, mgs, ifaces, mbs))
158166

159167
requireContains(t, metrics, "k8s.node.uptime")
160168
requireContains(t, metrics, "k8s.pod.uptime")
@@ -214,11 +222,15 @@ func fakeMetrics() []pmetric.Metrics {
214222
PodMetricGroup: true,
215223
NodeMetricGroup: true,
216224
}
225+
ifaces := map[MetricGroup]bool{
226+
NodeMetricGroup: true,
227+
PodMetricGroup: true,
228+
}
217229
mbs := &metadata.MetricsBuilders{
218230
NodeMetricsBuilder: metadata.NewMetricsBuilder(metadata.DefaultMetricsBuilderConfig(), receivertest.NewNopSettings(metadata.Type)),
219231
PodMetricsBuilder: metadata.NewMetricsBuilder(metadata.DefaultMetricsBuilderConfig(), receivertest.NewNopSettings(metadata.Type)),
220232
ContainerMetricsBuilder: metadata.NewMetricsBuilder(metadata.DefaultMetricsBuilderConfig(), receivertest.NewNopSettings(metadata.Type)),
221233
OtherMetricsBuilder: metadata.NewMetricsBuilder(metadata.DefaultMetricsBuilderConfig(), receivertest.NewNopSettings(metadata.Type)),
222234
}
223-
return MetricsData(zap.NewNop(), summary, Metadata{}, mgs, mbs)
235+
return MetricsData(zap.NewNop(), summary, Metadata{}, mgs, ifaces, mbs)
224236
}

receiver/kubeletstatsreceiver/internal/kubelet/network.go

Lines changed: 33 additions & 1 deletion
@@ -12,11 +12,23 @@ import (
1212

1313
type getNetworkDataFunc func(s *stats.NetworkStats) (rx *uint64, tx *uint64)
1414

15-
func addNetworkMetrics(mb *metadata.MetricsBuilder, networkMetrics metadata.NetworkMetrics, s *stats.NetworkStats, currentTime pcommon.Timestamp) {
15+
type getInterfaceDataFunc func(s *stats.InterfaceStats) (rx *uint64, tx *uint64)
16+
17+
func addNetworkMetrics(mb *metadata.MetricsBuilder, networkMetrics metadata.NetworkMetrics, s *stats.NetworkStats, currentTime pcommon.Timestamp, allInterfaces bool) {
1618
if s == nil {
1719
return
1820
}
1921

22+
if allInterfaces {
23+
for i := range s.Interfaces {
24+
recordInterfaceDataPoint(mb, networkMetrics.IO, &s.Interfaces[i], getInterfaceIO, currentTime)
25+
recordInterfaceDataPoint(mb, networkMetrics.Errors, &s.Interfaces[i], getInterfaceErrors, currentTime)
26+
}
27+
// Because stats.NetworkStats.Interfaces contains metrics for all interfaces, including the default one,
28+
// we don't need to also record the embedded stats.NetworkStats.InterfaceStats, hence we return here
29+
return
30+
}
31+
2032
recordNetworkDataPoint(mb, networkMetrics.IO, s, getNetworkIO, currentTime)
2133
recordNetworkDataPoint(mb, networkMetrics.Errors, s, getNetworkErrors, currentTime)
2234
}
@@ -40,3 +52,23 @@ func getNetworkIO(s *stats.NetworkStats) (*uint64, *uint64) {
4052
func getNetworkErrors(s *stats.NetworkStats) (*uint64, *uint64) {
4153
return s.RxErrors, s.TxErrors
4254
}
55+
56+
func recordInterfaceDataPoint(mb *metadata.MetricsBuilder, recordDataPoint metadata.RecordIntDataPointWithDirectionFunc, s *stats.InterfaceStats, getData getInterfaceDataFunc, currentTime pcommon.Timestamp) {
57+
rx, tx := getData(s)
58+
59+
if rx != nil {
60+
recordDataPoint(mb, currentTime, int64(*rx), s.Name, metadata.AttributeDirectionReceive)
61+
}
62+
63+
if tx != nil {
64+
recordDataPoint(mb, currentTime, int64(*tx), s.Name, metadata.AttributeDirectionTransmit)
65+
}
66+
}
67+
68+
func getInterfaceIO(s *stats.InterfaceStats) (*uint64, *uint64) {
69+
return s.RxBytes, s.TxBytes
70+
}
71+
72+
func getInterfaceErrors(s *stats.InterfaceStats) (*uint64, *uint64) {
73+
return s.RxErrors, s.TxErrors
74+
}

receiver/kubeletstatsreceiver/scraper.go

Lines changed: 4 additions & 1 deletion
@@ -30,6 +30,7 @@ type scraperOptions struct {
3030
collectionInterval time.Duration
3131
extraMetadataLabels []kubelet.MetadataLabel
3232
metricGroupsToCollect map[kubelet.MetricGroup]bool
33+
allNetworkInterfaces map[kubelet.MetricGroup]bool
3334
k8sAPIClient kubernetes.Interface
3435
}
3536

@@ -39,6 +40,7 @@ type kubeletScraper struct {
3940
logger *zap.Logger
4041
extraMetadataLabels []kubelet.MetadataLabel
4142
metricGroupsToCollect map[kubelet.MetricGroup]bool
43+
allNetworkInterfaces map[kubelet.MetricGroup]bool
4244
k8sAPIClient kubernetes.Interface
4345
cachedVolumeSource map[string]v1.PersistentVolumeSource
4446
mbs *metadata.MetricsBuilders
@@ -80,6 +82,7 @@ func newKubeletScraper(
8082
logger: set.Logger,
8183
extraMetadataLabels: rOptions.extraMetadataLabels,
8284
metricGroupsToCollect: rOptions.metricGroupsToCollect,
85+
allNetworkInterfaces: rOptions.allNetworkInterfaces,
8386
k8sAPIClient: rOptions.k8sAPIClient,
8487
cachedVolumeSource: make(map[string]v1.PersistentVolumeSource),
8588
mbs: &metadata.MetricsBuilders{
@@ -138,7 +141,7 @@ func (r *kubeletScraper) scrape(context.Context) (pmetric.Metrics, error) {
138141

139142
metaD := kubelet.NewMetadata(r.extraMetadataLabels, podsMetadata, nodeInfo, r.detailedPVCLabelsSetter())
140143

141-
mds := kubelet.MetricsData(r.logger, summary, metaD, r.metricGroupsToCollect, r.mbs)
144+
mds := kubelet.MetricsData(r.logger, summary, metaD, r.metricGroupsToCollect, r.allNetworkInterfaces, r.mbs)
142145
md := pmetric.NewMetrics()
143146
for i := range mds {
144147
mds[i].ResourceMetrics().MoveAndAppendTo(md.ResourceMetrics())

receiver/kubeletstatsreceiver/scraper_test.go

Lines changed: 36 additions & 0 deletions
@@ -82,6 +82,42 @@ func TestScraper(t *testing.T) {
8282
pmetrictest.IgnoreMetricsOrder()))
8383
}
8484

85+
func TestScraperWithInterfacesMetrics(t *testing.T) {
86+
options := &scraperOptions{
87+
metricGroupsToCollect: allMetricGroups,
88+
allNetworkInterfaces: map[kubelet.MetricGroup]bool{
89+
kubelet.NodeMetricGroup: true,
90+
kubelet.PodMetricGroup: true,
91+
},
92+
}
93+
r, err := newKubeletScraper(
94+
&fakeRestClient{},
95+
receivertest.NewNopSettings(metadata.Type),
96+
options,
97+
metadata.DefaultMetricsBuilderConfig(),
98+
"worker-42",
99+
)
100+
require.NoError(t, err)
101+
102+
md, err := r.ScrapeMetrics(context.Background())
103+
require.NoError(t, err)
104+
105+
require.Equal(t, dataLen+numPods*4+numNodes*4, md.DataPointCount())
106+
expectedFile := filepath.Join("testdata", "scraper", "test_scraper_with_interfaces_metrics.yaml")
107+
108+
// Uncomment to regenerate the expected golden file
109+
// golden.WriteMetrics(t, expectedFile, md)
110+
111+
expectedMetrics, err := golden.ReadMetrics(expectedFile)
112+
require.NoError(t, err)
113+
require.NoError(t, pmetrictest.CompareMetrics(expectedMetrics, md,
114+
pmetrictest.IgnoreStartTimestamp(),
115+
pmetrictest.IgnoreResourceMetricsOrder(),
116+
pmetrictest.IgnoreMetricDataPointsOrder(),
117+
pmetrictest.IgnoreTimestamp(),
118+
pmetrictest.IgnoreMetricsOrder()))
119+
}
120+
85121
func TestScraperWithCPUNodeUtilization(t *testing.T) {
86122
watcherStarted := make(chan struct{})
87123
// Create the fake client.

receiver/kubeletstatsreceiver/testdata/config.yaml

Lines changed: 6 additions & 0 deletions
@@ -52,3 +52,9 @@ kubeletstats/pod_memory_node_utilization:
5252
  metrics:
5353
    k8s.pod.memory.node.utilization:
5454
      enabled: true
55+
kubeletstats/all_network_interfaces_metrics:
56+
  collection_interval: 10s
57+
  metric_groups: [ container, pod, node ]
58+
  collect_all_network_interfaces:
59+
    pod: true
60+
    node: true
