[receiver/kafka] 0.124.0 release broke default log text encoding #39793

Closed
@kuiperda

Component(s)

receiver/kafka

What happened?

Description

Release 0.124.0 updated the kafka receiver's topic and encoding fields.

Collectors on 0.124.0 or later that set logs::encoding to text_utf-8 fail with this error:

receiver: invalid component type: invalid character(s) in type "text_utf-8"

As part of the update, this PR made a change that errors if the - character is used in the encoding. See this function, specifically component.NewType(encoding):

// encodingToComponentID converts an encoding string to a component ID using the given encoding as type.
func encodingToComponentID(encoding string) (*component.ID, error) {
	componentType, err := component.NewType(encoding)
	if err != nil {
		return nil, fmt.Errorf("invalid component type: %w", err)
	}
	id := component.NewID(componentType)
	return &id, nil
}

NewType creates a type. It returns an error if the type is invalid. A type must:
- have at least one character,
- start with an ASCII alphabetic character, and
- contain only ASCII alphanumeric characters and '_'.
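
To make the failure concrete, here is a minimal standalone sketch of the rules quoted above. It does not call the collector's component package; the typeRule pattern and the isValidComponentType helper are hypothetical stand-ins written from the quoted description, so the real check may differ in details.

```go
package main

import (
	"fmt"
	"regexp"
)

// typeRule approximates the quoted rules: at least one character,
// a leading ASCII letter, then only ASCII letters, digits, and '_'.
// Assumption: illustrative stand-in, not the library's own check.
var typeRule = regexp.MustCompile(`^[A-Za-z][0-9A-Za-z_]*$`)

// isValidComponentType reports whether s satisfies the quoted rules.
func isValidComponentType(s string) bool {
	return typeRule.MatchString(s)
}

func main() {
	// The hyphen in "text_utf-8" is the only character outside the allowed set.
	for _, enc := range []string{"text_utf-8", "text_utf8", "text_utf16"} {
		fmt.Printf("%q valid=%t\n", enc, isValidComponentType(enc))
	}
}
```

Under these rules any encoding name containing a hyphen is rejected before the receiver gets a chance to interpret it as a text encoding.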

Looking at the func newLogsUnmarshaler in the same file, it looks like utf8 and utf16 are the expected format now, but the readme still recommends utf-8 and the default appears to still be utf-8. There is a test validating usage of utf16 but not utf8 or utf-8.

I do not have Kafka set up, but the collector fails with this error before it even reports that there are no brokers to connect to.

Steps to Reproduce

Run a 0.124.0 collector with a kafkareceiver using text_utf-8 as the log encoding. It will immediately error due to the - in the encoding. The config below still uses the old top-level encoding/topic fields, but nesting them under logs: hits the same error.

Expected Result

The receiver should not error when using a hyphenated value like text_utf-8 as the log encoding.

Furthermore, the recommended and default format for text encoding should work. If the breaking change was intentional, the documentation should be updated accordingly.

Tests should be added to cover this case.
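
As a rough illustration of what such coverage could look like, the sketch below runs a table of encoding names through a hypothetical resolver that recognizes the built-in text family before applying the component type rules. resolveEncoding, the "builtin:"/"extension:" labels, and the my_encoding/my-encoding names are all invented for this example; this is not the receiver's code and not necessarily how the issue should be fixed.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// typePattern approximates the documented NewType rules (assumption).
var typePattern = regexp.MustCompile(`^[A-Za-z][0-9A-Za-z_]*$`)

// resolveEncoding is a hypothetical helper. It checks for the built-in
// "text" family first and only then treats the value as a component type,
// so a hyphenated charset suffix never reaches the type validation.
func resolveEncoding(encoding string) (string, error) {
	if encoding == "text" || strings.HasPrefix(encoding, "text_") {
		return "builtin:" + encoding, nil
	}
	if !typePattern.MatchString(encoding) {
		return "", fmt.Errorf("invalid component type: invalid character(s) in type %q", encoding)
	}
	return "extension:" + encoding, nil
}

func main() {
	// Table of cases: hyphenated and unhyphenated text encodings should
	// both resolve; only a non-text name with a hyphen should fail.
	cases := []struct {
		encoding string
		wantErr  bool
	}{
		{"text_utf-8", false},
		{"text_utf8", false},
		{"text_utf16", false},
		{"my_encoding", false},
		{"my-encoding", true},
	}
	for _, c := range cases {
		got, err := resolveEncoding(c.encoding)
		status := "ok"
		if (err != nil) != c.wantErr {
			status = "UNEXPECTED"
		}
		fmt.Printf("%-12s -> %q err=%v [%s]\n", c.encoding, got, err, status)
	}
}
```

In the real repository the same table would sit in a Go test next to the existing utf16 case, with text_utf-8 and text_utf8 added as rows.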

Actual Result

The collector errors due to the - in the log encoding.

Collector version

0.124.0

Environment information

Environment

OS: macOS/darwin Sequoia 15.0.1
Compiler(if manually compiled): go 1.24.0

OpenTelemetry Collector configuration

receivers:
    kafka/logs:
        brokers:
            - localhost:9092
        client_id: otel-collector
        encoding: text_utf-8
        group_id: otel-collector
        metadata:
            full: true
        protocol_version: 2.0.0
        topic: otlp_logs
exporters:
    nop/devnull: null
service:
    pipelines:
        logs:
            receivers:
                - kafka/logs
            processors: []
            exporters:
                - nop/devnull
    telemetry:
        metrics:
            readers:
                - pull:
                    exporter:
                        prometheus:
                            host: localhost
                            port: 8888

Log output

cannot start pipelines: failed to start "kafka/logs" receiver: invalid component type: invalid character(s) in type "text_utf-8"

Additional context

No response
