[pkg/stanza/operator/recombine] aggregate the latter part of the split-log due to triggering the size limit #21241

Closed
@yutingcaicyt

Description

Component(s)

pkg/stanza

Is your feature request related to a problem? Please describe.

Currently, when the recombine operator combines log entries using 'is_first_entry' and a batch reaches the size limit, the operator aggregates the entries in that batch and flushes it. The subsequent entries of the same log, however, are then emitted one by one. For example, with "max_batch_size" set to 50, a single log made up of 100 entries is split into 50 + 1 + 1 + ... + 1, whereas I would expect 50 + 50.
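
To make the 50 + 1 + 1 + ... + 1 split concrete, here is a minimal sketch that models the behavior for a single source. It is a simplified simulation, not the operator's actual code: `simulateCurrent` and `oneLog` are hypothetical names, only entry counts are tracked, and `maxBatchSize` is assumed to be greater than zero.

```go
package main

import "fmt"

// oneLog (hypothetical helper) describes a single log of n entries in which
// only the first entry matches is_first_entry.
func oneLog(n int) []bool {
	isFirst := make([]bool, n)
	if n > 0 {
		isFirst[0] = true
	}
	return isFirst
}

// simulateCurrent (hypothetical) returns the sizes of the emitted batches for
// one source. batch == 0 stands for "no batch exists for this source".
func simulateCurrent(isFirst []bool, maxBatchSize int) []int {
	var emitted []int
	batch := 0
	for _, first := range isFirst {
		switch {
		case !first && batch == 0:
			// Non-first entry without an existing batch: emitted on its own.
			emitted = append(emitted, 1)
		case first && batch > 0:
			// A new log starts: flush the previous batch, then start a new one.
			emitted = append(emitted, batch)
			batch = 1
		default:
			batch++
		}
		// Size limit reached: flush and forget the batch.
		if batch > 0 && batch >= maxBatchSize {
			emitted = append(emitted, batch)
			batch = 0
		}
	}
	if batch > 0 {
		emitted = append(emitted, batch)
	}
	return emitted
}

func main() {
	sizes := simulateCurrent(oneLog(100), 50)
	fmt.Println(len(sizes), sizes[0], sizes[1]) // prints "51 50 1"
}
```

Once the size limit flushes the batch, nothing records that a log is still in progress, so each of the remaining 50 entries falls into the first case.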

Describe the solution you'd like

This behavior is caused by the following code:
```
// When matching on first entry, never batch partial first. Just emit immediately
case !matches && r.matchIndicatesFirst() && r.batchMap[s] == nil:
	r.addToBatch(ctx, e, s)
	return r.flushSource(s)
```
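One way to solve this is to change how 'flushSource' is used so that the 'sourceBatch' is not deleted from the batchMap when the size limit is triggered. With the batch still present, `r.batchMap[s] == nil` is false for the subsequent entries, so they continue to be batched instead of being emitted immediately.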
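
As a rough illustration of this idea, the sketch below keeps a per-source marker alive after a size-limit flush. It is only a simplified model under my own assumptions, not a patch for the operator: `simulateProposed`, `oneLog`, and the `open` flag are hypothetical names, only entry counts are tracked, and `maxBatchSize` is assumed to be greater than zero.

```go
package main

import "fmt"

// oneLog (hypothetical helper) describes a single log of n entries in which
// only the first entry matches is_first_entry.
func oneLog(n int) []bool {
	isFirst := make([]bool, n)
	if n > 0 {
		isFirst[0] = true
	}
	return isFirst
}

// simulateProposed (hypothetical) returns the sizes of the emitted batches
// for one source when the batch record survives a size-limit flush.
func simulateProposed(isFirst []bool, maxBatchSize int) []int {
	var emitted []int
	batch := 0
	// open stands in for "the sourceBatch is still in the batchMap": it is
	// set by the first batched entry and is not cleared by a size-limit flush.
	open := false
	for _, first := range isFirst {
		switch {
		case !first && !open:
			// No log in progress for this source: emit the entry on its own.
			emitted = append(emitted, 1)
			continue
		case first && batch > 0:
			// A new log starts: flush what is left of the previous one.
			emitted = append(emitted, batch)
			batch = 0
		}
		open = true
		batch++
		// Size limit reached: flush the entries but keep the source open.
		if batch >= maxBatchSize {
			emitted = append(emitted, batch)
			batch = 0
		}
	}
	if batch > 0 {
		emitted = append(emitted, batch)
	}
	return emitted
}

func main() {
	fmt.Println(simulateProposed(oneLog(100), 50)) // prints "[50 50]"
}
```

In this model, a later entry that matches 'is_first_entry' still starts a new batch, so only the tail of an oversized log is affected.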

Describe alternatives you've considered

```
// When matching on first entry, never batch partial first. Just emit immediately
case !matches && r.matchIndicatesFirst() && r.batchMap[s] == nil:
	r.addToBatch(ctx, e, s)
	return r.flushSource(s)
```
Another option is to delete this code entirely. I think it may be unnecessary as long as 'is_first_entry' is configured correctly. However, this needs more thought because it changes the existing logic.

Additional context

If possible I would like to work on this feature.
