
3.x: unbounded requests from first, take and others #6569

Closed
@davidmoten

Description


In 2.x we discussed some surprising request patterns from operators like first, take and others: despite only one or a limited number of items being needed, Long.MAX_VALUE was requested from upstream and the subscription was then cancelled once the desired number of items arrived (#5077). I believe this was a micro-optimization that improved the Scrabble benchmarks. Any change to the pattern was rejected on the grounds that it would be a breaking behavioural change.
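
As a minimal sketch of the pattern being described (the class name and the printed values are illustrative; the exact request amount depends on the operator and RxJava version), the upstream of take(1) currently sees an effectively unbounded request rather than request(1):

```java
import io.reactivex.rxjava3.core.Flowable;

public class RequestPatternDemo {
    public static void main(String[] args) {
        Flowable.range(1, 1000)
                // log what the upstream actually sees requested
                .doOnRequest(n -> System.out.println("upstream saw request(" + n + ")"))
                .take(1)
                .subscribe(x -> System.out.println("received " + x));
        // With the current behaviour this prints an effectively unbounded request
        // (Long.MAX_VALUE) followed by cancellation after the single item arrives,
        // rather than request(1).
    }
}
```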

In essence I'd like us not to be opinionated about the effect of over-requesting on the upstream (particularly over a network boundary). By over-requesting we are fundamentally losing information that can be useful for optimizing upstream processing. An example that springs to mind: requesting a large number may be translated into an API call to a remote upstream that does a full sort (O(n log n)), whereas requesting only one item can be implemented upstream with a max scan (O(n)); see the sketch after this paragraph. This of course assumes a single request to create the stream, so it is not a run-of-the-mill streaming case. I'd also suggest we are not opinionated about the ability of upstream to respond to cancellation (upstream may be performing CPU-intensive actions in third-party libraries that aren't cancellable).
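
Here is a hypothetical sketch of what a bounded request would enable. The names topValues, findMax and sortAllDescending are made up for illustration, and it assumes the downstream request has already propagated by the time the create lambda runs (as it does in a synchronous chain). The source inspects emitter.requested() and picks the cheap O(n) scan when only one item is demanded, instead of the full O(n log n) sort:

```java
import io.reactivex.rxjava3.core.BackpressureStrategy;
import io.reactivex.rxjava3.core.Flowable;

import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

public class RequestAwareSource {

    // Hypothetical source that adapts its work to the downstream demand.
    static Flowable<Integer> topValues(List<Integer> data) {
        return Flowable.create(emitter -> {
            long requested = emitter.requested(); // downstream demand visible to the source
            if (requested == 1) {
                // cheap O(n) max scan when exactly one item is wanted
                emitter.onNext(findMax(data));
            } else {
                // otherwise fall back to the full O(n log n) sort
                for (int v : sortAllDescending(data)) {
                    emitter.onNext(v);
                }
            }
            emitter.onComplete();
        }, BackpressureStrategy.BUFFER);
    }

    static int findMax(List<Integer> data) {
        return data.stream().mapToInt(Integer::intValue).max().orElseThrow();
    }

    static List<Integer> sortAllDescending(List<Integer> data) {
        return data.stream().sorted(Comparator.reverseOrder()).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Because take(1) currently requests Long.MAX_VALUE, this hits the sort
        // branch; with a naturally bounded request(1) it would take the max-scan branch.
        topValues(List.of(3, 1, 4, 1, 5, 9, 2, 6))
                .take(1)
                .subscribe(x -> System.out.println("max = " + x));
    }
}
```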

I think the effect on the benchmarks of reverting to naturally bounded requests where the bound is obvious (first, take, etc.) will be very small.

Can we revisit this one for 3.x?
