Closed
🚀 The feature
Similar to #1044 (thanks @ejguan!), I propose adding a new datapipe that uses `ThreadPoolExecutor` to multithread mapping.
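For illustration, a minimal sketch of what such a datapipe might do internally (the `threaded_map` name is hypothetical, not an existing torchdata API):

```python
from concurrent.futures import ThreadPoolExecutor


def threaded_map(fn, iterable, max_workers=4):
    """Map fn over iterable using a thread pool.

    Executor.map preserves input order, which is what a drop-in
    multithreaded Mapper replacement would want.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # yield lazily so this composes with other iterable datapipes
        yield from pool.map(fn, iterable)


results = list(threaded_map(lambda x: x * x, range(5)))
```

This only parallelizes the function calls themselves; it is mainly useful for I/O-bound `fn` because of the GIL.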
Motivation, pitch
Speed up mapping by using multithreading.
Alternatives
Three possible implementations come to my mind.

1. Similar to Implement BatchAsyncMapper #1044: construct batches, use `Executor.map()`, and then unbatch again. One disadvantage of this is that the first item can only be returned once all operations in the batch have finished. This may change in a future Python version; see "Make Executor.map work with infinite/large inputs correctly" (python/cpython#74028) and "bpo-29842: Make Executor.map less eager so it handles large/unbounded…" (python/cpython#18566).
2. Only allow batches as input and apply the operation to each element in the batch, then return the processed batch.
3. Use `concurrent.futures.as_completed` with a parameter like `scheduled_tasks` to schedule a finite number of tasks. This would return results as soon as they are completed, but would not preserve order.
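Option 3 could be sketched roughly as follows (the `threaded_map_unordered` helper and its `scheduled_tasks` parameter are illustrative assumptions, not an existing API):

```python
from concurrent.futures import (
    FIRST_COMPLETED,
    ThreadPoolExecutor,
    as_completed,
    wait,
)


def threaded_map_unordered(fn, iterable, scheduled_tasks=4, max_workers=4):
    """Keep at most `scheduled_tasks` futures in flight at once.

    Results are yielded as soon as they complete, so input order is
    not preserved, but memory stays bounded even for infinite inputs.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        pending = set()
        for item in iterable:
            pending.add(pool.submit(fn, item))
            if len(pending) >= scheduled_tasks:
                # Block until at least one task finishes, then drain it.
                done, pending = wait(pending, return_when=FIRST_COMPLETED)
                for fut in done:
                    yield fut.result()
        # Input exhausted: drain the remaining futures as they complete.
        for fut in as_completed(pending):
            yield fut.result()


results = sorted(threaded_map_unordered(lambda x: x * x, range(10)))
```

Bounding the number of in-flight futures is what distinguishes this from naively submitting everything up front, which would not work for large or infinite datapipes.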
Which option do you prefer? We could, of course, also implement more than one, e.g. both options 1 and 3.
Additional context
I am not sure how (if at all) the `ThreadPoolExecutor` interferes or interacts with the multiprocessing used in the DataLoader.