Description
Run this example with 2 GPUs:
python main.py --multiprocessing-distributed --world-size 1 --rank 0
Process 2 (rank 1) will allocate some memory on GPU 0.
I have gone through the sample code carefully and found no obvious error that would cause process 2 to put data on GPU 0.
So:
- Why does process 2 allocate memory on GPU 0?
- Is this memory involved in the computation? If it is, then as the number of processes grows, won't GPU 0 become seriously overloaded?
- Is there any way to avoid it?
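For context, here is the workaround I am considering. This is only a minimal sketch, under the assumption that the stray allocation is the CUDA context created on the default device (GPU 0) by calls made before the process is pinned to its own GPU; `setup_device`, `load_checkpoint`, and `local_rank` are names I made up for the sketch, not part of the example code.

```python
# Sketch: pin each worker to its own GPU before any CUDA call, and use
# map_location when loading checkpoints, so nothing lands on GPU 0.
# (Assumption: the extra memory is a CUDA context / deserialized tensors
# created on the default device. Names here are hypothetical.)

def setup_device(local_rank):
    import torch  # imported lazily so this sketch parses without a GPU
    # Pin BEFORE init_process_group and before the first .cuda() call;
    # otherwise the CUDA context is created on the default device, GPU 0.
    torch.cuda.set_device(local_rank)
    return torch.device(f"cuda:{local_rank}")

def load_checkpoint(path, local_rank):
    import torch
    # Without map_location, tensors that were saved from cuda:0 are
    # restored onto cuda:0 in every process, regardless of its rank.
    return torch.load(path, map_location=f"cuda:{local_rank}")
```

If the pin happens early enough, nvidia-smi should then show each process only on its own GPU.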
Thanks in advance, and thank you to everyone in the PyTorch community for their hard work.