Description
Run this example with 2 GPUs:
python main.py --multiprocessing-distributed --world-size 1 --rank 0
Process 2 (rank 1) will allocate some memory on GPU 0.
I have gone through the sample code carefully and found no obvious error that would cause process 2 to put data on GPU 0.
So:
- Why does process 2 allocate memory on GPU 0?
- Is this memory involved in the computation? If it is, then as the number of processes grows, won't GPU 0 become seriously overloaded?
- Is there any way to avoid it?
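For context, here is the workaround I am considering. This is only a minimal sketch, under the assumption that the stray allocation is the CUDA context created on the default device (GPU 0) by calls made before the process is pinned to its own GPU; `setup_device`, `load_checkpoint`, and `local_rank` are names I made up for the sketch, not part of the example code.

```python
# Sketch: pin each worker to its own GPU before any CUDA call, and use
# map_location when loading checkpoints, so nothing lands on GPU 0.
# (Assumption: the extra memory is a CUDA context / deserialized tensors
# created on the default device. Names here are hypothetical.)

def setup_device(local_rank):
    import torch  # imported lazily so this sketch parses without a GPU
    # Pin BEFORE init_process_group and before the first .cuda() call;
    # otherwise the CUDA context is created on the default device, GPU 0.
    torch.cuda.set_device(local_rank)
    return torch.device(f"cuda:{local_rank}")

def load_checkpoint(path, local_rank):
    import torch
    # Without map_location, tensors that were saved from cuda:0 are
    # restored onto cuda:0 in every process, regardless of its rank.
    return torch.load(path, map_location=f"cuda:{local_rank}")
```

If the pin happens early enough, nvidia-smi should then show each process only on its own GPU.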
Thanks in advance, and thank you to everyone in the PyTorch community for their hard work.