https://d2l.ai/chapter_computational-performance/multiple-gpus.html
Given that training with k GPUs increases the effective batch size by a factor of k, shouldn't we decrease (rather than increase, as stated) the LR by a factor of k to compensate for the approximately k-times larger weight update per iteration that results from the larger batch?
Why?
I think the LR should increase to keep up with the increased batch size.
Why may large minibatches require a slightly increased learning rate?
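A minimal sketch of the intuition, assuming the k workers average (rather than sum) their local gradients, as data-parallel training typically does. The toy least-squares problem, the names `b`, `k`, `w_true`, and the final learning-rate step are illustrative assumptions, not code from the chapter: the averaged gradient over k minibatches has roughly the same magnitude as a single-minibatch gradient but lower variance, which is why the common linear scaling heuristic (Goyal et al., 2017) multiplies the base LR by k instead of dividing it.

```python
# Illustrative sketch (assumptions: toy least-squares loss, gradients are
# *averaged* across workers, names b / k / w_true are placeholders).
import torch

torch.manual_seed(0)
d, b, k = 10, 32, 4              # feature dim, per-GPU batch size, number of GPUs
w_true = torch.randn(d)          # ground-truth weights for synthetic data
w = torch.zeros(d, requires_grad=True)

def minibatch_grad(batch_size):
    """Gradient of 0.5 * MSE on one freshly sampled minibatch."""
    X = torch.randn(batch_size, d)
    y = X @ w_true + 0.1 * torch.randn(batch_size)
    loss = 0.5 * ((X @ w - y) ** 2).mean()
    (grad,) = torch.autograd.grad(loss, w)
    return grad

# Single GPU: one minibatch of size b.
g_single = minibatch_grad(b)

# k GPUs with gradient averaging: effective batch size k * b.
g_averaged = torch.stack([minibatch_grad(b) for _ in range(k)]).mean(dim=0)

# The averaged gradient is NOT k times larger -- it has roughly the same
# magnitude as the single-GPU gradient, only with less noise.
print(f"single-GPU grad norm: {g_single.norm().item():.3f}")
print(f"k-GPU averaged norm:  {g_averaged.norm().item():.3f}")

# The lower gradient noise is what permits a larger step, hence the linear
# scaling heuristic: multiply the base LR by k (an assumption of this sketch).
base_lr = 0.1
scaled_lr = base_lr * k
print(f"base LR {base_lr} -> scaled LR {scaled_lr}")
```

If the workers summed their gradients instead of averaging them, the update really would be about k times larger, and dividing the LR by k would be the natural correction; the two conventions differ only by where the factor of k is absorbed.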