Training on Multiple GPUs

Given that we increase the effective batch size by a factor of k when training with k GPUs, shouldn't we be decreasing (rather than increasing, as stated) the LR by a factor of k to compensate for the roughly k-times larger weight update that results from the larger batch per iteration?

I think the LR should increase to keep up with the larger batch size. Note that most frameworks average (rather than sum) gradients across GPUs, so the per-step update does not actually grow k-fold; instead, each epoch takes k times fewer steps. Scaling the LR up by k (the linear scaling rule) compensates for having fewer, not larger, updates.
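A minimal sketch of the linear scaling rule under the averaged-gradients assumption; the function name `scale_lr` is hypothetical, not from any framework's API:

```python
def scale_lr(base_lr: float, num_gpus: int) -> float:
    """Linear scaling rule: when the effective batch size grows by a
    factor of k (e.g. k data-parallel GPUs, each keeping the same
    per-GPU batch size), multiply the base LR by k. Assumes gradients
    are averaged, not summed, across GPUs."""
    return base_lr * num_gpus

# Example: base LR tuned on 1 GPU, now training on 8 GPUs.
base_lr = 0.1
print(scale_lr(base_lr, 8))
```

In practice, large-batch recipes pair this with a warmup phase that ramps the LR from `base_lr` to the scaled value over the first few epochs, since the scaled LR can be unstable at initialization.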