https://d2l.ai/chapter_linear-regression/linear-regression-scratch.html

The size of the update step is determined by the learning rate `lr`. Because our loss is calculated as a sum over the minibatch of examples, we normalize our step size by the batch size (`batch_size`), so that the magnitude of a typical step size does not depend heavily on our choice of the batch size.

I didn’t get this, can someone explain in simpler words?

I hope my words are simpler. From my understanding of the passage, the weight update is w := w - lr * D, where D is the gradient. After each training step on a minibatch (say m examples per minibatch), we divide the total minibatch gradient by the minibatch size m, so D = minibatch_grad / m, and then multiply by the learning rate. As a result, the step size towards the minimum depends mainly on lr rather than on m.

I agree with you. Instead of using w := w - D, which depends heavily on m, we introduce lr to limit the size of D, so the step is now much less influenced by m.

From my understanding, we divide the **total loss** by the batch size in order to get the **average loss** for a given batch. Unlike the total loss, the average loss does not depend on the batch size.
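
A small sketch might make the point above concrete. It is not the book's code: the toy data (x cycling through 1 to 5, y = 2 * x), the starting weight w = 0, and the helper names `summed_gradient` and `make_batch` are all made up for illustration. It compares the summed gradient with the averaged one for two batch sizes.

```python
def summed_gradient(w, xs, ys):
    """Gradient of sum_i 0.5 * (w * x_i - y_i)**2 with respect to w."""
    return sum((w * x - y) * x for x, y in zip(xs, ys))


def make_batch(batch_size):
    # Hypothetical toy data: x cycles through 1..5 and y = 2 * x.
    xs = [float(i % 5 + 1) for i in range(batch_size)]
    ys = [2.0 * x for x in xs]
    return xs, ys


w = 0.0  # assumed starting weight
results = {}
for batch_size in (10, 100):
    xs, ys = make_batch(batch_size)
    g_sum = summed_gradient(w, xs, ys)   # grows with the batch size
    g_mean = g_sum / batch_size          # normalized by the batch size
    results[batch_size] = (g_sum, g_mean)
    print(f"batch_size={batch_size}: sum={g_sum}, mean={g_mean}")
```

With this data the summed gradient is ten times larger for the batch of 100 than for the batch of 10, while the averaged gradient is the same for both. So if the update used the sum, the same `lr` would take a ten times bigger step with the larger batch; after dividing by `batch_size`, the step is set by `lr` alone.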