Linear Regression from Scratch

The size of the update step is determined by the learning rate lr . Because our loss is calculated as a sum over the minibatch of examples, we normalize our step size by the batch size ( batch_size ), so that the magnitude of a typical step size does not depend heavily on our choice of the batch size.

I didn’t get this, can someone explain in simpler words?

I hope my words are simpler :smile:. From my understanding of the passage, in the weight update equation (w:=w - lr * D, where D is the gradient ) after each step of training on a minibatch (let’s say m examples per minibatch) we divide the total minibatch gradient with the size of the minibatch (which is m, so D=minibatch_grad/m) and then multiply by the learning rate, thus the greater effect on our step size towards the minimum is heavily depend on lr rather than m.

1 Like

I agree with you, instead of using w:=w - D, which is heavily depends on m, we introduce lr to set the limit for D, which is now less influance of m.