Linear Regression Implementation from Scratch

https://d2l.ai/chapter_linear-regression/linear-regression-scratch.html

The size of the update step is determined by the learning rate lr . Because our loss is calculated as a sum over the minibatch of examples, we normalize our step size by the batch size ( batch_size ), so that the magnitude of a typical step size does not depend heavily on our choice of the batch size.

I didn’t get this, can someone explain in simpler words?

I hope my words are simpler :smile:. From my understanding of the passage: in the weight update equation (w := w - lr * D, where D is the gradient), after each step of training on a minibatch (say, m examples per minibatch) we divide the total minibatch gradient by the size of the minibatch (so D = minibatch_grad / m) and then multiply by the learning rate. As a result, the step size towards the minimum depends mainly on lr rather than on m.
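Concretely, here is a minimal sketch of such an update step, assuming PyTorch and that each parameter's `.grad` holds the gradient summed over the minibatch (this mirrors the `sgd` helper in the book's scratch implementation):

```python
import torch

def sgd(params, lr, batch_size):
    """Minibatch SGD: scale the summed gradient by 1/batch_size."""
    with torch.no_grad():
        for param in params:
            # Dividing by batch_size turns the summed gradient into an
            # average, so the step magnitude does not grow with m.
            param -= lr * param.grad / batch_size
            param.grad.zero_()
```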


I agree with you: instead of using w := w - D, which depends heavily on m, we introduce lr to limit the size of D, so the update is less influenced by m.

From my understanding, we divide the total loss by the batch size in order to get the average loss for a given batch. Unlike the total loss, the average loss does not depend on the batch size.
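To illustrate with a toy example (not from the book): doubling the batch roughly doubles the summed squared error, but leaves the mean unchanged:

```python
import torch

y_hat = torch.tensor([1.0, 2.0, 3.0, 4.0])
y     = torch.tensor([1.5, 2.5, 3.5, 4.5])
sq_err = (y_hat - y) ** 2 / 2   # per-example squared error: 0.125 each

small = sq_err[:2]  # batch of 2
large = sq_err      # batch of 4 with the same per-example error
print(small.sum(), large.sum())    # sums grow with batch size: 0.25 vs 0.5
print(small.mean(), large.mean())  # means stay the same: 0.125 and 0.125
```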

While implementing this code, I got an error
[error screenshot omitted]
which I realized might be because of the 'loss' attribute in the LinearRegressionScratch class (I checked using model.__dict__.keys()). To work around this error I defined a new loss function named 'loss_' and changed it in training_step, but then my model does not train properly.
Am I missing something here?

Looks like the error was because of the Keras version; going back to keras==2.15.0 solved it.
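For anyone hitting the same issue, pinning the version would look something like this (assuming a pip-based environment; in a notebook, keep the leading `!`, on the command line drop it):

```python
# Pin Keras to the version that worked for this chapter's code.
!pip install keras==2.15.0
```

A plausible explanation (an assumption, not confirmed in this thread) is that newer Keras releases treat `loss` as a reserved attribute on Model, so a subclass attribute with that name collides with it.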