In the print statement of train_ch11, I think timer.avg() should be replaced with timer.sum()/num_epochs: if the timer records more than one interval per epoch, timer.avg() reports the average time per recorded interval, not the time per epoch.
If I am right, then the claim that the time per epoch for minibatch SGD (with batch_size=100) is shorter than the time needed for full-batch gradient descent does not hold. A minimal sketch of what I mean follows below.
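For illustration, here is a small, self-contained sketch. The Timer class and loop below are simplified stand-ins for the book's d2l.Timer and train_ch11 (the interval lengths and loop structure are made up), meant only to show how timer.avg() and timer.sum()/num_epochs can differ when the timer is stopped several times within an epoch:

```python
import time


class Timer:
    """Simplified stand-in for d2l.Timer: stores one interval per start/stop pair."""
    def __init__(self):
        self.times = []

    def start(self):
        self.tik = time.time()

    def stop(self):
        self.times.append(time.time() - self.tik)
        return self.times[-1]

    def avg(self):
        # Average over recorded intervals, NOT over epochs
        return sum(self.times) / len(self.times)

    def sum(self):
        return sum(self.times)


# Hypothetical loop: the timer is started and stopped several times per epoch,
# so avg() measures seconds per interval rather than seconds per epoch.
timer, num_epochs, intervals_per_epoch = Timer(), 2, 5
for epoch in range(num_epochs):
    for _ in range(intervals_per_epoch):
        timer.start()
        time.sleep(0.01)  # stand-in for the timed training work
        timer.stop()

print(f'{timer.avg():.3f} sec per recorded interval (what timer.avg() gives)')
print(f'{timer.sum() / num_epochs:.3f} sec per epoch (what I think should be printed)')
```

With more than one recorded interval per epoch, the second number is larger by a factor of intervals_per_epoch, which is why the reported per-epoch times would change if the print statement were corrected.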