Minibatch Stochastic Gradient Descent

https://d2l.ai/chapter_optimization/minibatch-sgd.html

In the print statement of the function train_ch11, I think timer.avg() should be replaced with timer.sum()/num_epochs. The timer is started and stopped once per minibatch, so timer.avg() reports seconds per batch, not seconds per epoch.
If I am right, then it is not true that the time required per epoch for minibatch SGD (with batch_size=100) is shorter than the time needed for full-batch gradient descent.
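A minimal sketch of the discrepancy, using a hypothetical Timer class that mirrors the interface I assume d2l.Timer has (a list of recorded durations with avg() and sum()). With 15 batches per epoch (e.g. 1500 samples at batch_size=100), avg() is a per-batch figure and is 15x smaller than the true per-epoch time:

```python
class Timer:
    """Hypothetical stand-in for d2l.Timer: stores one duration per minibatch."""
    def __init__(self):
        self.times = []

    def record(self, t):
        self.times.append(t)

    def avg(self):
        # Average over all recorded intervals, i.e. per *batch*
        return sum(self.times) / len(self.times)

    def sum(self):
        # Total time over all epochs
        return sum(self.times)


num_epochs = 2
batches_per_epoch = 15  # e.g. 1500 samples with batch_size=100
timer = Timer()
for _ in range(num_epochs * batches_per_epoch):
    timer.record(0.01)  # pretend each minibatch took 10 ms

print(f'{timer.avg():.3f} sec/batch')               # 0.010
print(f'{timer.sum() / num_epochs:.3f} sec/epoch')  # 0.150
```

For full-batch gradient descent there is only one "batch" per epoch, so the two figures coincide; that is why comparing avg() across batch sizes makes minibatch SGD look misleadingly fast per epoch.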