In the print statement of train_ch11, I think timer.avg() should be replaced with timer.sum()/num_epochs: if the timer records more than one interval per epoch, timer.avg() reports the average time per recorded interval, not the time per epoch.
If I am right, then the claim that the time per epoch for minibatch SGD (with batch_size=100) is shorter than the time needed for full-batch gradient descent does not hold. A minimal sketch of what I mean follows below.
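For illustration, here is a small, self-contained sketch. The Timer class and loop below are simplified stand-ins for the book's d2l.Timer and train_ch11 (the interval lengths and loop structure are made up), meant only to show how timer.avg() and timer.sum()/num_epochs can differ when the timer is stopped several times within an epoch:

```python
import time


class Timer:
    """Simplified stand-in for d2l.Timer: stores one interval per start/stop pair."""
    def __init__(self):
        self.times = []

    def start(self):
        self.tik = time.time()

    def stop(self):
        self.times.append(time.time() - self.tik)
        return self.times[-1]

    def avg(self):
        # Average over recorded intervals, NOT over epochs
        return sum(self.times) / len(self.times)

    def sum(self):
        return sum(self.times)


# Hypothetical loop: the timer is started and stopped several times per epoch,
# so avg() measures seconds per interval rather than seconds per epoch.
timer, num_epochs, intervals_per_epoch = Timer(), 2, 5
for epoch in range(num_epochs):
    for _ in range(intervals_per_epoch):
        timer.start()
        time.sleep(0.01)  # stand-in for the timed training work
        timer.stop()

print(f'{timer.avg():.3f} sec per recorded interval (what timer.avg() gives)')
print(f'{timer.sum() / num_epochs:.3f} sec per epoch (what I think should be printed)')
```

With more than one recorded interval per epoch, the second number is larger by a factor of intervals_per_epoch, which is why the reported per-epoch times would change if the print statement were corrected.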