https://d2l.ai/chapter_convolutional-modern/nin.html
Just out of curiosity, why does the training accuracy/loss per epoch differ when training on MXNet vs PyTorch vs TensorFlow?
After a few hours of trial runs, the hyperparameter tuning spat out these parameters with the best accuracy: