Image Augmentation

The loss was lower for the net trained with train_with_data_aug(no_aug, no_aug), but its test accuracy was worse. With train_augs the training loss was somewhat higher, yet the test accuracy improved. The higher loss under train_augs suggests the model overfit less and generalized better, which the higher test accuracy confirms.

In the “Flipping and Cropping” section, the function names given in the text are wrong for the PyTorch version: they are the MXNet-version functions.

Great catch @Aaron_L! Fixed in this PR. Feel free to open a PR and contribute to our project!