Image Augmentation

https://d2l.ai/chapter_computer-vision/image-augmentation.html

The loss was lower for the net trained with train_with_data_aug(no_aug, no_aug), while the test accuracy was worse. With train_augs the loss was a bit higher, but the test accuracy was better. I would say the higher training loss with train_augs means the net overfit less and generalized better (hence the higher test accuracy).
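For anyone reproducing the comparison, a minimal sketch of the two pipelines being contrasted (no_aug is the poster's name for the augmentation-free transform; train_augs follows the section's definition):

import torchvision

# No augmentation: only convert the PIL image to a tensor
no_aug = torchvision.transforms.Compose([
    torchvision.transforms.ToTensor()])

# Training-time augmentation from the section: random horizontal flips
train_augs = torchvision.transforms.Compose([
    torchvision.transforms.RandomHorizontalFlip(),
    torchvision.transforms.ToTensor()])

With random flips the model rarely sees exactly the same image twice, which acts as a regularizer: the training loss stays a bit higher, but the gap between training and test accuracy shrinks.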

In the “Flipping and Cropping” section, the names of the PyTorch-version functions are wrong in the text: it shows the MXNet-version function names instead.
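For reference, a quick side-by-side of the two APIs (a sketch, assuming the standard gluon and torchvision transform modules):

# MXNet / Gluon names (what the text currently shows)
from mxnet.gluon.data.vision import transforms as gluon_transforms
flip_lr = gluon_transforms.RandomFlipLeftRight()
flip_tb = gluon_transforms.RandomFlipTopBottom()

# PyTorch names (what the PyTorch tab should show)
import torchvision
flip_lr = torchvision.transforms.RandomHorizontalFlip()
flip_tb = torchvision.transforms.RandomVerticalFlip()
shape_aug = torchvision.transforms.RandomResizedCrop(
    (200, 200), scale=(0.1, 1), ratio=(0.5, 2))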

Great catch @Aaron_L! Fixed in this PR. Feel free to open a PR and contribute to our project!

#@save
def train_ch13(net, train_iter, test_iter, loss, trainer, num_epochs,
               devices=d2l.try_all_gpus()):
    ...
    net = nn.DataParallel(net, device_ids=devices).to(devices[0])

Can anybody tell me why nn.DataParallel() here is followed by .to(devices[0])?
Don’t we train the model on multiple GPUs in train_ch13()?
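A short answer with a sketch: nn.DataParallel keeps the primary copy of the parameters on device_ids[0] and replicates the module onto the other GPUs at each forward pass, so the model itself still has to be moved to the first device; that is what .to(devices[0]) does. A minimal illustration, assuming two CUDA devices are available:

import torch
from torch import nn

devices = [torch.device('cuda:0'), torch.device('cuda:1')]
net = nn.Linear(10, 2)

# The parameters live on devices[0]; forward() replicates the module
# onto the remaining devices and scatters the input batch across them.
net = nn.DataParallel(net, device_ids=devices).to(devices[0])

X = torch.randn(8, 10, device=devices[0])
Y = net(X)  # per-device outputs are gathered back onto devices[0]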

If you want to train on CPU anyway, I suggest this edit. Warning: this runs for a while.

import torch
from torch import nn
from d2l import torch as d2l

batch_size, devices, net = 256, [torch.device('cpu')], d2l.resnet18(10, 3)
net.apply(d2l.init_cnn)

def train_with_data_aug_cpu(train_augs, test_augs, net, lr=0.001):
    train_iter = load_cifar10(True, train_augs, batch_size)
    test_iter = load_cifar10(False, test_augs, batch_size)
    loss = nn.CrossEntropyLoss(reduction="none")
    trainer = torch.optim.Adam(net.parameters(), lr=lr)
    # One forward pass to initialize the lazy layers, as in the book's
    # train_with_data_aug.
    net(next(iter(train_iter))[0])
    train_ch13(net, train_iter, test_iter, loss, trainer, 10, devices=devices)

train_with_data_aug_cpu(train_augs, test_augs, net)
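This works because nn.DataParallel degrades to a plain pass-through wrapper when no CUDA device is available, so train_ch13 runs unmodified with devices=[torch.device('cpu')]. Note the assumption of a CPU-only machine: on a box that does have GPUs, passing a CPU device in device_ids would fail, and you would want the original multi-GPU path anyway.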