Image Classification Dataset

http://d2l.ai/chapter_linear-networks/image-classification-dataset.html

I have tested that the num_workers parameter in torch's DataLoader does work. Setting num_workers=4 cut the read time in half.
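To check the num_workers effect without downloading Fashion-MNIST, here is a minimal sketch using a synthetic in-memory dataset (the data and sizes here are made up stand-ins, not the real benchmark):

```python
import time
import torch
from torch.utils import data

# Synthetic stand-in for Fashion-MNIST: 10,000 random 28x28 "images"
# (assumption: a real benchmark would use torchvision.datasets.FashionMNIST).
dataset = data.TensorDataset(torch.randn(10000, 1, 28, 28),
                             torch.randint(0, 10, (10000,)))

def read_time(num_workers):
    """Time one full pass over the dataset with the given worker count."""
    loader = data.DataLoader(dataset, batch_size=256, num_workers=num_workers)
    start = time.time()
    for X, y in loader:
        pass  # just read, no training
    return time.time() - start

if __name__ == '__main__':  # guard is required on Windows for num_workers > 0
    print(f'num_workers=0: {read_time(0):.2f} sec')
    print(f'num_workers=4: {read_time(4):.2f} sec')
```

Note that with a purely in-memory TensorDataset the worker startup cost can outweigh the gain, so the halving really shows up with datasets that decode images from disk, like Fashion-MNIST.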

  1. batch size = 1: stochastic gradient descent (SGD)
    batch size = 256: mini-batch gradient descent (MBGD)
    Because the GPU reads data in parallel, MBGD is quicker.
    Reducing the batch_size makes overall read performance slower.
    :face_with_monocle: Is my guess right?
  2. I’m a Windows user. Try it next time!
  3. https://pytorch.org/docs/stable/torchvision/datasets.html

Datasets:

I suggest using %%timeit -r1, a built-in Jupyter magic, instead of the d2l timer.

%%time is better. One time is enough :grinning:

Hi friends,
I don't understand the resize argument.
I can't show the images after resizing.



Read again.

@StevenJokess you need to change the arguments when calling the show_images() method according to the batch_size and resize arguments you chose in the load_data_fashion_mnist() method

something like this
show_images(X.reshape(32, 64, 64), 2, 9, scale=1.5, titles=get_fashion_mnist_labels(y))
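In case the shapes are confusing: with batch_size=32 and resize=64, each minibatch X comes out as (32, 1, 64, 64), and show_images wants plain (H, W) images, so the reshape just drops the channel axis. A standalone sketch (fake batch, no download needed):

```python
import torch

# Fake batch standing in for one minibatch from load_data_fashion_mnist
# (assumption: batch_size=32, resize=64, single grayscale channel).
X = torch.zeros(32, 1, 64, 64)

# show_images expects (H, W) images, so drop the channel dimension:
imgs = X.reshape(32, 64, 64)
print(imgs.shape)  # torch.Size([32, 64, 64])
```

With 2 rows and 9 columns, show_images will display only the first 18 of the 32 images in the batch.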

For q1, I don’t think SGD or MBGD would affect the performance of reading the dataset, since reading has nothing to do with updating the parameters.
However, reading the data really is slower when batch_size is set to 1. Maybe the I/O overhead of data reading is the reason for the difference?
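If anyone wants to verify, here is a rough sketch (synthetic data, sizes are my own choice) showing that the per-batch overhead, not parameter updates, is what makes batch_size=1 slow:

```python
import time
import torch
from torch.utils import data

# Synthetic stand-in for the dataset (assumption: 2,000 28x28 samples,
# small enough to iterate quickly).
dataset = data.TensorDataset(torch.randn(2000, 1, 28, 28),
                             torch.randint(0, 10, (2000,)))

def read_time(batch_size):
    """Time one full pass over the dataset at the given batch size."""
    loader = data.DataLoader(dataset, batch_size=batch_size)
    start = time.time()
    for X, y in loader:
        pass  # read only, no gradient updates involved
    return time.time() - start

# batch_size=1 issues 2,000 separate fetch/collate calls; batch_size=256
# issues only 8, so the per-batch Python overhead dominates at batch_size=1.
t1, t256 = read_time(1), read_time(256)
print(f'batch_size=1: {t1:.3f}s, batch_size=256: {t256:.3f}s')
```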


In PyTorch, when loading the dataset, there is a warning, and I find I can’t use the dataset.
Running “mnist_train[0][0].shape” gives an error:
TypeError: array() takes 1 positional argument but 2 were given

How do I solve this? :pensive:

The interface has probably changed.
See https://blog.csdn.net/weixin_42468475/article/details/108714940

X, y = next(iter(data.DataLoader(data.TensorDataset(mnist_train.data, mnist_train.targets), batch_size=18)))
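One caveat with that workaround: the raw .data tensor is uint8 in [0, 255] and skips the ToTensor() scaling, so you may want to convert to float and divide by 255 yourself. A sketch with fake tensors standing in for mnist_train.data / mnist_train.targets:

```python
import torch
from torch.utils import data

# Fake stand-ins for mnist_train.data and mnist_train.targets (assumption:
# the raw FashionMNIST tensors are uint8 in [0, 255] with shape (N, 28, 28)).
raw_images = torch.randint(0, 256, (18, 28, 28), dtype=torch.uint8)
raw_labels = torch.randint(0, 10, (18,))

# Scale to float in [0, 1] manually, since ToTensor() is bypassed here:
dataset = data.TensorDataset(raw_images.float() / 255, raw_labels)
X, y = next(iter(data.DataLoader(dataset, batch_size=18)))
print(X.shape, float(X.max()) <= 1.0)
```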