Image Classification Dataset

http://zh-v2.d2l.ai/chapter_linear-networks/image-classification-dataset.html

1. Yes. With batch_size set to 1, it took me 37.22 seconds to read the dataset.
2. Pass.
3. E.g. ImageNet, QMNIST and Kinetics-400.

It is confusing that you answered the questions in English but did not read the English version directly.
https://d2l.ai/

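I originally thought reading the Chinese version would be faster, but so far some parts of the Chinese translation still seem a bit obscure. Maybe I will switch to the English version later?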

A plot of how read performance varies with batch_size. You can see that 256 is indeed a very reasonable batch_size.
[Image: Figure_1]
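
For anyone who wants to reproduce this kind of measurement, here is a minimal sketch. It assumes a synthetic `TensorDataset` with Fashion-MNIST-shaped samples instead of the real dataset, so nothing has to be downloaded; `time_one_pass` and `toy_dataset` are hypothetical names, and the absolute numbers will differ from the ones reported in this thread.

```python
import time

import torch
from torch.utils.data import DataLoader, TensorDataset


def time_one_pass(dataset, batch_size):
    """Return the seconds needed to iterate once over `dataset`."""
    loader = DataLoader(dataset, batch_size=batch_size, shuffle=True)
    start = time.perf_counter()
    for X, y in loader:
        continue
    return time.perf_counter() - start


# Synthetic stand-in with Fashion-MNIST-shaped samples (an assumption,
# used so that the sketch runs without downloading anything).
images = torch.rand(2048, 1, 28, 28)
labels = torch.randint(0, 10, (2048,))
toy_dataset = TensorDataset(images, labels)

# Time one full pass for a few batch sizes.
timings = {bs: time_one_pass(toy_dataset, bs) for bs in (1, 16, 256)}
for bs, seconds in timings.items():
    print(f'batch_size={bs:>3}: {seconds:.3f} sec')
```

Swapping `toy_dataset` for `mnist_train` should give a curve comparable to the plot, since the per-batch overhead is paid fewer times as batch_size grows.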


mnist_train[0][0].shape
I want to know what the numbers in the two square brackets represent. Can someone help me?

The two numbers are indices. The first selects a sample from the dataset (here sample 0), and the second selects an element of that sample (here element 0, the image tensor).
mnist_train[0][0].shape  # prints the shape of a typical image tensor in the dataset (the training set in this case)
The output is torch.Size([1, 28, 28]), which represents (channel, height, width).
It is a grayscale image, so its channel dimension is 1.

train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
num_workers=get_dataloader_workers())

When I run this code, it raises an error as soon as num_workers > 0. How can I fix this?
Using if __name__ == '__main__': does not help either.

[Image: download]

The x-axis is the batch size and the y-axis is the time cost. Indeed, the larger the batch size, the lower the time cost.

I ran into the same problem.

Putting the part that uses multiple workers inside this block makes it work:

if __name__ == '__main__':
    train_iter = data.DataLoader(mnist_train, batch_size, shuffle=True,
                             num_workers=get_dataloader_workers())

    timer = d2l.Timer()
    for X, y in train_iter:
        continue
    print(f'{timer.stop():.2f} sec')

104.9%
Downloading http://fashion-mnist.s3-website.eu-central-1.amazonaws.com/train-images-idx3-ubyte.gz
Downloading http://xxx.xx.xxx.xx/ to …/data/FashionMNIST/raw/train-images-idx3-ubyte.gz


RuntimeError Traceback (most recent call last)
/tmp/ipykernel_3542705/3693593615.py in
2 # and divide by 255 so that all pixel values lie between 0 and 1
3 trans = transforms.ToTensor()
----> 4 mnist_train = torchvision.datasets.FashionMNIST(
5 root="…/data", train=True, transform=trans, download=True)
6 mnist_test = torchvision.datasets.FashionMNIST(

~/miniconda3/envs/d2l/lib/python3.8/site-packages/torchvision/datasets/mnist.py in __init__(self, root, train, transform, target_transform, download)
85
86 if download:
---> 87 self.download()
88
89 if not self._check_exists():

~/miniconda3/envs/d2l/lib/python3.8/site-packages/torchvision/datasets/mnist.py in download(self)
174 try:
175 print("Downloading {}".format(url))
--> 176 download_and_extract_archive(
177 url, download_root=self.raw_folder,
178 filename=filename,

~/miniconda3/envs/d2l/lib/python3.8/site-packages/torchvision/datasets/utils.py in download_and_extract_archive(url, download_root, extract_root, filename, md5, remove_finished)
411 filename = os.path.basename(url)
412
--> 413 download_url(url, download_root, filename, md5)
414
415 archive = os.path.join(download_root, filename)

~/miniconda3/envs/d2l/lib/python3.8/site-packages/torchvision/datasets/utils.py in download_url(url, root, filename, md5, max_redirect_hops)
149 # check integrity of downloaded file
150 if not check_integrity(fpath, md5):
--> 151 raise RuntimeError("File not found or corrupted.")
152
153

RuntimeError: File not found or corrupted.
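
The traceback shows the error being raised when `check_integrity(fpath, md5)` fails, that is, when the downloaded file is missing or does not match the expected checksum. The sketch below illustrates what such a check could look like using only the standard library. It is not torchvision's actual implementation, and `md5_matches` is a hypothetical helper name. If the check fails on a partially downloaded file, deleting that file and downloading again is probably the first thing to try.

```python
import hashlib
import os
import tempfile


def md5_matches(path, expected_md5, chunk_size=1024 * 1024):
    """Return True if `path` exists and its MD5 hex digest equals `expected_md5`."""
    if not os.path.isfile(path):
        return False
    digest = hashlib.md5()
    with open(path, 'rb') as f:
        # Read in chunks so that large archives need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest() == expected_md5


# Demo on a temporary file (stand-in for a downloaded archive).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo.bin')
    with open(path, 'wb') as f:
        f.write(b'hello')
    good = hashlib.md5(b'hello').hexdigest()
    ok_result = md5_matches(path, good)            # intact file
    bad_result = md5_matches(path, '0' * 32)       # checksum mismatch
    missing_result = md5_matches(os.path.join(tmp, 'nope.bin'), good)  # no file

print(ok_result, bad_result, missing_result)  # True False False
```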

mnist_train[index] is a tuple containing the image data and its label, so you need to put 0 or 1 in the second square bracket to get the image data or the label, respectively.
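
The double indexing can be tried out without downloading anything. This is a minimal sketch that assumes a toy `TensorDataset` as a stand-in for `mnist_train`; `toy_train` is a hypothetical name. Note that this toy dataset returns the label as a 0-dimensional tensor, whereas `FashionMNIST` returns a plain Python int.

```python
import torch
from torch.utils.data import TensorDataset

# Toy stand-in for mnist_train: 10 grayscale 28x28 images with labels 0..9.
toy_train = TensorDataset(torch.zeros(10, 1, 28, 28), torch.arange(10))

sample = toy_train[3]    # first bracket: which sample
image, label = sample    # each sample is an (image, label) tuple

print(type(sample).__name__)  # tuple
print(toy_train[3][0].shape)  # torch.Size([1, 28, 28])
print(int(toy_train[3][1]))   # 3
```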