Networks Using Blocks (VGG)

Don't use the default GitHub version of d2l; pin a specific version instead:
!pip install d2l==0.17

def vgg(conv_arch):
    conv_blks = []
    in_channels = 1
    # Convolutional part
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    return nn.Sequential(
        *conv_blks, nn.Flatten(),
        # Fully connected part
        nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 10))

net = vgg(conv_arch)

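Does in_channels change in this part? As I understand it, passing a single variable as an argument in Python does not modify the caller's variable, yet from the second block onward the number of input channels should no longer be 1.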
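
A torch-free sketch may help with this question. It assumes the VGG-11 `conv_arch` from the book (the thread does not show its value), and `fake_block` is a hypothetical stand-in for `vgg_block`. Rebinding `in_channels` inside a function only changes that function's local name. The value each block receives still changes, because the loop in `vgg` itself reassigns `in_channels = out_channels` after every block.

```python
def fake_block(num_convs, in_channels, out_channels):
    """Hypothetical stand-in for vgg_block: it rebinds its own local name."""
    in_channels = out_channels  # affects only the local variable
    return in_channels


caller_value = 1
fake_block(2, caller_value, 64)
print(caller_value)  # prints 1: the caller's variable is untouched

# Assumed VGG-11 configuration: (num_convs, out_channels) per block
conv_arch = ((1, 64), (1, 128), (2, 256), (2, 512), (2, 512))

block_inputs = []
in_channels = 1
for num_convs, out_channels in conv_arch:
    block_inputs.append(in_channels)  # what this block would receive
    in_channels = out_channels        # the loop in vgg() does this update
print(block_inputs)  # prints [1, 64, 128, 256, 512]
```

So the update that matters across blocks is the assignment at the end of the loop body in `vgg`, not anything done inside `vgg_block`.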

Has anyone tried running this on MPS? I feel it should be a bit faster, although it still took me about an hour.

After changing the image size to 96, the accuracy is lower for the same number of training epochs. Why is that?
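
One thing worth checking with a 96-pixel input is the shape arithmetic. The sketch below assumes five VGG blocks, each ending in `MaxPool2d(kernel_size=2, stride=2)`, with 3x3 convolutions that use `padding=1` and so keep height and width unchanged. `final_feature_size` is a hypothetical helper. This is not a full explanation of the accuracy drop; it only shows how much spatial resolution is left before the fully connected layers.

```python
def final_feature_size(image_size, num_blocks=5):
    """Side length of the feature map after num_blocks 2x2 max-pool layers."""
    size = image_size
    for _ in range(num_blocks):
        size //= 2  # each pooling layer halves height and width (floor)
    return size


for image_size in (224, 96):
    side = final_feature_size(image_size)
    print(image_size, side, side * side)
# prints:
# 224 7 49
# 96 3 9
```

With a 96-pixel input the last feature map is 3x3 rather than 7x7, so the first `nn.Linear` would need `out_channels * 3 * 3` input features, and the classifier sees 9 spatial positions per channel instead of 49.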

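Question 1: the remaining three layers are contained in the last three blocks.
Question 2: parameter count is a major contributor to GPU memory usage. The main differences between VGG11 and AlexNet are the sizes of the convolutional layers and of the first linear layer.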
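In AlexNet:
Conv layer 1 kernel parameters: 11x11x3x96=34848
Pooling layers have no parameters, so they add nothing to the parameter memory
Conv layer 2 kernel parameters: 5x5x96x256=614400
Conv layer 3 kernel parameters: 3x3x256x384=884736
Conv layer 4 kernel parameters: 3x3x384x384=1327104
Conv layer 5 kernel parameters: 3x3x384x256=884736
First fully connected layer parameters (weights + biases): 6400x4096+4096=26218496

Total parameters = 3745824 + 26218496 = 29964320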
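In VGG11:
Conv layer 1 kernel parameters: 3x3x3x64=1728
Conv layer 2 kernel parameters: 3x3x64x128=73728
Conv layer 3 kernel parameters: 3x3x128x256=294912
Conv layer 4 kernel parameters: 3x3x256x256=589824
Conv layer 5 kernel parameters: 3x3x256x512=1179648
Conv layer 6 kernel parameters: 3x3x512x512=2359296
Conv layer 7 kernel parameters: 3x3x512x512=2359296
Conv layer 8 kernel parameters: 3x3x512x512=2359296
First fully connected layer parameters (weights + biases): 7x7x512x4096+4096=102764544
Total parameters = 9217728 + 102764544 = 111982272
So VGG11 has roughly 3.7 times as many parameters as AlexNet, which is why it needs more GPU memory.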
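The tally above can be double-checked with a few lines of plain Python. This is only a sketch under the same assumptions as the hand calculation: 3 input channels, convolution biases ignored, and only the first fully connected layer counted. `conv_params` and `linear_params` are hypothetical helper names.

```python
def conv_params(kernel, in_channels, out_channels):
    """Weights of a square convolution kernel (biases ignored)."""
    return kernel * kernel * in_channels * out_channels


def linear_params(in_features, out_features):
    """Weights plus biases of a fully connected layer."""
    return in_features * out_features + out_features


# (kernel size, input channels, output channels) for each conv layer
alexnet_convs = [(11, 3, 96), (5, 96, 256), (3, 256, 384),
                 (3, 384, 384), (3, 384, 256)]
vgg11_convs = [(3, 3, 64), (3, 64, 128), (3, 128, 256), (3, 256, 256),
               (3, 256, 512), (3, 512, 512), (3, 512, 512), (3, 512, 512)]

alexnet_total = (sum(conv_params(*c) for c in alexnet_convs)
                 + linear_params(6400, 4096))
vgg11_total = (sum(conv_params(*c) for c in vgg11_convs)
               + linear_params(7 * 7 * 512, 4096))

print(alexnet_total)
print(vgg11_total)
print(round(vgg11_total / alexnet_total, 2))
```

The last printed value is the VGG11-to-AlexNet parameter ratio, which gives the size comparison directly.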
Question 4:
Building the VGG16 network following the paper

from torch import nn


def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        # Later convolutions in this block take out_channels as input
        in_channels = out_channels
    # Extra 1x1 convolution for blocks with out_channels >= 256
    if out_channels >= 256:
        layers.append(nn.Conv2d(out_channels, out_channels, kernel_size=1))
        layers.append(nn.ReLU())
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)


def vgg16(conv_arch):
    conv_blks = []
    in_channels = 1
    # Convolutional part
    for (num_convs, out_channels) in conv_arch:
        conv_blks.append(vgg_block(num_convs, in_channels, out_channels))
        in_channels = out_channels

    return nn.Sequential(
        *conv_blks, nn.Flatten(),
        # Fully connected part
        nn.Linear(out_channels * 7 * 7, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(), nn.Dropout(0.5),
        nn.Linear(4096, 10))


conv_arch = ((2, 64), (2, 128), (2, 256), (2, 512), (2, 512))
ratio = 4
small_conv_arch = [(pair[0], pair[1] // ratio) for pair in conv_arch]
net = vgg16(small_conv_arch)
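
One detail that may be worth checking in the code above is how many convolution layers it actually builds once the channels are divided by `ratio`. The torch-free sketch below mirrors the branching in `vgg_block` (each block has `num_convs` 3x3 convolutions, plus one 1x1 convolution when `out_channels >= 256`). `count_convs` is a hypothetical helper.

```python
def count_convs(conv_arch, threshold=256):
    """Count convolution layers produced by the vgg_block logic above."""
    total = 0
    for num_convs, out_channels in conv_arch:
        total += num_convs             # the 3x3 convolutions
        if out_channels >= threshold:  # the optional 1x1 convolution
            total += 1
    return total


conv_arch = ((2, 64), (2, 128), (2, 256), (2, 512), (2, 512))
ratio = 4
small_conv_arch = [(pair[0], pair[1] // ratio) for pair in conv_arch]

full_convs = count_convs(conv_arch)
small_convs = count_convs(small_conv_arch)
print(full_convs)   # prints 13
print(small_convs)  # prints 10
```

With the full `conv_arch` there are 13 convolutions, which together with the 3 fully connected layers gives 16 weight layers. With `small_conv_arch` the widest block has 128 channels, so the `out_channels >= 256` branch never runs and the network that gets trained has 10 convolutions. Scaling the threshold by `ratio` as well would be one way to keep the 1x1 layers, if that is the intent.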

The training results are better than with VGG11:
loss 0.114, train acc 0.958, test acc 0.929

def vgg_block(num_convs, in_channels, out_channels):
    layers = []
    for _ in range(num_convs):
        layers.append(nn.Conv2d(in_channels, out_channels,
                                kernel_size=3, padding=1))
        layers.append(nn.ReLU())
        in_channels = out_channels
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))
    return nn.Sequential(*layers)

In the code above, is in_channels = out_channels necessary? Each call to vgg_block seems to use in_channels only once, so is there any need to update it?
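
A small torch-free sketch can make the effect of that line visible. It records the `(in_channels, out_channels)` pair that each `nn.Conv2d` in a block would be built with, once with the update and once without. `conv_shapes` is a hypothetical helper that mirrors only the loop in `vgg_block`.

```python
def conv_shapes(num_convs, in_channels, out_channels, update=True):
    """List the (in, out) channel pair of each convolution in one block."""
    shapes = []
    for _ in range(num_convs):
        shapes.append((in_channels, out_channels))
        if update:
            in_channels = out_channels  # the line in question
    return shapes


with_update = conv_shapes(2, 64, 128)
without_update = conv_shapes(2, 64, 128, update=False)
print(with_update)     # prints [(64, 128), (128, 128)]
print(without_update)  # prints [(64, 128), (64, 128)]
```

When `num_convs` is 1 the update makes no difference, but in a block with two or more convolutions `in_channels` is used on every pass through the loop. Without the update, the second convolution would expect 64 input channels while the first one outputs 128, and the forward pass would fail with a channel mismatch.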