Densely Connected Networks (DenseNet)

My weird answers:


  1. Why do we use average pooling rather than maximum pooling in the transition layer?
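  • So that the other values in each window are not thrown away: average pooling lets every position contribute to the output (and receive gradient), whereas max pooling keeps only the largest value.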
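A tiny illustration on a made-up 4 × 4 feature map, in plain Python, using 2 × 2 windows with stride 2 (the pooling window d2l uses in its transition layer). `pool2x2` and the numbers are hypothetical, just to make the difference visible:

```python
def pool2x2(grid, reduce):
    """Non-overlapping 2x2 pooling (stride 2) with a given reduce function."""
    out = []
    for r in range(0, len(grid), 2):
        row = []
        for c in range(0, len(grid[0]), 2):
            window = [grid[r][c], grid[r][c + 1], grid[r + 1][c], grid[r + 1][c + 1]]
            row.append(reduce(window))
        out.append(row)
    return out


def mean(values):
    return sum(values) / len(values)


features = [
    [1.0, 2.0, 0.0, 0.0],
    [3.0, 4.0, 0.0, 8.0],
    [0.0, 0.0, 1.0, 1.0],
    [0.0, 0.0, 1.0, 1.0],
]
print(pool2x2(features, max))   # [[4.0, 8.0], [0.0, 1.0]]
print(pool2x2(features, mean))  # [[2.5, 2.0], [0.0, 1.0]]
```

In the top-left window, max pooling reports 4.0 and ignores 1, 2 and 3 entirely (and only the 4 would get a gradient), while the average 2.5 reflects all four values.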
  2. One of the advantages mentioned in the DenseNet paper is that its model parameters are smaller than those of ResNet. Why is this the case?

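DenseNet is an extension of ResNet.

The difference is that DenseNet uses concatenation while ResNet uses addition.

  • Because every layer receives all earlier feature maps through concatenation, each convolution only has to produce a few new channels (the growth rate), and the transition layers halve the channel count between blocks. So the parameters (what is stored on disk) stay small, while more memory is used at run time (I am assuming this because of the follow-up question below).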
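To put rough numbers on this, here is a plain-Python parameter count. The dense block sizes (64 input channels, growth rate 32, 4 convs) follow d2l's DenseNet; comparing it with a ResNet block of the same 192-channel output width is my own assumption, and BatchNorm parameters are ignored. The helper names are hypothetical:

```python
def conv_params(c_in, c_out, k=3, bias=True):
    """Weights (c_in * c_out * k * k) plus one bias per output channel."""
    return c_in * c_out * k * k + (c_out if bias else 0)


def dense_block_params(num_convs, c_in, growth):
    # Conv i sees the block input plus the outputs of the i earlier convs,
    # but only ever produces `growth` new channels.
    return sum(conv_params(c_in + growth * i, growth) for i in range(num_convs))


def resnet_block_params(channels):
    # Two 3x3 convs that keep the width fixed (no 1x1 shortcut conv).
    return 2 * conv_params(channels, channels)


c_in, growth, num_convs = 64, 32, 4
out_channels = c_in + growth * num_convs              # 192 channels leave the dense block
print(dense_block_params(num_convs, c_in, growth))    # 129152
print(resnet_block_params(out_channels))              # 663936
```

Both blocks end up with 192 channels, but the dense block gets there with roughly a fifth of the parameters because no single conv is ever 192 channels wide on its output side.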

Model summary at the default 96 × 96 input:

    Total params: 758,226
    Trainable params: 758,226
    Non-trainable params: 0
    Input size (MB): 0.04
    Forward/backward pass size (MB): 14.29
    Params size (MB): 2.89
    Estimated Total Size (MB): 17.22

Only the 2.89 MB of parameters is what gets stored on disk, while the memory actually used during training is 3.1 GB.
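A quick sanity check of where these numbers come from, assuming the summary is torchsummary-style output (sizes in MiB, float32, computed for a batch of one) and a single-channel input. The batch size of 256 below is an assumption (it is d2l's default for this chapter), so the last figure is only an order-of-magnitude estimate:

```python
BYTES_PER_FLOAT32 = 4
MIB = 1024 ** 2


def input_size_mb(channels, height, width, batch=1):
    return batch * channels * height * width * BYTES_PER_FLOAT32 / MIB


print(round(input_size_mb(1, 96, 96), 2))    # 0.04
print(round(input_size_mb(1, 224, 224), 2))  # 0.19

# Activation memory grows with H * W, so 96x96 -> 224x224 multiplies it by:
scale = (224 / 96) ** 2
print(round(scale, 2))                        # 5.44
print(round(14.29 * scale, 2))                # 77.8

# The summary is per example; with an assumed batch of 256 the activations alone are roughly:
print(round(14.29 * 256 / 1024, 2))           # 3.57 (GiB)
```

So the small "Estimated Total Size" is for one example; multiplied by the batch size it lands in the same gigabyte range as the observed GPU usage.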

  3. One problem for which DenseNet has been criticized is its high memory consumption.

    1. Is this really the case? Try to change the input shape to 224 × 224 to see the actual GPU memory consumption.

    Total params: 758,226
    Trainable params: 758,226
    Non-trainable params: 0
    Input size (MB): 0.19
    Forward/backward pass size (MB): 77.81
    Params size (MB): 2.89
    Estimated Total Size (MB): 80.89
    2. Can you think of an alternative means of reducing the memory consumption? How would you need to change the framework?

    • No idea. Maybe store some variables elsewhere rather than keeping them all in memory.
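One direction that fits "not keeping everything in memory" is recomputation: keep only each layer's few new channels and rebuild the concatenated inputs during the backward pass, trading extra compute for memory. In PyTorch this kind of trade is exposed through `torch.utils.checkpoint.checkpoint`. The plain-Python model below is a simplified sketch (it counts only stored channels per spatial position and ignores BatchNorm outputs); the function names are hypothetical:

```python
def stored_channels_naive(c_in, growth, num_layers):
    # Each layer materializes its own concatenated input (c_in + growth * i channels)
    # and keeps it for the backward pass: grows quadratically with depth.
    return sum(c_in + growth * i for i in range(num_layers))


def stored_channels_recompute(c_in, growth, num_layers):
    # Keep only the block input and each layer's `growth` new channels;
    # concatenations are rebuilt when gradients are computed: grows linearly.
    return c_in + growth * num_layers


for num_layers in (4, 16, 64):
    print(num_layers,
          stored_channels_naive(64, 32, num_layers),
          stored_channels_recompute(64, 32, num_layers))
```

For 4 layers the gap is small (448 vs 192), but at 64 layers it is 68608 vs 2112, which is why the framework would need to support freeing and recomputing intermediate results instead of caching every concatenation.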
  4. Implement the various DenseNet versions presented in Table 1 of the DenseNet paper (Huang et al., 2017).
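Most of the work for Table 1 is bookkeeping: the variants share the same layer types and differ only in how many layers each of the four dense blocks has. A small sketch, assuming the paper's ImageNet settings (growth rate k = 32, 64 channels out of the stem, compression 0.5 in the transition layers); `DENSENET_CONFIGS` and `channel_plan` are hypothetical names. It tracks the channel count after each dense block, which could then be used to size the dense and transition blocks:

```python
# Layers per dense block for the ImageNet DenseNets in Table 1.
DENSENET_CONFIGS = {
    "DenseNet-121": (6, 12, 24, 16),
    "DenseNet-169": (6, 12, 32, 32),
    "DenseNet-201": (6, 12, 48, 32),
    "DenseNet-264": (6, 12, 64, 48),
}


def channel_plan(block_config, num_init=64, growth=32, compression=0.5):
    """Channel count after each dense block; transitions shrink it by `compression`."""
    channels, plan = num_init, []
    for i, num_layers in enumerate(block_config):
        channels += num_layers * growth          # each layer adds `growth` channels
        plan.append(channels)
        if i != len(block_config) - 1:           # no transition after the last block
            channels = int(channels * compression)
    return plan


for name, config in DENSENET_CONFIGS.items():
    print(name, channel_plan(config))
```

The last entry is the width fed to the classifier (1024 for DenseNet-121). Unlike d2l's simpler blocks, the paper's ImageNet models also put a 1 × 1 bottleneck convolution (4k channels) before every 3 × 3 convolution.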

  5. Design an MLP-based model by applying the DenseNet idea. Apply it to the housing price prediction task in Section 4.10.

I guess one way to make DenseNet work for an MLP would be to replace the Conv2d blocks with linear ones.

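import torch
from torch import nn

def linear_block(input_channels, num_channels):
    # Linear counterpart of the conv block: BatchNorm -> ReLU -> Linear.
    # "Channels" are just features of a (batch, features) input; the ReLU matters,
    # without it the whole block collapses into a single linear map.
    return nn.Sequential(nn.BatchNorm1d(input_channels), nn.ReLU(),
                         nn.Linear(input_channels, num_channels))

class dense_block_linear(nn.Module):
    def __init__(self, num_convs, input_channels, num_channels):
        super().__init__()
        layers = []
        for i in range(num_convs):
            # Layer i sees the block input plus the outputs of the i earlier layers
            layers.append(linear_block(input_channels + num_channels * i, num_channels))
        self.net = nn.Sequential(*layers)

    def forward(self, X):
        for layer in self.net:
            Y = layer(X)
            X = torch.cat((X, Y), dim=1)  # concatenate along the feature dimension
        return X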

That's one idea, though I wonder if we can use a convolutional network for regression.
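To try the idea on a regression task like the housing-price prediction, a minimal PyTorch sketch is below. It is self-contained, so it defines its own dense block; `DenseMLPBlock`, `dense_mlp_regressor`, the sizes, and the synthetic data are all hypothetical stand-ins, not the Section 4.10 preprocessing. The only regression-specific parts are the single-output `nn.Linear` head and `nn.MSELoss`, which is also all a convolutional network needs to do regression:

```python
import torch
from torch import nn


def linear_block(in_features, out_features):
    # MLP analogue of a DenseNet conv block: BatchNorm -> ReLU -> Linear
    return nn.Sequential(nn.BatchNorm1d(in_features), nn.ReLU(),
                         nn.Linear(in_features, out_features))


class DenseMLPBlock(nn.Module):
    """Every layer sees the block input plus the outputs of all earlier layers."""

    def __init__(self, num_layers, in_features, growth):
        super().__init__()
        self.layers = nn.ModuleList(
            [linear_block(in_features + growth * i, growth) for i in range(num_layers)])

    def forward(self, X):
        for layer in self.layers:
            X = torch.cat((X, layer(X)), dim=1)  # concatenate features instead of adding
        return X


def dense_mlp_regressor(num_inputs, num_layers=3, growth=16):
    # Hypothetical sizes: one dense block, a "transition" layer that halves the
    # width (like DenseNet's transition layers), and a single-output head.
    width = num_inputs + num_layers * growth
    return nn.Sequential(
        DenseMLPBlock(num_layers, num_inputs, growth),
        linear_block(width, width // 2),
        nn.BatchNorm1d(width // 2), nn.ReLU(),
        nn.Linear(width // 2, 1))


torch.manual_seed(0)
num_inputs = 20                          # placeholder for the real number of housing features
X = torch.randn(256, num_inputs)         # synthetic stand-in for preprocessed features
y = X @ torch.randn(num_inputs, 1)       # synthetic target, one value per example
net = dense_mlp_regressor(num_inputs)
optimizer = torch.optim.Adam(net.parameters(), lr=0.01)
loss_fn = nn.MSELoss()                   # squared error for the single regression output
losses = []
for step in range(200):
    optimizer.zero_grad()
    loss = loss_fn(net(X), y)
    loss.backward()
    optimizer.step()
    losses.append(loss.item())
print(f"first loss {losses[0]:.3f}, last loss {losses[-1]:.3f}")
```

For the real task, `X` and `y` would be replaced by the preprocessed housing features and prices, and the model evaluated with the same metric the chapter uses.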