Densely Connected Networks (DenseNet)

My weird answers:


  1. Why do we use average pooling rather than maximum pooling in the transition layer?
  • So that the other cells are not eliminated and still add some value to the model.
  2. One of the advantages mentioned in the DenseNet paper is that its model parameters are smaller than those of ResNet. Why is this the case?

DenseNet is an extension of ResNet.

The difference is that DenseNet uses concatenation while ResNet uses addition.

  • Because there are fewer linear layers, more memory is used rather than disk space (I am assuming this because of the follow-up question below).

Total params: 758,226

Trainable params: 758,226

Non-trainable params: 0


Input size (MB): 0.04

Forward/backward pass size (MB): 14.29

Params size (MB): 2.89

Estimated Total Size (MB): 17.22

This is the disk space, while the memory used is 3.1 GB.
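
To make the concatenation-versus-addition difference concrete, here is a minimal PyTorch sketch. The tensor sizes and the growth rate of 32 are illustrative assumptions, not values taken from the summary above. It also hints at one commonly cited reason for the smaller parameter count: a DenseNet layer only has to produce a few new channels, so its convolution can be narrow.

```python
import torch
from torch import nn

# Assumed toy batch: 2 samples, 64 channels, 8x8 feature maps
X = torch.randn(2, 64, 8, 8)

# ResNet-style: the branch must output 64 channels so it can be added to X
res_conv = nn.Conv2d(64, 64, kernel_size=3, padding=1)
res_out = X + res_conv(X)

# DenseNet-style: the branch only adds 32 new channels (assumed growth rate),
# which are concatenated to X along the channel dimension
dense_conv = nn.Conv2d(64, 32, kernel_size=3, padding=1)
dense_out = torch.cat((X, dense_conv(X)), dim=1)

def num_params(module):
    """Count all learnable values (weights and biases) in a module."""
    return sum(p.numel() for p in module.parameters())

print(res_out.shape, num_params(res_conv))
print(dense_out.shape, num_params(dense_conv))
```

Addition keeps the channel count at 64, while concatenation grows it to 96; the narrower DenseNet-style convolution here has half as many parameters as the ResNet-style one.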

  3. One problem for which DenseNet has been criticized is its high memory consumption.

    1. Is this really the case? Try to change the input shape to 224 × 224 to see the actual GPU memory consumption.

    Total params: 758,226
    Trainable params: 758,226
    Non-trainable params: 0
    Input size (MB): 0.19
    Forward/backward pass size (MB): 77.81
    Params size (MB): 2.89
    Estimated Total Size (MB): 80.89
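
    The two summaries can be sanity-checked with a little arithmetic. This sketch assumes the first summary was produced with a single-channel 96 × 96 float32 input (4 bytes per value) and a batch size of 1; those are assumptions, not something stated in the post.

```python
BYTES_PER_FLOAT32 = 4
MB = 1024 ** 2

def size_mb(num_values):
    """Size in MB of num_values float32 numbers."""
    return num_values * BYTES_PER_FLOAT32 / MB

# Input tensors (assumed shape: 1 sample x 1 channel x H x W)
input_96 = size_mb(1 * 96 * 96)
input_224 = size_mb(1 * 224 * 224)

# Parameters do not depend on the input resolution
params_mb = size_mb(758_226)

# Activations scale with the number of pixels, so scale the reported
# 96x96 forward/backward size by the ratio of pixel counts
scale = (224 / 96) ** 2
predicted_fwd_bwd_224 = 14.29 * scale

print(f"{input_96:.2f} {input_224:.2f} {params_mb:.2f} {predicted_fwd_bwd_224:.2f}")
```

    Under these assumptions the results line up with the reported 0.04, 0.19, and 2.89 MB, and the predicted forward/backward size lands within rounding of the reported 77.81 MB. The parameter size stays fixed while the activation size grows with the number of pixels. These figures are estimates of tensor sizes, not measurements of actual GPU memory.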
    2. Can you think of an alternative means of reducing the memory consumption? How would you need to change the framework?

    • No idea. Maybe focus on storing variables rather than keeping them in memory.
  4. Implement the various DenseNet versions presented in Table 1 of the DenseNet paper (Huang et al., 2017).

  5. Design an MLP-based model by applying the DenseNet idea. Apply it to the housing price prediction task in Section 4.10.


I guess one way to make DenseNet work for an MLP would be replacing the Conv2d blocks with linear ones.

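import torch
from torch import nn

class dense_block_linear(nn.Module):
    def __init__(self, num_convs, input_channels, num_channels):
        super(dense_block_linear, self).__init__()
        layers = []
        for i in range(num_convs):
            # Each layer sees the original input plus all earlier outputs
            layers.append(nn.Linear(input_channels + num_channels * i, num_channels))  # Linear
        self.net = nn.Sequential(*layers)

    def forward(self, X):
        for layer in self.net:
            y = layer(X)
            # Concatenate the new features onto the running input
            X = torch.cat((X, y), dim=1)
        return X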
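
As a rough illustration of how such a block could feed a regression output, here is a self-contained sketch. The class name `DenseBlockLinear`, the ReLU after each linear layer, the feature sizes, and the single-output head are all assumptions made for the example; nothing here is tuned for the housing price data.

```python
import torch
from torch import nn

class DenseBlockLinear(nn.Module):
    """Hypothetical dense block built from fully connected layers."""

    def __init__(self, num_layers, in_features, growth):
        super().__init__()
        # Layer i receives the input plus the outputs of all i earlier layers
        self.layers = nn.ModuleList([
            nn.Sequential(nn.Linear(in_features + growth * i, growth), nn.ReLU())
            for i in range(num_layers)
        ])

    def forward(self, X):
        for layer in self.layers:
            # Keep the old features and append the new ones
            X = torch.cat((X, layer(X)), dim=1)
        return X

# Assumed sizes: 10 input features, 3 layers, 4 new features per layer
block = DenseBlockLinear(num_layers=3, in_features=10, growth=4)
head = nn.Linear(10 + 3 * 4, 1)  # single regression output
model = nn.Sequential(block, head)

X = torch.randn(8, 10)  # batch of 8 made-up examples
features = block(X)
out = model(X)
print(features.shape, out.shape)
```

The block's output width is the input width plus `num_layers * growth`, so the head has to be sized accordingly.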

That's one idea, though I wonder if we can use a convolutional network for regression.

IMO, it should work. I am a newbie in this field, so please let me know if my understanding is flawed. The main thing I understood is that the purpose of a residual connection is not to lose the basis functions of the previous layer (we can think of the output of a hidden layer as basis functions/features in the hyperspace in the neighborhood of the input). Without it, when you apply a layer's non-linear transformation to its inputs, you create new and more complex basis functions/features, but you lose the original features.

One analogy that comes to mind is basis expansion in polynomial regression. We do not throw away the linear features when we expand our basis to higher-order polynomials. We use all of the features in the regression (linear + higher order). Residual connections perform a similar function in the context of neural networks.
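
The polynomial analogy can be sketched in a few lines of plain Python. The function names `expand_basis` and `replace_basis` are made up for this illustration.

```python
def expand_basis(x, degree):
    """Return [x, x**2, ..., x**degree], keeping the lower-order terms."""
    return [x ** d for d in range(1, degree + 1)]

def replace_basis(x, degree):
    """Return only the highest-order term, discarding the earlier ones."""
    return [x ** degree]

kept = expand_basis(2.0, 3)      # linear, quadratic, and cubic features
dropped = replace_basis(2.0, 3)  # only the cubic feature survives

print(kept)
print(dropped)
```

Here `expand_basis` plays the role of a network that carries earlier features forward, while `replace_basis` is like a plain stack of layers where only the latest transformation is passed on.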

My solutions to the exs: 8.7
