Densely Connected Networks (DenseNet)

Why do we use average pooling rather than max pooling in the transition layer?

Hi @smizerex, great question! Max pooling was shown to perform better than average pooling in the AlexNet paper. While this is an architectural design choice, you are not limited to it. :wink:
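For anyone who wants to experiment with this: below is a minimal PyTorch sketch of the transition layer as described in the chapter (batch norm, ReLU, 1×1 convolution, then 2×2 average pooling). The function name and channel sizes are illustrative only; swapping in `nn.MaxPool2d` on the last line lets you compare the two choices directly.

```python
import torch
from torch import nn

def transition_block(in_channels, out_channels):
    """Shrink the channel count (1x1 conv) and halve the spatial size (2x2 pooling)."""
    return nn.Sequential(
        nn.BatchNorm2d(in_channels), nn.ReLU(),
        nn.Conv2d(in_channels, out_channels, kernel_size=1),
        # Average pooling keeps a contribution from every concatenated
        # feature map; replace with nn.MaxPool2d(2, 2) to compare.
        nn.AvgPool2d(kernel_size=2, stride=2))

X = torch.rand(1, 64, 32, 32)
print(transition_block(64, 32)(X).shape)  # torch.Size([1, 32, 16, 16])
```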


The notation in (7.7.1) is misleading. It says that:

For the point x = 0 it can be written as
f(x) = f(0) + f'(0)x + …

so the x on the RHS is confusing (as we have supposedly replaced it with 0).
The correct notation should use two variables, for example:

For a point a it can be written as
f(x) = f(a) + f'(a)(x − a) + …
which for a = 0 becomes
f(x) = f(0) + f'(0)x + …
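For completeness, here is the expansion written out with the expansion point and the argument kept distinct:

```latex
% General Taylor expansion of f around the point a
f(x) = f(a) + f'(a)(x - a) + \frac{f''(a)}{2!}(x - a)^2 + \cdots

% With a = 0 (the Maclaurin series), (x - a) reduces to x, so the x on
% the right-hand side is the function's argument, not the expansion point
f(x) = f(0) + f'(0)\,x + \frac{f''(0)}{2!}\,x^2 + \cdots
```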

  1. Performance reasons. Other than that, I think we would like to take contributions from all of the earlier layers, since the outputs are concatenated. Is that the right idea?

  2. Model parameters are small because each layer works with a small number of input and output channels, so its weight matrices stay small; the layers combine into a bigger representation through concatenation rather than through larger parameter matrices. (There are a lot of layers, though, so does this still work out?)

    1. I don't have a working GPU :sweat_smile:
    2. Dropout? One way to change the framework would be to add at some places and concatenate at other places (see the sketch below). How would that work out? :confused:
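On the add/concatenate idea: here is a hypothetical PyTorch sketch of one interpretation, adding a residual shortcut inside each convolution unit (ResNet-style) while still concatenating across units (DenseNet-style). The class name, growth rate, and 1×1 shortcut projection are all illustrative assumptions, not an established architecture.

```python
import torch
from torch import nn

def conv_unit(in_channels, out_channels):
    return nn.Sequential(
        nn.BatchNorm2d(in_channels), nn.ReLU(),
        nn.Conv2d(in_channels, out_channels, kernel_size=3, padding=1))

class HybridBlock(nn.Module):
    """Hypothetical: residual addition inside each unit,
    DenseNet-style concatenation across units."""
    def __init__(self, in_channels, growth_rate, num_units):
        super().__init__()
        self.units, self.shortcuts = nn.ModuleList(), nn.ModuleList()
        c = in_channels
        for _ in range(num_units):
            self.units.append(conv_unit(c, growth_rate))
            # 1x1 conv so the shortcut matches the unit's output channels
            self.shortcuts.append(nn.Conv2d(c, growth_rate, kernel_size=1))
            c += growth_rate

    def forward(self, X):
        for unit, shortcut in zip(self.units, self.shortcuts):
            Y = unit(X) + shortcut(X)     # addition (ResNet-style)
            X = torch.cat((X, Y), dim=1)  # concatenation (DenseNet-style)
        return X

X = torch.rand(1, 3, 8, 8)
print(HybridBlock(3, 10, 2)(X).shape)  # torch.Size([1, 23, 8, 8])
```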

Similar to PageRank, the word 'contribution' is quite apt here, or one could say 'information'.
I think DenseNet is the network closest to the human brain among the CNNs covered so far.

@goldpiggy I found that this call costs the most time, and I saw that it copies data back to the CPU, right? But even if that is the reason, the cost still seems unacceptable.
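In case it helps narrow things down: a common cause of this pattern is that reading a tensor's value on the CPU (e.g. via `.item()` or `float()`) blocks until the GPU has finished all queued work, so that one call absorbs the wait time of everything before it in the profile. This is only a guess at what the profiled call is doing, but here is a hedged sketch of accumulating a metric on the device and copying it back once, rather than once per batch:

```python
import torch

device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')

# Per-batch readback: every .item() blocks until the GPU finishes,
# then copies a scalar back to the CPU.
total = 0.0
for _ in range(100):
    loss = torch.rand(256, device=device).sum()
    total += loss.item()          # one sync + copy per batch

# Accumulate on the device instead; copy back once at the end.
total_dev = torch.zeros(1, device=device)
for _ in range(100):
    loss = torch.rand(256, device=device).sum()
    total_dev += loss             # stays on the GPU
total = total_dev.item()          # single sync + copy
```

If the copy is genuinely needed (e.g. for logging), doing it once per epoch instead of once per batch usually removes it from the top of the profile.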