Why do we use average pooling rather than max pooling in the transition layer?
Hi @smizerex, great question! Max pooling was shown to outperform average pooling in the AlexNet paper. This is an architecture design choice, though, so you are not limited to it.
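One intuition (my own sketch, not a claim from the paper): average pooling lets every element in the window contribute to the transition layer's output, while max pooling keeps only the strongest activation. Since each channel being downsampled already carries concatenated features from earlier layers, averaging preserves a bit of all of them. A tiny NumPy illustration of the difference:

```python
import numpy as np

# Toy 4x4 feature map, just to contrast the two pooling choices.
x = np.arange(1.0, 17.0).reshape(4, 4)

def pool2x2(x, reduce):
    """2x2 pooling with stride 2; `reduce` is np.mean or np.max."""
    h, w = x.shape
    out = np.empty((h // 2, w // 2))
    for i in range(h // 2):
        for j in range(w // 2):
            out[i, j] = reduce(x[2*i:2*i+2, 2*j:2*j+2])
    return out

avg = pool2x2(x, np.mean)  # every input in the window contributes
mx = pool2x2(x, np.max)    # only the strongest activation survives
```

For the top-left window `[[1, 2], [5, 6]]`, averaging gives 3.5 while max gives 6: the max result discards three of the four inputs entirely.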
The notation in (7.7.1) is misleading. It says that:
For the point x=0 it can be written as
f(x) = f(0) + f'(0)x + …
so the x on the RHS is confusing (we've already replaced it with 0).
The correct notation should use two variables, for example:
For the point x=a it can be written as
f(x) = f(a) + f'(a)(x−a) + …
which for a=0 becomes
f(x) = f(0) + f'(0)x + …
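For reference, the standard Taylor expansion around a point $a$, and its $a=0$ (Maclaurin) special case, which is the form (7.7.1) intends:

```latex
f(x) = f(a) + f'(a)(x-a) + \frac{f''(a)}{2!}(x-a)^2 + \cdots
     = \sum_{n=0}^{\infty} \frac{f^{(n)}(a)}{n!}(x-a)^n,
\qquad
f(x) = f(0) + f'(0)\,x + \frac{f''(0)}{2!}\,x^2 + \cdots \quad (a=0)
```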

Performance reasons. Other than that, I think we'd like to take contributions from all earlier layers, since their outputs are concatenated. Is that the right idea?
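To make the "contributions from all earlier layers" point concrete, here is a shape-only NumPy sketch of a dense block (the stand-in `conv` replaces the real BN-ReLU-Conv; only the channel bookkeeping matters):

```python
import numpy as np

growth_rate = 4  # channels each layer adds (a small illustrative value)

def conv(x, out_channels):
    # Placeholder for BN-ReLU-Conv: any number of input channels in,
    # `out_channels` channels out, same spatial size.
    return np.zeros((out_channels,) + x.shape[1:])

x = np.zeros((3, 8, 8))                    # 3 input channels
for _ in range(4):                         # 4 conv layers in the block
    y = conv(x, growth_rate)               # each layer sees ALL earlier outputs
    x = np.concatenate([x, y], axis=0)     # channels accumulate: +4 per layer

# Final channel count: 3 + 4 * 4 = 19
```

Every layer's input is the concatenation of the block's input and all previous layers' outputs, which is exactly why each layer "contributes" to everything downstream.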

Model params are small because each layer works with a small number of input and output channels. The layers combine to form richer features by concatenation instead of larger parameter matrices. (There are a lot of layers, so this adds up; is that the right way to think about it?)
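A back-of-the-envelope check (my own numbers, not from the book): a DenseNet layer maps a wide concatenated input down to only `growth_rate` channels, so its weight matrix stays narrow compared to a plain layer that keeps the full width.

```python
# Rough weight count of a 3x3 convolution (ignoring bias and BN):
def conv_weights(in_ch, out_ch, k=3):
    return in_ch * out_ch * k * k

growth_rate = 32                               # a typical DenseNet growth rate
# A layer deep inside a dense block: wide input, only 32 output channels.
dense_layer = conv_weights(256, growth_rate)   # 256 * 32 * 9 = 73,728
# A plain layer keeping the full 256-channel width instead:
wide_layer = conv_weights(256, 256)            # 256 * 256 * 9 = 589,824
```

The narrow layer here is 8x cheaper, which is the sense in which concatenation buys representational width without a matching growth in parameters.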

I don't have a working GPU.
Dropout? One way to change the framework would be to add (ResNet-style) at some places and concatenate (DenseNet-style) at others. How would that work out?
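One constraint such a hybrid would have to respect (a shape-only sketch, not from the book): addition requires the layer output to match the input's channel count, while concatenation does not, so channel counts would need to be reconciled wherever the network switches between the two.

```python
import numpy as np

x = np.ones((16, 8, 8))    # incoming 16-channel feature map
f = np.ones((16, 8, 8))    # a layer's output with matching shape

res = x + f                             # ResNet-style: channels stay at 16
dense = np.concatenate([x, f], axis=0)  # DenseNet-style: channels grow to 32
```

After a concatenation step, a later addition step would need a 1x1 convolution (or a transition layer) to project back to a matching width.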
Similar to PageRank, the word 'contribution' is quite apt here; 'information' would also work.
I think DenseNet is the network closest to the human brain among the CNNs covered so far.
@goldpiggy I found that this call costs the most time, and I saw that it copies data back to the CPU, right? But even accounting for that, it still seems unacceptably slow.
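This is typical of asynchronous execution: the framework returns control as soon as the GPU work is launched, and the cost only becomes visible at the first call that must wait for results, such as a copy back to the CPU. A pure-Python toy (not the framework's actual API) of why a profiler blames the copy:

```python
import time

# Toy model of lazy/asynchronous execution: "launching" the work returns
# immediately; the cost surfaces only when the result is fetched.
class LazyArray:
    def __init__(self, work):
        self._work = work       # deferred computation
        self._value = None
    def fetch(self):            # stand-in for copying data back to the CPU
        if self._value is None:
            self._value = self._work()
        return self._value

def expensive():
    time.sleep(0.05)            # pretend this is the real GPU work
    return 42

t0 = time.perf_counter()
arr = LazyArray(expensive)      # "launch" returns instantly
launch_time = time.perf_counter() - t0

t0 = time.perf_counter()
val = arr.fetch()               # all the cost lands here
fetch_time = time.perf_counter() - t0
```

So the copy itself may be cheap; the call is slow because it is where the program first synchronizes with pending work. Timing after an explicit synchronization barrier would separate the two.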