According to https://arxiv.org/pdf/1512.03385.pdf, I think there is a 1000-d fully connected softmax layer missing in the last part of the model.
Hi Ehsan. I may not be correct about this, so take it with a grain of salt. The ResNet architecture was trained on the ImageNet 2012 classification dataset, which includes 1000 different classes, so that is why the last fully connected layer they used is a 1000-FC. In the case of the book, we are using a 10-class dataset for faster training times, which is why we use a 10-FC layer instead. Hope this helps!
As for the softmax function, I don't know why it was skipped.
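In case it helps, here is a minimal Keras sketch of what such a classification head could look like; the input size and the single conv layer standing in for the residual stages are just illustrative assumptions, not the book's actual code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Illustrative input size and a single conv block standing in for the
# residual stages (an assumption, not the book's actual backbone).
inputs = tf.keras.Input(shape=(32, 32, 3))
x = layers.Conv2D(64, 3, padding="same", activation="relu")(inputs)
x = layers.GlobalAveragePooling2D()(x)  # average each channel over the spatial positions
# 10-way head for the book's dataset; the ImageNet variant would use Dense(1000).
# Note: in Keras the softmax is sometimes omitted here and folded into the loss
# (from_logits=True), which may be why no explicit softmax layer appears.
outputs = layers.Dense(10, activation="softmax")(x)
model = models.Model(inputs, outputs)
model.summary()
```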
In the last layer, does ResNet use GlobalAveragePooling2D instead of a Dense layer? Does anyone know why using GlobalAveragePooling2D gives better results?
@thainq ResNet actually has one dense layer after pooling to compute the logits, btw. GlobalAveragePooling2D aggregates all the computed hidden representations across the spatial dimensions before the Dense layer. You can replace it with a Flatten feeding into the Dense layer, but this keeps all the representations without aggregating them across channels. As a result, you have many more parameters to train. Give this a try and compare your results with the original implementation.
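One quick way to see the last point is to build both heads on a dummy feature map and compare parameter counts. This is a rough sketch; the 8x8x512 feature-map shape is an assumption for illustration, not the book's actual numbers:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_head(use_gap: bool) -> tf.keras.Model:
    """Toy 10-class head on top of an assumed 8x8x512 feature map."""
    feat = tf.keras.Input(shape=(8, 8, 512))
    if use_gap:
        x = layers.GlobalAveragePooling2D()(feat)  # -> (512,): one averaged value per channel
    else:
        x = layers.Flatten()(feat)                 # -> (32768,): keeps every spatial position
    out = layers.Dense(10, activation="softmax")(x)
    return models.Model(feat, out)

print("GAP head params:    ", build_head(True).count_params())   # 512*10 + 10  = 5,130
print("Flatten head params:", build_head(False).count_params())  # 32768*10 + 10 = 327,690
```

The Dense layer after GlobalAveragePooling2D only sees one value per channel, which is where the large difference in trainable parameters comes from.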
Does that mean GlobalAveragePooling2D is able to retain the spatial information of the input better than a Dense layer?