Convolutional Neural Networks (LeNet)

http://d2l.ai/chapter_convolutional-neural-networks/lenet.html

“Each 2×2 pooling operation (stride 2) reduces dimensionality by a factor of 4 via spatial downsampling”. Going from 28x28 to 14x14, how is that a reduction by a factor of 4? Since the dimensionality of a matrix is rows x cols, we have 28x28=784 and 14x14=196. Is it because 784-196=588 and 588 is divisible by 4? Sorry for asking a silly question.

Hey @rezahabibi96, your question is not silly at all. Asking questions is always better than keeping quiet! The original size is 28x28=784, and the size after pooling is 14x14=196. The factor is a ratio, not a difference: 784 / 196 = 4, and that is where the “4” comes from!
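To check the arithmetic above numerically, here is a minimal NumPy sketch. The input array is a made-up stand-in for one 28x28 image, and the reshape trick is just one way to emulate 2x2 max pooling with stride 2; it is not the code used in the book.

```python
import numpy as np

# Hypothetical 28x28 input standing in for a single image.
x = np.arange(28 * 28, dtype=np.float32).reshape(28, 28)

# 2x2 max pooling with stride 2: split the image into non-overlapping
# 2x2 blocks and keep the largest value in each block.
pooled = x.reshape(14, 2, 14, 2).max(axis=(1, 3))

# Ratio of element counts before and after pooling.
reduction = x.size / pooled.size

print(x.shape, x.size)            # (28, 28) 784
print(pooled.shape, pooled.size)  # (14, 14) 196
print(reduction)                  # 4.0
```

Each side is halved, so the number of elements shrinks by 2 x 2 = 4.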

How would we do #4 for the exercises in 6.6?

“display the activation functions” (i.e., for sweaters and coats)?

Can we just insert visualize_activation(mx.gluon.nn.Activation('sigmoid')) somewhere and have it work?

Hey @smizerex, sorry for the confusion here. The exercise asks you to “Display the features after the first and second convolution layers of LeNet for different inputs (e.g., sweaters and coats).” Let me know if that makes sense to you.

The input is 28x28, and we apply a 2x2 pooling operation with stride 2. The stride is the number of cells the pooling window shifts at each step: horizontally along a row, and then vertically once that row of windows is finished. The output after pooling is 14x14.
The formula behind this is (28-2)/2+1=14.
Here 28 is the input size, the 2 in the numerator is the window size, and the denominator is the stride. The +1 is not a bias term; it counts the window's starting position, since (28-2)/2 only counts the shifts after it. So we finally get an output of size 14x14.
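The same formula can be written as a small helper. This is only a sketch: the function names and the padding parameter `p` are my own additions for illustration (the post above uses no padding), and the second function simply counts window positions as a cross-check.

```python
def pool_output_size(n, k, s, p=0):
    """Output length along one axis: floor((n + 2p - k) / s) + 1.

    n: input size, k: window size, s: stride, p: padding (assumed 0 here).
    """
    return (n + 2 * p - k) // s + 1


def count_window_positions(n, k, s):
    """Count valid window start positions directly (no padding)."""
    return len(range(0, n - k + 1, s))


size = pool_output_size(28, 2, 2)
positions = count_window_positions(28, 2, 2)
print(size, positions)  # 14 14
```

The window can start at columns 0, 2, ..., 26, which is 13 shifts plus the starting position, hence 14.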


I have a problem with the activations of the first two conv layers: the first layer seems to show a more or less complete object (e.g., a shoe) rather than edges. I thought the first layers were meant to capture simple features like edges?

Hi @osamaGkhafagy, excellent question! In general, the earlier layers try to capture local features (such as edges revealed by color contrast). However, these features do not have to be edges; they can be any low-level, coarse features that the later layers build on.
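For intuition about what an "edge" feature looks like, here is a small NumPy sketch. The image and the kernel are hand-picked for illustration, and `corr2d` is a plain cross-correlation written out for this post; a trained LeNet learns its own kernels, which need not resemble this one.

```python
import numpy as np

def corr2d(x, k):
    """2D cross-correlation of x with kernel k (no padding, stride 1)."""
    h, w = k.shape
    out = np.zeros((x.shape[0] - h + 1, x.shape[1] - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = (x[i:i + h, j:j + w] * k).sum()
    return out

# Hypothetical 6x8 image: white (1) with a black (0) band in the middle.
x = np.ones((6, 8))
x[:, 2:6] = 0

# Hand-picked kernel that responds to horizontal changes in intensity.
k = np.array([[1.0, -1.0]])

y = corr2d(x, k)
# Each row is 1 at the white-to-black edge, -1 at the black-to-white
# edge, and 0 wherever neighbouring pixels are equal.
print(y[0])
```

Most of the output is zero, so only the edges survive. A learned first-layer kernel that is not shaped like this will produce feature maps that still resemble the whole object, which matches what was observed above.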
