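I tried reproducing AlexNet myself,
with the learning rate set to 0.01. Why does the training oscillate so much? Is it because the learning rate is too large?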
I have a question: the input is 224x224 (r), with k=11, p=1, s=4, and the article gives a 54x54 output tensor. (In the formula below, x means multiplication.)
But the formula gives:
(r + p x 2 - k)/s + 1 = (224 + 1 x 2 - 11)/4 + 1 = 54.75. How should the fractional 0.75 be handled in practice? Or is it simply ignored?
Thank you!
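For reference, here is a minimal sketch of the arithmetic above. It assumes the floor convention, which is what PyTorch's `Conv2d` documents for its output shape. The helper name `conv_output_size` is hypothetical and not from the article. Under that assumption the fractional part is dropped, so the last partial window that would overhang the padded input is never computed:

```python
def conv_output_size(r: int, k: int, p: int, s: int) -> int:
    """Output size along one spatial dimension of a convolution.

    r: input size, k: kernel size, p: padding per side, s: stride.
    Assumes the floor convention (no dilation).
    """
    # Integer (floor) division discards the fractional 0.75:
    # (224 + 2*1 - 11) // 4 = 215 // 4 = 53, then + 1 = 54.
    return (r + 2 * p - k) // s + 1


print(conv_output_size(224, 11, 1, 4))  # prints 54
```

With these values the result is 54, which matches the 54x54 output quoted from the article.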