Momentum

https://d2l.ai/chapter_optimization/momentum.html

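Is it really noisy gradients that make optimization fail when the learning rate is too small or too large? Even with no noise at all, a learning rate that is too large or too small can fail to converge: too large and the iterates diverge, too small and they barely move in any reasonable number of steps.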
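To make the point concrete, here is a minimal sketch of noise-free gradient descent on a toy objective. The objective f(x) = 0.5 * x**2, the function name `gradient_descent`, the starting point, the step count, and the three learning rates are all my own assumptions for illustration, not something taken from the chapter.

```python
def gradient_descent(lr, x0=1.0, steps=20):
    """Run noise-free gradient descent on f(x) = 0.5 * x**2 and return the final x."""
    x = x0
    for _ in range(steps):
        grad = x        # exact gradient of 0.5 * x**2, no noise added
        x -= lr * grad  # equivalent to x <- (1 - lr) * x
    return x


# Hypothetical learning rates: one too small, one reasonable, one too large.
for lr in (0.001, 0.5, 2.5):
    final_x = gradient_descent(lr)
    print(f"lr={lr}: |x| after 20 steps = {abs(final_x):.4g}")
```

On this toy problem each step multiplies x by (1 - lr), so with lr=0.001 the iterate stays close to its starting value after 20 steps, with lr=0.5 it shrinks toward the minimum at 0, and with lr=2.5 its magnitude grows at every step. No noise is involved in any of the three cases.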