Numerical Stability and Initialization

Dear all, may I know why we want to keep the variance fixed? In other words, how does keeping the variance fixed help solve the vanishing and exploding gradients issue? Thanks.

Hi @Gavin, great question! If weights are too small or too large, their gradients will be problematic, as we elaborate here. Let me know if it is not clear enough.
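To make this concrete, here is a minimal sketch (my own illustration, not from the book) of what happens to the forward signal in a deep stack of purely linear layers. If the per-layer weight scale preserves variance (std of 1/sqrt(fan_in) for a linear layer), the activations stay at a sensible magnitude; scale it slightly down or up and the signal vanishes or explodes exponentially with depth. The same multiplicative effect hits the gradients on the backward pass.

```python
import numpy as np

rng = np.random.default_rng(0)
fan_in = 256
x = rng.standard_normal((1024, fan_in))  # a batch of inputs with std ~1

def final_std(weight_std, layers=50):
    """Std of the activations after `layers` random linear layers."""
    h = x
    for _ in range(layers):
        W = rng.standard_normal((fan_in, fan_in)) * weight_std
        h = h @ W  # purely linear, to isolate the scaling effect
    return h.std()

# Variance-preserving scale for a linear layer: 1/sqrt(fan_in).
stable = final_std(1.0 / np.sqrt(fan_in))
small = final_std(0.5 / np.sqrt(fan_in))  # each layer shrinks the signal -> vanishes
large = final_std(2.0 / np.sqrt(fan_in))  # each layer amplifies the signal -> explodes
print(f"stable: {stable:.3g}, too small: {small:.3g}, too large: {large:.3g}")
```

With a per-layer gain of 0.5 or 2, fifty layers multiply the signal by roughly 2^±50 (about 15 orders of magnitude), which is exactly the vanishing/exploding regime; keeping the variance fixed keeps that gain at 1.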


Hi, awesome and detailed explanation of the numerical stability concept! I have one question though: isn't Xavier initialization somewhat outdated, since the tanh activation function was assumed when it was derived? Isn't He initialization better suited for the ReLU activation mentioned here? Thanks in advance.
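The intuition behind that question can be checked numerically. The sketch below (my own, with arbitrary layer sizes) runs a deep ReLU stack under Xavier scaling, sqrt(2/(fan_in + fan_out)), versus He scaling, sqrt(2/fan_in). Since ReLU zeroes half of each pre-activation, it roughly halves the second moment per layer; He's extra factor of 2 compensates for exactly that, while Xavier lets the signal shrink by about sqrt(2) per layer.

```python
import numpy as np

rng = np.random.default_rng(0)
fan = 512
layers = 30

def depth_std(weight_std):
    """Std of the activations after `layers` random ReLU layers."""
    h = rng.standard_normal((2048, fan))
    for _ in range(layers):
        W = rng.standard_normal((fan, fan)) * weight_std
        h = np.maximum(h @ W, 0.0)  # ReLU zeroes half the signal
    return h.std()

xavier = depth_std(np.sqrt(2.0 / (fan + fan)))  # Glorot: sqrt(2/(fan_in+fan_out))
he = depth_std(np.sqrt(2.0 / fan))              # He: sqrt(2/fan_in), compensates ReLU
print(f"Xavier: {xavier:.3g}, He: {he:.3g}")
```

After 30 ReLU layers the Xavier-initialized signal is smaller by roughly a factor of 2^15, while the He-initialized one stays at order 1, which is why He initialization is usually the recommended choice for ReLU networks (and Xavier for tanh/sigmoid).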