I am trying to implement my own RNN/GRU, but if I don't reset the gradients (with .zero_grad()), the network does not converge.
Am I correct that the implementation in chapter 8.5 never resets the gradients, on the grounds that this is unnecessary because the hidden states are detached? Or are the gradients actually reset somewhere that I missed?
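For context, here is a minimal sketch of the kind of training loop I mean (my own toy example, not the book's code): it shows both operations the question is about, detaching the hidden state across truncated-BPTT chunks and resetting the gradients before each backward pass. The model, shapes, and hyperparameters are just illustrative.

```python
import torch
import torch.nn as nn

# Toy setup: a GRU predicting the next value of a sine wave.
torch.manual_seed(0)
model = nn.GRU(input_size=1, hidden_size=8, batch_first=True)
head = nn.Linear(8, 1)
params = list(model.parameters()) + list(head.parameters())
optimizer = torch.optim.SGD(params, lr=0.1)
loss_fn = nn.MSELoss()

# One long sequence processed in truncated-BPTT chunks.
seq = torch.sin(torch.linspace(0, 12, 401)).reshape(1, -1, 1)
state = None
for t in range(0, 400, 40):
    x = seq[:, t : t + 40]       # inputs for this chunk
    y = seq[:, t + 1 : t + 41]   # next-step targets

    out, state = model(x, state)
    state = state.detach()       # the detach in question: cut the graph between chunks
    loss = loss_fn(head(out), y)

    optimizer.zero_grad()        # the reset in question: clear accumulated gradients
    loss.backward()
    optimizer.step()
```

In this sketch, dropping the optimizer.zero_grad() line makes the gradients accumulate across chunks even though the states are detached, which is the behavior I am seeing in my own implementation.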