Implementation of Recurrent Neural Networks from Scratch

So, where is the code that detaches the gradient?

@terrytangyuan, in TF do we need to use ?

Why does the TensorFlow version's PPL stay so high and bumpy, even with lr=0.0001 and the Adam optimizer?
Is something going wrong?

I have fixed the bug. Just transpose Y accordingly (because we have transposed X):

Then the training result is normal! (perplexity = 1.0)
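
For anyone reading later, here is a minimal NumPy sketch of why the labels need the same transpose as the inputs. It is not the snippet from the fix above. The shapes and token ids are made up, and `output_order` is only a stand-in for the model's concatenated per-step outputs, assuming they are stacked along axis 0 in time-major order.

```python
import numpy as np

# Hypothetical minibatch of token ids, shape (batch_size, num_steps) = (2, 3).
X = np.array([[0, 1, 2],
              [10, 11, 12]])
# Toy next-token labels: each label is simply the input id plus one.
Y = X + 1

# A from-scratch RNN loops over time steps, so X is transposed to
# (num_steps, batch_size). Concatenating the per-step outputs along axis 0
# gives rows in time-major order: (t0, b0), (t0, b1), (t1, b0), ...
X_time_major = X.T
output_order = X_time_major.reshape(-1)  # stand-in for the model outputs

# Flattening Y directly gives batch-major order, which no longer lines up.
y_wrong = Y.reshape(-1)
# Transposing Y first puts the labels in the same time-major order.
y_right = Y.T.reshape(-1)

print("targets expected by outputs:", output_order + 1)
print("Y flattened directly:       ", y_wrong)
print("Y transposed, then flattened:", y_right)
```

With misaligned labels the loss compares each prediction against the wrong token, which would be consistent with a perplexity that stays high and noisy regardless of the learning rate.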

Great. PR please:


It was a nightmare when I read this chapter’s source code. Why did you make things so complicated?