Long Short-Term Memory (LSTM)


Please add a TensorFlow implementation.

A question for Mu: I built my own model, but when I apply gradient clipping I get an error saying that param.grad is NoneType. Why is that?

Please post in English and include all of your code next time.
It is hard to understand what you are running.

How should I understand the statement that "LSTMs are the prototypical latent variable autoregressive model with nontrivial state control"?
The sentence is on the fourth line above Section 9.2.4, Summary.

I’ll try and break it down a bit:

prototypical ~= the original or most typical example
latent variable ~= hidden variable
autoregressive ~= depends on its own previous values
nontrivial ~= non-simple
state control ~= LSTMs use a gating system to control the hidden state. A vanilla RNN doesn’t have any gating controls. LSTM gating is more complex than GRU gating.

Hope this helps!
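
To make the "state control" point concrete, here is a minimal sketch of one step of a scalar LSTM cell next to one step of a vanilla RNN. It uses plain Python floats instead of tensors, and the parameter names (`w_xi`, `w_hf`, `b_o`, and so on) as well as the helper `make_params` are hypothetical, chosen only for this illustration. The gate equations follow the standard LSTM formulation, not any particular library's API.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_params(weight=0.5, b_i=0.0, b_f=0.0, b_o=0.0, b_c=0.0):
    """Hypothetical helper: same weight everywhere, one bias per gate."""
    p = {}
    for gate, bias in (("i", b_i), ("f", b_f), ("o", b_o), ("c", b_c)):
        p["w_x" + gate] = weight  # input-to-gate weight
        p["w_h" + gate] = weight  # hidden-to-gate weight
        p["b_" + gate] = bias
    return p


def lstm_step(x, h_prev, c_prev, p):
    """One step of a scalar LSTM cell. Returns (h, c)."""
    i = sigmoid(p["w_xi"] * x + p["w_hi"] * h_prev + p["b_i"])  # input gate
    f = sigmoid(p["w_xf"] * x + p["w_hf"] * h_prev + p["b_f"])  # forget gate
    o = sigmoid(p["w_xo"] * x + p["w_ho"] * h_prev + p["b_o"])  # output gate
    c_tilde = math.tanh(p["w_xc"] * x + p["w_hc"] * h_prev + p["b_c"])  # candidate
    c = f * c_prev + i * c_tilde  # gated update of the memory cell
    h = o * math.tanh(c)          # gated exposure of the memory cell
    return h, c


def rnn_step(x, h_prev, w_xh, w_hh, b_h):
    """One step of a vanilla scalar RNN: no gates, the state is overwritten."""
    return math.tanh(w_xh * x + w_hh * h_prev + b_h)
```

With a large forget-gate bias and a very negative input-gate bias, for example `make_params(b_f=20.0, b_i=-20.0)`, the cell state passes through `lstm_step` almost unchanged. The vanilla `rnn_step` has no comparable way to keep its state.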

Could you explain why LSTM networks don't suffer (or at least suffer less) from the vanishing gradient problem?
I understand the idea of the gates and how they let us control the information flow, but I still feel that the gradient problem may occur. After all, it's a recurrent computation, so maybe there is still a factor involving many matrix multiplications (I mean W^t).
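
One way to look at this question is to compare the per-step gradient factors in a toy scalar setting. The sketch below is an illustration under simplifying assumptions, not a full answer: the vanilla RNN has no input and a single weight `w`, and for the LSTM only the direct path through the memory cell, c_t = f_t * c_{t-1} + i_t * c~_t, is tracked, so the factor per step is just the forget gate f_t. The indirect paths through the hidden state, where repeated weight multiplications still appear, are ignored. The function names are hypothetical.

```python
import math


def rnn_gradient_factor(w, h0, steps):
    """d h_T / d h_0 for the scalar RNN h_t = tanh(w * h_{t-1})."""
    h, grad = h0, 1.0
    for _ in range(steps):
        h = math.tanh(w * h)
        grad *= w * (1.0 - h * h)  # chain rule: w times tanh'(pre-activation)
    return grad


def lstm_cell_path_factor(forget_gates):
    """d c_T / d c_0 along the direct cell-state path only.

    Each step contributes a factor f_t, because c_t = f_t * c_{t-1} + ...
    """
    grad = 1.0
    for f in forget_gates:
        grad *= f
    return grad
```

With `w = 0.5` over 50 steps, the RNN factor is at most 0.5^50 and is effectively zero. With forget gates held at 0.99, the cell-path factor after 50 steps is still about 0.6. The same function also shows that the problem is reduced rather than removed: with forget gates at 0.5, the product vanishes just as quickly as in the RNN, so the benefit depends on the network learning forget gates close to 1.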