Long Short-Term Memory (LSTM)

astonzhang · June 29, 2020, 10:01pm

https://d2l.ai/chapter_recurrent-modern/lstm.html

rohitatp · September 20, 2020, 11:45am

Please add a TensorFlow implementation.

Yuzhi · October 16, 2020, 7:10am

Question for Mu Li: I built my own model, but when I apply gradient clipping I get an error saying param.grad is NoneType. Why does this happen?
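Without the code we can only guess, but in PyTorch one common cause is that some parameter's `.grad` is still None. This happens for parameters that never contributed to the loss (for example a layer created in `__init__` but not used in `forward`), for parameters with `requires_grad=False`, or when clipping runs before `loss.backward()`. A clipping routine that squares every `param.grad` then fails. Below is a minimal NumPy sketch, assuming that skipping gradient-less parameters is acceptable for your model; the `Param` class is a hypothetical stand-in for framework parameters, not a real API. PyTorch's built-in `torch.nn.utils.clip_grad_norm_` likewise only uses parameters whose `.grad` is not None.

```python
import numpy as np

class Param:
    """Hypothetical stand-in for a framework parameter: `grad` stays None
    until a backward pass actually produces a gradient for it."""
    def __init__(self, data, grad=None):
        self.data = np.asarray(data, dtype=float)
        self.grad = None if grad is None else np.asarray(grad, dtype=float)

def grad_clipping(params, theta):
    """Rescale gradients so their global L2 norm is at most theta.
    Parameters whose grad is None are skipped instead of raising."""
    grads = [p.grad for p in params if p.grad is not None]
    if not grads:
        return 0.0
    norm = float(np.sqrt(sum(np.sum(g ** 2) for g in grads)))
    if norm > theta:
        for g in grads:
            g *= theta / norm  # in-place, so each p.grad is updated too
    return norm

used = Param([1.0, 2.0], grad=[3.0, 4.0])  # gradient norm 5
unused = Param([0.5])  # e.g. a layer defined but never called in forward()
norm = grad_clipping([used, unused], theta=1.0)
print(norm, used.grad, unused.grad)  # 5.0, roughly [0.6 0.8], None
```

If a parameter should have received a gradient but did not, skipping it only hides the symptom; it is worth checking that the layer is actually used in the forward pass.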

StevenJokess · October 17, 2020, 3:02am

Next time, please post in English and include all of your code.
It is hard to understand what you are running.
@Yuzhi

Youarerare · November 28, 2020, 11:54am

How should I understand the sentence "LSTMs are the prototypical latent variable autoregressive model with nontrivial state control"?
It appears four lines above Section 9.2.4 Summary.

six · March 4, 2021, 3:48pm

I’ll try and break it down a bit:

prototypical ~= archetypal (the classic, representative example)
latent variable ~= hidden variable
autoregressive ~= depends on its own previous values
nontrivial ~= not simple
state control ~= LSTMs use gates to control how the hidden state and memory cell are updated. A vanilla RNN has no gates at all. LSTM gating (input, forget, and output gates plus a memory cell) is more complex than GRU gating (reset and update gates).
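To make the list above concrete, here is a minimal NumPy sketch of one LSTM step, assuming the standard LSTM gate equations (as in the linked chapter) and the row-vector convention `X @ W`. The pair `(H, C)` is the latent state; each step reads the previous `H` and `C`, which is the autoregressive part; and the input, forget, and output gates `I`, `F`, `O` decide how much is written to, kept in, and exposed from the state, which is the "state control". The helper names `lstm_step` and `init_params` and the 0.01 initialization scale are illustrative assumptions, not the book's code.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(X, H, C, params):
    """One LSTM time step. H (hidden state) and C (memory cell) are the
    latent state carried from step t-1 to step t."""
    (W_xi, W_hi, b_i, W_xf, W_hf, b_f,
     W_xo, W_ho, b_o, W_xc, W_hc, b_c) = params
    I = sigmoid(X @ W_xi + H @ W_hi + b_i)        # input gate
    F = sigmoid(X @ W_xf + H @ W_hf + b_f)        # forget gate
    O = sigmoid(X @ W_xo + H @ W_ho + b_o)        # output gate
    C_tilde = np.tanh(X @ W_xc + H @ W_hc + b_c)  # candidate memory
    C = F * C + I * C_tilde                       # gated memory update
    H = O * np.tanh(C)                            # gated hidden state
    return H, C

def init_params(num_inputs, num_hiddens, rng):
    """Hypothetical helper: (W_x, W_h, b) for each of the four blocks."""
    def three():
        return (rng.normal(0, 0.01, (num_inputs, num_hiddens)),
                rng.normal(0, 0.01, (num_hiddens, num_hiddens)),
                np.zeros(num_hiddens))
    return (*three(), *three(), *three(), *three())

rng = np.random.default_rng(0)
params = init_params(num_inputs=5, num_hiddens=4, rng=rng)
H = np.zeros((2, 4))                   # batch of 2, 4 hidden units
C = np.zeros((2, 4))
for X in rng.normal(size=(3, 2, 5)):   # 3 time steps, each X is (2, 5)
    H, C = lstm_step(X, H, C, params)
print(H.shape, C.shape)                # (2, 4) (2, 4)
```

A GRU step would look similar but with only two gates and no separate memory cell, which is why its gating counts as the simpler of the two.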

Hope this helps!

Felipe · March 27, 2022, 10:28am

Could you explain why LSTM networks don't suffer (or at least suffer less) from the vanishing gradient problem?
I understand the idea of gates and how they let us control the information flow, but I still feel that the gradient problem could occur… After all, it's a recurrent computation, so maybe there is still a factor coming from many matrix multiplications (I mean W^t).
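A partial answer, as a sketch: the key difference is the memory cell update C_t = F_t ⊙ C_{t-1} + I_t ⊙ C̃_t. Along this direct path, holding the gates fixed, ∂C_t/∂C_{t-1} = diag(F_t), so the gradient from step T back to step 0 is an elementwise product of forget-gate values with no weight matrix in it. In a vanilla RNN, ∂H_t/∂H_{t-1} = diag(1 - H_t²) W_hh^T, so the W^t factor you mention really does appear. The NumPy sketch below compares the two; the spectral norm 0.5, the random inputs, and the constant forget gates of 0.98 are assumptions chosen for illustration. It deliberately ignores the other LSTM paths (the gates and H_t depend on H_{t-1} through weight matrices), so your intuition is partly right: LSTMs can still suffer vanishing or exploding gradients, and the cell path vanishes too if forget gates stay near 0. They mainly add one additive route along which gradients need not shrink geometrically.

```python
import numpy as np

def rnn_grad_norm(W_hh, T, rng):
    """Spectral norm of dH_T/dH_0 for a vanilla tanh RNN
    H_t = tanh(H_{t-1} @ W_hh + x_t), with random inputs x_t (assumption)."""
    n = W_hh.shape[0]
    H = np.zeros(n)
    J = np.eye(n)  # running product dH_t/dH_0
    for _ in range(T):
        H = np.tanh(H @ W_hh + rng.normal(size=n))
        # Per-step Jacobian dH_t/dH_{t-1} = diag(1 - H_t**2) @ W_hh.T:
        # the weight matrix is multiplied in at every step.
        J = (np.diag(1.0 - H**2) @ W_hh.T) @ J
    return np.linalg.norm(J, 2)

def lstm_cell_path_grad(forget_gates):
    """dC_T/dC_0 along the direct cell path only, with gates held fixed:
    each step contributes diag(F_t), so the result is an elementwise
    product of forget-gate values and contains no weight matrix."""
    return np.prod(forget_gates, axis=0)

rng = np.random.default_rng(0)
n, T = 8, 50
W_hh = rng.normal(size=(n, n))
W_hh *= 0.5 / np.linalg.norm(W_hh, 2)  # assumed: rescale to spectral norm 0.5
rnn = rnn_grad_norm(W_hh, T, rng)
F = np.full((T, n), 0.98)              # assumed: forget gates held near 1
cell = lstm_cell_path_grad(F)
print(rnn)      # at most 0.5**50, about 8.9e-16
print(cell[0])  # 0.98**50, about 0.364
```

The RNN bound follows because each per-step Jacobian has spectral norm at most 0.5 (tanh' ≤ 1), while the cell-path value depends only on how close the forget gates stay to 1.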