Bahdanau Attention

Because the last hidden state of the encoder serves as the initial hidden state of the decoder, $$s_0$$.
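
A minimal sketch of that initialization, assuming PyTorch-style GRU modules (the names here are illustrative, not the book's exact code):

```python
import torch
from torch import nn

# Illustrative sketch: the encoder's final hidden state is handed to the
# decoder as its initial state s_0.
encoder = nn.GRU(input_size=8, hidden_size=16, num_layers=2)
decoder = nn.GRU(input_size=8, hidden_size=16, num_layers=2)

src = torch.randn(10, 4, 8)            # (num_steps, batch, embed_size)
enc_outputs, enc_state = encoder(src)  # enc_state: (layers, batch, hidden)

s_0 = enc_state                        # s_0 = encoder's last hidden state
tgt = torch.randn(7, 4, 8)
dec_outputs, _ = decoder(tgt, s_0)
```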

My solutions to the exercises: 11.4


How did you change the GRU to an LSTM? I got an error.
Thanks for your help.
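
One common pitfall when swapping `nn.GRU` for `nn.LSTM` (a guess at the cause, not a diagnosis of your exact error): the LSTM's state is a `(hidden, cell)` tuple, so code that builds the attention query by indexing the state tensor directly needs adjusting. A minimal sketch:

```python
import torch
from torch import nn

# nn.GRU returns a single state tensor; nn.LSTM returns a (h, c) tuple.
gru, lstm = nn.GRU(8, 16), nn.LSTM(8, 16)
x = torch.randn(5, 4, 8)

_, gru_state = gru(x)                   # tensor: (layers, batch, hidden)
gru_query = gru_state[-1].unsqueeze(1)  # works directly with a GRU

_, lstm_state = lstm(x)                 # tuple (h, c), each (layers, batch, hidden)
lstm_query = lstm_state[0][-1].unsqueeze(1)  # take h out of the tuple first
# Also remember to carry the full (h, c) tuple as the decoder state.
```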

In this chapter, I got confused when comparing the implementation with the one in chapter 10.7. Specifically, this chapter uses a for loop in the decoder to predict each token using the hidden state from the previous step, instead of the parallel computation over all time steps used in the previous chapter. This may lead to significantly higher training time.
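
For what it's worth, here is a minimal sketch of why the loop is hard to avoid (illustrative names, with a simplified dot-product attention standing in for the book's additive attention): the query at step t is the hidden state produced at step t-1, so the time steps form a sequential dependency that the plain decoder in 10.7 does not have.

```python
import torch
from torch import nn

embed_size, num_hiddens, batch, steps = 8, 16, 4, 7
rnn = nn.GRU(embed_size + num_hiddens, num_hiddens)

def attention(query, enc_outputs):
    # Placeholder attention: score, softmax, weighted sum of encoder outputs.
    scores = query @ enc_outputs.transpose(1, 2)        # (batch, 1, src_len)
    return torch.softmax(scores, dim=-1) @ enc_outputs  # (batch, 1, hidden)

enc_outputs = torch.randn(batch, 10, num_hiddens)  # encoder outputs, batch-first
state = torch.zeros(1, batch, num_hiddens)         # s_0 from the encoder
embeds = torch.randn(steps, batch, embed_size)     # target embeddings

outputs = []
for x in embeds:                            # one step at a time: each query
    query = state[-1].unsqueeze(1)          # needs the previous step's state
    context = attention(query, enc_outputs)
    rnn_in = torch.cat([x.unsqueeze(0), context.permute(1, 0, 2)], dim=-1)
    out, state = rnn(rnn_in, state)
    outputs.append(out)
result = torch.cat(outputs, dim=0)          # (steps, batch, num_hiddens)
```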