Bahdanau Attention

astonzhang · September 17, 2020, 4:53am

https://d2l.ai/chapter_attention-mechanisms-and-transformers/bahdanau-attention.html

Lefan · March 3, 2021, 5:14pm

Hi,

In chapter 10.4.1:

where the decoder hidden state st′−1st′−1 at time step t′−1t′−1 is the query, and the encoder hidden states htht are both the keys and values,

However, in the code implementation:

context = self.attention(
query, enc_outputs, enc_outputs, enc_valid_lens)

I think that the keys and values are enc_outputs instead of the encoder hidden states h_t. Is it a mistake?
Please correct me if I am wrong!

xiaoxi · March 27, 2021, 8:57am

Yes，I’m confused，too!

jwyao · April 8, 2021, 7:35pm

I think you have already pointed out the difference: s_{t’-1} is the decoder hidden state. It is the input of
" out, hidden_state = self.rnn(x.permute(1, 0, 2), hidden_state) "
I don’t think the encoder hidden state h_t is used as a decoder hidden state. It is clearly stated that h_t is a key.

min_xu · April 16, 2021, 2:48am

in my opinion

stephenllh · May 28, 2021, 3:06pm

I wonder why there is an arrow pointing from the recurrent layer of the encoder towards to recurrent layer of the decoder. I thought the context variable is already handled by the attention mechanism?

HKAB · June 6, 2021, 4:20am

I find out that we can save a lot of GPU memory by using:

hidden_state.detach()

in Seq2SeqAttentionDecoder class, forward function when we loop for each num_steps.
More information: Michael Jungo answer

BH_L · June 7, 2021, 10:53am

Change from adaptive attention to dot product attention will increase the training speed. Change from GRU to LSTM will decrease the training speed.

HeartSea15 · June 24, 2021, 1:20am

Moon · July 3, 2021, 1:53pm

The enc_outputs in the encoder are totally equal to the hidden states, because they haven’t gone into the FC layer.
You can’t use the variable “hidden_state” of encoder since it only record the state of the final time step.

Wind · July 30, 2021, 3:05am

I wonder why state[0] and state[1] have different shapes.

shawn_yuan · August 12, 2021, 8:59am

there is a problem in Seq2SeqAttentionDecoder.init_state ‘enc_outputs’ should be changed as ‘outputs’

lihaobit · August 16, 2021, 12:28pm

中文版能上tensorflow的代码吗？
另外，tensorflow对版本有要求吗，我用的是1.15，运行代码有报错

Bay_Leaf · October 10, 2021, 1:53am

same feeling here. Moreover if you want to have different number of hidden dim in encoder and decoder , then you cannot do this operation.

BlickWinkel · October 11, 2021, 5:21am

You are right. I was also confused, then I checked that the final element of outputs and hidden_state are actually equal. Thank you!

howardchina · November 1, 2021, 11:24am

The enc_outputs in the encoder actually are a little different from hidden_states, because the enc_outputs is the last layer of hidden_states at each time step. And hidden_states have only kept the parameters on the last timestep.
More specifically, enc_outputs shape is (num_timestep, num_batch, num_hiddens), while hidden_states shape is (num_layer, num_batch, num_hiddens) here.
The enc_outputs[-1, …] equals to hidden_state[-1, …].

Moon · November 1, 2021, 12:04pm

Yes, you are right. We should figure out the true hidden state and the variable “hidden_state”. You know, every time step has a hidden state, and it equals to the output if there isn’t a FC layer. Then the variable “hidden_state” just mean the state of last time step.

chengbaitai · November 8, 2021, 9:56am

d2l模块里面没有Decoder呢？大家的都有吗，我去github上看也没有找到呢

KevinYan · November 18, 2021, 1:26pm

do you mean the arrow with a red cross? same feeling here.

striker · March 9, 2022, 8:12am

conda activate d2l
在d2l这个env里就有了