Sequence to Sequence Learning

https://d2l.ai/chapter_recurrent-modern/seq2seq.html

I am curious about predict_s2s_ch9: the context information differs from that of the training process. During training, we use the encoder's hidden state from its final time step as the context. However, during prediction, we in fact use the hidden state from the previous decoding step as the context for the current step. Isn't that a discrepancy?
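For concreteness, here is a minimal runnable sketch of a decoder in this style (my own paraphrase under that reading of the chapter, not the book's exact code; all names are illustrative):

```python
import torch
from torch import nn

# Minimal sketch, assuming a GRU decoder that concatenates a context
# vector with the embedded input (illustrative names, not d2l's code).
class SketchDecoder(nn.Module):
    def __init__(self, vocab_size, embed_size, num_hiddens, num_layers):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_size)
        self.rnn = nn.GRU(embed_size + num_hiddens, num_hiddens, num_layers)
        self.dense = nn.Linear(num_hiddens, vocab_size)

    def forward(self, X, state):
        # X: (batch_size, num_steps) -> (num_steps, batch_size, embed_size)
        X = self.embedding(X).permute(1, 0, 2)
        # The context is read from the CURRENT state's top layer and
        # repeated over every time step of this single call.
        context = state[-1].repeat(X.shape[0], 1, 1)
        output, state = self.rnn(torch.cat((X, context), dim=2), state)
        return self.dense(output).permute(1, 0, 2), state

dec = SketchDecoder(vocab_size=10, embed_size=8, num_hiddens=16, num_layers=2)
enc_state = torch.zeros(2, 1, 16)  # stand-in for the encoder's final state
# Training-style call: one forward pass over all 4 steps, so every step
# sees the same context, namely enc_state[-1].
logits, _ = dec(torch.zeros(1, 4, dtype=torch.long), enc_state)
```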

I don’t think it is a discrepancy.
In my opinion, :grimacing:
We use the hidden state to remember what happened in the past.
When we train, it is as if we are remembering, because all the context information is stored in the hidden state.
When we test, it is as if we are recalling, because we use the context information stored in the hidden state to predict.

I agree with you that, whether in training or in testing, we use the hidden state to remember the context information. However, in the training process the context is never updated after it is generated at the last step of the encoder (you can see that it is broadcast into num_steps copies and concatenated with the decoder input). In the test process, predict_s2s_ch9 updates this context information at every time step. If we look at Fig. 9.7.3, predict_s2s_ch9 updates the context information every step, rather than keeping it constant all the way through prediction.
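To see that update in code, here is a hedged sketch of a greedy prediction loop in the spirit of predict_s2s_ch9, reusing the SketchDecoder toy from the first post above (the real function also handles the <bos>/<eos> tokens and sequence truncation/padding):

```python
# Greedy decoding sketch: the decoder is called one step at a time and
# `state` is fed back in, so the context computed from state[-1] changes
# at every step instead of staying fixed at enc_state[-1].
state = enc_state
dec_X = torch.zeros(1, 1, dtype=torch.long)  # stand-in for the <bos> token
for _ in range(4):
    logits, state = dec(dec_X, state)  # state, and hence context, updates
    dec_X = logits.argmax(dim=2)       # feed the prediction back in
```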

@jackychen718

As far as I understand, for prediction each input sequence is a new sequence with batch size = 1, and the decoder context is updated once and only once, after the input sequence has been encoded.

During training, as far as I can see in the source code, each new source sequence is encoded, and the decoder context is then updated from part of that encoded state. Each new source sequence leads to a decoder context update.

Regarding the decoder context update with the encoder hidden state: I think the hard-coded index value, 1, in the Seq2SeqDecoder.init_state() method represents the latest layer if and only if the number of RNN layers is 2 (see the sketch below).
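One hedged caveat on that reading: if init_state() simply returns enc_outputs[1], as I recall from the book's code, then the hard-coded 1 indexes the encoder's (output, state) return tuple rather than selecting a layer, so it would be independent of the number of RNN layers. A quick check with illustrative names:

```python
import torch
from torch import nn

encoder = nn.GRU(input_size=8, hidden_size=16, num_layers=2)
enc_X = torch.zeros(7, 4, 8)   # (num_steps, batch_size, embed_size)
enc_outputs = encoder(enc_X)   # a (output, state) tuple

state = enc_outputs[1]         # the hard-coded 1 picks out `state`
print(state.shape)             # torch.Size([2, 4, 16]): all layers
print(state[-1].shape)         # torch.Size([4, 16]): top layer, any depth
```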


Is there a TensorFlow implementation of seq2seq? I tried to find one but didn't see it.

Hi @KhoiLe, maybe this one can help: https://www.tensorflow.org/tutorials/text/nmt_with_attention
