Large-Scale Pretraining with Transformers

astonzhang · June 27, 2022, 2:36am

https://d2l.ai/chapter_attention-mechanisms-and-transformers/large-pretraining-transformers.html

tryst · January 9, 2023, 5:45am

Fig. 11.9.7.

Je suis malade = I am sick

Pantalaymon · January 13, 2023, 4:26pm

Yes.
And “He’s calm” = “il est calme” | “Elle court” = “She’s running”

ubayram · April 5, 2023, 1:53pm

I’m pretty sure there will be some additions to this section about GPT-4, and I’m looking forward to reading them! In the meantime, may I ask the admins to check whether the figure with the description “Zero-shot, one-shot, few-shot in-context learning with language models…” correctly depicts what’s happening in the decoder when it says “(no parameter update)”?

If I understand the concepts correctly, when we give the pre-trained decoder a task, an example, and a prompt and ask it to learn (that is, fine-tune the model), we are updating the decoder parameters so that it “learns” how to translate sequences. Is that not true? What am I missing here?

ToddMorrill · June 8, 2023, 4:41pm

I think the decoder in Fig. 11.9.3 should start with a beginning-of-sentence tag (e.g. <bos>). At inference time, we won’t know what the first decoded token will be (currently it shows <X>), which is why we would start decoding with <bos>.
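
To make the point about <bos> concrete, here is a minimal sketch of a greedy decoding loop. `toy_next_token` and its lookup table are hypothetical stand-ins for a trained decoder, and the tokens are made up for illustration; this is not the d2l implementation.

```python
# Toy greedy decoding loop. `toy_next_token` is a hypothetical stand-in
# for a trained Transformer decoder; it is not part of d2l.
BOS, EOS = "<bos>", "<eos>"

# Hypothetical lookup table: previous token -> next token.
TOY_TABLE = {BOS: "<X>", "<X>": "love", "love": "this", "this": EOS}


def toy_next_token(encoder_input, decoded_so_far):
    """Return the next token given the tokens decoded so far.

    A real decoder would also attend to `encoder_input`; this toy ignores it.
    """
    return TOY_TABLE[decoded_so_far[-1]]


def greedy_decode(encoder_input, max_len=10):
    decoded = [BOS]  # the only token known before decoding starts
    for _ in range(max_len):
        token = toy_next_token(encoder_input, decoded)
        if token == EOS:
            break
        decoded.append(token)
    return decoded[1:]  # drop <bos> from the returned sequence


print(greedy_decode(["I", "<X>", "red", "<Y>"]))  # ['<X>', 'love', 'this']
```

In this sketch the first real token (<X>) is itself a prediction, so the loop needs some fixed start symbol to condition on.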

unlike · June 21, 2023, 1:16am

I think the key is that this doesn’t involve backpropagation.
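
A minimal sketch of the difference, assuming a toy one-parameter "model" (the names `forward`, `fine_tune_step`, and `params` are hypothetical, and this is not how a Transformer actually conditions on its context). In-context learning only changes the input to a forward pass, while fine-tuning uses a gradient to change the parameters.

```python
import copy

# Hypothetical frozen "pretrained" parameters; a real decoder has many more.
params = {"scale": 1.0}


def forward(params, demonstrations, query):
    """Toy forward pass: conditions on in-context (x, y) pairs, only reads params."""
    if demonstrations:
        ratio = sum(y / x for x, y in demonstrations) / len(demonstrations)
    else:
        ratio = 1.0  # zero-shot: nothing in the context to condition on
    return params["scale"] * ratio * query


def fine_tune_step(params, x, y, lr=0.01):
    """One gradient-descent step on squared error; returns updated params."""
    pred = params["scale"] * x
    grad = 2 * (pred - y) * x  # d/d(scale) of (scale * x - y) ** 2
    return {"scale": params["scale"] - lr * grad}


before = copy.deepcopy(params)

# Few-shot in-context learning: the demonstrations are part of the input.
few_shot = forward(params, [(1, 2), (3, 6)], query=4)
print(few_shot)          # 8.0
print(params == before)  # True: no parameter update happened

# Fine-tuning: the same kind of example is used to change the parameters.
tuned = fine_tune_step(params, x=1, y=2)
print(tuned == before)   # False
```

The examples in the prompt change the output, but the parameters are identical before and after the call.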

pandalabme · September 11, 2023, 10:42am

My solutions to the exercises: 11.9

mitchmatic · November 23, 2023, 2:30pm

I might be misunderstanding, but shouldn’t the sentence:

Following the autoregressive language model training as described in Section 9.3.3, Fig. 11.9.6 illustrates GPT pretraining with a Transformer encoder, …

…say decoder?

DerJustus · June 19, 2024, 9:31am

I thought the same and requested the change here: “Transformer encoder -> Transformer decoder” by MassEast, Pull Request #2606, d2l-ai/d2l-en on GitHub

liuyunqinghkust · June 25, 2024, 8:12am

Yes, I think you are right.