https://d2l.ai/chapter_attention-mechanisms-and-transformers/large-pretraining-transformers.html
Fig. 11.9.7: “Je suis malade” = “I am sick”
Yes.
And “He’s calm” = “Il est calme”; “Elle court” = “She’s running”
I’m pretty sure this section will eventually be extended to cover GPT-4, and I’m looking forward to reading that! In the meantime, may I ask the admins to check whether the figure captioned “Zero-shot, one-shot, few-shot in-context learning with language models…” correctly depicts what is happening in the decoder where it says “(no parameter update)”?
If I’m not misunderstanding the concepts: when we give the pre-trained decoder a task description, an example, and a prompt, and then ask it to learn (that is, fine-tune the model), we are indeed updating the decoder’s parameters so that it “learns” how to translate sequences. Is that not true? What am I missing here?
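For anyone else puzzling over this: the figure’s “(no parameter update)” refers to in-context learning, which is distinct from fine-tuning. The task description, examples, and prompt are all fed to the decoder purely as input tokens at inference time, so no loss is computed, no gradients flow, and no weights change; fine-tuning is the separate procedure that does update parameters. Below is a minimal sketch of the difference, assuming a Hugging Face GPT-2 checkpoint as a small stand-in for GPT-3 (the model name and prompt here are illustrative, not from the book):

```python
# Few-shot in-context learning vs. fine-tuning, sketched with GPT-2
# as a stand-in for GPT-3 (illustrative only, not from the book).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()  # inference mode; the pretrained weights stay frozen

# In-context learning: the task description and examples live entirely
# in the prompt. There is no backward pass and no optimizer step,
# hence "(no parameter update)" in the figure.
prompt = (
    "Translate English to French:\n"
    "sea otter => loutre de mer\n"
    "peppermint => menthe poivrée\n"
    "cheese =>"
)
inputs = tokenizer(prompt, return_tensors="pt")
with torch.no_grad():  # gradients are never even computed
    out = model.generate(
        **inputs, max_new_tokens=5, pad_token_id=tokenizer.eos_token_id
    )
print(tokenizer.decode(out[0], skip_special_tokens=True))

# Fine-tuning, by contrast, WOULD update the parameters, roughly:
#   loss = model(**inputs, labels=inputs["input_ids"]).loss
#   loss.backward()
#   optimizer.step()  # <- this is the step in-context learning skips
```

So the only “learning” in the zero-/one-/few-shot cases happens in the forward pass, conditioned on the prompt, which is why the figure can say “(no parameter update)” even though examples are shown to the model.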