Large-Scale Pretraining with Transformers

https://d2l.ai/chapter_attention-mechanisms-and-transformers/large-pretraining-transformers.html