Automatic Parallelism

anirudh · January 4, 2021, 10:14pm

https://d2l.ai/chapter_computational-performance/auto-parallelism.html

Nima_Tajbakhsh · April 27, 2021, 5:49pm

For multi-GPU training, shouldn’t we use the aggregated gradients to update the weights? The 3rd and 6th blue boxes in Fig. 12.3.1 suggest that the weight update is based on the unaggregated gradient from GPU0. Please advise.

imflash217 · February 26, 2022, 3:21am

@Nima_Tajbakhsh, yes, I was also confused by this. I think it’s a typo.
I have attached a corrected version below (changes marked in yellow boxes).
Hope it helps.
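To make the corrected flow concrete, here is a minimal sketch (not the book's code) of one data-parallel step on two simulated devices. It assumes a toy 1-D linear model y ≈ w * x with squared loss, uses plain Python floats in place of GPU tensors, and the helper names `grad_mse`, `allreduce_mean` and `data_parallel_step` are hypothetical. Each device computes a gradient on its own shard, the gradients are aggregated, and every device then applies the *same* aggregated gradient, so the weight copies stay identical.

```python
# Hypothetical two-"GPU" data-parallel step for a 1-D linear model y ≈ w * x.
# Plain Python floats stand in for device tensors; no real GPU API is used.

def grad_mse(w, xs, ys):
    """Gradient of the mean squared error with respect to w on one shard."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

def allreduce_mean(grads):
    """Stand-in for an all-reduce: every device receives the mean gradient."""
    mean = sum(grads) / len(grads)
    return [mean] * len(grads)

def data_parallel_step(weights, shards, lr):
    """Local gradients -> aggregate -> identical update on every device."""
    local = [grad_mse(w, xs, ys) for w, (xs, ys) in zip(weights, shards)]
    aggregated = allreduce_mean(local)
    # Each device updates with the aggregated gradient, not its own local one.
    return [w - lr * g for w, g in zip(weights, aggregated)]

xs, ys = [1.0, 2.0, 3.0, 4.0], [2.0, 4.0, 6.0, 8.0]
shards = [(xs[:2], ys[:2]), (xs[2:], ys[2:])]  # GPU0: first half, GPU1: second half
weights = data_parallel_step([0.0, 0.0], shards, lr=0.01)

# Reference: a single device taking one step on the full minibatch.
single = 0.0 - 0.01 * grad_mse(0.0, xs, ys)
print(weights, single)
```

Because the two shards have equal size, the averaged gradient equals the full-minibatch gradient, so both copies match the single-device step. If each device instead used only its local gradient (as the blue boxes suggest), the copies would drift apart after one step. Summing instead of averaging is also common; it only rescales the effective learning rate.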