Parameter Servers

astonzhang · June 29, 2020, 10:25pm

https://d2l.ai/chapter_computational-performance/parameterserver.html

gembancud · May 4, 2021, 12:53pm

I’m a bit confused about how the time for ring synchronization becomes (n-1)/n. From what I understand, if you aggregate chunks, you need a full pass (n-1 steps) through all GPUs to aggregate a specific chunk. Could you please clarify?

Hyeonggyu_Kim · July 12, 2021, 3:03pm

Note that every GPU sends a chunk to the next GPU simultaneously.
Thus, it suffices to consider the communication cost of a single GPU.
Each chunk is 1/n of the full gradient.
Each GPU sends a chunk (a different one at each step) n-1 times.
Thus, the time is (n-1)/n of the time needed to send the full gradient, which stays below 1 no matter how many GPUs there are.
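To make the argument above concrete, here is a small simulation of the aggregation (reduce-scatter) pass. It is a sketch under simplifying assumptions: GPUs are simulated as Python lists, all chunks have equal size, and all sends within a step happen at once. `ring_reduce_scatter` and the gradient values are hypothetical and not part of any library. The simulation counts how many gradient elements each GPU sends.

```python
# Sketch: reduce-scatter pass of ring synchronization on n simulated GPUs.
# Assumptions: gradients are lists of floats split into n equal chunks, and
# every GPU sends one chunk to its right-hand neighbor per step, simultaneously.


def ring_reduce_scatter(grads):
    """Return (chunks, sent): per-GPU chunk copies and elements sent per GPU."""
    n = len(grads)
    # chunks[g][c] is GPU g's current copy of chunk c
    chunks = [[list(chunk) for chunk in gpu] for gpu in grads]
    sent = [0] * n
    for step in range(n - 1):
        # Collect all messages before applying any, since sends are simultaneous.
        messages = []
        for g in range(n):
            c = (g - step) % n  # chunk that GPU g forwards at this step
            messages.append(((g + 1) % n, c, list(chunks[g][c])))
            sent[g] += len(chunks[g][c])
        # Each receiver adds the incoming partial sum to its own copy.
        for dst, c, data in messages:
            chunks[dst][c] = [a + b for a, b in zip(chunks[dst][c], data)]
    return chunks, sent


n, size = 4, 8  # 4 GPUs, a gradient of 8 elements -> chunks of 2
# Hypothetical gradients: GPU g holds the value g + 1 everywhere.
grads = [[[float(g + 1)] * (size // n) for _ in range(n)] for g in range(n)]
chunks, sent = ring_reduce_scatter(grads)

# After n - 1 steps, GPU g holds the fully summed chunk (g + 1) % n (1+2+3+4 = 10).
print([chunks[g][(g + 1) % n] for g in range(n)])
# Each GPU sent n - 1 chunks of size/n elements, i.e. (n - 1)/n of the gradient.
print([s / size for s in sent])  # [0.75, 0.75, 0.75, 0.75]
```

Each step moves only 1/n of the gradient per GPU, and there are n-1 steps. Per-GPU traffic therefore stays at (n-1)/n of the gradient size however large the ring grows.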

xiaojinghu93 · February 5, 2022, 5:16pm

I am a bit confused about the ring synchronization algorithm. After (n-1)/n time, each GPU should hold only one aggregated chunk, not the full gradient. Is that right?
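In the usual formulation of ring synchronization, that is the case. After the aggregation pass, each GPU holds only one fully aggregated chunk. A second pass of n-1 steps then circulates these chunks around the ring so that every GPU ends up with the full gradient. That pass costs another (n-1)/n, which is where a factor of 2 in the total time comes from.

Below is a minimal sketch of that second pass. It assumes GPUs simulated as Python lists and starts directly from the state described in the question. The convention that GPU g holds chunk (g + 1) % n is chosen for this sketch only. `ring_all_gather` and the chunk values are hypothetical.

```python
# Sketch: the second (distribution / all-gather) pass of ring synchronization.
# Assumption: GPU g starts with only the aggregated chunk (g + 1) % n; None
# marks chunks it has not received yet.


def ring_all_gather(chunks):
    """Circulate aggregated chunks for n - 1 steps; return elements sent per GPU."""
    n = len(chunks)
    sent = [0] * n
    for step in range(n - 1):
        # Snapshot all outgoing messages first, since every GPU sends at once.
        messages = []
        for g in range(n):
            c = (g + 1 - step) % n  # the chunk GPU g obtained most recently
            messages.append(((g + 1) % n, c, list(chunks[g][c])))
            sent[g] += len(chunks[g][c])
        # Receivers simply store the chunk; no summation is needed in this pass.
        for dst, c, data in messages:
            chunks[dst][c] = data
    return sent


n, size = 4, 8
# Hypothetical aggregated chunks: chunk c is filled with the value c.
reduced = [[float(c)] * (size // n) for c in range(n)]
chunks = [[None] * n for _ in range(n)]
for g in range(n):
    chunks[g][(g + 1) % n] = list(reduced[(g + 1) % n])

sent = ring_all_gather(chunks)
print(all(row == reduced for row in chunks))  # True: every GPU has the full result
print([s / size for s in sent])  # [0.75, 0.75, 0.75, 0.75]
```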

ning_ke · October 20, 2022, 6:40am

If we use the same example of synchronizing 160 MB across 8 V100 GPUs, we arrive at approximately 2⋅160MB/(3⋅18GB/s) ≈ 6 ms. Can you explain why? I think the value should be 2⋅160MB/(18GB/s) ≈ 18 ms.
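The factor 2 covers the two ring passes: aggregating the chunks, then distributing them back. Each pass costs roughly (n-1)/n ≈ 1 times the gradient size per GPU.

The factor 3 appears to come from the NVLink topology of the 8-V100 server described in the linked chapter. There, the links can be decomposed into one ring with double NVLink bandwidth and one ring with regular bandwidth, so roughly three 18 GB/s links' worth of bandwidth is used in parallel. Treat that reading as an assumption.

A quick check of the arithmetic follows, assuming decimal units (1 GB/s = 1 MB/ms). `ring_sync_ms` is a hypothetical helper.

```python
# Back-of-the-envelope ring synchronization time for the example above.
# Assumptions: decimal units, so 1 GB/s = 1 MB/ms; the factor 2 covers the
# aggregation and distribution passes; `links` is how many 18 GB/s NVLink
# links' worth of bandwidth the ring decomposition uses in parallel.


def ring_sync_ms(size_mb, link_gb_s, links=1, n=None):
    """Approximate sync time in ms; (n - 1)/n is treated as 1 unless n is given."""
    factor = 1.0 if n is None else (n - 1) / n
    bandwidth_mb_per_ms = links * link_gb_s  # GB/s equals MB/ms numerically
    return 2 * factor * size_mb / bandwidth_mb_per_ms


print(round(ring_sync_ms(160, 18, links=3), 1))       # 5.9, the chapter's ~6 ms
print(round(ring_sync_ms(160, 18, links=1), 1))       # 17.8, a single link
print(round(ring_sync_ms(160, 18, links=3, n=8), 1))  # 5.2, keeping (n-1)/n = 7/8
```

Dropping (n-1)/n, as the chapter's estimate does, slightly overestimates the time. With n = 8 the exact factor is 7/8.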