Adagrad

astonzhang · June 29, 2020, 10:17pm

https://d2l.ai/chapter_optimization/adagrad.html

Tan_Phan · December 3, 2021, 5:21pm

At the beginning of this chapter, we have:

Imagine that we are training a language model. To get good accuracy we typically want to decrease the learning rate as we keep on training, usually at a rate of $\mathcal{O}(t^{-\frac{1}{2}})$ or slower.

This number confuses me. Can anyone explain it in more detail?
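
For reference, one way to read the rate in the quoted passage is as a learning rate that shrinks in proportion to one over the square root of the step count t, i.e. something like eta_t = eta_0 / sqrt(t). Below is a minimal sketch under that assumption. The function name `inverse_sqrt_lr` and the starting value `eta0 = 0.1` are made up for illustration and do not come from the chapter.

```python
import math


def inverse_sqrt_lr(eta0: float, t: int) -> float:
    """Learning rate at step t (t >= 1), assuming eta_t = eta0 / sqrt(t)."""
    if t < 1:
        raise ValueError("t must be a positive step index")
    return eta0 / math.sqrt(t)


eta0 = 0.1  # hypothetical initial learning rate, chosen only for illustration
for t in (1, 4, 100, 10_000):
    # Each 100x increase in t reduces the learning rate by only 10x.
    print(f"t = {t:>6}: lr = {inverse_sqrt_lr(eta0, t):.4f}")
```

Under this reading, "or slower" would mean a schedule that shrinks even more gently than 1 / sqrt(t), so the learning rate stays larger for longer.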