At the beginning of this chapter, we have:
Imagine that we are training a language model. To get good accuracy we typically want to decrease the learning rate as we keep on training, usually at a rate of $O(t^{-1/2})$ or slower.
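For concreteness, here is a minimal Python sketch of one schedule that decays at this rate. It assumes the form $\eta_t = \eta_0 / \sqrt{t + 1}$, which the chapter does not spell out, and the names `inverse_sqrt_lr` and `base_lr` ($\eta_0$) are hypothetical. The key property is that multiplying the step count by 4 halves the learning rate.

```python
import math

def inverse_sqrt_lr(step: int, base_lr: float = 0.1) -> float:
    """Hypothetical schedule: eta_t = base_lr / sqrt(step + 1), i.e. O(t^{-1/2})."""
    # The +1 avoids division by zero at step 0.
    return base_lr / math.sqrt(step + 1)

# Each time (step + 1) grows by a factor of 4, the learning rate halves.
for step in [0, 3, 15, 63, 255]:
    print(step, inverse_sqrt_lr(step))
```

Decaying "slower" than this would mean, for example, a rate like $t^{-1/4}$, whereas $1/t$ decays faster.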
I am confused by this number. Can anyone explain it in more detail?