Stochastic Gradient Descent

https://zh.d2l.ai/chapter_optimization/sgd.html

Shouldn't Eq. 11.4.6 be \partial_x f(\xi_t, x_t)? A derivative is always evaluated at a specific point, x_t or x_{t+1}.
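
To make the point concrete, here is how the update might read with the evaluation point written out. This is only a sketch: it assumes Eq. 11.4.6 is the SGD update step and that \eta_t is the learning rate at step t, neither of which is quoted in this post.

```latex
% Assumption: Eq. 11.4.6 is the SGD update rule; \eta_t (learning rate) is not named in this post.
x_{t+1} = x_t - \eta_t \, \partial_x f(\xi_t, x_t)
```

With x_t as the evaluation point the step is explicit. Using x_{t+1} instead would put x_{t+1} on both sides, which makes the update implicit.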