Hi, I wonder whether the expectation operator in the term E[R(w_t)] in equations (11.4.12) and (11.4.13) is unnecessary. Also, the “E” in (11.4.15) and (11.4.16) seems like it should be “R”. Thanks a lot.
In inequality (11.4.12), I guess it is implied that
E_{w_t}[l(x_t, w_t)] >= E_{w_t}[E_{x_t}[l(x_t, w_t)]] = E_{w_t}[R(w_t)]
If this is the case, I would appreciate a more thorough explanation.
In (11.4.15) and (11.4.16), it should be E[R(\bar{wt})] instead of E[\bar{wt}]. After all, we seek an upper bound on the deviation of the expected risk from the minimum risk, which we obtain in (11.4.16).
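For what it's worth, here is a rough sketch of how the expectation step in (11.4.12) might be spelled out. It is not taken from the book: it assumes that R(w) is defined as E_x[l(x, w)], that the sample x_t is drawn independently of w_t, and that w_t depends only on the earlier samples x_1, …, x_{t-1}. Under those assumptions the step would follow from the tower property and hold with equality, and the outer expectation would still be needed because w_t is itself random.

```latex
% Sketch only. Assumes R(w) := E_x[ l(x, w) ] and that x_t is sampled
% independently of w_t (w_t is a function of x_1, ..., x_{t-1} only).
\begin{aligned}
\mathbb{E}\big[\, l(x_t, w_t) \,\big]
  &= \mathbb{E}_{w_t}\Big[\, \mathbb{E}_{x_t}\big[\, l(x_t, w_t) \,\big|\, w_t \,\big] \Big]
     && \text{(tower property)} \\
  &= \mathbb{E}_{w_t}\big[\, R(w_t) \,\big]
     && \text{(independence of } x_t \text{ and } w_t \text{)}
\end{aligned}
```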
Hi @wwwu and @sanjaradylov, thanks for the discussions. We’ve just revised the proof and it can be previewed at http://preview.d2l.ai.s3-website-us-west-2.amazonaws.com/d2l-en/master/chapter_optimization/sgd.html
Just let me know if you have any further questions on it.
I think the first term in Eq. (11.4.17) should be:
E[R(\bar{x})] - R* ≤ …