7 replies
Jul '20

nxby

Hi, I would like to ask a question about formula (11.3.12). Why do we take the gradient of the vector x instead of the function f(x)? Should it read "x ← x − η diag(H_f)^(−1) ∇f(x)" here? Thank you!
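For what it's worth, here is a small sketch of the update as I understand it (the quadratic test function and all names are my own, not the book's):

```python
import numpy as np

# Sketch of the preconditioned update
#   x <- x - eta * diag(H_f)^{-1} * grad f(x)
# on f(x) = 0.1*x1^2 + 2*x2^2, whose Hessian is diagonal.

def f_grad(x):
    return np.array([0.2 * x[0], 4.0 * x[1]])

def f_hess_diag(x):
    # Diagonal of the Hessian of f; constant for this quadratic.
    return np.array([0.2, 4.0])

def preconditioned_gd(x0, eta=1.0, steps=10):
    x = np.array(x0, dtype=float)
    for _ in range(steps):
        # Rescale each gradient coordinate by the inverse Hessian
        # diagonal before taking the step.
        x = x - eta * f_grad(x) / f_hess_diag(x)
    return x

x = preconditioned_gd([5.0, -3.0], eta=1.0, steps=1)
# With eta = 1 and a diagonal Hessian, a single step lands
# exactly at the minimum (0, 0).
```

With the gradient of x alone (i.e., without ∇f), the rescaling would make no sense, which is why I think the formula needs ∇f(x).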

1 reply
Jul '20 ▶ nxby

goldpiggy

Great catch @nxby! Would you like to post a PR and be a contributor?

1 reply
Jul '20 ▶ goldpiggy

nxby

Thank you @goldpiggy! I’ve just made a pull request.

Dec '20

wwwu

Hi, I wonder if there is a typo in the sentence just below formula (11.3.11). It currently reads "Plugging in the update equations leads to the following bound e_{k+1} <= e^2_k f'''(\xi_k)/f'(x_k)". Shouldn't it be "e_{k+1} <= \frac{1}{2} e^2_k f'''(\xi_k)/f''(x_k)"? Thanks a lot for your attention.
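A quick numeric check supporting the 1/2 factor and the second derivative in the denominator (the test function is my own choice, not from the book): minimizing f(x) = exp(x) − x with Newton's update x ← x − f'(x)/f''(x), the minimum is at x* = 0 and f'''/f'' → 1 there, so the ratio e_{k+1}/e_k² should approach 1/2.

```python
import math

# Newton's method on f(x) = exp(x) - x, minimized at x* = 0.
# f'(x) = exp(x) - 1, f''(x) = f'''(x) = exp(x).

def newton_errors(x=0.5, steps=4):
    errs = [abs(x)]
    for _ in range(steps):
        g = math.exp(x) - 1.0   # f'(x)
        h = math.exp(x)         # f''(x)
        x = x - g / h           # Newton step
        errs.append(abs(x))
    return errs

errs = newton_errors()
ratios = [errs[k + 1] / errs[k] ** 2 for k in range(len(errs) - 1)]
# The ratios approach 0.5, matching the 1/2 factor in the bound
# e_{k+1} <= (1/2) e_k^2 f'''(xi_k) / f''(x_k).
```

If the bound were e_k² f'''/f' as currently printed, the ratio would not stabilize at all, since f'(x_k) → 0.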

1 reply
Apr '21 ▶ wwwu

astonzhang

Thanks. I revised this part recently and made it slightly different from the previous version. Just let me know if you spot any issue.

Aug '21

JXCpNTDBU

The Peano remainder R_n in the Taylor expansion has one extra power; that looks like a typo.
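For reference, the standard second-order expansion with remainder (written in the book's step notation ε, as far as I can tell) would be

$$f(x + \epsilon) = f(x) + \epsilon f'(x) + \tfrac{1}{2}\epsilon^2 f''(x) + \mathcal{O}(\epsilon^3),$$

i.e., the remainder after the ε² term is of order ε³, not one power higher.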

May '24

Denis_Kazakov

Re 12.3.3.4. Gradient Descent with Line Search
What is meant by binary search here? It is normally used for searching ordered sequences. Boyd and Vandenberghe [2004] do not mention binary search.
Exercise 2.1 is also unclear: why do we need to pick half-intervals when the method requires selecting a learning rate (a single number)?
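My best guess at what "binary search" could mean in this context (a sketch of my own, not the book's code): bisect on the sign of the directional derivative φ'(η) = d/dη f(x − η g) to select the learning rate η along the negative gradient, halving the interval [lo, hi] at each step.

```python
import numpy as np

# Bisection line search: pick eta along direction -g by bisecting
# on the sign of phi'(eta) = d/d eta f(x - eta * g).

def bisect_line_search(grad, x, g, hi=8.0, iters=50):
    # Directional derivative along the ray; negative means the
    # objective is still decreasing, so the minimum lies further out.
    dphi = lambda eta: -g @ grad(x - eta * g)
    lo = 0.0
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if dphi(mid) < 0:
            lo = mid   # minimum lies beyond mid
        else:
            hi = mid   # overshot; minimum lies before mid
    return 0.5 * (lo + hi)

# Example: f(x) = 1/2 x^T A x with A = diag(1, 10).
A = np.diag([1.0, 10.0])
grad = lambda x: A @ x
x = np.array([1.0, 1.0])
g = grad(x)
eta = bisect_line_search(grad, x, g)
```

For this quadratic the exact minimizer along the ray is η* = (g·g)/(g·Ag) = 101/1001, and the bisection converges to it, so the "half-intervals" in Exercise 2.1 would refer to the shrinking search interval for η, even though the final answer is a single number.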