Gradient Descent

https://d2l.ai/chapter_optimization/gd.html

Hi, I would like to ask a question on the formula (11.3.12). Why do we calculate the gradient of the vector x instead of the function f(x)? Should it be "x ← x − η diag(H_f)^{-1} ∇f(x)" here? Thank you!

Great catch @nxby! Would you like to post a PR and be a contributor?

Thank you @goldpiggy! I’ve just made a pull request.
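
For anyone else reading this thread, here is a minimal NumPy sketch of the corrected update x ← x − η diag(H_f)^{-1} ∇f(x); the quadratic test function and step size are just illustrative choices on my part, not taken from the chapter.

import numpy as np

def grad_f(x):
    # Gradient of the illustrative quadratic f(x) = 0.5 * x1^2 + 2 * x2^2.
    return np.array([x[0], 4.0 * x[1]])

def hessian_diag_f(x):
    # Diagonal of the Hessian of the same quadratic (constant here).
    return np.array([1.0, 4.0])

def preconditioned_gd(x0, eta=0.5, num_steps=20):
    # Corrected update from (11.3.12): x <- x - eta * diag(H_f)^{-1} * grad f(x).
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - eta * grad_f(x) / hessian_diag_f(x)
    return x

print(preconditioned_gd([3.0, -2.0]))  # both coordinates shrink toward the minimum at 0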

Hi, I wonder if there is a typo in the sentence just below formula (11.3.11): "Plugging in the update equations leads to the following bound e_{k+1} <= e^2_k f'''(\xi_k)/f'(x_k)". Shouldn't it be "e_{k+1} <= \frac{1}{2} e^2_k f'''(\xi_k)/f''(x_k)"? Thanks a lot for your attention.
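
My reasoning, in case it helps (using the section's notation with e_k = x_k - x^*, f'(x^*) = 0, and \xi_k between x^* and x_k):

0 = f'(x_k - e_k) = f'(x_k) - e_k f''(x_k) + \frac{1}{2} e_k^2 f'''(\xi_k),

so f'(x_k) = e_k f''(x_k) - \frac{1}{2} e_k^2 f'''(\xi_k). Plugging this into the Newton update x_{k+1} = x_k - f'(x_k)/f''(x_k) gives

e_{k+1} = e_k - f'(x_k)/f''(x_k) = \frac{1}{2} e_k^2 f'''(\xi_k)/f''(x_k),

which is where the factor \frac{1}{2} and the second derivative f''(x_k) in the denominator come from.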

Thanks. I revised this part recently and made it slightly different from the previous version. Just let me know if you spot any issue.

The Peano remainder R_n of the Taylor expansion had one extra power, which was wrong.
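
For reference, and assuming the issue was the exponent on the remainder term: expanding to order n, the Peano form is

f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k + o((x - x_0)^n),

i.e. the remainder R_n is o((x - x_0)^n); writing it as o((x - x_0)^{n+1}) would carry one power too many.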

Re 12.3.3.4. Gradient Descent with Line Search
What is meant by binary search here? It is normally used for ordered sequences. Boyd and Vandenberghe [2004] do not mention binary search.
Exercise 2.1 is also not clear. Why do we need to pick half-intervals when the method requires selecting a learning rate (a single number)?
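
In case it helps to make this concrete, one plausible reading (my own interpretation, not something Boyd and Vandenberghe state) is bisection on the step size: keep an interval [lo, hi] of candidate learning rates and halve it each iteration using the sign of the directional derivative of eta -> f(x + eta * d), assuming that function is convex and d is a descent direction. The "half-intervals" would then be the successive halves of this interval, and the returned number is the learning rate. A rough NumPy sketch:

import numpy as np

def bisection_line_search(grad, x, d, hi=1.0, tol=1e-6, max_iter=50):
    # Bisect on eta using the sign of phi'(eta) = grad f(x + eta*d) . d,
    # assuming phi(eta) = f(x + eta*d) is convex and d is a descent direction.
    lo = 0.0
    # Grow the interval until the minimizer of phi lies inside [lo, hi].
    while grad(x + hi * d) @ d < 0 and hi < 1e6:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if grad(x + mid * d) @ d > 0:
            hi = mid  # minimizer is to the left of mid
        else:
            lo = mid  # minimizer is to the right of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative quadratic (my own example): f(x) = 0.5 * x1^2 + 2 * x2^2.
grad = lambda x: np.array([x[0], 4.0 * x[1]])
x = np.array([4.0, -2.0])
d = -grad(x)
print(bisection_line_search(grad, x, d))  # roughly 0.29 for this example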