Gradient Descent

https://d2l.ai/chapter_optimization/gd.html

Hi, I would like to ask a question on the formula (11.3.12). Why do we calculate the gradient of the vector x instead of the function f(x)? Should it be "x ← x − η diag(H_f)^{-1} ∇f(x)" here? Thank you!

Great catch @nxby! Would you like to post a PR and be a contributor?

Thank you @goldpiggy! I’ve just made a pull request.
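
For anyone else reading this thread, here is a minimal NumPy sketch of the corrected update x ← x − η diag(H_f)^{-1} ∇f(x); the quadratic test function and step size are just illustrative choices on my part, not taken from the chapter.

import numpy as np

def grad_f(x):
    # Gradient of the illustrative quadratic f(x) = 0.5 * x1^2 + 2 * x2^2.
    return np.array([x[0], 4.0 * x[1]])

def hessian_diag_f(x):
    # Diagonal of the Hessian of the same quadratic (constant here).
    return np.array([1.0, 4.0])

def preconditioned_gd(x0, eta=0.5, num_steps=20):
    # Corrected update from (11.3.12): x <- x - eta * diag(H_f)^{-1} * grad f(x).
    x = np.asarray(x0, dtype=float)
    for _ in range(num_steps):
        x = x - eta * grad_f(x) / hessian_diag_f(x)
    return x

print(preconditioned_gd([3.0, -2.0]))  # both coordinates shrink toward the minimum at 0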

Hi, I wonder if there is a typo in the sentence just below formula (11.3.11): "Plugging in the update equations leads to the following bound e_{k+1} <= e^2_k f'''(\xi_k)/f'(x_k)". Shouldn't it be "e_{k+1} <= \frac{1}{2} e^2_k f'''(\xi_k)/f''(x_k)"? Thanks a lot for your attention.
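
My reasoning, in case it helps (using the section's notation with e_k = x_k - x^*, f'(x^*) = 0, and \xi_k between x^* and x_k):

0 = f'(x_k - e_k) = f'(x_k) - e_k f''(x_k) + \frac{1}{2} e_k^2 f'''(\xi_k),

so f'(x_k) = e_k f''(x_k) - \frac{1}{2} e_k^2 f'''(\xi_k). Plugging this into the Newton update x_{k+1} = x_k - f'(x_k)/f''(x_k) gives

e_{k+1} = e_k - f'(x_k)/f''(x_k) = \frac{1}{2} e_k^2 f'''(\xi_k)/f''(x_k),

which is where the factor \frac{1}{2} and the second derivative f''(x_k) in the denominator come from.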

Thanks. I revised this part recently and made it slightly different from the previous version. Just let me know if you spot any issue.

The Peano remainder R_n of the Taylor expansion had one extra power, which was wrong.
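
For reference, and assuming the issue was the exponent on the remainder term: expanding to order n, the Peano form is

f(x) = \sum_{k=0}^{n} \frac{f^{(k)}(x_0)}{k!} (x - x_0)^k + o((x - x_0)^n),

i.e. the remainder R_n is o((x - x_0)^n); writing it as o((x - x_0)^{n+1}) would carry one power too many.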

Re 12.3.3.4. Gradient Descent with Line Search
What is meant by binary search here? It is normally used for ordered sequences. Boyd and Vandenberghe [2004] do not mention binary search.
Exercise 2.1 is also not clear. Why do we need to pick half-intervals when the method requires selecting a learning rate (a single number)?
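
In case it helps to make this concrete, one plausible reading (my own interpretation, not something Boyd and Vandenberghe state) is bisection on the step size: keep an interval [lo, hi] of candidate learning rates and halve it each iteration using the sign of the directional derivative of eta -> f(x + eta * d), assuming that function is convex and d is a descent direction. The "half-intervals" would then be the successive halves of this interval, and the returned number is the learning rate. A rough NumPy sketch:

import numpy as np

def bisection_line_search(grad, x, d, hi=1.0, tol=1e-6, max_iter=50):
    # Bisect on eta using the sign of phi'(eta) = grad f(x + eta*d) . d,
    # assuming phi(eta) = f(x + eta*d) is convex and d is a descent direction.
    lo = 0.0
    # Grow the interval until the minimizer of phi lies inside [lo, hi].
    while grad(x + hi * d) @ d < 0 and hi < 1e6:
        hi *= 2.0
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if grad(x + mid * d) @ d > 0:
            hi = mid  # minimizer is to the left of mid
        else:
            lo = mid  # minimizer is to the right of mid
        if hi - lo < tol:
            break
    return 0.5 * (lo + hi)

# Illustrative quadratic (my own example): f(x) = 0.5 * x1^2 + 2 * x2^2.
grad = lambda x: np.array([x[0], 4.0 * x[1]])
x = np.array([4.0, -2.0])
d = -grad(x)
print(bisection_line_search(grad, x, d))  # roughly 0.29 for this example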