Derive the analytic solution to the optimization problem for linear regression with squared error. To keep things simple, you can omit the bias b from the problem (we can do this in a principled fashion by adding one column of all ones to X).
- Write out the optimization problem in matrix and vector notation (treat all the data as a single matrix, and all the target values as a single vector).
- Compute the gradient of the loss with respect to w.
- Find the analytic solution by setting the gradient equal to zero and solving the matrix equation.
- When might this be better than using stochastic gradient descent? When might this method break?
I have understood the first three parts of this question. Can anyone help me by explaining the fourth part: when might this be better than using stochastic gradient descent, and when might this method break?
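For anyone who wants to experiment with the fourth part, here is a minimal NumPy sketch, not a definitive answer. It assumes the closed form from the third part is the normal equation (X^T X) w = X^T y. The synthetic data, the helper name `fit_normal_equation`, and the values in `true_w` are made up for illustration. The idea it tries to show: when the number of features d is small and the data fits in memory, the closed form gives the exact minimizer in one step with no learning rate to tune (the cost is roughly O(n d^2 + d^3)). It breaks when X^T X is not invertible, for example with linearly dependent columns or fewer examples than features, and it gets expensive when d is very large.

```python
import numpy as np


def fit_normal_equation(X, y):
    """Solve (X^T X) w = X^T y for w (hypothetical helper for this sketch)."""
    return np.linalg.solve(X.T @ X, X.T @ y)


# Synthetic data: 200 examples, 3 features, small Gaussian noise.
rng = np.random.default_rng(0)
n, d = 200, 3
X = rng.normal(size=(n, d))
true_w = np.array([2.0, -3.4, 0.5])
y = X @ true_w + 0.01 * rng.normal(size=n)

# Well-posed case: the closed form recovers the weights directly.
w_hat = fit_normal_equation(X, y)
print("closed-form w:", w_hat)

# Failure case: duplicate a column so the columns are linearly dependent.
# X_bad^T X_bad is then singular (or numerically close to it), so the
# normal equation no longer has a unique solution.
X_bad = np.column_stack([X, X[:, 0]])
rank = np.linalg.matrix_rank(X_bad)
cond = np.linalg.cond(X_bad.T @ X_bad)
print("columns:", X_bad.shape[1], "rank:", rank, "condition number:", cond)

# A least-squares solver still returns one (minimum-norm) solution,
# which makes the same predictions as the well-posed fit.
w_min_norm, *_ = np.linalg.lstsq(X_bad, y, rcond=None)
print("minimum-norm w:", w_min_norm)
```

In the degenerate case the rank is lower than the number of columns and the condition number is enormous, which is the sense in which the method "breaks". Adding a small ridge term to X^T X, or falling back to SGD, are common workarounds.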