Linear Regression

I think one way to circumvent this is to use the squared loss when the error is small (near the stationary point) and the absolute error otherwise. This loss function is known as the Huber loss.
Info on Huber loss
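
To make the idea concrete, here is a minimal sketch of the Huber loss for a single residual. The function names and the threshold parameter `delta` (the point where the loss switches from quadratic to linear) are my own choices for illustration, not something taken from the chapter.

```python
def huber_loss(error, delta=1.0):
    """Huber loss for one residual `error` (prediction minus target).

    `delta` is an assumed threshold: quadratic below it, linear above it.
    """
    abs_error = abs(error)
    if abs_error <= delta:
        # Small error: behaves like the squared loss, smooth near zero.
        return 0.5 * error ** 2
    # Large error: grows like the absolute error, so outliers weigh less.
    return delta * (abs_error - 0.5 * delta)


def huber_grad(error, delta=1.0):
    """Derivative of the Huber loss with respect to `error`."""
    if abs(error) <= delta:
        return error
    # Gradient magnitude is capped at delta in the linear region.
    return delta if error > 0 else -delta
```

The two branches agree at `abs(error) == delta`, so the loss and its gradient are both continuous there, which is what makes it friendlier to gradient descent than the plain absolute error.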

Wanted to point out that this chapter is missing a fairly important point. Although linear regression is often used on linear data, it can also fit nonlinear data (polynomial data, for example), as long as the model stays linear in its weights. With gradient descent on mean squared error, the derivative of the MSE is then always linear with respect to the weights. Although this is often labeled “polynomial regression”, it is essentially linear regression under the hood.
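
A small sketch of what I mean, in plain Python so it runs anywhere. The helper names (`poly_features`, `fit_linear_gd`), the learning rate, the step count, and the toy data are all assumptions made up for this example. The only "polynomial" part is the feature map; the fitting routine is ordinary linear regression trained with gradient descent on MSE.

```python
def poly_features(x, degree):
    """Map a scalar x to the feature vector [1, x, x**2, ..., x**degree]."""
    return [x ** k for k in range(degree + 1)]


def fit_linear_gd(features, targets, lr=0.5, steps=2000):
    """Gradient descent on MSE for a model that is linear in its weights."""
    n, d = len(features), len(features[0])
    w = [0.0] * d
    for _ in range(steps):
        grad = [0.0] * d
        for phi, y in zip(features, targets):
            # Prediction is a dot product, so the residual is linear in w.
            residual = sum(wj * pj for wj, pj in zip(w, phi)) - y
            for j in range(d):
                grad[j] += 2.0 * residual * phi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w


# Toy quadratic data: y = 1 + 2x + 3x^2 on a grid in [-1, 1], no noise.
xs = [i / 10 for i in range(-10, 11)]
ys = [1.0 + 2.0 * x + 3.0 * x ** 2 for x in xs]
weights = fit_linear_gd([poly_features(x, 2) for x in xs], ys)
```

On this noise-free toy data the fitted `weights` should end up very close to the true coefficients 1, 2, and 3. Swapping in a different feature map changes what curves you can fit without touching `fit_linear_gd` at all.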

Quick note regarding incorrect usage of the dimension terminology: