Others (like error rate) are difficult to optimize directly, owing to non-differentiability or other complications. In these cases, it is common to optimize a surrogate objective
It's not quite clear to me from reading what exactly is meant by "error rate". I think it would be great if an example could be given.
Hi @manuel-arno-korfmann, "error rate" means "how many mistakes the model makes". Is that clearer?
I am still unable to understand error rate.
"How many mistakes the model makes" is still not clear enough. Do you mean the distance between the truth and the prediction, e.g., the L1 distance $|y - \hat{y}|$?
Also, can you please explain what a surrogate objective is?
I'm having a difficult time understanding this passage:
Hence, the loss $L$ incurred by eating the mushroom is $L(a=\text{eat} \mid x) = 0.2 \cdot \infty + 0.8 \cdot 0 = \infty$, whereas the cost of discarding it is $L(a=\text{discard} \mid x) = 0.2 \cdot 0 + 0.8 \cdot 1 = 0.8$.
Is it possible to explain it in more depth via 1 or 2 paragraphs?
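For anyone who wants the arithmetic spelled out, here is a minimal Python sketch of that computation, using the values assumed in the book's example: the classifier assigns probability 0.2 to the mushroom being a death cap, eating a death cap is fatal (infinite loss), eating a safe mushroom costs nothing, discarding a safe mushroom wastes a meal (loss 1), and discarding a death cap costs nothing.

```python
import math

# Classifier's belief that the mushroom is a death cap (from the example).
p_death_cap = 0.2

# Loss table: loss[action][true_state], using the values assumed above.
loss = {
    "eat":     {"death_cap": math.inf, "safe": 0.0},
    "discard": {"death_cap": 0.0,      "safe": 1.0},
}

# Expected loss of an action = sum over true states of P(state) * loss(action, state).
for action in ("eat", "discard"):
    expected = (p_death_cap * loss[action]["death_cap"]
                + (1 - p_death_cap) * loss[action]["safe"])
    print(action, expected)
# eat     -> 0.2 * inf + 0.8 * 0 = inf
# discard -> 0.2 * 0   + 0.8 * 1 = 0.8
```

The decision rule is simply to pick the action with the smaller expected loss, so we discard the mushroom.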
OK, so a person in the reading group explained that the error rate is the accumulated loss over all examples. Is that correct?
Hey @syedmech47, yes, you've got the idea: the error rate measures the distance between $y$ (the truth) and $\hat{y}$ (the estimate). However, the metrics that measure the error are not limited to the L1 distance; they can also be accuracy, precision, recall, F1, etc.
A surrogate is a function that approximates an objective function. Many evaluation metrics (like F1) are not differentiable, so we need some other function (i.e., a loss function) to approximate the objective function.
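To make this concrete, here is a small sketch (my own illustration, not from the book) contrasting the error rate, which is piecewise constant and therefore gives no gradient signal, with cross-entropy, a smooth surrogate that does:

```python
import numpy as np

y = np.array([1, 0, 1, 1])          # true labels
p = np.array([0.9, 0.4, 0.3, 0.6])  # model's predicted P(y=1)

# Error rate: fraction of hard predictions (threshold at 0.5) that are wrong.
# As a function of the model parameters it is piecewise constant, so its
# gradient is zero almost everywhere -- useless for gradient-based training.
error_rate = np.mean((p > 0.5).astype(int) != y)

# Cross-entropy: a smooth surrogate that decreases as p moves toward y,
# so it provides a useful gradient everywhere.
cross_entropy = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))

print(error_rate)     # 0.25 (one of four predictions is wrong)
print(cross_entropy)  # ~0.58
```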
Let me know if this is clear enough!
It can be the accumulated loss or the average loss. It doesn't make much of a difference for optimization.
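To spell out why: since the number of examples $n$ is a positive constant,

$$\operatorname*{argmin}_\theta \sum_{i=1}^n \ell(x_i, y_i; \theta) = \operatorname*{argmin}_\theta \frac{1}{n} \sum_{i=1}^n \ell(x_i, y_i; \theta),$$

so the sum and the average are minimized by the same parameters; only the effective scale of the gradient (and hence the learning rate) changes.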
Thanks a lot. It totally made sense.
Side note: I just want to thank each and every person for their effort in making this wonderful resource open for all, and also for providing such wonderful support through the discussion forums.
Fantastic! It's our pleasure to enable more people to learn, apply, and benefit from deep learning!
The last exercise mentions "the end-to-end learning approach", but the section never explains what "end-to-end learning" is.
Great call, @manuel-arno-korfmann. I suspect it refers to "Fig. 1.1.2: A typical training process".
Hi @goldpiggy, I'm reading this thread and I'd like to clarify further. My understanding is:
- there is a difference between the error rate and the cost/loss function
- the error rate is the number of errors in predictions for a given set of inputs X. Perhaps it's the total number of errors divided by the total number of examples in the input data?
- the loss function is a quantification of the "distance" between right and wrong predictions, which is different from the error rate, i.e., the percentage of errors as described in the second point above?
Thank you for these resources and your guidance.
My understanding is as follows -
- Loss function and cost function (in this context) mean the same thing
- Loss functions are a family of functions that could be relevant for a problem. For classification problems, the error rate is one such loss function, but it isn't differentiable, so cross-entropy is used as a surrogate loss function
Ashok
Tailors have developed a small number of parameters that describe human body shape fairly accurately for the purpose of fitting clothes. These problems are referred to as subspace estimation problems. If the dependence is linear, it is called principal component analysis.
In the above sentence, what is the word "dependence" referring to? Dependence between what and what?
Ashok
Yes.
Yes, we also refer to it as the average error rate.
Sometimes the two can be the same, if the error rate function is differentiable. But most of the time it is not. For example, most classification error rate functions are not differentiable, so we use a surrogate loss function. Check more details here.
Hi @meetashok, great question! The dependence between the principal components and the original data. PCA transforms the data to a new coordinate system (ordered by the principal components) via an orthogonal linear transformation. It tries to capture the most important features of the data and recreate the data from them.
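If it helps, here is a minimal numpy sketch (my own illustration, not the book's code) of that idea: generate measurements that depend linearly on a couple of hidden "body shape" parameters, then recover a low-dimensional subspace with PCA via the SVD.

```python
import numpy as np

rng = np.random.default_rng(0)

# 200 "people" described by 2 hidden parameters (e.g., height and girth) ...
latent = rng.normal(size=(200, 2))
# ... on which 10 observed measurements depend linearly (plus a little noise).
mixing = rng.normal(size=(2, 10))
X = latent @ mixing + 0.05 * rng.normal(size=(200, 10))

# PCA: center the data, then take the top right-singular vectors as the
# orthogonal directions that capture the most variance.
Xc = X - X.mean(axis=0)
U, S, Vt = np.linalg.svd(Xc, full_matrices=False)

variance = S**2 / (len(X) - 1)
print(variance / variance.sum())  # the first 2 components explain almost all variance

# Projecting onto the first 2 components gives a compact description of each
# person, just like a tailor's small set of measurements.
codes = Xc @ Vt[:2].T             # shape (200, 2)
```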
The text mentions "from first principles" a few times. What does this really mean here?
"As we will see later, this loss corresponds to the assumption that our data were corrupted by Gaussian noise ". This sentence appears in the Regression subsection. However, to my knowledge, the Gaussian noise assumption is not necessary for the least-square method for linear regression. The Gaussian assumption is very useful for statistical inference but not necessary for parameter estimation.
First-principles thinking is the act of boiling a process down to the fundamental parts that you know are true and building up from there.
First Principles: Elon Musk on the Power of Thinking for Yourself (jamesclear.com)