Introduction

mli · May 22, 2020, 1:11am

http://d2l.ai/chapter_introduction/index.html

manuel-arno-korfmann · June 9, 2020, 3:23pm

Others (like error rate) are difficult to optimize directly, owing to non-differentiability or other complications. In these cases, it is common to optimize a surrogate objective

It’s not quite clear to me from reading what exactly is meant by “error rate”. I think it would be great if an example could be given.

goldpiggy · June 9, 2020, 4:54pm

Hi @manuel-arno-korfmann, “error rate” means “how much mistake the model makes”. Is that more clear?

syedmech47 · June 9, 2020, 5:23pm

I am still unable to understand the error rate.
“How much mistake to model makes” is not clear enough. Did you mean how much mistake ‘the’ model makes, i.e., the L1 distance $|y - \hat{y}|$?
Also, can you please explain what a surrogate objective is?

manuel-arno-korfmann · June 9, 2020, 5:55pm

I’m having a difficult time understanding this:

Hence, the loss $L$ incurred by eating the mushroom is $L(a=\text{eat}\mid x) = 0.2 \cdot \infty + 0.8 \cdot 0 = \infty$, whereas the cost of discarding it is $L(a=\text{discard}\mid x) = 0.2 \cdot 0 + 0.8 \cdot 1 = 0.8$.

Is it possible to explain it in more depth, in one or two paragraphs?
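A minimal sketch of the arithmetic in the quoted passage, assuming the setup the numbers imply: the mushroom is poisonous with probability 0.2 and safe with probability 0.8, eating a poisonous mushroom has infinite loss, and discarding a safe one costs 1. The expected loss of an action is each possible truth's probability times the loss of taking that action under that truth, summed. The dictionary layout and the name `expected_loss` are illustrative only.

```python
import math

# Assumed from the quoted formulas: P(poisonous | x) = 0.2, P(safe | x) = 0.8
probs = {"poisonous": 0.2, "safe": 0.8}

# Loss of each (action, true state) pair, read off the quoted terms:
# eating a poisonous mushroom is catastrophic, discarding a safe one costs 1,
# and the other two outcomes cost nothing.
loss = {
    ("eat", "poisonous"): math.inf,
    ("eat", "safe"): 0.0,
    ("discard", "poisonous"): 0.0,
    ("discard", "safe"): 1.0,
}

def expected_loss(action):
    """Probability-weighted sum of the action's loss over the possible true states."""
    return sum(p * loss[(action, state)] for state, p in probs.items())

risks = {a: expected_loss(a) for a in ("eat", "discard")}
best = min(risks, key=risks.get)  # action with the smallest expected loss
print(risks)  # {'eat': inf, 'discard': 0.8}
print(best)   # discard
```

Because a 20% chance of an infinite loss makes the expected loss of eating infinite, discarding wins even though the mushroom is most likely safe.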

manuel-arno-korfmann · June 9, 2020, 5:59pm

OK, so a person in the reading group explained that the error rate is the accumulated loss over all examples. Is that correct?

goldpiggy · June 10, 2020, 4:23am

Hey @syedmech47, sorry for the typo. Yes, you got the idea: the error rate measures the distance between $y$ (the truth) and $\hat{y}$ (the estimate). However, the metrics that measure the error are not limited to the L1 distance; they can also be accuracy, precision, recall, F1, etc.

A surrogate is a function that approximates an objective function. Many evaluation metrics (like F1) are not differentiable, so we need another function (i.e., the loss function) that approximates the objective function and can be optimized directly.
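A minimal sketch of why the error rate is hard to optimize directly, using made-up logits for a hypothetical two-class classifier. Nudging the scores slightly leaves the error rate unchanged (it is piecewise constant, so its gradient is zero almost everywhere and undefined where a prediction flips), while the cross-entropy loss, a common surrogate, responds smoothly. The helper names are illustrative.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract the row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def error_rate(logits, y):
    # Fraction of examples whose highest-scoring class differs from the label
    return np.mean(logits.argmax(axis=1) != y)

def cross_entropy(logits, y):
    # Average negative log-probability assigned to the correct class
    p = softmax(logits)
    return -np.mean(np.log(p[np.arange(len(y)), y]))

# Hypothetical toy data: 3 examples, 2 classes
logits = np.array([[2.0, 1.0], [0.5, 1.5], [1.0, 0.2]])
y = np.array([0, 1, 1])

bumped = logits.copy()
bumped[:, 1] += 1e-3  # nudge every class-1 score slightly

print(error_rate(logits, y), error_rate(bumped, y))        # identical
print(cross_entropy(logits, y), cross_entropy(bumped, y))  # slightly different
```

A gradient-based optimizer gets no signal from the first pair of numbers, but it can follow the change in the second.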

Let me know if this is clear enough!

goldpiggy · June 10, 2020, 4:27am

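It can be the accumulated loss or the average loss. For optimization it doesn’t make much difference here: the two differ only by a constant factor (the number of examples), so they have the same minimizer.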
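A small check of that claim, assuming a hypothetical task of fitting a single constant prediction `c` to four targets under squared loss and searching a grid of candidates. The total loss is exactly the number of examples times the average loss, so both pick the same `c` (here the mean of the targets, 3.5).

```python
import numpy as np

# Hypothetical targets; we fit one constant prediction c to all of them
y = np.array([1.0, 2.0, 4.0, 7.0])
candidates = np.linspace(0.0, 8.0, 801)  # grid of candidate values for c

# Squared loss of every candidate on every example, shape (801, 4)
losses = (candidates[:, None] - y[None, :]) ** 2
total = losses.sum(axis=1)     # accumulated loss per candidate
average = losses.mean(axis=1)  # average loss per candidate

best_total = candidates[total.argmin()]
best_average = candidates[average.argmin()]
print(best_total, best_average)  # both approximately 3.5
```

One practical difference: the gradient of the total loss is also scaled by the number of examples, so switching between the two effectively rescales the learning rate.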

syedmech47 · June 10, 2020, 7:21am

Thanks a lot. It totally made sense.

Side note: I just want to thank everyone for their effort in making this wonderful resource open to all, and for providing such great support through the discussion forums.

goldpiggy · June 10, 2020, 4:21pm

Fantastic! It’s our pleasure to help more people learn, apply, and benefit from deep learning!

manuel-arno-korfmann · August 1, 2020, 6:12pm

The last exercise mentions “the end-to-end learning approach”, but the section never explains what “end-to-end learning” is.

goldpiggy · August 3, 2020, 10:16pm

Great call @manuel-arno-korfmann. I suspect it refers to “Fig. 1.1.2 A typical training process”.

zeuslawyer · August 16, 2020, 1:33am

Hi @goldpiggy, I’m reading this thread and would like to clarify further. My understanding is:

1. There is a difference between the error rate and the cost/loss function.
2. The error rate is the number of errors in the predictions for a given set of inputs X. Perhaps it’s the total number of errors divided by the total number of examples in the input data?
3. The loss function quantifies the “distance” between the predictions and the correct answers, which is different from the error rate, i.e., the percentage of errors described in 2 above?
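To make the distinction concrete, here is a minimal sketch on a hypothetical binary task, using binary cross-entropy as the example loss (the thread mentions cross-entropy further down). The error rate only counts hard mistakes, while the loss also reflects how confident each prediction was. The labels, probabilities, and 0.5 threshold are all assumptions for illustration.

```python
import numpy as np

# Hypothetical binary task: true labels and predicted probabilities of class 1
y_true = np.array([1, 0, 1, 1, 0])
p_hat = np.array([0.9, 0.4, 0.6, 0.3, 0.1])

# Error rate: fraction of hard predictions (threshold 0.5) that are wrong
y_pred = (p_hat >= 0.5).astype(int)
error_rate = np.mean(y_pred != y_true)

# Binary cross-entropy: average penalty that grows as the probability
# assigned to the correct label shrinks
bce = -np.mean(y_true * np.log(p_hat) + (1 - y_true) * np.log(1 - p_hat))

print(error_rate)  # 0.2 (one of five examples is misclassified)
print(bce)
```

Changing `p_hat[0]` from 0.9 to 0.8 would raise the cross-entropy while leaving the error rate at 0.2.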

Thank you for these resources and your guidance.

meetashok · August 23, 2020, 3:28pm

@zeuslawyer

My understanding is as follows:

Loss functions and cost functions (in this context) mean the same thing.
Loss functions are a family of functions that could be relevant for a problem. For classification problems, the error rate is one such loss function. But since it isn’t differentiable, cross-entropy is used as a surrogate loss function.

Ashok

meetashok · August 23, 2020, 4:04pm

Tailors have developed a small number of parameters that describe human body shape fairly accurately for the purpose of fitting clothes. These problems are referred to as subspace estimation problems. If the dependence is linear, it is called principal component analysis.

In the above sentence, what does the word “dependence” refer to: dependence between what and what?

Ashok

goldpiggy · August 24, 2020, 5:06am

Yes.

Yes, we also refer to it as the average error rate.

Sometimes these two can be the same, if the error rate function is differentiable. Most of the time, however, it is not. For example, the classification error rate is generally not differentiable, so we optimize a loss function instead. Check more details here.

goldpiggy · August 24, 2020, 5:14am

Hi @meetashok, great question! It refers to the dependence between the principal components and the original data. PCA uses an orthogonal linear transformation to map the data into a new coordinate system whose axes are the principal components, ordered by how much of the data’s variance they capture. In this way it tries to capture the structure of the data in a small number of new features.
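A minimal PCA sketch in the spirit of the tailor example, computed via the SVD of the centered data. The "body measurements" are synthetic: 200 hypothetical people with 5 measurements each, generated from 2 hidden factors plus a little noise, so two principal components should explain almost all of the variance and a linear reconstruction from them should be close to the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (assumption): 5 measurements linearly driven by 2 hidden factors
factors = rng.normal(size=(200, 2))
mixing = rng.normal(size=(2, 5))
X = factors @ mixing + 0.05 * rng.normal(size=(200, 5))

# PCA via SVD of the centered data
mean = X.mean(axis=0)
X_centered = X - mean
U, S, Vt = np.linalg.svd(X_centered, full_matrices=False)

explained = S**2 / np.sum(S**2)  # fraction of variance captured by each component
k = 2
components = Vt[:k]                 # rows are orthonormal principal directions
scores = X_centered @ components.T  # each person described by k numbers
X_approx = scores @ components + mean  # linear reconstruction from k components

print(np.round(explained, 3))
print(np.abs(X - X_approx).max())  # small: only the noise is lost
```

The "linear dependence" here is that each original measurement is approximated by a fixed linear combination of the `k` new coordinates.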

rzwck · February 7, 2021, 1:40am

This text mentions “from first principles” a few times. What does that really mean here?

jioyoung · April 27, 2021, 4:00am

“As we will see later, this loss corresponds to the assumption that our data were corrupted by Gaussian noise.” This sentence appears in the Regression subsection. However, to my knowledge, the Gaussian noise assumption is not necessary for the least-squares method for linear regression. The Gaussian assumption is very useful for statistical inference but not necessary for parameter estimation.
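A quick illustration of this point, as a sketch under assumed synthetic data: ordinary least squares is computed the same way whatever the noise distribution, and with zero-mean uniform (non-Gaussian) noise it still recovers the true parameters closely. The Gaussian assumption is what makes least squares the maximum-likelihood estimate, which is presumably the connection the quoted sentence refers to. The parameter values below are made up.

```python
import numpy as np

rng = np.random.default_rng(42)

# Synthetic linear model y = X w + b + noise, with uniform (non-Gaussian) noise
n = 5000
true_w = np.array([2.0, -3.4])
true_b = 4.2
X = rng.normal(size=(n, 2))
noise = rng.uniform(-1.0, 1.0, size=n)  # zero-mean, but not Gaussian
y = X @ true_w + true_b + noise

# Ordinary least squares: append a column of ones for the bias, then solve
X1 = np.column_stack([X, np.ones(n)])
coef, *_ = np.linalg.lstsq(X1, y, rcond=None)

print(coef)  # close to [2.0, -3.4, 4.2]
```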

bravi · May 17, 2021, 9:36pm

First-principles thinking is the act of boiling a process down to the fundamental parts that you know are true and building up from there.

First Principles: Elon Musk on the Power of Thinking for Yourself (jamesclear.com)