Softmax Regression

sheey · July 10, 2020, 2:01am

In exercise 1:

Compute the second derivative of the cross-entropy loss 𝑙(𝐲,𝐲̂) for the softmax.

Does that mean

?
And where can I find the answers to the exercises?

goldpiggy · July 10, 2020, 11:47pm

Hi @sheey, the second derivative will be:

$\frac{\partial^2 l(\mathbf{y}, \hat{\mathbf{y}})}{\partial o_j^2} = … = \mathrm{softmax}(\mathbf{o})_j \cdot (1- \mathrm{softmax}(\mathbf{o})_j)$

i.e.,

Sorry, we currently don’t provide solutions, but feel free to ask questions on the discussion forum.
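For anyone who wants to check the result numerically, here is a minimal sketch in plain Python. It assumes $\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{o})$ and a one-hot label; the helper names (`cross_entropy`, `second_derivative_fd`) and the sample values are made up for illustration and are not from the book. It compares a central finite-difference estimate of the second derivative with $\mathrm{softmax}(\mathbf{o})_j (1 - \mathrm{softmax}(\mathbf{o})_j)$.

```python
import math

def softmax(o):
    """Numerically stable softmax of a list of logits."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(y, o):
    """l(y, y_hat) = -sum_j y_j * log(softmax(o)_j), via log-sum-exp."""
    m = max(o)
    log_sum_exp = m + math.log(sum(math.exp(v - m) for v in o))
    return sum(yj * (log_sum_exp - oj) for yj, oj in zip(y, o))

def second_derivative_fd(y, o, j, h=1e-4):
    """Central finite-difference estimate of d^2 l / d o_j^2."""
    up = list(o)
    up[j] += h
    down = list(o)
    down[j] -= h
    return (cross_entropy(y, up) - 2 * cross_entropy(y, o)
            + cross_entropy(y, down)) / h ** 2

# Illustrative logits and one-hot label (assumed values).
o = [1.0, -0.5, 2.0]
y = [0.0, 0.0, 1.0]
p = softmax(o)
for j in range(len(o)):
    analytic = p[j] * (1 - p[j])
    numeric = second_derivative_fd(y, o, j)
    print(j, analytic, numeric)
```

The two columns should agree to several decimal places, and the label `y` does not affect the second derivative at all.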

StevenJokes · July 11, 2020, 2:17am

Should j be inside the bracket?

or
When o is a vector, should j be outside?
When it is o_j, should j be inside?

StevenJokes · July 11, 2020, 2:48am

Do we only need to calculate the derivative of softmax(o)_j (is j inside or outside?) to get the second derivative of the cross-entropy loss 𝑙(𝐲,𝐲̂)?
I noticed that the second derivative of the cross-entropy loss 𝑙(𝐲,𝐲̂) is exactly the derivative of softmax(o)_j.
Is the derivative of y_j 0? Is y_j 1 or 0? Does j represent the label?

goldpiggy · July 12, 2020, 4:33pm

Hi @StevenJokes, good question! Actually, $j$ should be outside, since we first calculate the softmax of the vector $\mathbf{o}$, then take its $j$-th component.
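A small sketch of the difference, using plain Python lists (the values of `o` and `j` are arbitrary and only for illustration):

```python
import math

def softmax(o):
    """Numerically stable softmax of a list of logits."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

o = [1.0, 2.0, 3.0]
j = 1

# softmax(o)_j: normalize over the whole vector first, then pick component j.
outside = softmax(o)[j]

# "softmax(o_j)": a single number has nothing to be normalized against,
# so the result is always 1 regardless of the value of o_j.
inside = softmax([o[j]])[0]

print(outside, inside)
```

With the index inside, the output would carry no information about the other classes, which is why the index goes outside.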

goldpiggy · July 12, 2020, 4:38pm

Yes. I don’t fully understand your question though.

StevenJokes · July 13, 2020, 9:45am

I got it. Thanks @goldpiggy

StevenJokes · July 13, 2020, 9:49am

When we take the derivative of $y_j$ to be 0, does it mean that we think $y_j$ has no relationship with $o_j$?

goldpiggy · July 13, 2020, 8:47pm

Hi @StevenJokes, $y_j$ is the real label while $o_j$ is the model's output (the logit), i.e. $y_j$ is not a function of $o_j$.

StevenJokes · July 14, 2020, 4:13am

How do we judge whether $a$ is a function of $b$ or not?
Or do we decide it is not a function simply because we haven’t defined it as one, regardless of whether $a$ is related to $b$ in reality?

StevenJokes · July 14, 2020, 4:16am

I think it is a function in reality, but Newton’s calculus can’t compute its derivative because the function is discrete.

StevenJokes · July 14, 2020, 4:21am

Please take a look at https://github.com/d2l-ai/d2l-en/issues/1141 soon; I think it may make all the evaluation results wrong.

goldpiggy · July 14, 2020, 4:23pm

How do we judge whether $a$ is a function of $b$ or not?
Or do we decide it is not a function simply because we haven’t defined it as one, regardless of whether $a$ is related to $b$ in reality?

Hi @StevenJokes, $y$ is the true label, while $\hat{y}$ is the estimated label. Hence $\hat{y}$ is a function of $o$, while $y$ is not.
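To make this concrete, here is a sketch of the derivative with $y_j$ held fixed. It assumes the usual setup, $\hat{\mathbf{y}} = \mathrm{softmax}(\mathbf{o})$ and a label vector with $\sum_k y_k = 1$. For a given training example the label is part of the data, so it stays the same no matter what logits the model produces:

$$l(\mathbf{y}, \hat{\mathbf{y}}) = -\sum_k y_k \log \hat{y}_k = \log \sum_k \exp(o_k) - \sum_k y_k o_k$$

$$\frac{\partial l}{\partial o_j} = \frac{\exp(o_j)}{\sum_k \exp(o_k)} - y_j = \mathrm{softmax}(\mathbf{o})_j - y_j$$

$$\frac{\partial^2 l}{\partial o_j^2} = \frac{\partial\, \mathrm{softmax}(\mathbf{o})_j}{\partial o_j} = \mathrm{softmax}(\mathbf{o})_j \left(1 - \mathrm{softmax}(\mathbf{o})_j\right)$$

The label enters only as a constant, so it drops out of the second derivative.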

StevenJokes · July 14, 2020, 6:04pm

I already understand that $y_j$ is the true label, e.g. one-hot. But where our thinking diverges is that I think the true label has a certain relationship with $o_j$, so I think $y_j$ is also a function of $o_j$. For the same $o_j$, we get exactly one $y_j$. Doesn’t that satisfy the definition of a function?
But the function is discrete.

StevenJokes · July 20, 2020, 12:40pm

As we saw in “Freshness and Distribution Shift”, if production data is different from the data a model was trained on, a model may struggle to perform. To help with this, you should check the inputs to your pipeline.

In 10. Build Safeguards for Models - Building Machine Learning Powered Applications [Book]

Harvinder_singh · July 29, 2020, 4:31am

Although softmax is a nonlinear function, the outputs of softmax regression are still determined by an affine transformation of input features; thus, softmax regression is a linear model.

Can anyone explain why this is so? When we say that a model is a linear model, we mean that it is linear in its parameters. But in softmax regression we apply the softmax function, which is nonlinear, so the model becomes nonlinear in its parameters.

StevenJokes · July 29, 2020, 9:52am

That’s just statistical terminology!
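One way to read the quoted sentence, sketched below with made-up parameters (`W`, `b`, and the inputs are hypothetical, not from the book): the softmax preserves the ordering of the logits, so the predicted class is the argmax of the affine logits $\mathbf{o} = \mathbf{W}\mathbf{x} + \mathbf{b}$, and the log-ratio of two class probabilities equals $o_i - o_j$, which is affine in $\mathbf{x}$. The probabilities themselves are nonlinear, but the decision boundaries between classes are linear.

```python
import math

def softmax(o):
    """Numerically stable softmax of a list of logits."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, b, x):
    """Affine map o = W x + b, with W given as a list of rows."""
    return [sum(w * xi for w, xi in zip(row, x)) + bi for row, bi in zip(W, b)]

def argmax(v):
    """Index of the largest entry of v."""
    return max(range(len(v)), key=lambda i: v[i])

# Hypothetical parameters for 3 classes and 2 features.
W = [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.5]]
b = [0.1, -0.3, 0.2]

for x in ([0.2, 1.0], [3.0, -1.0], [-2.0, 0.5]):
    o = logits(W, b, x)
    p = softmax(o)
    # The predicted class is the same with or without the softmax.
    assert argmax(o) == argmax(p)
    # Log-odds between classes 0 and 1 equal the difference of affine logits.
    log_odds = math.log(p[0] / p[1])
    print(x, argmax(p), log_odds, o[0] - o[1])
```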

ccpvirus · August 11, 2020, 10:37pm

exercise 1.1

Apparently I copied the answer above. 1.2: the variance is a vector whose j-th element has exactly the form above, namely softmax(o)_j(1-softmax(o)_j).
Exercise 3.3: use the squeeze theorem and it’s easy to prove.
3.4: softmin could be softmax(-x)? I don’t know.
3.5: pass (too hard to type on the computer).
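For the exercise 3 items, a numerical sketch may help. It assumes the exercise defines $\mathrm{RealSoftMax}(a, b) = \log(\exp(a) + \exp(b))$ and asks about $\lambda^{-1} \mathrm{RealSoftMax}(\lambda a, \lambda b)$; the function names below are made up for illustration, and the soft-min shown is only one natural candidate. The squeeze argument rests on the bounds $\max(a, b) < \lambda^{-1} \log(e^{\lambda a} + e^{\lambda b}) \le \max(a, b) + \log(2) / \lambda$, where the right-hand side tends to $\max(a, b)$ as $\lambda \to \infty$.

```python
import math

def scaled_real_softmax(a, b, lam=1.0):
    """(1/lam) * log(exp(lam*a) + exp(lam*b)), computed stably by factoring out the max."""
    m = max(a, b)
    return m + math.log(math.exp(lam * (a - m)) + math.exp(lam * (b - m))) / lam

def scaled_soft_min(a, b, lam=1.0):
    """One candidate soft-min: negate both the inputs and the output of the soft-max."""
    return -scaled_real_softmax(-a, -b, lam)

a, b = 1.0, 3.0
for lam in (1.0, 10.0, 100.0):
    value = scaled_real_softmax(a, b, lam)
    # Squeeze bounds; "<=" on the left because floating point can round
    # the tiny positive gap down to exactly zero for large lam.
    assert max(a, b) <= value <= max(a, b) + math.log(2) / lam
    print(lam, value, scaled_soft_min(a, b, lam))
```

As `lam` grows, the first value approaches `max(a, b)` from above and the second approaches `min(a, b)` from below.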

Gavin · August 21, 2020, 8:11am

In formula 3.4.7,

I couldn’t understand why the last two expressions are equal. Could someone explain a bit more? Thanks.

StevenJokes · August 21, 2020, 8:24am

@goldpiggy,
I can’t understand it either.