Softmax Regression

Harvinder_singh · July 29, 2020, 4:31am

Although softmax is a nonlinear function, the outputs of softmax regression are still determined by an affine transformation of input features; thus, softmax regression is a linear model.

Can anyone explain why this is so? When we say that a model is a linear model, we mean that it is linear in the parameters. But in softmax regression we apply the softmax function, which is nonlinear, so the model's output becomes nonlinear in the parameters.
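One way to read the quoted sentence, shown as a minimal plain-Python sketch: because p_j / p_k = exp(o_j − o_k), softmax never changes the ordering of the logits o = Wx + b. The predicted class (the argmax) is therefore decided entirely by the affine map, and the boundary between classes j and k is the hyperplane (w_j − w_k)·x + (b_j − b_k) = 0. Under this reading, "linear model" refers to linear decision boundaries in x, not to the probabilities being linear in the parameters. The weights W, bias b, and helper names below are hypothetical, chosen only for illustration.

```python
import math

def softmax(o):
    """Numerically stable softmax of a list of logits."""
    m = max(o)
    exps = [math.exp(v - m) for v in o]
    total = sum(exps)
    return [e / total for e in exps]

def logits(W, b, x):
    """Affine map o = W x + b, with W given as one row of weights per class."""
    return [sum(w * xi for w, xi in zip(row, x)) + bj for row, bj in zip(W, b)]

def argmax(v):
    return max(range(len(v)), key=lambda j: v[j])

# Hypothetical parameters: 3 classes, 2 input features.
W = [[1.0, -2.0], [0.5, 0.5], [-1.0, 1.5]]
b = [0.0, 0.2, -0.1]

for x in [[1.0, 0.0], [0.0, 1.0], [-2.0, 3.0], [0.3, -0.7]]:
    o = logits(W, b, x)
    p = softmax(o)
    # p_j / p_k = exp(o_j - o_k), so softmax keeps the ordering of the logits
    # and the most probable class is always the one with the largest logit.
    assert argmax(p) == argmax(o)
    print(x, [round(v, 3) for v in p], "-> class", argmax(p))
```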

StevenJokes · July 29, 2020, 9:52am

It's just a statistical way of speaking!

ccpvirus · August 11, 2020, 10:37pm

Exercise 1.1

Apparently I copied the answer above.
1.2: the variance is a vector, and its j-th element has exactly the form above, namely softmax(o)_j(1 - softmax(o)_j).
Exercise 3.3: use the squeeze theorem and it is easy to prove.
3.4: could softmin be softmax(-x)? I don't know.
3.5: pass (too hard to type on the computer).
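A quick numerical check of exercises 3.3 to 3.5, assuming the exercise's "softmax" refers to the log-sum-exp RealSoftMax(a, b) = log(exp(a) + exp(b)). For λ > 0 and n numbers, the squeeze argument rests on max(x) ≤ λ^{-1} log Σ_i exp(λ x_i) ≤ max(x) + log(n)/λ, so the middle term tends to max(x) as λ → ∞. The soft-min below is taken to be −RealSoftMax(−x), which is one natural choice and an assumption here, not necessarily the intended answer; the helper names are hypothetical.

```python
import math

def real_softmax(xs, lam=1.0):
    """(1/lam) * log(sum_i exp(lam * x_i)), stabilized by factoring out the max."""
    m = max(lam * x for x in xs)
    return (m + math.log(sum(math.exp(lam * x - m) for x in xs))) / lam

def soft_min(xs, lam=1.0):
    """Assumed soft-min: the mirror image -real_softmax(-x)."""
    return -real_softmax([-x for x in xs], lam)

xs = [1.0, 3.0, 2.5]
n = len(xs)
for lam in [1.0, 10.0, 100.0]:
    s = real_softmax(xs, lam)
    # Squeeze bounds: max(xs) <= s <= max(xs) + log(n) / lam, so s -> max(xs).
    assert max(xs) <= s <= max(xs) + math.log(n) / lam
    print(lam, s, soft_min(xs, lam))
```

As λ grows, the printed soft-max values approach 3.0 (the maximum) and the soft-min values approach 1.0 (the minimum).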

Gavin · August 21, 2020, 8:11am

In formula 3.4.7,

I couldn’t understand why the latter two equations are equal. Could someone explain this in a bit more detail? Thanks.
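A small numeric sketch of why these equalities hold, assuming independent examples, ŷ_j^(i) = P(y = j | x^(i)), and one-hot labels y^(i). The log turns the product of per-example probabilities into a sum, and the one-hot y^(i) zeroes every term of −Σ_j y_j^(i) log ŷ_j^(i) except the true class, leaving −log P(y^(i) | x^(i)). The probabilities and labels below are made up for illustration.

```python
import math

# Made-up predictions y_hat^(i) for n = 3 examples and q = 3 classes,
# with the true labels stored as class indices.
y_hat = [[0.7, 0.2, 0.1],
         [0.1, 0.8, 0.1],
         [0.3, 0.3, 0.4]]
labels = [0, 1, 2]
q = 3

def one_hot(k, q):
    return [1.0 if j == k else 0.0 for j in range(q)]

def cross_entropy(y, y_hat_row):
    """l(y, y_hat) = -sum_j y_j * log(y_hat_j)."""
    return -sum(y_j * math.log(p_j) for y_j, p_j in zip(y, y_hat_row))

# -log P(Y | X): the likelihood is a product over (assumed independent) examples.
likelihood = 1.0
for row, k in zip(y_hat, labels):
    likelihood *= row[k]
nll = -math.log(likelihood)

# sum_i -log P(y^(i) | x^(i)): the log turns the product into a sum.
sum_neg_log = sum(-math.log(row[k]) for row, k in zip(y_hat, labels))

# sum_i l(y^(i), y_hat^(i)): one-hot labels keep only the true-class term.
sum_ce = sum(cross_entropy(one_hot(k, q), row) for row, k in zip(y_hat, labels))

print(nll, sum_neg_log, sum_ce)  # all three agree (about 1.496)
assert math.isclose(nll, sum_neg_log) and math.isclose(sum_neg_log, sum_ce)
```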

StevenJokes · August 21, 2020, 8:24am

@goldpiggy,
I can’t understand it either.

goldpiggy · August 21, 2020, 8:25pm

Hi @Gavin, great question. A simple answer is:

For more details, please check Section 22.7, Maximum Likelihood, in the Dive into Deep Learning 1.0.3 documentation.

StevenJokes · August 22, 2020, 3:12am

@goldpiggy
The simple answer seems to be a tautology.

I have read the URL you gave,
but I don’t think it answers this question.
I can’t find anything relevant in it.

Gavin · August 22, 2020, 3:40am

@goldpiggy Many thanks! Finally understood it!

StevenJokes · August 23, 2020, 5:49am

Really? @Gavin

What is it related to?
Could you explain it?

goldpiggy · August 24, 2020, 4:43am

It’s explained in 3.4.8 @StevenJokes

StevenJokes · August 24, 2020, 5:00am

@goldpiggy.
ok…
just log to

Abinash_Sahu · August 27, 2020, 5:05pm

Hello. I still don't clearly understand how these two equations are related. Could you please explain how, for a particular observation i, the probability of y given x is related to the entropy definition over all classes?

StevenJokes · August 28, 2020, 2:38am

The green-highlighted parts are the same.

Abinash_Sahu · August 28, 2020, 4:11am

Thank you for your response. More specifically, my question was why
is the same as
l(y, y_hat).

Is this because y, when one-hot encoded, has only a single position equal to 1, so that when we sum y * log(y_hat) over all classes, we are left only with the log of the predicted probability y_hat for the true class? Please advise.
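The reasoning in this post can be checked directly. Because y is one-hot, −Σ_j y_j log ŷ_j reduces to −log ŷ_k for the true class k; what survives is the log of the predicted probability rather than the probability itself. In code this means the one-hot vector never has to be built, since indexing into ŷ gives the same value. A minimal sketch, assuming strictly positive predicted probabilities; the function names and numbers are hypothetical.

```python
import math

def ce_dot(y_onehot, y_hat):
    # Full definition: -sum over all q classes of y_j * log(y_hat_j).
    return -sum(y_j * math.log(p_j) for y_j, p_j in zip(y_onehot, y_hat))

def ce_index(label, y_hat):
    # Shortcut: with a one-hot y, only the true-class term survives.
    return -math.log(y_hat[label])

y_hat = [0.25, 0.6, 0.15]  # hypothetical predicted probabilities, q = 3
for label in range(3):
    y = [1.0 if j == label else 0.0 for j in range(3)]
    assert math.isclose(ce_dot(y, y_hat), ce_index(label, y_hat))
```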

StevenJokes · August 28, 2020, 6:59am

@Abinash_Sahu
l(y, y_hat)

is the cross-entropy loss, which is just one of the loss functions we commonly use:
https://ml-cheatsheet.readthedocs.io/en/latest/loss_functions.html
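For reference, one common way to compute this loss when a model outputs logits o rather than probabilities: since log softmax(o)_y = o_y − log Σ_j exp(o_j), the loss can be written as log-sum-exp(o) − o_y and evaluated stably by subtracting max(o) first. This is a plain-Python sketch with hypothetical names, not the implementation of any particular library; deep learning frameworks commonly fuse softmax and cross-entropy for the same stability reason.

```python
import math

def log_sum_exp(o):
    """log(sum_j exp(o_j)), stabilized by factoring out max(o)."""
    m = max(o)
    return m + math.log(sum(math.exp(v - m) for v in o))

def cross_entropy_from_logits(o, label):
    """-log softmax(o)[label] = log_sum_exp(o) - o[label]."""
    return log_sum_exp(o) - o[label]

def naive_cross_entropy(o, label):
    """Softmax first, then log: fine for small logits, fragile for large ones."""
    exps = [math.exp(v) for v in o]
    return -math.log(exps[label] / sum(exps))

o = [2.0, 1.0, 0.1]
print(naive_cross_entropy(o, 0), cross_entropy_from_logits(o, 0))  # agree up to rounding

big = [1000.0, 0.0, -1000.0]
print(cross_entropy_from_logits(big, 1))  # 1000.0
# naive_cross_entropy(big, 1) would raise OverflowError: math.exp(1000.0) is out of range.
```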

JMianT · September 13, 2020, 12:40pm

Q1.2. Compute the variance of the distribution given by softmax(𝐨) and show that it matches the second derivative computed above.

Can someone point me in the right direction? I tried to use Var[X] = E[(X − E[X])^2] = E[X^2] − E[X]^2 to find the variance, but I ended up with a 1/q^2 term… it doesn’t look like the second derivative from Q1.1.

Thanks!
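A hint for Q1.2, as a sketch under one reading of the exercise: the variance in question is that of the one-hot label vector Y drawn from Categorical(p) with p = softmax(o), not that of the q numbers p_1, …, p_q treated as a sample (their mean is 1/q, which is one possible source of a 1/q^2 term). Since each Y_j is 0 or 1, E[Y_j] = E[Y_j^2] = p_j, so Var[Y_j] = p_j(1 − p_j); more generally Cov[Y] = diag(p) − p p^T, which matches the Hessian of the loss with respect to o. The code compares the exact covariance with a finite-difference Hessian; the logits and helper names are hypothetical.

```python
import math

def softmax(o):
    m = max(o)
    e = [math.exp(v - m) for v in o]
    s = sum(e)
    return [v / s for v in e]

def loss(o, label):
    """Cross-entropy of softmax(o) against a one-hot label: log-sum-exp(o) - o[label]."""
    m = max(o)
    return m + math.log(sum(math.exp(v - m) for v in o)) - o[label]

def hessian_fd(f, o, h=1e-4):
    """Central finite-difference Hessian of f at the point o."""
    q = len(o)

    def f_shift(i, di, j, dj):
        x = list(o)
        x[i] += di
        x[j] += dj
        return f(x)

    H = [[0.0] * q for _ in range(q)]
    for i in range(q):
        for j in range(q):
            H[i][j] = (f_shift(i, h, j, h) - f_shift(i, h, j, -h)
                       - f_shift(i, -h, j, h) + f_shift(i, -h, j, -h)) / (4 * h * h)
    return H

def onehot_covariance(p):
    """Exact covariance of Y = e_K with K ~ Categorical(p), summing over the q outcomes."""
    q = len(p)
    mean = list(p)  # E[Y] = p because Y_j = 1 exactly when K = j
    cov = [[0.0] * q for _ in range(q)]
    for k, pk in enumerate(p):
        y = [1.0 if j == k else 0.0 for j in range(q)]
        for i in range(q):
            for j in range(q):
                cov[i][j] += pk * (y[i] - mean[i]) * (y[j] - mean[j])
    return cov

o = [0.5, -1.0, 2.0]  # hypothetical logits
p = softmax(o)
H = hessian_fd(lambda x: loss(x, 0), o)
C = onehot_covariance(p)
for i in range(3):
    # Diagonal: Var[Y_i] = p_i (1 - p_i), the second derivative from Q1.1.
    assert abs(C[i][i] - p[i] * (1 - p[i])) < 1e-12
    for j in range(3):
        assert abs(H[i][j] - C[i][j]) < 1e-5
```

The label passed to loss does not matter here: the −o[label] term is linear in o, so it drops out of the second derivatives.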

Premkumar_Devanbu · September 15, 2020, 11:33pm

It appears that an unresolved reference remains in 3.4.5.3:
:eqref: eq_l_cross_entropy

astonzhang · September 18, 2020, 2:39am

Thanks. Now it’s fixed. See comments in https://github.com/d2l-ai/d2l-en/issues/1448

JH.Lam · December 21, 2020, 8:24am

To keep the notation consistent, shouldn’t y_j in the latter two equations carry a superscript (i)?

JH.Lam · December 21, 2020, 8:28am

Me too. Is there something wrong?