Information Theory

astonzhang · June 29, 2020, 11:06pm

https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/information-theory.html

neoaurion · December 23, 2020, 12:01pm

Hi. Maximum likelihood searches for the theta that maximizes the probability of the observations given the parameters, P(X|theta). This is not what is written in your book, but it is fine in this appendix. However, when it comes to cross-entropy, both the book and the appendix maximize P(y|x), which is not the maximum likelihood definition, and this is very disturbing. Can anyone please explain why this should be OK? Thanks

neoaurion · January 10, 2021, 2:43pm

Hi, could anyone please answer this? It won't change the result, but it would greatly improve understanding. Thanks

goldpiggy · January 11, 2021, 10:38pm

Hi @neoaurion, sorry for the delayed reply. May I ask where you are quoting this sentence from? (Maximum likelihood is searching for the parameter (i.e. theta) given the current observation data (i.e. X).)

neoaurion · March 5, 2021, 7:12am

Hi @goldpiggy,
This is what I learned from my classical signal processing background, but you can also find it here: https://www.analyticsvidhya.com/blog/2018/07/introductory-guide-maximum-likelihood-estimation-case-study-r/ . There, f(X|theta) is the probability of X (the data) given theta, the parameters.

goldpiggy · March 5, 2021, 5:59pm

Hi @neoaurion, I see your question now. Fundamentally, we still want to maximize l(\theta), i.e., find the optimal \theta, as we state in the Cross-Entropy section. The only difference for cross-entropy is that we have two sets of data, X and Y, which come from two different distributions. That now looks like maximizing P(X, Y | \theta). Does that make sense?

neoaurion · March 5, 2021, 9:25pm

Yes, it is right with P(X, Y | \theta). I think the result on the cross-entropy is right, but the explanation is not.
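
One way to reconcile the two views, offered as a sketch and not as the book's own argument: if the distribution of the inputs does not depend on \theta, then log P(X, Y | \theta) = log P(Y | X, \theta) + log P(X), so maximizing the joint likelihood and maximizing P(Y | X, \theta) pick the same \theta. The toy check below assumes a linear softmax classifier; the names `W1`, `W2`, `log_px` and the random data are illustrative only and do not come from the book.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data (illustrative assumption): 5 examples, 2 features, 3 classes.
X = rng.normal(size=(5, 2))
y = rng.integers(0, 3, size=5)

def log_softmax(logits):
    # Numerically stable log-softmax over the last axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))

def conditional_nll(W):
    # -sum_i log P(y_i | x_i, W): conditional negative log-likelihood.
    logp = log_softmax(X @ W)
    return -logp[np.arange(len(y)), y].sum()

def cross_entropy(W):
    # Cross-entropy between one-hot labels and the predicted distributions.
    one_hot = np.eye(3)[y]
    return -(one_hot * log_softmax(X @ W)).sum()

def joint_nll(W, log_px):
    # -sum_i log P(x_i, y_i | W), assuming P(x) does not depend on W.
    return conditional_nll(W) - log_px.sum()

W1 = rng.normal(size=(2, 3))
W2 = rng.normal(size=(2, 3))
log_px = rng.normal(size=5)  # arbitrary stand-in values for log P(x_i)

# The conditional NLL coincides with the cross-entropy loss.
print(np.isclose(conditional_nll(W1), cross_entropy(W1)))

# Joint and conditional NLL differ by the same constant for any W,
# so both objectives have the same minimizer.
gap1 = joint_nll(W1, log_px) - conditional_nll(W1)
gap2 = joint_nll(W2, log_px) - conditional_nll(W2)
print(np.isclose(gap1, gap2))
```

Both checks should print `True`: the gap between the two objectives is -sum_i log P(x_i), which is constant in the parameters.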

particle1331 · March 6, 2021, 6:46pm

H(X) ≥ 0 in 18.11.2.4 is only true for discrete distributions. For example, the continuous distribution U[0, 0.5] has negative (differential) entropy.
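
A quick numerical sketch of this point, using the closed form h(X) = log(b - a) for X ~ U[a, b]. The helper names below are made up for illustration, and base-2 logarithms (bits) are an assumption.

```python
import numpy as np

def uniform_differential_entropy(a, b, base=2.0):
    # Closed form for X ~ U[a, b]: h(X) = log(b - a).
    return np.log(b - a) / np.log(base)

def discrete_entropy(p, base=2.0):
    # Shannon entropy of a probability vector; zero-probability terms contribute 0.
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -(p * np.log(p)).sum() / np.log(base)

# Differential entropy of U[0, 0.5] is log2(0.5), which is negative.
print(uniform_differential_entropy(0.0, 0.5))

# Discrete entropy is never negative; a fair coin carries one bit.
print(discrete_entropy([0.5, 0.5]))
```

Any uniform distribution on an interval shorter than 1 gives a negative value, since log(b - a) < 0 whenever b - a < 1.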

goldpiggy · March 9, 2021, 9:55pm

Great catch! Corrected in this PR.

mike · March 17, 2021, 4:20pm

Hi @goldpiggy,

Thanks for putting together this great appendix. Can you explain why eq 18.11.7 is true?