Information Theory

astonzhang · September 17, 2020, 5:24am

https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/information-theory.html

sushmit86 · October 29, 2020, 7:46pm

In the cross-entropy implementation

def cross_entropy(y_hat, y):
    ce = -torch.log(y_hat[range(len(y_hat)), y])
    return ce.mean()

why are we returning a mean instead of sum?

goldpiggy · November 2, 2020, 10:04pm

Hi @sushmit86, great question! Cross-entropy is defined as an expectation, $-E_{x \sim P}[\log q(x)]$, taken over the true distribution $P$. Averaging the per-example losses over a batch estimates that expectation, and that’s why we use the mean instead of the sum.
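
One practical consequence of averaging can be checked numerically. The following is a minimal pure-Python sketch, not the book's implementation: the helper names and the toy probabilities are made up for illustration, and plain lists stand in for tensors. It suggests that the mean stays the same when a batch is duplicated, while the sum grows with the batch size.

```python
import math

def per_example_losses(y_hat, y):
    """Negative log-probability assigned to the true class of each example."""
    return [-math.log(probs[label]) for probs, label in zip(y_hat, y)]

def cross_entropy_mean(y_hat, y):
    losses = per_example_losses(y_hat, y)
    return sum(losses) / len(losses)

def cross_entropy_sum(y_hat, y):
    return sum(per_example_losses(y_hat, y))

# Hypothetical predicted class probabilities for two examples, three classes.
y_hat = [[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]]
y = [0, 2]  # true class indices

# The same batch repeated twice: same data distribution, twice the size.
y_hat_twice = y_hat * 2
y_twice = y * 2

mean_small = cross_entropy_mean(y_hat, y)
mean_large = cross_entropy_mean(y_hat_twice, y_twice)
sum_small = cross_entropy_sum(y_hat, y)
sum_large = cross_entropy_sum(y_hat_twice, y_twice)

print(f"mean: {mean_small:.4f} vs {mean_large:.4f}")  # unchanged by batch size
print(f"sum:  {sum_small:.4f} vs {sum_large:.4f}")    # doubles with batch size
```

With the mean, the loss scale (and hence the gradient scale) does not depend on how many examples are in the batch, so a learning rate tuned for one batch size is less likely to need retuning for another.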

bearjoejoe · June 30, 2021, 3:00am

From my point of view, with one-hot encoding (0, 1, …, 0, 0), the entry y_i = 1 is a probability, and the expression sum_i {-y_i*log(y_hat_i)} is the cross-entropy (a sum over classes, not a mean), as in equation (3.4.8) in the ‘loss function’ subsection of “softmax regression” in chapter 3 “linear neural networks.”
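
The two views can be reconciled: the sum in this post runs over classes for a single example, while the mean in the quoted implementation runs over examples in a batch. As a rough sketch (pure Python, with hypothetical helper names and made-up probabilities), the class-wise sum with a one-hot label reduces to the negative log-probability of the true class, which is the quantity the indexing expression `y_hat[range(len(y_hat)), y]` picks out:

```python
import math

def one_hot(label, num_classes):
    """Return a one-hot list such as [0.0, 0.0, 1.0] for label=2."""
    return [1.0 if i == label else 0.0 for i in range(num_classes)]

def ce_one_hot(y_onehot, y_hat):
    """Cross-entropy of one example as a sum over classes: -sum_i y_i * log(y_hat_i)."""
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y_onehot, y_hat))

# Hypothetical predicted probabilities for one example with three classes.
y_hat = [0.1, 0.3, 0.6]
label = 2

summed_over_classes = ce_one_hot(one_hot(label, 3), y_hat)
picked_by_index = -math.log(y_hat[label])

# Only the true-class term survives the sum, so the two values agree.
print(summed_over_classes, picked_by_index)
```

So each example's loss is already the full sum over classes; the `.mean()` then averages those per-example values across the batch.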

lukoshkin · August 29, 2021, 9:44am

Great article! And it can be slightly improved:
18.11.2.4. Properties of Entropy - fix LaTeX rendering
18.11.3.6. Applications of Mutual Information - fix typo in the first sentence: in it pure definition --> its

ToddMorrill · July 22, 2022, 2:36pm

Is there a typo in equation 19.11.18? If you treat the $E_x$ and $E_y$ as expectations, you wind up with two extra probability terms $p_X(x)$ and $p_Y(y)$, respectively. Then under section 19.11.4.2, where it says “$I(X, Y)$ is also numerically equivalent with the following terms:” - items 2 and 3 in that list are taking expectations over constants. $D_{KL}$ is a real-valued function so there appears to be no need for the outer expectations.

Can you help me see what I’m missing if I’ve gotten this wrong?
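
A small numerical check may help here. The sketch below uses a made-up 2x2 joint distribution and hypothetical helper names, and it assumes one particular reading of the list items: that the inner divergence is between the conditional distribution $p(Y \mid X = x)$ and the marginal $p_Y$. Under that assumption the divergence depends on $x$, so the outer expectation is not over a constant, and the result matches the double-sum definition $\sum_{x,y} p_{X,Y}(x,y) \log \frac{p_{X,Y}(x,y)}{p_X(x) p_Y(y)}$, in which the joint probability appears once as the weight.

```python
import math

# Hypothetical joint distribution p(x, y); rows index x, columns index y.
joint = [[0.3, 0.2],
         [0.1, 0.4]]

p_x = [sum(row) for row in joint]
p_y = [sum(joint[i][j] for i in range(len(joint))) for j in range(len(joint[0]))]

def kl(p, q):
    """KL divergence D_KL(p || q) between two discrete distributions."""
    return sum(p_i * math.log(p_i / q_i) for p_i, q_i in zip(p, q) if p_i > 0)

# Direct definition: one weight p(x, y) per term of the double sum.
mi_direct = sum(
    joint[i][j] * math.log(joint[i][j] / (p_x[i] * p_y[j]))
    for i in range(len(p_x))
    for j in range(len(p_y))
    if joint[i][j] > 0
)

# Expectation over x of the KL between p(Y | X = x) and the marginal p_Y.
kl_per_x = []
for i, px in enumerate(p_x):
    conditional = [joint[i][j] / px for j in range(len(p_y))]
    kl_per_x.append(kl(conditional, p_y))
mi_via_kl = sum(px * d for px, d in zip(p_x, kl_per_x))

print(kl_per_x)              # one divergence per value of x; they differ
print(mi_direct, mi_via_kl)  # the two routes agree
```

This does not settle whether the equation as printed has extra probability factors; it only illustrates that an expectation of a conditional KL divergence is a meaningful, non-constant average.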

prateeky2806 · November 4, 2022, 2:33am

Yes, it seems like a typo to me as well. I am also not sure how they go from equation 19.11.28 to 19.11.29.