https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/information-theory.html

In the cross-entropy implementation

    def cross_entropy(y_hat, y):
        ce = -torch.log(y_hat[range(len(y_hat)), y])
        return ce.mean()

why are we returning the mean instead of the sum?

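Hi @sushmit86, great question! Cross-entropy loss is defined as an "expectation" over the probability distribution of a random variable X, and that's why we use the mean instead of the sum: averaging the per-example losses over the batch is the empirical estimate of that expectation.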
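To make the mean-versus-sum point concrete, here is a minimal sketch in plain Python (no torch, so it runs anywhere). The helper `per_example_loss` and the toy numbers are made up for illustration; the helper is meant to mirror the indexing `y_hat[range(len(y_hat)), y]` from the snippet in the question. It suggests that the mean stays the same when the batch is duplicated, while the sum grows with the batch size.

```python
import math

def per_example_loss(y_hat, y):
    # Hypothetical helper: negative log of the probability that each row
    # of y_hat assigns to its true label.
    return [-math.log(probs[label]) for probs, label in zip(y_hat, y)]

# Toy batch: two examples, three classes (illustrative values only).
y_hat = [[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]]
y = [0, 2]

losses = per_example_loss(y_hat, y)
mean_loss = sum(losses) / len(losses)   # empirical estimate of the expectation
sum_loss = sum(losses)

# Repeat the same batch twice: the mean is unchanged, the sum doubles.
losses_2x = per_example_loss(y_hat * 2, y * 2)
mean_loss_2x = sum(losses_2x) / len(losses_2x)
sum_loss_2x = sum(losses_2x)

print(mean_loss, mean_loss_2x)
print(sum_loss, sum_loss_2x)
```

One practical consequence, under these assumptions: with the mean, the loss scale (and so the gradient scale) does not depend on the batch size, whereas with the sum it does.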

From my point of view, with one-hot encoding (0, 1, …, 0, 0), each y_i is a probability (y_i = 1 for the true class), and the equation **sum_i** {-y_i*log(y_hat_i)} is the cross-entropy (not a **mean**), as in equation (3.4.8) in the "loss function" subsection of "softmax regression" in chapter 3, "linear neural networks."
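The two views can be reconciled: the sum in the formula runs over classes for a single example, while the `.mean()` in the code runs over the examples in the batch. A minimal sketch in plain Python, using a hypothetical helper `cross_entropy_one_hot` and made-up probabilities, shows that the sum over classes with a one-hot label reduces to the single term that the indexing in the code picks out:

```python
import math

def cross_entropy_one_hot(y_one_hot, y_hat):
    # Sum over classes for ONE example: -sum_i y_i * log(y_hat_i).
    return -sum(y_i * math.log(p_i) for y_i, p_i in zip(y_one_hot, y_hat))

# One example with three classes (illustrative values only).
y_hat = [0.1, 0.3, 0.6]
label = 2
y_one_hot = [1.0 if i == label else 0.0 for i in range(len(y_hat))]

summed_over_classes = cross_entropy_one_hot(y_one_hot, y_hat)
picked_by_index = -math.log(y_hat[label])  # what y_hat[..., y] selects

print(summed_over_classes, picked_by_index)
```

So the per-example quantity is indeed a sum over classes, and the mean is then taken across examples on top of it.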

Great article! And it can be slightly improved:

18.11.2.4. Properties of Entropy - fix LaTeX rendering

18.11.3.6. Applications of Mutual Information - fix typo in the 1st sentence: in it pure definition --> its