Information Theory

https://d2l.ai/chapter_appendix-mathematics-for-deep-learning/information-theory.html

In the cross-entropy implementation

import torch

def cross_entropy(y_hat, y):
    # Negative log-probability that the model assigns to the true class
    # of each example in the batch.
    ce = -torch.log(y_hat[range(len(y_hat)), y])
    return ce.mean()

why do we return the mean instead of the sum?

Hi @sushmit86, great question! Cross-entropy is defined as an expectation under the data distribution: $H(P, Q) = -E_{x \sim P}[\log q(x)]$. A minibatch is a sample from that distribution, so we estimate the expectation with the sample mean $-\frac{1}{n}\sum_{i=1}^{n} \log \hat{y}_{i, y_i}$, which is exactly what ce.mean() computes. A practical bonus: the mean keeps the loss (and its gradients) on the same scale no matter the batch size, so you don't have to retune the learning rate when the batch size changes.
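
To see the two reductions side by side, here is a small sketch (the y_hat and y values are made-up toy inputs, not from the book). The mean agrees with PyTorch's built-in torch.nn.functional.nll_loss applied to log-probabilities, whose default reduction is also 'mean', while the sum grows with the batch size:

import torch
import torch.nn.functional as F

# Toy batch: predicted class probabilities (rows sum to 1) and true labels.
y_hat = torch.tensor([[0.1, 0.3, 0.6],
                      [0.3, 0.2, 0.5]])
y = torch.tensor([2, 0])

# Per-example loss: negative log-probability of the true class.
ce = -torch.log(y_hat[range(len(y_hat)), y])

print(ce.mean())                        # tensor(0.8574), independent of batch size
print(ce.sum())                         # tensor(1.7148), scales with batch size
print(F.nll_loss(torch.log(y_hat), y))  # tensor(0.8574), default reduction='mean'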