I don’t get why we need torch.mean in the function above. Doesn’t the loss, MSELoss, already take the “mean”? Why do we need to take the mean again?

Hi @swg104, great catch. Would you like to post a PR and become a contributor?

(However, since the final loss is only divided by “n” twice, which scales it by a constant, it won’t change which weights minimize it.)
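
For anyone following along, here is a minimal sketch of what the extra mean does. It assumes the function above uses `nn.MSELoss()` with its default `reduction="mean"`; the tensors `pred` and `target` are made up for illustration and are not from the original code.

```python
import torch
import torch.nn as nn

# Made-up predictions and targets, purely for illustration.
pred = torch.tensor([1.0, 2.0, 4.0])
target = torch.tensor([1.0, 3.0, 2.0])

# With the default reduction="mean", the loss is already averaged
# over all elements and comes back as a 0-dim (scalar) tensor.
loss = nn.MSELoss()(pred, target)

# Taking the mean of a single scalar leaves its value unchanged.
loss_again = torch.mean(loss)

# With reduction="none" the per-element squared errors are kept,
# and an explicit mean is what produces the averaged loss.
per_element = nn.MSELoss(reduction="none")(pred, target)
loss_manual = torch.mean(per_element)
```

Under that assumption the second `torch.mean` is redundant rather than harmful. If the function instead divides by “n” by hand after the reduction, the loss is scaled by a constant factor, which is the case described above.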

Thanks @swg104. Feel free to PR if anything else doesn’t look right. We appreciate your effort to support the community!