# Implementing Softmax Regression from Scratch

Thanks, @Linhan_Wu! We have fixed it here. You're welcome to contribute via a PR next time!

Thanks, @goldpiggy! It's an honor to contribute; I have also submitted PRs for other chapters!

https://blog.csdn.net/wangxiaobei2017/article/details/104770519

1. When computing the training loss, we multiply by `len(y)`. Is that because PyTorch automatically averages the loss?

2. The number of samples is computed as `y.size().numel()`; if `y` is a vector, isn't the result of that expression always 1?

3. In the training code of Section 3.6.6, the comment after `else` should say "custom-built optimizer and loss function", not "PyTorch's built-in optimizer and loss function". I suggest fixing it.
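Regarding question 1, a minimal pure-Python sketch of the bookkeeping, assuming the framework loss is a batch mean (as with PyTorch's default `reduction='mean'`): multiplying the mean loss by the batch size recovers the summed loss, so the accumulated metric stays a sum over all samples. The loss values here are hypothetical.

```python
# Hypothetical per-sample losses for one batch (illustrative values only).
per_sample_losses = [0.2, 0.5, 0.3, 1.0]

# A framework loss with reduction='mean' returns the batch average.
mean_loss = sum(per_sample_losses) / len(per_sample_losses)

# Multiplying by the batch size (len(y) in the book's code) undoes the
# averaging, so accumulating this value sums losses over all samples.
recovered_sum = mean_loss * len(per_sample_losses)
# recovered_sum equals the sum of per-sample losses (up to float rounding)
```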

Hey @loras_Zhang! Thanks! We have fixed it in https://github.com/d2l-ai/d2l-zh/commit/dd8924ea46df23842d16e780d7cebb9ce0c6e2b6

Use the animator to show the images:

plt.show() by itself can only display one figure.

```python
def net(X):
    return softmax(torch.matmul(X.reshape((-1, W.shape[0])), W) + b)
```

A question about Section 3.6.6: accumulating the loss and the number of correct predictions.

`y_hat[range(len(y_hat)), y]`

This is indexing with the true labels `y`.
Say `y = [0, 2]`: there are 2 samples, the true label of sample 0 is class 0, and that of sample 1 is class 2 (so `len(y)` is the batch size).
Then `y_hat[range(len(y_hat)), y]` picks, for every sample in the batch, the predicted probability at the label index given by `y`.
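The indexing described above can be sketched in plain Python (the values in `y_hat` are hypothetical):

```python
# Plain-Python sketch of y_hat[range(len(y_hat)), y]:
# for each row i, pick the entry at column y[i].
y_hat = [[0.1, 0.3, 0.6],   # predicted probabilities for sample 0
         [0.3, 0.2, 0.5]]   # predicted probabilities for sample 1
y = [0, 2]                  # true labels: sample 0 -> class 0, sample 1 -> class 2

picked = [y_hat[i][y[i]] for i in range(len(y_hat))]
print(picked)  # [0.1, 0.5]
```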

For TensorFlow, the equivalent is `tf.boolean_mask(y_hat, tf.one_hot(y, depth=y_hat.shape[-1]))` (sparse sampling).
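As a plain-Python sketch (hypothetical values), masking with a one-hot matrix selects the same entries as the `y_hat[range(len(y_hat)), y]` indexing:

```python
# Build a one-hot mask from the labels, then keep only the masked entries:
# one predicted probability (that of the true class) per sample.
y_hat = [[0.1, 0.3, 0.6], [0.3, 0.2, 0.5]]  # hypothetical predictions
y = [0, 2]                                  # true class indices
depth = len(y_hat[0])                       # number of classes

one_hot = [[1.0 if j == label else 0.0 for j in range(depth)] for label in y]
masked = [p for row, mask in zip(y_hat, one_hot)
          for p, m in zip(row, mask) if m == 1.0]
print(masked)  # [0.1, 0.5]
```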

```python
torch.exp(torch.tensor([50]))   # tensor([5.1847e+21])
torch.exp(torch.tensor([100]))  # tensor([inf])
```
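The overflow above is why softmax is usually computed in a numerically stable form. A minimal pure-Python sketch (not the book's implementation): subtracting the maximum logit before exponentiating leaves the result unchanged, since softmax(z) = softmax(z - c) for any constant c, and keeps every exponent at or below zero.

```python
import math

def stable_softmax(logits):
    """Softmax with the max-subtraction trick; every exponent is <= 0."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Logits this large would already overflow float32 in a naive exp().
probs = stable_softmax([100.0, 101.0, 102.0])
# probs sums to 1.0 (up to float rounding), no overflow occurs
```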

Part 3.6.4:
If one-hot encoding is used, `y` is supposed to be [0, 0, 1]. However, the form [0, 2] is used instead. What is the consideration behind this?