Dog Breed Identification (ImageNet Dogs) on Kaggle

https://zh-v2.d2l.ai/chapter_computer-vision/kaggle-dog.html

Running the code raises an error in evaluate_loss :grimacing:
TypeError: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.

Should
l_sum += l.sum()
be changed to
l_sum += l.sum().cpu().detach().numpy()

i.e., the following code:
def evaluate_loss(data_iter, net, devices):
    l_sum, n = 0.0, 0
    for features, labels in data_iter:
        features, labels = features.to(devices[0]), labels.to(devices[0])
        outputs = net(features)
        l = loss(outputs, labels)
        # Move the per-batch loss to the CPU before accumulating it as a numpy value
        l_sum += l.sum().cpu().detach().numpy()
        n += labels.numel()
    return l_sum / n

In my case, I changed animator.add(epoch + 1, (None, valid_loss.detach())) to animator.add(epoch + 1, (None, valid_loss.cpu().detach())) in the train function. The idea is the same: move the CUDA tensor to the CPU first, then convert it to numpy. After that it runs normally.
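
A rough sketch of where that change sits inside the chapter's train() loop (the rest of the function is unchanged; valid_loss is computed on devices[0] by evaluate_loss):

if valid_iter is not None:
    valid_loss = evaluate_loss(valid_iter, net, devices)
    # Move the CUDA tensor to the CPU so the animator can convert it to numpy
    animator.add(epoch + 1, (None, valid_loss.cpu().detach()))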

In the line output = torch.nn.functional.softmax(net(data.to(devices[0])), dim=0), the dim argument is wrong; it should be dim=1. With dim=0 the softmax is taken across the batch dimension instead of across the classes, so the probabilities in each row of submission.csv will not sum to 1.
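
A minimal sketch of the corrected prediction step, assuming test_iter, net, and devices from the chapter: net(data) has shape (batch_size, num_classes), so the softmax must be taken over dim=1 (the class dimension) for each image's probabilities to sum to 1.

import torch

preds = []
for data, label in test_iter:
    # Softmax over the class dimension (dim=1), not the batch dimension (dim=0)
    output = torch.nn.functional.softmax(net(data.to(devices[0])), dim=1)
    preds.extend(output.cpu().detach().numpy())
# Optional sanity check: each row of probabilities should sum to roughly 1
# assert all(abs(p.sum() - 1.0) < 1e-4 for p in preds)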