NayeeC
I want to train SSD on multiple GPUs, so I modified part of the code.
The original code is:
device, net = d2l.try_gpu(), TinySSD(num_classes=1)
net = net.to(device)
After commenting out all of the original code, I replaced it with:
net = TinySSD(num_classes=1)
devices = d2l.try_all_gpus()
net = nn.DataParallel(net, device_ids=devices)
In addition, I adjusted how X and Y are moved to the device in each training epoch:
X, Y = features.to(devices[0]), target.to(devices[0])
Running it raises the following error:
RuntimeError Traceback (most recent call last)
<ipython-input-50-edd9900ab59b> in <module>
14 X, Y = features.to(devices[0]), target.to(devices[0])
15 # Generate multiscale anchor boxes and predict the class and offset for each
---> 16 anchors, cls_preds, bbox_preds = net(X)
17 # Label the class and offset of each anchor box
18 bbox_labels, bbox_masks, cls_labels = d2l.multibox_target(anchors, Y)
D:\Anaconda3\envs\chtorch\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
887 result = self._slow_forward(*input, **kwargs)
888 else:
--> 889 result = self.forward(*input, **kwargs)
890 for hook in itertools.chain(
891 _global_forward_hooks.values(),
D:\Anaconda3\envs\chtorch\lib\site-packages\torch\nn\parallel\data_parallel.py in forward(self, *inputs, **kwargs)
153 raise RuntimeError("module must have its parameters and buffers "
154 "on device {} (device_ids[0]) but found one of "
--> 155 "them on device: {}".format(self.src_device_obj, t.device))
156
157 inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)
RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu
How should I fix this? Thanks, everyone!
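For reference, the error message says that nn.DataParallel expects the module's parameters and buffers to already be on device_ids[0], while the modified code wraps a network that is still on the CPU (the original net.to(device) call was commented out). A minimal sketch of one likely fix is to move the network to devices[0] before wrapping it. The small nn.Sequential network and the local try_all_gpus helper below are hypothetical stand-ins for TinySSD and d2l.try_all_gpus, used only so the snippet runs on its own.

```python
import torch
from torch import nn


def try_all_gpus():
    """Hypothetical stand-in for d2l.try_all_gpus():
    every CUDA device, or [cpu] when no GPU is present."""
    n = torch.cuda.device_count()
    return [torch.device(f'cuda:{i}') for i in range(n)] or [torch.device('cpu')]


# Hypothetical small network standing in for TinySSD(num_classes=1)
net = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1), nn.ReLU())

devices = try_all_gpus()

# Move parameters and buffers to devices[0] BEFORE wrapping,
# which is the condition the RuntimeError complains about.
net = net.to(devices[0])
if devices[0].type == 'cuda':
    net = nn.DataParallel(net, device_ids=devices)

# Inputs go to devices[0]; DataParallel scatters them across the GPUs.
X = torch.randn(4, 3, 16, 16).to(devices[0])
out = net(X)
```

In the code from the question, this would correspond to calling net.to(devices[0]) on the TinySSD instance before passing it to nn.DataParallel; whether the rest of the training loop needs further changes is not something this sketch checks.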