Single Shot Multibox Detection (SSD) - mxnet

Jul '21

NayeeC

I want to train SSD on multiple GPUs, so I modified part of the code.

The original code is:

device, net = d2l.try_gpu(), TinySSD(num_classes=1)
net = net.to(device)

After commenting out all of the original code, I replaced it with:

net = TinySSD(num_classes=1)
devices = d2l.try_all_gpus()
net = nn.DataParallel(net, device_ids=devices)

In addition, I changed how X and Y are set up inside each training epoch:

X, Y = features.to(devices[0]), target.to(devices[0])

Running it raises the following error:

RuntimeError                              Traceback (most recent call last)
<ipython-input-50-edd9900ab59b> in <module>
     14         X, Y = features.to(devices[0]), target.to(devices[0])
     15         # Generate multiscale anchor boxes and predict a class and offset for each
---> 16         anchors, cls_preds, bbox_preds = net(X)
     17         # Label each anchor box with a class and an offset
     18         bbox_labels, bbox_masks, cls_labels = d2l.multibox_target(anchors, Y)

D:\Anaconda3\envs\chtorch\lib\site-packages\torch\nn\modules\module.py in _call_impl(self, *input, **kwargs)
    887             result = self._slow_forward(*input, **kwargs)
    888         else:
--> 889             result = self.forward(*input, **kwargs)
    890         for hook in itertools.chain(
    891                 _global_forward_hooks.values(),

D:\Anaconda3\envs\chtorch\lib\site-packages\torch\nn\parallel\data_parallel.py in forward(self, *inputs, **kwargs)
    153                 raise RuntimeError("module must have its parameters and buffers "
    154                                    "on device {} (device_ids[0]) but found one of "
--> 155                                    "them on device: {}".format(self.src_device_obj, t.device))
    156 
    157         inputs, kwargs = self.scatter(inputs, kwargs, self.device_ids)

RuntimeError: module must have its parameters and buffers on device cuda:0 (device_ids[0]) but found one of them on device: cpu

How should I fix this? Thanks, everyone!
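The error message says that `nn.DataParallel` expects the wrapped module's parameters and buffers to already live on `device_ids[0]`, while in the modified code `net` is never moved off the CPU. Below is a minimal sketch of one possible fix: move the network to the first device before wrapping it. The helper name `wrap_for_data_parallel` and the small stand-in network are hypothetical and used only for illustration; `TinySSD` would take the stand-in's place.

```python
import torch
from torch import nn

def wrap_for_data_parallel(net):
    """Hypothetical helper: put net on the first GPU, then wrap it."""
    if not torch.cuda.is_available():
        # No GPU found: stay on the CPU and skip DataParallel entirely.
        return net, [torch.device('cpu')]
    devices = [torch.device(f'cuda:{i}')
               for i in range(torch.cuda.device_count())]
    # DataParallel requires parameters and buffers on device_ids[0].
    net = net.to(devices[0])
    return nn.DataParallel(net, device_ids=devices), devices

# Stand-in for TinySSD(num_classes=1); an assumption for this sketch.
net = nn.Sequential(nn.Conv2d(3, 8, kernel_size=3, padding=1),
                    nn.BatchNorm2d(8))
net, devices = wrap_for_data_parallel(net)

# Inputs go to devices[0]; DataParallel scatters them to the other GPUs.
X = torch.rand(4, 3, 32, 32).to(devices[0])
out = net(X)
```

Calling `.to(devices[0])` on the wrapped module, as in `nn.DataParallel(net, device_ids=devices).to(devices[0])`, should have the same effect, since it also moves the underlying parameters.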

Jul '21

Delphin

I have a question.

anchors, cls_preds, bbox_preds = net(X)

In this line, anchors is the result of concatenating the anchor boxes output by the different layers across the batch.

def multibox_target(anchors, labels):
    …
    for i in range(batch_size):
        label = labels[i, :, :]
        anchors_bbox_map = assign_anchor_to_bbox(label[:, 1:], anchors,
                                                 device)
        …

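But when labels are matched to anchors, the label of a single image is matched against all the anchor boxes in the batch? As I understand it, only the anchor boxes that belong to that image should take part in the matching. Why is it done this way?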
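One way to look at this: anchor boxes are normally generated from the height and width of each feature map, not from the image content, so every image in a batch ends up with the same set of anchors. As far as I can tell, the book's PyTorch code reflects this by giving `anchors` a leading dimension of 1 instead of the batch size, though that is worth checking against your d2l version. The sketch below uses a simplified, hypothetical generator `make_anchors` (one square box per pixel, not the book's `multibox_prior`) to show the shapes involved.

```python
import torch

def make_anchors(fmap, size):
    """Hypothetical simplified generator: one square box per pixel.

    Only the feature map's height and width are used, never its values.
    """
    h, w = fmap.shape[-2:]
    ys = (torch.arange(h) + 0.5) / h          # normalized row centers
    xs = (torch.arange(w) + 0.5) / w          # normalized column centers
    cy = ys.repeat_interleave(w)              # center y for each pixel
    cx = xs.repeat(h)                         # center x for each pixel
    half = size / 2
    boxes = torch.stack([cx - half, cy - half, cx + half, cy + half], dim=1)
    return boxes.unsqueeze(0)                 # shape (1, h*w, 4)

batch_size = 8
# Two feature maps from different layers of a made-up network.
fmaps = [torch.rand(batch_size, 16, 4, 4), torch.rand(batch_size, 16, 2, 2)]

# Concatenate along the anchor dimension, not the batch dimension.
anchors = torch.cat(
    [make_anchors(f, s) for f, s in zip(fmaps, [0.2, 0.5])], dim=1)
print(anchors.shape)  # torch.Size([1, 20, 4])
```

Under this reading, the loop over `batch_size` matches each image's labels against that one shared set of 20 anchors; it is not mixing anchors that belong to other images.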