Single Shot Multibox Detection (SSD)

https://d2l.ai/chapter_computer-vision/ssd.html

Based on the plot below, I think SSD predicts the classes and bounding boxes of anchor boxes, so I expected the anchor boxes generated by multibox_prior to be explicitly passed into cls_predictor and bbox_predictor.

[Figure: SSD structure]

However, in blk_forward, only the feature map Y is passed in:

def blk_forward(X, blk, size, ratio, cls_predictor, bbox_predictor):
    Y = blk(X)
    anchors = d2l.multibox_prior(Y, sizes=size, ratios=ratio)
    cls_preds = cls_predictor(Y)
    bbox_preds = bbox_predictor(Y)
    return (Y, anchors, cls_preds, bbox_preds)

So the SSD model's forward propagation outputs class and bounding box predictions solely based on the feature maps. The anchor boxes are only used during the loss evaluation. Is this statement correct?

Thanks for any correction & thoughts!
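
As far as I can tell, the link between anchors and predictions is one of shape rather than of data flow: the predictors are convolutions over Y whose output channels are sized to give one set of class scores and four offsets per anchor per pixel, and the anchors themselves are needed again at prediction time to decode those offsets into boxes. Below is a minimal sketch of that shape bookkeeping in plain Python. The helper names are hypothetical (not part of d2l), and it assumes the predictors are 3x3 convolutions with padding 1, so the spatial size of Y is preserved.

```python
def num_anchors_per_pixel(sizes, ratios):
    # Assumption: as in d2l's multibox_prior, only combinations that
    # contain sizes[0] or ratios[0] are kept, giving n + m - 1 anchors.
    return len(sizes) + len(ratios) - 1


def ssd_block_output_shapes(batch, h, w, sizes, ratios, num_classes):
    """Hypothetical helper: shapes returned by one blk_forward call.

    h, w are the height and width of the feature map Y.
    """
    a = num_anchors_per_pixel(sizes, ratios)
    return {
        # Anchors depend only on the size of Y, not on its values,
        # and are shared across the batch.
        "anchors": (1, h * w * a, 4),
        # One score per class (plus background) for every anchor.
        "cls_preds": (batch, a * (num_classes + 1), h, w),
        # Four offsets for every anchor.
        "bbox_preds": (batch, a * 4, h, w),
    }


shapes = ssd_block_output_shapes(
    batch=32, h=32, w=32, sizes=[0.2, 0.272], ratios=[1, 2, 0.5],
    num_classes=1)
for name, shape in shapes.items():
    print(name, shape)
```

Each anchor has a matching slice of channels at its pixel, which is why the predictors never need the anchor coordinates as input.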


In def multibox_prior(data, sizes, ratios):… , sizes[0] should be changed to size_tensor[0]; this can improve speed!