Anchor Boxes

https://d2l.ai/chapter_computer-vision/anchor.html


Then I found that it didn't show when I first switched to the "pytorch" tab.
@anirudh

So does match_anchor_to_bbox.
So do offset_inverse and multibox_detection.
@anirudh
Is there some way to fix these?

This is a known issue. The tab rendering is managed by d2lbook.

match_anchor_to_bbox doesn’t faithfully implement the algorithm presented in the text.

# Find the largest iou for each bbox
anc_i = torch.argmax(jaccard, dim=0)
box_j = torch.arange(num_gt_boxes, device=device)
anchors_bbox_map[anc_i] = box_j

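In particular, the same anchor index anc_i can be the argmax of several columns j. The assignment above then keeps only one of those j for anc_i (the last one when the writes happen in order, although PyTorch does not guarantee which write wins for duplicate indices), and the other ground-truth boxes end up with no anchor. Note that the code is only a simplification of the algorithm.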
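To make the difference concrete, here is a minimal plain-Python sketch (no PyTorch, so the logic stays visible) on a made-up 3x2 IoU matrix. `simplified_assign` and `greedy_assign` are hypothetical helper names: the first mimics only the "largest IoU for each bbox" step of the quoted code and assumes the last write wins; the second follows the algorithm described in the text (take the global maximum, assign it, then discard its row and column). The threshold step is left out of both.

```python
def simplified_assign(jaccard):
    """Mimic the simplified code: box j goes to the anchor with the largest
    IoU in column j; a later column overwrites an earlier one."""
    num_anchors, num_boxes = len(jaccard), len(jaccard[0])
    mapping = [-1] * num_anchors
    for j in range(num_boxes):
        best_anchor = max(range(num_anchors), key=lambda i: jaccard[i][j])
        mapping[best_anchor] = j  # silently overwrites any earlier box
    return mapping


def greedy_assign(jaccard):
    """Algorithm from the text: take the global maximum, assign it, then
    discard its row and column; repeat for each remaining box."""
    m = [row[:] for row in jaccard]  # work on a copy
    num_anchors, num_boxes = len(m), len(m[0])
    mapping = [-1] * num_anchors
    for _ in range(min(num_anchors, num_boxes)):
        i, j = max(((a, b) for a in range(num_anchors) for b in range(num_boxes)),
                   key=lambda ab: m[ab[0]][ab[1]])
        mapping[i] = j
        for k in range(num_boxes):
            m[i][k] = -1.0  # discard row i (anchor i is taken)
        for k in range(num_anchors):
            m[k][j] = -1.0  # discard column j (box j is taken)
    return mapping


jaccard = [[0.9, 0.8],
           [0.1, 0.7],
           [0.2, 0.1]]
print(simplified_assign(jaccard))  # [1, -1, -1]
print(greedy_assign(jaccard))      # [0, 1, -1]
```

Anchor 0 has the largest IoU for both boxes, so the simplified version overwrites box 0 with box 1 and box 0 gets no anchor at all, while the greedy version hands box 1 to anchor 1.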

Hi @gcy, can you please elaborate on this? Also, feel free to open a PR to fix anything that might be wrong! :slight_smile:

@anirudh Following the algorithm gives the result in the left image, whereas the code gives the one on the right. Both tables use the same layout as Fig. 13.4.2.

Thanks @gcy for raising this issue. I think it should be fixed by the following change. Let me know what you think, and I'll then open a PR.

import torch

#@save
def match_anchor_to_bbox(ground_truth, anchors, device, iou_threshold=0.5):
    """Assign ground-truth bounding boxes to anchor boxes similar to them."""
    num_anchors, num_gt_boxes = anchors.shape[0], ground_truth.shape[0]
    # Element `x_ij` in the `i^th` row and `j^th` column is the IoU
    # of the anchor box `anc_i` to the ground-truth bounding box `box_j`
    # (`box_iou` is the pairwise IoU helper defined earlier in the chapter)
    jaccard = box_iou(anchors, ground_truth)
    # Initialize the tensor to hold assigned ground truth bbox for each anchor
    anchors_bbox_map = torch.full((num_anchors,), -1, dtype=torch.long,
                                  device=device)
    # Assign ground truth bounding box according to the threshold
    max_ious, indices = torch.max(jaccard, dim=1)
    anc_i = torch.nonzero(max_ious >= iou_threshold).reshape(-1)
    box_j = indices[max_ious >= iou_threshold]
    anchors_bbox_map[anc_i] = box_j
    # Find the largest iou for each bbox: repeatedly take the global maximum
    # of the IoU matrix, assign it, then discard its column and row
    for _ in range(num_gt_boxes):
        max_idx = torch.argmax(jaccard)  # index into the flattened matrix
        gt_idx = (max_idx % num_gt_boxes).long()
        anc_idx = torch.div(max_idx, num_gt_boxes, rounding_mode='floor').long()
        anchors_bbox_map[anc_idx] = gt_idx
        jaccard[:, gt_idx] = -1
        jaccard[anc_idx, :] = -1
    return anchors_bbox_map
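`torch.argmax(jaccard)` is called without `dim`, so it returns an index into the flattened (row-major) matrix; the modulo and floor division then recover the column (ground-truth box) and the row (anchor). A plain-Python sketch of that arithmetic, with a made-up matrix and a hypothetical helper name:

```python
def flat_argmax_to_row_col(matrix):
    """Return (row, col) of the largest entry, going through the flat index
    the same way as the code above (row-major order)."""
    num_cols = len(matrix[0])
    flat = [v for row in matrix for v in row]  # row-major flattening
    max_idx = max(range(len(flat)), key=flat.__getitem__)
    # divmod gives (max_idx // num_cols, max_idx % num_cols)
    return divmod(max_idx, num_cols)


jaccard = [[0.1, 0.4],
           [0.9, 0.2],
           [0.3, 0.5]]
print(flat_argmax_to_row_col(jaccard))  # (1, 0): anchor 1, box 0
```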

Yes, exactly! I would suggest changing gt_idx to something like box_idx to make the naming more consistent.

Why do we discard the columns of the IoU matrix if all we return is the mapping of anchors to GT labels? Is that code unnecessary?
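One way to see why the column discard matters is the minimal plain-Python sketch below (`assign_largest_iou` and its `discard_columns` flag are hypothetical names, and the IoU values are made up). Without discarding column j, a ground-truth box that has already been assigned can win the next round again, so another box may never get an anchor in this step.

```python
def assign_largest_iou(jaccard, discard_columns=True):
    """Greedy 'largest IoU for each bbox' step on a list-of-lists IoU matrix
    (rows = anchors, columns = ground-truth boxes)."""
    m = [row[:] for row in jaccard]  # work on a copy
    num_anchors, num_boxes = len(m), len(m[0])
    mapping = [-1] * num_anchors
    for _ in range(min(num_anchors, num_boxes)):
        i, j = max(((a, b) for a in range(num_anchors) for b in range(num_boxes)),
                   key=lambda ab: m[ab[0]][ab[1]])
        mapping[i] = j
        m[i] = [-1.0] * num_boxes  # discard row i: anchor i is taken
        if discard_columns:
            for a in range(num_anchors):
                m[a][j] = -1.0  # discard column j: box j is taken
    return mapping


jaccard = [[0.9, 0.1],
           [0.8, 0.3],
           [0.1, 0.2]]
print(assign_largest_iou(jaccard))                         # [0, 1, -1]
print(assign_largest_iou(jaccard, discard_columns=False))  # [0, 0, -1]
```

With the column discard, box 1 gets anchor 1; without it, box 0 is picked twice and box 1 ends up unassigned. So the discard affects the returned mapping even though only the anchor-to-box mapping is returned.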