Anchor Boxes

I have a question about the following code block:

# Generate boxes_per_pixel number of heights and widths which are later
# used to create anchor box corner coordinates (xmin, xmax, ymin, ymax)
w = torch.cat((size_tensor, sizes[0] * torch.sqrt(ratio_tensor[1:])))\
            * in_height / in_width / 2
h = torch.cat((size_tensor, sizes[0] / torch.sqrt(ratio_tensor[1:]))) / 2
anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                                    in_height * in_width, 1)

Here’s what the text specifies -

Should the formula be more explicit, something like w=... and h=..., since I think that is what the code is doing?

The code applies extra factors when computing w and h, and I would like to know the reason for them: specifically, * in_height / in_width / 2 for w and / 2 for h.

Hi @sayakpaul,
We divide by 2 in both cases because the offsets are measured from the centre of the box: each corner lies half a width and half a height away from the centre. The anchor box corner coordinates (xmin, ymin, xmax, ymax) are then obtained by adding (-w, -h, w, h) / 2 to the centre. The division can be done once (for brevity) when creating the anchor_manipulations tensor, as shown below.
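To make the division by 2 concrete, here is a minimal sketch in plain Python (not the library code). The helper name corners_from_center is hypothetical and used only for illustration; it assumes the box is described by its centre and its full width and height.

```python
def corners_from_center(cx, cy, w, h):
    """Return (xmin, ymin, xmax, ymax) for a box centred at (cx, cy)."""
    # Each corner sits half a width and half a height away from the centre.
    half_w, half_h = w / 2, h / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# A box of width 0.4 and height 0.2 centred in a unit image.
box = corners_from_center(0.5, 0.5, 0.4, 0.2)
print(box)
```

Without the division, the resulting box would be twice as wide and twice as tall as intended.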

Thanks for raising the other part of the question. There was a small bug, which is now fixed in master. The correct code should be:

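    # Concatenate (all sizes, first ratio) and (first size, remaining ratios).
    # The outer parentheses replace the backslash continuation, which cannot
    # be followed by a comment.
    w = (torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),  # s * sqrt(r1)
                    sizes[0] * torch.sqrt(ratio_tensor[1:])))   # s1 * sqrt(r)
         * in_height / in_width)  # handle rectangular inputs
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),   # s / sqrt(r1)
                   sizes[0] / torch.sqrt(ratio_tensor[1:])))    # s1 / sqrt(r)
    # Divide by 2 once to turn widths and heights into offsets from the centre
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                                        in_height * in_width, 1) / 2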

The in_height / in_width factor is there to handle rectangular inputs, since SSD was originally developed for square images (300x300).
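As a rough check of what these formulas do, here is a plain-Python sketch (no torch) that repeats the same arithmetic for a single pixel. The function name anchor_wh and the example sizes, ratios, and image dimensions are assumptions chosen for illustration. It treats w and h as fractions of the image width and height, then converts them back to pixels.

```python
import math


def anchor_wh(sizes, ratios, in_height, in_width):
    """Normalized (w, h) for each anchor shape generated at one pixel."""
    # (all sizes, first ratio) followed by (first size, remaining ratios)
    pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
    shapes = []
    for s, r in pairs:
        w = s * math.sqrt(r) * in_height / in_width  # fraction of image width
        h = s / math.sqrt(r)                         # fraction of image height
        shapes.append((w, h))
    return pairs, shapes


in_height, in_width = 300, 600  # a rectangular input
pairs, shapes = anchor_wh([0.75, 0.5, 0.25], [1, 2, 0.5], in_height, in_width)
for (s, r), (w, h) in zip(pairs, shapes):
    pixel_w, pixel_h = w * in_width, h * in_height
    print(f"s={s}, r={r}: {pixel_w:.1f} x {pixel_h:.1f} px, "
          f"aspect {pixel_w / pixel_h:.2f}")
```

Because the width is multiplied by sqrt(r) and the height is divided by sqrt(r), the aspect ratio in pixels works out to r, while the area in pixels stays at (s * in_height)^2 regardless of r. Without the in_height / in_width factor, the pixel aspect ratio on a non-square image would be r * in_width / in_height instead.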

You can look at this for a better understanding of the algorithm.
The code can be thought of as a vectorized version of the same algorithm, written that way for performance. Implementing multibox_prior with nested Python loops would be very slow.

Thank you so much! It’s clear now. Maybe reflecting all of this in the text would make it more comprehensive.

Sure! We’ll try to update the text, mentioning more implementation details.

Therefore, we are usually only interested in a combination containing $s_1$ or $r_1$ sizes and aspect ratios, that is:

$$(s_1, r_1), (s_1, r_2), \ldots, (s_1, r_m), (s_2, r_1), (s_3, r_1), \ldots, (s_n, r_1).$$

That is, the number of anchor boxes centered on the same pixel is $n+m-1$. For the entire input image, we will generate a total of $wh(n+m-1)$ anchor boxes.
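The count in the quoted passage can be checked with a short plain-Python sketch. The function name anchor_pairs and the example values are assumptions for illustration. The pairs are listed in the order used by the corrected code earlier in this thread (all sizes with the first ratio, then the first size with the remaining ratios), which differs from the order in the quoted text but yields the same set.

```python
def anchor_pairs(sizes, ratios):
    """(size, ratio) pairs that contain the first size or the first ratio."""
    first_ratio = [(s, ratios[0]) for s in sizes]      # n pairs
    first_size = [(sizes[0], r) for r in ratios[1:]]   # m - 1 pairs
    # (sizes[0], ratios[0]) appears only once, hence n + m - 1 in total.
    return first_ratio + first_size


sizes, ratios = [0.75, 0.5, 0.25], [1, 2, 0.5]
pairs = anchor_pairs(sizes, ratios)
print(len(pairs), "selected out of", len(sizes) * len(ratios), "possible pairs")
```

All n * m combinations are possible in principle. The passage restricts the set to pairs that share the first size or the first ratio, which keeps the number of anchor boxes per pixel linear in n and m rather than quadratic.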

I think that with n sizes and m ratios, there are n*m possible combinations for the same pixel. Why is it n+m-1?

Why do they take the square root of the ratios?