Anchor Boxes

sayakpaul · January 13, 2021, 3:19am

I have a question about the following code block:

# Generate boxes_per_pixel number of heights and widths which are later
# used to create anchor box corner coordinates (xmin, xmax, ymin, ymax)
w = torch.cat((size_tensor, sizes[0] * torch.sqrt(ratio_tensor[1:])))\
            * in_height / in_width / 2
h = torch.cat((size_tensor, sizes[0] / torch.sqrt(ratio_tensor[1:]))) / 2
anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                                    in_height * in_width, 1)

Here’s what the text specifies -

Should the formula be more explicit, something like w=... and h=..., since I think that is what the code is doing?

When computing w and h, the code applies extra factors, and I would like to know the reason behind them: specifically, * in_height / in_width / 2 for w and / 2 for h.

anirudh · January 16, 2021, 1:55am

Hi @sayakpaul,
We divide by 2 in both cases because w and h are full widths and heights, while the corner coordinates are computed as offsets from the box centre: each corner lies half a width and half a height away from it. Stacking (-w, -h, w, h) therefore gives the (xmin, ymin, xmax, ymax) offsets once everything is halved. The division can be done once (for brevity) while creating the anchor_manipulations tensor, as shown below.

Thanks for raising the question about the other part. There was a small bug, which is now fixed in master. The correct code should be:

    # concatenate (various sizes, first ratio) and (first size, various ratios)
    w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),  # s * √r
                   sizes[0] * torch.sqrt(ratio_tensor[1:]))    # s * √r
                  ) * in_height / in_width
    h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),  # s / √r
                   sizes[0] / torch.sqrt(ratio_tensor[1:])))   # s / √r
    # divide by 2 once: the offsets are measured from the box centre
    anchor_manipulations = torch.stack((-w, -h, w, h)).T.repeat(
                                        in_height * in_width, 1) / 2

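The in_height / in_width factor is there to handle rectangular inputs, since SSD was originally developed for square images (300x300). w is expressed as a fraction of the image width and h as a fraction of the image height, so without this factor a box with ratio 1 would only be square on square inputs.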
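To make the effect of the in_height / in_width factor concrete, here is a minimal sketch in plain Python. The helper name anchor_wh and the 300x600 input are made up for illustration and are not from the book; the formulas follow the corrected snippet above, with w read as a fraction of the image width and h as a fraction of the image height.

```python
import math

def anchor_wh(s, r, in_height, in_width):
    """Normalized (w, h) for size s and ratio r (hypothetical helper)."""
    w = s * math.sqrt(r) * in_height / in_width  # fraction of image width
    h = s / math.sqrt(r)                         # fraction of image height
    return w, h

in_height, in_width = 300, 600  # assumed rectangular input, twice as wide as tall

# With the correction, a ratio-1 anchor is square when measured in pixels.
w, h = anchor_wh(0.5, 1.0, in_height, in_width)
pixel_w = w * in_width
pixel_h = h * in_height
print(pixel_w, pixel_h)  # both come out to 150 pixels here

# Without the correction, the same anchor would be stretched horizontally.
uncorrected_pixel_w = 0.5 * math.sqrt(1.0) * in_width
print(uncorrected_pixel_w / pixel_h)  # aspect ratio in pixels, no longer 1
```

On a square input the factor equals 1, so the correction changes nothing there.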

You can look at this for a better understanding of the algorithm.
The code above can be thought of as a vectorized version of the same algorithm, written for performance: implementing multibox_prior with nested Python loops would be very slow.
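As a rough check of the "vectorized version" reading, the sketch below builds the per-anchor offsets twice: once with the tensor expressions from the corrected snippet, and once with an explicit Python loop over (size, ratio) pairs. The sizes, ratios, and the 4x6 feature map are example values chosen here, not taken from the thread.

```python
import math
import torch

sizes, ratios = [0.75, 0.5, 0.25], [1.0, 2.0, 0.5]  # example values (assumed)
in_height, in_width = 4, 6                           # hypothetical feature-map size

size_tensor = torch.tensor(sizes)
ratio_tensor = torch.tensor(ratios)

# Vectorized: (all sizes, first ratio) followed by (first size, remaining ratios)
w = torch.cat((size_tensor * torch.sqrt(ratio_tensor[0]),
               sizes[0] * torch.sqrt(ratio_tensor[1:]))) * in_height / in_width
h = torch.cat((size_tensor / torch.sqrt(ratio_tensor[0]),
               sizes[0] / torch.sqrt(ratio_tensor[1:])))
vectorized = torch.stack((-w, -h, w, h)).T / 2  # one row of offsets per anchor shape

# Loop version: the same pairs, one anchor shape at a time
pairs = [(s, ratios[0]) for s in sizes] + [(sizes[0], r) for r in ratios[1:]]
rows = []
for s, r in pairs:
    bw = s * math.sqrt(r) * in_height / in_width
    bh = s / math.sqrt(r)
    rows.append([-bw / 2, -bh / 2, bw / 2, bh / 2])
looped = torch.tensor(rows)

print(torch.allclose(vectorized, looped))

# Repeating the rows once per pixel gives the full offset table
anchor_manipulations = vectorized.repeat(in_height * in_width, 1)
print(anchor_manipulations.shape)
```

The loop here only runs over anchor shapes; a fully looped multibox_prior would also iterate over every pixel, which is where the cost grows.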

sayakpaul · January 16, 2021, 2:24am

Thank you so much! It’s clear now. Maybe reflecting all of this in the text would make it more comprehensive.

anirudh · January 16, 2021, 2:26am

Sure! We’ll try to update the text, mentioning more implementation details.

Peter_Boshra · January 18, 2021, 7:47pm

Therefore, we are usually only interested in a combination containing s1 or r1 sizes and aspect ratios, that is:

(s1,r1),(s1,r2),…,(s1,rm),(s2,r1),(s3,r1),…,(sn,r1).

That is, the number of anchor boxes centered on the same pixel is n+m−1. For the entire input image, we will generate a total of wh(n+m−1) anchor boxes.

I think that if we have n sizes and m ratios, then there are n*m possible combinations for the same pixel. Why is it n+m-1?
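The two counts can be compared directly. This is a small sketch with made-up sizes and ratios: the full cross product does have n*m pairs, but the quoted passage keeps only the pairs that contain s1 or r1, which appears to be a deliberate reduction rather than a miscount.

```python
sizes = [0.75, 0.5, 0.25]  # n = 3, example values (assumed)
ratios = [1, 2, 0.5]       # m = 3, example values (assumed)

# Every size paired with every ratio: n * m combinations
all_pairs = [(s, r) for s in sizes for r in ratios]

# Only pairs containing the first size or the first ratio, in the quoted order:
# (s1, r1), (s1, r2), ..., (s1, rm), (s2, r1), ..., (sn, r1)
kept_pairs = [(sizes[0], r) for r in ratios] + [(s, ratios[0]) for s in sizes[1:]]

n, m = len(sizes), len(ratios)
print(len(all_pairs), n * m)       # full cross product
print(len(kept_pairs), n + m - 1)  # reduced set from the quoted passage
```

The pair (s1, r1) belongs to both groups, which is why 1 is subtracted from n + m.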

HAITHAM · May 28, 2021, 10:35am

Why do they take the square root of the ratios?
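One way to see it (a sketch, not an answer from the thread's authors): with w = s * sqrt(r) and h = s / sqrt(r), as in the code discussed above, the quotient w / h equals r and the product w * h equals s squared. So the ratio changes the shape of the box while the size alone controls its area. The values of s and r below are arbitrary.

```python
import math

s = 0.5  # example size (assumed)
for r in (0.5, 1.0, 2.0):  # example ratios (assumed)
    w = s * math.sqrt(r)
    h = s / math.sqrt(r)
    # The aspect ratio is r and the area is s**2, whatever r is
    assert math.isclose(w / h, r)
    assert math.isclose(w * h, s ** 2)

# Without the square root, the area would change with the ratio
r = 2.0
w_plain, h_plain = s * r, s / r
print(w_plain / h_plain)  # r squared, not r
```

Splitting the ratio evenly between width and height through the square root is what keeps both properties at once.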