Should the formula be more explicit, something like w=... and h=..., since I think that is what the code is doing?
When computing w and h, the code includes extra terms, and I would like to know the reason for them. Specifically, * in_height / in_width / 2 for w and / 2 for h.
Hi @sayakpaul,
We divide by 2 in both cases because w and h are measured about the box centre: to create the anchor box corner coordinates (xmin, xmax, ymin, ymax) later, we move half a width and half a height away from the centre. The division can be done once (for brevity) while creating the anchor_manipulations tensor, as shown below.
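To make the halving concrete, here is a minimal sketch in plain Python (not the tensor code discussed in this thread). The helper name center_to_corners is hypothetical; it assumes a box given by its centre and its full width and height, and returns the corners in (xmin, ymin, xmax, ymax) order.

```python
def center_to_corners(cx, cy, w, h):
    """Convert a centre-format box to corner format.

    Hypothetical helper for illustration: each corner sits half a width
    and half a height away from the centre, which is where the / 2 comes from.
    """
    half_w, half_h = w / 2, h / 2
    return (cx - half_w, cy - half_h, cx + half_w, cy + half_h)


# A box centred at (0.5, 0.5) with width 0.4 and height 0.2.
box = center_to_corners(0.5, 0.5, 0.4, 0.2)
```

Halving w and h once up front, rather than in each of the four corner expressions, gives the same result with less repetition.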
Thanks for raising the question about the other part. There was a small bug, which is now fixed in master. The correct code should be:
The in_height / in_width factor is there to handle rectangular inputs, since SSD was originally developed for square images (300x300).
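The effect of the in_height / in_width factor can be checked numerically. The sketch below is plain Python, not the library code. anchor_wh is a hypothetical helper that assumes w is expressed as a fraction of the image width and h as a fraction of the image height, with w = s * sqrt(r) * in_height / in_width and h = s / sqrt(r). Under that assumption, converting back to pixels recovers the aspect ratio r even for a non-square input.

```python
import math


def anchor_wh(s, r, in_height, in_width):
    """Normalized (w, h) for scale s and aspect ratio r (hypothetical helper).

    w is a fraction of the image width, h a fraction of the image height.
    """
    w = s * math.sqrt(r) * in_height / in_width  # correction for rectangular inputs
    h = s / math.sqrt(r)
    return w, h


in_height, in_width = 300, 600  # a rectangular input (assumed sizes)
w, h = anchor_wh(0.5, 2.0, in_height, in_width)

# Convert back to pixels and measure the aspect ratio that is actually drawn.
pixel_ratio = (w * in_width) / (h * in_height)
```

For a square input the factor equals 1 and the formula reduces to the plain s * sqrt(r). Without the factor, the same normalized w would be stretched by in_width / in_height when mapped back to pixels.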
You can look at this for a better understanding of the algorithm.
The code can be thought of as a vectorized version of the same algorithm, written that way for performance. Implementing multibox_prior with nested Python loops would be very slow.
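For comparison, a loop-based version is easy to read but performs one Python-level iteration per pixel and per box shape. This is only an illustrative sketch: anchors_with_loops is a hypothetical name, the pixel-centre offset of 0.5 is an assumption, and it is not the multibox_prior implementation itself.

```python
def anchors_with_loops(in_height, in_width, shapes):
    """Generate corner-format anchors with explicit loops (illustrative only).

    shapes is a list of normalized (w, h) pairs shared by every pixel.
    """
    boxes = []
    for i in range(in_height):
        cy = (i + 0.5) / in_height  # assumed: centre of pixel row i
        for j in range(in_width):
            cx = (j + 0.5) / in_width  # assumed: centre of pixel column j
            for w, h in shapes:
                boxes.append((cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2))
    return boxes


# A tiny 2x3 grid with two box shapes per pixel.
boxes = anchors_with_loops(2, 3, [(0.5, 0.5), (0.2, 0.4)])
```

A vectorized version computes the same centre offsets for all pixels at once with tensor operations, which avoids the per-pixel interpreter overhead of the three nested loops.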
Therefore, we are usually only interested in a combination containing $s_1$ or $r_1$ sizes and aspect ratios, that is:
$(s_1, r_1), (s_1, r_2), \ldots, (s_1, r_m), (s_2, r_1), (s_3, r_1), \ldots, (s_n, r_1)$.
That is, the number of anchor boxes centered on the same pixel is $n+m-1$. For the entire input image, we will generate a total of $wh(n+m-1)$ anchor boxes.
I think that if we have n sizes and m ratios, then the number of all combinations for the same pixel is n*m. Why n+m-1?
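One way to see where n+m-1 comes from: all n*m pairs are possible, but the quoted passage keeps only the pairs that contain the first size or the first ratio. There are n pairs with the first ratio and m pairs with the first size, and the pair made of both is counted twice, which gives n + m - 1. A short sketch with made-up sizes and ratios (the values are illustrative only):

```python
sizes = [0.75, 0.5, 0.25]   # n = 3 (illustrative values)
ratios = [1.0, 2.0, 0.5]    # m = 3 (illustrative values)

# Every possible (size, ratio) pair.
all_pairs = [(s, r) for s in sizes for r in ratios]

# Keep only pairs that contain the first size or the first ratio.
kept = [(s, r) for (s, r) in all_pairs if s == sizes[0] or r == ratios[0]]

print(len(all_pairs))  # 9, i.e. n * m
print(len(kept))       # 5, i.e. n + m - 1
```

So n*m is the count of all combinations, while n+m-1 is the count of the restricted subset described in the quoted text.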