Anchor Boxes

astonzhang · June 29, 2020, 10:32pm

https://d2l.ai/chapter_computer-vision/anchor.html

Wolf_Rage · August 7, 2020, 5:55am

Can anyone please explain this statement in more detail…

Therefore, we are usually only interested in a combination containing s1 or r1 sizes and aspect ratios, that is

And this…

That is, the number of anchor boxes centered on the same pixel is n+m−1

Thanks for your time.

goldpiggy · August 7, 2020, 11:29pm

Hey @Wolf_Rage, great question!

It can be interpreted as

Therefore, we only combine size s1 with all the ratios {r1, r2, …, rm}, and ratio r1 with all the sizes {s1, s2, …, sn}, that is: (s1, r1), (s1, r2), …, (s1, rm), (s2, r1), (s3, r1), …, (sn, r1).

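Hence, we have n+m−1 combinations in total: m pairs that contain s1, plus n pairs that contain r1, minus 1 because (s1, r1) appears in both lists. Does it make sense to you now?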
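
The counting argument can be checked with a minimal sketch. The helper name `anchor_shapes` and the example sizes and ratios are illustrative assumptions, not part of any library:

```python
def anchor_shapes(sizes, ratios):
    """Return the (size, ratio) pairs that contain sizes[0] or ratios[0]."""
    # Pair the first size with every ratio: (s1, r1), (s1, r2), ..., (s1, rm).
    pairs = [(sizes[0], r) for r in ratios]
    # Pair every remaining size with the first ratio: (s2, r1), ..., (sn, r1).
    # Starting at sizes[1:] avoids counting (s1, r1) twice.
    pairs += [(s, ratios[0]) for s in sizes[1:]]
    return pairs


sizes = [0.75, 0.5, 0.25]  # n = 3 (example values)
ratios = [1, 2, 0.5]       # m = 3 (example values)
pairs = anchor_shapes(sizes, ratios)
print(pairs)
print(len(pairs))  # n + m - 1 = 5
```

Using all n·m combinations instead would give 9 shapes per pixel here, which is why only the pairs containing s1 or r1 are kept.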

dayekuaipao · August 17, 2020, 3:33am

Hello, I have a question about how anchors are labeled. It seems we need a loop to generate labels in the step shown in Figure 13.4.2, which might be slow. I found another way to generate labels: just find the largest IoU and check whether it is larger than the threshold. What's the difference between the two ways?
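
One way to see a possible difference is to run both strategies on a small IoU matrix. The sketch below is an assumption-laden illustration, not the book's implementation: `assign_greedy` follows the spirit of Figure 13.4.2, `assign_threshold_only` is one reading of the alternative described above (best box per anchor, then a threshold), and the threshold of 0.5 and the IoU values are made up.

```python
import numpy as np


def assign_greedy(iou, threshold=0.5):
    """Two-stage assignment; iou has shape (num_anchors, num_gt).

    Returns, per anchor, the index of its ground-truth box or -1 (background).
    """
    iou = np.asarray(iou, dtype=float)
    num_anchors, num_gt = iou.shape
    assigned = np.full(num_anchors, -1)
    # Stage 1: repeatedly take the global maximum, then discard its row and
    # column, so every ground-truth box receives exactly one anchor.
    work = iou.copy()
    for _ in range(min(num_anchors, num_gt)):
        a, g = np.unravel_index(np.argmax(work), work.shape)
        assigned[a] = g
        work[a, :] = -1.0
        work[:, g] = -1.0
    # Stage 2: each remaining anchor takes its best box if the IoU is high enough.
    for a in range(num_anchors):
        if assigned[a] == -1:
            g = np.argmax(iou[a])
            if iou[a, g] >= threshold:
                assigned[a] = g
    return assigned


def assign_threshold_only(iou, threshold=0.5):
    """Per-anchor argmax followed by a threshold, with no first stage."""
    iou = np.asarray(iou, dtype=float)
    best = np.argmax(iou, axis=1)
    best_iou = iou[np.arange(iou.shape[0]), best]
    return np.where(best_iou >= threshold, best, -1)


# Rows are anchors, columns are ground-truth boxes (made-up IoU values).
iou = np.array([[0.6, 0.1],
                [0.7, 0.2],
                [0.1, 0.3],
                [0.0, 0.2]])
greedy = assign_greedy(iou)
simple = assign_threshold_only(iou)
print(greedy)
print(simple)
```

In this example the second ground-truth box never reaches an IoU of 0.5 with any anchor, so the threshold-only rule leaves it without a single anchor, while the two-stage rule still gives it its best match. Note also that the loop in the first stage runs once per ground-truth box, not once per anchor.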

Wolf_Rage · August 23, 2020, 4:10am

Yes, now it totally makes sense. Thanks a lot.

Aman_Singh · September 24, 2020, 2:22pm

Assume the size is s∈(0,1]

What is meant by size ‘s’ of anchor boxes as stated in section 13.4.1? How should I visualize it in my head? I can understand that a box might have its height and width but what does this number called size denote?

Thank You.

goldpiggy · September 25, 2020, 12:01am

Hi @Aman_Singh, great question! As stated in Section 13.4.1,

Assume the size is s∈(0,1], the aspect ratio is r>0, and the width and height of the anchor box are ws√r and hs/√r, respectively.

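You can think of the size as a scale factor applied to the image's width and height to obtain the anchor box: by this formula, with r = 1 the box is s times as wide and s times as tall as the image.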

Aman_Singh · September 25, 2020, 6:43pm

Got it. Thanks a lot !!

rezahabibi96 · September 26, 2020, 12:18pm

Hi, in training mode, it is not hard to assign a category label (by Fig. 13.4.2) and an offset value (by Eq. 13.4.3) to an anchor box because we have the ground truth. But how do we set the category label and offset value in prediction mode, where we don't have ground-truth bounding boxes? We still need the category label and offset value to calculate the class probability for each anchor box. The example given for prediction mode is trivial since it sets all of the anchor boxes as predicted bounding boxes. Thank you.

goldpiggy · September 28, 2020, 4:47am

Hey @rezahabibi96, great question! If I understand correctly, you are asking how to deal with a training set that has no “ground truth” labels? In that case, the most efficient way is to label some examples manually and then apply transfer learning. If it is a commonly seen object, you only need ~100 labels with transfer learning.

rezahabibi96 · September 28, 2020, 5:56am

No, I am not asking about annotating the training set. I am asking about prediction mode: how do we calculate the offset value and assign the category label in prediction mode?

goldpiggy · September 29, 2020, 3:58am

Check the YOLO section here.
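
For context, in prediction mode a trained detector typically outputs class scores and offsets for every anchor box itself, so no ground truth is needed at that point; predicted boxes are then recovered by inverting the offset encoding. The following is a minimal sketch under stated assumptions: the helper name `decode_offsets` is hypothetical, the offsets are assumed to be encoded in the center/size form of Eq. 13.4.3, and the constants 0.1 and 0.2 are assumed scaling values.

```python
import numpy as np


def decode_offsets(anchors, offsets, sigma_xy=0.1, sigma_wh=0.2):
    """Turn anchors plus predicted offsets into predicted corner boxes.

    anchors: (N, 4) boxes as (x1, y1, x2, y2).
    offsets: (N, 4) values (dx, dy, dw, dh), e.g. produced by a network.
    sigma_xy, sigma_wh: assumed scaling constants of the encoding.
    """
    anchors = np.asarray(anchors, dtype=float)
    offsets = np.asarray(offsets, dtype=float)
    # Convert anchors from corner form to center form.
    wa = anchors[:, 2] - anchors[:, 0]
    ha = anchors[:, 3] - anchors[:, 1]
    xa = anchors[:, 0] + 0.5 * wa
    ya = anchors[:, 1] + 0.5 * ha
    # Invert the encoding: shift the center, rescale width and height.
    xb = offsets[:, 0] * sigma_xy * wa + xa
    yb = offsets[:, 1] * sigma_xy * ha + ya
    wb = np.exp(offsets[:, 2] * sigma_wh) * wa
    hb = np.exp(offsets[:, 3] * sigma_wh) * ha
    # Convert back to corner form.
    return np.stack([xb - wb / 2, yb - hb / 2, xb + wb / 2, yb + hb / 2], axis=1)


anchors = np.array([[0.1, 0.1, 0.5, 0.5]])
unchanged = decode_offsets(anchors, np.zeros((1, 4)))
shifted = decode_offsets(anchors, np.array([[1.0, 0.0, 0.0, 0.0]]))
print(unchanged)
print(shifted)
```

A zero offset returns the anchor unchanged, and dx = 1 moves the box right by 0.1 times the anchor width. The category label for each predicted box would come from the highest predicted class score, usually followed by non-maximum suppression.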

tomsoya · October 24, 2020, 7:00pm

I’m not sure I understand the following line of code:

X = np.random.uniform(size=(1, 3, h, w))  # Construct input data

What exactly are the input data, and why is Y not affected by the first two dimensions?

goldpiggy · October 27, 2020, 12:41am

Hi @tomsoya, great question. In this line, we generate random numbers to simulate an input, which is enough to compute the shapes of the input and output. X should be a 4D tensor (i.e., a batch of RGB images): (batch_size, RGB_channels, height, width).

Let me know if that makes sense to you.
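
A minimal sketch of why only the last two dimensions matter, assuming an anchor generator that places the same set of anchor shapes on every pixel and therefore reads only the height and width of `X`. The image size and the sizes and ratios below are example values.

```python
import numpy as np

h, w = 561, 728  # example image height and width in pixels
X = np.random.uniform(size=(1, 3, h, w))  # Construct input data

# Only the spatial dimensions determine where anchors are placed.
batch_size, channels, height, width = X.shape

sizes = [0.75, 0.5, 0.25]  # example sizes, n = 3
ratios = [1, 2, 0.5]       # example aspect ratios, m = 3
boxes_per_pixel = len(sizes) + len(ratios) - 1

# One set of anchors per pixel, independent of batch size and channels.
num_anchors = height * width * boxes_per_pixel
print(X.shape)
print(num_anchors)  # 561 * 728 * 5 = 2042040
```

The pixel values in `X` are never read in this computation, which is why random numbers are sufficient as a placeholder input.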

tomsoya · October 27, 2020, 11:20am

I understand. Thank you!

smizerex · November 21, 2020, 9:29pm

How do we do exercise 2 in this section?

chris_elgoog · December 15, 2020, 12:47pm

In Section 13.4.1, I understand that

w is the width of the image,
h is the height of the image,
r is the aspect ratio of the anchor box, and
s is the size of the anchor box.

If this were true, and the width and height of the anchor box were computed as described in the text by
w_b = w s \sqrt{r}
h_b = h s / \sqrt{r}
then the aspect ratio of the anchor box would be

w_b / h_b = r w/h
which equals r only when w = h, so this is obviously wrong. With the definitions above, the width and height of the anchor box would have to be computed as:
g := \sqrt{w h}

w_b = g s \sqrt{r}
h_b = g s / \sqrt{r}
So the definitions of s or r are probably not what I assume. How exactly are they defined?

chris_elgoog · December 29, 2020, 3:50pm

From https://github.com/apache/incubator-mxnet/blob/master/src/operator/contrib/multibox_prior.cc
I worked out how the height and width of the bounding boxes are computed. Here it is as Python code for readability:

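import numpy as np

# s: size, r: aspect ratio, (h, w): image height and width in pixels
hb_factor = s / np.sqrt(r)          # box height as a fraction of image height
hb = hb_factor * h                  # height of bounding box
wb_factor = s * np.sqrt(r) * h / w  # box width as a fraction of image width
wb = wb_factor * w                  # width of bounding box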

So the width of the bounding box is computed as w_b = w * s * h/w * \sqrt{r}, which simplifies to w_b = h * s * \sqrt{r} and gives w_b/h_b = r. This contradicts the text.
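
The three candidate formulas discussed in this thread can be compared numerically. This is a small sketch with hypothetical function names and an example image size; `box_operator` follows the Python transcription in the post above rather than the C++ source itself.

```python
import math


def box_text(w, h, s, r):
    """Width and height as written in the text: w*s*sqrt(r), h*s/sqrt(r)."""
    return w * s * math.sqrt(r), h * s / math.sqrt(r)


def box_geometric_mean(w, h, s, r):
    """Variant proposed earlier in the thread, based on g = sqrt(w*h)."""
    g = math.sqrt(w * h)
    return g * s * math.sqrt(r), g * s / math.sqrt(r)


def box_operator(w, h, s, r):
    """Variant transcribed from the operator source in the post above."""
    hb = s / math.sqrt(r) * h
    wb = s * math.sqrt(r) * h / w * w  # simplifies to h * s * sqrt(r)
    return wb, hb


w, h, s, r = 728, 561, 0.5, 2.0  # example image size, size, and aspect ratio
results = {}
for name, fn in [("text", box_text),
                 ("geometric mean", box_geometric_mean),
                 ("operator", box_operator)]:
    wb, hb = fn(w, h, s, r)
    results[name] = wb / hb
    print(f"{name}: w_b={wb:.1f}, h_b={hb:.1f}, w_b/h_b={wb / hb:.3f}")
```

For this non-square image, the formula from the text yields a pixel aspect ratio of r·w/h (about 2.6), while the other two yield exactly r = 2. The two consistent variants differ only in scale: one measures the box against the image height, the other against the geometric mean of width and height.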

mwu · January 2, 2021, 1:42am

Do you compare pixels when determining the intersection of a bounding box and an anchor box? E.g., compare the pixel value at position (x,y) in the anchor box with the pixel value at position (i,j) in the bounding box over the intersection region.

Is intersection the same as overlap? Namely, do the two boxes show exactly the same picture in that portion?

Let us say I take 2 pictures of one cat from different angles. What is the logic or mathematical criterion that tells which parts of these two pictures overlap?

The formula and calculation seem simple, but I am puzzled about how to determine the overlapping portion.

Thanks !
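
For reference, the intersection used in IoU is computed purely from box coordinates within a single image; pixel values are never compared. A minimal sketch, assuming boxes given as (x1, y1, x2, y2) corners in the same coordinate system (the function name `box_iou` is illustrative):

```python
def box_iou(box_a, box_b):
    """Intersection over union of two axis-aligned boxes (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # The overlap is itself a rectangle: its corners are the larger of the
    # two upper-left corners and the smaller of the two lower-right corners.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


print(box_iou((0, 0, 2, 2), (1, 1, 3, 3)))  # overlap area 1, union area 7
print(box_iou((0, 0, 2, 2), (0, 0, 2, 2)))  # identical boxes
print(box_iou((0, 0, 1, 1), (2, 2, 3, 3)))  # disjoint boxes
```

Because both boxes are rectangles drawn on the same image, "overlap" simply means the region of that image covered by both. Two photos of the same cat taken from different angles are different images, so this measure does not apply between them.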

sayakpaul · January 13, 2021, 3:12am

Thanks for providing this additional explanation. I think it should be included in the chapter itself.