 # Anchor Boxes

Can anyone please explain this statement further…

Therefore, we are usually only interested in a combination containing s1 or r1 sizes and aspect ratios, that is

And this…

That is, the number of anchor boxes centered on the same pixel is n+m−1

Hey @Wolf_Rage, great question!

It can be interpreted as

Therefore, we only combine size $s_1$ with all the ratios $\{r_1, r_2, \ldots, r_m\}$, and ratio $r_1$ with all the sizes $\{s_1, s_2, \ldots, s_n\}$, that is, $(s_1, r_1), (s_1, r_2), \ldots, (s_1, r_m), (s_2, r_1), (s_3, r_1), \ldots, (s_n, r_1)$.

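Hence, we have n+m−1 combinations in total: m pairs that use s1 plus n pairs that use r1, minus one because (s1, r1) would otherwise be counted twice. Does it make sense to you now?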
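To make the count concrete, here is a minimal sketch that enumerates the pairs for some illustrative sizes and ratios. The function name `anchor_combinations` and the values are made up for this example and are not from the book.

```python
def anchor_combinations(sizes, ratios):
    """Pair the first size with every ratio, then every other size with the first ratio."""
    first_size_pairs = [(sizes[0], r) for r in ratios]       # m pairs
    first_ratio_pairs = [(s, ratios[0]) for s in sizes[1:]]  # n - 1 pairs, (s1, r1) is already listed
    return first_size_pairs + first_ratio_pairs


sizes = [0.75, 0.5, 0.25]  # n = 3, illustrative values
ratios = [1, 2, 0.5]       # m = 3, illustrative values
combos = anchor_combinations(sizes, ratios)
print(combos)
print(len(combos))  # n + m - 1 = 5
```

The full cross product would give n·m = 9 boxes per pixel here, so restricting to pairs that contain s1 or r1 keeps the count linear in n and m.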


Hello, I have a question about how anchor boxes are labeled. It seems we need a loop to generate labels in the step shown in Figure 13.4.2, and that might be slow. I found another way to generate labels: just find the largest IoU and check whether it is larger than the threshold. What is the difference between the two ways?
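One way to see the difference is on a small IoU matrix. The sketch below is an assumption about what the two approaches amount to, not the book's code: `assign_by_threshold` is the argmax-plus-threshold shortcut, and `assign_with_guarantee` adds a loop that repeatedly takes the largest remaining IoU so that every ground-truth box ends up with at least one anchor. The function names, the threshold of 0.5, and the matrix values are all made up for illustration.

```python
import numpy as np


def assign_by_threshold(iou, threshold=0.5):
    """For each anchor (row), take the ground-truth box with the largest IoU; -1 if below threshold."""
    best_gt = iou.argmax(axis=1)
    best_iou = iou.max(axis=1)
    return np.where(best_iou >= threshold, best_gt, -1)


def assign_with_guarantee(iou, threshold=0.5):
    """Threshold assignment plus a loop that gives each ground-truth box its best remaining anchor."""
    assignment = assign_by_threshold(iou, threshold)
    remaining = iou.astype(float).copy()
    for _ in range(min(iou.shape)):
        a, g = np.unravel_index(remaining.argmax(), remaining.shape)
        assignment[a] = g
        remaining[a, :] = -1.0  # discard this anchor's row
        remaining[:, g] = -1.0  # discard this ground-truth box's column
    return assignment


# Rows are anchors, columns are ground-truth boxes (illustrative values).
iou = np.array([
    [0.70, 0.10],
    [0.60, 0.30],
    [0.20, 0.40],
    [0.05, 0.05],
])
simple = assign_by_threshold(iou)
guaranteed = assign_with_guarantee(iou)
print(simple)      # ground-truth box 1 is never assigned: its best IoU is 0.40
print(guaranteed)  # anchor 2 is assigned to ground-truth box 1 despite the low IoU
```

In this example the shortcut leaves ground-truth box 1 without any anchor, while the loop still matches it to its best anchor. When every ground-truth box has some anchor above the threshold, the two approaches can coincide.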

Yes, now it totally makes sense. Thanks a lot.

Assume the size is s∈(0,1]

What is meant by size ‘s’ of anchor boxes as stated in section 13.4.1? How should I visualize it in my head? I can understand that a box might have its height and width but what does this number called size denote?

Thank You.

Hi @Aman_Singh, great question! As stated in Section 13.4.1,

Assume the size is $s \in (0, 1]$, the aspect ratio is $r > 0$, and the width and height of the anchor box are $ws\sqrt{r}$ and $hs/\sqrt{r}$, respectively.

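You can think of the size as a scale factor relative to the input image: with r = 1, the anchor box is s·w wide and s·h tall, where w and h are the width and height of the image. The aspect ratio r then stretches the width and shrinks the height (or the reverse) without changing the area.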
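A quick numeric check of the quoted formula may help with the visualization. This is a minimal sketch; the helper name `anchor_width_height` and the 400×400 image are assumptions chosen for easy arithmetic.

```python
import math


def anchor_width_height(image_w, image_h, s, r):
    """Return (width, height) in pixels from the formula w*s*sqrt(r), h*s/sqrt(r)."""
    width = image_w * s * math.sqrt(r)
    height = image_h * s / math.sqrt(r)
    return width, height


# Illustrative 400 x 400 image with size s = 0.5.
square = anchor_width_height(400, 400, s=0.5, r=1)  # (200.0, 200.0)
wide = anchor_width_height(400, 400, s=0.5, r=4)    # (400.0, 100.0)
print(square, wide)
print(square[0] * square[1], wide[0] * wide[1])     # both areas are 40000.0
```

So s sets how much of the image the box covers (its area is s² times the image area), and r only redistributes that area between width and height.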

Got it. Thanks a lot !!

Hi, in training mode it is not hard to assign a category label (by Fig. 13.4.2) and an offset value (by Eq. 13.4.3) to an anchor box because we have the ground truth. But how do we set the category label and offset value in prediction mode, where we don’t have a ground-truth bounding box? We still need the category label and offset value to calculate the class probability for each anchor box. The example given for prediction mode is trivial since it sets all of the anchor boxes to be predicted bounding boxes. Thank you.

Hey @rezahabibi96, great question! If I understand correctly, you are asking how to deal with a training set that has no “ground truth” labels? In that case, the most efficient way is to label some examples manually and then apply transfer learning. If it is a commonly seen object, you only need ~100 labels with transfer learning.

No, I am not asking about annotating the training set. I am asking about prediction mode: how do we calculate the offset value and assign the category label in prediction mode?

Check the YOLO section here.
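On the prediction-mode question above: in SSD-style detectors, nothing is assigned from ground truth at prediction time. The trained model outputs class scores and four offsets for every anchor box, and the offsets are inverted to get the predicted boxes. Below is a minimal sketch of that inversion. It assumes corner-format anchors and center/size offsets scaled by 10 and 5; the name `decode_offsets` and those constants are assumptions for illustration and should be checked against the section's own offset equation.

```python
import numpy as np


def decode_offsets(anchors, offsets, xy_scale=10.0, wh_scale=5.0):
    """Apply predicted offsets to anchors given as (N, 4) corners (x1, y1, x2, y2)."""
    wh = anchors[:, 2:] - anchors[:, :2]       # anchor widths and heights
    centers = anchors[:, :2] + wh / 2          # anchor centers
    pred_centers = offsets[:, :2] * wh / xy_scale + centers
    pred_wh = np.exp(offsets[:, 2:] / wh_scale) * wh
    # Convert back from center/size to corner format.
    return np.concatenate([pred_centers - pred_wh / 2, pred_centers + pred_wh / 2], axis=1)


anchors = np.array([[0.1, 0.1, 0.5, 0.5]])
unchanged = decode_offsets(anchors, np.zeros((1, 4)))             # zero offsets keep the anchor
shifted = decode_offsets(anchors, np.array([[1.0, 0.0, 0.0, 0.0]]))  # moves the box right by 0.04
print(unchanged, shifted)
```

The category for each anchor is then simply the class with the highest predicted score, and overlapping predictions are usually pruned with non-maximum suppression.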

I’m not sure I understand the following line of code:

```
X = np.random.uniform(size=(1, 3, h, w))  # Construct input data
```

What exactly are the input data, and why is Y not affected by the first two dimensions?
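A plausible reading of the line is that X is a dummy batch (1 image, 3 channels, height h, width w) whose values do not matter, because anchor boxes are purely geometric and depend only on the height and width. The sketch below illustrates that with a hypothetical generator, `anchor_grid`, written with plain NumPy. It is not the book's function, only an assumption about the general idea.

```python
import numpy as np


def anchor_grid(data, sizes, ratios):
    """Hypothetical generator: anchors per pixel, computed only from the last two dims of data."""
    h, w = data.shape[-2:]
    # Pairs containing the first size or the first ratio: n + m - 1 in total.
    pairs = [(sizes[0], r) for r in ratios] + [(s, ratios[0]) for s in sizes[1:]]
    # Pixel centers in normalized [0, 1] coordinates.
    cy, cx = np.meshgrid((np.arange(h) + 0.5) / h, (np.arange(w) + 0.5) / w, indexing="ij")
    boxes = []
    for s, r in pairs:
        half_w, half_h = s * np.sqrt(r) / 2, s / np.sqrt(r) / 2
        boxes.append(np.stack([cx - half_w, cy - half_h, cx + half_w, cy + half_h], axis=-1))
    # (h, w, n + m - 1, 4) -> (1, h * w * (n + m - 1), 4)
    return np.stack(boxes, axis=2).reshape(1, -1, 4)


h, w = 4, 6
a = anchor_grid(np.random.uniform(size=(1, 3, h, w)), [0.75, 0.5], [1, 2, 0.5])
b = anchor_grid(np.zeros((8, 16, h, w)), [0.75, 0.5], [1, 2, 0.5])
print(a.shape)               # (1, 96, 4): 4 * 6 pixels times 2 + 3 - 1 anchors each
print(np.array_equal(a, b))  # True: batch size, channels, and pixel values play no role
```

In this sketch the batch and channel dimensions are never read, which is why changing them leaves the output untouched.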