Can anyone please explain this statement more…

Therefore, we are usually only interested in a combination containing s1 or r1 sizes and aspect ratios, that is

And this…

That is, the number of anchor boxes centered on the same pixel is $n+m-1$

Thanks for your time.

Hey @Wolf_Rage, great question!

It can be interpreted as

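Therefore, we only cross-combine size $s_1$ with all the ratios $\{r_1, r_2, \ldots, r_m\}$, and cross-combine ratio $r_1$ with all the sizes $\{s_1, s_2, \ldots, s_n\}$, that is, $(s_1, r_1), (s_1, r_2), \ldots, (s_1, r_m), (s_2, r_1), (s_3, r_1), \ldots, (s_n, r_1)$.

Hence, we have $n+m-1$ combinations in total: $m$ pairs containing $s_1$ plus $n$ pairs containing $r_1$, with $(s_1, r_1)$ counted only once. Does it make sense to you now?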
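To make the count concrete, here is a minimal sketch in plain Python. The helper name `anchor_combinations` and the example sizes and ratios are only illustrative assumptions, not taken from the book's implementation.

```python
def anchor_combinations(sizes, ratios):
    """Return the (size, ratio) pairs that contain sizes[0] or ratios[0]."""
    # Pair the first size s1 with every ratio: m pairs, including (s1, r1).
    pairs = [(sizes[0], r) for r in ratios]
    # Pair the first ratio r1 with the remaining sizes: n - 1 more pairs,
    # because (s1, r1) is already in the list.
    pairs += [(s, ratios[0]) for s in sizes[1:]]
    return pairs


sizes = [0.75, 0.5, 0.25]  # n = 3 (illustrative values)
ratios = [1, 2, 0.5]       # m = 3 (illustrative values)
pairs = anchor_combinations(sizes, ratios)
print(pairs)
print(len(pairs))  # n + m - 1 = 5
```

Using all size and ratio pairs instead would give n * m = 9 boxes per pixel for the same inputs.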

Hello, I have a question about how anchors are labeled. It seems we need a loop to generate labels in the step shown in Figure 13.4.2, and that might be slow. I found another way to generate labels: just find the largest IoU and check whether it is larger than the threshold. What's the difference between the two ways?

Yes, now it totally makes sense. Thanks a lot.

Assume the size is $s\in(0,1]$

What is meant by the size "s" of anchor boxes, as stated in Section 13.4.1? How should I visualize it in my head? I can understand that a box has a height and a width, but what does this number called size denote?

Thank You.

Hi @Aman_Singh, great question! As stated in Section 13.4.1,

Assume the size is $s\in(0,1]$, the aspect ratio is $r>0$, and the width and height of the anchor box are $ws\sqrt{r}$ and $hs/\sqrt{r}$, respectively.

You can think of the size as the ratio *by which the height and width* of the input image are scaled to get the anchor box: with $r = 1$, the anchor box has width $ws$ and height $hs$.
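As a rough illustration of the quoted formula, the sketch below computes an anchor box's width and height from `s` and `r`. The function name `anchor_wh` and the 100×100 image are assumptions made for this example only.

```python
import math


def anchor_wh(w, h, s, r):
    """Anchor width and height from the quoted formula:
    width = w * s * sqrt(r), height = h * s / sqrt(r)."""
    return w * s * math.sqrt(r), h * s / math.sqrt(r)


# Hypothetical 100x100 image: s = 0.5 and r = 1 halve both sides.
print(anchor_wh(100, 100, 0.5, 1))  # (50.0, 50.0)

# Same size, r = 4: the box gets wider and flatter.
print(anchor_wh(100, 100, 0.5, 4))  # (100.0, 25.0)
```

In both cases the area is `w * h * s**2`, so `s` controls how large the box is and `r` only changes its shape.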

Got it. Thanks a lot !!

Hi, in training mode it is not hard to assign a category label (by Fig. 13.4.2) and an offset value (by Eq. 13.4.3) to an anchor box, because we have the ground truth. But how do we set the category label and offset value in prediction mode, when we don't have a ground-truth bounding box? We still need the category label and offset value to calculate the class probability for each anchor box. The example given for prediction mode is trivial, since it sets all of the anchor boxes as predicted bounding boxes. Thank you.

Hey @rezahabibi96, great question! If I understand correctly, you are asking how to deal with having no "ground truth" labels for the training set? In that case, the most efficient way is to label some examples manually and then apply transfer learning. If it is a commonly seen object, you only need ~100 labels with transfer learning.

No, I am not asking about annotating the training set. I am asking about prediction mode: how do we calculate the offset value and assign the category label in prediction mode?

I'm not sure I understand the following line of code:

```
X = np.random.uniform(size=(1, 3, h, w)) # Construct input data
```

What exactly are the input data, and why is Y not affected by the first two dimensions?