Region-based CNNs (R-CNNs)

https://d2l.ai/chapter_computer-vision/rcnn.html

ROI calculation

I was confused by the RoI calculation in this chapter, so I searched on Google and found this page helpful:

Region of interest pooling explained (deepsense.ai)

The calculation proceeds as follows:

roi_pooling-1

  1. A single 8×8 feature map, one region of interest and an output size of 2×2.

  2. The region proposal (top left, bottom right coordinates): (0, 3), (7, 8).

  3. Divide it into 2×2 sections (because the output size is 2×2). The split points are int((7+0)/2)=3 and int((3+8)/2)=5, which divide the region proposal into four parts.

  4. Select the max value in each section.
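The four steps above can be sketched in plain Python. This is only an illustration of the midpoint split described in this post, not the deepsense.ai or torchvision implementation. The feature-map values are hypothetical (the numbers from the figure are not reproduced here), the function name `roi_max_pool_2x2` is made up, and the bottom-right corner is assumed to be exclusive, since y = 8 would otherwise fall outside an 8×8 map.

```python
# Hypothetical 8x8 feature map: the value at row r, column c is 8*r + c.
fmap = [[8 * r + c for c in range(8)] for r in range(8)]

def roi_max_pool_2x2(fmap, top_left, bottom_right):
    """Max-pool one region into a 2x2 output using midpoint splits.

    Assumes (x, y) coordinates with an exclusive bottom-right corner.
    """
    (x1, y1), (x2, y2) = top_left, bottom_right
    x_mid = int((x1 + x2) / 2)  # column split point
    y_mid = int((y1 + y2) / 2)  # row split point
    row_bins = [(y1, y_mid), (y_mid, y2)]
    col_bins = [(x1, x_mid), (x_mid, x2)]
    return [
        [max(fmap[r][c] for r in range(r0, r1) for c in range(c0, c1))
         for (c0, c1) in col_bins]
        for (r0, r1) in row_bins
    ]

pooled = roi_max_pool_2x2(fmap, (0, 3), (7, 8))
print(pooled)
```

With the region (0, 3), (7, 8) the split points are 3 and 5, so the four sections cover 2×3, 2×4, 3×3 and 3×4 cells.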

Interpretation of this chapter’s ROI calculation

As for this chapter, here is why pooling rois = torch.Tensor([[0, 0, 0, 20, 20], [0, 0, 10, 30, 30]]) over

tensor([[[[ 0.,  1.,  2.,  3.],
          [ 4.,  5.,  6.,  7.],
          [ 8.,  9., 10., 11.],
          [12., 13., 14., 15.]]]])

gives

tensor([[[[ 5.,  6.],
          [ 9., 10.]]],
        [[[ 9., 11.],
          [13., 15.]]]])

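To calculate the [0, 0, 10, 30, 30] ROI, we have to split the matrix elements from (0, 1) to (3, 3) into four parts. Each ROI is (batch index, x1, y1, x2, y2), so these are (x, y) coordinates, i.e. (column, row). Here 1 = 10 * 0.1 and 3 = 30 * 0.1, where 0.1 is the spatial_scale in torchvision.ops.roi_pool(X, rois, output_size=(2, 2), spatial_scale=0.1).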

Calculation: int((0+3)/2)=1, int((1+3)/2)=2.

The result looks like the image below:

image

Selecting the max from each part gives

[ 9., 11.],
[13., 15.]
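The midpoint calculation above can be checked against both ROIs from the chapter with a small pure-Python sketch. It only mirrors the interpretation in this post and is not how torchvision implements roi_pool. The helper names `scale_roi` and `midpoint_pool_2x2` are made up, the scaled corners are assumed to be inclusive, and the first bin is assumed to keep the midpoint index.

```python
# The chapter's 4x4 input X as a nested list (value = 4*row + col).
X = [[float(4 * r + c) for c in range(4)] for r in range(4)]
rois = [[0, 0, 0, 20, 20], [0, 0, 10, 30, 30]]

def scale_roi(roi, spatial_scale):
    """Drop the batch index and map (x1, y1, x2, y2) onto feature-map indices."""
    _, x1, y1, x2, y2 = roi
    return tuple(round(v * spatial_scale) for v in (x1, y1, x2, y2))

def midpoint_pool_2x2(X, roi, spatial_scale=0.1):
    """2x2 max pooling with midpoint splits; corners are inclusive (assumption)."""
    x1, y1, x2, y2 = scale_roi(roi, spatial_scale)
    x_mid = int((x1 + x2) / 2)
    y_mid = int((y1 + y2) / 2)
    row_bins = [(y1, y_mid), (y_mid + 1, y2)]
    col_bins = [(x1, x_mid), (x_mid + 1, x2)]
    # Note: a region only one cell wide or tall would leave a bin empty;
    # that case is not handled in this sketch.
    return [
        [max(X[r][c] for r in range(r0, r1 + 1) for c in range(c0, c1 + 1))
         for (c0, c1) in col_bins]
        for (r0, r1) in row_bins
    ]

midpoint_results = [midpoint_pool_2x2(X, roi) for roi in rois]
print(midpoint_results)
```

Under these assumptions the sketch reproduces both 2×2 outputs shown above, including [[5., 6.], [9., 10.]] for the first ROI.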

I think you are not exactly right. Here is my interpretation of this chapter’s ROI calculation:

To calculate the [0, 0, 10, 30, 30] ROI, we have to split the matrix elements from (0, 1) to (3, 3) into four parts.

Calculation: math.ceil((3-0+1)/2)=2, math.ceil((3-1+1)/2)=2, so each part is at most 2 columns wide and 2 rows tall.

The result looks like the image below:

RoI pooling
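The ceil-based interpretation can be sketched the same way in plain Python. Again this is an illustration only, not torchvision's code. The function name `ceil_bin_pool_2x2` is made up, the scaled corners are assumed to be inclusive, and the last bin in each direction is assumed to be clipped to the edge of the ROI.

```python
import math

# The chapter's 4x4 input X as a nested list (value = 4*row + col).
X = [[float(4 * r + c) for c in range(4)] for r in range(4)]
rois = [[0, 0, 0, 20, 20], [0, 0, 10, 30, 30]]

def ceil_bin_pool_2x2(X, roi, spatial_scale=0.1):
    """2x2 max pooling where each bin spans ceil(extent / 2) cells."""
    _, x1, y1, x2, y2 = roi
    x1, y1, x2, y2 = (round(v * spatial_scale) for v in (x1, y1, x2, y2))
    bin_w = math.ceil((x2 - x1 + 1) / 2)  # columns per bin
    bin_h = math.ceil((y2 - y1 + 1) / 2)  # rows per bin
    out = []
    for i in range(2):
        r0 = y1 + i * bin_h
        r1 = min(r0 + bin_h - 1, y2)  # clip the last bin to the ROI (assumption)
        row = []
        for j in range(2):
            c0 = x1 + j * bin_w
            c1 = min(c0 + bin_w - 1, x2)
            row.append(max(X[r][c]
                           for r in range(r0, r1 + 1)
                           for c in range(c0, c1 + 1)))
        out.append(row)
    return out

ceil_results = [ceil_bin_pool_2x2(X, roi) for roi in rois]
print(ceil_results)
```

For the two ROIs in this chapter, this sketch and the midpoint split give the same bins and therefore the same output, so the chapter's example alone cannot tell the two interpretations apart.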