What's the difference between BB regression algorithms used in R-CNN variants vs BB in YOLO localization techniques?

0 votes

Question:

What's the difference between the bounding box (BB) produced by "BB regression algorithms in region-based object detectors" and the "bounding box in single-shot detectors"? Can they be used interchangeably, and if not, why?

While studying variants of the R-CNN and YOLO algorithms for object detection, I came across two major techniques for performing object detection: region-based (R-CNN) and a specialized sliding-window-based approach (YOLO).

Each family has its own variants (from complicated to simple), but in the end they all localize objects in the image using bounding boxes. Below I focus on localization only (assuming classification is happening alongside), since that is what the question is about, and briefly explain my understanding:

  • Region-based:

    • Here, we let the neural network predict continuous variables (the BB coordinates) and refer to that as regression.
    • The regressor (which is not linear at all) is just a CNN or a variant of one, with all layers differentiable. Its outputs are four values (𝑟,𝑐,ℎ,𝑤), where (𝑟,𝑐) specify the position of the top-left corner and (ℎ,𝑤) the height and width of the BB.
    • To train this NN, a smooth L1 loss is used to learn the precise BB by penalizing outputs of the NN that differ strongly from the labeled (𝑟,𝑐,ℎ,𝑤) in the training set.
  • Sliding window (convolutionally implemented) based:

    • First, we divide the image into, say, 19×19 grid cells.
    • An object is assigned to a grid cell by taking the midpoint of the object and assigning the object to the one grid cell that contains that midpoint. So each object, even if it spans multiple grid cells, is assigned to only one of the 19×19 grid cells.
    • Now, you take the two coordinates of this grid cell and calculate the precise BB (bx, by, bh, bw) for that object using some method such as
    • (bx, by, bh, bw) are relative to the grid cell, where bx and by are the center point and bh and bw are the height and width of the precise BB, i.e. the height and width of the bounding box are specified as fractions of the height and width of the grid cell, so bh and bw can be > 1.
    • There are multiple ways of calculating the precise BB specified in the paper.
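To make the grid-cell description above concrete, here is a minimal Python sketch of one possible encoding. It follows only what is described in the question (a 19×19 grid, assignment by midpoint, center coordinates relative to the owning cell, height and width in units of the cell size) and is not claimed to be the exact parameterization of any particular YOLO version. The function names `encode_box_to_grid` and `decode_box_from_grid` and the 608×608 image size are assumptions made for illustration.

```python
def encode_box_to_grid(box_xyxy, image_size, grid_size=19):
    """Assign a pixel-space box to the grid cell that contains its midpoint.

    box_xyxy:   (x_min, y_min, x_max, y_max) in pixels.
    image_size: (width, height) in pixels.
    Returns ((row, col), (bx, by, bh, bw)) where bx, by lie in [0, 1)
    relative to the cell's top-left corner and bh, bw are measured in
    units of the cell height and width (so they can exceed 1).
    """
    x_min, y_min, x_max, y_max = box_xyxy
    img_w, img_h = image_size
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size

    # The midpoint of the object decides which single cell owns it.
    cx = (x_min + x_max) / 2.0
    cy = (y_min + y_max) / 2.0
    col = min(int(cx // cell_w), grid_size - 1)
    row = min(int(cy // cell_h), grid_size - 1)

    bx = cx / cell_w - col           # center offset inside the cell
    by = cy / cell_h - row
    bw = (x_max - x_min) / cell_w    # size as a multiple of the cell size
    bh = (y_max - y_min) / cell_h
    return (row, col), (bx, by, bh, bw)


def decode_box_from_grid(cell, target, image_size, grid_size=19):
    """Invert encode_box_to_grid and return (x_min, y_min, x_max, y_max)."""
    row, col = cell
    bx, by, bh, bw = target
    img_w, img_h = image_size
    cell_w = img_w / grid_size
    cell_h = img_h / grid_size

    cx = (col + bx) * cell_w
    cy = (row + by) * cell_h
    w = bw * cell_w
    h = bh * cell_h
    return (cx - w / 2.0, cy - h / 2.0, cx + w / 2.0, cy + h / 2.0)


# Hypothetical example: a 608x608 image gives 32-pixel cells.
cell, target = encode_box_to_grid((100, 200, 260, 280), (608, 608))
restored = decode_box_from_grid(cell, target, (608, 608))
```

In this example the box is five cells wide, so bw is 5.0, which illustrates why bh and bw can be greater than 1 while bx and by stay between 0 and 1.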

Both Algorithms:

  • output precise bounding boxes.

  • work in a supervised learning setting: they use a labeled dataset where the labels are bounding boxes (manually marked by an annotator using tools like labelImg) stored for each image in a JSON/XML file.

I am trying to understand the two localization techniques on a more abstract level(as well as having an in-depth idea of both techniques!) to get more clarity on:

  • In what sense are they different?

  • Why were two approaches created? I mean, where does one fail and the other succeed?

  • Can they be used interchangeably, and if not, why?

Please feel free to correct me if I am wrong somewhere; feedback is highly appreciated! Citations of a particular section of a research paper would be especially welcome.

Apr 11, 2022 in Machine Learning by Dev
• 6,000 points
1,250 views

2 answers to this question.

0 votes
The main distinction is that two-stage Faster R-CNN-like algorithms are generally more accurate, while single-stage YOLO/SSD-like algorithms are faster.
The first stage of a two-stage architecture is usually dedicated to region proposal, while the second stage is dedicated to classification and more precise localization. The first stage is similar to a single-stage architecture, except that the region proposal only distinguishes between "object" and "background", whereas a single-stage architecture distinguishes between all object classes. In the first stage, a region proposal network (RPN) indicates, also in a sliding-window-like fashion, whether an object is present and, if so, roughly gives the region (bounding box) in which it lies.
The second stage takes this proposed region, pools the relevant features from it, and passes them through a Fast R-CNN-like head that performs the classification and the bounding box regression (for better localization).
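As a rough illustration of the second-stage bounding box regression, the sketch below encodes a ground-truth box as offsets relative to a proposal and scores a prediction with the smooth L1 loss mentioned in the question. It assumes the center-based (cx, cy, w, h) offset parameterization and the smooth L1 form (quadratic below 1, linear above) described in the Fast/Faster R-CNN papers, rather than the (r, c, h, w) corner format used in the question. The function names and the numbers are hypothetical, and real implementations typically add details such as target normalization.

```python
import math


def encode_deltas(proposal, gt):
    """Regression targets for a ground-truth box relative to a proposal.

    Both boxes are (cx, cy, w, h) in pixels. The center shift is scaled by
    the proposal size; the size change is expressed as a log ratio.
    """
    px, py, pw, ph = proposal
    gx, gy, gw, gh = gt
    return ((gx - px) / pw, (gy - py) / ph,
            math.log(gw / pw), math.log(gh / ph))


def decode_deltas(proposal, deltas):
    """Apply predicted deltas to a proposal to obtain the refined box."""
    px, py, pw, ph = proposal
    tx, ty, tw, th = deltas
    return (px + tx * pw, py + ty * ph,
            pw * math.exp(tw), ph * math.exp(th))


def smooth_l1(pred, target):
    """Smooth L1 loss summed over the four box coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        # Quadratic near zero, linear for large errors (less outlier-sensitive).
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


# Hypothetical proposal and ground-truth box.
proposal = (100.0, 100.0, 50.0, 40.0)
gt_box = (110.0, 96.0, 100.0, 40.0)
targets = encode_deltas(proposal, gt_box)
loss_if_no_refinement = smooth_l1((0.0, 0.0, 0.0, 0.0), targets)
refined = decode_deltas(proposal, targets)
```

The point of the sketch is that the second stage regresses a correction to an existing proposal, whereas a single-stage detector predicts the box directly relative to a grid cell (or anchor) in one pass.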
As for using them interchangeably: why would you want to? Typically, you select an architecture based on your most pressing requirements (e.g. latency, power, accuracy), and you would not switch between them unless you have a clever idea for how doing so would help.
answered Apr 14, 2022 by anonymous

reshown Aug 22, 2023 by Neelam
