If you look at the object detection literature, there's a set of ideas called region proposals that's been very influential in computer vision as well. I wanted to make this video optional because I tend to use the region proposal instead of algorithm a bit less often but nonetheless, it has been an influential body of work and an idea that you might come across in your own work. Let's take a look. So if you recall the sliding windows idea, you would take a train crossfire and run it across all of these different windows and run the detector to see if there's a car, pedestrian, or maybe a motorcycle. Now, you could run the algorithm convolutionally, but one downside that the algorithm is it just crossfires a lot of the regions where there's clearly no object. So this rectangle down here is pretty much blank. It's clearly nothing interesting there to classify, and maybe it was also running it on this rectangle, which look likes there's nothing that interesting there. So what Russ Girshik, Jeff Donahue, Trevor Darrell, and Jitendra Malik proposed in the paper, as cited to the bottom of the slide, is an algorithm called R-CNN, which stands for Regions with convolutional networks or regions with CNNs. And what that does is it tries to pick just a few regions that makes sense to run your continent crossfire. So rather than running your sliding windows on every single window, you instead select just a few windows and run your continent crossfire on just a few windows. The way that they perform the region proposals is to run an algorithm called a segmentation algorithm, that results in this output on the right, in order to figure out what could be objects. So, for example, the segmentation algorithm finds a blob over here. And so you might pick that pounding balls and say, "Let's run a crossfire on that blob." It looks like this little green thing finds a blob there, as you might also run the crossfire on that rectangle to see if there's some interesting there. And in this case, this blue blob, if you run a crossfire on that, hope you find the pedestrian, and if you run it on this light cyan blob, maybe you'll find a car, maybe not,. I'm not sure. So the details of this, this is called a segmentation algorithm, and what you do is you find maybe 2000 blobs and place bounding boxes around about 2000 blobs and value crossfire on just those 2000 blobs, and this can be a much smaller number of positions on which to run your continent crossfire, then if you have to run it at every single position throughout the image. And this is a special case if you are running your continent not just on square-shaped regions but running them on tall skinny regions to try to find pedestrians or running them on your white fat regions try to find cars and running them at multiple scales as well. So that's the R-CNN or the region with CNN, a region of CNN features idea. Now, it turns out the R-CNN algorithm is still quite slow. So there's been a line of work to explore how to speed up this algorithm. So the basic R-CNN algorithm with proposed regions using some algorithm and then crossfire the proposed regions one at a time. And for each of the regions, they will output the label. So is there a car? Is there a pedestrian? Is there a motorcycle there? And then also outputs a bounding box, so you can get an accurate bounding box if indeed there is a object in that region. So just to be clear, the R-CNN algorithm doesn't just trust the bounding box it was given. It also outputs a bounding box, B X B Y B H B W, in order to get a more accurate bounding box and whatever happened to surround the blob that the image segmentation algorithm gave it. So it can get pretty accurate bounding boxes. Now, one downside of the R-CNN algorithm was that it is actually quite slow. So over the years, there been a few improvements to the R-CNN algorithm. Russ Girshik proposed the fast R-CNN algorithm, and it's basically the R-CNN algorithm but with a convolutional implementation of sliding windows. So the original implementation would actually classify the regions one at a time. So far, R-CNN use a convolutional implementation of sliding windows, and this is roughly similar to the idea you saw in the fourth video of this week. And that speeds up R-CNN quite a bit. It turns out that one of the problems of fast R-CNN algorithm is that the clustering step to propose the regions is still quite slow and so a different group, Shaoqing Ren, Kaiming He, Ross Girshick, and Jian Son, proposed the faster R-CNN algorithm, which uses a convolutional neural network instead of one of the more traditional segmentation algorithms to propose a blob on those regions, and that wound up running quite a bit faster than the fast R-CNN algorithm. Although, I think the faster R-CNN algorithm, most implementations are usually still quit a bit slower than the YOLO algorithm. So the idea of region proposals has been quite influential in computer vision, and I wanted you to know about these ideas because you see others still used these ideas, for myself, and this is my personal opinion, not the opinion of the computer vision research committee as a whole. I think that we can propose an interesting idea but that not having two steps, first, proposed region and then crossfire, being able to do everything more or at the same time, similar to the YOLO or the You Only Look Once algorithm that seems to me like a more promising direction for the long term. But that's my personal opinion and not necessary the opinion of the whole computer vision research committee. So feel free to take that with a grain of salt, but I think that the R-CNN idea, you might come across others using it. So it was worth learning as well so you can understand others algorithms better. So we're now finished up our material for this week on object detection. I hope you enjoy working on this week's problem exercise, and I look forward to seeing you this week.