You've learned about Object Localization as well as Landmark Detection. Now, let's build up to other object detection algorithm. In this video, you'll learn how to use a cofinite to perform object detection using something called the Sliding Windows Detection Algorithm. Let's say you want to build a car detection algorithm. Here's what you can do. You can first create a label training set, so x and y with closely cropped examples of cars. So, this is image x has a positive example, there's a car, here's a car, here's a car, and then there's not a car, there's not a car. And for our purposes in this training set, you can start off with the one with the car closely cropped images. Meaning that x is pretty much only the car. So, you can take a picture and crop out and just cut out anything else that's not part of a car. So you end up with the car centered in pretty much the entire image. Given this label training set, you can then train a cofinite that inputs an image, like one of these closely cropped images. And then the job of the cofinite is to output y, zero or one, is there a car or not. Once you've trained up this cofinite, you can then use it in Sliding Windows Detection. So the way you do that is, if you have a test image like this what you do is you start by picking a certain window size, shown down there. And then you would input into this cofinite a small rectangular region. So, take just this below red square, input that into the cofinite, and have a cofinite make a prediction. And presumably for that little region in the red square, it'll say, no that little red square does not contain a car. In the Sliding Windows Detection Algorithm, what you do is you then pass as input a second image now bounded by this red square shifted a little bit over and feed that to the cofinite. So, you're feeding just the region of the image in the red squares of the cofinite and run the cofinite again. And then you do that with a third image and so on. And you keep going until you've slid the window across every position in the image. And I'm using a pretty large stride in this example just to make the animation go faster. But the idea is you basically go through every region of this size, and pass lots of little cropped images into the cofinite and have it classified zero or one for each position as some stride. Now, having done this once with running this was called the sliding window through the image. You then repeat it, but now use a larger window. So, now you take a slightly larger region and run that region. So, resize this region into whatever input size the cofinite is expecting, and feed that to the cofinite and have it output zero or one. And then slide the window over again using some stride and so on. And you run that throughout your entire image until you get to the end. And then you might do the third time using even larger windows and so on. Right. And the hope is that if you do this, then so long as there's a car somewhere in the image that there will be a window where, for example if you are passing in this window into the cofinite, hopefully the cofinite will have outputs one for that input region. So then you detect that there is a car there. So this algorithm is called Sliding Windows Detection because you take these windows, these square boxes, and slide them across the entire image and classify every square region with some stride as containing a car or not. Now there's a huge disadvantage of Sliding Windows Detection, which is the computational cost. Because you're cropping out so many different square regions in the image and running each of them independently through a cofinite. And if you use a very coarse stride, a very big stride, a very big step size, then that will reduce the number of windows you need to pass through the cofinite, but that courser granularity may hurt performance. Whereas if you use a very fine granularity or a very small stride, then the huge number of all these little regions you're passing through the cofinite means that means there is a very high computational cost. So, before the rise of Neural Networks people used to use much simpler classifiers like a simple linear classifier over hand engineer features in order to perform object detection. And in that era because each classifier was relatively cheap to compute, it was just a linear function, Sliding Windows Detection ran okay. It was not a bad method, but with cofinite now running a single classification task is much more expensive and sliding windows this way is infeasibily slow. And unless you use a very fine granularity or a very small stride, you end up not able to localize the objects that accurately within the image as well. Fortunately however, this problem of computational cost has a pretty good solution. In particular, the Sliding Windows Object Detector can be implemented convolutionally or much more efficiently. Let's see in the next video how you can do that.