In this video, we're going to look at a proof that the perceptron learning procedure will eventually get the weights into the cone of feasible solutions. I don't want you to get the wrong idea about the course from this video. In general, it's going to be about engineering, not about proofs of things. There will be very few proofs in the course. But we get to understand quite a lot more about perceptrons when we try to prove that they will eventually get the right answer.

So we're going to use our geometric understanding of what's happening in weight space as the perceptron learns, to get a proof that the perceptron will eventually find a weight vector that gets the right answer for all of the training cases, if any such vector exists. Our proof is going to assume that there is a vector that gets the right answer for all training cases. We'll call that a feasible vector. An example of a feasible vector is shown by the green dot in the diagram.

So we start with a weight vector that's getting some of the training cases wrong, and in the diagram we've shown a training case that it is getting wrong. What we want to show, and this is the idea for the proof, is that every time it gets a training case wrong, it will update the current weight vector in a way that makes it closer to every feasible weight vector. We can represent the squared distance of the current weight vector from a feasible weight vector as the sum of a squared distance along the line of the input vector that defines the training case, and another squared distance orthogonal to that line. The orthogonal squared distance won't change, and the squared distance along the line of the input vector will get smaller. So our hopeful claim is that every time the perceptron makes a mistake, the current weight vector gets closer to all feasible weight vectors.

Now this is almost right, but there's an unfortunate problem. If you look at the feasible weight vector in gold, it's just on the right side of the plane that defines one of the training cases, and the current weight vector is just on the wrong side, and the input vector is quite big. So when we add the input vector to the current weight vector, we actually get further away from that gold feasible weight vector. So our hopeful claim doesn't work, but we can fix it up so that it does.

What we're going to do is define a generously feasible weight vector. That's a weight vector that not only gets every training case right, but gets it right by at least a certain margin, where the margin is at least as big as the length of the input vector for that training case. So we take the cone of feasible solutions, and inside that we have another cone of generously feasible solutions, which get everything right by at least the size of the input vector. And now our proof will work. We can make the claim that every time the perceptron makes a mistake, the squared distance to all of the generously feasible weight vectors will be decreased by at least the squared length of the input vector, which is the update we make.

So given that, we can get an informal sketch of a proof of convergence. I'm not going to try to make this formal. I'm more interested in the engineering than the mathematics. If you're a mathematician, I'm sure you can make it formal yourself. So, every time the perceptron makes a mistake, the current weight vector moves, and it decreases its squared distance from every generously feasible weight vector by at least the squared length of the current input vector.
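Written out, that claim is one line of algebra. This is my own sketch, not something spelled out in the lecture, and it assumes the usual conventions: a mistake on a positive training case with input vector \(\mathbf{x}\) triggers the update \(\mathbf{w} \to \mathbf{w} + \mathbf{x}\), and "generously feasible" means \(\mathbf{w}^{*}\!\cdot\mathbf{x} \ge \|\mathbf{x}\|^2\) (the negative case is symmetric):

\[
\|\mathbf{w}^{*} - (\mathbf{w} + \mathbf{x})\|^2
  = \|\mathbf{w}^{*} - \mathbf{w}\|^2 \;-\; 2\,\mathbf{w}^{*}\!\cdot\mathbf{x} \;+\; 2\,\mathbf{w}\cdot\mathbf{x} \;+\; \|\mathbf{x}\|^2
  \;\le\; \|\mathbf{w}^{*} - \mathbf{w}\|^2 \;-\; \|\mathbf{x}\|^2 ,
\]

using \(\mathbf{w}\cdot\mathbf{x} < 0\) (the case was misclassified) and \(\mathbf{w}^{*}\!\cdot\mathbf{x} \ge \|\mathbf{x}\|^2\) (generous feasibility).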
And so the squared distance to all the generously feasible weight vectors decreases by at least that squared length. Assuming that none of the input vectors are infinitesimally small, that means that after a finite number of mistakes, the weight vector must lie in the feasible region, if this region exists. Notice it doesn't have to lie in the generously feasible region, but it has to get into the feasible region to stop making mistakes. And that's it. That's our informal sketch of a proof that the perceptron convergence procedure works. But notice, it all depends on the assumption that there is a generously feasible weight vector. And if there is no such vector, the whole proof falls apart.
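For reference, here is a minimal sketch of the kind of learning procedure the proof is about. The encoding details are my assumptions rather than anything from the lecture: targets are coded as +1 and -1, and the bias is folded into the input as a constant feature.

```python
import numpy as np

def perceptron_train(inputs, targets, max_passes=1000):
    """Minimal perceptron learning procedure (a sketch, not the course's code).

    inputs:  array of shape (n_cases, n_features); a bias is assumed to be
             folded in as a constant input feature if one is wanted.
    targets: array of +1 / -1 labels.
    On each mistake the input vector is added to the weights (positive case)
    or subtracted from them (negative case).
    """
    weights = np.zeros(inputs.shape[1])
    for _ in range(max_passes):
        mistakes = 0
        for x, t in zip(inputs, targets):
            prediction = 1 if np.dot(weights, x) >= 0 else -1
            if prediction != t:
                weights += t * x   # add for positive cases, subtract for negative
                mistakes += 1
        if mistakes == 0:          # no mistakes: weights are in the feasible region
            break
    return weights

# Tiny usage example: an AND-like, linearly separable problem with a bias feature.
X = np.array([[1, 0, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]], dtype=float)
y = np.array([-1, -1, -1, 1])
w = perceptron_train(X, y)
print(w, [1 if np.dot(w, x) >= 0 else -1 for x in X])
```

The loop stops as soon as a full pass produces no mistakes, which is exactly the point the proof is about: the weights have reached the feasible region. If the data are not linearly separable, it simply gives up after max_passes, since no feasible vector exists and the convergence argument no longer applies.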