In this video, we're going to get a geometrical understanding of what happens when a perceptron learns. To do this, we have to think in terms of a weight space. It's a high-dimensional space in which each point corresponds to a particular setting of all the weights. In this space, we can represent the training cases as planes, and learning consists of trying to get the weight vector onto the right side of all the training planes.

For non-mathematicians, this may be tougher than previous material. You may have to spend quite a long time studying the next two parts. In particular, if you're not used to thinking about hyperplanes and high-dimensional spaces, you're going to have to learn to do that. To deal with hyperplanes in a 14-dimensional space, for example, what you do is you visualize a 3-dimensional space and you say "fourteen" to yourself very loudly. Everybody does it. But remember that when you go from a 13-dimensional space to a 14-dimensional space, you're creating as much extra complexity as when you go from a 2D space to a 3D space. 14-dimensional space is very big and very complicated.

So, we're going to start off by thinking about weight space. This is the space that has one dimension for each weight in the perceptron. A point in the space represents a particular setting of all the weights. Assuming we've eliminated the threshold, we can represent every training case as a hyperplane through the origin in weight space. So, points in the space correspond to weight vectors, and training cases correspond to planes. And, for a particular training case, the weights must lie on one side of that hyperplane in order to get the answer correct for that training case.

So, let's look at a picture of it so we can understand what's going on. Here's a picture of weight space. The training case (we're going to think of just one training case for now) defines a plane, which in this 2D picture is just the black line. The plane goes through the origin, and it's perpendicular to the input vector for that training case, which here is shown as a blue vector. We're going to consider a training case in which the correct answer is one. For that kind of training case, the weight vector needs to be on the correct side of the hyperplane in order to get the answer right. It needs to be on the same side of the hyperplane as the direction in which the input vector points.

For any weight vector like the green one, that's on that side of the hyperplane, the angle with the input vector will be less than 90 degrees, so the scalar product of the input vector with the weight vector will be positive. And since we've already got rid of the threshold, that means the perceptron will give an output of one. It'll say yes, and so we'll get the right answer. Conversely, if we have a weight vector like the red one, that's on the wrong side of the plane, the angle with the input vector will be more than 90 degrees, so the scalar product of the weight vector and the input vector will be negative. The perceptron will say no, or zero, and in this case we'll get the wrong answer.

So, to summarize: on one side of the plane, all the weight vectors will get the right answer, and on the other side of the plane, all the possible weight vectors will get the wrong answer.

Now, let's look at a different training case, in which the correct answer is zero. So here, we have the weight space again. We've chosen a different input vector, and for this input vector, the right answer is zero.
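Before we work through that second case in the picture, here is a minimal NumPy sketch of the scalar-product test we just used for the first case, whose correct answer is one. The specific input and weight vectors are made up for illustration, and the threshold is assumed to have already been eliminated as described above:

```python
import numpy as np

# One made-up training case whose correct answer is 1.
# Its input vector is perpendicular to the plane it defines
# through the origin of weight space (the black line in the figure).
x = np.array([2.0, 1.0])           # input vector (the blue vector)

good_w = np.array([1.0, 2.0])      # like the green weight vector: angle with x < 90 degrees
bad_w  = np.array([-1.0, -0.5])    # like the red weight vector: angle with x > 90 degrees

def perceptron_output(w, x):
    # With the threshold eliminated, the decision is just the sign
    # of the scalar product of the weight vector with the input vector.
    return 1 if np.dot(w, x) > 0 else 0

print(perceptron_output(good_w, x))   # 1 -> right answer for this case
print(perceptron_output(bad_w, x))    # 0 -> wrong answer for this case
```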
So again, the input case corresponds to a plane, shown by the black line. In this case, any weight vector that makes an angle of less than 90 degrees with the input vector will give us a positive scalar product, causing the perceptron to say yes, or one, and it will get the answer wrong. Conversely, weight vectors on the other side of the plane will make an angle of greater than 90 degrees with the input vector, and they will correctly give the answer of zero. So, as before, the plane goes through the origin, it's perpendicular to the input vector, and on one side of the plane all the weight vectors are bad, and on the other side they're all good.

Now, let's put those two training cases together in one picture of weight space. Our picture of weight space is getting a little bit crowded. I've moved the input vector over so we don't have all the vectors in quite the same place. And now you can see that there's a cone of possible weight vectors, and any weight vector inside that cone will get the right answer for both training cases. Of course, there doesn't have to be any cone like that. It could be that there are no weight vectors that get the right answer for all of the training cases. But if there are any, they'll lie in a cone. So, what the learning algorithm needs to do is consider the training cases one at a time and move the weight vector around in such a way that it eventually lies in this cone.

One thing to notice is that if you've got a good weight vector, one that works for all the training cases, it'll lie in the cone. And if you have another one, it'll also lie in the cone. So if you take the average of those two weight vectors, that will also lie in the cone. That means the problem is convex: the average of two solutions is itself a solution. And in general in machine learning, if you can get a convex learning problem, that makes life easy.
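To make the cone and the convexity argument concrete, here is a small NumPy sketch with two made-up training cases, one whose correct answer is one and one whose correct answer is zero, again assuming the threshold has already been eliminated. Any weight vector that gets both cases right lies in the feasible cone, and the average of two such vectors does too:

```python
import numpy as np

# Two made-up training cases, each an (input vector, correct answer) pair.
cases = [
    (np.array([2.0, 1.0]), 1),    # needs w . x > 0
    (np.array([-1.0, 2.0]), 0),   # needs w . x <= 0
]

def gets_all_right(w):
    # A weight vector lies in the feasible cone if it gets every case right.
    return all((1 if np.dot(w, x) > 0 else 0) == target for x, target in cases)

w1 = np.array([1.0, -0.5])    # one weight vector inside the cone
w2 = np.array([3.0, 0.5])     # another weight vector inside the cone

print(gets_all_right(w1))              # True
print(gets_all_right(w2))              # True
print(gets_all_right((w1 + w2) / 2))   # True: the average of two solutions is also a solution
```

The last line is the convexity point from the lecture: because each training case imposes a linear constraint on the weights, any mixture of two weight vectors that satisfy all the constraints also satisfies them.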