In this figure, we're going to get a geometrical understanding of what happens when a perceptron learns. To do this, we have to think in terms of a weight space. It's a high-dimensional space in which each point corresponds to a particular setting of all the weights. In this space, we can represent the training cases as planes, and learning consists of trying to get the weight vector on the right side of all the training planes. For non-mathematicians, this may be tougher than previous material. You may have to spend quite a long time studying the next two parts. In particular, if you're not used to thinking about hyperplanes in high-dimensional spaces, you're going to have to learn that. To deal with hyperplanes in a 14-dimensional space, for example, what you do is you visualize a 3-dimensional space and say "fourteen" to yourself very loudly. Everybody does it. But remember that when you go from a 13-dimensional space to a 14-dimensional space, you're creating as much extra complexity as when you go from a 2D space to a 3D space. 14-dimensional space is very big and very complicated.

So, we're going to start off by thinking about weight space. This is the space that has one dimension for each weight in the perceptron. A point in the space represents a particular setting of all the weights. Assuming we've eliminated the threshold, we can represent every training case as a hyperplane through the origin in weight space. So, points in the space correspond to weight vectors, and training cases correspond to planes. And, for a particular training case, the weights must lie on one side of that hyperplane in order to get the answer correct for that training case.
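To make the side-of-the-plane test concrete, here is a minimal sketch in Python. It is not from the lecture; the vectors and the function name are invented for illustration, and it assumes the threshold has been eliminated by folding it into the weights as a bias, so the perceptron outputs one exactly when the scalar product of the weight and input vectors is non-negative.

    import numpy as np

    def perceptron_output(w, x):
        # Binary threshold unit with the threshold eliminated: the sign
        # of the scalar product w . x says which side of the training
        # case's hyperplane (through the origin, perpendicular to x)
        # the weight vector w lies on.
        return 1 if np.dot(w, x) >= 0 else 0

    x = np.array([2.0, 1.0])         # hypothetical input vector (bias included)
    w_good = np.array([1.0, 1.0])    # same side as x: angle < 90 degrees, w . x > 0
    w_bad = np.array([-1.0, -1.0])   # opposite side: angle > 90 degrees, w . x < 0

    print(perceptron_output(w_good, x))  # 1: the right answer if the target is one
    print(perceptron_output(w_bad, x))   # 0: the wrong answer for that target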
So, let's look at a picture of it so we can understand what's going on. Here's a picture of weight space. The training case, and we're going to think of just one training case for now, defines a plane, which in this 2D picture is just the black line. The plane goes through the origin, and it's perpendicular to the input vector for that training case, which here is shown as a blue vector. We're going to consider a training case in which the correct answer is one. For that kind of training case, the weight vector needs to be on the correct side of the hyperplane in order to get the answer right. It needs to be on the same side of the hyperplane as the direction in which the training vector points. For any weight vector like the green one, that's on that side of the hyperplane, the angle with the input vector will be less than 90 degrees, so the scalar product of the input vector with the weight vector will be positive. And since we've already got rid of the threshold, that means the perceptron will give an output of one. It'll say yes, and so we'll get the right answer. Conversely, if we have a weight vector like the red one, that's on the wrong side of the plane, the angle with the input vector will be more than 90 degrees, so the scalar product of the weight vector and the input vector will be negative. We'll get a scalar product that is less than zero, so the perceptron will say no, or zero, and in this case we'll get the wrong answer.

So, to summarize: on one side of the plane, all the weight vectors will get the right answer, and on the other side of the plane, all the possible weight vectors will get the wrong answer.

Now, let's look at a different training case, in which the correct answer is zero. So here, we have the weight space again. We've chosen a different input vector, and for this input vector, the right answer is zero. So again, the training case corresponds to a plane, shown by the black line. And in this case, any weight vector that makes an angle of less than 90 degrees with the input vector will give us a positive scalar product, causing the perceptron to say yes, or one, and it will get the answer wrong. Conversely, any weight vector on the other side of the plane will make an angle of greater than 90 degrees with the input vector, and will correctly give the answer zero. So, as before, the plane goes through the origin, it's perpendicular to the input vector, and on one side of the plane, all the weight vectors are bad, and on the other side, they're all good.

Now, let's put those two training cases together in one picture of weight space. Our picture of weight space is getting a little bit crowded. I've moved the input vector over so we don't have all the vectors in quite the same place. And now, you can see that there's a cone of possible weight vectors, and any weight vector inside that cone will get the right answer for both training cases.
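Putting numbers on this, here is a hedged sketch of checking whether a weight vector lies inside that cone, i.e. gets both training cases right. The two input vectors and targets below are invented for the example, not taken from the figure.

    import numpy as np

    # Two hypothetical training cases: (input vector, correct answer).
    cases = [
        (np.array([1.0, 2.0]), 1),    # a case whose correct answer is one
        (np.array([2.0, -1.0]), 0),   # a case whose correct answer is zero
    ]

    def in_feasible_cone(w):
        # w is in the cone iff it lies on the right side of every
        # training plane: the output (1 iff w . x >= 0) matches the target.
        return all((1 if np.dot(w, x) >= 0 else 0) == t for x, t in cases)

    print(in_feasible_cone(np.array([-1.0, 2.0])))  # True: right side of both planes
    print(in_feasible_cone(np.array([1.0, 0.0])))   # False: wrong side of the second plane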
Of course, there doesn't have to be any cone like that. It could be that there are no weight vectors that get the right answers for all of the training cases. But if there are any, they'll lie in a cone. So, what the learning algorithm needs to do is consider the training cases one at a time and move the weight vector around in such a way that it eventually lies in this cone.

One thing to notice is that if you get a good weight vector, that is, one that works for all the training cases, it'll lie in the cone. And if you have another one, it'll also lie in the cone. And so, if you take the average of those two weight vectors, that will also lie in the cone. That means the problem is convex: the average of two solutions is itself a solution. And in general in machine learning, if you can get a convex learning problem, that makes life easy.
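As a final sketch, again with invented numbers and the same feasibility test as above, the convexity claim is easy to check in code: each training case imposes a linear constraint on the weight vector, and linear constraints survive averaging.

    import numpy as np

    # The same two hypothetical training cases as in the sketch above.
    cases = [(np.array([1.0, 2.0]), 1), (np.array([2.0, -1.0]), 0)]

    def in_cone(w):
        # True iff w gets every training case right.
        return all((1 if np.dot(w, x) >= 0 else 0) == t for x, t in cases)

    w1 = np.array([-1.0, 2.0])   # one solution
    w2 = np.array([-0.5, 3.0])   # another solution
    assert in_cone(w1) and in_cone(w2)

    # The average satisfies every constraint too, so it is itself a
    # solution: the problem is convex.
    assert in_cone((w1 + w2) / 2)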