1
00:00:00,176 --> 00:00:03,949
In this video I am going to show you

2
00:00:03,949 --> 00:00:06,663
an example of machine learning.

3
00:00:06,663 --> 00:00:08,756
It is a very simple kind of neural net,

4
00:00:08,756 --> 00:00:12,672
and it is going to be learning to recognize digits.

5
00:00:12,672 --> 00:00:15,029
And you're going to be able to see how the weights evolve

6
00:00:15,082 --> 00:00:18,135
as we run a very simple learning algorithm.

7
00:00:18,711 --> 00:00:20,326
So we're going to look at a very

8
00:00:20,326 --> 00:00:24,250
simple learning algorithm for training a very simple network

9
00:00:24,250 --> 00:00:29,273
to recognize handwritten shapes. The network has two layers of neurons.

10
00:00:29,588 --> 00:00:31,646
It has got input neurons,

11
00:00:31,646 --> 00:00:38,033
whose activities represent the intensities of pixels, and output neurons, whose activities represent the

12
00:00:38,033 --> 00:00:39,010
classes.

13
00:00:41,857 --> 00:00:43,587
What we'd like is that when we show

14
00:00:43,587 --> 00:00:49,611
a particular shape, the output neuron for that shape gets active.

15
00:00:51,287 --> 00:00:53,863
If a pixel is active, what it does

16
00:00:53,863 --> 00:00:57,096
is it votes for particular shapes.

17
00:00:57,096 --> 00:01:00,889
Namely, the shapes that contain that pixel.

18
00:01:00,889 --> 00:01:04,031
Each inked pixel can vote for several shapes.

19
00:01:04,031 --> 00:01:09,803
And the votes can have different intensities. The shape that gets the most votes wins.

20
00:01:09,803 --> 00:01:10,794
So we are assuming

21
00:01:10,794 --> 00:01:12,960
there is a competition between the output units,

22
00:01:12,960 --> 00:01:14,772
and that's something I haven't explained yet.

23
00:01:14,772 --> 00:01:17,951
I will explain it in a later lecture.

24
00:01:20,028 --> 00:01:24,297
So first, we need to decide how to display the weights. And

25
00:01:24,297 --> 00:01:29,385
it seems natural to write the weights on the connection between

26
00:01:29,385 --> 00:01:34,809
an input unit and an output unit. But we would never be able to see what was going on if we did that.

27
00:01:35,255 --> 00:01:39,309
We need a display in which we can see the values of thousands of weights.

28
00:01:39,447 --> 00:01:45,196
So the idea is for each output unit, we make a little map. And in that map we show

29
00:01:46,180 --> 00:01:53,032
the strength of connection coming from each input pixel in the location of that input pixel.

30
00:01:53,324 --> 00:01:59,375
And we show the strength of connection by using black and white blobs, whose area represents the magnitude,

31
00:01:59,375 --> 00:02:09,822
and whose color represents the sign. So the initial weights that you see there are just small random weights.

32
00:02:10,083 --> 00:02:18,643
Now what we are going to do is show that network some data and get it to learn weights that are better than the random weights.

33
00:02:19,550 --> 00:02:27,240
The way we are going to learn is, when we show it an image, we are going to increment the weights

34
00:02:27,240 --> 00:02:33,274
from the active pixels in the image to the correct class.

35
00:02:33,274 --> 00:02:37,942
If we just did that, the weights could only get bigger, and eventually every class

36
00:02:37,942 --> 00:02:40,782
would get huge input whenever we showed it an image.

37
00:02:40,782 --> 00:02:44,124
So we need some way of keeping the weights under control.

38
00:02:44,124 --> 00:02:50,311
What we're going to do is also decrement the weights from the active pixels to whatever class

39
00:02:50,311 --> 00:02:53,040
the network guesses.

40
00:02:53,040 --> 00:03:00,113
So we are training it to do the right thing, rather than the thing it currently has a tendency to do.

41
00:03:00,113 --> 00:03:04,909
If, of course, it does the right thing, then the increments we make

42
00:03:04,909 --> 00:03:11,358
in the first step of the learning rule will exactly cancel the decrements, so nothing will change, which is what we want.

43
00:03:12,896 --> 00:03:18,569
So, these are the initial weights. Now we are going to show it a few hundred training examples and then

44
00:03:18,569 --> 00:03:20,706
look at the weights again.

45
00:03:20,706 --> 00:03:25,893
So now the weights have changed; they have started to

46
00:03:25,893 --> 00:03:31,204
form regular patterns. And we show it a few hundred more examples.

47
00:03:31,204 --> 00:03:34,871
And the weights have changed some more, and a few hundred more examples,

48
00:03:34,871 --> 00:03:39,648
and a few hundred more examples. A few hundred more,

49
00:03:39,648 --> 00:03:43,016
and now the weights are pretty much at their final values.

50
00:03:43,186 --> 00:03:50,720
I'll talk more in future lectures about the precise details of the learning algorithm. But what you can see is

51
00:03:50,720 --> 00:03:54,082
the weights now look like little templates for the shapes.

52
00:03:54,082 --> 00:04:02,128
If you look at the weights going into the one unit, for example, they look like a little template for identifying ones. They are not quite templates.

53
00:04:02,128 --> 00:04:04,786
If you look at the weights going into the nine unit,

54
00:04:04,786 --> 00:04:08,875
they don't have any positive weights below the halfway line.

55
00:04:08,875 --> 00:04:16,137
That's because, for telling the difference between 9s and 7s, the weights below the halfway line aren't much use.

56
00:04:16,137 --> 00:04:21,710
You have to tell the difference by deciding whether there is a loop at the top or a horizontal bar at the top.

57
00:04:21,710 --> 00:04:26,421
And so, those output units are focused on that discrimination.

58
00:04:29,928 --> 00:04:38,703
One thing about this learning algorithm is that, because the network is so simple, it's unable to learn a very good way of discriminating shapes.

59
00:04:40,995 --> 00:04:45,707
What it learns is equivalent to having a little template for each shape.

60
00:04:45,707 --> 00:04:53,533
And then deciding the winner based on which shape has the template that overlaps most with the ink.

61
00:04:54,425 --> 00:05:03,043
The problem is that the ways in which handwritten digits vary are much too complicated to be captured by simple template matches of whole shapes.

62
00:05:03,043 --> 00:05:13,250
You have to model the allowable variations of digits by first extracting features and then looking at the arrangements of those features.

63
00:05:14,774 --> 00:05:18,090
So here are the examples we have seen already.

64
00:05:18,090 --> 00:05:29,790
If you look at those 2s in the green box, you can see there is no template that will fit all of them well and also fail to fit that 3 in the red box there.

65
00:05:29,790 --> 00:05:33,721
So the task simply can't be solved by a simple network like that.

66
00:05:35,025 --> 00:05:38,800
The network did the best it could but it can't solve this problem.
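
The learning rule described in the lecture (increment the weights from the active pixels to the correct class, decrement the weights from the active pixels to the class the network guesses) can be sketched roughly as follows. This is only a minimal illustration under stated assumptions, not the code used in the video: the function names, the step size, the initial weight scale, and the tiny 2x2 "images" are all hypothetical.

```python
import random


def make_weights(n_classes, n_pixels, scale=0.01, seed=0):
    """Small random initial weights, one row of pixel weights per output unit.
    The scale and seed are assumptions chosen for this illustration."""
    rng = random.Random(seed)
    return [[rng.uniform(-scale, scale) for _ in range(n_pixels)]
            for _ in range(n_classes)]


def predict(weights, pixels):
    """Each active pixel votes for every class; the class with the most votes wins."""
    scores = [sum(w * x for w, x in zip(row, pixels)) for row in weights]
    return max(range(len(scores)), key=scores.__getitem__)


def train_step(weights, pixels, label, step=1.0):
    """Apply the rule to one training image and return the network's guess."""
    guess = predict(weights, pixels)
    if guess == label:
        # The increments and decrements would cancel exactly, so skip them.
        return guess
    for i, x in enumerate(pixels):
        if x:  # only active (inked) pixels change any weights
            weights[label][i] += step * x   # strengthen the correct class
            weights[guess][i] -= step * x   # weaken the class that was guessed
    return guess


# Hypothetical toy data: 2x2 images flattened to 4 pixels, two classes.
examples = [([1, 1, 0, 0], 0), ([0, 0, 1, 1], 1)]
weights = make_weights(n_classes=2, n_pixels=4)
for _ in range(5):
    for pixels, label in examples:
        train_step(weights, pixels, label)
```

After training, each row of `weights` plays the role of one of the little per-class maps shown in the video: positive entries mark pixels that vote for that class.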