1
00:00:00,320 --> 00:00:01,410
In this video, I want to

2
00:00:01,490 --> 00:00:02,710
tell you about how to use neural

3
00:00:02,900 --> 00:00:04,390
networks to do multiclass

4
00:00:04,830 --> 00:00:06,690
classification where we may

5
00:00:06,820 --> 00:00:07,840
have more than one category

6
00:00:07,930 --> 00:00:09,600
that we're trying to distinguish amongst.

7
00:00:10,470 --> 00:00:12,280
In the last part of

8
00:00:12,600 --> 00:00:13,920
the last video, where we

9
00:00:14,400 --> 00:00:15,320
had the handwritten digit recognition

10
00:00:15,830 --> 00:00:17,030
problem, that was actually

11
00:00:17,700 --> 00:00:19,000
a multiclass classification problem because

12
00:00:19,440 --> 00:00:20,730
there were ten possible categories

13
00:00:21,550 --> 00:00:22,820
for recognizing the digits from

14
00:00:23,040 --> 00:00:23,980
0 through 9 and so, if

15
00:00:24,060 --> 00:00:25,430
you want us to fill you

16
00:00:25,830 --> 00:00:27,840
in on the details of how to do that.

17
00:00:30,410 --> 00:00:31,870
The way we do multiclass classification

18
00:00:32,990 --> 00:00:34,380
in a neural network is essentially

19
00:00:35,060 --> 00:00:37,600
an extension of the one versus all method.

20
00:00:38,610 --> 00:00:39,650
So, let's say that we

21
00:00:39,790 --> 00:00:41,660
have a computer vision example,

22
00:00:42,630 --> 00:00:43,810
where instead of just trying

23
00:00:44,010 --> 00:00:46,170
to recognize cars as in

24
00:00:46,310 --> 00:00:47,290
the original example that I started off

25
00:00:47,470 --> 00:00:48,670
with, but let's say that

26
00:00:49,060 --> 00:00:51,380
we're trying to recognize, you know, four

27
00:00:51,510 --> 00:00:52,820
categories of objects and given

28
00:00:53,030 --> 00:00:53,900
an image we want to

29
00:00:54,100 --> 00:00:56,360
decide if it is a pedestrian, a car, a motorcycle or a truck.

30
00:00:57,200 --> 00:00:58,750
If that's the case, what

31
00:00:58,920 --> 00:01:00,480
we would do is we would

32
00:01:00,970 --> 00:01:02,820
build a neural network with four

33
00:01:03,160 --> 00:01:04,500
output units so that

34
00:01:04,710 --> 00:01:08,110
our neural network now outputs a vector of four numbers.

35
00:01:09,110 --> 00:01:10,450
So, the output now is actually

36
00:01:11,170 --> 00:01:11,840
needing to be a vector of four

37
00:01:12,070 --> 00:01:13,300
numbers and what we're

38
00:01:13,540 --> 00:01:14,400
going to try to do is

39
00:01:14,780 --> 00:01:16,680
get the first output unit

40
00:01:17,180 --> 00:01:18,840
to classify: is the

41
00:01:19,160 --> 00:01:20,650
image a pedestrian, yes or no.

42
00:01:21,200 --> 00:01:24,530
The second unit to classify: is the image a car, yes or no.

43
00:01:25,110 --> 00:01:26,880
This unit to classify: is the

44
00:01:27,130 --> 00:01:29,150
image a motorcycle, yes or

45
00:01:29,230 --> 00:01:30,460
no, and this would classify:

46
00:01:30,930 --> 00:01:32,930
is the image a truck, yes or no.

47
00:01:33,720 --> 00:01:35,730
And thus, when the image

48
00:01:36,390 --> 00:01:37,630
is of a pedestrian, we

49
00:01:37,820 --> 00:01:38,930
would ideally want the network

50
00:01:39,410 --> 00:01:40,140
to output 1, 0, 0, 0,

51
00:01:40,250 --> 00:01:41,260
when it is a

52
00:01:41,520 --> 00:01:42,310
car we want it to output

53
00:01:42,750 --> 00:01:43,530
0, 1, 0, 0, when this

54
00:01:43,840 --> 00:01:45,960
is a motorcycle, we get it to or rather, we want

55
00:01:46,390 --> 00:01:47,460
it to output 0, 0,

56
00:01:47,580 --> 00:01:48,970
1, 0 and so on.

57
00:01:50,750 --> 00:01:51,880
So this is just like

58
00:01:52,270 --> 00:01:53,690
the "one versus all" method

59
00:01:54,190 --> 00:01:55,520
that we talked about when we

60
00:01:55,680 --> 00:01:58,120
were describing logistic regression, and

61
00:01:58,320 --> 00:02:00,480
here we have essentially four logistic

62
00:02:01,290 --> 00:02:03,100
regression classifiers, each of

63
00:02:03,260 --> 00:02:04,800
which is trying to recognize one

64
00:02:05,000 --> 00:02:06,780
of the four classes that

65
00:02:06,940 --> 00:02:08,830
we want to distinguish amongst.

66
00:02:09,540 --> 00:02:10,780
So, rearranging the slide of

67
00:02:10,860 --> 00:02:12,130
it, here's our neural network

68
00:02:12,540 --> 00:02:14,070
with four output units and those

69
00:02:14,330 --> 00:02:15,510
are what we want h

70
00:02:15,670 --> 00:02:16,790
of x to be when we

71
00:02:16,990 --> 00:02:18,930
have the different images, and

72
00:02:19,580 --> 00:02:20,860
the way we're going to represent the

73
00:02:21,110 --> 00:02:22,690
training set in these settings

74
00:02:23,260 --> 00:02:24,670
is as follows. So, when we have

75
00:02:24,890 --> 00:02:26,170
a training set with different images

76
00:02:27,350 --> 00:02:28,990
of pedestrians, cars, motorcycles and

77
00:02:29,260 --> 00:02:30,450
trucks, what we're going

78
00:02:30,510 --> 00:02:31,940
to do in this example is

79
00:02:32,190 --> 00:02:34,580
that whereas previously we had

80
00:02:34,990 --> 00:02:36,780
written out the labels as

81
00:02:37,040 --> 00:02:38,320
y being an integer from

82
00:02:38,710 --> 00:02:42,180
1, 2, 3 or 4. Instead of

83
00:02:42,280 --> 00:02:44,210
representing y this way,

84
00:02:44,890 --> 00:02:46,340
we're going to instead represent y

85
00:02:47,050 --> 00:02:49,400
as follows: namely Yi

86
00:02:54,850 --> 00:02:55,230
will be either 1, 0, 0, 0

87
00:02:55,230 --> 00:02:57,040
or 0, 1, 0, 0 or 0, 0, 1, 0 or 0, 0, 0, 1 depending on what the

88
00:02:57,490 --> 00:02:59,100
corresponding image Xi is.

89
00:02:59,410 --> 00:03:00,700
And so one training example

90
00:03:01,230 --> 00:03:03,090
will be one pair Xi colon Yi

91
00:03:04,530 --> 00:03:06,340
where Xi is an image with, you

92
00:03:06,440 --> 00:03:08,000
know one of the four objects and

93
00:03:08,170 --> 00:03:09,640
Yi will be one of these vectors.

94
00:03:10,970 --> 00:03:12,020
And hopefully, we can find

95
00:03:12,420 --> 00:03:13,670
a way to get our

96
00:03:14,020 --> 00:03:15,100
Neural Networks to output some

97
00:03:15,290 --> 00:03:16,480
value. So, the h of x

98
00:03:17,310 --> 00:03:20,360
is approximately y and

99
00:03:20,550 --> 00:03:22,000
both h of x and Yi,

100
00:03:22,600 --> 00:03:23,770
both of these are going

101
00:03:24,020 --> 00:03:25,170
to be in our example,

102
00:03:26,060 --> 00:03:28,700
four dimensional vectors when we have four classes.

103
00:03:31,810 --> 00:03:33,020
So, that's how you

104
00:03:33,170 --> 00:03:34,830
get neural network to do multiclass classification.

105
00:03:36,290 --> 00:03:37,780
This wraps up our discussion on

106
00:03:38,050 --> 00:03:39,620
how to represent Neural Networks

107
00:03:40,120 --> 00:03:41,620
that is on our hypotheses representation.

108
00:03:42,780 --> 00:03:44,180
In the next set of videos, let's

109
00:03:44,690 --> 00:03:45,830
start to talk about how take

110
00:03:45,990 --> 00:03:47,360
a training set and how to

111
00:03:47,570 --> 00:03:49,970
automatically learn the parameters of the neural network.