[MUSIC]

>> Let's start talking about how to compute w hat t. This quantity is intuitive: it denotes how good, or how much we trust, f t, the classifier we learned at this iteration.

Specifically, if f t is good, if it's doing well on our data, we want w hat t to be large. In fact, if f t has really, really great accuracy, very low error, we want w hat t to be really big. However, if f t is really bad, if it's terrible at making predictions, we should downweight it. We should not trust that particular vote.

So in other words, how do we measure whether a classifier is good or not? As we said, f t is good if it has low training error. However, you have to remember that we have weighted data, so what we really care about is how well it's doing on the weighted data. For example, if we're weighing certain data points more heavily because they're really hard, because we're making lots of mistakes on those, we want to make sure that the classifier has low error on those really hard examples.

So let's look at measuring error on weighted data. Measuring error on weighted data is very similar to measuring error on regular data. You have a data point, for example, "The sushi was great", which is labeled as positive, but now we have a weight, in this case alpha, which might be 1.2. So this is a data point of, say, above-average importance. We want to measure the weighted total of the correct examples and the weighted total of the mistakes. So we take our learned classifier f t and feed it that review, in this case "The sushi was great", but we hide the label, which in this case was positive. And now we compare the prediction. For example, let's say that y hat was plus 1 for this input. It's the same as the true label, so it's correct, and we add the weight 1.2 to the total weight of the correct examples we've seen. So that's awesome.

But let's say we have another data point, "The food was OK", which is truly labeled as negative, and we talked about this example before. We feed "The food was OK" to the classifier. We hide the label, minus 1. But our classifier gets confused. It doesn't know the cultural reference, that "the food was OK" means something negative, and thinks it is a positive example, y hat is plus 1, so it's a mistake. So we take the weight of this data point, 0.5, and add it to the total weight of the mistakes. We keep adding up the weight of the mistakes versus the weight of the correct classifications, and use that to measure the error.
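To make this concrete, here is a minimal sketch, in Python, of the bookkeeping the lecture just walked through: accumulate the weight of the mistakes versus the weight of the correct predictions, then divide. The function name and variable names are illustrative, not taken from the course's assignments.

```python
def weighted_error(y_true, y_pred, alpha):
    """Weighted classification error: total weight of the mistakes
    divided by the total weight of all the data points."""
    weight_mistakes = 0.0
    weight_correct = 0.0
    for yi, yhat_i, alpha_i in zip(y_true, y_pred, alpha):
        if yhat_i != yi:
            weight_mistakes += alpha_i  # mistake: add this point's weight here
        else:
            weight_correct += alpha_i   # correct: add this point's weight here
    return weight_mistakes / (weight_mistakes + weight_correct)

# The two reviews from the lecture: "The sushi was great" (weight 1.2,
# predicted correctly) and "The food was OK" (weight 0.5, misclassified).
y_true = [+1, -1]
y_pred = [+1, +1]
alpha  = [1.2, 0.5]
print(weighted_error(y_true, y_pred, alpha))  # 0.5 / 1.7, about 0.29
```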
42 00:02:15,650 --> 00:02:18,920 It doesn't know the cultural reference, the food was OK, and 43 00:02:18,920 --> 00:02:23,050 thinks it is a positive example, y hat is plus 1, and it's a mistake. 44 00:02:23,050 --> 00:02:25,572 So we take the weight of this data point 0.5 and 45 00:02:25,572 --> 00:02:28,360 add it to the total weight of the mistakes. 46 00:02:28,360 --> 00:02:31,350 So keep adding the weight of the mistakes versus the weight of the correct 47 00:02:31,350 --> 00:02:32,430 classifications. 48 00:02:32,430 --> 00:02:33,800 And use that to measure the error. 49 00:02:35,010 --> 00:02:35,670 Now that we have seen and 50 00:02:35,670 --> 00:02:40,040 intuitive notion of what a weighted error is, let's write the down the equations for 51 00:02:40,040 --> 00:02:42,470 the weighted error, so we can be sure if we need to implement it. 52 00:02:42,470 --> 00:02:45,810 So, the first thing we need to measure is the total weight of all the mistakes, so 53 00:02:45,810 --> 00:02:49,970 the sum of our mistakes of the weight of those data points. 54 00:02:49,970 --> 00:02:54,655 So this is the sum over the datapoint, so i equals 55 00:02:54,655 --> 00:02:59,640 1 through N, of an indicator that says, was this a mistake? 56 00:02:59,640 --> 00:03:04,300 So is y hat i different than yi? 57 00:03:04,300 --> 00:03:10,370 So this just measure whether it was a mistake, and if it was a mistake we 58 00:03:10,370 --> 00:03:14,170 don't just count it as a mistake, we count it whatever weight that datapoint has. 59 00:03:14,170 --> 00:03:17,890 So we're going to weigh that contribution by alpha i. 60 00:03:17,890 --> 00:03:20,240 And now, to compute the error, we're going to normalize it so 61 00:03:20,240 --> 00:03:21,090 it's a number between zero and 62 00:03:21,090 --> 00:03:25,130 one, so we have to divide it by the total weight of all the data points. 63 00:03:25,130 --> 00:03:33,880 So it's the sum over i equals 1 through N of the weight of all the data points of i. 64 00:03:33,880 --> 00:03:39,575 And these are the two quantities that we care about, 65 00:03:39,575 --> 00:03:44,881 and the weighted error can be denoted by the total 66 00:03:44,881 --> 00:03:52,530 weight of the mistakes divided by the total weight of all data points. 67 00:03:53,560 --> 00:03:58,539 Extremely simple, the best possible error you could hope for is 0.0. 68 00:03:58,539 --> 00:04:01,843 Now, the worst error is 1.0, 69 00:04:01,843 --> 00:04:07,940 which means that we're making mistakes everywhere. 70 00:04:07,940 --> 00:04:11,580 But notice that if we're making mistakes everywhere, 71 00:04:11,580 --> 00:04:14,230 if we emerge a class fire we're going to get everything right. 72 00:04:14,230 --> 00:04:18,770 So the way to think about it is in the worst possible case in some ways is 73 00:04:20,710 --> 00:04:22,170 how random does. 74 00:04:22,170 --> 00:04:27,105 So a random classifier will get error of 0.5, and 75 00:04:27,105 --> 00:04:33,499 we discussed this in the first course of how a random classifier gets 76 00:04:33,499 --> 00:04:39,880 error 0.5 on a binary classification problem like this. 77 00:04:39,880 --> 00:04:44,982 So now that we've seen the weighted error, let's look at how we 78 00:04:44,982 --> 00:04:50,759 can update the coefficient w hat t of the function that we learn. 79 00:04:50,759 --> 00:04:54,869 >> [MUSIC]