We've now seen linear classifiers, with logistic regression as a really core example, and how to learn them from data using gradient descent algorithms. So we're now ready to build those classifiers. However, when we go into practical settings we have to think about overfitting, which is a very significant problem in machine learning and can be really troublesome for logistic regression in particular. So let's see how we can avoid overfitting in this setting.

In order to explore the concept of overfitting, we need to better understand how we measure error in classification in general. For a classifier, we typically start with some data that has been labeled, say as positive or negative reviews, and then we split that data into a training set, which we use to train our model, and a validation set, which we use to evaluate the learned classifier.

So let's talk a little bit about the evaluation of a classifier in general. As we discussed in the first course, we measure a classifier's performance using what's called classification error. For example, say I have the sentence "The sushi was great," labeled positive, and I want to measure my error on that sentence. What I do is feed the sentence into my classifier.
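The train/validation split described above can be sketched as follows. This is a minimal illustration using the standard library only; the review sentences and labels are made up for the example.

```python
import random

# Hypothetical labeled reviews: (text, label), with label +1 (positive) or -1 (negative).
data = [
    ("The sushi was great", +1),
    ("The food was okay", -1),
    ("Loved the service", +1),
    ("Would not come back", -1),
]

# Shuffle before splitting so the validation set is a random sample.
random.seed(0)
random.shuffle(data)

# Hold out 25% of the data for validation; train on the rest.
split = int(0.75 * len(data))
train_set = data[:split]        # used to fit the model
validation_set = data[split:]   # used only to evaluate the learned model
```

In practice a library helper (for example scikit-learn's `train_test_split`) does the same thing, but the idea is just this: the model never sees the validation labels during training.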
I feed in "The sushi was great," but I hide the label, so the classifier doesn't get to know whether the sentence was labeled positive or negative in the training data. Then I compare the output, y hat, of my classifier with the true label. In this case they agree, so the classifier is correct, and I add one to the correct column.

However, take another sentence. For example, "The food was okay," which in the training set was labeled as a negative example. This is a bit of an American euphemism: is it positive, is it negative? People say "okay" when they mean bad things, and the classifier might not be familiar with that kind of cultural jargon. So when we hide the label, the classifier might output y hat as positive, but really it's a negative label, and the classifier has made a mistake. So now we put a plus one in the mistake column. In general, we're going to go example by example through the validation set and record which ones we got right and on which ones we made a mistake.
And now we can measure what's called the error, or classification error, on our data. So let me turn my pen on here, and I'm going to use white. It's really simple: the error measures the fraction of data points where we made mistakes. So it's the ratio between the number of mistakes that I made and the total number of data points:

error = (# mistakes) / (# data points)

Sometimes we also talk about accuracy, which is one minus the error. So here, instead of the number of mistakes, it's the number of data points we got correct divided by the total number of data points:

accuracy = (# correct) / (# data points) = 1 - error

Very good.
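The two quantities above can be sketched in a few lines of Python. The labels and predictions here are made-up examples standing in for a validation set.

```python
def classification_error(y_true, y_pred):
    """Fraction of examples where the prediction disagrees with the true label."""
    mistakes = sum(1 for t, p in zip(y_true, y_pred) if t != p)
    return mistakes / len(y_true)

# Hypothetical validation set: hidden true labels vs. classifier outputs (y hat).
y_true = [+1, -1, +1, -1, +1]
y_pred = [+1, +1, +1, -1, +1]  # one mistake: predicted positive on a negative example

error = classification_error(y_true, y_pred)   # 1 mistake out of 5 -> 0.2
accuracy = 1 - error                           # fraction correct -> 0.8
```

Note that accuracy never needs to be computed separately: counting the correct examples and dividing by the total gives exactly one minus the error.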