1
00:00:00,000 --> 00:00:04,436
[MUSIC]

2
00:00:04,436 --> 00:00:08,150
Let's start by reviewing the intuition
behind linear classifiers.

3
00:00:08,150 --> 00:00:10,220
The same intuition we
covered in the first course.

4
00:00:11,520 --> 00:00:14,990
A linear classifier will
take as input some

5
00:00:14,990 --> 00:00:18,770
quantity x which in our case
is sentences from reviews.

6
00:00:18,770 --> 00:00:21,030
It's going to feed it through
its classifier model and

7
00:00:21,030 --> 00:00:26,020
is going to make a prediction y hat that says,
is this a positive review, in which case

8
00:00:26,020 --> 00:00:31,310
y hat is plus one, or is it a negative
review in which case y hat is minus one.

9
00:00:31,310 --> 00:00:34,188
That's what we're trying to figure out.

10
00:00:34,188 --> 00:00:38,980
A linear classifier does a little bit
more: it associates every word with a weight or

11
00:00:38,980 --> 00:00:43,700
coefficient which says how positively
influential this word is or

12
00:00:43,700 --> 00:00:45,280
how negatively influential.

13
00:00:45,280 --> 00:00:50,363
So good might have a coefficient of 1.0,
great might have a coefficient of 1.5.

14
00:00:50,363 --> 00:00:53,553
Awesome is awesome, and
has a coefficient of 2.7.

15
00:00:53,553 --> 00:00:58,103
While on the negative side,
bad might have a coefficient of minus 1,

16
00:00:58,103 --> 00:00:59,630
terrible minus 2.1.

17
00:00:59,630 --> 00:01:03,462
But awful is just awful, so minus 3.3.

18
00:01:03,462 --> 00:01:07,048
And then some words that are not that relevant
to the sentiment of the review might have

19
00:01:07,048 --> 00:01:07,850
a coefficient of 0.

20
00:01:09,200 --> 00:01:13,071
Now let's see how these coefficients can
be used to make a prediction of whether

21
00:01:13,071 --> 00:01:15,370
a sentence is positive or negative.

22
00:01:15,370 --> 00:01:19,040
So for example let's take this
sentence that says the sushi's great.

23
00:01:19,040 --> 00:01:21,030
So how do you score the sentence?

24
00:01:21,030 --> 00:01:26,950
Let's compute the score of this
input sentence, xi.

25
00:01:26,950 --> 00:01:29,770
The sentence says, the sushi is great, so

26
00:01:29,770 --> 00:01:34,000
we look at the coefficient of great,
and we see it's 1.2.

27
00:01:34,000 --> 00:01:39,814
And now it says, the food was awesome,
so the coefficient of that is 1.7.

28
00:01:41,460 --> 00:01:44,210
And then it says, but,
the service was terrible.

29
00:01:44,210 --> 00:01:45,520
My God, the service was terrible.

30
00:01:45,520 --> 00:01:46,500
So you subtract 2.1.

31
00:01:46,500 --> 00:01:51,170
And now we ask,
what is the total score for this sentence?

32
00:01:51,170 --> 00:01:53,450
So some things are positive,
some things are negative.

33
00:01:53,450 --> 00:01:58,980
The total score is 0.8,
which is greater than 0.

34
00:01:58,980 --> 00:02:03,244
And that implies that we're
going to predict that y hat,

35
00:02:03,244 --> 00:02:06,683
the sentiment for
the sentence, is plus one.

36
00:02:06,683 --> 00:02:08,633
So it's a positive review.
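As a minimal sketch of the scoring walk-through above: the weights for great, awesome and terrible are the ones read out in the example, while the names `WEIGHTS`, `score` and `predict`, the regex tokenization, and the example sentence as a single string are assumptions made for illustration. Words not listed are assumed to have a coefficient of 0, and treating a score of exactly zero as negative is also an assumption, since the lecture does not cover that case.

```python
import re

# Coefficients read out in the worked example; any other word is assumed to be 0.
WEIGHTS = {"great": 1.2, "awesome": 1.7, "terrible": -2.1}

def score(sentence, weights=WEIGHTS):
    """Sum the coefficients of the words that appear in the sentence."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return sum(weights.get(word, 0.0) for word in words)

def predict(sentence, weights=WEIGHTS):
    """Return +1 if the score is greater than zero, otherwise -1."""
    return 1 if score(sentence, weights) > 0 else -1

review = "The sushi was great, the food was awesome, but the service was terrible."
print(round(score(review), 1))  # 1.2 + 1.7 - 2.1, rounded for display
print(predict(review))
```

Each word contributes its coefficient once per occurrence, so the score is a weighted sum of word counts, which is what makes the classifier linear.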

37
00:02:12,473 --> 00:02:17,129
And this is called a linear
classifier because the output is

38
00:02:17,129 --> 00:02:20,230
the weighted sum of the inputs.

39
00:02:20,230 --> 00:02:22,560
So that's kind of what
a linear classifier is.

40
00:02:22,560 --> 00:02:25,300
We'll see in a little bit more
detail what that really means.

41
00:02:26,860 --> 00:02:29,670
So more generally a simple
linear classifier

42
00:02:29,670 --> 00:02:33,980
is going to take as input the
coefficient associated with each word.

43
00:02:33,980 --> 00:02:36,450
And it's going to compute a score for
that input.

44
00:02:36,450 --> 00:02:40,658
If the score is greater than zero, we say
that the output, the prediction y hat,

45
00:02:40,658 --> 00:02:41,360
is +1.

46
00:02:41,360 --> 00:02:45,974
And if the score is less than zero,
we say that the prediction is -1.

47
00:02:47,240 --> 00:02:50,050
Now, what we need to do is
train the weights of these

48
00:02:50,050 --> 00:02:51,700
linear classifiers from data.

49
00:02:51,700 --> 00:02:56,060
So given some input training data
that includes sentences of reviews

50
00:02:56,060 --> 00:02:59,530
labeled with either plus one or
minus one, positive or negative,

51
00:02:59,530 --> 00:03:04,060
we're going to split those into a
training set and a validation set.

52
00:03:04,060 --> 00:03:07,080
Then we're going to feed that training
set to some learning algorithm which

53
00:03:07,080 --> 00:03:10,190
is going to learn the weights
associated with each word,

54
00:03:10,190 --> 00:03:14,410
so 1.0 for good, 1.7 for
awesome and so on.

55
00:03:14,410 --> 00:03:17,570
And then after we learn this classifier,
we're going to go back and

56
00:03:17,570 --> 00:03:20,950
evaluate its accuracy
on that validation set.
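The split, learn, evaluate flow described above might be sketched as follows. The lecture has not yet said what the learning algorithm is, so this sketch substitutes a deliberately crude stand-in (adding the label to a word's weight each time the word appears) purely to show the flow; it is not logistic regression. The reviews are made up for illustration, and all function names and the 25% validation fraction are assumptions.

```python
import random
from collections import Counter

def split_data(examples, validation_fraction=0.25, seed=0):
    """Shuffle labeled examples and split them into training and validation sets."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    n_validation = max(1, int(len(shuffled) * validation_fraction))
    return shuffled[n_validation:], shuffled[:n_validation]

def learn_weights(training_set):
    """Stand-in learner: add the label (+1 or -1) to each word's weight per appearance."""
    weights = Counter()
    for sentence, label in training_set:
        for word in sentence.lower().split():
            weights[word] += label
    return weights

def predict(sentence, weights):
    """Return +1 if the summed word weights are greater than zero, otherwise -1."""
    total = sum(weights[word] for word in sentence.lower().split())
    return 1 if total > 0 else -1

def accuracy(weights, dataset):
    """Fraction of examples in the dataset whose predicted label matches the true label."""
    correct = sum(predict(sentence, weights) == label for sentence, label in dataset)
    return correct / len(dataset)

# Made-up reviews for illustration only; labels are +1 (positive) or -1 (negative).
reviews = [
    ("the sushi was great", 1),
    ("the food was awesome", 1),
    ("good food and great service", 1),
    ("awesome sushi", 1),
    ("the service was terrible", -1),
    ("awful food", -1),
    ("the sushi was bad", -1),
    ("terrible and awful", -1),
]

train, validation = split_data(reviews)
weights = learn_weights(train)
print(accuracy(weights, validation))
```

Accuracy is measured only on the validation set, which the learner never saw, so it estimates how the learned weights behave on new reviews.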

57
00:03:20,950 --> 00:03:25,160
So our goal for
today is to explore that learning box.

58
00:03:25,160 --> 00:03:27,386
How do we learn this
classifier from data and

59
00:03:27,386 --> 00:03:32,300
understand a little bit more deeply
what a linear classifier is really about?

60
00:03:32,300 --> 00:03:35,133
In particular,
in the context of logistic regression.

61
00:03:35,133 --> 00:03:39,779
[MUSIC]