In order to understand linear classifiers a little better, let's review the notion of a decision boundary, which is the boundary between positive predictions and negative predictions.

Now, let's say that I have taken my data and trained my linear classifier, and every word has zero weight except for two of them: awesome has weight 1.0 and awful has weight -1.5. So what does that mean? It means that the score of any sentence is 1.0 times the number of times the word awesome shows up, minus 1.5 times the number of times the word awful shows up.

So let's plot that on a graph whose axes are, for each sentence, the number of awesomes and the number of awfuls. For example, the sentence "The sushi was awesome, the food was awesome, but the service was awful" has two awesomes and one awful, so it gets plotted at the point (2, 1). And every sentence in my training data set or my prediction set gets plotted the same way: one might have, say, three awfuls and one awesome, another three awesomes and no awfuls, and so on. So I have a data set like this.

The classifier that we've trained with the coefficients 1.0 and -1.5 has a decision boundary that corresponds to a line, where 1.0 times the number of awesomes minus 1.5 times the number of awfuls is equal to zero. Every point below that line has a score greater than zero, and every point above that line has a score less than zero. For example, take the point with three awesomes and zero awfuls. That has a score greater than zero, so we classify it as +1, and similarly for all the points below the line. Now, for the points above the line, if you check for yourself, you'll see that all of those have negative scores, so we label all of those as negative predictions.

And so there's that line: everything below the line is positive, everything above the line is negative. That's what makes it a linear classifier; it's really a linear decision boundary.
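To make the arithmetic concrete, here is a minimal sketch in Python of this two-word classifier. The weights and the example sentence come from the lecture; the function names and the word-splitting details are just illustrative choices, not part of the course code.

```python
# Sketch of the two-feature linear classifier from the lecture.
# Weights: awesome -> 1.0, awful -> -1.5; every other word weighs 0.
WEIGHTS = {"awesome": 1.0, "awful": -1.5}

def score(sentence):
    """Score = 1.0 * (#awesome) - 1.5 * (#awful)."""
    words = sentence.lower().split()
    return sum(WEIGHTS.get(w.strip(".,!?"), 0.0) for w in words)

def predict(sentence):
    """Classify as +1 if the score is positive, else -1."""
    return +1 if score(sentence) > 0 else -1

sentence = "The sushi was awesome, the food was awesome, but the service was awful."
print(score(sentence))    # 2 * 1.0 - 1 * 1.5 = 0.5
print(predict(sentence))  # +1: the point (2, 1) lies below the decision boundary
```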
Good. So we have seen that with two features, or two non-zero coefficients, our decision boundary is really just a line in this 2D plane. Now, in general, we might have more coefficients than that. If you have three features with non-zero coefficients, then what we really have is a plane that tries to separate the positive points from the negative ones. If you have more than three non-zero coefficients, then we are in a high-dimensional space, and we call the boundary a hyperplane that tries to separate the positives from the negatives. That was a sci-fi reference, by the way. And in general, if you use more complicated features, then when you visualize the resulting hyperplane back in the lower-dimensional space, the decision boundary might look like a squiggly, more complicated curve.
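The same rule extends directly to any number of features: the decision boundary is the set of points where the dot product of the weight vector and the feature vector is zero. Here is a small sketch of that general case, assuming NumPy; the weights and counts below are made-up numbers for illustration, not values from the lecture.

```python
import numpy as np

# With d non-zero coefficients, the decision boundary is the hyperplane
# w . x = 0 in d-dimensional feature space (a line when d = 2, a plane
# when d = 3). These weights are invented for the example.
w = np.array([1.0, -1.5, 0.7, 0.3])   # one weight per feature (word count)

def predict(x):
    """+1 on the positive side of the hyperplane, -1 on the negative side."""
    return 1 if np.dot(w, x) > 0 else -1

x = np.array([2, 1, 0, 3])            # feature counts for one sentence
print(np.dot(w, x), predict(x))       # 1.4 1
```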