1
00:00:00,124 --> 00:00:04,383
[MUSIC]

2
00:00:04,383 --> 00:00:07,306
One of the most common types
of classifiers is the so-called

3
00:00:07,306 --> 00:00:09,030
linear classifier.

4
00:00:09,030 --> 00:00:10,310
So let's talk a little bit about that.

5
00:00:11,410 --> 00:00:15,390
The question here is how do
we represent the classifier?

6
00:00:15,390 --> 00:00:20,190
So we start with some sentence, for
example in the sentiment analysis case,

7
00:00:20,190 --> 00:00:20,850
and feed it to the classifier.

8
00:00:20,850 --> 00:00:25,370
You get a prediction whether it is a
positive sentence or a negative sentence.

9
00:00:25,370 --> 00:00:27,710
So, how does this classifier work?

10
00:00:27,710 --> 00:00:31,400
In sentiment analysis,
you can imagine a simple

11
00:00:31,400 --> 00:00:32,680
kind of threshold classifier.

12
00:00:33,720 --> 00:00:37,600
Suppose that I take a sentence and
somebody tells me

13
00:00:37,600 --> 00:00:42,860
these are all of the positive words,
great, awesome, good, amazing, and so on.

14
00:00:42,860 --> 00:00:44,220
Here's a bunch of negative words.

15
00:00:44,220 --> 00:00:46,780
Bad, terrible, disgusting,
awful, and so on.

16
00:00:46,780 --> 00:00:49,390
And so
what I can do is take the sentence and

17
00:00:49,390 --> 00:00:52,400
count how many positive words
there are in a sentence and

18
00:00:52,400 --> 00:00:54,810
how many negative words are in
the sentence, just count those.

19
00:00:56,000 --> 00:00:58,880
And then say if the number
of positive words is higher

20
00:00:58,880 --> 00:01:00,770
than the number of negative words.

21
00:01:00,770 --> 00:01:03,390
Then we have a positive sentence but

22
00:01:03,390 --> 00:01:06,990
if there are more negative words,
then you have a negative sentence.

23
00:01:06,990 --> 00:01:13,600
So for example, suppose the input sentence
we have is "The sushi was great," positive

24
00:01:13,600 --> 00:01:20,350
one, "the food was awesome," positive two,
"but the service was terrible," negative one.

25
00:01:20,350 --> 00:01:23,230
You have two positives and one negative, and

26
00:01:23,230 --> 00:01:26,740
in the end the positives win, and
so you have a positive prediction.

27
00:01:28,680 --> 00:01:32,140
Now threshold classifiers
have some limitations.

28
00:01:32,140 --> 00:01:34,940
Where does this list of positive and
negative words actually come from?

29
00:01:34,940 --> 00:01:39,430
It has to magically
come from somewhere and

30
00:01:39,430 --> 00:01:43,000
not just that, words have different
degrees of positivity and negativity.

31
00:01:43,000 --> 00:01:45,182
So great is more positive than good.

32
00:01:45,182 --> 00:01:49,760
You wanna tune and figure out
what's better: great, good, amazing?

33
00:01:49,760 --> 00:01:51,360
Is amazing better than great?

34
00:01:51,360 --> 00:01:52,310
Who knows?

35
00:01:52,310 --> 00:01:54,710
So how do we figure that out,
how do we weigh different words?

36
00:01:56,430 --> 00:02:00,890
And single words might not be
enough to make a good classification.

37
00:02:00,890 --> 00:02:05,820
So "good food," or
"the food was good," is positive.

38
00:02:05,820 --> 00:02:08,390
"The food was not good"
is actually negative.

39
00:02:09,530 --> 00:02:12,270
And so these are all issues
that need to be addressed.

40
00:02:12,270 --> 00:02:15,010
The first two issues, where the positive and
negative words come from and

41
00:02:15,010 --> 00:02:17,840
how you weigh them, are addressed by
learning a classifier, which

42
00:02:17,840 --> 00:02:18,830
we're going to talk about next.

43
00:02:20,970 --> 00:02:24,190
The issue of good versus not good

44
00:02:24,190 --> 00:02:27,440
is addressed by using more complicated
features than single words.

45
00:02:27,440 --> 00:02:29,730
And we're gonna talk about it
towards the end of the module.

46
00:02:29,730 --> 00:02:34,310
So a linear classifier,
instead of having a list of positive and

47
00:02:34,310 --> 00:02:37,720
negative words, actually takes all
the words and adds weights to them.

48
00:02:38,780 --> 00:02:43,722
So, for example, good might have a weight
of 1, great might have a weight of 1.2,

49
00:02:43,722 --> 00:02:46,514
awesome might have a really
big weight of 1.7.

50
00:02:46,514 --> 00:02:53,223
While bad might have a weight of -1,
terrible may have a weight of -2.1,

51
00:02:53,223 --> 00:02:58,016
awful might be -3.3 cuz
awful is really just awful.

52
00:02:58,016 --> 00:03:01,060
And then there are words that really
don't matter to sentiment.

53
00:03:01,060 --> 00:03:05,170
So things like the, we, where, restaurant,
they appear both in positive and

54
00:03:05,170 --> 00:03:07,170
negative sentences so
they get weight zero.

55
00:03:08,610 --> 00:03:11,900
Suppose somebody magically told you
what the weight of each word was.

56
00:03:11,900 --> 00:03:15,250
We're going to talk in
a little bit about how those get learned

57
00:03:15,250 --> 00:03:16,500
by the classifier.

58
00:03:16,500 --> 00:03:21,270
But given those weights, how do we figure out
if the sentence is positive or negative?

59
00:03:21,270 --> 00:03:22,610
Here we use the idea of scoring.

60
00:03:23,670 --> 00:03:25,460
So for example take this sentence.

61
00:03:25,460 --> 00:03:29,190
The sushi was great, the food was awesome,
but the service was terrible.

62
00:03:29,190 --> 00:03:30,280
Let's score that sentence.

63
00:03:30,280 --> 00:03:34,894
So we're gonna compute this,
the score of the input sentence x.

64
00:03:34,894 --> 00:03:40,967
So in this case, you get from great,
you get positive 1.2.

65
00:03:40,967 --> 00:03:45,894
From awesome, you get another 1.7 but

66
00:03:45,894 --> 00:03:51,383
from terrible, you get minus 2.1 right here,

67
00:03:51,383 --> 00:03:59,563
and so the grand total here is
2.9 minus 2.1, which is 0.8.

68
00:03:59,563 --> 00:04:04,943
And the key here is that since the score
of the sentence is greater than zero,

69
00:04:04,943 --> 00:04:08,910
we're going to predict that
it is a positive sentence.

70
00:04:10,080 --> 00:04:16,390
If the score was the opposite,
if the score of x were less than zero,

71
00:04:16,390 --> 00:04:19,540
then we'd have predicted
it's a negative sentence.

72
00:04:19,540 --> 00:04:22,110
So this is how a linear classifier works,

73
00:04:22,110 --> 00:04:26,760
if you know the weight of each word,
and this is called

74
00:04:26,760 --> 00:04:31,860
a linear classifier because the output is
basically the weighted sum of the input.

75
00:04:31,860 --> 00:04:35,530
We just weight the features that appear,
the words that appear in the input.

76
00:04:37,390 --> 00:04:41,590
So we're working with a simple linear
classifier to start out with.

77
00:04:41,590 --> 00:04:45,930
So in summary, given a sentence and
given the weights for

78
00:04:45,930 --> 00:04:48,890
the sentence,
what we do is compute the score,

79
00:04:48,890 --> 00:04:52,130
which is the weighted count of
the words that appear in the sentence.

80
00:04:52,130 --> 00:04:54,820
And then we say if the score
is greater than zero,

81
00:04:54,820 --> 00:04:56,470
we predict y-hat to be positive.

82
00:04:56,470 --> 00:05:00,800
While if the score is less than zero,
we predict it to be negative.

83
00:05:00,800 --> 00:05:02,546
And that is a linear classifier.
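The scoring rule summarized above can be sketched in Python as follows. This is a minimal illustration, not a learned model: the weights are the example values from the lecture's worked sentence, the names `WEIGHTS`, `score`, and `predict` are hypothetical, words without a listed weight are assumed to have weight zero, and a score of exactly zero is treated as negative, which the lecture leaves unspecified.

```python
# Example per-word weights (assumed for illustration; a real classifier learns them).
WEIGHTS = {
    "good": 1.0,
    "great": 1.2,
    "awesome": 1.7,
    "bad": -1.0,
    "terrible": -2.1,
    "awful": -3.3,
}


def score(sentence, weights=WEIGHTS):
    """Return the weighted count of the words that appear in the sentence."""
    words = [word.strip(".,!?;:") for word in sentence.lower().split()]
    # Words with no listed weight (the, we, restaurant, ...) contribute zero.
    return sum(weights.get(word, 0.0) for word in words)


def predict(sentence, weights=WEIGHTS):
    """Predict 'positive' if the score is greater than zero, else 'negative'.

    Assumption: a score of exactly zero is labeled 'negative'.
    """
    return "positive" if score(sentence, weights) > 0 else "negative"


example = "The sushi was great, the food was awesome, but the service was terrible."
example_score = score(example)   # 1.2 + 1.7 - 2.1, roughly 0.8
example_label = predict(example)
```

Because the score is a plain weighted sum of the words in the input, changing a single weight shifts the score by that amount for every occurrence of the word, which is what makes the classifier linear.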

84
00:05:02,546 --> 00:05:05,919
[MUSIC]