Now that we've had the chance to review the basics of probability, let's figure out how it can be useful in the classification problem.

So, if we take our two sentences, one that was definitely positive, "The sushi and everything were awesome!", and the other one I was not sure about, "The sushi was good, the service was okay." For the first one, you can say that the probability that it's a positive review is very high. So, the probability that y equals plus one, given the sentence, is 0.99. For the other one, though, the probability that y equals plus one given the sentence, given x equals "The sushi was good, the service was okay," is only 0.55.

And in general, many classifiers output this degree of belief, or this probability: the probability of the output label y given the input x. And it's going to be extremely useful in practice.

So let's go through a little bit of an example of what that means. Let's say we're given an input data set with N data points. They have inputs, the number of "awesome"s and the number of "awful"s, and the labels y.
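As a minimal sketch of how a classifier can turn a sentence into a degree of belief, here is a logistic model over the two word-count features above. The coefficients are made-up numbers for illustration only, not the ones a trained model would learn:

```python
import math

# Hypothetical coefficients w_hat for a logistic model over two
# word-count features (#awesome, #awful); values are illustrative only.
w0, w_awesome, w_awful = 0.0, 1.2, -1.5

def p_positive(n_awesome, n_awful):
    """Degree of belief P(y = +1 | x): sigmoid of a linear score."""
    score = w0 + w_awesome * n_awesome + w_awful * n_awful
    return 1.0 / (1.0 + math.exp(-score))

# A sentence with several "awesome"s gets a probability near 1;
# a mixed sentence lands near 0.5.
print(p_positive(3, 0))  # confidently positive
print(p_positive(1, 1))  # uncertain, close to 0.5
```

Note that the output is a probability, not just a label, which is exactly the extra information the lecture is pointing at.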
And we use that data to train a classifier that outputs these probabilities. The predictions we're going to call P hat, our estimate of the probabilities, which are going to depend on the parameters w hat, the coefficients w hat for our model.

And so P hat is going to be useful for predicting y hat, the predicted class, which in our case is the sentiment of sentences. So, let's see how that works. What we're going to do is learn this P hat estimated from data, and use it to predict the most likely class.

So in particular, if I'm given some input sentence and I compute the probability that y is plus one, that it's a positive review given the sentence, and that's greater than 0.5, I say that y hat is plus one, it's a positive sentence. And if it's less than 0.5, then we say it's a negative sentence, so y hat is minus one.

But we're not just going to get that; P hat is going to give us a more interpretable output. So it's not going to say just plus one or minus one, but it's going to tell us how sure we are that this is a positive review.
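The decision rule described here can be sketched in a few lines: threshold the estimated probability at 0.5 to get a class label, while keeping the probability itself as the measure of confidence.

```python
def predict(p_hat):
    """Turn the estimate P_hat(y = +1 | x) into a class label:
    +1 if the belief in a positive review exceeds 0.5, else -1."""
    return +1 if p_hat > 0.5 else -1

print(predict(0.99))  # -> 1   confidently positive
print(predict(0.55))  # -> 1   positive, but only barely
print(predict(0.10))  # -> -1  negative
```

Both 0.99 and 0.55 map to the same label y hat = +1, which is why the probability itself carries more information than the label alone.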