Now that we've seen the logistic regression model and we understand it in quite a bit of detail, let's give a quick overview of what learning means for this model. In the next module we're going to go into quite a lot of detail on the learning algorithm; this is just a little primer, a teaser for what's coming in the next module.

Now, we're going to start from some data, and that data has inputs x and outputs that are either plus one or minus one. As we said, we're going to split that into a training set and a validation set. From the training set we're going to run a learning algorithm that will output the parameter estimates w hat, and those w hats are going to be plugged into the model to estimate the probability that an input sentence is either positive or negative. And, of course, we can use the learned model on the validation set to estimate how good it is: what the quality metrics are, what the error is.

Now, to find the best classifier, we're going to define the quality metric. In this case the quality metric is going to be called the likelihood function. So for every possible set of parameters, or coefficients, w (for example w0, w1, w2), we will be able to score it according to the likelihood l(w) to figure out how good it is. So, for example, if I take this data set of plusses and minuses and learn the line shown in green, we might get a particular likelihood. So, for example, if the parameter w0 is 0, w1 is 1, and w2 is -1.5, the likelihood might be 10 to the -6, pretty small. These numbers actually tend to be pretty small. For this alternative line, where w0 is now 1, w1 is still 1, and w2 is -1.5, the likelihood function is a little better: 10 to the -5 instead of 10 to the -6. But perhaps for this best line over here, where w0 is 1, w1 is 0.5, and w2 is -1.5, you get the best likelihood, 10 to the -4.

So we'd like an approach that searches over the possible values of w to find the best line. And, as we will see in the next module, we'll use a gradient ascent algorithm to find the set of parameters w that has the highest likelihood, the best quality.
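To make the scoring idea concrete, here is a minimal sketch of computing the likelihood l(w) for the three candidate coefficient vectors mentioned above. The toy data points are hypothetical, invented for illustration (they are not the lecture's dataset), so the likelihood values printed will not match the 10 to the -6 style numbers from the slides; the point is only the mechanics of scoring a coefficient vector against labeled data.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def likelihood(w, X, y):
    """Data likelihood l(w) = product over i of P(y_i | x_i, w), labels y in {+1, -1}.

    X carries a leading column of 1s so that w[0] plays the role of w0,
    the intercept. For +/-1 labels, P(y_i | x_i, w) = sigmoid(y_i * score_i).
    """
    scores = X @ w                      # w0 + w1*x1 + w2*x2 for each point
    return np.prod(sigmoid(y * scores))

# Hypothetical toy data: four points with features (x1, x2) and labels +/-1.
X = np.array([[1.0, 2.0, 0.5],
              [1.0, 0.5, 2.5],
              [1.0, 3.0, 1.0],
              [1.0, 0.2, 3.0]])        # first column is the constant 1
y = np.array([+1, -1, +1, -1])

# Score the three candidate coefficient vectors (w0, w1, w2) from the lecture.
for w in (np.array([0.0, 1.0, -1.5]),
          np.array([1.0, 1.0, -1.5]),
          np.array([1.0, 0.5, -1.5])):
    print(w, likelihood(w, X, y))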
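The next module works through gradient ascent properly; as a preview only, here is a minimal sketch of the idea, reusing sigmoid, X, and y from the snippet above. It repeatedly nudges w in the direction of the gradient of the log-likelihood (the log is used for numerical convenience; it has the same maximizer as l(w)). The step size and iteration count are arbitrary choices for this toy example.

```python
def log_likelihood_gradient(w, X, y):
    """Gradient of sum_i log sigmoid(y_i * (w . x_i)) for +/-1 labels."""
    scores = X @ w
    return X.T @ (y * sigmoid(-y * scores))

def gradient_ascent(X, y, step_size=0.1, n_iters=500):
    w = np.zeros(X.shape[1])            # start from all-zero coefficients
    for _ in range(n_iters):
        w += step_size * log_likelihood_gradient(w, X, y)
    return w

w_hat = gradient_ascent(X, y)
print("learned coefficients w hat:", w_hat)
```

Each iteration moves w uphill on the likelihood surface, which is exactly the "search over possible values of w" the lecture describes.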
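Finally, the lecture mentions using the learned model on the validation set to measure quality. A minimal sketch of that step, again with made-up validation points: classify as +1 whenever the estimated probability of a positive label exceeds 0.5 (equivalently, whenever the score w hat . x is positive), then count the fraction of mistakes.

```python
def predict(w, X):
    """Predict +1 when P(y = +1 | x, w) > 0.5, i.e. when the score is positive."""
    return np.where(X @ w > 0.0, 1, -1)

# Hypothetical held-out validation points, same layout as X above.
X_val = np.array([[1.0, 2.5, 0.8],
                  [1.0, 0.4, 2.2]])
y_val = np.array([+1, -1])

error = np.mean(predict(w_hat, X_val) != y_val)
print("validation error:", error)
```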