This is the same dataset, and the same learned model, from a few slides ago. What I'm plotting here on the right is not just the decision boundary but the probability that y hat is equal to plus one. So it's a probability plot. For the points over here, the probability is approximately zero. So there's approximately zero chance that the points there, around minus five and four, are positive. While the points over here have a probability of approximately one. So the probability that y equals plus one is approximately one in the bottom right corner. All that makes sense, and what makes the most sense to me is the region in between, right here. This is the region where the probability is approximately 0.5, where we're kind of uncertain as to whether we have a positive or a negative review, and it's a pretty wide region of uncertainty.

So although the linear classifier, the straight line here, the degree-one polynomial, was not a great fit to the data, the uncertainty measures make quite a lot of sense. The points over here that are getting misclassified are ones I'm actually uncertain about, whether they're positive or negative, and so I feel like this classifier is doing something very reasonable.
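The probability plot described above comes from passing the linear score through a logistic (sigmoid) link. Here is a minimal sketch; the weights `w` and intercept `b` are made up for illustration and are not the model actually learned in the lecture:

```python
import math

def sigmoid(score):
    """Logistic link: maps a real-valued score to P(y_hat = +1)."""
    return 1.0 / (1.0 + math.exp(-score))

# Hypothetical linear model w . x + b (illustrative weights only).
w = (1.0, 1.0)
b = 0.0

def prob_positive(x):
    """P(y_hat = +1 | x) under the linear model above."""
    score = b + w[0] * x[0] + w[1] * x[1]
    return sigmoid(score)

print(prob_positive((-5.0, -4.0)))  # far on the negative side: close to 0
print(prob_positive((5.0, 4.0)))    # far on the positive side: close to 1
print(prob_positive((0.0, 0.0)))    # exactly on the boundary: 0.5
```

Points far from the decision boundary get probabilities near 0 or 1, while points near it sit close to 0.5, which is exactly the wide white band of uncertainty in the plot.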
Now let's look at a degree-two polynomial fit. So what happens is we take degree-two polynomial features, or quadratic features, and learn the same classifier as we learned a few slides ago, but again plot the probability that y hat equals plus one. As we saw a few slides ago, we believe that this quadratic fit was actually a better fit to the data. And if you look at it, the uncertainty region is narrower. To me, this makes a lot of sense: I have a better fit to the data, so there are fewer points that I'm uncertain about. And in fact, the places where I have uncertainty are exactly the ones in the boundary region, where I should have some uncertainty, the ones where I'm not sure if they're plus one or minus one because they're close to the boundary. It makes a lot of sense. So this is a really great fit, not just in terms of the decision boundary but also in terms of the probabilities. The places where the probability is closer to 0.5 are really the ones where I'm unsure about what's going on, and then the probability mostly decreases or mostly increases depending on whether I go to the left side or the right side of the parabola.

Now let's see what happens when I use higher-order features, for example polynomial degree-6 features or polynomial degree-20 features.
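The quadratic fit works the same way, just with a degree-two feature expansion before the logistic link. A minimal sketch, with made-up coefficients chosen so the decision boundary is the parabola x2 = x1² (not the lecture's actual learned model):

```python
import math

def sigmoid(score):
    """Logistic link: maps a real-valued score to P(y_hat = +1)."""
    return 1.0 / (1.0 + math.exp(-score))

def quadratic_features(x1, x2):
    """Degree-2 polynomial expansion of a 2-D point."""
    return (1.0, x1, x2, x1 * x1, x1 * x2, x2 * x2)

# Hypothetical coefficients: score = x2 - x1**2, so the decision
# boundary P = 0.5 is exactly the parabola x2 = x1**2.
w = (0.0, 0.0, 1.0, -1.0, 0.0, 0.0)

def prob_positive(x1, x2):
    score = sum(wi * fi for wi, fi in zip(w, quadratic_features(x1, x2)))
    return sigmoid(score)

print(prob_positive(0.0, 3.0))   # well above the parabola: near 1
print(prob_positive(0.0, -3.0))  # well below the parabola: near 0
print(prob_positive(2.0, 4.0))   # on the parabola: exactly 0.5
```

The band where the probability is near 0.5 hugs the parabola, so the uncertainty region tracks the curved boundary, just as in the plot.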
We saw that those decision boundaries became really wiggly and crazy, but now if you look at the uncertainty regions, you'll see they become really, really narrow. You've got to squint to see them, because they're really, really thin. But you can see them over here, kind of in the little white bands. So according to this model, not only is the decision boundary this really crazy line, but the only places where I'm unsure about my predictions are these thin little bands in between. So there are tiny uncertainty regions, and I'm overfitting and overconfident about it. The way I think about it, and the way I say it, is: we're sure we're right, and we're surely wrong about that. So we're absolutely wrong, but we're sure we're right, and that's really bad. So uncertainty is something that's very important in classifiers, and by looking at these probability plots we have another interpretation of overfitting, another way that overfitting gets expressed in classification: by creating these really narrow uncertainty bands. And so we want to avoid that; we'll do everything we can to avoid it.
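The shrinking uncertainty bands come from the overfit model's large coefficient magnitudes: scaling the score up makes the sigmoid sharper, pushing probabilities toward 0 or 1 even for points barely past the boundary. A small sketch of that effect (the scales are illustrative, not the lecture's fitted coefficients):

```python
import math

def sigmoid(score):
    """Logistic link: maps a real-valued score to P(y_hat = +1)."""
    return 1.0 / (1.0 + math.exp(-score))

# A point only slightly on the positive side of the boundary.
score = 0.2

# Overfit models tend to have huge coefficients, which is equivalent
# to multiplying every score by a large factor.
for scale in (1, 10, 100):
    p = sigmoid(scale * score)
    print(f"coefficient scale {scale:>3}: P(y_hat = +1) = {p:.4f}")
```

At scale 1 the model is appropriately unsure (probability near 0.55), but at scale 100 it reports near-certainty for the same marginal point: we're sure we're right, even where we're likely wrong.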