[MUSIC] This is the same dataset, and the same learned model, from a few slides ago. And what I'm plotting here on the right is not just the decision boundary, but the probability that y hat is equal to plus one. So it's a probability plot. For the points over here, the probability is approximately zero, so there's approximately zero chance that the points up there, around minus five and four, are positive. While the points over here have probability approximately one: the probability that y equals plus one is approximately one in the bottom right corner. So all of that makes sense, and what makes the most sense to me is the region in between, right here. This is the region where the probability is approximately 0.5, where we're uncertain whether a review is positive or negative, and it's a pretty wide region of uncertainty. So although the linear classifier, the straight line here from a degree-one polynomial, was not a great fit to the data, the uncertainty measures make quite a lot of sense. The points over here that were getting misclassified are exactly the ones I'm uncertain about, whether they're positive or negative, and so I feel like this classifier is doing something very reasonable.

Now let's look at the degree-two polynomial fit. So take degree-two polynomial features, or quadratic features, and learn the same classifier as we learned a few slides ago, but again plot the probability that y hat equals plus one. As we saw a few slides ago, we believe that this quadratic fit was actually a better fit to the data. And if you look at it, the uncertainty region is narrower. To me, this makes a lot of sense: I have a better fit to the data, so there are fewer points that I'm uncertain about. And in fact, the places where I have uncertainty are exactly the ones in the boundary region where I should have some uncertainty, the ones where I'm not sure if they're plus one or minus one because they're close to the boundary. It makes a lot of sense. So this is a really great fit, not just in terms of the decision boundary, but also in terms of the probabilities. The places where the probability is closer to 0.5 are really the ones where I'm unsure about what's going on. Then the probability mostly decreases or mostly increases, depending on whether I go to the left side or the right side of the parabola.

Now let's see what happens when I use higher-order features, for example degree-six or degree-20 polynomial features. We saw that those decision boundaries became really wiggly and crazy, but now if you look at the uncertainty regions, you'll see they become really, really narrow. You've got to squint to see them, because they're really thin, but you can see them over here as a little white band. So according to this model, not only is the decision boundary this really crazy line, but the only places where I'm unsure about my prediction are these thin little bands in between. So there are tiny uncertainty regions: I'm overfitting, and I'm overconfident about it. The way I think about it is, we're sure we're right, and we're surely wrong about that. So we're absolutely wrong, but we're sure we're right, and that's really bad. So uncertainty is something that's very important in classifiers, and by looking at these plots we have another interpretation of overfitting, another way that overfitting gets expressed in classification: by creating these really narrow uncertainty bands. And so we want to avoid that.
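To make that overconfidence concrete, here is a minimal sketch, not the course's own code: it assumes scikit-learn, a synthetic two-dimensional dataset with a quadratic ground truth plus a little label noise, and an illustrative definition of "uncertain" as predicted probability between 0.25 and 0.75. It fits logistic regression on polynomial features of increasing degree and measures how much of the input space the model is uncertain about.

```python
# Sketch (hypothetical data and thresholds): as the polynomial degree grows,
# the fraction of the plane where P(y_hat = +1) is near 0.5 typically shrinks,
# i.e., the uncertainty band narrows even as the fit overfits.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)
X = rng.uniform(-5, 5, size=(200, 2))                 # 2-D inputs, like the plots
y = (X[:, 1] > 0.3 * X[:, 0] ** 2 - 2).astype(int)    # quadratic ground truth
flip = rng.random(len(y)) < 0.05                      # a little label noise
y = np.where(flip, 1 - y, y)

# Dense grid over the plane, to measure the area of the uncertainty band.
xx, yy = np.meshgrid(np.linspace(-5, 5, 200), np.linspace(-5, 5, 200))
grid = np.c_[xx.ravel(), yy.ravel()]

for degree in (1, 2, 6, 20):
    model = make_pipeline(
        PolynomialFeatures(degree, include_bias=False),
        StandardScaler(),                             # keeps high-degree features numerically tame
        LogisticRegression(C=1e5, max_iter=10_000),   # large C = weak regularization
    )
    model.fit(X, y)
    p = model.predict_proba(grid)[:, 1]               # P(y_hat = +1) at each grid point
    uncertain = np.mean((0.25 < p) & (p < 0.75))      # fraction of plane that's "uncertain"
    print(f"degree {degree:2d}: uncertain fraction of the plane = {uncertain:.3f}")
```

With weak regularization, the higher-degree fits tend to drive nearly every predicted probability toward 0 or 1, so the printed uncertain fraction collapses: that is the "sure we're right, and surely wrong about it" behavior described above.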
We'll do everything we can to avoid it. [MUSIC]