[MUSIC] Now that we've talked about classification error, we can explore the notion of overfitting in classification. Before we start, let's review what we covered when we discussed overfitting in the regression course. Going back to the regression course, we had a running example where we're trying to predict the price of a house on the y axis given some features of the house. In this case, the x axis is the square feet, or the size, of the house. On the left we see a really nice, smooth curve that helps us predict the price of a house given its square feet. However, if instead of using a second degree polynomial like we did on the left we use a much higher degree polynomial, we might get a really crazy, wiggly line that fits the training data extremely well, but doesn't generalize beyond the training data to data in the test set or to the truth in the real world. So in this case we say that the model f here has overfit the training data.

Now in classification, we're going to have the same kinds of problems, and we're going to have to try to address them. What happens when you learn a model that fits the training data too well is that the training error is very low, but the model does poorly in terms of true error.

Now let's review some of the overfitting plots, which we discussed quite a lot in the regression course. Here, on the x axis, I'm showing the model complexity. The model on the left is a very simple constant model, while the model on the right is a very high degree polynomial. If you look at the training error, when you use a constant model it doesn't fit the training data very well. But if you fit this crazy high degree polynomial, you might get zero training error. And somewhere in between, the training error is decreasing and eventually goes to zero. So this is my training error. The problem when the training error goes to zero is that, if you look at the underlying model, it doesn't do well on other houses. So the true error is very high when your model is very simple, but it's also very high when your model is very complex. The true error curve has this characteristic shape: it goes down and then goes up. So this is the true error.

So overfitting is going to happen when we pick something that's too good on the training data, somewhere over here, but doesn't do well with respect to true error. In other words, you get some w hat over here that has very low training error but high true error, while there exist some other parameters, call them w*, which might not have done as well on the training data, but they do much better in terms of true error. And so we want to try to avoid picking these models that perform extremely well on the training data but don't do well in the real world. So we want to shift from the right here to the left to avoid overfitting. Now, this was for the regression setting. Next, we'll talk about it in the classification setting. [MUSIC]
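To make the training error versus true error picture concrete, here is a minimal sketch, not taken from the course, that fits polynomials of increasing degree to made-up "square feet vs. price" data and uses held-out points as a rough stand-in for true error; all numbers, the train/test split, and the helper function are illustrative assumptions.

```python
import numpy as np

# Minimal sketch: synthetic "square feet vs. price" data (made-up numbers, rescaled to [0, 1]).
rng = np.random.default_rng(0)
sqft = np.sort(rng.uniform(0.0, 1.0, size=30))
price = 1.0 + 2.0 * sqft + 0.5 * sqft**2 + rng.normal(0.0, 0.1, size=30)

# Split into a training set and a held-out set (the held-out error stands in for "true error").
train_idx = rng.choice(30, size=20, replace=False)
test_idx = np.setdiff1d(np.arange(30), train_idx)

def rms_error(coeffs, x, y):
    """Root-mean-square error of a fitted polynomial on (x, y)."""
    return np.sqrt(np.mean((np.polyval(coeffs, x) - y) ** 2))

# Training error keeps dropping as the degree grows; held-out error typically
# goes down and then back up, mirroring the true error curve in the lecture.
for degree in (1, 2, 4, 6, 10):
    coeffs = np.polyfit(sqft[train_idx], price[train_idx], deg=degree)
    train_err = rms_error(coeffs, sqft[train_idx], price[train_idx])
    test_err = rms_error(coeffs, sqft[test_idx], price[test_idx])
    print(f"degree {degree:2d}: training error {train_err:.3f}, held-out error {test_err:.3f}")
```

The high-degree fit here plays the role of w hat, very low training error but poor held-out error, while a lower-degree fit plays the role of w*, slightly worse on the training data but better on data it hasn't seen.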