[MUSIC] So next, let's discuss what over-fitting looks like for a classifier. We've talked about a wide range of classifiers, so for example here I'm fitting a linear classifier to the data, in our usual example of predicting whether a restaurant review is positive or negative. And we see that the points below the line have score greater than 0, and the points above the line have score less than 0. So below the line I predict as positive, above the line I predict as negative, and you get this very, very simple line in this simple example.

For this module I've created a simple data set, shown here on the lower left, with positive and negative examples, and I want to fit a bunch of different classifiers to it to really observe how over-fitting happens in practice. So first, I'm going to fit a simple classifier with linear features: just the constant, which is w0, the coefficient of x1, which is going to be w1, and the coefficient of x2, which is going to be w2. And if I learn a logistic regression classifier on this data I get the following results: the constant becomes 0.23, the coefficient of x1 becomes 1.12, and the coefficient of x2 becomes -1.07. So, on the right here I'm showing the resulting decision boundary from this classifier. This line here corresponds to the points where 0.23 + 1.12 x1 - 1.07 x2 = 0. This is the transition from the points down here, where Score(x) is greater than 0, to the points over here, where Score(x) is less than 0. So the points above that line are predicted to be negative and the points below the line are predicted to be positive. And you see some interesting things in this simple data set with just a simple classifier. It does a pretty decent job of separating the positives from the negatives, but there are a few points that are misclassified in the training data: this plus over here and this minus over here. And the question is, can I do better? Can I fit a model with maybe slightly fancier features that does better on this data set?

To fit our data better, I'm now going to try what are called quadratic features. So I'm going to consider not just the constant, x1, and x2, but also x1 squared and x2 squared. Note that these are not general quadratic features: I'm not considering x1 times x2 or other cross terms, because then things become pretty big later on, so I'm just going to use these simple quadratic features. And if I learn a classifier on the same data, I get a really cool decision boundary. The decision boundary, when I project it into this two-dimensional space, becomes this kind of curved parabola. So the coefficients in this case: the constant becomes 1.68, the coefficient of x1 is 1.39, and the coefficient of x2 is -0.59. Both of these numbers are different from the ones on the previous slide because all the parameters get updated when we add the quadratic terms. And here the coefficients of the quadratic terms become -0.17 and -0.96. Using these quadratic terms I get this beautiful quadratic decision boundary over here, which is the set of points where 1.68 + 1.39 x1 - 0.59 x2 - 0.17 x1^2 - 0.96 x2^2 = 0. The points now on the left side of the parabola are the ones that have Score(x) less than 0, and the points on the right side here are those where Score(x) is greater than 0.
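To make this concrete, here is a minimal sketch of the two fits described above. It is not the course's own code: it uses scikit-learn on a made-up synthetic data set, and a large C value to approximate an unregularized logistic regression fit. The quadratic features are just x1 squared and x2 squared, with no cross term, matching the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical 2D data: two overlapping Gaussian blobs standing in for the
# lecture's positive (+1) and negative (-1) examples.
n = 100
X_pos = rng.normal(loc=[1.0, -0.5], scale=1.0, size=(n, 2))
X_neg = rng.normal(loc=[-1.0, 0.5], scale=1.0, size=(n, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Linear features: Score(x) = w0 + w1*x1 + w2*x2
# (large C approximates an unregularized fit, since regularization comes later)
linear_model = LogisticRegression(C=1e6, max_iter=1000)
linear_model.fit(X, y)
print("linear coefficients:", linear_model.intercept_, linear_model.coef_)

# Simple quadratic features (no x1*x2 cross term):
# Score(x) = w0 + w1*x1 + w2*x2 + w3*x1^2 + w4*x2^2
X_quad = np.hstack([X, X**2])
quad_model = LogisticRegression(C=1e6, max_iter=1000)
quad_model.fit(X_quad, y)
print("quadratic coefficients:", quad_model.intercept_, quad_model.coef_)

# Decision rule: predict +1 where Score(x) > 0, -1 where Score(x) < 0.
scores = quad_model.decision_function(X_quad)
print("training accuracy:", np.mean(np.sign(scores) == y))
```

Printing the coefficients also lets you check their magnitudes, which is the kind of sanity check discussed next.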
And you get this beautiful curve where, yes, you still make a couple of mistakes, but I tell you, those mistakes seem okay to me. The data fits pretty well, and you should never expect to get everything right on real data sets. In fact, as we will see later in this module, getting everything right should be a big warning sign for you. But I get a pretty good fit, and it looks beautiful. Note, by the way, that the coefficients I learn over here are pretty good: they have a natural magnitude, about 1, 0.5, and so on. Now let's see what happens when you use an even higher degree polynomial. [MUSIC]