[MUSIC] So next, let's discuss what over-fitting looks like for a classifier. We've talked about a wide range of classifiers, so for example here I'm fitting a linear classifier to the data, in our usual example of predicting whether a restaurant review is positive or negative. And we see that the points below the line have score greater than 0, and the points above the line have score less than 0. So below the line I predict as positive, above the line I predict as negative, and you get this very, very simple line in this simple example.

For this module I've created a simple data set, shown here on the lower left, with positive and negative examples, and I want to fit a bunch of different classifiers to it to really observe how over-fitting happens in practice. So first, I'm going to fit a simple classifier with linear features: just the constant, which is w0, the coefficient of x1, which is going to be w1, and the coefficient of x2, which is going to be w2. And if I learn a logistic regression classifier on this data I get the following results: the constant becomes 0.23, the coefficient of x1 becomes 1.12, and the coefficient of x2 becomes -1.07. So, on the right here I'm showing the resulting decision boundary from this classifier. This line here corresponds to the points where 0.23 + 1.12 x1 - 1.07 x2 = 0. This is the transition from the points down here, where Score(x) is greater than 0, to the points over here, where Score(x) is less than 0. So the points above that line are predicted to be negative and the points below the line are predicted to be positive. And you see some interesting things in this simple data set with just a simple classifier. It does a pretty decent job of separating the positives from the negatives, but there are a few points that are misclassified in the training data: this plus over here and this minus over here. And the question is, can I do better? Can I fit a model with maybe slightly fancier features that does better on this data set?

To fit our data better, I'm now going to try what are called quadratic features. So I'm going to consider not just the constant, x1, and x2, but also x1 squared and x2 squared. Note that these are not general quadratic features: I'm not considering x1 times x2 or other cross terms, because then things become pretty big later on, so I'm just going to use these simple quadratic features. And if I learn a classifier on the same data, I get a really cool decision boundary. The decision boundary, when I project it into this two-dimensional space, becomes this kind of curved parabola. So the coefficients in this case: the constant becomes 1.68, the coefficient of x1 is 1.39, and the coefficient of x2 is -0.59. Both of these numbers are different from the ones on the previous slide because all the parameters get updated when we add the quadratic terms. And here the coefficients of the quadratic terms become -0.17 and -0.96. Using these quadratic terms I get this beautiful quadratic decision boundary over here, which is the set of points where 1.68 + 1.39 x1 - 0.59 x2 - 0.17 x1^2 - 0.96 x2^2 = 0. The points now on the left side of the parabola are the ones that have Score(x) less than 0, and the points on the right side here are those where Score(x) is greater than 0.
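To make this concrete, here is a minimal sketch of the two fits described above. It is not the course's own code: it uses scikit-learn on a made-up synthetic data set, and a large C value to approximate an unregularized logistic regression fit. The quadratic features are just x1 squared and x2 squared, with no cross term, matching the lecture.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

# Hypothetical 2D data: two overlapping Gaussian blobs standing in for the
# lecture's positive (+1) and negative (-1) examples.
n = 100
X_pos = rng.normal(loc=[1.0, -0.5], scale=1.0, size=(n, 2))
X_neg = rng.normal(loc=[-1.0, 0.5], scale=1.0, size=(n, 2))
X = np.vstack([X_pos, X_neg])
y = np.concatenate([np.ones(n), -np.ones(n)])

# Linear features: Score(x) = w0 + w1*x1 + w2*x2
# (large C approximates an unregularized fit, since regularization comes later)
linear_model = LogisticRegression(C=1e6, max_iter=1000)
linear_model.fit(X, y)
print("linear coefficients:", linear_model.intercept_, linear_model.coef_)

# Simple quadratic features (no x1*x2 cross term):
# Score(x) = w0 + w1*x1 + w2*x2 + w3*x1^2 + w4*x2^2
X_quad = np.hstack([X, X**2])
quad_model = LogisticRegression(C=1e6, max_iter=1000)
quad_model.fit(X_quad, y)
print("quadratic coefficients:", quad_model.intercept_, quad_model.coef_)

# Decision rule: predict +1 where Score(x) > 0, -1 where Score(x) < 0.
scores = quad_model.decision_function(X_quad)
print("training accuracy:", np.mean(np.sign(scores) == y))
```

Printing the coefficients also lets you check their magnitudes, which is the kind of sanity check discussed next.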
And you get this beautiful curve where, yes, you still make a couple of mistakes, but I tell you, those mistakes seem okay to me. The data fits pretty well, and you should never expect to get everything right on real data sets. In fact, as we will see later in this module, getting everything right should be a big warning sign for you. But I get a pretty good fit, and it looks beautiful. Note, by the way, that the coefficients I learn over here are pretty good: they have a natural magnitude, about 1, 0.5, and so on. Now let's see what happens when you use an even higher degree polynomial. [MUSIC]