[MUSIC] Now that we've seen how regularization can play a role in the classification setting, let's observe in our data set what happens to the boundaries as we introduce our regularization penalty. We're going to work with those degree-20 features. So, a logistic regression model with polynomial features of degree 20, which led to that technical term I called the crazy decision boundary. And the parameters had very large magnitude, in fact they varied from minus 3,170 to 3,803, they were very big. Now we're going to take the same setting, same number of features, same features, same data, same everything, but just vary the parameter lambda and see what happens, and here we're showing the results of doing exactly that. When lambda is equal to zero, we get very large coefficients; when lambda is large, like ten, we get reasonably sized coefficients, let's say smaller coefficients. Okay, and so for lambda equal to zero, we had that crazy decision boundary, and for large lambdas, we have a nicer, smoother boundary. In fact, I trust this boundary with lambda equal to ten much more than the one with lambda equal to zero. And the decision boundary for lambda equal to ten looks a lot like that really beautiful one I got with the parabola, which fit my data really well. But here there are tons more features, and nevertheless, adding a little bit of regularization helps us get a really nice separating boundary that I can trust.

We can also look at the coefficient paths, what happens to a coefficient as we increase the penalty lambda. So in the beginning, when we have an unregularized problem, these coefficients tend to be large. But as we increase lambda, they tend to become smaller and smaller and smaller and go to zero. I've used the review data set, the product review data set here, and I picked a few words and fit a logistic regression model using just those words with different levels of regularization. So for example, the words that have positive coefficients tend to be associated with positive aspects of reviews, while the ones with negative coefficients tend to be associated with negative aspects of reviews. What is the word, in quotes, that has the most positive weight? Well, if you look at the key here, you'll see the word that has the most positive weight is actually the emoticon smiley face, while the word that has the most negative weight is another emoticon, the sad face. In the beginning, all these words have pretty large coefficients, except the words near zero, which are words like "this" and "review", which are not associated with either positive things or negative things. Although if the word "review" shows up, it's slightly correlated with a negative review, in general these coefficients are much smaller than the others. And as I increase the regularization lambda, you see the coefficients become smaller and smaller, and if I were to keep drawing this, they would eventually go to zero.

And now, if I were to use cross validation to pick the best lambda, I'd get a result kind of around here, and I'm going to call that lambda star. And so that's what you do with cross validation: find that point where it's fitting the data pretty well, but it's not over-fitting it too much.

And as a last point, I'm going to show you something that is pretty exciting, something that's really beautiful about regularization with logistic regression. Regularization doesn't only address the crazy, wiggly decision boundaries, it also addresses those over-confidence problems that we saw with over-fitting in logistic regression.
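Here's a minimal sketch of the kind of experiment described above, written with scikit-learn (not necessarily the tool used in this course). The two raw inputs X2, the labels y, the particular lambda values, and the cross-validation grid are all placeholders I'm assuming for illustration; note that scikit-learn specifies the L2 penalty strength as C = 1 / lambda.

```python
# Sketch: L2-regularized logistic regression with degree-20 polynomial features,
# sweeping the penalty lambda and then picking lambda* by cross validation.
# X2 and y below are synthetic placeholders standing in for the lecture's data set.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

rng = np.random.default_rng(0)
X2 = rng.uniform(-1, 1, size=(200, 2))            # two raw input features (placeholder)
y = (X2[:, 0] ** 2 + X2[:, 1] > 0).astype(int)    # parabola-like true boundary (placeholder)

poly = PolynomialFeatures(degree=20, include_bias=False)
H = poly.fit_transform(X2)                        # degree-20 polynomial features

# lambda = 0 is approximated by a tiny positive value, since C = 1 / lambda
for lam in [0.0001, 0.01, 1.0, 10.0]:
    model = LogisticRegression(penalty='l2', C=1.0 / lam, max_iter=10000)
    model.fit(H, y)
    w = model.coef_.ravel()
    print(f"lambda={lam}: coefficients range from {w.min():.2f} to {w.max():.2f}")

# Cross validation over a grid of penalties to pick lambda*
cv_model = LogisticRegressionCV(Cs=np.logspace(-2, 4, 13), penalty='l2',
                                cv=5, max_iter=10000)
cv_model.fit(H, y)
print("lambda* chosen by cross validation:", 1.0 / cv_model.C_[0])
```

As lambda grows, the printed coefficient range shrinks toward zero, which is exactly the coefficient-path behavior described above.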
So I'm taking the same coefficients, the same thing that I've learned. As lambda increases, the range of the coefficients decreases, they're getting smaller. But what I'm plotting at the bottom here are the actual decision boundaries that we learned, and the notion of uncertainty on the data. So if lambda is equal to zero, we have these highly over-confident predictions. If lambda is equal to one, not only do I get a more natural, kind of parabola-like decision boundary, even though I'm using degree-20 polynomial features, I also get a very natural uncertainty region. So the region where I don't know if it's positive or negative is really those points near the boundary, which are kind of between the cluster of positive points and the cluster of negative points. And you get this kind of beautiful, smooth transition. So by introducing regularization, we've now addressed those two fundamental problems where over-fitting comes in in logistic regression. [MUSIC]
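To see that uncertainty region in code, here is a short continuation of the sketch above, assuming the cross-validated model cv_model from before. The 0.3 to 0.7 probability band is just an arbitrary cutoff I'm using to label points as "uncertain"; it is not something specified in the lecture.

```python
# With a well-chosen lambda, predicted probabilities transition smoothly near the
# boundary instead of jumping over-confidently between 0 and 1.
proba = cv_model.predict_proba(H)[:, 1]          # estimated P(y = +1 | x) for each point
uncertain = (proba > 0.3) & (proba < 0.7)        # points the model is genuinely unsure about
print(f"{uncertain.sum()} of {len(y)} points fall in the uncertain band near the boundary")
```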