[MUSIC] Now that we've seen how regularization can play a role in the classification setting, let's observe in our data set what happens to the boundaries as we introduce our regularization penalty. We're going to work with those degree-20 features. So, a logistic regression model with polynomial features of degree 20, which led to that technical term I called the crazy decision boundary. And the parameters had very large magnitude, in fact they varied from minus 3,170 to 3,803, they were very big. Now we're going to take the same setting, same number of features, same features, same data, same everything, but just vary the parameter lambda and see what happens, and here we're showing the results of doing exactly that. When lambda is equal to zero, we get very large coefficients; when lambda is large, like ten, we get reasonably sized coefficients, let's say smaller coefficients. Okay, and so for lambda equal to zero, we had that crazy decision boundary, and for large lambdas, we have a nicer, smoother boundary. In fact, I trust this boundary with lambda equal to ten much more than the one with lambda equal to zero. And the decision boundary for lambda equal to ten looks a lot like that really beautiful one I got with the parabola, which fit my data really well. But here there are tons more features, and nevertheless, adding a little bit of regularization helps us get a really nice separating boundary that I can trust.

We can also look at the coefficient paths, what happens to a coefficient as we increase the penalty lambda. So in the beginning, when we have an unregularized problem, these coefficients tend to be large. But as we increase lambda, they tend to become smaller and smaller and smaller and go to zero. I've used the review data set, the product review data set here, and I picked a few words and fit a logistic regression model using just those words with different levels of regularization. So for example, the words that have positive coefficients tend to be associated with positive aspects of reviews, while the ones with negative coefficients tend to be associated with negative aspects of reviews. What is the word, in quotes, that has the most positive weight? Well, if you look at the key here, you'll see the word that has the most positive weight is actually the emoticon smiley face, while the word that has the most negative weight is another emoticon, the sad face. In the beginning, all these words have pretty large coefficients, except the words near zero, which are words like "this" and "review", which are not associated with either positive things or negative things. Although if the word "review" shows up, it's slightly correlated with a negative review, in general these coefficients are much smaller than the others. And as I increase the regularization lambda, you see the coefficients become smaller and smaller, and if I were to keep drawing this, they would eventually go to zero.

And now, if I were to use cross validation to pick the best lambda, I'd get a result kind of around here, and I'm going to call that lambda star. And so that's what you do with cross validation: find that point where it's fitting the data pretty well, but it's not over-fitting it too much.

And as a last point, I'm going to show you something that is pretty exciting, something that's really beautiful about regularization with logistic regression. Regularization doesn't only address the crazy, wiggly decision boundaries, it also addresses those over-confidence problems that we saw with over-fitting in logistic regression.
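Here's a minimal sketch of the kind of experiment described above, written with scikit-learn (not necessarily the tool used in this course). The two raw inputs X2, the labels y, the particular lambda values, and the cross-validation grid are all placeholders I'm assuming for illustration; note that scikit-learn specifies the L2 penalty strength as C = 1 / lambda.

```python
# Sketch: L2-regularized logistic regression with degree-20 polynomial features,
# sweeping the penalty lambda and then picking lambda* by cross validation.
# X2 and y below are synthetic placeholders standing in for the lecture's data set.
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LogisticRegression, LogisticRegressionCV

rng = np.random.default_rng(0)
X2 = rng.uniform(-1, 1, size=(200, 2))            # two raw input features (placeholder)
y = (X2[:, 0] ** 2 + X2[:, 1] > 0).astype(int)    # parabola-like true boundary (placeholder)

poly = PolynomialFeatures(degree=20, include_bias=False)
H = poly.fit_transform(X2)                        # degree-20 polynomial features

# lambda = 0 is approximated by a tiny positive value, since C = 1 / lambda
for lam in [0.0001, 0.01, 1.0, 10.0]:
    model = LogisticRegression(penalty='l2', C=1.0 / lam, max_iter=10000)
    model.fit(H, y)
    w = model.coef_.ravel()
    print(f"lambda={lam}: coefficients range from {w.min():.2f} to {w.max():.2f}")

# Cross validation over a grid of penalties to pick lambda*
cv_model = LogisticRegressionCV(Cs=np.logspace(-2, 4, 13), penalty='l2',
                                cv=5, max_iter=10000)
cv_model.fit(H, y)
print("lambda* chosen by cross validation:", 1.0 / cv_model.C_[0])
```

As lambda grows, the printed coefficient range shrinks toward zero, which is exactly the coefficient-path behavior described above.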
So I'm taking the same coefficients, the same thing that I've learned. As lambda increases, the range of the coefficients decreases, they're getting smaller. But what I'm plotting at the bottom here are the actual decision boundaries that we learned, and the notion of uncertainty on the data. So if lambda is equal to zero, we have these highly over-confident predictions. If lambda is equal to one, not only do I get a more natural, kind of parabola-like decision boundary, even though I'm using degree-20 polynomial features, I also get a very natural uncertainty region. So the region where I don't know if it's positive or negative is really those points near the boundary, which are kind of between the cluster of positive points and the cluster of negative points. And you get this kind of beautiful, smooth transition. So by introducing regularization, we've now addressed those two fundamental problems where over-fitting comes in in logistic regression. [MUSIC]
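To see that uncertainty region in code, here is a short continuation of the sketch above, assuming the cross-validated model cv_model from before. The 0.3 to 0.7 probability band is just an arbitrary cutoff I'm using to label points as "uncertain"; it is not something specified in the lecture.

```python
# With a well-chosen lambda, predicted probabilities transition smoothly near the
# boundary instead of jumping over-confidently between 0 and 1.
proba = cv_model.predict_proba(H)[:, 1]          # estimated P(y = +1 | x) for each point
uncertain = (proba > 0.3) & (proba < 0.7)        # points the model is genuinely unsure about
print(f"{uncertain.sum()} of {len(y)} points fall in the uncertain band near the boundary")
```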