[MUSIC] Now let's take that third example we've used to illustrate different machine learning algorithms in this module and explore it in the context of AdaBoost. It's going to give us a lot of insight into how boosting works in practice.

For the first classifier f1, we work directly off the original data: all points have the same weight. So the learning process is just standard learning; nothing changes in your learning algorithm, since every data point has the same weight. In this case we're learning a decision stump, and here is the decision boundary that does its best to try to separate positive examples from negative examples. It splits right around 0, actually at minus 0.07, if you remember from the decision tree classifier. So this is the first decision stump, f1.

Now, to learn the second decision stump, f2, we have to reweight our data based on how well f1 did. So we look at our decision boundary and we give higher weight to the data points that were mistakes. In the picture, I'm denoting them with bigger minus signs and plus signs. If you look at the data points here on the left, the minuses on this side were mistakes, and this plus over here was also a mistake, so we increase their weights and decrease the weight of everybody else. You see that the pluses here became bigger and the minuses in this region became larger. That's how we update our weights.

Now let's look at the next step: learning the classifier f2 in the second iteration based on this weighted data. Using the weighted data, we'll learn the following decision stump. You see that instead of a vertical split, we now have a horizontal split, and it's a better split for the weighted data, which is kind of cool. So in the first iteration we decided to split on x1; in the second one we split on x2, and this is x2 greater than or less than 1.3 or so. You'll see that it gets all the minuses correct on top and the pluses correct on the bottom, but it makes some mistakes on the minuses at the bottom. So as opposed to the vertical split we had before, we now have a horizontal split.

Now we've learned the decision stumps f1 and f2, and the question is how do we combine them? If you go through the AdaBoost formula, you'll see that w hat 1, the weight of the first decision stump, is going to be 0.61, and w hat 2 is going to be 0.53. So we trust the first decision stump a little bit more than we trust the second one, which makes sense; the second one doesn't seem as good. But when you add them together, you start getting a very interesting decision boundary. The points in the top left here are ones where we definitely think that y hat is minus 1, so definite negatives. On the bottom right here, we have some definite positives, y hat equals plus 1. The other two regions we can think of as regions of higher uncertainty. They are uncertain right now, which makes sense, but as you add more decision stumps we're going to become more sure that some of the points on the bottom left and on the top right are negative.

Now, if you keep the iterations going for 30 rounds, the first thing we notice is that we get all the data points right, so our training error is 0. The second thing you'll notice, and here I'm going to use a technical term for this, is that the decision boundary is crazy.
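To make the reweighting and combining step concrete, here is a minimal sketch in Python (NumPy) of one AdaBoost round as just described: compute the stump's coefficient w hat from its weighted error, then boost the weights of the mistakes and shrink the weights of the correct points. The function and variable names are my own, not the course's notebook; the coefficient formula is the standard AdaBoost one, which is the kind of computation behind numbers like the 0.61 and 0.53 above.

```python
import numpy as np

def adaboost_round(alpha, y, y_pred):
    """One AdaBoost reweighting round (a sketch, not the course's code).

    alpha  : current data-point weights, one per example
    y      : true labels, coded as +1 / -1
    y_pred : predictions of the decision stump just learned
    """
    mistakes = (y != y_pred)

    # Coefficient of this stump: trust it more when its weighted error is low.
    weighted_error = alpha[mistakes].sum() / alpha.sum()
    w_hat = 0.5 * np.log((1.0 - weighted_error) / weighted_error)

    # Boost the weight of the mistakes, shrink the weight of correct points...
    alpha = np.where(mistakes, alpha * np.exp(w_hat), alpha * np.exp(-w_hat))
    # ...and normalize so the data weights sum to 1 again.
    alpha = alpha / alpha.sum()

    return w_hat, alpha
```

With uniform starting weights, the first call gives w hat 1 for f1 and the reweighted data used to learn f2; the second call on that reweighted data gives w hat 2.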
That's our technical term. And if you combine these two insights, we figure out, okay, we don't really trust this classifier; we're probably overfitting the data. It fits the training data perfectly, but it may not do as well on new data. So overfitting is something that can happen in boosting, and we'll talk about it a little bit next.

So let's take a deep breath and summarize what we've done so far. We described simple classifiers, and we said that we're going to learn these simple classifiers and take a vote between them to make predictions. And then we described the AdaBoost algorithm, which is a pretty simple approach to learning a non-simple classifier using this technique of boosting, where you boost up the weight of data points where we're making mistakes. And it's simple to implement in practice. [MUSIC]
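To back up that "simple to implement in practice" point, here is a rough end-to-end sketch of the procedure, assuming scikit-learn's DecisionTreeClassifier with max_depth=1 as the decision stump and labels coded as +1/-1. This is my own illustration under those assumptions, not code from the course.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, num_rounds=30):
    """Learn num_rounds decision stumps on successively reweighted data."""
    alpha = np.ones(len(y)) / len(y)                  # start with uniform weights
    stumps, coefficients = [], []
    for _ in range(num_rounds):
        stump = DecisionTreeClassifier(max_depth=1)   # a decision stump
        stump.fit(X, y, sample_weight=alpha)          # learn f_t on weighted data
        mistakes = stump.predict(X) != y

        # Coefficient w_hat_t from the weighted error (clipped to avoid log(0)).
        weighted_error = alpha[mistakes].sum() / alpha.sum()
        weighted_error = np.clip(weighted_error, 1e-10, 1 - 1e-10)
        w_hat = 0.5 * np.log((1 - weighted_error) / weighted_error)

        # Boost the weight of mistakes, shrink the rest, then normalize.
        alpha = np.where(mistakes, alpha * np.exp(w_hat), alpha * np.exp(-w_hat))
        alpha = alpha / alpha.sum()

        stumps.append(stump)
        coefficients.append(w_hat)
    return stumps, coefficients

def adaboost_predict(stumps, coefficients, X):
    """Weighted vote of the stumps: sign of the sum of w_hat_t * f_t(x)."""
    scores = sum(w * s.predict(X) for w, s in zip(coefficients, stumps))
    return np.sign(scores)
```

A real implementation would also handle edge cases such as a stump reaching zero weighted error, but this is enough to see the loop structure: learn a simple classifier on weighted data, score how much to trust it, reweight the mistakes, and take the weighted vote at prediction time.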