[MUSIC] Now let's take that third example we've used to illustrate different machine learning algorithms in this module and explore it in the context of AdaBoost. It's going to give us a lot of insight into how boosting works in practice.

For the first classifier f1, we work directly off the original data: all points have the same weight. So the learning process is just standard learning; nothing changes in your learning algorithm, since every data point has the same weight. In this case we're learning a decision stump, and here is the decision boundary that does its best to try to separate positive examples from negative examples. It splits right around 0, actually at minus 0.07, if you remember from the decision tree classifier. So this is the first decision stump, f1.

Now, to learn the second decision stump, f2, we have to reweight our data based on how well f1 did. So we look at our decision boundary and we give higher weight to the data points that were mistakes. In the picture, I'm denoting them with bigger minus signs and plus signs. If you look at the data points here on the left, the minuses on this side were mistakes, and this plus over here was also a mistake, so we increase their weights and decrease the weight of everybody else. You see that the pluses here became bigger and the minuses in this region became larger. That's how we update our weights.

Now let's look at the next step: learning the classifier f2 in the second iteration based on this weighted data. Using the weighted data, we'll learn the following decision stump. You see that instead of a vertical split, we now have a horizontal split, and it's a better split for the weighted data, which is kind of cool. So in the first iteration we decided to split on x1; in the second one we split on x2, and this is x2 greater than or less than 1.3 or so. You'll see that it gets all the minuses correct on top and the pluses correct on the bottom, but it makes some mistakes on the minuses at the bottom. So as opposed to the vertical split we had before, we now have a horizontal split.

Now we've learned the decision stumps f1 and f2, and the question is how do we combine them? If you go through the AdaBoost formula, you'll see that w hat 1, the weight of the first decision stump, is going to be 0.61, and w hat 2 is going to be 0.53. So we trust the first decision stump a little bit more than we trust the second one, which makes sense; the second one doesn't seem as good. But when you add them together, you start getting a very interesting decision boundary. The points in the top left here are ones where we definitely think that y hat is minus 1, so definite negatives. On the bottom right here, we have some definite positives, y hat equals plus 1. The other two regions we can think of as regions of higher uncertainty. They are uncertain right now, which makes sense, but as you add more decision stumps we're going to become more sure that some of the points on the bottom left and on the top right are negative.

Now, if you keep the iterations going for 30 rounds, the first thing we notice is that we get all the data points right, so our training error is 0. The second thing you'll notice, and here I'm going to use a technical term for this, is that the decision boundary is crazy.
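To make the reweighting and combining step concrete, here is a minimal sketch in Python (NumPy) of one AdaBoost round as just described: compute the stump's coefficient w hat from its weighted error, then boost the weights of the mistakes and shrink the weights of the correct points. The function and variable names are my own, not the course's notebook; the coefficient formula is the standard AdaBoost one, which is the kind of computation behind numbers like the 0.61 and 0.53 above.

```python
import numpy as np

def adaboost_round(alpha, y, y_pred):
    """One AdaBoost reweighting round (a sketch, not the course's code).

    alpha  : current data-point weights, one per example
    y      : true labels, coded as +1 / -1
    y_pred : predictions of the decision stump just learned
    """
    mistakes = (y != y_pred)

    # Coefficient of this stump: trust it more when its weighted error is low.
    weighted_error = alpha[mistakes].sum() / alpha.sum()
    w_hat = 0.5 * np.log((1.0 - weighted_error) / weighted_error)

    # Boost the weight of the mistakes, shrink the weight of correct points...
    alpha = np.where(mistakes, alpha * np.exp(w_hat), alpha * np.exp(-w_hat))
    # ...and normalize so the data weights sum to 1 again.
    alpha = alpha / alpha.sum()

    return w_hat, alpha
```

With uniform starting weights, the first call gives w hat 1 for f1 and the reweighted data used to learn f2; the second call on that reweighted data gives w hat 2.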
That's our technical term. And if you combine these two insights, we figure out, okay, we don't really trust this classifier; we're probably overfitting the data. It fits the training data perfectly, but it may not do as well on new data. So overfitting is something that can happen in boosting, and we'll talk about it a little bit next.

So let's take a deep breath and summarize what we've done so far. We described simple classifiers, and we said that we're going to learn these simple classifiers and take a vote between them to make predictions. And then we described the AdaBoost algorithm, which is a pretty simple approach to learning a non-simple classifier using this technique of boosting, where you boost up the weight of data points where we're making mistakes. And it's simple to implement in practice. [MUSIC]
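To back up that "simple to implement in practice" point, here is a rough end-to-end sketch of the procedure, assuming scikit-learn's DecisionTreeClassifier with max_depth=1 as the decision stump and labels coded as +1/-1. This is my own illustration under those assumptions, not code from the course.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost_train(X, y, num_rounds=30):
    """Learn num_rounds decision stumps on successively reweighted data."""
    alpha = np.ones(len(y)) / len(y)                  # start with uniform weights
    stumps, coefficients = [], []
    for _ in range(num_rounds):
        stump = DecisionTreeClassifier(max_depth=1)   # a decision stump
        stump.fit(X, y, sample_weight=alpha)          # learn f_t on weighted data
        mistakes = stump.predict(X) != y

        # Coefficient w_hat_t from the weighted error (clipped to avoid log(0)).
        weighted_error = alpha[mistakes].sum() / alpha.sum()
        weighted_error = np.clip(weighted_error, 1e-10, 1 - 1e-10)
        w_hat = 0.5 * np.log((1 - weighted_error) / weighted_error)

        # Boost the weight of mistakes, shrink the rest, then normalize.
        alpha = np.where(mistakes, alpha * np.exp(w_hat), alpha * np.exp(-w_hat))
        alpha = alpha / alpha.sum()

        stumps.append(stump)
        coefficients.append(w_hat)
    return stumps, coefficients

def adaboost_predict(stumps, coefficients, X):
    """Weighted vote of the stumps: sign of the sum of w_hat_t * f_t(x)."""
    scores = sum(w * s.predict(X) for w, s in zip(coefficients, stumps))
    return np.sign(scores)
```

A real implementation would also handle edge cases such as a stump reaching zero weighted error, but this is enough to see the loop structure: learn a simple classifier on weighted data, score how much to trust it, reweight the mistakes, and take the weighted vote at prediction time.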