[MUSIC] We've now seen classifiers like logistic regression, and we've also seen decision trees, which are great ways to build classifiers that fit data. Now in this module we're going to talk about something called boosting, which is an amazing kind of trick we can do in machine learning, and it applies to any classifier. It takes the classifier's quality, how well it fits the data, its error, and boosts it, making it better and better by combining multiple classifiers. The approach that we're going to discuss today has had amazing impact in the machine learning world. It's pretty simple, there's a little bit of math but not too bad, it's easy to implement, and it's going to make a huge difference when you go work with real-world data.

The way we can think intuitively about boosting is that we start from weak classifiers, so these are things like a simple logistic regression model, a shallow decision tree, or maybe even a decision stump. And the question is, what can we say about these simple classifiers, or weak classifiers? In general, they have some good properties. They have low variance, so they tend not to overfit, which is awesome. However, they tend to have high bias, so they don't fit the data really well. A decision stump, for example, is not likely to give you really great accuracy. And if you look at the learning curves associated with such models, let's say a logistic regression model: if you start from a very simple weak classifier, you don't have a very good fit to the data, so the training error is high. The training error decreases as you add more and more features. However, as we know, the true error decreases and then increases as you start to overfit. And our goal here is to find the optimal tradeoff between bias and variance.

Now, we know that weak classifiers are great because they have low variance, but we need something a little stronger in order to get good quality, low test error. And the fundamental question is, how do we do that? How do we go from a weak classifier to something that has lower error? One approach is to add more features. So for example, if we're using polynomial features in logistic regression, we can add second-order polynomials, third-order polynomials, fourth-order polynomials, and so on, while trying to avoid overfitting. And with decision trees, we can go deeper and deeper in the tree. But the fundamental question here in boosting is, is there something else that we can do that helps us improve our classifiers, that takes a weak classifier and makes it a stronger classifier?

This idea of boosting starts from a beautiful mathematical conjecture, or question, that Kearns and Valiant posed in 1988. And the question was, can we take weak classifiers, let's say decision stumps, and boost them, combine them together in some kind of voting scheme, in order to get a stronger classifier out of that? And amazingly, Rob Schapire and others, just a year or two later, came up with an algorithm called boosting that showed this was indeed possible. And this algorithm really changed all of machine learning. In fact, it's had an amazing impact on the whole field. It's a default approach for many computer vision tasks and for many systems that are deployed in industry today. For example, systems that figure out what search results you get when you search for something on a search engine, or what ads to show you when you visit a page or search for something. So boosting has had tremendous impact.
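To make that voting idea concrete, here's a minimal sketch, assuming scikit-learn is available and using a synthetic dataset purely for illustration. It compares a single decision stump against an AdaBoost ensemble of stumps (AdaBoost is one concrete instance of the boosting idea; the names and parameters below are scikit-learn's, not anything specific to this lecture).

# A minimal sketch (not from the lecture): compare one decision stump
# against a boosted ensemble of stumps on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A decision stump: a depth-1 decision tree (low variance, high bias).
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# Boosting: train many stumps in sequence, each one focusing on the
# examples the previous ones misclassified, then take a weighted vote.
# AdaBoostClassifier's default base learner is already a depth-1 tree.
boosted = AdaBoostClassifier(n_estimators=200, random_state=0)
boosted.fit(X_train, y_train)

print("single stump test accuracy:   %.3f" % stump.score(X_test, y_test))
print("boosted stumps test accuracy: %.3f" % boosted.score(X_test, y_test))

Typically the single stump lands well below the boosted ensemble on held-out accuracy, which is exactly the "combine weak learners into a strong learner" effect the conjecture asked about.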
It's the technique that wins a lot of the machine learning competitions you might see out there. For example, there's a company called Kaggle that runs a bunch of these competitions, and boosting wins more than half of them. So it's a really exciting, really useful approach for doing machine learning. [MUSIC]