[MUSIC] We've now seen classifiers like logistic regression, and we've also seen decision trees, which are great ways to build classifiers that fit data. Now in this module we're going to talk about something called boosting, which is an amazing kind of trick we can do in machine learning, and it applies to any classifier. It takes the classifier's quality, how well it fits the data, its error, and boosts it, making it better and better by combining multiple classifiers. The approach that we're going to discuss today has had amazing impact in the machine learning world. It's pretty simple, there's a little bit of math but not too bad, it's easy to implement, and it's going to make a huge difference when you go work with real-world data.

The way we can think intuitively about boosting is that we start from weak classifiers, so these are things like a simple logistic regression model, a shallow decision tree, or maybe even a decision stump. And the question is, what can we say about these simple classifiers, or weak classifiers? In general, they have some good properties. They have low variance, so they tend not to overfit, which is awesome. However, they tend to have high bias, so they don't fit the data really well. A decision stump, for example, is not likely to give you really great accuracy. And if you look at the learning curves associated with such models, let's say a logistic regression model: if you start from a very simple weak classifier, you don't have a very good fit to the data, so the training error is high. The training error decreases as you add more and more features. However, as we know, the true error decreases and then increases as you start to overfit. And our goal here is to find the optimal tradeoff between bias and variance.

Now, we know that weak classifiers are great because they have low variance, but we need something a little stronger in order to get good quality, low test error. And the fundamental question is, how do we do that? How do we go from a weak classifier to something that has lower error? One approach is to add more features. So for example, if we're using polynomial features in logistic regression, we can add second-order polynomials, third-order polynomials, fourth-order polynomials, and so on, while trying to avoid overfitting. And with decision trees, we can go deeper and deeper in the tree. But the fundamental question here in boosting is, is there something else that we can do that helps us improve our classifiers, that takes a weak classifier and makes it a stronger classifier?

This idea of boosting starts from a beautiful mathematical conjecture, or question, that Kearns and Valiant posed in 1988. And the question was, can we take weak classifiers, let's say decision stumps, and boost them, combine them together in some kind of voting scheme, in order to get a stronger classifier out of that? And amazingly, Rob Schapire and others, just a year or two later, came up with an algorithm called boosting that showed this was indeed possible. And this algorithm really changed all of machine learning. In fact, it's had an amazing impact on the whole field. It's a default approach for many computer vision tasks and for many systems that are deployed in industry today. For example, systems that figure out what search results you get when you search for something on a search engine, or what ads to show you when you visit a page or search for something. So boosting has had tremendous impact.
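To make that voting idea concrete, here's a minimal sketch, assuming scikit-learn is available and using a synthetic dataset purely for illustration. It compares a single decision stump against an AdaBoost ensemble of stumps (AdaBoost is one concrete instance of the boosting idea; the names and parameters below are scikit-learn's, not anything specific to this lecture).

# A minimal sketch (not from the lecture): compare one decision stump
# against a boosted ensemble of stumps on a synthetic dataset.
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic binary classification data, purely for illustration.
X, y = make_classification(n_samples=2000, n_features=20,
                           n_informative=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A decision stump: a depth-1 decision tree (low variance, high bias).
stump = DecisionTreeClassifier(max_depth=1).fit(X_train, y_train)

# Boosting: train many stumps in sequence, each one focusing on the
# examples the previous ones misclassified, then take a weighted vote.
# AdaBoostClassifier's default base learner is already a depth-1 tree.
boosted = AdaBoostClassifier(n_estimators=200, random_state=0)
boosted.fit(X_train, y_train)

print("single stump test accuracy:   %.3f" % stump.score(X_test, y_test))
print("boosted stumps test accuracy: %.3f" % boosted.score(X_test, y_test))

Typically the single stump lands well below the boosted ensemble on held-out accuracy, which is exactly the "combine weak learners into a strong learner" effect the conjecture asked about.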
It's the technique that wins a lot of the machine learning competitions you might see out there. For example, there's a company called Kaggle that runs a bunch of these competitions, and boosting wins more than half of them. So it's a really exciting, really useful approach for doing machine learning. [MUSIC]