[MUSIC] >> Let's now explore in more depth the task of learning a decision tree. Just like before, we're given a set of training data, and we extract some features from it. Then we're going to take this decision tree model, T of x, and use it to predict y hat. So let's explore T of x in a little more detail and discuss what the learning process might look like here.

In particular, we're given a dataset here, which is a table where each row corresponds to a data point, and we have our features h1 of x, h2 of x, h3 of x: credit, term, and income. So in this example there are only three features, and we try to predict what we call the loan status, which is this variable y: whether the loan is risky or safe. Our goal is to learn the decision tree that can help us make that prediction, given one or more of these input features. More generally, we're going to be given N observations from our data of the form (xi, yi), where yi is the true label, and we're trying to fit a tree that is as good as possible with respect to some quality metric. In the case of decision trees, the metric we're going to use is just straight-up classification error. There are other metrics you might use for decision trees, but for this module we want to focus on classification error.

Classification error measures the fraction of mistakes that we make when we try to make predictions on the training data. That's the number of incorrect predictions divided by the total number of examples, just like we saw in the logistic regression case. The best possible value here is 0, no mistakes, and the worst possible value here is 1, where I made as many mistakes as there are data points. So hopefully you don't have error 1; hopefully you have error closer to 0.

Now that we've seen classification error, we can state more clearly what our learning goal is: given the dataset, find the tree that minimizes the classification error. The question is, how hard is this learning task? It turns out that this is an extremely hard learning task. There are exponentially many possible decision trees out there. Even if you consider just one branch of a decision tree, selecting which features come next, and in what order, can be a really hard problem and lead to an explosion in the number of combinations. In fact, for those familiar with this, this is an example of what's called an NP-hard problem. So it's a really hard problem, and there are no approximation algorithms for it with guarantees, but there are some simple heuristic ideas that tend to perform extremely well in practice. We're going to talk about one such idea today, which is basically a simple greedy algorithm: we're going to incrementally build a tree, one layer at a time, in order to get the best possible classification error at each step. >> [MUSIC]
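To make the two ideas from this segment concrete, here is a minimal sketch in Python of classification error and of one greedy step: picking the feature whose split gives the lowest weighted classification error. The helper names (`classification_error`, `best_split`) and the toy loan rows are hypothetical illustrations in the spirit of the lecture's table, not the course's actual code.

```python
def classification_error(labels):
    """Fraction of mistakes made by predicting the majority class:
    (# incorrect predictions) / (# examples)."""
    if not labels:
        return 0.0
    majority = max(set(labels), key=labels.count)
    mistakes = sum(1 for y in labels if y != majority)
    return mistakes / len(labels)

def best_split(data, features, target):
    """Greedy step: choose the feature whose split minimizes the
    classification error, weighted across its branches."""
    n = len(data)
    best_feature, best_error = None, float("inf")
    for f in features:
        # Partition the rows by the value of feature f.
        branches = {}
        for row in data:
            branches.setdefault(row[f], []).append(row[target])
        # Weighted error = sum over branches of (branch size / n) * error.
        error = sum(len(ys) / n * classification_error(ys)
                    for ys in branches.values())
        if error < best_error:
            best_feature, best_error = f, error
    return best_feature, best_error

# Hypothetical training data: three features, one label column y.
data = [
    {"credit": "excellent", "term": "3 yrs", "income": "high", "y": "safe"},
    {"credit": "fair",      "term": "5 yrs", "income": "low",  "y": "risky"},
    {"credit": "fair",      "term": "3 yrs", "income": "high", "y": "safe"},
    {"credit": "poor",      "term": "5 yrs", "income": "high", "y": "risky"},
    {"credit": "excellent", "term": "5 yrs", "income": "low",  "y": "safe"},
    {"credit": "poor",      "term": "3 yrs", "income": "low",  "y": "risky"},
]

feature, error = best_split(data, ["credit", "term", "income"], "y")
print(f"first greedy split: {feature} (weighted error = {error:.2f})")
```

On this toy data the script picks credit as the first split, since two of its three branches are already pure. Building the full tree would just repeat `best_split` on each branch, one layer at a time, which is exactly the incremental greedy construction the lecture describes.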