[MUSIC] In this module we'll cover a couple of basic strategies for dealing with missing data, and then we'll cover a modification to the decision tree algorithm that lets us deal with missing data in a much smarter way. Now, the most basic and most common way of dealing with missing data is what's called purification: I'm just going to throw out the missing data. So I start with a data set where, for some data points, some of the feature values are missing, and by skipping some of those data points I end up with a data set, inputs and outputs, where nothing is missing. Everything's observed. Purification by skipping data is the most obvious thing that you might want to do. So if I have nine data points over here, and three of them have missing data, these three rows here, then I could just say, okay, there are only three missing, not too bad, I'm just going to skip them. And so I'm going to take my 9 data points, decrease them to just 6, and call that my data set. And if you only have a few missing values, maybe this is an okay thing to do. Skipping data points with missing values, however, can be problematic. For example, in this case the term feature is missing in a lot of different data points. In fact, in six out of nine data points the term feature is missing. So if I were just to skip those, I'd go from a data set with nine data points to a data set with only three data points. It becomes much, much smaller, and that's really bad, because term here is missing in more than 50% of the data. We go down from 9 data points to a much smaller number, and that makes your training much worse, because there's much less data.
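As a minimal sketch of skipping data points, here is how the 9-to-6 example above might look with pandas. The loan-style features (credit, term, income) and their values are hypothetical toy data, not taken from the lecture's actual table:

```python
import pandas as pd

# Hypothetical toy data: 9 data points, 'term' is missing in 3 rows.
df = pd.DataFrame({
    "credit": ["excellent", "fair", "fair", "poor", "excellent",
               "fair", "poor", "poor", "fair"],
    "term":   [3, 5, None, 3, None, 5, 5, None, 3],
    "income": ["high", "low", "high", "high", "low",
               "low", "high", "low", "high"],
})

# Purification by skipping: drop every row with any missing value.
clean = df.dropna()
print(len(df), "->", len(clean))  # 9 -> 6
```

Note that `dropna()` drops a row if *any* feature is missing, so the more features you have, the more rows this can silently remove.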
And so, in cases like this, where you have one feature with lots of missing values, another simple approach is to skip features instead of data points: now, instead of having fewer data points, you just have fewer features. That's a reasonable alternative in this case. So there are two basic kinds of skipping you might do when you have missing data: you can either skip data points that have missing values, or skip features that have missing values. And somehow you have to decide whether to skip data points, skip features, or skip some of each, and that's a kind of complicated decision to make. In general, this idea of skipping is appealing because it's easy: it takes your data set and simplifies it, and it can be applied to any algorithm, because you just simplify the data and feed it to whatever algorithm you like. But it has some challenges. Removing data or removing features is always painful; data is important, and you don't want to throw it away. It's often unclear whether you should remove features or remove data points, and what impact doing so will have on your answer. Most fundamentally, even if skipping works at training time, what do you do at prediction time when you see a question mark? This approach does not address missing data at prediction time. People use this approach all the time, and I'm okay with it if you just have a case here or a case there, but it's a pretty dangerous approach to take. I don't fully recommend skipping as a way of dealing with missing data. [MUSIC]
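The feature-skipping alternative can be sketched the same way. Again the data is a hypothetical toy table (not the lecture's), with 'term' missing in six of nine rows; the 50%-missing cutoff below is one reasonable rule of thumb, not a rule from the lecture:

```python
import pandas as pd

# Hypothetical toy data: 'term' is missing in 6 of 9 rows.
df = pd.DataFrame({
    "credit": ["excellent", "fair", "fair", "poor", "excellent",
               "fair", "poor", "poor", "fair"],
    "term":   [None, None, None, None, None, None, 3, 5, 3],
    "income": ["high", "low", "high", "high", "low",
               "low", "high", "low", "high"],
})

# Skipping the feature instead of the data points keeps all 9 rows.
no_term = df.drop(columns=["term"])

# A data-driven variant: drop any column that is missing in more
# than half the rows ('thresh' = minimum count of non-missing values).
pruned = df.dropna(axis="columns", thresh=len(df) // 2 + 1)
print(list(pruned.columns))  # 'term' is gone, all 9 rows remain
```

Compare this with `dropna()` on the rows, which would shrink the same table from nine data points down to three.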