1
00:00:00,000 --> 00:00:04,837
[MUSIC]

2
00:00:04,837 --> 00:00:07,969
In this module we addressed a very fundamental concept,

3
00:00:07,969 --> 00:00:09,950
the concept of missing data.

4
00:00:09,950 --> 00:00:14,935
And missing data can impact us both at training time and at prediction time.

5
00:00:14,935 --> 00:00:19,635
For both cases, we explored fundamental ideas that are useful for

6
00:00:19,635 --> 00:00:22,475
a wide range of algorithms, not just decision trees.

7
00:00:22,475 --> 00:00:27,205
We explored the idea of just skipping data points, which has its benefits and

8
00:00:27,205 --> 00:00:32,920
pitfalls, the idea of trying to impute or guess what those missing values are,

9
00:00:32,920 --> 00:00:35,991
and the idea of modifying the actual learning algorithm,

10
00:00:35,991 --> 00:00:41,060
in particular with decision trees, in order to better deal with missing data.

11
00:00:41,060 --> 00:00:46,480
Now, in practice, you will often see missing data and

12
00:00:46,480 --> 00:00:49,030
you should always be on the lookout for missing data.

13
00:00:49,030 --> 00:00:55,410
And sometimes data comes in such a way that a value is not explicitly marked as missing.

14
00:00:55,410 --> 00:00:59,120
So for example, sometimes people put in zero when the value is unknown.

15
00:00:59,120 --> 00:01:00,330
And you might think it's zero.

16
00:01:00,330 --> 00:01:01,420
But it's really unknown.

17
00:01:01,420 --> 00:01:03,980
So you should always be on the lookout for missing data.

18
00:01:03,980 --> 00:01:06,180
And you should always handle it very carefully,

19
00:01:06,180 --> 00:01:09,670
because it can really impact the answers of your algorithm.

20
00:01:09,670 --> 00:01:12,260
Today we've seen some basic approaches for dealing with that.

21
00:01:12,260 --> 00:01:15,340
Of course, there are more advanced ones that you can get into.

22
00:01:15,340 --> 00:01:18,270
But this is a fundamental area we should always be on the lookout for.

23
00:01:19,310 --> 00:01:24,009
And let me close, again, by thanking my colleague, Krishna Sridhar,

24
00:01:24,009 --> 00:01:28,246
who's really been instrumental in the creation of the slides and

25
00:01:28,246 --> 00:01:31,421
helping with the overall vision of this module.