1
00:00:00,000 --> 00:00:04,599
[MUSIC]

2
00:00:04,599 --> 00:00:08,385
We've now seen the basics of decision trees, which are an amazing type of

3
00:00:08,385 --> 00:00:12,186
classifier that can be used for a wide range of different types of data.

4
00:00:12,186 --> 00:00:15,274
However, decision trees are highly prone to overfitting.

5
00:00:15,274 --> 00:00:19,946
And so, let's dig in a little bit in this module on how we can avoid overfitting in

6
00:00:19,946 --> 00:00:21,819
the context of decision trees.

7
00:00:23,360 --> 00:00:28,020
And as a reminder, we're going to continue to use our loan application evaluation

8
00:00:28,020 --> 00:00:32,590
system as a running example, where loan data will come in and

9
00:00:32,590 --> 00:00:38,120
we'll be able to predict whether that's a safe loan or a risky loan application.

10
00:00:38,120 --> 00:00:39,940
And so, that's the decision we're trying to make.

11
00:00:41,340 --> 00:00:44,600
And from that loan application, we're going to learn the decision tree

12
00:00:44,600 --> 00:00:46,520
that allows us to traverse down the tree and

13
00:00:46,520 --> 00:00:52,220
make a prediction as to whether a particular loan is risky or safe.

14
00:00:52,220 --> 00:00:54,630
And so, the input is going to be x_i and

15
00:00:54,630 --> 00:00:58,790
the output is going to be this y hat i that we're going to predict from data.

16
00:01:00,580 --> 00:01:04,610
Let's first spend a quick minute reviewing overfitting, and then dig in as

17
00:01:04,610 --> 00:01:08,150
to how it happens in decision trees, which, hint hint, is going to be really bad.

18
00:01:10,360 --> 00:01:15,608
As we all recall, overfitting is what happens when the training error

19
00:01:15,608 --> 00:01:21,102
keeps going down towards zero as we make our models more and more complex, while the true

20
00:01:21,102 --> 00:01:26,854
error goes down with the complexity of the model at first, but then spikes back up.

21
00:01:26,854 --> 00:01:33,121
And so, more specifically, overfitting happens when we end up with a model w hat

22
00:01:33,121 --> 00:01:37,434
which has low training error, but high true error,

23
00:01:37,434 --> 00:01:41,238
while there was some other model, or model parameters, w*,

24
00:01:41,238 --> 00:01:46,457
which had maybe higher training error, but definitely lower true error.

25
00:01:46,457 --> 00:01:49,290
And so, that's the overfitting problem.

26
00:01:49,290 --> 00:01:55,580
And we want to somehow pick a model that's less complex, to avoid that kind of overfitting.

27
00:01:55,580 --> 00:02:00,659
We saw this effect quite pronouncedly in logistic regression, where as we

28
00:02:00,659 --> 00:02:06,141
increased the degree of the polynomial, we got these crazier and crazier decision

29
00:02:06,141 --> 00:02:11,398
boundaries. We saw overfitting, which was bad overfitting over here,

30
00:02:11,398 --> 00:02:16,781
but the overfitting for polynomials of degree 6, and then polynomials

31
00:02:16,781 --> 00:02:22,362
of degree 20 here for these features, well, this is a technical term that I use.

32
00:02:22,362 --> 00:02:28,836
I think I used "crazy decision boundary", but let's call it crazy overfitting.

33
00:02:28,836 --> 00:02:31,402
So, really bad stuff.

34
00:02:31,402 --> 00:02:34,500
And so, we're trying to avoid overly complex models.

35
00:02:34,500 --> 00:02:36,803
And as we'll see with decision trees,

36
00:02:36,803 --> 00:02:39,640
models can get overly complex very quickly.

37
00:02:39,640 --> 00:02:44,819
[MUSIC]
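One way to write down the overfitting condition described around the middle of this video (a sketch using the training error / true error notation from the slides; the exact formula is not read out in the transcript above) is:

% Overfitting: the learned parameters \hat{w} fit the training data at least
% as well as some other parameters w^*, yet generalize worse than w^* does.
\[
\text{overfitting at } \hat{w} \;\iff\;
\exists\, w^{*} \text{ such that }
\text{training\_error}(\hat{w}) \le \text{training\_error}(w^{*})
\;\;\text{and}\;\;
\text{true\_error}(\hat{w}) > \text{true\_error}(w^{*}).
\]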
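To see numerically why decision trees can get overly complex very quickly, here is a minimal sketch (not from the course materials, which use a different toolkit) that grows scikit-learn trees of increasing depth on synthetic stand-in loan data, comparing training error with held-out error as a proxy for true error:

# Illustration only: deeper trees drive training error toward zero while
# held-out error stops improving or gets worse -- the overfitting gap above.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for loan applications x_i with safe/risky labels y_i.
X, y = make_classification(n_samples=2000, n_features=20, n_informative=5,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.5,
                                                    random_state=0)

for depth in [1, 2, 4, 8, 16, None]:   # None = grow until leaves are pure
    tree = DecisionTreeClassifier(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_err = 1 - tree.score(X_train, y_train)   # training error
    test_err = 1 - tree.score(X_test, y_test)      # proxy for true error
    print(f"max_depth={depth}: train error {train_err:.3f}, "
          f"test error {test_err:.3f}")

Running this, you should see the training error fall essentially to zero for the deepest trees while the test error plateaus or climbs back up, which is exactly the gap between training error and true error that this module sets out to control.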