[MUSIC] So we've said that to assess the performance of our model, we really need to have a test data set carved out from our full data set. So, this raises the question of, how do I think about dividing the data set into training data versus test data? So, in pictures, how many points do I put in this blue space here, the training set, versus this pink space, the test set?

Well, if I put too few points in my training set, then I'm not going to estimate my model well, and so I'm clearly going to have bad predictive performance because of that. But on the other hand, if I put too few points in my test set, that's gonna be a bad approximation to generalization error, because it's not gonna represent a wide enough range of things I might see out there in the world.

So there's no perfect formula for how to split a data set into training versus test. But a general rule of thumb, if you can figure out how to do this, is that typically you want just enough points in your test set to approximate generalization error well, and you want all the remaining points in your training set, because you want as many points as possible in your training set to learn a good model, especially when you're looking at very complex models. But, like we've said before, you still wanna have enough points in your test set to assess the performance of the fitted model.

Okay, well, this is assuming that you have enough data to do this type of split, so that you can leave enough points in both the training and test sets. But if that isn't the case, there are other methods that we're gonna talk about in this course, like cross validation. [MUSIC]
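To make the train/test split concrete, here is a minimal sketch in plain Python. The lecture doesn't prescribe a specific split ratio, so the `test_fraction=0.2` default below is a hypothetical rule-of-thumb choice, and the function name and interface are illustrative, not from the course:

```python
import random

def train_test_split(data, test_fraction=0.2, seed=0):
    """Shuffle the data and carve out a held-out test set.

    test_fraction is a hypothetical rule-of-thumb value: just enough
    points to approximate generalization error, with the rest kept
    in the training set to learn a good model.
    """
    rng = random.Random(seed)          # fixed seed for reproducibility
    indices = list(range(len(data)))
    rng.shuffle(indices)               # shuffle so the split is random
    n_test = int(len(data) * test_fraction)
    test = [data[i] for i in indices[:n_test]]
    train = [data[i] for i in indices[n_test:]]
    return train, test

# Example: 100 data points, 80/20 split
points = list(range(100))
train, test = train_test_split(points, test_fraction=0.2)
print(len(train), len(test))  # 80 20
```

Note that every point lands in exactly one of the two sets; as the lecture says, when the full data set is too small to leave enough points on both sides of this split, techniques like cross validation are the alternative.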