1 00:00:00,006 --> 00:00:04,768 [MUSIC] 2 00:00:04,768 --> 00:00:08,320 So we've now built a simple scatter plot and 3 00:00:08,320 --> 00:00:12,120 we get an idea of what our data looks like. 4 00:00:12,120 --> 00:00:13,150 And the question is, 5 00:00:13,150 --> 00:00:19,240 can we use the scatter plot to predict sales price from square feet of living? 6 00:00:19,240 --> 00:00:20,770 So we're gonna do a simple regression model. 7 00:00:22,040 --> 00:00:27,558 So let's do that, and we hit SM, and 8 00:00:27,558 --> 00:00:32,721 we're gonna say #Create a simple 9 00:00:32,721 --> 00:00:39,500 regression model of sqft_living to price. 10 00:00:42,490 --> 00:00:44,720 All right, so let's do just that. 11 00:00:46,450 --> 00:00:50,190 So remember from the lectures that the first thing that you do, before you do 12 00:00:50,190 --> 00:00:54,460 anything to your data, is to split it into a training set and a test set, 13 00:00:54,460 --> 00:00:57,720 because you never want to do any training or 14 00:00:57,720 --> 00:01:01,120 learning on the test data; you want to do that just on the training data. 15 00:01:01,120 --> 00:01:06,696 So let's do that split. So I'm gonna take my data and 16 00:01:06,696 --> 00:01:11,918 split it into train_data and test_data by calling 17 00:01:11,918 --> 00:01:18,088 a function that you can apply to the data, so 18 00:01:18,088 --> 00:01:22,285 it's called the random split function. 19 00:01:22,285 --> 00:01:24,454 sales.random_split. 20 00:01:25,800 --> 00:01:29,660 By the way, what I did there was just use tab complete. 21 00:01:29,660 --> 00:01:32,420 So let me just show you that little trick, just for a second. 22 00:01:32,420 --> 00:01:35,770 So if I just do sales.r and I press Tab, 23 00:01:35,770 --> 00:01:37,108 you'll see there's a few things I could do: 24 00:01:37,108 --> 00:01:42,450 random_split, read_csv, remove_columns, rename_columns and so on. 25 00:01:42,450 --> 00:01:44,530 So I'm gonna just do random_split.
26 00:01:44,530 --> 00:01:49,250 And what I'm gonna tell it is to do a .8 split. 27 00:01:49,250 --> 00:01:55,630 So .8 of the data, 80%, is gonna be for training and 20% is gonna be for testing. 28 00:01:56,930 --> 00:02:01,840 Now, one last thing that will be useful for your homework, and in general 29 00:02:01,840 --> 00:02:07,000 to make sure that you always have the same results, is to set a seed for the split. 30 00:02:07,000 --> 00:02:10,900 Because a random split uses a pseudo-random number generator, 31 00:02:10,900 --> 00:02:15,290 we can set a seed for it, and we're just gonna set the seed to be any number we want. 32 00:02:15,290 --> 00:02:18,820 For example, it could be 2015, 33 00:02:18,820 --> 00:02:23,500 it could be many things, so I'm just gonna set it to 0. 34 00:02:23,500 --> 00:02:28,648 So now every time I do this random split, it splits the data in the same way. 35 00:02:30,366 --> 00:02:34,269 [MUSIC]
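A small sketch of why the seed matters, for anyone following along without the notebook. The call described in the video is presumably along the lines of `sales.random_split(.8, seed=0)`. The code below does not use that library. It is a plain-Python stand-in that assumes each row goes to the training set with probability 0.8. The helper `random_split` and the toy `sales` rows are hypothetical, made up just for illustration.

```python
import random


def random_split(rows, fraction, seed):
    """Send each row to the first list with probability `fraction`, else to the second.

    Hypothetical helper written for this sketch, not the library's implementation.
    """
    rng = random.Random(seed)  # private generator seeded explicitly
    first, second = [], []
    for row in rows:
        if rng.random() < fraction:
            first.append(row)
        else:
            second.append(row)
    return first, second


# Made-up stand-in for the sales table (not the real housing data)
sales = [{"sqft_living": 500 + 10 * i, "price": 100000 + 2500 * i} for i in range(1000)]

# Same fraction and same seed, called twice
train_data, test_data = random_split(sales, 0.8, seed=0)
train_again, test_again = random_split(sales, 0.8, seed=0)

print(len(train_data), len(test_data))  # roughly an 80/20 split
print(train_data == train_again and test_data == test_again)  # True: same seed, same split
```

With a different seed the rows would generally land in different sets, which is why fixing the seed keeps results comparable from run to run.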