[MUSIC] So we've now built
a simple scatter plot and we get an idea of what
our data looks like. And the question is, can we use the scatter plot to predict
sales price from square feet of living? So we're gonna do a simple
regression model. So let's do that, and we hit SM, and we're gonna say #Create a simple regression model of sqft_living to price. All right, so let's do just that. So remember from the lectures that
the first thing that you do before you do anything to your data is to split it
into a training set and a test set, because you never want to do
any training or learning on the test data; you want
to do that just on the training data. So let's do that, split, so
I'm gonna take my data and split it into train_data and
test_data by calling a function
that you can apply to the data, called the random split function: sales.random_split. By the way, what I did there
was just use tab complete. So let me just show you that little trick,
just for a second. So if I just do sales.r and I press Tab, you'll see there's a few
things I could do. random_split, read_csv, remove_columns,
rename_columns and so on. So I'm gonna just do random_split. And what I'm gonna tell
it is to do a .8 split. So .8 of the data, 80% is gonna be for
training and 20% is gonna be for testing. Now, one last thing that will be useful
for your homeworks and in general, to make sure that you have always the same
results is to set a seed for the split. Because a random split,
it's a pseudo random number generator. We can set a seed to it, and we just gonna
set the seed to be any number we want. For example, it could be 2015, it could be many things, so
I'm just gonna set it to 0. So now every time I do this random split,
it splits the data in the same way. [MUSIC]
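
To make the seed idea concrete, here is a minimal sketch in plain Python. It is not the library's implementation: the helper `random_split` below is a hypothetical stand-in written only for illustration, the toy `sales` rows are made up, and it assumes each row is sent to the training set independently with probability 0.8, which may or may not match what `sales.random_split` does internally. In the notebook itself, the call described above would presumably be something like `train_data, test_data = sales.random_split(.8, seed=0)`.

```python
import random

def random_split(rows, fraction, seed=None):
    """Hypothetical helper: put each row in the first part with probability `fraction`."""
    rng = random.Random(seed)  # private generator, so the seed alone determines the split
    first, second = [], []
    for row in rows:
        (first if rng.random() < fraction else second).append(row)
    return first, second

# Toy stand-in for the sales table: (sqft_living, price) pairs with made-up values
sales = [(1000 + 10 * i, 200000 + 1500 * i) for i in range(1000)]

train_data, test_data = random_split(sales, 0.8, seed=0)
train_again, test_again = random_split(sales, 0.8, seed=0)

print(len(train_data), len(test_data))  # roughly 800 and 200, not exactly
print(train_data == train_again)        # same seed, so the same split
```

Because the generator is seeded, running the split twice gives identical training and test sets, which is what makes homework results comparable from run to run.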