[MUSIC] In this module, we talked about how to do regression. We talked about how to use it to predict house prices. Now, we're going to build together an IPython notebook, using Python, to predict house prices for a real dataset, based on what's called King County data. King County is the county, or the region, where the city of Seattle, where Emily and I live, is located. So, we're going to take some of the data, it's public record data, and actually build together a regression notebook to predict house prices. So let's get started.
Okay, so here's a blank IPython notebook. And just to start, first let's change the title, so we're going to change the title to Predicting House Prices. So we just renamed it, and I like to go to the View menu and hide the header and the toolbar, this part on the bottom, so we have more space on the slide. So we've done the hiding.
And the first thing we're going to do is fire up GraphLab Create, the tool that we're going to use to run some algorithms in Python. So I'm going to type Esc M, and I'm going to say fire up GraphLab Create. So we do that by typing import graphlab. And that just starts up GraphLab Create.
Now our task today is to predict house prices. So the first thing that we are going to do is to load some house sales data. So this is public data, public record, of houses that got sold in the Seattle region. So, I'm going to call this table of data sales, and I'm going to say graphlab.SFrame. Remember we talked about SFrame as the data structure for representing tabular data in GraphLab Create? So it's a really fast, out-of-core data structure, and we're going to load up some house data in there. So this is going to be called home_data, and notice that the name got auto-completed for you, so that just happened then. So while this is loading up and it's firing up GraphLab Create, if I just type sales here, you'll see what that data looks like.
So I'm going to scroll up a little bit to the top. I just typed sales, and it says there is an ID, the date of the sale, the price, number of bedrooms, number of bathrooms, square feet for the house, which is kind of like the American version of square meters if you're living in other countries, square feet for the lot of land, number of floors, and a bunch of other categories: whether the house has a view or not, whether it sits on a grade, which means it's on a hill, and a bunch of other measurements. We've loaded this house data and it looks pretty cool.
The first thing that we're going to do is use GraphLab Canvas and do a little bit of visualization. So what I'm going to do is, again, create a cell. And this says exploring the data for housing sales. So we're going to do some data exploration. So we're going to take the sales data and I'm going to show it, so when I type .show, it's going to show some visualization of that data. In particular, rather than letting GraphLab choose the view, we're going to just do a scatter plot, so we'll see what a scatter plot is in a second. We'll just type view='Scatter Plot', which relates two variables. On the x axis, we're going to have the square feet of living space. And on the y axis, we're going to put the price. So what that should show us is the relationship between square feet of living space and price.
Now, one little trick that I like to do when I create notebooks: sometimes you push GraphLab Canvas out to a new tab, but it's also kind of fun to just plot those scatter plots and simple plots inside the notebook itself, so you can print it off and hand it off to somebody. So the way to do that is I can just tell GraphLab Canvas to set its target not to be the browser, which is the default target, but to be the IPython notebook. I just type canvas.set_target('ipynb'), for IPython notebook, and it's going to plot this scatter plot in the notebook itself.
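To make the idea of a scatter plot concrete, here's a minimal sketch in plain Python rather than GraphLab Canvas. It's only an illustration under some assumptions: the helper ascii_scatter is hypothetical (it's not part of GraphLab Create), it assumes the x values and the y values are not all identical, and the four (square feet, price) pairs are the houses pointed out on the plot in this walkthrough, not the full dataset.

```python
# Hypothetical helper, not part of GraphLab Create: a tiny text-mode scatter plot.
def ascii_scatter(xs, ys, width=40, height=10):
    """Return a list of text rows with '*' marking each (x, y) point."""
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)
    grid = [[" "] * width for _ in range(height)]
    for x, y in zip(xs, ys):
        # Scale each value into a column / row index of the grid.
        col = round((x - x_min) / (x_max - x_min) * (width - 1))
        row = round((y - y_min) / (y_max - y_min) * (height - 1))
        grid[height - 1 - row][col] = "*"  # row 0 is the top of the plot
    return ["".join(cells) for cells in grid]

# (sqft_living, price) pairs for the houses called out in this walkthrough.
sqft_living = [1700, 1910, 3730, 5990]
price = [149000, 1500000, 2500000, 2200000]

for line in ascii_scatter(sqft_living, price):
    print("|" + line)
print("+" + "-" * 40 + "> sqft_living (price on the vertical axis)")
```

Each row of the table becomes one point: its square feet picks the horizontal position and its price picks the vertical position, which is the same pairing the Canvas scatter plot draws.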
So if I hit enter here, what it's going to do is take those two axes and just plot them together. So here we go. On the x axis is the square feet of the house, and on the y axis is the price. So let's kind of browse this a little bit. So for example, the houses with more square feet are the big houses, so if I mouse over here you'll see that this big house had 5,990 square feet, which is pretty big, that's like 600 square meters, and it was sold for $2.2 million, which is quite a lot of money.
Now, you'll also notice there's a nice relationship: the bigger houses tend to cost more. There's a big blob of houses here, so most houses are between 1,000 and 3,000 square feet. And even at this level, see this house over here, it's an outlier. So even though it's only 1,910 square feet, it was sold for $1.5 million, while down here there's a house of similar size, 1,700 square feet, that sold for just $149,000. This is a big discrepancy. And here's the biggest outlier of the dataset. This house has 3,730 square feet and got sold for $2.5 million. [MUSIC]
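To put numbers on the square-meter comparison and on that discrepancy, here's a small sketch in plain Python. The function names sqft_to_sqm and price_per_sqft are hypothetical helpers introduced just for this illustration, and the four houses are the ones read off the plot above, not the whole dataset.

```python
# One square foot is exactly 0.3048 m * 0.3048 m = 0.09290304 square meters.
SQM_PER_SQFT = 0.09290304

def sqft_to_sqm(sqft):
    """Convert an area in square feet to square meters."""
    return sqft * SQM_PER_SQFT

def price_per_sqft(price, sqft):
    """Dollars paid per square foot of living space."""
    return price / sqft

# Houses called out in the walkthrough (square feet of living space, sale price).
houses = [
    {"sqft_living": 5990, "price": 2200000},
    {"sqft_living": 1910, "price": 1500000},
    {"sqft_living": 1700, "price": 149000},
    {"sqft_living": 3730, "price": 2500000},
]

for house in houses:
    sqm = sqft_to_sqm(house["sqft_living"])
    rate = price_per_sqft(house["price"], house["sqft_living"])
    print("%5d sqft = %4.0f sqm, $%9d, $%4.0f per sqft"
          % (house["sqft_living"], sqm, house["price"], rate))
```

The 5,990 square foot house works out to about 556 square meters, and the two similarly sized houses differ by roughly a factor of nine in price per square foot (about $785 versus about $88), which is why they stand out on the plot.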