So far we've done some simple regression on our data, using just square feet of living space. But if you remember, our data set had many other columns, or features, associated with it. So what we're going to do next is explore other features in the data that we might use. Let's create a list of the features I'm going to explore: the number of bedrooms, the number of bathrooms, and the square feet of living space, which is what we've been exploring so far. In addition, I'm going to include the square feet of the lot, which is how much land the house has all around it, and the number of floors in the house. Finally, I'm going to include a variable called the ZIP code. The ZIP code in the United States is what other countries call the postal code; in Brazil we call it the CEP. There are many names in different places, but that's what we'll include. Just to be totally clear, instead of calling this list features, I'm going to call it my_features. Shift+Enter. Next I want to see what the my_features columns look like, so I take the sales data, select just those columns, and type .show(). Remember, with GraphLab Create we can call .show() on almost anything; here it's the sales SFrame restricted to my_features. Now we have a visualization of these features, so let me walk you through it. Let's start with the frequencies for bedrooms. There are 13 different unique values; in fact, some houses have ten bedrooms. Most houses have three bedrooms, some have four, some have two, some have five, and very few have more. Bathrooms are more interesting: it turns out that in the US, a house can have a fractional number of bathrooms.
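In the notebook, this step selects the my_features columns of the sales SFrame and calls .show() on them. The interactive GraphLab Create view can't be reproduced here, so what follows is a minimal sketch in plain standard-library Python (swapped in for GraphLab Create) of the kind of per-column summary described above: how many distinct values each column has and which value is most frequent. The column names, the helper `summarize`, and the four toy rows are hypothetical stand-ins, not the real sales data; note that the toy bathrooms column already contains fractional values such as 2.5 and 1.75.

```python
from collections import Counter

# Assumed column names for the six features discussed in the lecture.
my_features = ["bedrooms", "bathrooms", "sqft_living", "sqft_lot", "floors", "zipcode"]

# Hypothetical toy rows standing in for the real sales table.
sales = [
    {"bedrooms": 3, "bathrooms": 2.5, "sqft_living": 1800, "sqft_lot": 5000, "floors": 2, "zipcode": "98103"},
    {"bedrooms": 3, "bathrooms": 1.0, "sqft_living": 1200, "sqft_lot": 6000, "floors": 1, "zipcode": "98103"},
    {"bedrooms": 4, "bathrooms": 1.75, "sqft_living": 2100, "sqft_lot": 7200, "floors": 1, "zipcode": "98003"},
    {"bedrooms": 2, "bathrooms": 1.0, "sqft_living": 900, "sqft_lot": 4000, "floors": 1, "zipcode": "98004"},
]


def summarize(rows, columns):
    """Hypothetical helper: per column, count distinct values and find the most frequent one."""
    summary = {}
    for col in columns:
        counts = Counter(row[col] for row in rows)
        value, freq = counts.most_common(1)[0]  # ties go to the value seen first
        summary[col] = {"unique": len(counts), "most_common": value, "count": freq}
    return summary


for col, stats in summarize(sales, my_features).items():
    print(col, stats)
```

`Counter.most_common` orders equal counts by first appearance, so on real data with ties you may want an explicit tie-break rather than relying on row order.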
For example, 2.5 is the most common number of bathrooms. That's because a bathroom with a bathtub is called a full bath and counts as 1, while a bathroom that has only a sink and a toilet counts as just 0.5. In fact, a bathroom with a sink, a toilet, and a shower, but no tub, counts as 0.75 in the US. So a house with two full baths and one half bath has 2.5 bathrooms. One bathroom is the second most common value, and then there's 1.75 bathrooms, which is probably one full bath plus a bathroom with a shower but no tub. You can see the distribution here. Similarly for the other features, like square feet of living space and number of floors: most houses have one floor, and some have two. And then there's the ZIP code. The most common ZIP code is 98103, an area in Seattle where a lot of people live. Okay, so we've seen a high-level visualization of the different columns of the data. Now let's look at some other relationships in the data with some fun visualizations. I'm going to take the sales table and call .show() again, but this time the view is not going to be a scatter plot. It's going to be what's called a box-and-whisker plot, which relates two of the variables we looked at. On the x-axis I'm going to use the ZIP code, the postal code we discussed, and on the y-axis I'm going to plot the price. So we'll see the relationship between where the house is located, its ZIP code, and its price. I'll press Shift+Enter to plot it, and here's what we see. For example, ZIP code 98003 has a significantly lower price: the average price, shown by the red line, is low, and there isn't a lot of variability.
By contrast, ZIP code 98004 has a much higher average price, around $1.1 million, and a huge variability: the houses range from something like $800,000 to almost $4 million. Here I'm only showing a few ZIP codes; if I scroll down, you'll see more and more of them. And wow, what is this one over here? There's one that's astronomical; it goes off the scale, with some houses costing something like $7 million. This ZIP code is 98039. Remember 98039; we'll come back to it at the end of this notebook. It's kind of funny.
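A box-and-whisker plot summarizes each group, here each ZIP code, with a handful of numbers. The sketch below, again plain standard-library Python rather than GraphLab Create, groups price rows by ZIP code and computes the conventional five-number summary (minimum, first quartile, median, third quartile, maximum), plus the mean, since the lecture talks about average prices. The helper name `box_whisker_stats`, the row layout, and the prices are all hypothetical, made up only to loosely echo the 98003 versus 98004 contrast described above.

```python
import statistics
from collections import defaultdict

# Hypothetical toy rows; the notebook gets these from the sales SFrame.
sales = [
    {"zipcode": "98003", "price": 200000},
    {"zipcode": "98003", "price": 220000},
    {"zipcode": "98003", "price": 250000},
    {"zipcode": "98003", "price": 260000},
    {"zipcode": "98003", "price": 300000},
    {"zipcode": "98004", "price": 800000},
    {"zipcode": "98004", "price": 1000000},
    {"zipcode": "98004", "price": 1100000},
    {"zipcode": "98004", "price": 1500000},
    {"zipcode": "98004", "price": 3900000},
]


def box_whisker_stats(rows, group_key="zipcode", value_key="price"):
    """Hypothetical helper: five-number summary plus mean of value_key per group."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row[value_key])

    stats = {}
    for key, values in groups.items():
        values.sort()
        if len(values) < 2:
            # quantiles() rejects single points on older Pythons; one value is
            # its own quartiles and median.
            q1 = median = q3 = values[0]
        else:
            # "inclusive" treats the min and max as the 0th and 100th percentiles.
            q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
        stats[key] = {
            "min": values[0],
            "q1": q1,
            "median": median,
            "q3": q3,
            "max": values[-1],
            "mean": statistics.mean(values),
        }
    return stats


for zipcode, summary in sorted(box_whisker_stats(sales).items()):
    print(zipcode, summary)
```

This sketch draws the whiskers from the plain minimum and maximum and uses the "inclusive" quartile method. Plotting tools differ on both choices; many cap whiskers at 1.5 times the interquartile range and draw points beyond that as outliers, so these numbers need not match GraphLab Create's plot exactly.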