1 00:00:00,241 --> 00:00:04,257 [MUSIC] 2 00:00:04,257 --> 00:00:07,776 So let's go through some of the regression fundamentals: what's our data, 3 00:00:07,776 --> 00:00:12,070 what's the model that we are going to use, and what's our task of interest. 4 00:00:12,070 --> 00:00:16,810 Okay, so the first thing is we're going to take all of our data, so 5 00:00:16,810 --> 00:00:19,900 all these houses that we looked at that sold recently, and 6 00:00:19,900 --> 00:00:22,520 for each one of them we're going to record some information. 7 00:00:23,730 --> 00:00:27,340 And in this case, in the case of simple regression where we're assuming that 8 00:00:27,340 --> 00:00:31,280 there's just one variable that we're using to predict our house price, 9 00:00:31,280 --> 00:00:35,570 specifically square feet, for every house we're going to record how many square feet 10 00:00:35,570 --> 00:00:39,680 that the house had and what the price was that that house sold for. 11 00:00:39,680 --> 00:00:44,280 And so we record this for each of our houses that have sold in the past. 12 00:00:44,280 --> 00:00:48,910 And this variable x represents the input to our model, okay? 13 00:00:48,910 --> 00:00:51,020 This is what we're going to use for our prediction. 14 00:00:51,020 --> 00:00:52,920 And what's the output, what are we trying to predict? 15 00:00:52,920 --> 00:00:57,000 Well, we're trying to predict the price of the house. 16 00:00:57,000 --> 00:01:00,670 So that y variable is going to be the output. 17 00:01:00,670 --> 00:01:04,330 So what's the difference between the input and the output? 18 00:01:04,330 --> 00:01:08,400 Well, the output y is our quantity of interest, this is our goal, 19 00:01:08,400 --> 00:01:13,646 we're trying to predict the value of our house so we can list it for sale.
20 00:01:13,646 --> 00:01:19,190 And we're going to assume that we can predict y based on x, 21 00:01:19,190 --> 00:01:23,920 so we can predict the value of the house based on the square footage of the house. 22 00:01:25,110 --> 00:01:27,700 Okay. So I'm going to take my data, and 23 00:01:27,700 --> 00:01:29,995 I'm going to plot it x versus y, 24 00:01:29,995 --> 00:01:35,290 where x is the square footage of each house, and y is the sales price. 25 00:01:35,290 --> 00:01:37,640 So that's what this cloud of points represents. 26 00:01:37,640 --> 00:01:42,090 And we made exactly this plot in the first course of this specialization. 27 00:01:42,090 --> 00:01:45,280 So, just to be clear, this circle here represents, 28 00:01:46,520 --> 00:01:50,770 let's see, the ith house in my data set. 29 00:01:50,770 --> 00:01:55,220 And that house had some number of 30 00:01:55,220 --> 00:01:59,880 square feet xi, and some sales price yi. 31 00:02:02,980 --> 00:02:08,480 Okay, so this is my data, and what my model represents 32 00:02:08,480 --> 00:02:12,820 is the expected relationship between x and y. Remember, that's what we're trying to 33 00:02:12,820 --> 00:02:15,680 figure out, because if we have that relationship we can use it for 34 00:02:15,680 --> 00:02:19,960 predicting the value of my house that I'd like to list for sale. 35 00:02:19,960 --> 00:02:20,670 Okay, so 36 00:02:20,670 --> 00:02:25,440 we're going to assume some relationship, which is some functional relationship. 37 00:02:25,440 --> 00:02:30,277 So I'm going to call this some function f of x. 38 00:02:30,277 --> 00:02:34,591 And like I said, what that function represents is 39 00:02:34,591 --> 00:02:42,140 the expected relationship 40 00:02:45,892 --> 00:02:51,520 between x and y. 41 00:02:51,520 --> 00:02:53,660 So let's walk through this in a little bit more detail.
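The data layout described above can be sketched in a few lines of Python. This is only an illustration: the numbers are made up and are not the houses from the lecture, and the names `sqft`, `price`, and `house` are hypothetical.

```python
# Made-up example data: each recently sold house contributes one (x, y) pair.
sqft = [850, 1200, 1500, 2100, 2600]                  # x: input, square feet
price = [210000, 265000, 340000, 455000, 520000]      # y: output, sale price

def house(i):
    """Return (x_i, y_i) for the i-th house, using a 0-based index."""
    return sqft[i], price[i]

# One point in the "cloud of points": the house at index 2.
x_i, y_i = house(2)
```

Keeping the inputs and outputs in two parallel lists mirrors the x-versus-y plot: position i in each list describes the same house.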
42 00:02:54,710 --> 00:02:58,860 So, let's look at this house. Just 43 00:02:58,860 --> 00:03:01,400 to make this a little bit cleaner, let's look at another house here. 44 00:03:02,450 --> 00:03:08,780 Now I've reused the letter that I like to use. 45 00:03:08,780 --> 00:03:09,910 So I'm going to call this house. 46 00:03:09,910 --> 00:03:11,570 Sorry, I'm going to re-annotate this. 47 00:03:11,570 --> 00:03:13,710 I'm going to call this house j. 48 00:03:13,710 --> 00:03:16,110 This is xj and yj. 49 00:03:16,110 --> 00:03:20,480 And the reason I'm doing this is because i is a special notation for 50 00:03:20,480 --> 00:03:21,810 the house that I'm interested in. 51 00:03:21,810 --> 00:03:26,510 And so now, this house here that I'm looking at, I'm going to say that 52 00:03:26,510 --> 00:03:30,030 it sold for some value, yi. 53 00:03:30,030 --> 00:03:34,680 And based on my model, what my model is saying is that I'm assuming 54 00:03:34,680 --> 00:03:39,700 that yi is approximately equal to f(xi). 55 00:03:39,700 --> 00:03:44,550 This functional relationship between the square footage of house i and 56 00:03:44,550 --> 00:03:46,060 its sales price, yi. 57 00:03:46,060 --> 00:03:51,783 But I'm assuming that my model's not 100% accurate. 58 00:03:51,783 --> 00:03:56,310 You can easily imagine that there are errors in this model, 59 00:03:56,310 --> 00:04:02,000 because you can have two houses that have exactly the same number of square feet, 60 00:04:02,000 --> 00:04:04,630 but sell for very different prices. 61 00:04:04,630 --> 00:04:07,050 They could have sold at different times. 62 00:04:07,050 --> 00:04:11,690 They could have had different numbers of bedrooms, or bathrooms, or 63 00:04:11,690 --> 00:04:18,250 size of the yard, or specific location, neighborhoods, school districts. 64 00:04:18,250 --> 00:04:21,870 Lots of things that we might not have taken into account in our model.
65 00:04:21,870 --> 00:04:27,820 So our model is just what we're using as our belief about the relationship for 66 00:04:27,820 --> 00:04:31,250 prediction, but it's not 100% accurate, there's some error. 67 00:04:31,250 --> 00:04:36,020 So the error, so just to be clear, this point here, 68 00:04:36,020 --> 00:04:40,280 this x is exactly f(xi), 69 00:04:40,280 --> 00:04:45,393 it's the function evaluated at some xi value. 70 00:04:46,520 --> 00:04:51,300 And we're saying that our observations, which don't fall 71 00:04:51,300 --> 00:04:56,190 exactly on this curve defined by f, have some error. 72 00:04:57,610 --> 00:05:03,990 So we'll call this error specific to house i, we're going to call it epsilon i. 73 00:05:05,500 --> 00:05:10,942 So what our regression model's saying 74 00:05:10,942 --> 00:05:15,140 is that we're assuming that our 75 00:05:15,140 --> 00:05:19,961 observation yi is equal to f(xi), 76 00:05:19,961 --> 00:05:25,248 our expected relationship between x and 77 00:05:25,248 --> 00:05:28,060 y, plus some error. 78 00:05:29,180 --> 00:05:35,895 And, in particular, we're treating this error as a random quantity and 79 00:05:35,895 --> 00:05:41,527 we're going to assume that the expected value of this error, 80 00:05:41,527 --> 00:05:45,235 so this notation is the expected value. 81 00:05:48,654 --> 00:05:52,270 So we're assuming the expected value of this error is equal to zero. 82 00:05:52,270 --> 00:05:54,400 And what is expected value? 83 00:05:54,400 --> 00:06:02,120 Well, it's just a weighted average over all possible values that error can take, 84 00:06:02,120 --> 00:06:06,570 weighted by how likely the error is to take each of those values. 85 00:06:08,660 --> 00:06:14,393 But what this is saying, saying that our expected error is going to be zero, 86 00:06:14,393 --> 00:06:17,003 means that it's equally likely, 87 00:06:19,634 --> 00:06:24,409 that we're going to have positive or negative error for any given house sale.
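The model just described, yi = f(xi) + epsilon i with an expected error of zero, can be simulated as a rough sketch. Everything specific here is an assumption made for illustration and does not come from the lecture: the straight-line form of `f`, its coefficients, the Gaussian noise, and the names `simulate_sale` and `noise_sd`.

```python
import random

def f(x):
    # Hypothetical expected relationship between square feet and price.
    # A straight line is assumed here purely for illustration.
    return 50_000 + 200 * x

def simulate_sale(x, rng, noise_sd=30_000):
    # One observation: y_i = f(x_i) + epsilon_i.
    # epsilon_i is drawn from a zero-mean Gaussian (an assumption).
    epsilon = rng.gauss(0, noise_sd)
    return f(x) + epsilon

rng = random.Random(0)   # fixed seed so the run is repeatable
x = 2000                 # many simulated houses, all with the same square footage

# The error of each simulated sale relative to the expected relationship.
errors = [simulate_sale(x, rng) - f(x) for _ in range(100_000)]

mean_error = sum(errors) / len(errors)                  # should be near 0
frac_above = sum(e > 0 for e in errors) / len(errors)   # should be near 0.5
```

Averaged over many simulated sales, the error comes out close to zero, and about half of the sales land above f(x). The half-and-half split relies on the noise being symmetric, as the Gaussian assumed here is; a skewed error distribution can have zero mean without being equally likely to fall above or below.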
88 00:06:29,711 --> 00:06:37,490 So, it's equally likely that our error is positive or negative. 89 00:06:37,490 --> 00:06:38,804 And what does this imply? 90 00:06:38,804 --> 00:06:44,147 This implies that it's equally likely that our observation, 91 00:06:44,147 --> 00:06:48,379 the specific observation that we get, is above or 92 00:06:48,379 --> 00:06:52,634 below the functional relationship defined by f. 93 00:06:52,634 --> 00:06:59,505 So, yi is equally 94 00:06:59,505 --> 00:07:06,375 likely to be above or 95 00:07:06,375 --> 00:07:13,017 below f of xi. 96 00:07:14,814 --> 00:07:16,120 Okay. 97 00:07:16,120 --> 00:07:19,070 So, I want to be clear that this is the model that we're using. 98 00:07:19,070 --> 00:07:22,550 This is how we're assuming the world works. 99 00:07:22,550 --> 00:07:26,610 And there's this very famous quote by George Box that says, 100 00:07:26,610 --> 00:07:30,325 "Essentially, all models are wrong, but some are useful". 101 00:07:30,325 --> 00:07:38,170 So what this means is no model's going to be exactly how the world works. 102 00:07:38,170 --> 00:07:43,100 It's not going to exactly predict how houses sell, just based on square feet, 103 00:07:43,100 --> 00:07:46,400 or even if you incorporated other things as well. 104 00:07:46,400 --> 00:07:50,129 There's always different idiosyncrasies in how the world works. 105 00:07:51,700 --> 00:07:56,550 But models are going to represent some useful abstraction 106 00:07:56,550 --> 00:08:00,600 of the relationship between, for example, square feet and 107 00:08:00,600 --> 00:08:04,070 price, that is useful for a task such as prediction. 108 00:08:05,760 --> 00:08:10,420 Okay, so I want to make it very clear that everything I wrote on the last line 109 00:08:10,420 --> 00:08:13,750 is just our belief about how the world is going to work.
110 00:08:13,750 --> 00:08:17,800 Or maybe it's not even our belief, maybe it's just something we're going to use 111 00:08:17,800 --> 00:08:21,416 because it's useful, as George Box said, or can be useful. 112 00:08:21,416 --> 00:08:25,487 And we're going to talk a lot about how we assess how useful things are in this 113 00:08:25,487 --> 00:08:29,127 course, but we're going to hold off on that conversation for now. 114 00:08:29,127 --> 00:08:33,209 [MUSIC]