Okay, now that we're working with a simple linear regression model, let's talk about how we're gonna fit a line to data. But before we talk about specific algorithms for fitting, we need to talk about how we're gonna measure the quality of a fit. So we're gonna talk about this orange box here, which is the quality metric.

Now that we've mentioned that our function is parametrized in terms of some parameters we're calling w, representing w0 and w1 in this case, our intercept and our slope, we know that when we go to predict house values, instead of talking about f hat, our estimated function, we can talk in terms of w hat, our estimated parameters. Because those estimated parameters fully determine our estimated function. So we're gonna modify this block diagram, replace f hat with w hat, and talk about estimating these parameters w.

So, what's the cost of using a specific line? Well, the one we're gonna talk about here, and the one that we focused on in the first course of the specialization, is residual sum of squares. What residual sum of squares does is add up the errors we made between this line here, which represents what we've estimated the relationship between x and y to be, and what the actual observation y was.

So we're gonna take each one of these errors, or residuals, and, sorry, I should be clear: I talked about error as the epsilon i. The error was part of my model. A residual is the difference between a prediction and an actual value. I wanna make sure that distinction is clear, because that's why this is called residual sum of squares.

This is the formula that we presented in the first course of the specialization, so I'll run through it fairly quickly. Our residual sum of squares is gonna be a function of our two parameters, w0 and w1, which determine what line we're looking at. So of course, as I change that line, I'm changing this cost term, RSS. And what I'm doing is adding up the difference between the sale price of a given house and what my line specifies the price to be. And what does the line specify the price to be? Well, it specifies it as w0 + w1 times however many square feet that house has.
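As a quick sketch of that prediction in code (the names predict and sqft are just for illustration, not from the course):

```python
def predict(w0, w1, sqft):
    """Price the line defined by (w0, w1) predicts for a house of the given size."""
    return w0 + w1 * sqft

# e.g., the example intercept and slope used later in this lecture:
print(predict(0.97, 0.85, 2000))  # 0.97 + 0.85 * 2000 = 1700.97
```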
But I'm not just looking at the difference between the actual price and that prediction, nor its absolute value. I'm looking at the square of the difference. That's where the "squares" comes in, and the "sum" is because I'm adding up this squared error over all houses in my training data set.

Okay, so just to summarize: for residual sum of squares, I take the difference between what the line tells me the price of the house should be and what the actual house price was, square that difference, and add it up over every house in my training data set.

So we saw this equation before, but now I'm gonna write it more compactly, and we're gonna work with this form throughout the rest of this module, and with forms like it in the rest of the course, where I've introduced this notation, this capital Sigma. What this means is, if I write Sigma with i = 1 on the bottom of this Greek letter and a capital N at the top, I'm saying that I'm summing over some quantity. Generically, for some quantity ai,

Σ_{i=1}^{N} ai = a1 + a2 + ... + aN.

So I'm summing up N different quantities. And in this case here, what is ai? Well, ai is just this inner term, the quantity squared: (yi − (w0 + w1·xi))². So, written compactly,

RSS(w0, w1) = Σ_{i=1}^{N} (yi − (w0 + w1·xi))².

Okay, so this is just shorthand notation for what we had on the previous slide, where we're summing over all houses in the training data set. But instead of writing this thing out in English, or writing out this really massive sum over thousands and thousands of houses, I'm gonna write it compactly like this.

Okay, so now that we have this notation, let's talk about how we think about finding the best line. We have this function such that, if you give me any w0 and w1, it defines a specific cost. So, for example, take this line: let's say the intercept is 0.97 and the slope is 0.85. Well, that results in some RSS; let's just call it cost #1. And if I give you a different line, specified by a different intercept and a different slope, that's gonna result in a different cost. So I'll just say that's some other number.
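To make the compact formula concrete, here's a minimal NumPy sketch that evaluates RSS for a given line. The function name rss and the toy house data are assumptions made up for this example, not numbers from the course:

```python
import numpy as np

def rss(w0, w1, x, y):
    """Residual sum of squares: sum_i (y_i - (w0 + w1 * x_i))^2."""
    residuals = y - (w0 + w1 * x)   # actual price minus predicted price, per house
    return np.sum(residuals ** 2)

# Toy training data: square feet and sale prices (made-up numbers).
x = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y = np.array([300.0, 420.0, 550.0, 670.0])   # e.g., prices in $1000s

# Two different lines give two different costs:
print(rss(0.97, 0.85, x, y))   # cost of the first example line
print(rss(50.0, 0.20, x, y))   # different intercept and slope, different RSS
```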
And then this other line, with different parameters, has some other number associated with its cost as well.

And my goal here, when I'm talking about estimating a function from data, given a specific model, which in this case is just a simple line, is to search over all possible lines, all w0 and w1, shifting this line up and down and looking at different slopes, and try to find the one that results in the smallest residual sum of squares. So out of these three lines, if those were the whole space of lines I was looking at, and clearly they're not, it's a huge space I'm searching over, I would choose the one with the smallest cost.
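The lecture hasn't given a fitting algorithm yet, but as a brute-force illustration of "search over all w0 and w1 and keep the smallest RSS," here's a sketch of a coarse grid search over candidate lines. The grid ranges and step sizes are arbitrary assumptions, and a real fitting algorithm (a closed-form solution or gradient descent, as covered later in the module) replaces this exhaustive search:

```python
import numpy as np

def rss(w0, w1, x, y):
    """Residual sum of squares for the line w0 + w1 * x."""
    return np.sum((y - (w0 + w1 * x)) ** 2)

# Same toy data as before (made-up numbers).
x = np.array([1000.0, 1500.0, 2000.0, 2500.0])
y = np.array([300.0, 420.0, 550.0, 670.0])

# Try every line on a coarse grid of intercepts and slopes,
# keeping the one with the smallest cost.
best = None
for w0 in np.linspace(-100, 100, 201):
    for w1 in np.linspace(0, 1, 101):
        cost = rss(w0, w1, x, y)
        if best is None or cost < best[0]:
            best = (cost, w0, w1)

print("smallest RSS %.1f at intercept %.1f, slope %.2f" % best)
```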