So, having finished the preceding modules, I'm feeling pretty confident that I can come in, specify a model, and also specify an algorithm for how to fit that model. In doing that, I get some fitted function, and I know how to use that function to make predictions. So I go, I make predictions about the value of my house. I go to sell my house, and I make money. And I'm happy, right? I did a good job.

Well, maybe, maybe not. Maybe my predictions weren't that good, and so, as a result, the value that I listed my house for was inaccurate. And maybe I end up losing money as a result of that. So what we can think about is a measure of how much we're losing when we make a certain prediction.

So, for example, in the housing application, if we list the house value as too low, then maybe we get low offers, and that's a cost to me relative to having made a better prediction. Or if I list the value as too high, maybe people don't come see the house and I don't get any offers. Or maybe people notice that not many people are showing up to look at the house, and they make me a very low offer. So, again, I'm in the situation of being in a worse financial state having made a poor prediction of the value of my house.

So a question is, how much am I losing compared to having made perfect predictions? Of course, we can never make perfect predictions. The way in which the world works is really complicated, and we can't hope to perfectly model that, as well as the noise that's inherent in the process of any observations we might see. But let's just imagine that we could perfectly predict the value. Then we'd say, in that case, our loss is 0. We're not losing any money because we did perfectly.

So a question is, how do we formalize this notion of how much we're losing? And in machine learning, we do this by defining something called a loss function. And what the loss function specifies is the cost incurred when the true observation is y and I make some other prediction. So, a bit more explicitly, what we're gonna do is estimate our model parameters, and those are w hat. We're gonna use those to form predictions.
So this notation here, f sub w hat, is something we've equivalently written as f hat, but for reasons that we'll see later in this module, this notation is very convenient. And what it is, is our predicted value at some input x. And y is the true value. And this loss function, L, is somehow measuring the difference between these two things.

And there are a couple of ways in which we could define a loss function. Well, there are actually many, many ways, but I'm just gonna go through a couple of examples. And in particular, the examples that I'm gonna go through assume that the cost you incur by making an overestimate, relative to an underestimate, is exactly the same. So there's no difference in listing my house $1,000 too high relative to $1,000 too low. Okay, so we're assuming what's called a symmetric loss function in these examples.

And very common choices include something called absolute error, which just looks at the absolute value of the difference between your true value and your predicted value. And another common choice is something called squared error, where, instead of just looking at the absolute value, you look at the square of that difference. And so that means that you have a very high cost if that difference is large, relative to just absolute error. (A short code sketch of these two losses appears at the end of this section.)

So as we're going through this module, it's useful to keep in the back of your mind this quote by George Box, which says: "Remember that all models are wrong; the practical question is how wrong do they have to be to not be useful." Okay, so we've spent a lot of time defining different models, and now we're gonna have tools to assess the performance of these methods, to think about these questions of whether they can be useful to us in practice.
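To make the two symmetric losses concrete, here is a minimal sketch in Python. It is not part of the lecture, and the function and variable names (absolute_error, squared_error, y_hat) are just illustrative; it only assumes that y is the true value and y_hat is the model's prediction f_w_hat(x).

```python
# Illustrative sketch of the two symmetric loss functions discussed above.
# y is the true observed value; y_hat is the predicted value f_w_hat(x).

def absolute_error(y, y_hat):
    # L(y, f_w_hat(x)) = |y - f_w_hat(x)|
    return abs(y - y_hat)

def squared_error(y, y_hat):
    # L(y, f_w_hat(x)) = (y - f_w_hat(x))^2
    # Penalizes large differences much more heavily than absolute error.
    return (y - y_hat) ** 2

# Hypothetical example: true house value $500,000, predicted $480,000.
y, y_hat = 500_000, 480_000
print(absolute_error(y, y_hat))  # 20000
print(squared_error(y, y_hat))   # 400000000
```

Note that both losses treat a $20,000 overestimate and a $20,000 underestimate identically, which is exactly the symmetry assumption made in the examples above.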