Now that we have an understanding of what the fitted line is and how we can use it, let's talk about an algorithm, or really algorithms, for searching over the space of all possible lines that we might use and finding the one that best fits the data.

So in particular, what we're going to be doing is focusing in on this machine learning algorithm, which is the dark gray square shown in this flow chart.

Okay, so recall that our cost was defined as this residual sum of squares, and for any given line, we can compute the cost of that line. So, for example, we showed three different lines and three different residual sums of squares here, but our goal was to minimize over all possible intercepts w0 and slopes w1. But a question is, how are we going to do this? That's the key question that we're looking to address in this part of the module.

Let's formalize this idea a little bit more. So here, what we're showing is our residual sum of squares, and what we see is that it's a function of two variables, w0 and w1. So we can write it generically as some function g of a variable w0 and a variable w1. And what I've done is I've gone ahead and plotted the residual sum of squares versus w0 and w1 for the data set you played with in the first course of this specialization. So here, along this axis is w0, and along this axis is w1. And then we're plotting our residual sum of squares, and that is this blue mesh surface here: our residual sum of squares for any given (w0, w1) pair.

And our objective here is to minimize over all possible combinations of w0 and w1, and our mathematical notation for this minimization over all possible w0, w1 is this notation right here. Okay, so in terms of the picture, what we want to do is, over this entire space of w0 and w1, find the specific value of w0, which we'll call w0 hat, and the value w1 hat, that minimize this residual sum of squares. So this is our objective.
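Written out, that objective is $\min_{w_0, w_1} \sum_{i=1}^{N} \big(y_i - (w_0 + w_1 x_i)\big)^2$, with the minimizers denoted $\hat{w}_0, \hat{w}_1$. To make the cost and its surface concrete, here is a minimal Python sketch (not from the lecture; the data values and names are illustrative stand-ins for the course's house sales data) that computes the residual sum of squares for a candidate line and evaluates it over a grid of (w0, w1) pairs, which is the kind of blue mesh surface described above:

```python
import numpy as np

def residual_sum_of_squares(w0, w1, x, y):
    """Cost g(w0, w1) of the line y = w0 + w1 * x on the data (x, y)."""
    predictions = w0 + w1 * x
    residuals = y - predictions
    return np.sum(residuals ** 2)

# Illustrative data (hypothetical; stands in for the course data set).
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([1.5, 3.1, 4.4, 6.2, 7.4])

# Evaluate the cost over a grid of intercepts (w0) and slopes (w1);
# plotting these values gives the RSS surface over the (w0, w1) plane.
w0_grid = np.linspace(-2.0, 2.0, 50)
w1_grid = np.linspace(0.0, 3.0, 50)
costs = np.array([[residual_sum_of_squares(w0, w1, x, y)
                   for w1 in w1_grid] for w0 in w0_grid])
```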
And switching back to our blue color here, this is an optimization problem, where specifically the optimization objective is to minimize a function, in this case, of two parameters, two different variables.
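The lecture goes on to present its own algorithms for this search, but as a generic sketch of what minimizing a two-parameter function looks like in practice (reusing the residual_sum_of_squares helper and data from the snippet above; scipy is just one illustrative tool, not necessarily what the course uses), we can hand the objective to an off-the-shelf minimizer:

```python
from scipy.optimize import minimize

# Minimize g(w0, w1) = RSS(w0, w1) jointly over both parameters.
result = minimize(
    lambda w: residual_sum_of_squares(w[0], w[1], x, y),
    x0=np.array([0.0, 0.0]))  # arbitrary starting point

w0_hat, w1_hat = result.x
print(f"w0_hat = {w0_hat:.3f}, w1_hat = {w1_hat:.3f}")
```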