1 00:00:00,032 --> 00:00:04,104 [MUSIC] 2 00:00:04,104 --> 00:00:08,872 But before we get to how to minimize residual sum of squares in particular, 3 00:00:08,872 --> 00:00:12,597 let's talk about optimization in a more abstract sense. 4 00:00:12,597 --> 00:00:16,960 In terms of just a generic function and thinking about how do we minimize it, 5 00:00:16,960 --> 00:00:20,981 what are the properties of the minimum of a function and algorithms for 6 00:00:20,981 --> 00:00:23,170 achieving that minimum? 7 00:00:23,170 --> 00:00:27,994 One of the first important concepts that we can talk about is the notion of convex 8 00:00:27,994 --> 00:00:29,514 and concave functions. 9 00:00:29,514 --> 00:00:34,297 So here, I have a picture of a concave function over 10 00:00:34,297 --> 00:00:37,870 just one variable, we'll call it w. 11 00:00:37,870 --> 00:00:43,530 And this function here, this is g(w). 12 00:00:43,530 --> 00:00:48,973 And this is a concave function, and the way I remember concave versus convex, 13 00:00:48,973 --> 00:00:54,348 is concave it looks like you're in a cave, so it looks like the arch of a cave. 14 00:00:54,348 --> 00:00:57,726 And convex is just the other one. 15 00:00:57,726 --> 00:01:02,658 So, the way we can think about defining what a concave 16 00:01:02,658 --> 00:01:07,054 function is, we can look at any two values of w. 17 00:01:07,054 --> 00:01:09,422 Let's call it a and b. 18 00:01:09,422 --> 00:01:14,083 And look at the value of the function at a and b. 19 00:01:14,083 --> 00:01:16,741 So g(a) and g(b). 20 00:01:16,741 --> 00:01:20,980 And let's draw a line between g(a) and g(b). 21 00:01:20,980 --> 00:01:23,240 So that's this green line here. 22 00:01:23,240 --> 00:01:29,142 And what we see is that this line 23 00:01:29,142 --> 00:01:35,280 lies below g(w) everywhere. 24 00:01:38,200 --> 00:01:43,050 And a concave function is function where for any value of a and any value 25 00:01:43,050 --> 00:01:48,060 of b that you choose, when you go and draw this little line between the points of 26 00:01:48,060 --> 00:01:54,250 the function, it's always gonna lie below the actual curve of the function itself. 27 00:01:54,250 --> 00:01:55,530 A convex function, 28 00:01:55,530 --> 00:02:01,420 on the other hand, let's just draw a little picture of a convex function. 29 00:02:01,420 --> 00:02:05,727 It has exactly the same, but opposite, so not the same. 30 00:02:05,727 --> 00:02:08,250 It has exactly the opposite property. 31 00:02:08,250 --> 00:02:12,520 Where if you choose any a and any b, so 32 00:02:12,520 --> 00:02:17,790 just to be clear this is w, this is g(w). 33 00:02:17,790 --> 00:02:22,882 If I go and connect g(a) with 34 00:02:22,882 --> 00:02:28,178 g(b), with just a line, and 35 00:02:28,178 --> 00:02:35,320 this line is above g(w), everywhere. 36 00:02:41,360 --> 00:02:46,971 Okay, so that's the very intuitive geometric definition of what a concave or 37 00:02:46,971 --> 00:02:48,509 convex function is. 38 00:02:48,509 --> 00:02:52,010 But then of course there are functions that are neither concave nor convex. 39 00:02:52,010 --> 00:02:55,460 So, let me just draw two examples here. 40 00:02:55,460 --> 00:03:00,040 Here's an example of a function that is not concave nor convex. 41 00:03:00,040 --> 00:03:01,150 And how do we see that? 42 00:03:01,150 --> 00:03:06,270 Well, let's look at a point a and 43 00:03:06,270 --> 00:03:12,630 a point b, and draw a line between these points. 44 00:03:12,630 --> 00:03:17,510 And we see that this line lies both below the curve here and above the curve here. 45 00:03:23,790 --> 00:03:29,416 So it doesn't satisfy either the concave or convex criteria. 46 00:03:29,416 --> 00:03:37,540 And likewise this function Has the same property, a and b. 47 00:03:39,470 --> 00:03:45,337 Draw our little line, which okay I didn't draw that well at all. 48 00:03:45,337 --> 00:03:48,260 Let me try and draw a straight line. 49 00:03:48,260 --> 00:03:52,920 But I think you get the point, that neither of these functions are concave, 50 00:03:52,920 --> 00:03:53,890 nor convex. 51 00:03:57,070 --> 00:04:02,420 Okay, so that's the notion of defining what a concave or convex function is. 52 00:04:02,420 --> 00:04:05,195 But we're looking at an optimization objective, 53 00:04:05,195 --> 00:04:08,947 where either our goal is to find the minimum or maximum of a function. 54 00:04:08,947 --> 00:04:13,822 So typically if we're looking at a concave function, our interest is 55 00:04:13,822 --> 00:04:19,400 in finding the maximum, and for convex it's in finding the minimum. 56 00:04:19,400 --> 00:04:24,466 So let's look at, what is the maximum of this concave function? 57 00:04:24,466 --> 00:04:27,386 Well, that's just this point right here. 58 00:04:27,386 --> 00:04:30,916 So that is the maximum. 59 00:04:30,916 --> 00:04:35,776 And [COUGH] here, what I'm saying is 60 00:04:35,776 --> 00:04:40,645 our goal is max over all w, g(w). 61 00:04:40,645 --> 00:04:44,408 In terms of the notation that we introduced a little bit earlier. 62 00:04:44,408 --> 00:04:47,940 And so what's the property of this concave function at this point? 63 00:04:49,170 --> 00:04:54,708 Well what we know is that this is the point where the derivative is 0. 64 00:05:01,875 --> 00:05:06,655 So there's no rate of change of this function at this point. 65 00:05:06,655 --> 00:05:09,770 And what about a convex function? 66 00:05:09,770 --> 00:05:14,090 If we're seeking to find min over all w g(w), 67 00:05:14,090 --> 00:05:19,900 well the minimum is clearly this point right here. 68 00:05:19,900 --> 00:05:21,820 And again, what's the property? 69 00:05:21,820 --> 00:05:24,728 Same thing, derivative = 0. 70 00:05:31,984 --> 00:05:35,372 And note that for the concave and convex functions, 71 00:05:35,372 --> 00:05:38,760 there's only one place where the derivative = 0. 72 00:05:38,760 --> 00:05:42,952 In contrast, when we look at the two examples for functions that were neither 73 00:05:42,952 --> 00:05:46,750 concave nor convex that we showed on the previous slide. 74 00:05:46,750 --> 00:05:50,845 What we see is in this case, for example, 75 00:05:50,845 --> 00:05:58,110 they're multiple locations where you have the derivative equaling 0. 76 00:05:58,110 --> 00:06:04,872 So, let's write that explicitly and 77 00:06:04,872 --> 00:06:09,733 say that there are multiple 78 00:06:09,733 --> 00:06:15,446 solutions to derivative = 0. 79 00:06:15,446 --> 00:06:21,200 In contrast for this function there's actually no solution to derivative = 0. 80 00:06:21,200 --> 00:06:22,900 The derivative is nowhere near 0. 81 00:06:22,900 --> 00:06:28,194 The function continuously has a rate of change. 82 00:06:28,194 --> 00:06:31,607 So, no solution 83 00:06:34,329 --> 00:06:40,230 To derivative = 0. 84 00:06:40,230 --> 00:06:42,110 Okay, so the key take home message here, 85 00:06:42,110 --> 00:06:46,750 is the reason we're talking about concave and convex functions is the fact that when 86 00:06:46,750 --> 00:06:50,950 we have an objective to find the minimum or the maximum. 87 00:06:50,950 --> 00:06:54,070 Minimum of a convex, maximum of a concave. 88 00:06:55,720 --> 00:07:00,247 The solution to that is gonna be unique, and we know that it's gonna exist. 89 00:07:00,247 --> 00:07:04,509 [MUSIC]