1 00:00:00,000 --> 00:00:04,011 [MUSIC] 2 00:00:04,011 --> 00:00:06,938 Okay, so that's one way to find the maximum or 3 00:00:06,938 --> 00:00:10,850 minimum of a function depending on which scenario we're in. 4 00:00:10,850 --> 00:00:15,450 It's just taking the derivative of the function, setting it equal to zero. 5 00:00:15,450 --> 00:00:18,870 And that solution will be unique assuming we're either in this convex or 6 00:00:18,870 --> 00:00:20,800 concave situation. 7 00:00:22,100 --> 00:00:28,660 But there are other methods for finding the maximum or minimum. 8 00:00:28,660 --> 00:00:32,135 So, if we're looking at these concave situations and 9 00:00:32,135 --> 00:00:37,041 our interest is in finding the max over all w of g(w) one thing we can look at is 10 00:00:37,041 --> 00:00:40,157 something called a hill-climbing algorithm. 11 00:00:40,157 --> 00:00:44,844 Where it's going to be an integrative algorithm where we start 12 00:00:44,844 --> 00:00:48,117 somewhere in this space of possible w's and 13 00:00:48,117 --> 00:00:53,350 then we keep changing w hoping to get closer and closer to the optimum. 14 00:00:54,370 --> 00:01:01,636 Okay, so, let's say that we start w here. 15 00:01:01,636 --> 00:01:06,900 And a question is well should I increase w, 16 00:01:06,900 --> 00:01:11,010 move w to the right or should I decrease w and 17 00:01:11,010 --> 00:01:14,710 move w to the left to get closer to the optimal. 18 00:01:16,610 --> 00:01:17,430 Well, what is the optimal? 19 00:01:17,430 --> 00:01:19,360 The optimal is this point here. 20 00:01:19,360 --> 00:01:21,428 So, if I had this picture in front of me, 21 00:01:21,428 --> 00:01:24,760 I know that I should be moving to the right. 22 00:01:24,760 --> 00:01:26,870 But mathematically what's a way to say that? 23 00:01:29,590 --> 00:01:32,470 What's a way to say that where increasing that we know. 24 00:01:32,470 --> 00:01:33,460 Let me write this down. 25 00:01:36,420 --> 00:01:40,707 How do we know whether 26 00:01:40,707 --> 00:01:45,670 to move w to the right or left 27 00:01:47,500 --> 00:01:51,690 and equivalently meaning increase or 28 00:01:51,690 --> 00:01:56,390 decrease the value of w. 29 00:01:56,390 --> 00:02:00,100 Well, what I can do is I can look at the function at w and 30 00:02:00,100 --> 00:02:06,660 I can take the derivative and if the derivative is positive like it is here, 31 00:02:06,660 --> 00:02:08,730 this is the case where I want to increase w. 32 00:02:08,730 --> 00:02:13,460 But what if on the other hand I had started the algorithm over here 33 00:02:14,820 --> 00:02:17,310 at this value of w? 34 00:02:17,310 --> 00:02:22,900 Would I want to move increasing w or move to the left decreasing w? 35 00:02:22,900 --> 00:02:24,420 Well, in this case, I'd want to decrease w. 36 00:02:24,420 --> 00:02:28,810 And again, if I look at the derivative, I see in this case, 37 00:02:28,810 --> 00:02:29,930 the derivative is negative. 38 00:02:31,050 --> 00:02:36,119 So, we can actually divide the space. 39 00:02:36,119 --> 00:02:40,696 Where on the left of the optimal, we have that 40 00:02:40,696 --> 00:02:46,390 the derivative of g with respect to w is greater than 0. 41 00:02:46,390 --> 00:02:50,136 And these are the cases where we're gonna wanna increase w. 42 00:02:50,136 --> 00:02:52,975 And on the right-hand side of the optimum we have 43 00:02:52,975 --> 00:02:56,170 that the derivative of g with respect to w is negative. 44 00:02:57,650 --> 00:03:01,600 And these are cases where we're gonna wanna decrease w. 45 00:03:01,600 --> 00:03:06,070 And what about if I'm exactly at the optimum, which maybe I'll call w Star? 46 00:03:08,380 --> 00:03:10,200 Do I want to move to the left or the right? 47 00:03:13,320 --> 00:03:14,200 Well, neither. 48 00:03:14,200 --> 00:03:18,930 I want to stay exactly where I am. 49 00:03:18,930 --> 00:03:20,630 That's the maximum. 50 00:03:20,630 --> 00:03:21,870 And what happens in this case? 51 00:03:21,870 --> 00:03:26,450 Well, if I look at the derivative at the optimum, 52 00:03:26,450 --> 00:03:30,630 we know that the derivative is 0. 53 00:03:30,630 --> 00:03:33,110 So, again, the derivative is telling me what I wanna do. 54 00:03:33,110 --> 00:03:36,270 I don't wanna change w at all. 55 00:03:36,270 --> 00:03:39,650 So, this hill climbing algorithm, the way we can write it is, we say. 56 00:03:41,470 --> 00:03:45,040 While our algorithm has not converged. 57 00:03:45,040 --> 00:03:51,910 So, while not converged, if I can spell converged. 58 00:03:54,190 --> 00:04:01,039 I'm gonna take my previous w, where I was at iteration t. 59 00:04:01,039 --> 00:04:03,129 So this is an iteration counter. 60 00:04:05,497 --> 00:04:11,249 And I'm gonna move in the direction 61 00:04:11,249 --> 00:04:16,240 indicated by the derivative. 62 00:04:16,240 --> 00:04:22,690 So, if the derivative of the function is positive, 63 00:04:22,690 --> 00:04:25,860 I'm going to be increasing w, and if the derivative is negative, 64 00:04:25,860 --> 00:04:29,370 I'm going to be decreasing w, and that's exactly what I want to be doing. 65 00:04:29,370 --> 00:04:35,570 But instead of moving exactly the amount specified by the derivative at that point, 66 00:04:35,570 --> 00:04:38,650 we can introduce something, I'll call it ada. 67 00:04:38,650 --> 00:04:41,930 And ada is what's called a step size. 68 00:04:44,490 --> 00:04:48,710 It says when I go, so let me just complete this statement here. 69 00:04:48,710 --> 00:04:51,000 So, it's a little bit more interpretable. 70 00:04:51,000 --> 00:04:54,190 So, when I go to compute my next w value, 71 00:04:54,190 --> 00:04:59,940 I'm gonna take my previous w value and I'm going to move and 72 00:04:59,940 --> 00:05:04,540 amount based on the derivative as determined by the step size. 73 00:05:06,740 --> 00:05:12,060 Okay so let's look at a picture of how this might work. 74 00:05:12,060 --> 00:05:16,550 So, let's say I happen to start on this left hand side at this w value here. 75 00:05:16,550 --> 00:05:19,080 Compute the derivative, and I take a step. 76 00:05:20,950 --> 00:05:22,510 Determined by that step size. 77 00:05:22,510 --> 00:05:25,030 And at this point the derivative is pretty large. 78 00:05:25,030 --> 00:05:26,840 This function's pretty steep. 79 00:05:26,840 --> 00:05:28,390 So, I'm going to be taking a big step. 80 00:05:29,470 --> 00:05:30,940 Then, I compute the derivative. 81 00:05:30,940 --> 00:05:33,170 I'm still taking a fairly big step. 82 00:05:33,170 --> 00:05:35,350 I keep stepping increasing. 83 00:05:35,350 --> 00:05:38,860 What I mean by each of these is I keep increasing w. 84 00:05:41,910 --> 00:05:43,710 Keep taking a step in w. 85 00:05:43,710 --> 00:05:46,358 Going, computing the derivative and 86 00:05:46,358 --> 00:05:51,732 as I get closer to the optimum The size of the derivative has decreased, and so 87 00:05:51,732 --> 00:05:57,541 if I assume that eda is just some constant we'll get back to this in a couple slides. 88 00:05:57,541 --> 00:06:01,490 But if I assume that there is just a fixed step size that I am taking. 89 00:06:01,490 --> 00:06:11,000 Then, I'm gonna be decreasing how much I'm moving. 90 00:06:11,000 --> 00:06:12,690 As I get to the optimal. 91 00:06:12,690 --> 00:06:18,540 And if I end up on the other side of the optimum, then 92 00:06:18,540 --> 00:06:21,760 the derivative there is negative, it's going to push me back towards the optimum. 93 00:06:21,760 --> 00:06:25,530 So, eventually I'm gonna converge to the optimum itself. 94 00:06:26,710 --> 00:06:30,252 And note that if I'm close to the optimum, I'm not gonna jump very far. 95 00:06:30,252 --> 00:06:33,867 Again because when you're close to the optimum, that derivative Is really, 96 00:06:33,867 --> 00:06:34,586 really small. 97 00:06:34,586 --> 00:06:40,347 So this term here is gonna be really small and I'm not gonna be changing w very much. 98 00:06:40,347 --> 00:06:44,399 [MUSIC]