1 00:00:02,380 --> 00:00:06,317 So what are greedy algorithms good for? Well, it turns out they're well suited 2 00:00:06,317 --> 00:00:10,102 for a number of fundamental problems across different domains of computer 3 00:00:10,102 --> 00:00:12,454 science. And to wet your appetite, for the many 4 00:00:12,454 --> 00:00:16,341 examples that we're going to see, I want to begin by discussing the problem of 5 00:00:16,341 --> 00:00:19,983 optimal caching. The punchline of the lecture is going to 6 00:00:19,983 --> 00:00:24,691 be that a natural greedy algorithm in fact minimizes the number of cache misses 7 00:00:24,691 --> 00:00:27,850 over all possible ways of managing a small fast cache. 8 00:00:27,850 --> 00:00:31,457 So what is the caching problem? Well on the one hand, there's going to be 9 00:00:31,457 --> 00:00:35,315 a big, but slow memory which you can think of as holding everything you might 10 00:00:35,315 --> 00:00:38,271 be interested in. And then there's also going to be what we 11 00:00:38,271 --> 00:00:41,929 call a cache and so this is a much smaller memory to which access is much 12 00:00:41,929 --> 00:00:44,459 faster. So this situation comes up all the time 13 00:00:44,459 --> 00:00:48,654 across different doings, computer science, architecture, operating systems, 14 00:00:48,654 --> 00:00:51,324 networking. just to mention a couple of really 15 00:00:51,324 --> 00:00:54,593 obvious examples. You could imagine, the small fast memory 16 00:00:54,593 --> 00:00:58,297 being something like an L2 cache. And the big slow memory being main 17 00:00:58,297 --> 00:01:00,913 memory. Or perhaps actually main memory is the 18 00:01:00,913 --> 00:01:03,637 fast memory, and the big slow memory would be disc. 19 00:01:03,637 --> 00:01:07,723 Now, your task, in the caching problem is to process what we're going to call a 20 00:01:07,723 --> 00:01:11,250 sequence of page requests. So a page request just means that the 21 00:01:11,250 --> 00:01:15,638 client wants to access something in memory and it's guaranteed to be in the 22 00:01:15,638 --> 00:01:18,803 big slow memory. But if its not already in the small fast 23 00:01:18,803 --> 00:01:23,191 memory then you got to bring it in, you got to put it in there for the subsequent 24 00:01:23,191 --> 00:01:25,544 access. The algorithmic aspect of the problem 25 00:01:25,544 --> 00:01:29,524 answers the picture when there is a cache miss or also known as a page fault. 26 00:01:29,524 --> 00:01:33,660 That is when there is a request for some data which is not already in the cache. 27 00:01:33,660 --> 00:01:36,658 When that is the case, you have to bring it into the cache. 28 00:01:36,658 --> 00:01:40,690 The design question then is what do you evict from the cache in order to make 29 00:01:40,690 --> 00:01:43,740 room for this new piece of data which you have to bring in. 30 00:01:46,940 --> 00:01:51,775 So to illustrate the issues, let's look at extremely simple example. 31 00:01:51,775 --> 00:01:55,420 A cache that just has four slots for pieces of data. 32 00:01:56,740 --> 00:02:02,023 Let's assume that initially the cache is seated with the four pieces of data I'll 33 00:02:02,023 --> 00:02:05,787 call a, b, c, and d. Now remember the input is a sequence of 34 00:02:05,787 --> 00:02:10,680 page requests, so these are requests for different pieces of data. 35 00:02:10,680 --> 00:02:14,645 Now when a new request comes in, you're basically sitting there crossing your 36 00:02:14,645 --> 00:02:17,764 fingers that what's been requested is already in the cache. 37 00:02:17,764 --> 00:02:21,782 So for example, if the first request comes in for the piece of data marked c, 38 00:02:21,782 --> 00:02:25,853 you said good we're good to go it's already in the cache go ahead and access 39 00:02:25,853 --> 00:02:28,073 it. Similarly if the next request is for d, 40 00:02:28,073 --> 00:02:31,989 you don't have to do anything. Good times roll, and d just gets accessed 41 00:02:31,989 --> 00:02:34,602 directly. The problem arises when something is 42 00:02:34,602 --> 00:02:38,863 requested that's not in the cache. So let's say the next request is for the 43 00:02:38,863 --> 00:02:41,703 data item e. Now remember, you have to bring e into 44 00:02:41,703 --> 00:02:46,419 the cache and of course you have to evict one of these four pieces of data to make 45 00:02:46,419 --> 00:02:49,316 room for it, and your algorithm has to decide which. 46 00:02:49,316 --> 00:02:56,004 Can you get rid of a or b or c or d. For this example, let's assume that we 47 00:02:56,004 --> 00:03:03,084 evict a to make room for e. Assume further that the next request that 48 00:03:03,084 --> 00:03:07,548 comes in is for a new piece of data f. Again it's not in the cache so we have to 49 00:03:07,548 --> 00:03:11,846 evict something to make room for it. Let's assume we get rid of b in order to 50 00:03:11,846 --> 00:03:14,344 bring in f. And now an unhappy situation but 51 00:03:14,344 --> 00:03:18,968 something that could certainly occur is we get a request for something that used 52 00:03:18,968 --> 00:03:21,823 to be in the cache but which we have since evicted. 53 00:03:21,823 --> 00:03:25,306 So for example if the next request is for a, then we're stuck. 54 00:03:25,306 --> 00:03:29,530 It's going to be another page fault, we have to evict something to bring in a. 55 00:03:29,530 --> 00:03:34,041 And similarly, if there's a b, again we're paying the price for evicting b to 56 00:03:34,041 --> 00:03:38,188 make room for f in the past. So in this example with these particular 57 00:03:38,188 --> 00:03:41,044 choices for page evictions, we incur four page faults. 58 00:03:41,044 --> 00:03:45,169 Now the first two the e and the f there's nothing we could have done about it. 59 00:03:45,169 --> 00:03:49,506 We were given a cache that initially did not have e and f and then e and f showed 60 00:03:49,506 --> 00:03:53,050 up, well what are you going to do, you're going to miss no matter what. 61 00:03:53,050 --> 00:03:58,793 However, two were caused by our unfortunate eviction decisions early on, 62 00:03:58,793 --> 00:04:03,890 to evict a and b only to find them requested after eviction. 63 00:04:03,890 --> 00:04:09,481 And with 20/20 hindsight, we can conclude we really should have evicted c and d, 64 00:04:09,481 --> 00:04:13,793 not a and b, to make room for e and f. So the point of this example is to 65 00:04:13,793 --> 00:04:16,551 illustrate first of all the caching problem, how it works. 66 00:04:16,551 --> 00:04:20,373 You have this small fast memory, it can't contain everything at once and so you 67 00:04:20,373 --> 00:04:24,195 have to sort of manage the cache and evict things, to make room for stuff as 68 00:04:24,195 --> 00:04:26,856 it gets accessed. That's the first point of the example. 69 00:04:26,856 --> 00:04:30,872 The second point is to illustrate there's really two types of cache misses or page 70 00:04:30,872 --> 00:04:33,098 faults. There's the ones which you know, really 71 00:04:33,098 --> 00:04:36,388 you can't do anything about. No matter what algorithm you use, you're 72 00:04:36,388 --> 00:04:39,484 going to suffer those faults. But then, depending on the eviction 73 00:04:39,484 --> 00:04:43,403 algorithm, you maybe able to avoid some of the cache misses that you would incur 74 00:04:43,403 --> 00:04:45,376 with an inferior algorithm. Algorithm. 75 00:04:45,376 --> 00:04:48,710 So, now the obvious question is, how well can we do? 76 00:04:48,710 --> 00:04:52,755 What's the best algorithm? How do we minimize the number of cache 77 00:04:52,755 --> 00:04:55,980 misses, suffering only the ones that are inevitable? 78 00:05:00,360 --> 00:05:04,865 So this question was given a very elegant answer by Belady back in the 1960's. 79 00:05:04,865 --> 00:05:07,579 And I'm going to state the answer as a theorem. 80 00:05:07,579 --> 00:05:12,026 it's a theorem we're not going to prove, for reasons I'll discuss in a second. 81 00:05:12,026 --> 00:05:16,705 but what the theorem says is that a natural greedy algorithm is an optimal 82 00:05:16,705 --> 00:05:21,036 algorithm for the caching problem. That is, it minimizes the number of cache 83 00:05:21,036 --> 00:05:24,560 misses over any way you might think about managing the cache. 84 00:05:27,200 --> 00:05:32,054 And the natural greedy algorithm that is optimal, is called the furthest in the 85 00:05:32,054 --> 00:05:36,908 future algorithm. So what is the furthest in the future 86 00:05:36,908 --> 00:05:39,193 algorithm? Well, it's exactly what you think it 87 00:05:39,193 --> 00:05:41,529 would be. It's basically what seems like a good 88 00:05:41,529 --> 00:05:44,908 idea at the moment you have to perform a eviction from the cache. 89 00:05:44,908 --> 00:05:46,945 Basically you want to put off judgment day. 90 00:05:46,945 --> 00:05:50,622 You want to put off the regret of evicting this particular piece of data as 91 00:05:50,622 --> 00:05:53,454 long as possible. When are you going to regret evicting a 92 00:05:53,454 --> 00:05:56,069 piece of data? Well, it's when it gets requested next. 93 00:05:56,069 --> 00:06:00,546 So if we have four things in the cache, you know, one is one is requested next, 94 00:06:00,546 --> 00:06:04,911 one is requested in seven time steps and one is requested in you know, 70 time 95 00:06:04,911 --> 00:06:09,163 steps, that's the one you want to evict now because it will take the longest 96 00:06:09,163 --> 00:06:13,686 until you actually regret that eviction. So for example, in the example on the 97 00:06:13,686 --> 00:06:18,262 previous slide, you can check that the furthest in the future algorithm would in 98 00:06:18,262 --> 00:06:21,065 fact evict the ones you want to evict, a and b, 99 00:06:21,065 --> 00:06:24,320 not the ones that we evicted in the example, c and d. 100 00:06:24,320 --> 00:06:28,812 Now at this point, many of you are probably justifiably scratching your 101 00:06:28,812 --> 00:06:31,597 heads. You're wondering, you know, why is this 102 00:06:31,597 --> 00:06:33,921 useful. It doesn't seem like this is what we 103 00:06:33,921 --> 00:06:37,683 wanted. The objection, to this result being that the furthest in the future 104 00:06:37,683 --> 00:06:41,344 algorithm is clairvoyant. Its very definition assumes that you know the 105 00:06:41,344 --> 00:06:45,310 future, it assumes that at the moment that you have to make an eviction you're 106 00:06:45,310 --> 00:06:49,139 aware of when each of the pieces data in the cache will be requested next. 107 00:06:49,139 --> 00:06:53,596 But if you think for a minute about the motivating applications for sudding the 108 00:06:53,596 --> 00:06:58,288 ultimate caching problem, this assumption simply doesn't hold, you simply do not 109 00:06:58,288 --> 00:07:02,921 know the future, you simply do not know when each of the pieces of data in your 110 00:07:02,921 --> 00:07:07,085 cache will be requested next. So this algorithm is not defined, it is 111 00:07:07,085 --> 00:07:10,369 unimplementable. Despite that, this is still an extremely 112 00:07:10,369 --> 00:07:13,720 useful result to know. Why. 113 00:07:13,720 --> 00:07:17,259 Well, two reasons. First of all, this unimplementable 114 00:07:17,259 --> 00:07:21,770 algorithm can never the less, serve as a guide line for practical. 115 00:07:21,770 --> 00:07:25,295 Implementable algorithms. For example it naturally motivates the 116 00:07:25,295 --> 00:07:27,773 LRU or least recently used caching algorithm. 117 00:07:27,773 --> 00:07:32,070 So what you do in the LRU algorithm is that instead of looking forward in the 118 00:07:32,070 --> 00:07:35,210 future, which you can't do generally, you look in the past. 119 00:07:35,210 --> 00:07:39,441 And you say, well, let me guess that whatever's been requested recently, will 120 00:07:39,441 --> 00:07:42,713 be requested again soon. Whatever hasn't been request been 121 00:07:42,713 --> 00:07:47,000 requested for a long time, will continue to not be requested for a long time. 122 00:07:47,000 --> 00:07:51,513 So, that sets as a proxy for the piece of data that's going to be referenced the 123 00:07:51,513 --> 00:07:56,026 furthest down the future, you look for the one that was most recently referenced 124 00:07:56,026 --> 00:07:59,528 the furthest back in the past. So that's the LRU algorithm. 125 00:07:59,528 --> 00:08:03,526 And as long as data exhibits what's called locality of reference, 126 00:08:03,526 --> 00:08:08,447 meaning whatever's being requested a lot in the recent past is also going to be 127 00:08:08,447 --> 00:08:13,245 what's requested in the near future. Then LRU is going to approximate furthest 128 00:08:13,245 --> 00:08:16,505 in the future. And indeed, LRU is in many applications, 129 00:08:16,505 --> 00:08:20,750 the gold standard amongst practical implementable caching algorithms. 130 00:08:20,750 --> 00:08:26,396 The second reason this theorem is useful in practice is because it served as an 131 00:08:26,396 --> 00:08:30,631 idealized benchmark. A hypothetical perfect scenario against 132 00:08:30,631 --> 00:08:35,220 which you can compare your latest and greatest cashing hereistic. 133 00:08:35,220 --> 00:08:39,432 So for example, maybe you have a caching application and you start by implementing 134 00:08:39,432 --> 00:08:43,388 the LRU least recently used caching algorithm and then as a sanity check you 135 00:08:43,388 --> 00:08:47,498 probably want to go back later once you have hindsight you look at the last few 136 00:08:47,498 --> 00:08:51,300 days of traces of logs of page requests and you say, how well did we do. 137 00:08:51,300 --> 00:08:53,756 Let's look at how well our caching algorithm LRU did. 138 00:08:53,756 --> 00:08:57,001 And let's look at how well we would have done had we known the future. 139 00:08:57,001 --> 00:08:59,086 And hopefully, you're just a few percent away. 140 00:08:59,086 --> 00:09:02,470 And then you can conclude that, yes indeed, the data seem to have locality 141 00:09:02,470 --> 00:09:04,741 reference. Yes indeed, LRU is doing almost as well 142 00:09:04,741 --> 00:09:06,780 as if we know the future, and we can proceed. 143 00:09:06,780 --> 00:09:10,488 On the other hand, if you go back through the last few days of logs, and you find 144 00:09:10,488 --> 00:09:14,243 that your caching algorithm is doing much worse than furthest in the future, then 145 00:09:14,243 --> 00:09:17,487 it's back to the drawing board with respect to your caching algorithm. 146 00:09:17,487 --> 00:09:21,056 You should work harder, understand the data better, and come up with a smarter 147 00:09:21,056 --> 00:09:24,487 heuristic. So for almost all of the greedy 148 00:09:24,487 --> 00:09:28,758 algorithms that we cover in this course, I'm going to explain to you why they are 149 00:09:28,758 --> 00:09:30,761 correct. I'm going to prove it rigorously. 150 00:09:30,761 --> 00:09:34,505 this algorithm is an exception. I'm actually not going to prove this 151 00:09:34,505 --> 00:09:37,616 theorem for you. the way it works is by what's called an 152 00:09:37,616 --> 00:09:40,252 exchange argument. So again, you may not have seen any 153 00:09:40,252 --> 00:09:43,890 examples, but you will soon. but the exchange argument to prove the 154 00:09:43,890 --> 00:09:47,634 latter result, as far as I know it's pretty tricky, believe it or not. 155 00:09:47,634 --> 00:09:51,852 Even though the algorithm is natural, you might think this result feels a little 156 00:09:51,852 --> 00:09:53,961 self-evident. Try to prove it rigorously. 157 00:09:53,961 --> 00:09:55,280 Not easy. Not easy at all. 158 00:09:55,280 --> 00:09:59,146 Indeed if you look at a textbook and say operating systems or a field like that, 159 00:09:59,146 --> 00:10:01,707 generally you'll see a description of this algorithm. 160 00:10:01,707 --> 00:10:04,993 You'll see the claim that it's optimal but you won't find the proof. 161 00:10:04,993 --> 00:10:08,956 some algorithms textbooks, for example Algorithm Design by Kleinberg and Tardos, 162 00:10:08,956 --> 00:10:12,435 do include the proof of this theorem. Those of you that are interested, I 163 00:10:12,435 --> 00:10:16,108 challenge you to prove it yourself without looking it up on the web or in a 164 00:10:16,108 --> 00:10:18,524 textbook. I think if you try to prove it you'll 165 00:10:18,524 --> 00:10:22,487 appreciate the subtleties that come up in understanding whether greedy algorithms 166 00:10:22,487 --> 00:10:25,290 are correct or not. In the best case scenario, I would love. 167 00:10:25,290 --> 00:10:29,673 Love to see, a simple proof of this theorem, something that I could explain 168 00:10:29,673 --> 00:10:32,576 in a class like this, in say five minutes or less, 169 00:10:32,576 --> 00:10:33,880 that would be amazing.