So what are greedy algorithms good for? Well, it turns out they're well suited to a number of fundamental problems across different domains of computer science. And to whet your appetite for the many examples that we're going to see, I want to begin by discussing the problem of optimal caching. The punchline of the lecture is going to be that a natural greedy algorithm in fact minimizes the number of cache misses over all possible ways of managing a small, fast cache. So what is the caching problem? Well, on the one hand, there's going to be a big but slow memory, which you can think of as holding everything you might be interested in. And then there's also going to be what we call a cache, a much smaller memory to which access is much faster. This situation comes up all the time across different domains of computer science: architecture, operating systems, networking, just to mention a couple of really obvious examples. You could imagine the small fast memory being something like an L2 cache and the big slow memory being main memory. Or perhaps main memory is the fast memory, and the big slow memory is disk. Now, your task in the caching problem is to process what we're going to call a sequence of page requests. A page request just means that the client wants to access something in memory, and it's guaranteed to be in the big slow memory. But if it's not already in the small fast memory, then you've got to bring it in and put it there for the subsequent access. The algorithmic aspect of the problem enters the picture when there is a cache miss, also known as a page fault; that is, when there is a request for some data which is not already in the cache. When that is the case, you have to bring it into the cache. The design question, then, is what do you evict from the cache in order to make room for this new piece of data which you have to bring in? So to illustrate the issues, let's look at an extremely simple example: a cache that has just four slots for pieces of data. Let's assume that initially the cache is seeded with the four pieces of data I'll call a, b, c, and d. Now remember, the input is a sequence of page requests, so these are requests for different pieces of data. When a new request comes in, you're basically sitting there crossing your fingers that whatever's been requested is already in the cache. So for example, if the first request that comes in is for the piece of data marked c, you say, good, we're good to go, it's already in the cache, go ahead and access it. Similarly, if the next request is for d, you don't have to do anything; the good times roll, and d just gets accessed directly. The problem arises when something is requested that's not in the cache. So let's say the next request is for the data item e. Now remember, you have to bring e into the cache, and of course you have to evict one of the four pieces of data to make room for it, and your algorithm has to decide which. Do you get rid of a, or b, or c, or d? For this example, let's assume that we evict a to make room for e. Assume further that the next request that comes in is for a new piece of data, f. Again it's not in the cache, so we have to evict something to make room for it. Let's assume we get rid of b in order to bring in f. And now, an unhappy situation, but something that could certainly occur, is that we get a request for something that used to be in the cache but which we have since evicted. So for example, if the next request is for a, then we're stuck.
It's going to be another page fault; we have to evict something to bring in a. And similarly, if the next request is for b, again we're paying the price for evicting b to make room for f in the past. So in this example, with these particular choices of page evictions, we incur four page faults. Now, for the first two, for e and f, there's nothing we could have done about them: we were given a cache that initially did not have e and f, and then e and f showed up, so what are you going to do, you're going to miss no matter what. However, the other two were caused by our unfortunate eviction decisions early on, evicting a and b only to find them requested again after the eviction. And with 20/20 hindsight, we can conclude we really should have evicted c and d, not a and b, to make room for e and f. So the point of this example is to illustrate, first of all, the caching problem and how it works. You have this small fast memory, it can't contain everything at once, and so you have to manage the cache and evict things to make room for stuff as it gets accessed. That's the first point of the example. The second point is to illustrate that there are really two types of cache misses, or page faults. There are the ones which, really, you can't do anything about: no matter what algorithm you use, you're going to suffer those faults. But then, depending on the eviction algorithm, you may be able to avoid some of the cache misses that you would incur with an inferior algorithm. So now the obvious question is, how well can we do? What's the best algorithm? How do we minimize the number of cache misses, suffering only the ones that are inevitable? This question was given a very elegant answer by Belady back in the 1960s, and I'm going to state the answer as a theorem. It's a theorem we're not going to prove, for reasons I'll discuss in a second, but what the theorem says is that a natural greedy algorithm is an optimal algorithm for the caching problem. That is, it minimizes the number of cache misses over any way you might think about managing the cache. And the natural greedy algorithm that is optimal is called the furthest-in-the-future algorithm. So what is the furthest-in-the-future algorithm? Well, it's exactly what you think it would be. It's basically what seems like a good idea at the moment you have to perform an eviction from the cache. Basically, you want to put off judgment day; you want to put off the regret of evicting this particular piece of data as long as possible. When are you going to regret evicting a piece of data? Well, it's when it gets requested next. So if we have four things in the cache and, you know, one is requested next, another is requested in seven time steps, and another isn't requested for 70 time steps, that last one is the one you want to evict now, because it will take the longest until you actually regret that eviction. So for example, in the example on the previous slide, you can check that the furthest-in-the-future algorithm would in fact evict the ones you want to evict, c and d, not the ones that we evicted in the example, a and b. Now at this point, many of you are probably justifiably scratching your heads. You're wondering, you know, why is this useful? It doesn't seem like this is what we wanted. The objection to this result is that the furthest-in-the-future algorithm is clairvoyant: its very definition assumes that you know the future, that at the moment you have to make an eviction you're aware of when each of the pieces of data in the cache will be requested next.
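To make that clairvoyance concrete, here is a minimal sketch, written as a Python simulator that counts page faults, of what the furthest-in-the-future rule might look like. This is not code from the lecture; the function name and parameters are just illustrative choices, and notice that it only runs at all because the entire request sequence is handed to it up front.

# A minimal, illustrative simulator of the furthest-in-the-future rule.
# It needs the whole request sequence in advance -- exactly the
# clairvoyance at issue.
def furthest_in_future_faults(requests, cache_size, initial=()):
    cache = set(initial)
    faults = 0
    for i, page in enumerate(requests):
        if page in cache:
            continue  # cache hit: nothing to do
        faults += 1  # cache miss / page fault
        if len(cache) >= cache_size:
            # Evict the cached page whose next request is furthest in the
            # future; pages never requested again count as infinitely far.
            def next_use(p):
                for j in range(i + 1, len(requests)):
                    if requests[j] == p:
                        return j
                return float("inf")
            cache.remove(max(cache, key=next_use))
        cache.add(page)
    return faults

# On the example above (cache seeded with a, b, c, d, then requests
# c, d, e, f, a, b), this incurs only the two unavoidable faults for
# e and f, rather than the four we suffered with the unlucky evictions.
print(furthest_in_future_faults(list("cdefab"), 4, initial="abcd"))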
But if you think for a minute about the motivating applications for studying the optimal caching problem, this assumption simply doesn't hold. You simply do not know the future; you simply do not know when each of the pieces of data in your cache will be requested next. So this algorithm is not even well defined in those applications; it is unimplementable. Despite that, this is still an extremely useful result to know. Why? Well, two reasons. First of all, this unimplementable algorithm can nevertheless serve as a guideline for practical, implementable algorithms. For example, it naturally motivates the LRU, or least recently used, caching algorithm. What you do in the LRU algorithm is that instead of looking forward into the future, which you generally can't do, you look into the past. And you say, well, let me guess that whatever's been requested recently will be requested again soon, and whatever hasn't been requested for a long time will continue to not be requested for a long time. So, as a proxy for the piece of data that's going to be referenced furthest in the future, you evict the one whose most recent reference is furthest back in the past. That's the LRU algorithm. And as long as the data exhibits what's called locality of reference, meaning whatever's been requested a lot in the recent past is also going to be what's requested in the near future, then LRU is going to approximate furthest in the future. And indeed, in many applications LRU is the gold standard amongst practical, implementable caching algorithms. The second reason this theorem is useful in practice is because it serves as an idealized benchmark, a hypothetical perfect scenario against which you can compare your latest and greatest caching heuristic. So for example, maybe you have a caching application and you start by implementing the LRU, least recently used, caching algorithm. Then, as a sanity check, you probably want to go back later, once you have hindsight: you look at the last few days of traces, of logs of page requests, and you say, how well did we do? Let's look at how well our caching algorithm, LRU, did, and let's look at how well we would have done had we known the future. And hopefully, you're just a few percent away. Then you can conclude that, yes indeed, the data seem to have locality of reference; yes indeed, LRU is doing almost as well as if we knew the future; and we can proceed. On the other hand, if you go back through the last few days of logs and you find that your caching algorithm is doing much worse than furthest in the future, then it's back to the drawing board with respect to your caching algorithm. You should work harder, understand the data better, and come up with a smarter heuristic. So, for almost all of the greedy algorithms that we cover in this course, I'm going to explain to you why they are correct; I'm going to prove it rigorously. This algorithm is an exception. I'm actually not going to prove this theorem for you. The way the proof works is by what's called an exchange argument. Again, you may not have seen any examples of exchange arguments yet, but you will soon. But the exchange argument needed to prove this particular result is, as far as I know, pretty tricky, believe it or not. Even though the algorithm is natural, and the result might feel a little self-evident, try to prove it rigorously. Not easy. Not easy at all. Indeed, if you look at a textbook in, say, operating systems or a field like that, generally you'll see a description of this algorithm.
You'll see the claim that it's optimal, but you won't find the proof. Some algorithms textbooks, for example Algorithm Design by Kleinberg and Tardos, do include the proof of this theorem. For those of you who are interested, I challenge you to prove it yourself, without looking it up on the web or in a textbook. I think if you try to prove it, you'll appreciate the subtleties that come up in understanding whether greedy algorithms are correct or not. In the best-case scenario, I would love to see a simple proof of this theorem, something that I could explain in a class like this in, say, five minutes or less. That would be amazing.
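As a companion to the earlier sketch, here is a minimal, illustrative version of the LRU heuristic discussed above; again, the names are just placeholders rather than code from the lecture. Unlike furthest-in-the-future, it needs no knowledge of future requests, so it can actually run online, and comparing its fault count against the clairvoyant simulator on a logged trace is exactly the kind of benchmarking described above.

from collections import OrderedDict

# A minimal, illustrative LRU simulator: evict the page whose most recent
# request is furthest back in the past.
def lru_faults(requests, cache_size, initial=()):
    # Keys are kept ordered from least to most recently used.
    cache = OrderedDict((p, True) for p in initial)
    faults = 0
    for page in requests:
        if page in cache:
            cache.move_to_end(page)  # cache hit: just refresh recency
            continue
        faults += 1  # cache miss / page fault
        if len(cache) >= cache_size:
            cache.popitem(last=False)  # evict the least recently used page
        cache[page] = True
    return faults

# On the short example above there is no locality to exploit, so LRU suffers
# the same four faults as the unlucky evictions; on request traces with
# locality of reference, its count should land close to the clairvoyant
# optimum computed by furthest_in_future_faults.
print(lru_faults(list("cdefab"), 4, initial="abcd"))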