This sequence of videos is about the design and analysis of heuristics: algorithms which are generally guaranteed to be fast, but not guaranteed to be 100% correct. As a case study, we're going to revisit the knapsack problem that we thought about earlier, in the dynamic programming section. Let's briefly review the three strategies for dealing with NP-complete problems; all of them are relevant for the knapsack problem. The first strategy is to identify computationally tractable special cases of the NP-complete problem you're grappling with. If you're lucky, the instances relevant to your application will fall squarely into one of these computationally tractable special cases. Even otherwise, it's useful to have these kinds of subroutines lying around; perhaps with a little bit of extra work, you can reduce your problem instances to one of these computationally tractable special cases. Our dynamic programming solution to the knapsack problem identifies one such special case: namely, knapsack instances where the knapsack capacity is small, for example, comparable to the number of items. The second approach is the design and analysis of heuristics: algorithms which are not guaranteed to be 100% optimal. More to say about those in a second. The third strategy is to insist on a correct algorithm; this being an NP-complete problem, you're then expecting to see exponential running time, but you can still hope for a running time qualitatively better than what you'd achieve with naive brute-force search. Our dynamic programming solution to the knapsack problem can be viewed as a contribution in this direction. Naive brute-force search, enumerating over all possible subsets of items, is going to take time proportional to 2^n, where n is the number of items. The dynamic programming algorithm takes time proportional to n, the number of items, times W, the knapsack capacity. For most instances of interest, the dynamic programming solution, albeit exponential for reasons we've discussed in the past, will be a significant improvement over naive brute-force search.
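As a point of reference, here is a minimal sketch of that O(nW) dynamic programming algorithm, written in a space-saving one-dimensional form and assuming integer sizes; this is an illustration, not the exact pseudocode from the earlier lectures.

```python
def knapsack_dp(values, sizes, W):
    """Exact knapsack via dynamic programming in O(n * W) time, assuming the
    item sizes and the capacity W are non-negative integers."""
    best = [0] * (W + 1)  # best[c] = max value achievable with residual capacity c
    for v, s in zip(values, sizes):
        # scan capacities downward so each item is used at most once
        for c in range(W, s - 1, -1):
            best[c] = max(best[c], best[c - s] + v)
    return best[W]

# A small instance (the same three-item example appears later in this lecture):
print(knapsack_dp([2, 4, 3], [1, 3, 3], 5))  # -> 6
```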
Let's now return to the subject of heuristics, the topic for today. Happily, despite being NP-complete, the knapsack problem admits some quite useful heuristics; we're going to study two of them. First, we'll use the greedy algorithm design paradigm to come up with a pretty good heuristic. Pleasingly, this pretty good greedy heuristic is also blazingly fast. We'll then pull out a different tool, namely dynamic programming, to develop yet another heuristic. It's going to be polynomial time, not as fast as the greedy solution, but its accuracy will be excellent: it will get within a (1 - epsilon) factor of optimal, where epsilon is a parameter we can tune as we wish. Now, there are zillions and zillions of algorithms in the world, and anybody can write one of them down. So if someone proposes an algorithm to you, the onus is on them to convince you that it's a good idea, that it's, in some sense, a good algorithm. Periodically, in these courses, I've been making that argument in two ways. First, I provide you with an argument that the algorithm I propose is correct: it is always guaranteed to solve the problem we're interested in. Second, I argue that it solves the problem using a reasonable amount of resources. A great exhibit A would be, say, Dijkstra's algorithm. We wanted to solve the single-source shortest-path problem with non-negative edge lengths. I gave you a proof that Dijkstra's algorithm always achieves that goal, and I showed you an implementation whose running time is not much more than what it takes to just read the input. Now, of course, this best-case scenario is not going to be realized for NP-complete problems, and in particular for the knapsack problem. You have to give up either on correctness or on running time; you have to make a compromise. A tricky question, then, is how you argue that the compromise you're making is not overly severe, that your approach to an NP-complete problem is a good one. Heuristics relax correctness, and so one would like an argument that says correctness is not relaxed by too much. In the best-case scenario, when you propose a heuristic, you should supply a performance guarantee which says that the solution is almost correct, in some sense, on every single instance, or, failing that, at least on many instances of interest. Now, in practice you do bump into problems that are so hard you wind up resorting to heuristics where you really don't have a good understanding of how or when they work, where you just cross your fingers, run the heuristic, and hope for the best. Local search is probably the preeminent example of this type of heuristic, something like the k-means algorithm for clustering. But for the problem we're going to talk about today, we're going to seek out heuristics that have worst-case performance guarantees. And of course, a broader theme of this discussion, and the reason I'm using knapsack as an example, is that it will show us how our algorithmic toolbox is now rich enough to design not just lots of different exact algorithms, but also lots of useful heuristics for NP-complete problems.

So let me briefly remind you what the knapsack problem is. As input, we are given 2n + 1 numbers: there are n items, each with a positive value and a positive size, and the final number is the knapsack capacity, capital W. The goal is to pack the knapsack with the most valuable set of items. That is, you're seeking a subset S of items with the property that the sum of the sizes of the items in S is bounded above by the knapsack capacity W; subject to that constraint, your chosen subset should maximize the sum of the values. This is a fundamental problem that comes up all the time, in particular as a subroutine. Really, whenever you have a constrained resource that you want to use in the best way possible, that is basically the knapsack problem.
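In symbols, writing v_i for the value and w_i for the size of item i (my notation; the lecture states this verbally), the problem is:

```latex
\max_{S \subseteq \{1, \dots, n\}} \; \sum_{i \in S} v_i
\qquad \text{subject to} \qquad \sum_{i \in S} w_i \le W
```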
So let's turn to heuristics for the knapsack problem. Now that I've told you we're happy with heuristics, that we're willing to relax correctness, this all of a sudden resuscitates the greedy algorithm design paradigm as a feasible approach to the knapsack problem. Because the knapsack problem is NP-complete, we certainly are not expecting to find an exactly correct greedy algorithm, but maybe there's a greedy algorithm which is pretty good, and we expect most greedy algorithms to run extremely quickly.

So let's talk through a potential greedy approach to the knapsack problem. Probably the first idea you'd have would be to consider the items in some order, and when you consider an item, make an irrevocable decision at that point whether or not to include it in your solution. The question then is: in what order should you look at the items? Well, what's our objective? Our objective is to maximize the value of our set. So obviously high-value items are important, and maybe the first idea would be to sort the items from highest value to lowest value. But if you think about that proposal for, say, 30 seconds, you quickly realize it's a little naive; it's not the whole story. A high-value item that fills up the whole knapsack seems not quite as useful as an almost-as-high-value item whose size is close to zero, one that doesn't use up any of the knapsack at all. Remember that each item has two properties, not just its value but also its size, and both of these are important: we prefer items that have a lot of value, but we also prefer items that are small. So I hope many of you are now sensing a strong parallel between the present discussion and the discussion we had in our first serious study of greedy algorithms. That was scheduling jobs on a single machine to minimize the sum of the weighted completion times. In that old scheduling problem, just like in this one, each object had two parameters; back then it was a job length and a job weight, or priority. You had a preference for higher-weight jobs, you wanted them first, and a preference for shorter jobs, you wanted them first. Here again we have items with two parameters, the value and the size; high value is good, low size is good. Back in the scheduling problem, the key idea was to look at the jobs' ratios and schedule in that order. By analogy, here in the knapsack problem, if we want to collapse the two parameters of each item into a single parameter by which to sort the items, the natural first cut is a ratio. Since we prefer high values and low sizes, the sensible thing to look at is the ratio of an item's value divided by its size, and we then consider items from the highest of these ratios to the lowest. One way to think of this is that we consider the items in order of non-increasing bang per buck: we're always trying to get the most value for our money, where money is the amount of knapsack capacity that we're spending. So now that we have our greedy ordering, we just proceed through the items one at a time and pack them into the knapsack in this order. Now, what happens here, and didn't actually trouble us in the scheduling problem, is that at some point we might no longer be able to pack items into the knapsack; the thing might get full. Once that happens, once we reach an item which doesn't fit into the knapsack given the items we've already packed into it, we just stop the greedy algorithm. If you think about it, the way I've written this algorithm, aborting as soon as you fail to pack one item is not really what you would do in practice; you would do something a little bit smarter. If you fail to pack some given item, say item k+1, well, maybe item k+2 has a worse ratio but is much smaller; maybe item k+2 does fit in the knapsack, and then there's no reason not to just take it. So in practice you'd go through the entire list of items and pack whatever fits, skipping whatever doesn't fit. For simplicity, I'm going to analyze the slightly worse heuristic in the slides to follow.
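Here is a minimal sketch of the heuristic as described so far, using the abort-on-first-failure version that will be analyzed; the function name and zero-based indexing are my own, not the lecture's.

```python
def greedy_knapsack(values, sizes, W):
    """Steps 1 and 2 of the greedy heuristic: consider items by decreasing
    value/size ratio and pack until an item fails to fit."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / sizes[i], reverse=True)
    chosen, remaining = [], W
    for i in order:
        if sizes[i] > remaining:
            break  # the smarter in-practice variant would `continue` and keep scanning
        chosen.append(i)
        remaining -= sizes[i]
    return chosen

# The three-item instance discussed next (values 2, 4, 3; sizes 1, 3, 3; capacity 5):
print(greedy_knapsack([2, 4, 3], [1, 3, 3], 5))  # -> [0, 1], total value 6
```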
So, just a real quick example. Let's consider the following three-item instance, where I've taken the liberty of sorting the items by ratio. The first item has value 2 and size 1, for a ratio of 2. The second item has value 4 and size 3, for a ratio of 4/3. And the third item has a ratio of 1, with value and size both equal to 3. Let's say the knapsack capacity is 5. So what does the greedy solution do? It first packs in item number 1; then there's a residual knapsack capacity of 4, so there's room for item number 2, and it packs that. Now there's a residual knapsack capacity of only 1, so there isn't any room for item number 3, and it halts. So the greedy algorithm is just going to output the set {1, 2}. In this particular example, you will notice that this greedy solution, of value 6, is in fact the optimal solution; you can't do better. But let's look at a second example with just two items. In this particular instance, what is the value of the output of the greedy algorithm, and what is the value of an optimal solution? The correct answer is (A). Since the knapsack capacity is 1,000 and the sum of the sizes of the items is 1,001, there's no room for both of them. The greedy algorithm, unfortunately, because the first, tiny item has the better ratio, will pack in item number 1, and that leaves no room for item number 2. So the value of the greedy algorithm's solution is just 2, whereas the optimal solution is, of course, to just take the second item. Yeah, its ratio is worse, but on the other hand it fills up the whole knapsack, and so overall your value is 1,000, which obviously blows away the greedy solution. This quiz shows that, at least for general instances, the greedy knapsack heuristic we've devised so far can be arbitrarily bad; it can be terrible with respect to an optimal solution.

There is, however, a simple fix to address this issue: we're going to add a very simple step three to our greedy heuristic. The new step three is just going to compare two candidate solutions and return whichever one is better, whichever one has the bigger total value. The first candidate is just what we were doing before, the output of step two of the algorithm. The second candidate is whichever item by itself has the maximum value. In other words, this new greedy algorithm just runs the old one, but then it does a final sanity check: it looks at each item individually and says, well, if this item just by itself dominates the solution I've computed thus far, I return this single item by itself instead. There's no question that adding this new step three fixes the problem in the quiz on the last slide. If you add step three, then we will choose the item that has value 1,000 instead of the lousy solution of value 2 that we computed in step two. It may seem that this step three is just kind of a hack, meant to address that specific instance. But remarkably, just adding step three transforms this greedy heuristic into one that has a reasonably good worst-case performance guarantee. Precisely, the value of the solution computed by this three-step greedy algorithm is always at least 50% of the value of an optimal solution. The jargon for this kind of algorithm is a 1/2-approximation algorithm: for every instance, it's guaranteed to compute a solution with value at least 1/2 that of an optimal solution. Also, as you would hope from a greedy heuristic, it runs extremely quickly; basically all we do is sort the items, so the running time is O(n log n). In the next video, we'll provide a proof of this theorem, and, as we'll see, under some additional assumptions about the type of instance, the performance guarantee will in fact be much, much better than 50%.
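Putting the pieces together, here is a sketch of the full three-step heuristic, reusing the greedy_knapsack function from the earlier sketch; again this is my own illustration rather than code from the course, and the exact sizes in the quiz instance are inferred from the discussion above.

```python
def greedy_knapsack_with_step3(values, sizes, W):
    """The three-step greedy heuristic: return the better of (a) the greedy
    packing from steps 1 and 2 and (b) the single most valuable item that
    fits on its own. Its value is always at least 50% of optimal."""
    chosen = greedy_knapsack(values, sizes, W)       # steps 1 and 2
    greedy_value = sum(values[i] for i in chosen)
    fitting = [i for i in range(len(values)) if sizes[i] <= W]
    best_single = max(fitting, key=lambda i: values[i], default=None)
    if best_single is not None and values[best_single] > greedy_value:
        return [best_single]                         # step 3 overrules steps 1 and 2
    return chosen

# The two-item quiz instance (inferred sizes: a tiny item of value 2 and size 1,
# plus an item of value 1,000 that fills the whole knapsack of capacity 1,000):
print(greedy_knapsack_with_step3([2, 1000], [1, 1000], 1000))  # -> [1], value 1,000
```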