Let's turn to the analysis of our dynamic programming-based heuristic for the knapsack problem. In particular, let's understand precisely how the running time of this heuristic depends on the desired accuracy epsilon. Let me jog your memory: what are the two steps of our dynamic programming-based heuristic? The first thing we do is massage the item values. We round them so that they are relatively small integers. Precisely, for each item i, we define v hat i to be what you get by dividing the item value by m and then rounding down to the nearest integer. In step 2, we invoke our second dynamic programming solution for the knapsack problem, and we pass to it these new values, the v hats, as well as the original item sizes and the original knapsack capacity.
This algorithm as stated is underdetermined, because I haven't told you how to set the parameter m. We do at least understand qualitatively how m influences the accuracy/running time trade-off. Specifically, the bigger we set m, the more information we lose when we round the original values, the v_i's, down to the v hat i's. More loss of information should translate to less accuracy: larger m, less accuracy. On the other hand, the bigger we take m, the smaller the transformed item values, the v hats. Remember, the running time of our second dynamic programming algorithm is proportional to the largest item value, so smaller item values translate to a faster running time. Summarizing: the bigger the m, the less the accuracy, but the faster the algorithm.
Here's our plan for the analysis. We begin by studying the accuracy constraint. Remember the setup: the client gives us a positive parameter epsilon, and it is our responsibility to output a feasible solution whose total value is at least one minus epsilon times that of an optimal solution. This constraint on our algorithm's accuracy should translate to an upper bound on how large a value of m we can get away with. Remember, as we jack up m, we lose accuracy. So the first question we're going to study is: how large an m can we get away with while still being guaranteed to compute a solution with value at least one minus epsilon times that of an optimal one? Once we solve this problem and understand how large a value of m we can get away with, how aggressive we're permitted to be with the scaling, we'll then evaluate the running time of the resulting algorithm for that particular choice of m.
To answer this first question, how big we can take m subject to an accuracy constraint of one minus epsilon, it's important to understand in detail how the rounded item values, the v hats, compare to the original item values, the v's. That's what this quiz is meant to ask you to think about. Remember how we obtain v hat i from v_i: first we round it down to the nearest multiple of m, then we divide it by m. So we can think of m times v hat i as what we have just after we've rounded down to the nearest multiple of m and before we've actually divided by m. So how did we get to this m times v hat i? We decreased v_i down to the nearest multiple of m, so we decreased it by somewhere between zero and m, and the result is somewhere between v_i minus m and v_i. So let's now assemble a few preliminaries crucial for the accuracy analysis. So, what did we learn from the quiz?
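In symbols, step 1 sets, for a parameter m to be chosen later,

$$\hat{v}_i \;=\; \left\lfloor \frac{v_i}{m} \right\rfloor \quad \text{for each item } i,$$

and the conclusion of the quiz is that

$$v_i - m \;\le\; m\,\hat{v}_i \;\le\; v_i.$$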
In words, we learned that for each item i, the rounded value v hat i scaled up by m, which intuitively is what we have after we've rounded v_i down to the nearest multiple of m but before we've divided by m, lies somewhere between v_i as an upper bound and v_i minus m as a lower bound. Let me extract from this two different inequalities: one which lower-bounds v_i in terms of v hat i, and one which lower-bounds m times v hat i in terms of v_i. These two inequalities are immediate from this fact; let me go ahead and give them names, 1 and 2, and we'll put them to use on the next slide. First, the value v_i is lower bounded by m times the rounded value v hat i. Second, m times v hat i is in turn lower bounded by v_i minus m. We'll call those inequalities 1 and 2.
These first two inequalities are important for us, but they don't have a lot of content; they're immediate from the way we do our rounding in step 1 of our algorithm. But step 2 of our algorithm gives us a third inequality, which is much more powerful. It says: look, in the second step we solve a knapsack instance optimally. It's true we solve it for the wrong values, the v hats, but for those rounded values, the solution we come up with is at least as good as any other. In particular, if we measure value using the rounded values, the v hats, then the solution S output by our heuristic is at least as good as the solution S star that is optimal for the original problem, optimal for the v_i's. That is, if we sum up the v hat i's of the items in our solution S and compare it to the sum of the v hat i's in any other feasible solution, in particular the solution S star optimal for the original instance, we come out ahead: our sum of values is at least as big. That's going to be our inequality number 3.
These three inequalities are pretty much all we've got going for us, but fortunately they are sufficient to answer the question we asked: how large an m can we get away with, subject to guaranteeing an accuracy of one minus epsilon? To see this, let's just follow our nose and see what happens when we chain these three inequalities together. The third one has the most content; it really says we're optimally solving the problem with the transformed values, the v hats. So let's begin by just writing down, yet again, inequality number 3. It says that the sum of the transformed values, the v hats, in our heuristic solution capital S is at least as good as that of any other solution, in particular that of the solution S star optimal for the original problem, for the original item values. Inequality 3 is a fine starting point: fundamentally, what we're trying to do is lower bound the performance of the solution S spit out by our heuristic, and that's what this inequality does. The one problem is that it's justifying the performance of S in terms of the wrong data, in terms of the transformed values, the v hats, whereas what we really care about are the original values, the v's. So we need to take inequality 3 as a seed and grow from it a chain of inequalities, rewriting on both the left-hand side and the right-hand side the transformed values, the v hats, in terms of the original values, the v's; we'll use inequalities 1 and 2 to do that. As we grow the chain to the left, we want quantities that are only getting bigger and bigger.
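To keep track of these, here are the three facts written out symbolically, where S denotes our heuristic's output and S star denotes a solution that is optimal with respect to the original values:

$$v_i \;\ge\; m\,\hat{v}_i \qquad \text{(1), for every item } i,$$

$$m\,\hat{v}_i \;\ge\; v_i - m \qquad \text{(2), for every item } i,$$

$$\sum_{i \in S} \hat{v}_i \;\ge\; \sum_{i \in S^*} \hat{v}_i \qquad \text{(3)}.$$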
As we grow the chain to the right, we want quantities that are only getting smaller and smaller. Hopefully, at the end of the day, when the dust settles, we'll have an approximate optimality result for our heuristic solution S. Now if you look back at inequalities 1 and 2, you'll see that both are in terms of the quantity m times v hat. So it's going to be convenient to multiply both sides of inequality 3 by m; of course, it's still valid if I do that. To extend this chain of inequalities on the left-hand side, we need to tack on a quantity which is only bigger. That is, we need to upper bound m times v hat i in terms of the original value v_i, and inequality 1 says exactly that m times v hat i is never bigger than the original value v_i. So we simply apply inequality 1 to each item in the heuristic solution capital S, and we get an upper bound of the sum, over the items in our solution S, of the original value of each item. That's great: we get a lower bound on the performance of our solution, which is exactly what we want.
To grow the chain of inequalities to the right, we need a quantity which is only smaller. That is, we want to lower bound m times v hat i in terms of some function of the original value v_i. Now, v_i can be bigger than m times v hat i, remember, we rounded down, but inequality 2 says it can only be bigger by at most m. So if we subtract m from v_i, we get a lower bound on m times v hat i, and we just apply that term by term to the optimal solution S star.
So just by following our nose in this way, we get exactly what we want. On the left-hand side we get what we really care about, namely the total value of the solution output by our heuristic, and this is the total value with the original values, mind you, not with the transformed ones. We get a lower bound on that total value, which is exactly what we wanted, in terms of the best-case scenario, the total value of the optimal solution. The only blemish is that we pick up an error of at most m for each item i in the optimal solution S star. So let's lower bound this crudely: the optimal solution S star can have at most n items, and we pick up a loss of at most m for each of those at most n items. So if we subtract an error term of m times n, that gives us a lower bound on the value of our solution S.
Now that the dust has cleared, we see that we have a performance guarantee for our solution S in terms of that of the optimal solution S star: basically, we're off by this error term of m times n. Remember, m is the parameter that governs how aggressively we scale the item values, and it controls the running time/accuracy trade-off. We were expecting this all along: we argued that the bigger you take m, the more information you lose, so the less accuracy you'd expect. And indeed, the amount by which we fall short of the optimal solution grows linearly in this parameter m. The question we're striving to answer, remember, is how big an m we can get away with and still be sure our solution is within a one minus epsilon factor of the optimal one. To achieve this, we just need to make sure the worst-case error in our solution, m*n, is never bigger than an epsilon fraction of the optimal solution's value. That is, we just set m small enough that m*n is at most epsilon times the optimal value.
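Written out, the chain of inequalities just described is (the numbers over the inequality signs indicate which fact justifies each step):

$$\sum_{i \in S} v_i \;\overset{(1)}{\ge}\; m \sum_{i \in S} \hat{v}_i \;\overset{(3)}{\ge}\; m \sum_{i \in S^*} \hat{v}_i \;\overset{(2)}{\ge}\; \sum_{i \in S^*} (v_i - m) \;\ge\; \sum_{i \in S^*} v_i \;-\; m n.$$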
So this inequality is meant to tell us how to set the parameter m in our algorithm, but you'd be right to complain that on the right-hand side there's a quantity we don't actually know. Right? We don't know the value of the optimal solution S star; that, after all, is what we're trying to compute, at least approximately, in the first place. So instead, what we're going to do is use a crude lower bound on the optimal solution's value. Let's make a trivial assumption: each item has size at most the knapsack capacity, that is, each item fits by itself inside the knapsack. Any items which fail that test can obviously be deleted in a preprocessing step. Then the optimal solution's value, if nothing else, is at least the value of the maximum-value item, Vmax. So to satisfy the desired accuracy constraint, all we need to do is set m small enough that m*n is at most epsilon times the optimal value, and of course it's sufficient to set m even smaller, so that m*n is at most the possibly smaller quantity epsilon times Vmax. So in our algorithm we just set m to be the number that equalizes m*n, the left-hand side, with our lower bound on the right-hand side, namely epsilon times Vmax. Notice that these are all parameters known to the algorithm: it of course knows how many items n there are, it knows epsilon, the parameter supplied by the client, and it's easy to compute Vmax, the maximum item value. So the algorithm is in a position to set m so that this equality holds, that is, to set m equal to epsilon times Vmax over n. That, finally, completes the description of the dynamic programming-based heuristic.
We've now completed the answer to the first question that we asked: how large an m can we get away with, subject to the accuracy constraint of one minus epsilon? We can get away with m as large as epsilon times the largest item value divided by n. And now I hope that everybody is holding their breath and crossing their fingers just a little bit, because remember, m not only governs the accuracy, it also governs the running time. The more aggressively we can scale the numbers, the larger we can take m, the faster the algorithm. The question now is: are we permitted to take m large enough that the resulting heuristic's running time is polynomial?
The running time of the heuristic is certainly dominated by step 2, where we invoke our dynamic programming algorithm. The running time there is n squared times the maximum item value passed to that subroutine, that is, the maximum scaled value, the maximum v hat i. So the big question is: how big can the scaled item values, the v hat i's, be? Pick your favorite item i; how big can its transformed value v hat i be? Well, how do we get v hat i? We take the original value v_i, round it down to a multiple of m, and divide by m. So certainly the biggest that v hat i could be is the original value v_i divided by m, which is at most the maximum original value, Vmax, divided by m. Now we plug in our selection for m, namely epsilon times Vmax over n. Very happily, the two Vmax's cancel, leaving us with just n over epsilon. Plugging in n over epsilon as an upper bound on every transformed value passed to the step 2 dynamic programming solution, we get an overall running time of n^3 over epsilon. The running time does degrade with epsilon: the smaller the epsilon, the bigger the running time. Of course, you would expect that with any NP-complete problem.
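To make the two steps concrete, here is a minimal Python sketch of the whole heuristic, under the assumptions above: every item fits in the knapsack by itself (others are discarded in preprocessing), and the exact subroutine is the value-indexed dynamic program whose running time is proportional to n squared times the largest value it is given. The function names and the explicit table-based reconstruction are choices made for this sketch, not part of the lecture.

```python
import math

def knapsack_exact_by_value(values, sizes, capacity):
    """Exact 0/1 knapsack via the value-indexed dynamic program:
    A[i][x] = minimum total size of a subset of the first i items with
    total value exactly x (infinity if no such subset exists).
    Runs in time proportional to n^2 * max(values); returns the indices
    of the chosen items."""
    n = len(values)
    V = sum(values)                      # largest total value any subset can have
    INF = float("inf")
    A = [[INF] * (V + 1) for _ in range(n + 1)]
    A[0][0] = 0
    for i in range(1, n + 1):
        v, s = values[i - 1], sizes[i - 1]
        for x in range(V + 1):
            skip = A[i - 1][x]
            take = A[i - 1][x - v] + s if x >= v else INF
            A[i][x] = min(skip, take)
    # The answer is the largest value x achievable with total size <= capacity.
    best = max(x for x in range(V + 1) if A[n][x] <= capacity)
    # Backtrack to recover which items realize that value.
    chosen, x = [], best
    for i in range(n, 0, -1):
        if A[i][x] != A[i - 1][x]:       # item i-1 must have been included
            chosen.append(i - 1)
            x -= values[i - 1]
    return chosen


def knapsack_heuristic(values, sizes, capacity, eps):
    """Two-step heuristic: round each value down using m = eps * Vmax / n,
    then solve the rounded instance exactly.  The chosen set has total
    (original) value at least (1 - eps) times optimal."""
    # Preprocessing assumed by the analysis: drop items that don't fit alone.
    kept = [i for i in range(len(values)) if sizes[i] <= capacity]
    if not kept:
        return []
    values = [values[i] for i in kept]
    sizes = [sizes[i] for i in kept]
    n = len(values)
    m = eps * max(values) / n                      # the scaling parameter
    v_hat = [math.floor(v / m) for v in values]    # step 1: v_hat_i = floor(v_i / m)
    chosen = knapsack_exact_by_value(v_hat, sizes, capacity)   # step 2: exact DP
    return [kept[i] for i in chosen]               # indices into the original item list
```

The caller can sum the original values of the returned items to get the heuristic's total value. With m set to epsilon times Vmax over n, every rounded value is at most n over epsilon, so the table has at most n^2 over epsilon columns and the sketch runs in time on the order of n^3 over epsilon, matching the bound above.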
As you take epsilon smaller and smaller, you need to take m smaller and smaller to get the accuracy guarantee; that results in less aggressive scaling of the item values, and in turn bigger transformed values get passed to the dynamic programming subroutine, resulting in a bigger running time. Nevertheless, this does show that the full continuum of running time/accuracy trade-offs is possible for the knapsack problem. You want an arbitrarily close approximation for this problem? You've got it.