In this video, we'll develop our second dynamic programming solution for the knapsack problem. Why do we need a second algorithm? Well, it's a crucial building block in our heuristic that achieves an arbitrarily close approximation to the knapsack problem in polynomial time.

Recall that in the knapsack problem, the input is 2n+1 numbers. There are n items, each with a positive value v_i and a positive size w_i. Additionally, we're given a knapsack capacity, capital W. Back in the dynamic programming section of this course, we already devised an algorithm for this problem. That algorithm assumed that the item sizes were integral and also that the knapsack capacity is an integer. The running time of that dynamic programming algorithm was O(n · W): n, the number of items, times capital W, the knapsack capacity. In particular, we get a polynomial-time algorithm for the special case of knapsack where the knapsack capacity is not too big, that is, polynomial in n.

It turns out that this is not the right special case on which to build an algorithm with arbitrarily good approximation in polynomial time. Rather, for that kind of heuristic, what we really want is a polynomial-time solution for the special case where the item values are small, and that is precisely the solution we're going to provide in this video. We will be assuming that the item values are all integers; the sizes and the knapsack capacity can be whatever. The dynamic programming solution that we develop will have running time O(n^2 · Vmax), where Vmax denotes the maximum value of any item.

The running times of the two algorithms are more parallel than the notation might suggest. Superficially, it looks like there's an extra factor of n in the second dynamic programming algorithm, but that's the wrong way to think about it. For the first algorithm, think of W as an upper bound on the largest total size that a feasible solution might possess. For the second algorithm, think of n times Vmax as an upper bound on the largest total value that any feasible solution might possess. In the first algorithm we're focused on size; in the second we're focused on value; and in either case, the running time is the number of items, n, times the maximum quantity we have to worry about.
Now, since all of you are survivors of my dynamic programming boot camp, I don't feel the need to develop this solution in great detail; let me just jump to the relevant subproblems. The subproblems here will be a twist on the ones we used in our first dynamic programming algorithm for knapsack. The part that's exactly the same is that we have an index i, and i specifies which prefix of items we're permitted to use in a given subproblem. Now, in general, in the knapsack problem you're simultaneously trying to make the value large while keeping the total size small. The way we organized things in our first dynamic programming algorithm, the second index of a subproblem specified the residual capacity x that we were allowed to use, and then, while respecting that capacity x, we asked for the largest value we could obtain using just the first i items.

Here, we're going to turn that on its head. We'll have a second parameter x, which is the value we're striving for: we want to get value x or more, and we're going to try to minimize the total size needed, subject to achieving value at least x. So the first index, i, ranges as usual from 0 to n; this corresponds to all the prefixes of items you might be interested in. The second parameter, x, ranges over all of the targets for total value that you might be interested in. Remember, we're assuming in this video that every item value is an integer, so any subset of items has an integral total value and we only need to worry about integer values of x. In addition, we never need to worry about achieving a total value bigger than n, the number of items, times Vmax, the largest value of any item. (In fact, you could replace this upper bound of n times Vmax by the sum of the v_i's if you prefer.)

So, given a choice of i and x, I'm going to use capital S_{i,x} to denote the optimal value of the corresponding subproblem: the minimum total size sufficient to achieve a total value of x or more while being restricted to using just the first i items. If there is in fact no way to reach the target value x using just the first i items, that is, if the sum of the values of these items is already less than x, we just define this quantity to be plus infinity.
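In symbols, this is just a compact restatement of the definition above, using the lecture's notation and nothing more:

$$
S_{i,x} \;=\; \min\Bigl\{\, \textstyle\sum_{j \in T} w_j \;:\; T \subseteq \{1,\dots,i\},\ \textstyle\sum_{j \in T} v_j \ge x \Bigr\},
$$

where the minimum over an empty collection of subsets is taken to be $+\infty$, with $i \in \{0,1,\dots,n\}$ and integer targets $x \in \{0,1,\dots,n \cdot V_{\max}\}$.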
So let's write down the natural recurrence; assume here that i is at least 1. The structure is going to be exactly the same as in our first dynamic programming algorithm for the knapsack problem. Specifically, let's focus on a subproblem with a choice of i and a choice of x. The final item i is either in the optimal solution for that subproblem or it's not. That gives us two cases, and the recurrence just does brute-force search over the two cases. Because we're trying to minimize the total size required to achieve a given value target, here the brute-force search takes the form of a minimum.

What are the two candidates for an optimal solution? Well, the boring one is when you don't even bother to use item i; then, of course, you just inherit an optimal solution, a minimum-size solution using just the first i-1 items that achieves the same value target x. On the other hand, suppose that an optimal solution to the subproblem does use item i. This contributes the size of item i, namely w_i, to the total size of the solution. What can we say about the rest of the items in this optimal solution? Well, because the solution achieves a value target of x, the items other than i must by themselves achieve a value target of x - v_i. And, of course, by the usual cut-and-paste argument, amongst all subsets of the first i-1 items with that property, they had better have the minimum total size. One quick edge case for this second case: if v_i is actually bigger than x, so that we'd have a negative index, we just interpret this quantity as 0; that's because item i alone already meets the value target x.

Let's now wrap things up with the pseudocode of the new dynamic programming algorithm. A will be our usual table. It has two dimensions, because subproblems are indexed by two parameters: i, the prefix length, ranging from 0 to n, and x, the value target, ranging from 0 to the maximum imaginable value, namely n times Vmax. The base case is when i equals 0, that is, when you're not allowed to use any items; in this case we fill in the entry with plus infinity, except when x itself is 0, in which case the answer is 0. Now, we just populate the table using the recurrence in a double for loop.
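In the notation above, the recurrence just described (for i at least 1) reads

$$
S_{i,x} \;=\; \min\bigl\{\, S_{i-1,\,x},\;\; w_i + S_{i-1,\,\max(x - v_i,\,0)} \,\bigr\},
$$

and here is a minimal Python sketch of that table-filling step. It is only an illustration of the pseudocode just described, under the stated assumptions: item values are positive integers, and the lists `values` and `sizes` are 0-indexed, so item i of the lecture corresponds to `values[i-1]` and `sizes[i-1]`. The function name `fill_table` is invented for this sketch.

```python
import math

def fill_table(values, sizes):
    """A[i][x] = minimum total size needed to achieve total value >= x
    using only the first i items (math.inf if that's impossible)."""
    n = len(values)
    V = n * max(values)                # largest value target we ever need

    A = [[math.inf] * (V + 1) for _ in range(n + 1)]
    for x in range(V + 1):             # base case i = 0: no items allowed
        A[0][x] = 0 if x == 0 else math.inf

    for i in range(1, n + 1):
        v_i, w_i = values[i - 1], sizes[i - 1]
        for x in range(V + 1):
            # Case 1: skip item i.  Case 2: include item i (size w_i) and
            # cover the residual target max(x - v_i, 0) with items 1..i-1.
            A[i][x] = min(A[i - 1][x], w_i + A[i - 1][max(x - v_i, 0)])
    return A
```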
The structure here is exactly the same as in our first knapsack dynamic programming algorithm. In that first dynamic programming solution, we could return the correct answer in constant time once the table was filled in. That's because one of the subproblems in that algorithm literally was the original problem: when i = n and x equals the full knapsack capacity, capital W, the responsibility of that subproblem was to compute the maximum-value solution subject to the capacity capital W using all of the items, which is literally the original problem. By contrast, in this new dynamic programming algorithm, none of the subproblems corresponds directly to the original problem we wanted to solve; none of them tells you the maximum value of a feasible solution.

To see why, let's inspect the largest batch of subproblems, where i is equal to n and you can use whatever subset of items you want. The second index of one of these subproblems is a target value x; that might be a number like, say, 17,231. After you've run this algorithm and filled in the whole table, what do you learn from this subproblem? You will have computed the smallest total size of a bunch of items that has total value at least 17,231. That's going to be some number, maybe 11,298. But what if your knapsack capacity is only 10,000? Then this is not a feasible solution, so it doesn't do you any good.

Okay, well, that's not quite true, it does do you some good. If you know that every single solution with value 17,231 or more has size bigger than your knapsack capacity, then you know that the optimal value has to be less than 17,231; there is simply no feasible way to get a total value that high. Now you realize that if you knew this information for every single target value x, and you do once you've filled in the entire table, that's sufficient to figure out the optimal solution, that is, the biggest value of a feasible solution. All you need to do is scan through this final batch of subproblems. You begin with the largest conceivable target value x, and then you get less and less ambitious as you keep finding infeasible solutions.
So you scan from high target values to low target values, and you're looking for the first, that is, the largest, target value x such that there exists a subset of items meeting that target value whose total size is at most your knapsack capacity. That first target value x that can feasibly be met within your knapsack capacity is the value of the optimal solution.

Analyzing the running time is straightforward. How many subproblems are there, or equivalently, how many iterations of the double for loop do we have? Well, there are n outer iterations and n times Vmax inner iterations, so that's n^2 · Vmax iterations overall. Our brute-force search considers only two candidates, so we do only constant work per iteration of the double for loop. In the final step, we do a scan through the largest batch of subproblems, which takes O(n · Vmax) time. So the running time is dominated by the constant work per subproblem, and our overall running time is O(n^2 · Vmax), as promised.
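To make the post-processing step concrete, here is a hedged sketch of the final scan, continuing the hypothetical `fill_table` sketch from earlier (the function name `largest_feasible_value` is likewise invented for illustration):

```python
def largest_feasible_value(A, capacity):
    """Scan the last row of the table from the most ambitious value target
    downward; return the largest x whose minimum total size fits the knapsack."""
    n = len(A) - 1
    V = len(A[0]) - 1
    for x in range(V, -1, -1):         # high targets first, then less ambitious
        if A[n][x] <= capacity:
            return x                   # first feasible target = optimal value
    return 0                           # never reached: target 0 always has size 0


# Tiny worked example: items (v, w) = (3, 4), (2, 3), (4, 2), capacity 6.
# The best feasible subset is items 1 and 3: value 7, total size 6 <= 6.
A = fill_table([3, 2, 4], [4, 3, 2])
print(largest_feasible_value(A, 6))    # prints 7
```

The scan touches each of the n · Vmax + 1 entries in the last row once, matching the O(n · Vmax) term quoted above, and it is dominated by the O(n^2 · Vmax) table-filling work.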