1 00:00:00,000 --> 00:00:04,057 So in these next two videos we'll give our second application of the dynamic 2 00:00:04,057 --> 00:00:07,271 programming paradigm. It's to a quite well known problem, it's 3 00:00:07,271 --> 00:00:10,959 called the knapsack problem. And we'll show how following the exact 4 00:00:10,959 --> 00:00:15,175 same recipe that we used for computing independent sets in path graphs leads to 5 00:00:15,175 --> 00:00:19,400 the very well known dynamic programming solution to this problem. 6 00:00:19,400 --> 00:00:24,385 So lets jump right into the definition of a knapsack problem. 7 00:00:24,385 --> 00:00:30,351 So the input is given by N items. So each item comes with a value V sub I, 8 00:00:30,351 --> 00:00:35,200 the more the better for us, and also a size W sub I. 9 00:00:35,200 --> 00:00:38,798 We're going to assume that both of these are non negative. 10 00:00:38,798 --> 00:00:43,265 For the sizes we're going to make an additional assumption that they're 11 00:00:43,265 --> 00:00:46,367 integral. In addition to these two N numbers we're 12 00:00:46,367 --> 00:00:50,896 given one more which is called the capacity capital W, again we'll assume 13 00:00:50,896 --> 00:00:57,646 this is both non negative and an integer. The role of this integrality assumptions 14 00:00:57,646 --> 00:01:03,083 will become clear in due time. The knapsack problem, the responsibility 15 00:01:03,083 --> 00:01:07,200 of an algorithm is, is to select a subset of the items. 16 00:01:08,820 --> 00:01:11,882 What do we want? What we want is much value as possible. 17 00:01:11,882 --> 00:01:15,780 So we want maximize the sum of the values of the items that we select. 18 00:01:16,860 --> 00:01:21,905 So, what prevents us from picking everything, well, the sum of the sizes of 19 00:01:21,905 --> 00:01:27,020 the items that we pick have to total to at most the capacity of capital W. 20 00:01:29,020 --> 00:01:33,039 Now, I could tell you some cheesy story about a burglar breaking into a house 21 00:01:33,039 --> 00:01:36,902 with a knapsack, with a certain size capital W, and wants to make away with 22 00:01:36,902 --> 00:01:40,399 sort of the best loot possible. But, that would really be doing a 23 00:01:40,399 --> 00:01:43,688 disservice to this problem, which is actually quite fundamental. 24 00:01:43,688 --> 00:01:47,499 this problem comes up quite a bit, especially as a sub-routine in some 25 00:01:47,499 --> 00:01:50,213 larger task. Really, just whenever you have sort of a 26 00:01:50,213 --> 00:01:54,233 budget of a resource that you can use, and you want to use it in the smartest 27 00:01:54,233 --> 00:01:56,895 way possible, that's basically the knapsack problem. 28 00:01:56,895 --> 00:02:00,080 So, you can imagine how it would come up in a lot of contexts. 29 00:02:01,640 --> 00:02:05,420 So let's now execute a recipe for developing a dynamic programming 30 00:02:05,420 --> 00:02:07,789 algorithm. Remember the key to any dynamic 31 00:02:07,789 --> 00:02:12,077 programming solution is to figure out what is the right set of sub-problems. 32 00:02:12,077 --> 00:02:16,196 And we're going to arrive at the sub problems for the knapsack problem, just 33 00:02:16,196 --> 00:02:20,653 as we did for max weight independence sets by doing a thought experiment about 34 00:02:20,653 --> 00:02:23,982 the optimal solution, and understanding what it has to look 35 00:02:23,982 --> 00:02:26,860 like, in terms of solutions to smaller sub problems. 36 00:02:28,620 --> 00:02:32,521 The bottom line of this thought experiment, the deliverable, will be a 37 00:02:32,521 --> 00:02:35,517 recurrence. It'll be a formula which tells us how the 38 00:02:35,517 --> 00:02:40,380 optimal value of one subproblem depends on the value of smaller sub problems. 39 00:02:42,720 --> 00:02:48,513 So to begin the thought experiment, fix an instance of knapsack, and let capital 40 00:02:48,513 --> 00:02:52,180 S denote an optimal solution, a max value solution. 41 00:02:55,520 --> 00:02:59,223 We began our previous thought experiment with a content-free statement that the 42 00:02:59,223 --> 00:03:03,095 final vertex of a path is either in the optimal solution or it's not. 43 00:03:03,095 --> 00:03:08,650 Now, what is the analog of the rightmost vertex in this knapsack problem? 44 00:03:08,650 --> 00:03:12,961 Well, unlike a path graph, there's no intrinsic sequentiality to the items that 45 00:03:12,961 --> 00:03:15,393 we're given. They're just an unordered set. 46 00:03:15,393 --> 00:03:19,594 But it's actually useful to think of the items as sort of literally ordered, 47 00:03:19,594 --> 00:03:23,519 one, two, three, all the way up to n. And then, the analog of the rightmost 48 00:03:23,519 --> 00:03:27,167 vertex is just the final item. So the content free statement we're 49 00:03:27,167 --> 00:03:30,925 going to work with here is either the last item belongs to the optimal 50 00:03:30,925 --> 00:03:35,596 solution, capital S, or it doesn't. We'll again start with an easy case when 51 00:03:35,596 --> 00:03:40,280 it doesn't. What we argued in the path graph problem 52 00:03:40,280 --> 00:03:44,507 was that the max weight independent set in the analogous case one has to be 53 00:03:44,507 --> 00:03:47,900 optimal if we just delete that rightmost edge from the graph. 54 00:03:47,900 --> 00:03:52,183 So here the analogous claim is that this set s should still be optimal if we 55 00:03:52,183 --> 00:03:55,020 delete the final item n from the knapsack instance. 56 00:03:56,920 --> 00:04:00,482 The argument is the exact same near trivial contradiction. 57 00:04:00,482 --> 00:04:05,396 If there was a different solution as star amongst the first and minus one items, 58 00:04:05,396 --> 00:04:10,371 with weight even bigger than that of S. What we could regard this equally well as 59 00:04:10,371 --> 00:04:14,487 a superior knapsack feasible solution, back with all N of the items, 60 00:04:14,487 --> 00:04:17,620 but that contradicts the purported optimality of S. 61 00:04:19,440 --> 00:04:26,360 So let's go through the slightly trickier case two together, using a quiz. 62 00:04:26,360 --> 00:04:30,494 So suppose the optimal knapsack solution does, in fact, make use of this final 63 00:04:30,494 --> 00:04:32,856 item, N. Now we want to talk about this being 64 00:04:32,856 --> 00:04:36,346 somehow composed of an optimal solution to a smaller sub problem. 65 00:04:36,346 --> 00:04:39,889 So if we're going to delete the last item, than we can't talk about S, 66 00:04:39,889 --> 00:04:44,238 right?'Cause that's has the last item. So we need to remove the last item from S 67 00:04:44,238 --> 00:04:47,889 before we talk about its optimality. That's analogous to back in the 68 00:04:47,889 --> 00:04:51,915 independence set problem, we removed the right most vertex from the optimal 69 00:04:51,915 --> 00:04:55,673 solution before talking about its optimality in a smaller sub problem. 70 00:04:55,673 --> 00:04:59,217 So the question then is, if we take capital S, the optimal solution. 71 00:04:59,217 --> 00:05:02,751 We remove item N. In what sense is the residual solution 72 00:05:02,751 --> 00:05:07,928 optimal, put differently for what kind of knapsack instance, if any is that an 73 00:05:07,928 --> 00:05:19,770 optimal solution for? Alright so the correct answer is C. 74 00:05:19,770 --> 00:05:24,072 So back in the independent set problem what we said is if we remove the right 75 00:05:24,072 --> 00:05:28,596 nosed vertex then what's left is optimal for the residual independent set problem 76 00:05:28,596 --> 00:05:31,354 again by plucking off the right most two vertices. 77 00:05:31,354 --> 00:05:35,767 Here when we remove the inth item from the optimal solution S, the claim is what 78 00:05:35,767 --> 00:05:38,140 we get is optimal for the nap sack problem. 79 00:05:38,140 --> 00:05:44,805 Involving the first n minus one items and a residual knapsack capacity of capital W 80 00:05:44,805 --> 00:05:49,407 minus little w sub n. So the overall, the original knapsack 81 00:05:49,407 --> 00:05:53,930 capacity with space reserved or deleted for the nth item. 82 00:05:53,930 --> 00:05:57,885 So before I give you a quick proof, let me just briefly explain why a couple of 83 00:05:57,885 --> 00:06:01,741 the other answers are not correct. so first of all, answer B, I hope you can 84 00:06:01,741 --> 00:06:05,497 rule out quickly it just doesn't type check, so capital W that's the knapsack 85 00:06:05,497 --> 00:06:09,052 capacity, so that's in units of size. Eh, little v sevin, that's the items 86 00:06:09,052 --> 00:06:12,757 value, that's in dollars, so it doesn't really make sense to talk about the 87 00:06:12,757 --> 00:06:16,112 difference between those two. They're just, it's apples and oranges. 88 00:06:16,112 --> 00:06:19,818 part D, if you're worried about feasibility at any point, so if you take 89 00:06:19,818 --> 00:06:23,573 capital S and remove the Nth item, what have you done, you've taken a set of 90 00:06:23,573 --> 00:06:27,529 items who's, you know, size was at most capital W by feasibility of S and you've 91 00:06:27,529 --> 00:06:32,652 removed an item with size Wn from it, so what remains has total size of most 92 00:06:32,652 --> 00:06:36,480 capital W minus little w sub n. So, s and minus n would indeed be 93 00:06:36,480 --> 00:06:41,649 feasible for this reduced this residual knapsack capacity capital W minus little 94 00:06:41,649 --> 00:06:44,054 w sub n. A much more subtle point is part A. 95 00:06:44,054 --> 00:06:47,898 That's a very natural one to guess. that turns out to not be correct. 96 00:06:47,898 --> 00:06:51,959 So it turns out there might be smarter ways of using the first N minus one 97 00:06:51,959 --> 00:06:53,529 items. Then S minus item N, 98 00:06:53,529 --> 00:06:57,048 if you had a full knapsack of capacity of capital W to work with. 99 00:06:57,048 --> 00:07:00,892 So that's a subtler point, and it's a good exercise for you to actually 100 00:07:00,892 --> 00:07:04,736 convince yourself that A is wrong, and that there's no reason that when you 101 00:07:04,736 --> 00:07:08,526 take out item N from S, and you still keep using the original knapsack 102 00:07:08,526 --> 00:07:11,882 capacity, that this has to be optimal. That's not going to be true. 103 00:07:11,882 --> 00:07:15,781 Alright, so why is capital C correct? Well this is going to have the same 104 00:07:15,781 --> 00:07:20,320 spirit of case two, of our weighted independence set thought experiment. 105 00:07:22,940 --> 00:07:26,428 So let me give you the proof. The proof is going to be the usual 106 00:07:26,428 --> 00:07:31,002 contradiction analogous to case two of our argument in the weighted independent 107 00:07:31,002 --> 00:07:33,861 set problem. So suppose there was something better 108 00:07:33,861 --> 00:07:38,150 than S with N removed with the residual capacity W minus w sub n, call that 109 00:07:38,150 --> 00:07:42,439 supposedly better solution S star. So what can we do to get a contradiction? 110 00:07:42,439 --> 00:07:46,441 Well let's just take S star, which involves only the first n minus one 111 00:07:46,441 --> 00:07:48,100 items. Let's add item n to it. 112 00:07:48,100 --> 00:07:53,562 Since S star has total size of most capital W minus w should n and item N has 113 00:07:53,562 --> 00:07:57,154 size W sub N. The results has total size at most 114 00:07:57,154 --> 00:08:00,895 capital W. So the feasible solution to take S star 115 00:08:00,895 --> 00:08:06,449 and extend it by item number N. And if S star had more value than S with 116 00:08:06,449 --> 00:08:10,458 n removed, then S star with N included has more value than S. 117 00:08:10,458 --> 00:08:13,173 So, for example, if S had total value, 1,100, 118 00:08:13,173 --> 00:08:15,888 100 of which was coming from the nth item. 119 00:08:15,888 --> 00:08:18,538 Then S would end removed, had value 1,000. 120 00:08:18,538 --> 00:08:21,253 If S star was better, it had a value, 1,050. 121 00:08:21,253 --> 00:08:24,808 Well, then we just put N back in. And it has value 1,150, 122 00:08:24,808 --> 00:08:29,268 which contradicts the purported optimality of S star, which had total 123 00:08:29,268 --> 00:08:32,500 value, merely 1,100. So, notice what's going on here. 124 00:08:32,500 --> 00:08:35,861 So, in taking away W sub N for the knapsack capacity. 125 00:08:35,861 --> 00:08:41,097 Before we look at, the residual problem. We're, in effect, reserving a buffer for 126 00:08:41,097 --> 00:08:44,122 item N, if we ever need it, that's how we know 127 00:08:44,122 --> 00:08:47,363 we're feasible when we stick N back into this solution as star. 128 00:08:47,363 --> 00:08:51,118 That's analogous to deleting the penultimate vertex of the path, again as 129 00:08:51,118 --> 00:08:54,925 a buffer to ensure feasibility when we include the Nth vertex back in the 130 00:08:54,925 --> 00:08:58,995 independent set problem. So what was the point of this whole 131 00:08:58,995 --> 00:09:01,288 thought experiment, which we've now completed? 132 00:09:01,288 --> 00:09:05,112 Well, again, the point was to say, the optimal solution, whatever it is, it has 133 00:09:05,112 --> 00:09:08,578 to have only one of two forms. We've narrowed the list of candidates 134 00:09:08,578 --> 00:09:11,738 down to two possibilities. Either you just inherit the optimal 135 00:09:11,738 --> 00:09:15,663 solution with one less item in the same knapsack capacity, or you look at the 136 00:09:15,663 --> 00:09:19,843 optimal solution with one less item and less knapsack capacity by W sub N and you 137 00:09:19,843 --> 00:09:23,055 extend that by item N. But those are the only two possibilities. 138 00:09:23,055 --> 00:09:26,113 So, again, if we only knew which of these two cases were true, 139 00:09:26,113 --> 00:09:30,242 if we only knew whether or not item N was in the optimal solution or not, in some 140 00:09:30,242 --> 00:09:33,737 sense we could recursively compute the rest of the solution. 141 00:09:33,737 --> 00:09:38,032 So just as that was enough to get us going with a dynamic programming 142 00:09:38,032 --> 00:09:42,500 algorithm for weighted independent sets, so it goes with knapsack, as I'll show 143 00:09:42,500 --> 00:09:43,760 you on the next video.