1 00:00:01,200 --> 00:00:04,170 So in these next two videos we'll give our second application of the dynamic 2 00:00:04,170 --> 00:00:05,160 programming paradigm. 3 00:00:05,160 --> 00:00:08,880 It's to a quite well known problem, it's called the knapsack problem. 4 00:00:08,880 --> 00:00:11,660 And we'll show how following the exact same recipe that we used for 5 00:00:11,660 --> 00:00:14,210 computing independent sets in path graphs 6 00:00:14,210 --> 00:00:17,729 leads to the very well known dynamic programming solution to this problem. 7 00:00:19,840 --> 00:00:23,070 So let's jump right into the definition of a knapsack problem. 8 00:00:23,070 --> 00:00:25,950 So the input is given by n items. 9 00:00:27,560 --> 00:00:29,610 So each item comes with a value, 10 00:00:29,610 --> 00:00:33,570 V sub i, the more the better for us, and also a size, W sub i. 11 00:00:33,570 --> 00:00:38,530 We're going to assume that both of these are non-negative, for 12 00:00:38,530 --> 00:00:41,980 the sizes we're going to make additional assumption that their integral. 13 00:00:41,980 --> 00:00:44,263 In addition to these two n numbers, 14 00:00:44,263 --> 00:00:48,296 we are given one more which is called the capacity, capital W, 15 00:00:48,296 --> 00:00:52,119 again we'll assume this is non-negative and an integer. 16 00:00:55,088 --> 00:00:58,740 The role of these integrality assumptions will become clear in due time. 17 00:01:00,270 --> 00:01:01,200 The knapsack problem, 18 00:01:01,200 --> 00:01:06,490 the responsibility of an algorithm is to select a subset of the items. 19 00:01:08,996 --> 00:01:09,820 What do we want? 20 00:01:09,820 --> 00:01:11,453 We want as much value as possible, so 21 00:01:11,453 --> 00:01:14,470 we want to maximize the sum of the values of the items that we select. 22 00:01:17,071 --> 00:01:19,308 So what prevents us from picking everything? 23 00:01:19,308 --> 00:01:23,868 Well, the sum of the sizes of the items that we pick have to total to at 24 00:01:23,868 --> 00:01:25,873 most the capacity capital W. 25 00:01:29,554 --> 00:01:34,053 Now I could tell you some cheesy story about a burglar breaking into a house 26 00:01:34,053 --> 00:01:36,954 with a knapsack with a certain size capital W and 27 00:01:36,954 --> 00:01:40,620 wants to make away with sort of the best loot possible. 28 00:01:40,620 --> 00:01:43,530 But that would really be doing a disservice to this problem which is 29 00:01:43,530 --> 00:01:45,070 actually quite fundamental. 30 00:01:45,070 --> 00:01:46,240 This problem comes up quite a bit, 31 00:01:46,240 --> 00:01:48,620 especially as a subroutine in some larger task. 32 00:01:49,710 --> 00:01:53,060 Really, just whenever you have sort of a budget of a resource that you can use, and 33 00:01:53,060 --> 00:01:55,010 you want to use it in the smartest way possible, 34 00:01:55,010 --> 00:01:56,550 that's basically the knapsack problem. 35 00:01:56,550 --> 00:01:58,918 So you can imagine how it would come up in a lot of contexts. 36 00:02:01,912 --> 00:02:03,993 So let's now execute a recipe for 37 00:02:03,993 --> 00:02:06,977 developing a dynamic programming algorithm. 38 00:02:06,977 --> 00:02:10,577 Remember, the key to any dynamic programming solution is to figure out what 39 00:02:10,577 --> 00:02:12,160 is the right set of subproblems. 40 00:02:12,160 --> 00:02:14,566 And we're going to arrive at the sub problems for 41 00:02:14,566 --> 00:02:18,453 the knapsack problem just as we did for max weight independent sets by doing 42 00:02:18,453 --> 00:02:21,292 a thought experiment about the optimal solution, and 43 00:02:21,292 --> 00:02:25,757 understanding what it has to look like in terms of solutions to smaller subproblems. 44 00:02:28,844 --> 00:02:31,075 The bottom line of this thought experiment, 45 00:02:31,075 --> 00:02:33,020 the deliverable will be a recurrence. 46 00:02:33,020 --> 00:02:35,930 It will be a formula which tells us how the optimal value of 47 00:02:35,930 --> 00:02:39,400 one subproblem depends on the value of smaller subproblems. 48 00:02:42,964 --> 00:02:47,069 So to begin the thought experiment, fix an instance of knapsack and 49 00:02:47,069 --> 00:02:51,040 let capital S denote an optimal solution, a max value solution. 50 00:02:55,980 --> 00:02:59,490 We began our previous thought experiment with a content free statement that 51 00:02:59,490 --> 00:03:04,140 the final vertex of a path is either in the optimal solution or it's not. 52 00:03:04,140 --> 00:03:08,690 Now what is the analog of the right most vertex in this knapsack problem? 53 00:03:08,690 --> 00:03:10,230 Well, unlike a path graph, 54 00:03:10,230 --> 00:03:13,500 there's no intrinsic sequentiality to the items that we're given. 55 00:03:13,500 --> 00:03:15,110 They're just an unordered set. 56 00:03:15,110 --> 00:03:19,090 But it's actually useful to think of the items as sort literally ordered one, 57 00:03:19,090 --> 00:03:21,300 two, three, all the way up to n, and 58 00:03:21,300 --> 00:03:25,030 then the analog of the right most vertex is just the final item. 59 00:03:25,030 --> 00:03:28,770 So the content free statement we're going to work with here is either the last item 60 00:03:28,770 --> 00:03:31,950 belongs to the optimal solution capital S, or it doesn't. 61 00:03:33,170 --> 00:03:35,819 We'll again start with the easy case when it doesn't. 62 00:03:38,228 --> 00:03:42,245 What we argued in the path graph problem was that the max weight independent set in 63 00:03:42,245 --> 00:03:46,261 the analogous case one has to be optimal if we just delete that rightmost edge from 64 00:03:46,261 --> 00:03:46,870 the graph. 65 00:03:46,870 --> 00:03:51,700 So here, the analogous claim is that this set S should still be optimal if we delete 66 00:03:51,700 --> 00:03:54,366 the final item n from the knapsack instance. 67 00:03:57,219 --> 00:04:00,727 The argument is the exact same near trivial contradiction. 68 00:04:00,727 --> 00:04:03,249 If there was a different solution, s star, 69 00:04:03,249 --> 00:04:07,476 amongst the first n minus 1 items with a weight even bigger than that of s, 70 00:04:07,476 --> 00:04:12,047 we could regard this equally well as a superior knapsack feasible solution back 71 00:04:12,047 --> 00:04:16,504 with all n of the items, but that contradicts the purported optimality of s. 72 00:04:19,647 --> 00:04:24,630 So let's go through the slightly trickier case two together using a quiz. 73 00:04:26,800 --> 00:04:29,500 So suppose the optimal knapsack solution does in 74 00:04:29,500 --> 00:04:31,290 fact make use of this final item n. 75 00:04:32,450 --> 00:04:35,550 Now we want to talk about this being somehow composed of an optimal 76 00:04:35,550 --> 00:04:37,410 solution to a smaller subproblem. 77 00:04:37,410 --> 00:04:41,284 So if we're going to delete the last item, then we can't talk about s, 78 00:04:41,284 --> 00:04:43,150 because s has the last item, so 79 00:04:43,150 --> 00:04:47,310 we need to remove the last item from s before we talk about its optimality. 80 00:04:47,310 --> 00:04:49,830 That's analogous to back in the independent set problem, 81 00:04:49,830 --> 00:04:53,110 we removed the right most vertex from the optimal solution before talking about its 82 00:04:53,110 --> 00:04:55,310 optimality in a smaller subproblem. 83 00:04:55,310 --> 00:04:58,490 So the question then is if we take capital S, the optimal solution, 84 00:04:58,490 --> 00:05:02,980 we remove item n, in what sense is the residual solution optimal? 85 00:05:02,980 --> 00:05:07,031 Put differently, for what kind of knapsack instance, if any, 86 00:05:07,031 --> 00:05:09,139 is that an optimal solution for? 87 00:05:17,196 --> 00:05:20,068 All right, so the correct answer is C. 88 00:05:20,068 --> 00:05:23,370 So back in the independent set problem what we said is if we remove the right 89 00:05:23,370 --> 00:05:27,440 most vertex, then what's left is optimal for the residual independent 90 00:05:27,440 --> 00:05:31,410 set problem we get by plucking off the right most two vertices. 91 00:05:31,410 --> 00:05:34,240 Here when we remove the nth item from the optimal solution S, 92 00:05:34,240 --> 00:05:39,705 the claim is what we get is optimal for the knapsack problem involving the first 93 00:05:39,705 --> 00:05:44,870 n-1 items and a residual knapsack capacity of W-w sub n. 94 00:05:44,870 --> 00:05:50,102 So the original knapsack capacity with space reserved, 95 00:05:50,102 --> 00:05:52,940 or deleted, for the nth item. 96 00:05:54,490 --> 00:05:55,750 So before I give you a quick proof, 97 00:05:55,750 --> 00:05:59,730 let me just briefly explain why a couple of the other answers are not correct. 98 00:05:59,730 --> 00:06:02,170 So first of all, answer B, I hope you could rule out quickly. 99 00:06:02,170 --> 00:06:03,580 It just doesn't type check. 100 00:06:03,580 --> 00:06:07,570 So capital W, that's the knapsack capacity, so that's in units of size. 101 00:06:07,570 --> 00:06:10,140 Little v sub n, that's the item's value. 102 00:06:10,140 --> 00:06:11,210 So that's in dollars, so 103 00:06:11,210 --> 00:06:13,610 it doesn't really make sense to talk the difference between those two. 104 00:06:13,610 --> 00:06:16,280 They're just apples and oranges. 105 00:06:16,280 --> 00:06:19,070 Part D, if you were worried about feasibility at any point. 106 00:06:19,070 --> 00:06:21,920 So, if you take capital S and you remove the nth item, what have you done? 107 00:06:21,920 --> 00:06:27,370 You have taken a set of items whose size was at most capital W by feasibility of s, 108 00:06:27,370 --> 00:06:30,080 and you've removed an item with size wn from it. 109 00:06:30,080 --> 00:06:34,570 So, what remains has total size at most capital W minus little w sub n. 110 00:06:34,570 --> 00:06:37,110 So, S minus n wouldn't be feasible for 111 00:06:37,110 --> 00:06:41,200 this reduced to this residual knapsack capacity W- w sub n. 112 00:06:42,510 --> 00:06:47,180 A much more subtle point is part A, that's a very natural one to guess. 113 00:06:47,180 --> 00:06:48,820 That turns out to not be correct. 114 00:06:48,820 --> 00:06:53,695 So it turns out there might be smarter ways of using the first n-1 items, 115 00:06:53,695 --> 00:07:00,120 than S minus item n, if you had a full knapsack capacity of W to work with. 116 00:07:00,120 --> 00:07:01,390 So that's a subtler point. 117 00:07:01,390 --> 00:07:02,710 And it's a good exercise for 118 00:07:02,710 --> 00:07:05,030 you to actually convince yourself that A is wrong. 119 00:07:05,030 --> 00:07:07,930 That there's no reason that when you take out item n from S and 120 00:07:07,930 --> 00:07:11,260 you still keep using the original knapsack capacity that this has to be optimal. 121 00:07:11,260 --> 00:07:12,880 That's not going to be true. 122 00:07:12,880 --> 00:07:14,630 All right, so why is capital C correct? 123 00:07:14,630 --> 00:07:17,624 Well, this is going to have the same spirit of case two of our weighted 124 00:07:17,624 --> 00:07:19,395 independent set thought experiment. 125 00:07:23,131 --> 00:07:24,240 So let me give you the proof. 126 00:07:24,240 --> 00:07:28,130 The proof is going to be the usual contradiction analogous to case 2 of our 127 00:07:28,130 --> 00:07:31,090 argument in the weighted independent set problem. 128 00:07:31,090 --> 00:07:34,200 So suppose there was something better than S with n removed with the residual 129 00:07:34,200 --> 00:07:36,070 capacity, W- wn. 130 00:07:36,070 --> 00:07:42,100 Call that supposedly better solution S*, so what can we do to get a contradiction? 131 00:07:42,100 --> 00:07:45,990 Well, let's just take S* which involves only the first n-1 items. 132 00:07:45,990 --> 00:07:52,210 Let's add item n to it since S* has total size and most W-w sub n and 133 00:07:52,210 --> 00:07:57,380 item n has size W sub n, the result has total size of W so 134 00:07:57,380 --> 00:08:02,830 it's a feasible solution to take S* and extend it by item number n. 135 00:08:02,830 --> 00:08:06,610 And if S* had more value than S with n removed, 136 00:08:06,610 --> 00:08:11,100 then S* with n included has more value than S. 137 00:08:11,100 --> 00:08:13,958 So for example, if S had total value 1,100, 138 00:08:13,958 --> 00:08:18,941 100 of which was coming from the nth item, then S with n removed had value 1000. 139 00:08:18,941 --> 00:08:22,443 If S* was better, it had a value 1,050. 140 00:08:22,443 --> 00:08:27,334 Well then we just put n back in and it has value 1150 which contradicts the purported 141 00:08:27,334 --> 00:08:30,760 optimal value of S* which had total value merely 1100. 142 00:08:30,760 --> 00:08:32,660 So notice what's going on here. 143 00:08:32,660 --> 00:08:38,200 So in taking away W sub n for the knapsack capacity before we look at the residual 144 00:08:38,200 --> 00:08:42,600 problem we're in effect reserving a buffer for item n if we ever need it. 145 00:08:42,600 --> 00:08:46,420 That's how we know we're feasible when we stick n back into the solution S*. 146 00:08:46,420 --> 00:08:49,618 That's analogous to deleting the penultimate vertex of the path, 147 00:08:49,618 --> 00:08:53,208 again as a buffer to ensure feasibility when we include the nth vertex back in 148 00:08:53,208 --> 00:08:54,512 independent set problem. 149 00:08:57,342 --> 00:09:00,075 So what was the point of this whole thought experiment which we've now 150 00:09:00,075 --> 00:09:00,590 completed? 151 00:09:00,590 --> 00:09:03,370 Well, again the point was to say the optimal solution, 152 00:09:03,370 --> 00:09:06,130 whatever it is, it has to have only one of two forms. 153 00:09:06,130 --> 00:09:09,380 We've narrowed the list of candidates down to two possibilities. 154 00:09:09,380 --> 00:09:12,950 Either you just inherit the optimal solution with one less item in the same 155 00:09:12,950 --> 00:09:16,760 knapsack capacity, or you look at the optimal solution with one less item and 156 00:09:16,760 --> 00:09:20,240 less knapsack capacity by W sub n, and you extend that by item n. 157 00:09:20,240 --> 00:09:23,310 But those are the only two possibilities. 158 00:09:23,310 --> 00:09:26,400 So again, if we only knew which of these two cases were true. 159 00:09:26,400 --> 00:09:29,490 If we only knew whether or not item n was in the optimum solution or 160 00:09:29,490 --> 00:09:33,660 not, in some sense we could recursively compute the rest of the solution. 161 00:09:33,660 --> 00:09:37,419 So just as that was enough to get us going with a dynamic programming algorithm for 162 00:09:37,419 --> 00:09:40,017 weighted independent sets, so it goes with knapsack, 163 00:09:40,017 --> 00:09:41,694 as I'll show you on the next video.