Here is the analysis. Step 1 of our algorithm sorts the items in non-increasing order of value per size. Step 2 of our greedy heuristic picks the maximum prefix of items that fit into the knapsack. Let's say that maximum prefix comprises the first k items. Item k+1 is then the first one that does not fit in the knapsack given the previous selections.

So here is the cool part. Remember we added step 3 to our greedy algorithm so that it considered two different candidate solutions and picked the better of them. The first candidate solution was just the outcome of step 2. Therefore, we can definitely say the output of our three-step greedy algorithm is at least as good as the total value of these first k items chosen in step 2. The second candidate considered in step 3 of our greedy algorithm is the maximum-value item all by itself. One such item considered all by itself by the greedy algorithm was item k+1, the item on which we got stuck in step 2. Therefore, whatever the value of the output of our three-step greedy algorithm is, it's definitely at least the value of item k+1 in isolation.

So let's add these two inequalities together. On the left-hand side, what do we get? We get double the value of our three-step greedy algorithm. On the right-hand side, we get the first k items and also item k+1. That is, we get the total value of the first k+1 items.

So now we're in a position to connect this inequality to the fractional greedy solution. Remember, what is the greedy fractional solution? Well, it packs the first k items, and then for the item on which it gets stuck, namely item k+1, it fills up the knapsack fully with a suitable fraction of that item, and the value the algorithm gets from it is prorated according to the fraction that fits. But what we have here on the right-hand side in green is even better than the greedy fractional solution. This is the first k+1 items, 100% of them. The greedy fractional solution has the first k items at 100%, and only some fraction of item k+1. So the value of the fractional greedy solution is no larger than the right-hand side here.
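To make this concrete, here is a minimal Python sketch of the three-step greedy heuristic as just described. The function name greedy_knapsack and the (value, size) tuple representation are illustrative choices, not anything fixed by the lecture, and the sketch assumes every item fits in the knapsack on its own.

```python
def greedy_knapsack(items, capacity):
    """Three-step greedy heuristic for the knapsack problem (a sketch).

    items: list of (value, size) pairs with positive sizes; each item is
    assumed to fit in the knapsack by itself.
    capacity: the knapsack capacity W.
    Returns the chosen items as a list of (value, size) pairs.
    """
    # Step 1: sort in non-increasing order of value per size.
    ranked = sorted(items, key=lambda vs: vs[0] / vs[1], reverse=True)

    # Step 2: pack the maximum prefix that fits, stopping at the first
    # item (item k+1 in the lecture's notation) that would overflow.
    prefix, used = [], 0
    for value, size in ranked:
        if used + size > capacity:
            break
        prefix.append((value, size))
        used += size

    # Step 3: compare against the single maximum-value item and return
    # whichever of the two candidate solutions has more total value.
    best_single = max(items, key=lambda vs: vs[0])
    if sum(v for v, _ in prefix) >= best_single[0]:
        return prefix
    return [best_single]
```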
And the whole point of our thought experiment was to argue that the greedy fractional solution is even better than optimal, that it has value at least as large as every feasible knapsack solution. Divide both sides by two and you'll see that we've proved the theorem: the output of the three-step greedy algorithm is guaranteed to be at least 50% that of an optimal solution. That's cool. We have a non-trivial worst-case performance guarantee for our simple and blazingly fast three-step greedy heuristic for the knapsack problem.

But maybe you were hoping to do a little bit better than within 50% of an optimal solution. Maybe in the back of your mind, you were really rooting for something like 90% or even better. There are a few roads we can take that might lead to better performance guarantees. The best-case scenario is if we could just sharpen our analysis of this greedy heuristic; that is, don't change the algorithm, don't make any new assumptions, just find a better proof, and maybe the 50% improves to some other number. That would be the best case. The second thing we could do, if we really love this algorithm, for example because it's so fast, is to try to identify additional assumptions on instances that allow us to prove better performance guarantees than this 50% guarantee, which holds for all instances. The third thing we could do is just come up with a better algorithm that, in fact, has better performance guarantees. What I want to show you on this slide is that the first option, the best-case scenario of just sharpening the analysis of this greedy algorithm, is not feasible. The analysis cannot be sharpened; the 50% can be realized on certain instances. Then we'll tackle the other two approaches, and we will in fact get better performance guarantees, either under extra assumptions about the instances or, alternatively, using a more sophisticated algorithm.

So here's an example that shows that there really are knapsack instances on which this three-step greedy algorithm obtains merely 50% of the value of an optimal solution. In this example, we're going to set the knapsack capacity, capital W, to be 1000. There are going to be three items. Item one will have a value of 502 and a size of 501.
The second and third items are going to be identical: both have value 500 and both have size 500. So what is the greedy algorithm going to do? Well, in step 1, it sorts the items according to the value-per-size ratio, and you'll notice that the first item has a ratio slightly bigger than one, so it's going to be considered first, before items two and three. In step 2, the greedy algorithm packs item one in the knapsack. That leaves only 499 units of residual capacity, not enough room for either item two or item three. So step 2 of the greedy algorithm just outputs the first item by itself. Step 3 of the algorithm considers the maximum-value item in isolation; that, again, is item one. For this reason, the greedy algorithm will output the solution consisting of just item one, for a value of 502. And I'm sure you've noticed that there's a better solution: skip item one and instead choose items two and three. They both fit in the knapsack, they fill it fully, for a combined value of 1000, which is essentially twice as big as the value of the greedy solution.

So what does this example teach us? Well, it argues that if we want performance guarantees better than 50%, we have one of two options. First, if we really want to keep analyzing our three-step greedy algorithm, we're going to have to make extra assumptions about the instances in order to prove performance guarantees better than 50%, because there are some instances out there where the performance is as bad as 50% of optimal. The second approach we can take is to look at better algorithms, which might have better guarantees on worst-case instances. On the next slide, I'll show you a refined analysis of the three-step greedy algorithm and conditions under which it performs much better than 50%. In a separate sequence of videos, we'll develop a dynamic programming-based heuristic which is guaranteed to have much better performance on all instances.

So there are a few different assumptions you can impose on knapsack instances that will enable you to prove better performance guarantees for greedy heuristics. Let me just show you one that's both simple and also a condition that's met in many knapsack instances that arise in practice. Let's focus on knapsack instances where no item is especially big compared to the knapsack capacity.
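As a quick sanity check, here is the bad instance above run through the hypothetical greedy_knapsack sketch from earlier; the numbers are exactly the ones in the lecture, while the code itself is just illustrative.

```python
# Worst-case instance: capacity W = 1000, item 1 = (value 502, size 501),
# items 2 and 3 = (value 500, size 500) each.
items = [(502, 501), (500, 500), (500, 500)]
capacity = 1000

chosen = greedy_knapsack(items, capacity)  # sketch defined earlier
print(chosen)                              # [(502, 501)]  -- item 1 alone
print(sum(v for v, _ in chosen))           # 502, versus the optimal 1000
```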
Returning to that assumption: for concreteness, let's say that the size of every item is at most 10% of the knapsack capacity, capital W. As you'll see, there's nothing special about the number 10%; I'm just using it to keep the discussion concrete. I claim that without changing our algorithm at all, using exactly the same three-step greedy algorithm (actually, here we can even get away with the original two-step greedy algorithm), we can prove performance guarantees much better than 50%, as long as the instance satisfies this property.

Why is this assumption useful? Well, let's think about step 2 of our greedy algorithm. We've sorted the items in non-increasing order of value per size, and we're packing the items one at a time. Now, at some point we get stuck: there's some item k+1 which, if we tried to put it in the knapsack, would overflow the capacity. Well, by assumption, this item k+1 has size at most 10% of the original knapsack capacity. So if sticking in item k+1 would cause an overflow, that means there's less than 10% of the knapsack capacity currently available. That is, the knapsack is currently 90% full, or more, with the items we've already packed so far. So this is sounding pretty good. Our greedy criterion ensures that whatever fraction of the knapsack we wind up using, we're using it in the most valuable possible way, and we just noticed that this assumption on the instance implies that we use almost all of the knapsack.

To make this intuition precise, we compare the output of our greedy algorithm, even just step 2, to our favorite hypothetical benchmark, namely the greedy fractional solution. What's the difference between these two solutions? Well, they're almost the same; the only difference is that the greedy fractional solution gets to pack that last bit of the knapsack, which we know is at most 10% of the capacity, with a suitable fraction of item k+1. So the value that we're missing in our solution is at most that final 10% of the greedy fractional solution. Now, in the greedy fractional solution, the items are packed in non-increasing order of bang per buck, that is, value per size. So this last 10% of the greedy fractional solution is also the least important; it's making the least valuable use of capacity. So this final, at most 10% of the greedy fractional solution can account for at most 10% of its overall value.
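Here is one way to state that last step a little more formally; this is just a restatement of the lecture's argument, with a general parameter $\alpha$ standing in for the 10% bound. Think of the greedy fractional solution, of total value $F$, as filling capacity positions $x \in [0, W]$ with a value density $\rho(x)$ (value per unit of capacity); since items are packed in non-increasing order of value per size, $\rho$ is non-increasing. If every item has size at most $\alpha W$, step 2 of the greedy algorithm fills at least $(1-\alpha)W$ of the capacity, so the value it misses relative to the fractional solution is at most

$$\int_{(1-\alpha)W}^{W} \rho(x)\,dx \;\le\; \alpha W \cdot \frac{1}{W}\int_{0}^{W} \rho(x)\,dx \;=\; \alpha F,$$

because the average of a non-increasing function over its final $\alpha$-fraction is at most its overall average.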
So whatever we're missing in the solution from our greedy algorithm, it's at most 10% of the overall value of the greedy fractional solution. That is, we're getting at least 90% of the value of the greedy fractional solution. And we know from our thought experiment that the greedy fractional solution is even better than optimal; it's at least as good as any feasible knapsack solution. Therefore, the output of our greedy algorithm, even if we don't use the third step, is at least 90% of the value of an optimal solution. All of this reasoning holds equally well with a number different from 10%. For example, if you only knew that every size was at most 20% of the knapsack capacity, then you'd get that the greedy solution is at least 80% of optimal. On the other hand, if you knew that every item had size at most 1% of the knapsack capacity, then just the two-step greedy algorithm would get you to within 99% of optimal.
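As a small, self-contained illustration of this guarantee (again a sketch, with function names of my own choosing): the snippet below builds a random instance in which every size is at most 10% of the capacity and checks that the two-step greedy value is at least 90% of the greedy fractional value, which is itself an upper bound on the optimal value.

```python
import random

def by_value_per_size(items):
    # Non-increasing order of value per size (step 1 of the heuristic).
    return sorted(items, key=lambda vs: vs[0] / vs[1], reverse=True)

def two_step_greedy_value(items, capacity):
    """Total value of the prefix packed by steps 1 and 2."""
    total, used = 0, 0
    for value, size in by_value_per_size(items):
        if used + size > capacity:
            break
        total += value
        used += size
    return total

def fractional_value(items, capacity):
    """Value of the greedy fractional solution, an upper bound on OPT."""
    total, used = 0.0, 0
    for value, size in by_value_per_size(items):
        take = min(size, capacity - used)
        total += value * take / size
        used += take
        if used == capacity:
            break
    return total

# Random instance: capacity 1000, every size at most 100 (10% of capacity).
random.seed(0)
capacity = 1000
items = [(random.randint(1, 200), random.randint(1, 100)) for _ in range(50)]

print(two_step_greedy_value(items, capacity) >= 0.9 * fractional_value(items, capacity))
# Expected: True, as guaranteed by the 90% argument above.
```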