Here is the analysis. Step 1 of our algorithm sorts the items in non-increasing order of value per size. Step 2 of our greedy heuristic picks the maximum prefix of items that fit into the knapsack. Let's say that maximum prefix comprises the first k items. Item k+1 is then the first one that does not fit in the knapsack given the previous selections.

So here is the cool part. Remember we added step 3 to our greedy algorithm so that it considered two different candidate solutions and picked the better of them. The first candidate solution was just the outcome of step 2. Therefore, we can definitely say the output of our three-step greedy algorithm is at least as good as the total value of these first k items chosen in step 2. The second candidate considered in step 3 of our greedy algorithm is the maximum-value item all by itself. One such item considered all by itself by the greedy algorithm was item k+1, the item on which we got stuck in step 2. Therefore, whatever the value of the output of our three-step greedy algorithm is, it's definitely at least the value of item k+1 in isolation.

So let's add these two inequalities together. On the left-hand side, what do we get? We get double the value of our three-step greedy algorithm. On the right-hand side, we get the first k items and also item k+1. That is, we get the total value of the first k+1 items.

So now we're in a position to connect this inequality to the fractional greedy solution. Remember, what is the greedy fractional solution? Well, it packs the first k items, and then for the item on which it gets stuck, namely item k+1, it fills up the knapsack fully with a suitable fraction of that item, and the value the algorithm gets from it is prorated according to the fraction that fits. But what we have here on the right-hand side in green is even better than the greedy fractional solution. This is the first k+1 items, 100% of them. The greedy fractional solution has the first k items at 100%, and only some fraction of item k+1. So the value of the fractional greedy solution is no larger than the right-hand side here.
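To make this concrete, here is a minimal Python sketch of the three-step greedy heuristic as just described. The function name greedy_knapsack and the (value, size) tuple representation are illustrative choices, not anything fixed by the lecture, and the sketch assumes every item fits in the knapsack on its own.

```python
def greedy_knapsack(items, capacity):
    """Three-step greedy heuristic for the knapsack problem (a sketch).

    items: list of (value, size) pairs with positive sizes; each item is
    assumed to fit in the knapsack by itself.
    capacity: the knapsack capacity W.
    Returns the chosen items as a list of (value, size) pairs.
    """
    # Step 1: sort in non-increasing order of value per size.
    ranked = sorted(items, key=lambda vs: vs[0] / vs[1], reverse=True)

    # Step 2: pack the maximum prefix that fits, stopping at the first
    # item (item k+1 in the lecture's notation) that would overflow.
    prefix, used = [], 0
    for value, size in ranked:
        if used + size > capacity:
            break
        prefix.append((value, size))
        used += size

    # Step 3: compare against the single maximum-value item and return
    # whichever of the two candidate solutions has more total value.
    best_single = max(items, key=lambda vs: vs[0])
    if sum(v for v, _ in prefix) >= best_single[0]:
        return prefix
    return [best_single]
```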
And the whole point of our thought experiment was to argue that the greedy fractional solution is even better than optimal, that it has value at least as large as every feasible knapsack solution. Divide both sides by two and you'll see that we've proved the theorem: the output of the three-step greedy algorithm is guaranteed to be at least 50% that of an optimal solution. That's cool. We have a non-trivial worst-case performance guarantee for our simple and blazingly fast three-step greedy heuristic for the knapsack problem.

But maybe you were hoping to do a little bit better than within 50% of an optimal solution. Maybe in the back of your mind, you were really rooting for something like 90% or even better. There are a few roads we can take that might lead to better performance guarantees. The best-case scenario is if we could just sharpen our analysis of this greedy heuristic; that is, don't change the algorithm, don't make any new assumptions, just find a better proof, and maybe the 50% improves to some other number. That would be the best case. The second thing we could do, if we really love this algorithm, for example because it's so fast, is to try to identify additional assumptions on instances that allow us to prove better performance guarantees than this 50% guarantee, which holds for all instances. The third thing we could do is just come up with a better algorithm that, in fact, has better performance guarantees. What I want to show you on this slide is that the first option, the best-case scenario of just sharpening the analysis of this greedy algorithm, is not feasible. The analysis cannot be sharpened; the 50% can be realized on certain instances. Then we'll tackle the other two approaches, and we will in fact get better performance guarantees, either under extra assumptions about the instances or, alternatively, using a more sophisticated algorithm.

So here's an example that shows that there really are knapsack instances on which this three-step greedy algorithm obtains merely 50% of the value of an optimal solution. In this example, we're going to set the knapsack capacity, capital W, to be 1000. There are going to be three items. Item one will have a value of 502 and a size of 501.
The second and third items are going to be identical: both have value 500 and both have size 500. So what is the greedy algorithm going to do? Well, in step 1, it sorts the items according to the value-per-size ratio, and you'll notice that the first item has a ratio slightly bigger than one, so it's going to be considered first, before items two and three. In step 2, the greedy algorithm packs item one in the knapsack. That leaves only 499 units of residual capacity, not enough room for either item two or item three. So step 2 of the greedy algorithm just outputs the first item by itself. Step 3 of the algorithm considers the maximum-value item in isolation; that, again, is item one. For this reason, the greedy algorithm will output the solution consisting of just item one, for a value of 502. And I'm sure you've noticed that there's a better solution: skip item one and instead choose items two and three. They both fit in the knapsack, they fill it fully, for a combined value of 1000, which is essentially twice as big as the value of the greedy solution.

So what does this example teach us? Well, it argues that if we want performance guarantees better than 50%, we have one of two options. First, if we really want to keep analyzing our three-step greedy algorithm, we're going to have to make extra assumptions about the instances in order to prove performance guarantees better than 50%, because there are some instances out there where the performance is as bad as 50% of optimal. The second approach we can take is to look at better algorithms, which might have better guarantees on worst-case instances. On the next slide, I'll show you a refined analysis of the three-step greedy algorithm and conditions under which it performs much better than 50%. In a separate sequence of videos, we'll develop a dynamic programming-based heuristic which is guaranteed to have much better performance on all instances.

So there are a few different assumptions you can impose on knapsack instances that will enable you to prove better performance guarantees for greedy heuristics. Let me just show you one that's both simple and also a condition that's met in many knapsack instances that arise in practice. Let's focus on knapsack instances where no item is especially big compared to the knapsack capacity.
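As a quick sanity check, here is the bad instance above run through the hypothetical greedy_knapsack sketch from earlier; the numbers are exactly the ones in the lecture, while the code itself is just illustrative.

```python
# Worst-case instance: capacity W = 1000, item 1 = (value 502, size 501),
# items 2 and 3 = (value 500, size 500) each.
items = [(502, 501), (500, 500), (500, 500)]
capacity = 1000

chosen = greedy_knapsack(items, capacity)  # sketch defined earlier
print(chosen)                              # [(502, 501)]  -- item 1 alone
print(sum(v for v, _ in chosen))           # 502, versus the optimal 1000
```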
Returning to that assumption: for concreteness, let's say that the size of every item is at most 10% of the knapsack capacity, capital W. As you'll see, there's nothing special about the number 10%; I'm just using it to keep the discussion concrete. I claim that without changing our algorithm at all, using exactly the same three-step greedy algorithm (actually, here we can even get away with the original two-step greedy algorithm), we can prove performance guarantees much better than 50%, as long as the instance satisfies this property.

Why is this assumption useful? Well, let's think about step 2 of our greedy algorithm. We've sorted the items in non-increasing order of value per size, and we're packing the items one at a time. Now, at some point we get stuck: there's some item k+1 which, if we tried to put it in the knapsack, would overflow the capacity. Well, by assumption, this item k+1 has size at most 10% of the original knapsack capacity. So if sticking in item k+1 would cause an overflow, that means there's less than 10% of the knapsack capacity currently available. That is, the knapsack is currently 90% full, or more, with the items we've already packed so far. So this is sounding pretty good. Our greedy criterion ensures that whatever fraction of the knapsack we wind up using, we're using it in the most valuable possible way, and we just noticed that this assumption on the instance implies that we use almost all of the knapsack.

To make this intuition precise, we compare the output of our greedy algorithm, even just step 2, to our favorite hypothetical benchmark, namely the greedy fractional solution. What's the difference between these two solutions? Well, they're almost the same; the only difference is that the greedy fractional solution gets to pack that last bit of the knapsack, which we know is at most 10% of the capacity, with a suitable fraction of item k+1. So the value that we're missing in our solution is at most that final 10% of the greedy fractional solution. Now, in the greedy fractional solution, the items are packed in non-increasing order of bang per buck, that is, value per size. So this last 10% of the greedy fractional solution is also the least important; it's making the least valuable use of capacity. So this final, at most 10% of the greedy fractional solution can account for at most 10% of its overall value.
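Here is one way to state that last step a little more formally; this is just a restatement of the lecture's argument, with a general parameter $\alpha$ standing in for the 10% bound. Think of the greedy fractional solution, of total value $F$, as filling capacity positions $x \in [0, W]$ with a value density $\rho(x)$ (value per unit of capacity); since items are packed in non-increasing order of value per size, $\rho$ is non-increasing. If every item has size at most $\alpha W$, step 2 of the greedy algorithm fills at least $(1-\alpha)W$ of the capacity, so the value it misses relative to the fractional solution is at most

$$\int_{(1-\alpha)W}^{W} \rho(x)\,dx \;\le\; \alpha W \cdot \frac{1}{W}\int_{0}^{W} \rho(x)\,dx \;=\; \alpha F,$$

because the average of a non-increasing function over its final $\alpha$-fraction is at most its overall average.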
So whatever we're missing in the solution from our greedy algorithm, it's at most 10% of the overall value of the greedy fractional solution. That is, we're getting at least 90% of the value of the greedy fractional solution. And we know from our thought experiment that the greedy fractional solution is even better than optimal; it's at least as good as any feasible knapsack solution. Therefore, the output of our greedy algorithm, even if we don't use the third step, is at least 90% of the value of an optimal solution. All of this reasoning holds equally well with a number different from 10%. For example, if you only knew that every size was at most 20% of the knapsack capacity, then you'd get that the greedy solution is at least 80% of optimal. On the other hand, if you knew that every item had size at most 1% of the knapsack capacity, then just the two-step greedy algorithm would get you to within 99% of optimal.
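As a small, self-contained illustration of this guarantee (again a sketch, with function names of my own choosing): the snippet below builds a random instance in which every size is at most 10% of the capacity and checks that the two-step greedy value is at least 90% of the greedy fractional value, which is itself an upper bound on the optimal value.

```python
import random

def by_value_per_size(items):
    # Non-increasing order of value per size (step 1 of the heuristic).
    return sorted(items, key=lambda vs: vs[0] / vs[1], reverse=True)

def two_step_greedy_value(items, capacity):
    """Total value of the prefix packed by steps 1 and 2."""
    total, used = 0, 0
    for value, size in by_value_per_size(items):
        if used + size > capacity:
            break
        total += value
        used += size
    return total

def fractional_value(items, capacity):
    """Value of the greedy fractional solution, an upper bound on OPT."""
    total, used = 0.0, 0
    for value, size in by_value_per_size(items):
        take = min(size, capacity - used)
        total += value * take / size
        used += take
        if used == capacity:
            break
    return total

# Random instance: capacity 1000, every size at most 100 (10% of capacity).
random.seed(0)
capacity = 1000
items = [(random.randint(1, 200), random.randint(1, 100)) for _ in range(50)]

print(two_step_greedy_value(items, capacity) >= 0.9 * fractional_value(items, capacity))
# Expected: True, as guaranteed by the 90% argument above.
```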