In this video, we'll develop our second dynamic programming solution for the knapsack problem. Why do we need a second algorithm? Well, it's a crucial building block in our heuristic that achieves an arbitrarily close approximation to the knapsack problem in polynomial time.

Recall that in the knapsack problem, the input is 2n+1 numbers. There are n items, each with a positive value v_i and a positive size w_i. Additionally, we're given a knapsack capacity, capital W. Back in the dynamic programming section of this course, we already devised an algorithm for this problem. That algorithm assumed that the item sizes were integral and also that the knapsack capacity is an integer. The running time of that dynamic programming algorithm was O(n · W): n, the number of items, times capital W, the knapsack capacity. In particular, we get a polynomial-time algorithm for the special case of knapsack where the knapsack capacity is not too big, that is, polynomial in n.

It turns out that this is not the right special case on which to build an algorithm with arbitrarily good approximation in polynomial time. Rather, for that kind of heuristic, what we really want is a polynomial-time solution for the special case where the item values are small, and that is precisely the solution we're going to provide in this video. We will be assuming that the item values are all integers; the sizes and the knapsack capacity can be whatever. The dynamic programming solution that we develop will have running time O(n^2 · Vmax), where Vmax denotes the maximum value of any item.

The running times of the two algorithms are more parallel than the notation might suggest. Superficially, it looks like there's an extra factor of n in the second dynamic programming algorithm, but that's the wrong way to think about it. For the first algorithm, think of W as an upper bound on the largest total size that a feasible solution might possess. For the second algorithm, think of n times Vmax as an upper bound on the largest total value that any feasible solution might possess. In the first algorithm we're focused on size; in the second we're focused on value; and in either case, the running time is the number of items, n, times the maximum quantity we have to worry about.
Now, since all of you are survivors of my dynamic programming boot camp, I don't feel the need to develop this solution in great detail; let me just jump to the relevant subproblems. The subproblems here will be a twist on the ones we used in our first dynamic programming algorithm for knapsack. The part that's exactly the same is that we have an index i, and i specifies which prefix of items we're permitted to use in a given subproblem. Now, in general, in the knapsack problem you're simultaneously trying to make the value large while keeping the total size small. The way we organized things in our first dynamic programming algorithm, the second index of a subproblem specified the residual capacity x that we were allowed to use, and then, while respecting that capacity x, we asked for the largest value we could obtain using just the first i items.

Here, we're going to turn that on its head. We'll have a second parameter x, which is the value we're striving for: we want to get value x or more, and we're going to try to minimize the total size needed, subject to achieving value at least x. So the first index, i, ranges as usual from 0 to n; this corresponds to all the prefixes of items you might be interested in. The second parameter, x, ranges over all of the targets for total value that you might be interested in. Remember, we're assuming in this video that every item value is an integer, so any subset of items has an integral total value and we only need to worry about integer values of x. In addition, we never need to worry about achieving a total value bigger than n, the number of items, times Vmax, the largest value of any item. (In fact, you could replace this upper bound of n times Vmax by the sum of the v_i's if you prefer.)

So, given a choice of i and x, I'm going to use capital S_{i,x} to denote the optimal value of the corresponding subproblem: the minimum total size sufficient to achieve a total value of x or more while being restricted to using just the first i items. If there is in fact no way to reach the target value x using just the first i items, that is, if the sum of the values of these items is already less than x, we just define this quantity to be plus infinity.
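In symbols, this is just a compact restatement of the definition above, using the lecture's notation and nothing more:

$$
S_{i,x} \;=\; \min\Bigl\{\, \textstyle\sum_{j \in T} w_j \;:\; T \subseteq \{1,\dots,i\},\ \textstyle\sum_{j \in T} v_j \ge x \Bigr\},
$$

where the minimum over an empty collection of subsets is taken to be $+\infty$, with $i \in \{0,1,\dots,n\}$ and integer targets $x \in \{0,1,\dots,n \cdot V_{\max}\}$.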
So let's write down the natural recurrence; assume here that i is at least 1. The structure is going to be exactly the same as in our first dynamic programming algorithm for the knapsack problem. Specifically, let's focus on a subproblem with a choice of i and a choice of x. The final item i is either in the optimal solution for that subproblem or it's not. That gives us two cases, and the recurrence just does brute-force search over the two cases. Because we're trying to minimize the total size required to achieve a given value target, here the brute-force search takes the form of a minimum.

What are the two candidates for an optimal solution? Well, the boring one is when you don't even bother to use item i; then, of course, you just inherit an optimal solution, a minimum-size solution using just the first i-1 items that achieves the same value target x. On the other hand, suppose that an optimal solution to the subproblem does use item i. This contributes the size of item i, namely w_i, to the total size of the solution. What can we say about the rest of the items in this optimal solution? Well, because the solution achieves a value target of x, the items other than i must by themselves achieve a value target of x - v_i. And, of course, by the usual cut-and-paste argument, amongst all subsets of the first i-1 items with that property, they had better have the minimum total size. One quick edge case for this second case: if v_i is actually bigger than x, so that we'd have a negative index, we just interpret this quantity as 0; that's because item i alone already meets the value target x.

Let's now wrap things up with the pseudocode of the new dynamic programming algorithm. A will be our usual table. It has two dimensions, because subproblems are indexed by two parameters: i, the prefix length, ranging from 0 to n, and x, the value target, ranging from 0 to the maximum imaginable value, namely n times Vmax. The base case is when i equals 0, that is, when you're not allowed to use any items; in this case we fill in the entry with plus infinity, except when x itself is 0, in which case the answer is 0. Now, we just populate the table using the recurrence in a double for loop.
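In the notation above, the recurrence just described (for i at least 1) reads

$$
S_{i,x} \;=\; \min\bigl\{\, S_{i-1,\,x},\;\; w_i + S_{i-1,\,\max(x - v_i,\,0)} \,\bigr\},
$$

and here is a minimal Python sketch of that table-filling step. It is only an illustration of the pseudocode just described, under the stated assumptions: item values are positive integers, and the lists `values` and `sizes` are 0-indexed, so item i of the lecture corresponds to `values[i-1]` and `sizes[i-1]`. The function name `fill_table` is invented for this sketch.

```python
import math

def fill_table(values, sizes):
    """A[i][x] = minimum total size needed to achieve total value >= x
    using only the first i items (math.inf if that's impossible)."""
    n = len(values)
    V = n * max(values)                # largest value target we ever need

    A = [[math.inf] * (V + 1) for _ in range(n + 1)]
    for x in range(V + 1):             # base case i = 0: no items allowed
        A[0][x] = 0 if x == 0 else math.inf

    for i in range(1, n + 1):
        v_i, w_i = values[i - 1], sizes[i - 1]
        for x in range(V + 1):
            # Case 1: skip item i.  Case 2: include item i (size w_i) and
            # cover the residual target max(x - v_i, 0) with items 1..i-1.
            A[i][x] = min(A[i - 1][x], w_i + A[i - 1][max(x - v_i, 0)])
    return A
```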
The structure here is exactly the same as in our first knapsack dynamic programming algorithm. In that first dynamic programming solution, we could return the correct answer in constant time once the table was filled in. That's because one of the subproblems in that algorithm literally was the original problem: when i = n and x equals the full knapsack capacity, capital W, the responsibility of that subproblem was to compute the maximum-value solution subject to the capacity capital W using all of the items, which is literally the original problem. By contrast, in this new dynamic programming algorithm, none of the subproblems corresponds directly to the original problem we wanted to solve; none of them tells you the maximum value of a feasible solution.

To see why, let's inspect the largest batch of subproblems, where i is equal to n and you can use whatever subset of items you want. The second index of one of these subproblems is a target value x; that might be a number like, say, 17,231. After you've run this algorithm and filled in the whole table, what do you learn from this subproblem? You will have computed the smallest total size of a bunch of items that has total value at least 17,231. That's going to be some number, maybe 11,298. But what if your knapsack capacity is only 10,000? Then this is not a feasible solution, so it doesn't do you any good.

Okay, well, that's not quite true, it does do you some good. If you know that every single solution with value 17,231 or more has size bigger than your knapsack capacity, then you know that the optimal value has to be less than 17,231; there is simply no feasible way to get a total value that high. Now you realize that if you knew this information for every single target value x, and you do once you've filled in the entire table, that's sufficient to figure out the optimal solution, that is, the biggest value of a feasible solution. All you need to do is scan through this final batch of subproblems. You begin with the largest conceivable target value x, and then you get less and less ambitious as you keep finding infeasible solutions.
So you scan from high target values to low target values, and you're looking for the first, that is, the largest, target value x such that there exists a subset of items meeting that target value whose total size is at most your knapsack capacity. That first target value x that can feasibly be met within your knapsack capacity is the value of the optimal solution.

Analyzing the running time is straightforward. How many subproblems are there, or equivalently, how many iterations of the double for loop do we have? Well, there are n outer iterations and n times Vmax inner iterations, so that's n^2 · Vmax iterations overall. Our brute-force search considers only two candidates, so we do only constant work per iteration of the double for loop. In the final step, we do a scan through the largest batch of subproblems, which takes O(n · Vmax) time. So the running time is dominated by the constant work per subproblem, and our overall running time is O(n^2 · Vmax), as promised.
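To make the post-processing step concrete, here is a hedged sketch of the final scan, continuing the hypothetical `fill_table` sketch from earlier (the function name `largest_feasible_value` is likewise invented for illustration):

```python
def largest_feasible_value(A, capacity):
    """Scan the last row of the table from the most ambitious value target
    downward; return the largest x whose minimum total size fits the knapsack."""
    n = len(A) - 1
    V = len(A[0]) - 1
    for x in range(V, -1, -1):         # high targets first, then less ambitious
        if A[n][x] <= capacity:
            return x                   # first feasible target = optimal value
    return 0                           # never reached: target 0 always has size 0


# Tiny worked example: items (v, w) = (3, 4), (2, 3), (4, 2), capacity 6.
# The best feasible subset is items 1 and 3: value 7, total size 6 <= 6.
A = fill_table([3, 2, 4], [4, 3, 2])
print(largest_feasible_value(A, 6))    # prints 7
```

The scan touches each of the n · Vmax + 1 entries in the last row once, matching the O(n · Vmax) term quoted above, and it is dominated by the O(n^2 · Vmax) table-filling work.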