This sequence of videos is about the design and analysis of heuristics: algorithms which are generally guaranteed to be fast, but not guaranteed to be 100% correct. As a case study, we're going to revisit the knapsack problem that we thought about earlier, in the dynamic programming section. Let's briefly review the three strategies for dealing with NP-complete problems; all of them are relevant for the knapsack problem. The first strategy is to identify computationally tractable special cases of the NP-complete problem you're grappling with. If you're lucky, the instances relevant to your application will fall squarely into one of these computationally tractable special cases. Even otherwise, it's useful to have these kinds of subroutines lying around; perhaps with a little bit of extra work, you can reduce your problem instances to one of these computationally tractable special cases. Our dynamic programming solution to the knapsack problem identifies one such special case: namely, knapsack instances where the knapsack capacity is small, for example, comparable to the number of items. The second approach is the design and analysis of heuristics: algorithms which are not guaranteed to be 100% optimal. More to say about those in a second. The third strategy is to insist on a correct algorithm; this being an NP-complete problem, you're then expecting to see exponential running time, but you can still hope for a running time qualitatively better than what you'd achieve with naive brute-force search. Our dynamic programming solution to the knapsack problem can be viewed as a contribution in this direction. Naive brute-force search, enumerating over all possible subsets of items, is going to take time proportional to 2^n, where n is the number of items. The dynamic programming algorithm takes time proportional to n, the number of items, times W, the knapsack capacity. For most instances of interest, the dynamic programming solution, albeit exponential for reasons we've discussed in the past, will be a significant improvement over naive brute-force search.
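As a point of reference, here is a minimal sketch of that O(nW) dynamic programming algorithm, written in a space-saving one-dimensional form and assuming integer sizes; this is an illustration, not the exact pseudocode from the earlier lectures.

```python
def knapsack_dp(values, sizes, W):
    """Exact knapsack via dynamic programming in O(n * W) time, assuming the
    item sizes and the capacity W are non-negative integers."""
    best = [0] * (W + 1)  # best[c] = max value achievable with residual capacity c
    for v, s in zip(values, sizes):
        # scan capacities downward so each item is used at most once
        for c in range(W, s - 1, -1):
            best[c] = max(best[c], best[c - s] + v)
    return best[W]

# A small instance (the same three-item example appears later in this lecture):
print(knapsack_dp([2, 4, 3], [1, 3, 3], 5))  # -> 6
```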
Let's now return to the subject of heuristics, the topic for today. Happily, despite being NP-complete, the knapsack problem admits some quite useful heuristics; we're going to study two of them. First, we'll use the greedy algorithm design paradigm to come up with a pretty good heuristic. Pleasingly, this pretty good greedy heuristic is also blazingly fast. We'll then pull out a different tool, namely dynamic programming, to develop yet another heuristic. It's going to be polynomial time, not as fast as the greedy solution, but its accuracy will be excellent: it will get within a (1 - epsilon) factor of optimal, where epsilon is a parameter we can tune as we wish. Now, there are zillions and zillions of algorithms in the world, and anybody can write one of them down. So if someone proposes an algorithm to you, the onus is on them to convince you that it's a good idea, that it's, in some sense, a good algorithm. Periodically, in these courses, I've been making that argument in two ways. First, I provide you with an argument that the algorithm I propose is correct: it is always guaranteed to solve the problem we're interested in. Second, I argue that it solves the problem using a reasonable amount of resources. A great exhibit A would be, say, Dijkstra's algorithm. We wanted to solve the single-source shortest-path problem with non-negative edge lengths. I gave you a proof that Dijkstra's algorithm always achieves that goal, and I showed you an implementation whose running time is not much more than what it takes to just read the input. Now, of course, this best-case scenario is not going to be realized for NP-complete problems, and in particular for the knapsack problem. You have to give up either on correctness or on running time; you have to make a compromise. A tricky question, then, is how you argue that the compromise you're making is not overly severe, that your approach to an NP-complete problem is a good one. Heuristics relax correctness, and so one would like an argument that says correctness is not relaxed by too much. In the best-case scenario, when you propose a heuristic, you should supply a performance guarantee which says that the solution is almost correct, in some sense, on every single instance, or, failing that, at least on many instances of interest. Now, in practice you do bump into problems that are so hard you wind up resorting to heuristics where you really don't have a good understanding of how or when they work, where you just cross your fingers, run the heuristic, and hope for the best. Local search is probably the preeminent example of this type of heuristic, something like the k-means algorithm for clustering. But for the problem we're going to talk about today, we're going to seek out heuristics that have worst-case performance guarantees. And of course, a broader theme of this discussion, and the reason I'm using knapsack as an example, is that it will show us how our algorithmic toolbox is now rich enough to design not just lots of different exact algorithms, but also lots of useful heuristics for NP-complete problems.

So let me briefly remind you what the knapsack problem is. As input, we are given 2n + 1 numbers: there are n items, each with a positive value and a positive size, and the final number is the knapsack capacity, capital W. The goal is to pack the knapsack with the most valuable set of items. That is, you're seeking a subset S of items with the property that the sum of the sizes of the items in S is bounded above by the knapsack capacity W; subject to that constraint, your chosen subset should maximize the sum of the values. This is a fundamental problem that comes up all the time, in particular as a subroutine. Really, whenever you have a constrained resource that you want to use in the best way possible, that is basically the knapsack problem.
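In symbols, writing v_i for the value and w_i for the size of item i (my notation; the lecture states this verbally), the problem is:

```latex
\max_{S \subseteq \{1, \dots, n\}} \; \sum_{i \in S} v_i
\qquad \text{subject to} \qquad \sum_{i \in S} w_i \le W
```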
So let's turn to heuristics for the knapsack problem. Now that I've told you we're happy with heuristics, that we're willing to relax correctness, this all of a sudden resuscitates the greedy algorithm design paradigm as a feasible approach to the knapsack problem. Because the knapsack problem is NP-complete, we certainly are not expecting to find an exactly correct greedy algorithm, but maybe there's a greedy algorithm which is pretty good, and we expect most greedy algorithms to run extremely quickly.

So let's talk through a potential greedy approach to the knapsack problem. Probably the first idea you'd have would be to consider the items in some order, and when you consider an item, make an irrevocable decision at that point whether or not to include it in your solution. The question then is: in what order should you look at the items? Well, what's our objective? Our objective is to maximize the value of our set. So obviously high-value items are important, and maybe the first idea would be to sort the items from highest value to lowest value. But if you think about that proposal for, say, 30 seconds, you quickly realize it's a little naive; it's not the whole story. A high-value item that fills up the whole knapsack seems not quite as useful as an almost-as-high-value item whose size is close to zero, one that doesn't use up any of the knapsack at all. Remember that each item has two properties, not just its value but also its size, and both of these are important: we prefer items that have a lot of value, but we also prefer items that are small. So I hope many of you are now sensing a strong parallel between the present discussion and the discussion we had in our first serious study of greedy algorithms. That was scheduling jobs on a single machine to minimize the sum of the weighted completion times. In that old scheduling problem, just like in this one, each object had two parameters; back then it was a job length and a job weight, or priority. You had a preference for higher-weight jobs, you wanted them first, and a preference for shorter jobs, you wanted them first. Here again we have items with two parameters, the value and the size; high value is good, low size is good. Back in the scheduling problem, the key idea was to look at the jobs' ratios and schedule in that order. By analogy, here in the knapsack problem, if we want to collapse the two parameters of each item into a single parameter by which to sort the items, the natural first cut is a ratio. Since we prefer high values and low sizes, the sensible thing to look at is the ratio of an item's value divided by its size, and we then consider items from the highest of these ratios to the lowest. One way to think of this is that we consider the items in order of non-increasing bang per buck: we're always trying to get the most value for our money, where money is the amount of knapsack capacity that we're spending. So now that we have our greedy ordering, we just proceed through the items one at a time and pack them into the knapsack in this order. Now, what happens here, and didn't actually trouble us in the scheduling problem, is that at some point we might no longer be able to pack items into the knapsack; the thing might get full. Once that happens, once we reach an item which doesn't fit into the knapsack given the items we've already packed into it, we just stop the greedy algorithm. If you think about it, the way I've written this algorithm, aborting as soon as you fail to pack one item is not really what you would do in practice; you would do something a little bit smarter. If you fail to pack some given item, say item k+1, well, maybe item k+2 has a worse ratio but is much smaller; maybe item k+2 does fit in the knapsack, and then there's no reason not to just take it. So in practice you'd go through the entire list of items and pack whatever fits, skipping whatever doesn't fit. For simplicity, I'm going to analyze the slightly worse heuristic in the slides to follow.
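Here is a minimal sketch of the heuristic as described so far, using the abort-on-first-failure version that will be analyzed; the function name and zero-based indexing are my own, not the lecture's.

```python
def greedy_knapsack(values, sizes, W):
    """Steps 1 and 2 of the greedy heuristic: consider items by decreasing
    value/size ratio and pack until an item fails to fit."""
    order = sorted(range(len(values)),
                   key=lambda i: values[i] / sizes[i], reverse=True)
    chosen, remaining = [], W
    for i in order:
        if sizes[i] > remaining:
            break  # the smarter in-practice variant would `continue` and keep scanning
        chosen.append(i)
        remaining -= sizes[i]
    return chosen

# The three-item instance discussed next (values 2, 4, 3; sizes 1, 3, 3; capacity 5):
print(greedy_knapsack([2, 4, 3], [1, 3, 3], 5))  # -> [0, 1], total value 6
```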
So, just a real quick example. Let's consider the following three-item instance, where I've taken the liberty of sorting the items by ratio. The first item has value 2 and size 1, for a ratio of 2. The second item has value 4 and size 3, for a ratio of 4/3. And the third item has a ratio of 1, with value and size both equal to 3. Let's say the knapsack capacity is 5. So what does the greedy solution do? It first packs in item number 1; then there's a residual knapsack capacity of 4, so there's room for item number 2, and it packs that. Now there's a residual knapsack capacity of only 1, so there isn't any room for item number 3, and it halts. So the greedy algorithm is just going to output the set {1, 2}. In this particular example, you will notice that this greedy solution, of value 6, is in fact the optimal solution; you can't do better. But let's look at a second example with just two items. In this particular instance, what is the value of the output of the greedy algorithm, and what is the value of an optimal solution? The correct answer is (A). Since the knapsack capacity is 1,000 and the sum of the sizes of the items is 1,001, there's no room for both of them. The greedy algorithm, unfortunately, because the first, tiny item has the better ratio, will pack in item number 1, and that leaves no room for item number 2. So the value of the greedy algorithm's solution is just 2, whereas the optimal solution is, of course, to just take the second item. Yeah, its ratio is worse, but on the other hand it fills up the whole knapsack, and so overall your value is 1,000, which obviously blows away the greedy solution. This quiz shows that, at least for general instances, the greedy knapsack heuristic we've devised so far can be arbitrarily bad; it can be terrible with respect to an optimal solution.

There is, however, a simple fix to address this issue: we're going to add a very simple step three to our greedy heuristic. The new step three is just going to compare two candidate solutions and return whichever one is better, whichever one has the bigger total value. The first candidate is just what we were doing before, the output of step two of the algorithm. The second candidate is whichever item by itself has the maximum value. In other words, this new greedy algorithm just runs the old one, but then it does a final sanity check: it looks at each item individually and says, well, if this item just by itself dominates the solution I've computed thus far, I return this single item by itself instead. There's no question that adding this new step three fixes the problem in the quiz on the last slide. If you add step three, then we will choose the item that has value 1,000 instead of the lousy solution of value 2 that we computed in step two. It may seem that this step three is just kind of a hack, meant to address that specific instance. But remarkably, just adding step three transforms this greedy heuristic into one that has a reasonably good worst-case performance guarantee. Precisely, the value of the solution computed by this three-step greedy algorithm is always at least 50% of the value of an optimal solution. The jargon for this kind of algorithm is a 1/2-approximation algorithm: for every instance, it's guaranteed to compute a solution with value at least 1/2 that of an optimal solution. Also, as you would hope from a greedy heuristic, it runs extremely quickly; basically all we do is sort the items, so the running time is O(n log n). In the next video, we'll provide a proof of this theorem, and, as we'll see, under some additional assumptions about the type of instance, the performance guarantee will in fact be much, much better than 50%.
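Putting the pieces together, here is a sketch of the full three-step heuristic, reusing the greedy_knapsack function from the earlier sketch; again this is my own illustration rather than code from the course, and the exact sizes in the quiz instance are inferred from the discussion above.

```python
def greedy_knapsack_with_step3(values, sizes, W):
    """The three-step greedy heuristic: return the better of (a) the greedy
    packing from steps 1 and 2 and (b) the single most valuable item that
    fits on its own. Its value is always at least 50% of optimal."""
    chosen = greedy_knapsack(values, sizes, W)       # steps 1 and 2
    greedy_value = sum(values[i] for i in chosen)
    fitting = [i for i in range(len(values)) if sizes[i] <= W]
    best_single = max(fitting, key=lambda i: values[i], default=None)
    if best_single is not None and values[best_single] > greedy_value:
        return [best_single]                         # step 3 overrules steps 1 and 2
    return chosen

# The two-item quiz instance (inferred sizes: a tiny item of value 2 and size 1,
# plus an item of value 1,000 that fills the whole knapsack of capacity 1,000):
print(greedy_knapsack_with_step3([2, 1000], [1, 1000], 1000))  # -> [1], value 1,000
```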