Next, let me mention, embarrassingly briefly, linear programming. Linear programming is a problem that, on the one hand, is efficiently solvable both in theory and in practice, and, on the other hand, is very general. It includes as special cases bipartite matching, maximum flow, and tons of other stuff. The general problem solved by linear programming is to optimize a linear function, maximize or minimize it, doesn't matter, over a set of linear constraints. Geometrically, that corresponds to finding the best point, the optimal point, in a feasible region representable as the intersection of half-spaces. So let me just draw a cartoon picture, which is going to be in the plane, although let me emphasize that the power of linear programming is that you can efficiently solve problems in massively higher dimensions, not 2 dimensions, but think millions of dimensions. In the plane, the intersection of half-planes, assuming it's bounded, is just going to be a polygon, and you can think about optimizing a linear function as just trying to push as far as possible in some direction subject to staying inside the feasible region. So, for example, maybe you want to push as far to the northeast as possible subject to being one of these blue points, somewhere in this polygon.

For example, the maximum flow problem is easily encoded as a special case of linear programming. You use one dimension, that is, one decision variable, for each edge; the decision variable describes how much flow you route on that edge. The linear objective function to be maximized would be the sum of the flows on the edges going out of the source vertex. And then you would have linear constraints insisting that the amount of flow coming into a vertex is the same as the amount of flow going out of it. And linear programming encodes not just maximum flow, but tons and tons of other problems as well.

Despite its generality, linear programming can be solved efficiently, both in theory and in practice. The systematic formulation and algorithmic solution of linear programs was pioneered by George Dantzig back in the 1940s. In particular, Dantzig invented what's called the simplex method, which remains to date one of the most important practical methods for solving linear programs. Have you ever heard that story about the student who walks in late to a class, sees what he thinks are homework problems written up on the chalkboard, and solves them, not knowing that they are in fact major open research questions? Well, maybe you thought that story was apocryphal, as I did for many years, but it's not. Turns out it's totally about George Dantzig. In 1939, he walked into his graduate statistics class late, the two major open questions in the field were written on the chalkboard, and he solved them within a few days. That wound up being his PhD dissertation, before he got interested in linear programming.

Now, the algorithms used to efficiently solve linear programs, they're pretty complicated, both the various versions of the simplex method and so-called interior-point algorithms. So it may or may not be a good use of your time to learn more about how the guts of these algorithms work. But what is surely a good use of your time is learning how to be an educated client of these algorithms, that is, to get some practice formulating problems as linear programs and solving them using either open-source or commercial linear programming solvers.
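To make that "educated client" advice concrete, here is a minimal sketch of the max-flow formulation above, solved with an open-source LP solver, SciPy's scipy.optimize.linprog. The tiny four-vertex graph and its capacities are invented purely for illustration.

```python
# A minimal sketch of max flow as a linear program, using SciPy's LP solver.
# The tiny graph below is made up purely for illustration.
from scipy.optimize import linprog

# Edges of a hypothetical graph: (tail, head, capacity).
edges = [("s", "a", 3), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 3)]

# One decision variable per edge: how much flow to route on it.
# Objective: maximize the flow leaving the source s. linprog minimizes,
# so we negate the coefficients of the source's outgoing edges.
c = [-1 if tail == "s" else 0 for tail, _, _ in edges]

# Conservation constraints: flow in == flow out at every internal vertex.
internal = ["a", "b"]
A_eq = [[(1 if head == v else 0) - (1 if tail == v else 0)
         for tail, head, _ in edges] for v in internal]
b_eq = [0] * len(internal)

# Capacity constraints become simple variable bounds: 0 <= x_e <= capacity(e).
bounds = [(0, cap) for _, _, cap in edges]

result = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("max flow value:", -result.fun)   # 5.0 for this graph
print("flow on each edge:", result.x)
```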
Those solvers are a very powerful black-box tool to have in your algorithmic toolbox. In fact, there are black-box subroutines available to you that are strictly more powerful than linear programming. One generalization is convex programming. Linear functions are a special case of convex functions, and if you want to minimize a convex function, or equivalently maximize a concave function, subject to convex constraints, that again is a problem you can solve efficiently, both in theory, meaning in polynomial time, and in practice. Maybe the problem sizes you can reach aren't quite as big as with linear programming, but they're pretty close. A different generalization is integer linear programming. These are like linear programs, but you add the extra constraint that certain decision variables have to take on integer values. So something like 1/2 as the value of a decision variable would be disallowed in an integer program. Now, these in general are no longer solvable efficiently in theory; it's easy to encode NP-complete problems, like, let's say, vertex cover, as special cases of integer programming. But there is quite a bit of advanced technology out there for solving integer programs, at least in particular domains. So again, if you have an NP-complete problem and you really need to solve it, integer programming is a technique you're going to want to learn more about.

What are some other topics we haven't had time to discuss? Well, first of all, for many of the topics we have discussed, we've only scratched the surface. There's beautiful and useful stuff beyond what we've covered in these classes, on topics ranging from data structures, for example with hashing and search trees, to graph algorithms, we already mentioned matchings and flows, to approximation algorithms, like better heuristics for the traveling salesman problem.

A topic we've barely discussed at all is geometric algorithms, one exception being the divide-and-conquer algorithm for the closest pair problem that we discussed in Part 1. Roughly speaking, the study of geometric algorithms has two flavors: low-dimensional problems and high-dimensional problems. When I say low-dimensional, I mean probably 2 or 3 dimensions, so problems in the plane or problems in 3-space. A canonical computational problem here would be the convex hull problem, which means I give you a bunch of points and you want to know which points are on the convex hull of the point set. So, here's a way to think about the convex hull problem in the plane. Take a flat wooden board; that represents the plane. Now, pound a bunch of nails into the wooden board; those represent the points in the plane. Now, to compute the convex hull, you can take a rubber band, stretch it out really far so that it surrounds all of the nails you pounded into the board, and then let the rubber band go, and it's going to snap around the boundary nails. That's a way to compute the convex hull in 2 dimensions. And so the question is, how do you do that efficiently, given just a bunch of coordinates of points? The state-of-the-art here is divide-and-conquer algorithms, very much in the spirit of the closest pair algorithm back in Part 1.
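If you want to experiment with the convex hull problem yourself, here is a short self-contained sketch in Python. It uses Andrew's monotone chain algorithm, an O(n log n) method that is simpler than the divide-and-conquer algorithms just mentioned; the sample points at the bottom are made up for illustration.

```python
# A sketch of 2D convex hull computation via Andrew's monotone chain algorithm
# (an O(n log n) alternative to the divide-and-conquer approach mentioned above).

def cross(o, a, b):
    # Cross product of vectors o->a and o->b; positive for a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    points = sorted(set(points))          # sort by x, breaking ties by y
    if len(points) <= 2:
        return points
    # Build the lower hull, then the upper hull, popping points that
    # would create a clockwise (non-convex) turn.
    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Concatenate, omitting the endpoints that appear in both halves.
    return lower[:-1] + upper[:-1]

# Example: the hull of these (made-up) points is the outer square.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# -> [(0, 0), (2, 0), (2, 2), (0, 2)]
```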
So why would you care about computing convex hulls? Well, there are lots of reasons. But, you know, as one example, suppose you were doing some 3D graphics, and you had moving objects in your scene, and you wanted to know when two objects collide, because then you have to respond appropriately in rendering the scene. Well, to know whether two objects collide, you don't have to remember all of the details of the objects; you just have to keep track of their convex hulls. They collide exactly when their convex hulls intersect, and so that's one reason you might want to do that computation.

A very different flavor of geometric algorithms is the high-dimensional case. And here, I'm thinking of the number of dimensions as being in the thousands, say, or perhaps even quite a bit more than that. Now, why would you ever have points in thousands of dimensions? Well, imagine you were doing, say, information retrieval, and you had a bunch of documents. Now, documents don't really have geometry intrinsically, but it can be useful to imbue them with geometry. How do you do that? Well, imagine you make a list of all of the words in the world that you care about, so maybe, say, 10,000 words that you think are interesting. And for a given document, you just count the number of occurrences of each of these 10,000 interesting words in the document. Maybe lots of the counts are zero, the words don't occur at all, but maybe there's some word that occurs seven times in the document, so then you'd give it a seven in that coordinate. And boom, you've now represented each document as a point in 10,000-dimensional space, one coordinate for each of the interesting words, the value of the coordinate being the frequency of that word in that document. Now you can start asking questions like, given a new document, is it similar to any of the documents you've already been storing? And geometrically, that just corresponds to asking something called a nearest-neighbor query: given a bunch of points in a possibly high-dimensional space, and given a new point, which of the previous points is closest, say in Euclidean distance, to the new point you were just given? That would be a canonical question you would ask in high-dimensional computational geometry.
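Here's a toy sketch of that idea in Python: documents become word-count vectors over a small vocabulary, and a nearest-neighbor query is answered by a brute-force scan using Euclidean distance. The five-word vocabulary and the little corpus are invented for illustration; a real system would use thousands of dimensions and a cleverer data structure than a linear scan.

```python
# A toy sketch of the document-as-point idea: represent each document by word
# counts over a small vocabulary, then answer a nearest-neighbor query.
import math

vocabulary = ["algorithm", "graph", "flow", "matrix", "protein"]

def to_vector(document):
    # One coordinate per vocabulary word: its frequency in the document.
    words = document.lower().split()
    return [words.count(w) for w in vocabulary]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor(corpus, query):
    # Brute-force scan; returns the stored document closest to the query.
    qv = to_vector(query)
    return min(corpus, key=lambda doc: euclidean(to_vector(doc), qv))

corpus = [
    "max flow algorithm on a graph",
    "protein folding with a matrix model",
]
print(nearest_neighbor(corpus, "shortest path algorithm for a graph"))
# -> "max flow algorithm on a graph"
```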
In these Design and Analysis of Algorithms courses, we've been focusing on the most classical style of computation, where you're given an input up front, you do a computation, you produce some output, and then you take a bow and leave the stage. But if you think about computation in the real world, there are plenty of algorithms whose work is never finished, algorithms that in effect run forever. Two applications that we've mentioned in passing in this class are caching problems, so for example, if you're writing an algorithm to manage a cache, say in an operating system, that algorithm's work is never really done; it's going to have to manage the cache indefinitely. Similarly, you can think about routing in a network, say Internet routing. Again, that algorithm's work is never done. There are always going to be link failures, there are always going to be new nodes, there's always going to be new data to route. So it has to keep making routing decisions indefinitely, as long as the Internet exists. As you would hope, both the theory and the practice of understanding algorithms that run indefinitely, making decisions in real time, is based quite squarely on the lessons that we've learned in the classical computational paradigm here. But it is an interesting topic worthy of study in its own right.

A major concern of algorithms research in the 21st century is massive data sets, meaning data sets that are way too big to fit in the main memory of a single machine. One hot topic has been so-called streaming algorithms, that is, algorithms which have to, in effect, take a firehose of data being generated at a torrential pace and boil it down, using small space, into just a few accurate statistics. You might think, for example, about an algorithm running locally at a network router, with data packets flying through the router at a ridiculous pace, or an algorithm which is responsible for summarizing the data generated by a bunch of telescope observations. A final important topic, for which the current theory is actually quite immature, is understanding how to exploit parallelism in tackling massive data sets, for example using the distributed-systems approach exemplified by the MapReduce and Hadoop systems.
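To give a feel for the streaming setting, here is a small sketch of one classical small-space summary, the Misra-Gries frequent-elements algorithm (not an algorithm we covered in this course): it makes a single pass over the stream, keeps at most k counters, and any item occurring more than a 1/(k+1) fraction of the time is guaranteed to survive among the candidates. The packet stream below is made up.

```python
# A sketch of the Misra-Gries streaming algorithm: one pass, at most k counters,
# and every item with frequency > len(stream) / (k + 1) ends up among the
# surviving candidates (their counts are underestimates, not exact frequencies).
def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No room: decrement every counter and drop the ones that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Example on a made-up "packet stream" of destination addresses.
stream = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3",
          "10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"]
print(misra_gries(stream, k=2))   # "10.0.0.1", the heavy hitter, survives
```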