Next, let me mention, embarrassingly briefly, linear programming. Linear programming is a problem that, on the one hand, is efficiently solvable both in theory and in practice, and, on the other hand, is very general. It includes as special cases bipartite matching, maximum flow, and tons of other stuff. The general problem solved by linear programming is to optimize a linear function, maximize or minimize it, doesn't matter, over a set of linear constraints. Geometrically, that corresponds to finding the best point, the optimal point, in a feasible region representable as the intersection of half-spaces. So let me just draw a cartoon picture, which is going to be in the plane, although let me emphasize that the power of linear programming is that you can efficiently solve problems in massively higher dimensions, not 2 dimensions, but think millions of dimensions. In the plane, the intersection of half-planes, assuming it's bounded, is just going to be a polygon, and you can think about optimizing a linear function as just trying to push as far as possible in some direction subject to staying inside the feasible region. So, for example, maybe you want to push as far to the northeast as possible subject to being one of these blue points, somewhere in this polygon.

For example, the maximum flow problem is easily encoded as a special case of linear programming. You use one dimension, that is, one decision variable, for each edge; the decision variable describes how much flow you route on that edge. The linear objective function to be maximized would be the sum of the flows on the edges going out of the source vertex. And then you would have linear constraints insisting that the amount of flow coming into a vertex is the same as the amount of flow going out of it. And linear programming encodes not just maximum flow, but tons and tons of other problems as well.

Despite its generality, linear programming can be solved efficiently, both in theory and in practice. The systematic formulation and algorithmic solution of linear programs was pioneered by George Dantzig back in the 1940s. In particular, Dantzig invented what's called the simplex method, which remains to date one of the most important practical methods for solving linear programs. Have you ever heard that story about the student who walks in late to a class, sees what he thinks are homework problems written up on the chalkboard, and solves them, not knowing that they are in fact major open research questions? Well, maybe you thought that story was apocryphal, as I did for many years, but it's not. Turns out it's totally about George Dantzig. In 1939, he walked into his graduate statistics class late, the two major open questions in the field were written on the chalkboard, and he solved them within a few days. That wound up being his PhD dissertation, before he got interested in linear programming.

Now, the algorithms used to efficiently solve linear programs, they're pretty complicated, both the various versions of the simplex method and so-called interior-point algorithms. So it may or may not be a good use of your time to learn more about how the guts of these algorithms work. But what is surely a good use of your time is learning how to be an educated client of these algorithms, that is, to get some practice formulating problems as linear programs and solving them using either open-source or commercial linear programming solvers.
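To make that "educated client" advice concrete, here is a minimal sketch of the max-flow formulation above, solved with an open-source LP solver, SciPy's scipy.optimize.linprog. The tiny four-vertex graph and its capacities are invented purely for illustration.

```python
# A minimal sketch of max flow as a linear program, using SciPy's LP solver.
# The tiny graph below is made up purely for illustration.
from scipy.optimize import linprog

# Edges of a hypothetical graph: (tail, head, capacity).
edges = [("s", "a", 3), ("s", "b", 2), ("a", "b", 1), ("a", "t", 2), ("b", "t", 3)]

# One decision variable per edge: how much flow to route on it.
# Objective: maximize the flow leaving the source s. linprog minimizes,
# so we negate the coefficients of the source's outgoing edges.
c = [-1 if tail == "s" else 0 for tail, _, _ in edges]

# Conservation constraints: flow in == flow out at every internal vertex.
internal = ["a", "b"]
A_eq = [[(1 if head == v else 0) - (1 if tail == v else 0)
         for tail, head, _ in edges] for v in internal]
b_eq = [0] * len(internal)

# Capacity constraints become simple variable bounds: 0 <= x_e <= capacity(e).
bounds = [(0, cap) for _, _, cap in edges]

result = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
print("max flow value:", -result.fun)   # 5.0 for this graph
print("flow on each edge:", result.x)
```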
Those solvers are a very powerful black-box tool to have in your algorithmic toolbox. In fact, there are black-box subroutines available to you that are strictly more powerful than linear programming. One generalization is convex programming. Linear functions are a special case of convex functions, and if you want to minimize a convex function, or equivalently maximize a concave function, subject to convex constraints, that again is a problem you can solve efficiently, both in theory, meaning in polynomial time, and in practice. Maybe the problem sizes you can reach aren't quite as big as with linear programming, but they're pretty close. A different generalization is integer linear programming. These are like linear programs, but you add the extra constraint that certain decision variables have to take on integer values. So something like 1/2 as the value of a decision variable would be disallowed in an integer program. Now, these in general are no longer solvable efficiently in theory; it's easy to encode NP-complete problems, like, let's say, vertex cover, as special cases of integer programming. But there is quite a bit of advanced technology out there for solving integer programs, at least in particular domains. So again, if you have an NP-complete problem and you really need to solve it, integer programming is a technique you're going to want to learn more about.

What are some other topics we haven't had time to discuss? Well, first of all, for many of the topics we have discussed, we've only scratched the surface. There's beautiful and useful stuff beyond what we've covered in these classes, on topics ranging from data structures, for example with hashing and search trees, to graph algorithms, we already mentioned matchings and flows, to approximation algorithms, like better heuristics for the traveling salesman problem.

A topic we've barely discussed at all is geometric algorithms, one exception being the divide-and-conquer algorithm for the closest pair problem that we discussed in Part 1. Roughly speaking, the study of geometric algorithms has two flavors: low-dimensional problems and high-dimensional problems. When I say low-dimensional, I mean probably 2 or 3 dimensions, so problems in the plane or problems in 3-space. A canonical computational problem here would be the convex hull problem, which means I give you a bunch of points and you want to know which points are on the convex hull of the point set. So, here's a way to think about the convex hull problem in the plane. Take a flat wooden board; that represents the plane. Now, pound a bunch of nails into the wooden board; those represent the points in the plane. Now, to compute the convex hull, you can take a rubber band, stretch it out really far so that it surrounds all of the nails you pounded into the board, and then let the rubber band go, and it's going to snap around the boundary nails. That's a way to compute the convex hull in 2 dimensions. And so the question is, how do you do that efficiently, given just a bunch of coordinates of points? The state-of-the-art here is divide-and-conquer algorithms, very much in the spirit of the closest pair algorithm back in Part 1.
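If you want to experiment with the convex hull problem yourself, here is a short self-contained sketch in Python. It uses Andrew's monotone chain algorithm, an O(n log n) method that is simpler than the divide-and-conquer algorithms just mentioned; the sample points at the bottom are made up for illustration.

```python
# A sketch of 2D convex hull computation via Andrew's monotone chain algorithm
# (an O(n log n) alternative to the divide-and-conquer approach mentioned above).

def cross(o, a, b):
    # Cross product of vectors o->a and o->b; positive for a counter-clockwise turn.
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

def convex_hull(points):
    points = sorted(set(points))          # sort by x, breaking ties by y
    if len(points) <= 2:
        return points
    # Build the lower hull, then the upper hull, popping points that
    # would create a clockwise (non-convex) turn.
    lower, upper = [], []
    for p in points:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(points):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    # Concatenate, omitting the endpoints that appear in both halves.
    return lower[:-1] + upper[:-1]

# Example: the hull of these (made-up) points is the outer square.
print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))
# -> [(0, 0), (2, 0), (2, 2), (0, 2)]
```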
So why would you care about computing convex hulls? Well, there are lots of reasons. But, you know, as one example, suppose you were doing some 3D graphics, and you had moving objects in your scene, and you wanted to know when two objects collide, because then you have to respond appropriately in rendering the scene. Well, to know whether two objects collide, you don't have to remember all of the details of the objects; you just have to keep track of their convex hulls. They collide exactly when their convex hulls intersect, and so that's one reason you might want to do that computation.

A very different flavor of geometric algorithms is the high-dimensional case. And here, I'm thinking of the number of dimensions as being in the thousands, say, or perhaps even quite a bit more than that. Now, why would you ever have points in thousands of dimensions? Well, imagine you were doing, say, information retrieval, and you had a bunch of documents. Now, documents don't really have geometry intrinsically, but it can be useful to imbue them with geometry. How do you do that? Well, imagine you make a list of all of the words in the world that you care about, so maybe, say, 10,000 words that you think are interesting. And for a given document, you just count the number of occurrences of each of these 10,000 interesting words in the document. Maybe lots of the counts are zero, the words don't occur at all, but maybe there's some word that occurs seven times in the document, so then you'd give it a seven in that coordinate. And boom, you've now represented each document as a point in 10,000-dimensional space, one coordinate for each of the interesting words, the value of the coordinate being the frequency of that word in that document. Now you can start asking questions like, given a new document, is it similar to any of the documents you've already been storing? And geometrically, that just corresponds to asking something called a nearest-neighbor query: given a bunch of points in a possibly high-dimensional space, and given a new point, which of the previous points is closest, say in Euclidean distance, to the new point you were just given? That would be a canonical question you would ask in high-dimensional computational geometry.
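Here's a toy sketch of that idea in Python: documents become word-count vectors over a small vocabulary, and a nearest-neighbor query is answered by a brute-force scan using Euclidean distance. The five-word vocabulary and the little corpus are invented for illustration; a real system would use thousands of dimensions and a cleverer data structure than a linear scan.

```python
# A toy sketch of the document-as-point idea: represent each document by word
# counts over a small vocabulary, then answer a nearest-neighbor query.
import math

vocabulary = ["algorithm", "graph", "flow", "matrix", "protein"]

def to_vector(document):
    # One coordinate per vocabulary word: its frequency in the document.
    words = document.lower().split()
    return [words.count(w) for w in vocabulary]

def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))

def nearest_neighbor(corpus, query):
    # Brute-force scan; returns the stored document closest to the query.
    qv = to_vector(query)
    return min(corpus, key=lambda doc: euclidean(to_vector(doc), qv))

corpus = [
    "max flow algorithm on a graph",
    "protein folding with a matrix model",
]
print(nearest_neighbor(corpus, "shortest path algorithm for a graph"))
# -> "max flow algorithm on a graph"
```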
In these Design and Analysis of Algorithms courses, we've been focusing on the most classical style of computation, where you're given an input up front, you do a computation, you produce some output, and then you take a bow and leave the stage. But if you think about computation in the real world, there are plenty of algorithms whose work is never finished, algorithms that in effect run forever. Two applications that we've mentioned in passing in this class are caching problems, so for example, if you're writing an algorithm to manage a cache, say in an operating system, that algorithm's work is never really done; it's going to have to manage the cache indefinitely. Similarly, you can think about routing in a network, say Internet routing. Again, that algorithm's work is never done. There are always going to be link failures, there are always going to be new nodes, there's always going to be new data to route. So it has to keep making routing decisions indefinitely, as long as the Internet exists. As you would hope, both the theory and the practice of understanding algorithms that run indefinitely, making decisions in real time, is based quite squarely on the lessons that we've learned in the classical computational paradigm here. But it is an interesting topic worthy of study in its own right.

A major concern of algorithms research in the 21st century is massive data sets, meaning data sets that are way too big to fit in the main memory of a single machine. One hot topic has been so-called streaming algorithms, that is, algorithms which have to, in effect, take a firehose of data being generated at a torrential pace and boil it down, using small space, into just a few accurate statistics. You might think, for example, about an algorithm running locally at a network router, with data packets flying through the router at a ridiculous pace, or an algorithm which is responsible for summarizing the data generated by a bunch of telescope observations. A final important topic, for which the current theory is actually quite immature, is understanding how to exploit parallelism in tackling massive data sets, for example using the distributed-systems approach exemplified by the MapReduce and Hadoop systems.
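To give a feel for the streaming setting, here is a small sketch of one classical small-space summary, the Misra-Gries frequent-elements algorithm (not an algorithm we covered in this course): it makes a single pass over the stream, keeps at most k counters, and any item occurring more than a 1/(k+1) fraction of the time is guaranteed to survive among the candidates. The packet stream below is made up.

```python
# A sketch of the Misra-Gries streaming algorithm: one pass, at most k counters,
# and every item with frequency > len(stream) / (k + 1) ends up among the
# surviving candidates (their counts are underestimates, not exact frequencies).
def misra_gries(stream, k):
    counters = {}
    for item in stream:
        if item in counters:
            counters[item] += 1
        elif len(counters) < k:
            counters[item] = 1
        else:
            # No room: decrement every counter and drop the ones that hit zero.
            for key in list(counters):
                counters[key] -= 1
                if counters[key] == 0:
                    del counters[key]
    return counters

# Example on a made-up "packet stream" of destination addresses.
stream = ["10.0.0.1", "10.0.0.2", "10.0.0.1", "10.0.0.3",
          "10.0.0.1", "10.0.0.1", "10.0.0.2", "10.0.0.1"]
print(misra_gries(stream, k=2))   # "10.0.0.1", the heavy hitter, survives
```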