[MUSIC] Next, let me mention, embarrassingly briefly, linear programming. Linear programming is a problem that, on the one hand, is efficiently solvable, both in theory and in practice. And on the other hand, it's very general: it includes as special cases bipartite matching, maximum flow, and tons of other stuff.

The general problem solved by linear programming is to optimize a linear function, maximize or minimize, it doesn't matter, over a set of linear constraints. Geometrically, that corresponds to finding the best point, the optimal point, in a feasible region that is representable as the intersection of half-spaces. So let me just draw a cartoon picture, which is going to be in the plane, although let me emphasize that the power of linear programming is that you can efficiently solve problems in massively higher dimensions: not 2 dimensions, but think millions of dimensions. So in the plane, the intersection of half-planes, assuming it's bounded, is just going to be a polygon, and you can think about optimizing a linear function as just trying to push as far as possible in some direction, subject to being inside the feasible region. For example, maybe you want to push as far to the northeast as possible, subject to being at one of these blue points, somewhere in this polygon.

For example, the maximum flow problem is easily encoded as a special case of linear programming. You use one dimension, that is, one decision variable, for each edge. The decision variable describes how much flow you route on a given edge. The linear objective function would be to maximize the sum of the flows on the edges going out of the source vertex. And then you would have linear constraints insisting that the amount of flow coming into a vertex is the same as the amount of flow going out of that vertex, and that the flow on each edge is between zero and the edge's capacity. And linear programming encodes not just maximum flow, but tons and tons of other problems as well.

Despite its generality, linear programming problems can be solved efficiently, both in theory and in practice. The systematic formulation and algorithmic solution of linear programs was pioneered by George Dantzig back in the 1940s. In particular, Dantzig invented what's called the simplex method, which remains to this day one of the most important practical methods for solving linear programs.
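To make that encoding concrete, here's a minimal sketch of a max-flow linear program in Python, using SciPy's linprog solver. The tiny five-edge graph, its capacities, and the variable ordering are invented for illustration; they're not from the lecture.

    # Max flow as a linear program, solved with SciPy's LP solver.
    # The five-edge graph and its capacities below are made-up data.
    from scipy.optimize import linprog

    edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
    capacity = [3, 2, 1, 2, 3]  # one capacity per edge, same order as edges

    # One decision variable per edge: x[i] = flow routed on edges[i].
    # Objective: maximize total flow out of the source s.
    # linprog minimizes, so we negate the coefficients.
    c = [-1 if u == "s" else 0 for (u, v) in edges]

    # Conservation constraints: flow in equals flow out at internal vertices.
    internal = ["a", "b"]
    A_eq = [[(v == w) - (u == w) for (u, v) in edges] for w in internal]
    b_eq = [0] * len(internal)

    # Capacity constraints become simple variable bounds: 0 <= x[i] <= cap.
    bounds = [(0, cap) for cap in capacity]

    result = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    print("max flow value:", -result.fun)  # 5.0 for this graph
    print("flow on each edge:", result.x)

Notice that the conservation constraints are exactly the "flow in equals flow out" conditions described above, while the capacities just become bounds on the individual variables.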
Have you ever heard that story about the student who walks in late to a class, sees what he thinks are homework problems written up on the chalkboard, and solves them, not knowing that they are in fact major open research questions? Well, maybe you thought that story was apocryphal; I assumed it was for many years. But it's not. It turns out it's true, and it's about George Dantzig. In 1939, he walked into his graduate statistics class late. The two major open questions in the field were written on the chalkboard, and he solved them within a few days. That wound up being his PhD dissertation, before he got interested in linear programming.

Now, the algorithms used to efficiently solve linear programs are pretty complicated, both the various versions of the simplex method and the so-called interior-point algorithms. So it may or may not be a good use of your time to learn more about how the guts of these algorithms work. But what most surely is a good use of your time is learning to be an educated client of these algorithms, that is, to get some practice formulating problems as linear programs and solving them using either open-source or commercial linear programming solvers. Those solvers are a very powerful black-box tool to have in your algorithmic toolbox.

In fact, there are black-box subroutines available to you that are strictly more powerful than linear programming. One generalization is convex programming. Linear functions are a special case of convex functions, and if you want to minimize a convex function, or equivalently maximize a concave function, subject to convex constraints, that again is a problem you can solve efficiently, both in theory, in terms of polynomial time, and in practice. Maybe the problem sizes you can reach aren't quite as big as with linear programming, but they're pretty close. A different generalization is integer linear programming. These are like linear programs, but where you add the extra constraint that certain decision variables have to take on integer values. So something like 1/2 as the value of a decision variable would be disallowed in an integer program. Now, these are in general no longer solvable efficiently in theory; it's easy to encode NP-complete problems, like, say, vertex cover, as special cases of integer programming. But there is quite a bit of advanced technology out there for solving integer programs, at least in special domains.
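To see what an integer program looks like in practice, here's a minimal sketch of the vertex cover encoding just mentioned, written against the open-source PuLP modeling library. The five-edge graph is invented for illustration.

    # Vertex cover as a 0/1 integer program, modeled with PuLP.
    import pulp

    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]  # made-up graph
    nodes = sorted({v for e in edges for v in e})

    prob = pulp.LpProblem("vertex_cover", pulp.LpMinimize)
    # One binary decision variable per vertex: 1 if it joins the cover.
    x = {v: pulp.LpVariable(f"x_{v}", cat="Binary") for v in nodes}

    prob += pulp.lpSum(x[v] for v in nodes)      # minimize the cover size
    for (u, v) in edges:
        prob += x[u] + x[v] >= 1                 # every edge must be covered

    prob.solve(pulp.PULP_CBC_CMD(msg=False))     # the CBC solver ships with PuLP
    print("cover:", [v for v in nodes if x[v].value() == 1])  # e.g. [1, 3]

Drop the integrality requirement (make the variables continuous in [0, 1]) and this becomes an ordinary linear program, which is exactly why LP solvers are the workhorse inside integer programming technology.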
So again, if you have an NP-complete problem and you really need to solve it, integer programming is a technique you're going to want to learn more about.

What are some other topics we haven't had time to discuss? Well, first of all, for many of the topics we have discussed, we've only scratched the surface. There's beautiful and useful stuff beyond what we've discussed in these classes, on topics ranging from data structures, for example with hashing and search trees, to graph algorithms, we've already mentioned matchings and flows, to approximation algorithms, like better heuristics for the travelling salesman problem.

A topic we've barely discussed at all is geometric algorithms, one exception being the divide-and-conquer algorithm for the closest pair problem that we discussed in Part 1. Roughly speaking, the study of geometric algorithms has two flavors: low-dimensional problems and high-dimensional problems. When I say low-dimensional, I mean probably 2 or 3 dimensions, so problems in the plane or problems in 3-space. A canonical computational problem here would be the convex hull problem, which means I give you a bunch of points, and you want to know which points are on the convex hull of the point set. So here's a way to think about the convex hull problem in the plane. Take a wooden board, a flat wooden board; that represents the plane. Now pound a bunch of nails into the wooden board; those represent the points in the plane. Now, to compute the convex hull, what you can do is take a rubber band, stretch it out really far so that it bounds all of the nails you pounded into the wooden board, and then let the rubber band go, and it's going to snap around the boundary nails. That's a way to compute the convex hull in two dimensions. And so the question is, how do you do that efficiently, given just a bunch of coordinates of points? The state of the art here is divide-and-conquer algorithms, very much in the spirit of the closest pair algorithm back in Part 1. So why would you care about computing convex hulls? Well, there are lots of reasons. But, you know, as one example, suppose you were doing some 3D graphics, and you had moving objects in your scene, and you wanted to know when two objects collide, because then you have to respond appropriately in rendering the scene. Well, to know whether two objects collide, you don't have to remember all of the details of the objects; you just have to keep track of their convex hulls. They collide exactly when their convex hulls intersect, and so that's one reason you might want to do that computation.
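Here's the convex hull problem in code form. The lecture points to divide-and-conquer algorithms; the sketch below instead uses Andrew's monotone chain method, a different O(n log n) approach, chosen only because it's short.

    # Convex hull of 2D points via Andrew's monotone chain, O(n log n).
    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def convex_hull(points):
        pts = sorted(set(points))            # lexicographic sort, drop duplicates
        if len(pts) <= 2:
            return pts
        lower, upper = [], []
        for p in pts:                        # build the lower hull, left to right
            while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
                lower.pop()
            lower.append(p)
        for p in reversed(pts):              # build the upper hull, right to left
            while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
                upper.pop()
            upper.append(p)
        return lower[:-1] + upper[:-1]       # endpoints are shared, drop them

    # The nails-in-a-board picture: the interior point is not on the hull.
    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))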
A very different flavor of geometric algorithms is the high-dimensional case. And here, I'm thinking of the number of dimensions as being in the thousands, say, or perhaps even quite a bit more than that. Now, why would you ever have points in thousands of dimensions? Well, imagine you were doing, say, information retrieval; imagine you had a bunch of documents. Now, documents don't really have geometry intrinsically, but it can be useful to imbue them with geometry. How do you do that? Well, imagine you make a list of all of the words that you care about, so maybe, say, 10,000 words that you think are interesting. And for a given document, you just count the number of occurrences of each of these 10,000 interesting words in the document. Maybe lots of them are zero, the words don't occur at all, but maybe there's some word that occurs seven times in the document; then you'd give it a seven in that coordinate. And boom, you've now represented each document as a point in 10,000-dimensional space, one coordinate for each of the interesting words, the value of the coordinate being the frequency of that word in that document. Now you can start asking questions like: given a new document, is it similar to any of the documents you've already been storing? Geometrically, that just corresponds to asking something called a nearest-neighbor query: given a bunch of points in a possibly high-dimensional space, and given a new point, which of the previous points is closest, say in Euclidean distance, to the new point you were just given? That would be a canonical question you would ask in high-dimensional computational geometry.
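To make the document-as-point idea concrete, here's a minimal sketch of bag-of-words vectors plus a brute-force nearest-neighbor query. The tiny vocabulary and documents are invented; a real system would use a vocabulary of thousands of words and a smarter index than a linear scan.

    import math

    # A five-word stand-in for the "10,000 interesting words".
    vocab = ["algorithm", "graph", "flow", "hull", "router"]

    def to_vector(text):
        # One coordinate per vocabulary word: its count in the document.
        words = text.lower().split()
        return [words.count(w) for w in vocab]

    def nearest_neighbor(vectors, query):
        # Brute-force nearest-neighbor query in Euclidean distance.
        return min(range(len(vectors)),
                   key=lambda i: math.dist(vectors[i], query))

    corpus = ["graph flow algorithm for routing flow",
              "convex hull algorithm in the plane"]
    vectors = [to_vector(doc) for doc in corpus]
    query = to_vector("maximum flow on a graph")
    print("closest document:", corpus[nearest_neighbor(vectors, query)])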
In these Design and Analysis of Algorithms courses, we've been focusing on the most classical style of computation, where you're given an input up front, you do a computation, you produce some output, and you take a bow and leave the stage. But if you think about computation in the real world, there are plenty of algorithms whose work is never finished, algorithms that in effect run forever. Two applications that we've mentioned in passing in this class are caching problems, so for example, if you're writing an algorithm to manage a cache, say in an operating system, that algorithm's work is never really done; it's going to have to manage the cache indefinitely. Similarly, you can think about routing in a network, say Internet routing. Again, that algorithm's work is never done. There are always going to be link failures, there are always going to be new nodes, there's always going to be new data to route. So it has to keep making routing decisions indefinitely, as long as the Internet exists. As you would hope, both the theory and the practice of understanding algorithms that run indefinitely, making decisions in real time, is based quite squarely on the lessons that we've learned in the classical computational paradigm here. But it is an interesting topic, worthy of study in its own right.

A major concern of algorithms research in the 21st century is massive data sets, meaning data sets that are way too big to fit in the main memory of a single machine. One hot topic has been so-called streaming algorithms, that is, algorithms which have to, in effect, take a firehose of data being generated at a torrential pace and boil it down, using small space, into just a few accurate statistics. You might think, for example, about an algorithm running locally at a network router, with data packets flying through the router at a ridiculous pace, or an algorithm which is responsible for summarizing the data generated by a bunch of telescopic observations. A final important topic, for which the current theory is actually quite immature, is understanding how to exploit parallelism in tackling massive data sets, for example, using the distributed systems approach exemplified by the MapReduce and Hadoop systems.
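To give a flavor of the streaming model described above, here's a minimal sketch of the classic Misra-Gries heavy-hitters algorithm, my choice of example rather than one from the lecture. It makes a single pass over the stream and keeps at most k - 1 counters, so its space does not grow with the length of the stream.

    # One-pass heavy hitters with at most k - 1 counters (Misra-Gries).
    # Any item occurring more than n/k times in a length-n stream is
    # guaranteed to survive in the returned dictionary.
    def misra_gries(stream, k):
        counters = {}
        for item in stream:
            if item in counters:
                counters[item] += 1
            elif len(counters) < k - 1:
                counters[item] = 1
            else:
                # Decrement every counter; drop the ones that reach zero.
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]
        return counters

    # Toy "packet stream": the dominant source 'a' survives the sketch.
    print(misra_gries(list("aabacaadaeafaag"), k=3))  # {'a': 6}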