[MUSIC] Next, let me mention, embarrassingly briefly, linear programming. Linear programming is a problem that, on the one hand, is efficiently solvable, both in theory and in practice. And on the other hand, it's very general: it includes as special cases bipartite matching, maximum flow, and tons of other stuff.

The general problem solved by linear programming is to optimize a linear function, maximize or minimize, it doesn't matter, over a set of linear constraints. Geometrically, that corresponds to finding the best point, the optimal point, in a feasible region that is representable as the intersection of half-spaces. So let me just draw a cartoon picture, which is going to be in the plane, although let me emphasize that the power of linear programming is that you can efficiently solve problems in massively higher dimensions: not 2 dimensions, but think millions of dimensions. So in the plane, the intersection of half-planes, assuming it's bounded, is just going to be a polygon, and you can think about optimizing a linear function as just trying to push as far as possible in some direction, subject to being inside the feasible region. For example, maybe you want to push as far to the northeast as possible, subject to being at one of these blue points, somewhere in this polygon.

For example, the maximum flow problem is easily encoded as a special case of linear programming. You use one dimension, that is, one decision variable, for each edge. The decision variable describes how much flow you route on a given edge. The linear objective function would be to maximize the sum of the flows on the edges going out of the source vertex. And then you would have linear constraints insisting that the amount of flow coming into a vertex is the same as the amount of flow going out of that vertex, and that the flow on each edge is between zero and the edge's capacity. And linear programming encodes not just maximum flow, but tons and tons of other problems as well.

Despite its generality, linear programming problems can be solved efficiently, both in theory and in practice. The systematic formulation and algorithmic solution of linear programs was pioneered by George Dantzig back in the 1940s. In particular, Dantzig invented what's called the simplex method, which remains to this day one of the most important practical methods for solving linear programs.
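To make that encoding concrete, here's a minimal sketch of a max-flow linear program in Python, using SciPy's linprog solver. The tiny five-edge graph, its capacities, and the variable ordering are invented for illustration; they're not from the lecture.

    # Max flow as a linear program, solved with SciPy's LP solver.
    # The five-edge graph and its capacities below are made-up data.
    from scipy.optimize import linprog

    edges = [("s", "a"), ("s", "b"), ("a", "b"), ("a", "t"), ("b", "t")]
    capacity = [3, 2, 1, 2, 3]  # one capacity per edge, same order as edges

    # One decision variable per edge: x[i] = flow routed on edges[i].
    # Objective: maximize total flow out of the source s.
    # linprog minimizes, so we negate the coefficients.
    c = [-1 if u == "s" else 0 for (u, v) in edges]

    # Conservation constraints: flow in equals flow out at internal vertices.
    internal = ["a", "b"]
    A_eq = [[(v == w) - (u == w) for (u, v) in edges] for w in internal]
    b_eq = [0] * len(internal)

    # Capacity constraints become simple variable bounds: 0 <= x[i] <= cap.
    bounds = [(0, cap) for cap in capacity]

    result = linprog(c, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
    print("max flow value:", -result.fun)  # 5.0 for this graph
    print("flow on each edge:", result.x)

Notice that the conservation constraints are exactly the "flow in equals flow out" conditions described above, while the capacities just become bounds on the individual variables.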
Have you ever heard that story about the student who walks in late to a class, sees what he thinks are homework problems written up on the chalkboard, and solves them, not knowing that they are in fact major open research questions? Well, maybe you thought that story was apocryphal; I assumed it was for many years. But it's not. It turns out it's true, and it's about George Dantzig. In 1939, he walked into his graduate statistics class late. The two major open questions in the field were written on the chalkboard, and he solved them within a few days. That wound up being his PhD dissertation, before he got interested in linear programming.

Now, the algorithms used to efficiently solve linear programs are pretty complicated, both the various versions of the simplex method and the so-called interior-point algorithms. So it may or may not be a good use of your time to learn more about how the guts of these algorithms work. But what most surely is a good use of your time is learning to be an educated client of these algorithms, that is, to get some practice formulating problems as linear programs and solving them using either open-source or commercial linear programming solvers. Those solvers are a very powerful black-box tool to have in your algorithmic toolbox.

In fact, there are black-box subroutines available to you that are strictly more powerful than linear programming. One generalization is convex programming. Linear functions are a special case of convex functions, and if you want to minimize a convex function, or equivalently maximize a concave function, subject to convex constraints, that again is a problem you can solve efficiently, both in theory, in terms of polynomial time, and in practice. Maybe the problem sizes you can reach aren't quite as big as with linear programming, but they're pretty close. A different generalization is integer linear programming. These are like linear programs, but where you add the extra constraint that certain decision variables have to take on integer values. So something like 1/2 as the value of a decision variable would be disallowed in an integer program. Now, these are in general no longer solvable efficiently in theory; it's easy to encode NP-complete problems, like, say, vertex cover, as special cases of integer programming. But there is quite a bit of advanced technology out there for solving integer programs, at least in special domains.
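To see what an integer program looks like in practice, here's a minimal sketch of the vertex cover encoding just mentioned, written against the open-source PuLP modeling library. The five-edge graph is invented for illustration.

    # Vertex cover as a 0/1 integer program, modeled with PuLP.
    import pulp

    edges = [(0, 1), (1, 2), (2, 3), (3, 0), (1, 3)]  # made-up graph
    nodes = sorted({v for e in edges for v in e})

    prob = pulp.LpProblem("vertex_cover", pulp.LpMinimize)
    # One binary decision variable per vertex: 1 if it joins the cover.
    x = {v: pulp.LpVariable(f"x_{v}", cat="Binary") for v in nodes}

    prob += pulp.lpSum(x[v] for v in nodes)      # minimize the cover size
    for (u, v) in edges:
        prob += x[u] + x[v] >= 1                 # every edge must be covered

    prob.solve(pulp.PULP_CBC_CMD(msg=False))     # the CBC solver ships with PuLP
    print("cover:", [v for v in nodes if x[v].value() == 1])  # e.g. [1, 3]

Drop the integrality requirement (make the variables continuous in [0, 1]) and this becomes an ordinary linear program, which is exactly why LP solvers are the workhorse inside integer programming technology.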
So again, if you have an NP-complete problem and you really need to solve it, integer programming is a technique you're going to want to learn more about.

What are some other topics we haven't had time to discuss? Well, first of all, for many of the topics we have discussed, we've only scratched the surface. There's beautiful and useful stuff beyond what we've discussed in these classes, on topics ranging from data structures, for example with hashing and search trees, to graph algorithms, we've already mentioned matchings and flows, to approximation algorithms, like better heuristics for the travelling salesman problem.

A topic we've barely discussed at all is geometric algorithms, one exception being the divide-and-conquer algorithm for the closest pair problem that we discussed in Part 1. Roughly speaking, the study of geometric algorithms has two flavors: low-dimensional problems and high-dimensional problems. When I say low-dimensional, I mean probably 2 or 3 dimensions, so problems in the plane or problems in 3-space. A canonical computational problem here would be the convex hull problem, which means I give you a bunch of points, and you want to know which points are on the convex hull of the point set. So here's a way to think about the convex hull problem in the plane. Take a wooden board, a flat wooden board; that represents the plane. Now pound a bunch of nails into the wooden board; those represent the points in the plane. Now, to compute the convex hull, what you can do is take a rubber band, stretch it out really far so that it bounds all of the nails you pounded into the wooden board, and then let the rubber band go, and it's going to snap around the boundary nails. That's a way to compute the convex hull in two dimensions. And so the question is, how do you do that efficiently, given just a bunch of coordinates of points? The state of the art here is divide-and-conquer algorithms, very much in the spirit of the closest pair algorithm back in Part 1. So why would you care about computing convex hulls? Well, there are lots of reasons. But, you know, as one example, suppose you were doing some 3D graphics, and you had moving objects in your scene, and you wanted to know when two objects collide, because then you have to respond appropriately in rendering the scene. Well, to know whether two objects collide, you don't have to remember all of the details of the objects; you just have to keep track of their convex hulls. They collide exactly when their convex hulls intersect, and so that's one reason you might want to do that computation.
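Here's the convex hull problem in code form. The lecture points to divide-and-conquer algorithms; the sketch below instead uses Andrew's monotone chain method, a different O(n log n) approach, chosen only because it's short.

    # Convex hull of 2D points via Andrew's monotone chain, O(n log n).
    def cross(o, a, b):
        # z-component of (a - o) x (b - o); positive means a left turn
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def convex_hull(points):
        pts = sorted(set(points))            # lexicographic sort, drop duplicates
        if len(pts) <= 2:
            return pts
        lower, upper = [], []
        for p in pts:                        # build the lower hull, left to right
            while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
                lower.pop()
            lower.append(p)
        for p in reversed(pts):              # build the upper hull, right to left
            while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
                upper.pop()
            upper.append(p)
        return lower[:-1] + upper[:-1]       # endpoints are shared, drop them

    # The nails-in-a-board picture: the interior point is not on the hull.
    print(convex_hull([(0, 0), (2, 0), (2, 2), (0, 2), (1, 1)]))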
A very different flavor of geometric algorithms is the high-dimensional case. And here, I'm thinking of the number of dimensions as being in the thousands, say, or perhaps even quite a bit more than that. Now, why would you ever have points in thousands of dimensions? Well, imagine you were doing, say, information retrieval; imagine you had a bunch of documents. Now, documents don't really have geometry intrinsically, but it can be useful to imbue them with geometry. How do you do that? Well, imagine you make a list of all of the words that you care about, so maybe, say, 10,000 words that you think are interesting. And for a given document, you just count the number of occurrences of each of these 10,000 interesting words in the document. Maybe lots of them are zero, the words don't occur at all, but maybe there's some word that occurs seven times in the document; then you'd give it a seven in that coordinate. And boom, you've now represented each document as a point in 10,000-dimensional space, one coordinate for each of the interesting words, the value of the coordinate being the frequency of that word in that document. Now you can start asking questions like: given a new document, is it similar to any of the documents you've already been storing? Geometrically, that just corresponds to asking something called a nearest-neighbor query: given a bunch of points in a possibly high-dimensional space, and given a new point, which of the previous points is closest, say in Euclidean distance, to the new point you were just given? That would be a canonical question you would ask in high-dimensional computational geometry.
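To make the document-as-point idea concrete, here's a minimal sketch of bag-of-words vectors plus a brute-force nearest-neighbor query. The tiny vocabulary and documents are invented; a real system would use a vocabulary of thousands of words and a smarter index than a linear scan.

    import math

    # A five-word stand-in for the "10,000 interesting words".
    vocab = ["algorithm", "graph", "flow", "hull", "router"]

    def to_vector(text):
        # One coordinate per vocabulary word: its count in the document.
        words = text.lower().split()
        return [words.count(w) for w in vocab]

    def nearest_neighbor(vectors, query):
        # Brute-force nearest-neighbor query in Euclidean distance.
        return min(range(len(vectors)),
                   key=lambda i: math.dist(vectors[i], query))

    corpus = ["graph flow algorithm for routing flow",
              "convex hull algorithm in the plane"]
    vectors = [to_vector(doc) for doc in corpus]
    query = to_vector("maximum flow on a graph")
    print("closest document:", corpus[nearest_neighbor(vectors, query)])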
In these Design and Analysis of Algorithms courses, we've been focusing on the most classical style of computation, where you're given an input up front, you do a computation, you produce some output, and you take a bow and leave the stage. But if you think about computation in the real world, there are plenty of algorithms whose work is never finished, algorithms that in effect run forever. Two applications that we've mentioned in passing in this class are caching problems, so for example, if you're writing an algorithm to manage a cache, say in an operating system, that algorithm's work is never really done; it's going to have to manage the cache indefinitely. Similarly, you can think about routing in a network, say Internet routing. Again, that algorithm's work is never done. There are always going to be link failures, there are always going to be new nodes, there's always going to be new data to route. So it has to keep making routing decisions indefinitely, as long as the Internet exists. As you would hope, both the theory and the practice of understanding algorithms that run indefinitely, making decisions in real time, is based quite squarely on the lessons that we've learned in the classical computational paradigm here. But it is an interesting topic, worthy of study in its own right.

A major concern of algorithms research in the 21st century is massive data sets, meaning data sets that are way too big to fit in the main memory of a single machine. One hot topic has been so-called streaming algorithms, that is, algorithms which have to, in effect, take a firehose of data being generated at a torrential pace and boil it down, using small space, into just a few accurate statistics. You might think, for example, about an algorithm running locally at a network router, with data packets flying through the router at a ridiculous pace, or an algorithm which is responsible for summarizing the data generated by a bunch of telescopic observations. A final important topic, for which the current theory is actually quite immature, is understanding how to exploit parallelism in tackling massive data sets, for example, using the distributed systems approach exemplified by the MapReduce and Hadoop systems.
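To give a flavor of the streaming model described above, here's a minimal sketch of the classic Misra-Gries heavy-hitters algorithm, my choice of example rather than one from the lecture. It makes a single pass over the stream and keeps at most k - 1 counters, so its space does not grow with the length of the stream.

    # One-pass heavy hitters with at most k - 1 counters (Misra-Gries).
    # Any item occurring more than n/k times in a length-n stream is
    # guaranteed to survive in the returned dictionary.
    def misra_gries(stream, k):
        counters = {}
        for item in stream:
            if item in counters:
                counters[item] += 1
            elif len(counters) < k - 1:
                counters[item] = 1
            else:
                # Decrement every counter; drop the ones that reach zero.
                for key in list(counters):
                    counters[key] -= 1
                    if counters[key] == 0:
                        del counters[key]
        return counters

    # Toy "packet stream": the dominant source 'a' survives the sketch.
    print(misra_gries(list("aabacaadaeafaag"), k=3))  # {'a': 6}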