1 00:00:00,000 --> 00:00:03,478 So in this set of lectures we'll be discussing the minimum cut problem in 2 00:00:03,478 --> 00:00:06,769 graphs and we'll be discussing the randomized contraction algorithm. A 3 00:00:06,769 --> 00:00:10,766 randomized algorithm is just so simple and elegant, it's almost impossible to believe 4 00:00:10,766 --> 00:00:14,488 that it could possibly work, but that's exactly what we'll be proving. So one way 5 00:00:14,488 --> 00:00:17,841 you can think about this set of lectures is as a segue of sorts between our 6 00:00:17,841 --> 00:00:21,194 discussion of randomization and our discussion of graphs. So we just finished 7 00:00:21,194 --> 00:00:24,895 talking about randomization in the context of sorting and searching. We'll pick it up 8 00:00:24,895 --> 00:00:28,465 again towards the end of the class when we discuss hashing, but while we're in the 9 00:00:28,465 --> 00:00:31,513 middle of randomization probability review, I want to give you another 10 00:00:31,513 --> 00:00:34,996 application of randomization in a totally different domain. In particular to the 11 00:00:34,996 --> 00:00:38,636 domain of graphs rather than to Sorting and searching. So that's one high-level 12 00:00:38,636 --> 00:00:42,269 goal of these lectures. A second one is we'll get our feet wet talking about 13 00:00:42,269 --> 00:00:45,950 graphs. And a lot of the next couple weeks, that's what we're gonna be talking 14 00:00:45,950 --> 00:00:49,535 about: fundamental graph primitives. So this will give us an excuse to start 15 00:00:49,535 --> 00:00:53,360 warming up with the vocabulary, some of the basic concepts of the graph, and what 16 00:00:53,360 --> 00:00:57,573 a graph algorithm looks like. Another perk, although it's not one of the main 17 00:00:57,573 --> 00:01:02,198 goals, but I wanna do, I do wanna point out this fact. Is that, at least, compared 18 00:01:02,198 --> 00:01:07,052 to most of the stuff that we're discussing in this class, this is a relatively recent 19 00:01:07,052 --> 00:01:11,449 algorithm, the contraction algorithm. By relatively recent, I mean, okay, it's 20 00:01:11,449 --> 00:01:16,974 twenty years old. Well at least that means most of us, I know not all of us, but most 21 00:01:16,974 --> 00:01:21,294 of us at least were born at the time that this algorithm was invented. And so just 22 00:01:21,294 --> 00:01:25,461 one quick digression. You know, in the intro course like this most of what we're 23 00:01:25,461 --> 00:01:29,578 gonna cover will be oldies but goodies, so from as much as 50 years ago. And while 24 00:01:29,578 --> 00:01:33,745 it's kind of amazing given how much the world and how much technology has changed 25 00:01:33,745 --> 00:01:37,557 over the past 50 years that idea in computer science from that long ago is 26 00:01:37,557 --> 00:01:41,826 still useful, they are. Okay. So it's just sort of an amazing thing about the stuff 27 00:01:41,826 --> 00:01:45,739 that the first generation of computer scientists figured out. Okay. It's still; 28 00:01:45,739 --> 00:01:49,701 it's still relevant to this day. That. Said, algorithm is still a vibrant field 29 00:01:49,701 --> 00:01:53,916 with lots of open questions and when I have an opportunity I'll try and give you 30 00:01:53,916 --> 00:01:58,339 glimpses of that fact. So I do wanna point out here that this is somewhat more reason 31 00:01:58,339 --> 00:02:02,637 algorithms that most of the other ones will see which dates back to the 90s. So 32 00:02:02,637 --> 00:02:07,578 let's talk about graphs. Fundamentally, what a graph does is represent pair wise 33 00:02:07,578 --> 00:02:12,519 relationships amongst a set of objects. So as such, a graph is going to have two 34 00:02:12,519 --> 00:02:16,806 ingredients. So first of all there's the objects that you're talking about and 35 00:02:16,806 --> 00:02:21,092 these have two very common names and you're just gonna have to know both of the 36 00:02:21,092 --> 00:02:25,111 names even though they're completely synonymous. The first name is vertices. 37 00:02:25,111 --> 00:02:30,340 [sound] So vertex is the singular, vertices is the plural. Also known 38 00:02:30,340 --> 00:02:36,003 interchangeably as nodes. I'll be using the notation, capital V for the set of, of 39 00:02:36,003 --> 00:02:40,669 vertices. So those are the objects. Now, we wanna represent pair wise 40 00:02:40,676 --> 00:02:46,349 relationships. So these pairs are gonna be called edges. It'll be noted by denuded by 41 00:02:46,349 --> 00:02:51,196 capital E and there's two flavors of graphs and both are really important, both 42 00:02:51,196 --> 00:02:56,043 come up all the time in application. So you should be aware of both kinds. So 43 00:02:56,043 --> 00:03:00,951 there is undirected graphs and directed graphs and that just depends on whether 44 00:03:00,951 --> 00:03:05,737 the edges themselves are undirected or directed. So edges can be undirected by 45 00:03:05,737 --> 00:03:10,646 which I mean this pair is unordered. There are just two verses of edge. The two n 46 00:03:10,646 --> 00:03:15,125 points say U and V and you don't distinguish one as the first and one as 47 00:03:15,125 --> 00:03:20,171 the second. Or edges can be directed, in which case you have a directed graph and 48 00:03:20,171 --> 00:03:24,937 here a pair is ordered. So you do have a notion of a first vertex or a first end 49 00:03:24,937 --> 00:03:29,823 point and the second vertex or second end point of an edge. Those are often called 50 00:03:29,823 --> 00:03:34,649 the tail and the head, respectively. And once in a while although I'll, I'll try to 51 00:03:34,649 --> 00:03:39,684 use, I'll not use this terminology, you hear directed edges called arcs. Now I 52 00:03:39,684 --> 00:03:44,388 think all of this is much clearer if I just draw some pictures indeed one used to 53 00:03:44,388 --> 00:03:49,370 call graphs dots and lines, the dots would refer to the vertices. So here's four dots 54 00:03:49,370 --> 00:03:54,875 or four vertices. And, the edges would be lines. So the way you denote one of these 55 00:03:54,875 --> 00:04:00,183 edges, is, you just draw a line between the two, end points of that edge, the two 56 00:04:00,183 --> 00:04:05,557 vertices that it corresponds to. So this is an undirected graph with four vertices 57 00:04:05,557 --> 00:04:10,668 and five edges. We could equally well have a directed version of this graph. So 58 00:04:10,668 --> 00:04:15,926 let's, still have four vertices and five edges. But to indicate that this is a 59 00:04:15,926 --> 00:04:21,142 directed graph, and that each edge has a first vertex and a second vertex, we're 60 00:04:21,142 --> 00:04:26,226 gonna add arrows to the lines. So the arrow points to the second vertex, or to 61 00:04:26,226 --> 00:04:31,705 the head of the, of the edge. So the first vertex is often called the [inaudible] of 62 00:04:31,705 --> 00:04:35,911 the edge. So, graphs are completely fundamental. They show up not just in 63 00:04:35,911 --> 00:04:40,466 computer science but in all kinds of different disciplines social sciences and, 64 00:04:40,466 --> 00:04:44,911 and biology being two prominent ones. So, let me just mention a couple of reasons 65 00:04:44,911 --> 00:04:49,411 you might use them just off the top of my head. But literally there's hundreds or 66 00:04:49,411 --> 00:04:53,411 thousands of others. So, a very literal example would be road networks. So, 67 00:04:53,411 --> 00:04:57,744 imagine you type in asking for driving directions from point A to point B and 68 00:04:57,744 --> 00:05:02,077 some web application or software or whatever computes a, a road, a route for 69 00:05:02,077 --> 00:05:06,148 you. What it's doing is it's manipulating some representation of a road network, 70 00:05:06,148 --> 00:05:10,244 which inevitably is going to be stored as a graph, where the vertices correspond to 71 00:05:10,244 --> 00:05:14,065 intersections, and the edges correspond to individual roads. The web Is often 72 00:05:14,065 --> 00:05:18,421 fruitfully thought of as a graph, a directed graph. So here the vertices are 73 00:05:18,421 --> 00:05:22,605 the individual web pages and edges correspond to hyperlinks. So the first 74 00:05:22,605 --> 00:05:27,133 vertex of an edge, the tail, is going to be the page that contains the hyperlink. 75 00:05:27,133 --> 00:05:31,890 The second vertex or the head of the edge is going to be what the hyperlink points 76 00:05:31,890 --> 00:05:36,840 to. So that's the web as a directed graph. Social networks are, quite naturally, 77 00:05:36,840 --> 00:05:42,431 represented as graphs. So here, the vertices correspond to the individuals. In 78 00:05:42,431 --> 00:05:45,766 the social network and the edges corresponds to relationships, say 79 00:05:45,766 --> 00:05:49,960 friendship links. I encourage to think about amongst the popular social networks 80 00:05:49,960 --> 00:05:53,952 these days, which ones are undirected graphs and which ones are directed graphs. 81 00:05:53,952 --> 00:05:58,522 We have some interesting examples of each of those. And often graphs are useful even 82 00:05:58,522 --> 00:06:03,472 when there isn't such an obvious network structure. So, just to mention one example 83 00:06:03,472 --> 00:06:07,693 let me just write down precedents constraints. So, to say what I mean you 84 00:06:07,693 --> 00:06:12,412 might think about you know, say you're a freshman in college and you're looking at 85 00:06:12,412 --> 00:06:16,853 your, your major is [inaudible] a computer science major. Now, you wanna know what 86 00:06:16,853 --> 00:06:21,239 courses to take in what order. And you could think the following graph, where 87 00:06:21,239 --> 00:06:25,291 each of the courses in your major corresponds to a vertex and you draw a 88 00:06:25,291 --> 00:06:29,677 directed edge from one, from course A to course B, if course A is a prerequisite 89 00:06:29,677 --> 00:06:33,967 for course B, that is, it has to be completed before you begin course B. 'Kay, 90 00:06:33,967 --> 00:06:39,321 so that's a way to represent dependencies. Sort of, a temporal ordering between pairs 91 00:06:39,321 --> 00:06:43,657 of, pairs of objects. Using a directed graph. So that's the basic language of 92 00:06:43,657 --> 00:06:47,988 graphs. Let me now talk about cuts in graphs, cuz again this set of lectures is 93 00:06:47,988 --> 00:06:52,556 gonna be about the so-called minimum cut problem. So the definition of a cut of a 94 00:06:52,556 --> 00:06:56,769 graph is very simple. It's just a grouping, a partition, of the vertices of 95 00:06:56,769 --> 00:07:00,982 the graph, into two groups, A and B. And both of those two groups should be 96 00:07:00,982 --> 00:07:05,277 non-empty. So to describe this in pictures, let me give you a cartoon of a 97 00:07:05,277 --> 00:07:10,057 cut in both the undirected and directed cases. So for an undirected graph, you can 98 00:07:10,057 --> 00:07:15,794 imagine drawing your two sets A and B. And once you've defined the two sets, A and B, 99 00:07:15,794 --> 00:07:21,087 the edges then fall into one of three categories. You've got edges with both of 100 00:07:21,087 --> 00:07:26,379 the end points in A. You've got edges with both of the end points in B. And then 101 00:07:26,379 --> 00:07:31,544 you've got edges with one end point in A, and one endpoint in B. So that's 102 00:07:31,544 --> 00:07:36,680 generically what the picture graph looks like viewed through the lens of a 103 00:07:36,680 --> 00:07:41,541 particular cut AB. The picture for directed graphs is similar, you would 104 00:07:41,541 --> 00:07:46,950 again have an A, and you'd again have a B, you have directed edges with both end 105 00:07:46,950 --> 00:07:53,203 points in A, directed edges with both end points in B. And now you have actually two 106 00:07:53,203 --> 00:07:59,009 further categories. So, you have edges who cross the cut from left to right, that is, 107 00:07:59,009 --> 00:08:04,743 whose tail vertex is in a, and whose head vertex in b. And you can also have edges 108 00:08:04,743 --> 00:08:10,194 which cross the cut in the opposite direction that is their tail is in b and 109 00:08:10,194 --> 00:08:16,260 their head is in a. Usually when we talk about cuts we're gonna be concerned with 110 00:08:16,260 --> 00:08:21,790 how many edges cross a given cut. And by that I mean the following the crossing 111 00:08:21,790 --> 00:08:26,644 edges of a cut a, b. Are those that satisfy the following property. So in the 112 00:08:26,644 --> 00:08:30,643 [inaudible] case it's exactly what you think it would be. One of the endpoints is 113 00:08:30,643 --> 00:08:35,030 in A, the other endpoint is in B, that's what it means to cross the cut. Now in the 114 00:08:35,030 --> 00:08:39,189 directed case there's a number of reasonable definitions you could propose 115 00:08:39,189 --> 00:08:43,903 about which edges cross the cut. Typically and in this course we're going to focus on 116 00:08:43,903 --> 00:08:48,339 the case were we only think about edges that cross the cut from the left to the 117 00:08:48,339 --> 00:08:52,720 right. And we ignore edges that cross from the right to the left. So that is the 118 00:08:52,720 --> 00:08:57,209 edges that cross the cut are those with tail in A and head in B. So referring to 119 00:08:57,209 --> 00:09:01,790 our two pictures, our two cartoons of cuts, for the undirected one, all three of 120 00:09:01,790 --> 00:09:06,665 these blue edges would be edges crossing the cut AB, cuz they're the ones that have 121 00:09:06,665 --> 00:09:11,304 one end point on the left side, and one end point on the right side. Now for the 122 00:09:11,304 --> 00:09:16,002 directed one, we only have two crossing edges. So the two that cross from left to 123 00:09:16,002 --> 00:09:20,525 right, with tail in A and head in B. The one that's crossing backward does not 124 00:09:20,525 --> 00:09:25,312 contribute. We don't count it as a crossing edge of the cut. So the next quiz 125 00:09:25,312 --> 00:09:30,993 is just a sanity check that you've absorbed the definition of a cut of a 126 00:09:30,993 --> 00:09:36,883 graph. Right, so the answer to this quiz is the third option. Recall what is the 127 00:09:36,883 --> 00:09:41,531 definition of a cut? It's just the way the group, the vertices into two sets A and B 128 00:09:41,699 --> 00:09:46,178 both should also be non-empty. So we have N vertices and essentially we have one 129 00:09:46,178 --> 00:09:50,602 binary degree of freedom for each. For each vertex we can decide whether or not 130 00:09:50,602 --> 00:09:55,137 it goes in set A or goes in set B. So two choices for each of the N vertices. That 131 00:09:55,137 --> 00:09:59,617 gives us a two to the N possible choices. Two to the N possible cuts overall. Now 132 00:09:59,617 --> 00:10:04,320 that's slightly incorrect because recall that a cut can't have a non-empty set A or 133 00:10:04,320 --> 00:10:08,464 a non-empty set B. So those are two of the two to the N options which were 134 00:10:08,464 --> 00:10:12,592 disallowed. So strictly speaking, the number is two to the end, minus two. But, 135 00:10:12,592 --> 00:10:17,042 two to the end is certainly the closest answer of the four provided. Now the 136 00:10:17,042 --> 00:10:22,840 minimum cut problem is exactly what you'd think it would be. I'd give you as input a 137 00:10:22,840 --> 00:10:28,292 graph, and amongst these exponentially many cuts, I want you to identify one for 138 00:10:28,292 --> 00:10:33,812 me with the fewest number of crossing edges. So a few quick comments so first of 139 00:10:33,812 --> 00:10:39,468 all the name for this cut is a min cut a min cut is one with the fewest number of 140 00:10:39,468 --> 00:10:45,192 crossing edges. Secondly to clarify I am going to allow in the input what is called 141 00:10:45,192 --> 00:10:50,710 parallel edges. There will be lots of applications where parallel edges are sort 142 00:10:50,710 --> 00:10:56,366 of pointless but for minimum cuts it's natural to allow parallel edges that means 143 00:10:56,366 --> 00:11:01,497 you have two edges that correspond Finally, the more seasoned programmers 144 00:11:01,497 --> 00:11:05,399 among you are probably wondering what I mean by, you're given a graph as inputs. 145 00:11:05,399 --> 00:11:09,351 You might be wondering about how, exactly, that's represented. So the next video's 146 00:11:09,351 --> 00:11:13,254 gonna discuss exactly that. The popular ways of representing graphs, and, and how 147 00:11:13,254 --> 00:11:16,959 we're gonna usually do it in this course, specifically via what's called an 148 00:11:16,959 --> 00:11:22,692 adjacency list. Okay so I want to make sure that everybody understands exactly 149 00:11:22,692 --> 00:11:27,800 what the minimum cut problem is asking, so let me draw for you a particular graph. 150 00:11:28,240 --> 00:11:37,187 With A vertices. And quite a few edges. And what I want you to answer is what is 151 00:11:37,187 --> 00:11:42,430 the min-cut value in this graph, that is how many edged cross the man on cut, the 152 00:11:42,430 --> 00:11:50,324 cut with the fewest number of crossing edges. Alright, so the correct answer is 153 00:11:50,324 --> 00:11:56,220 the second choice. The [inaudible] cut value is two, and the cut which shows that 154 00:11:56,220 --> 00:12:01,690 is just to break it, basically, in half, into the two halves. In this case, there 155 00:12:01,690 --> 00:12:07,257 are only two crossing edges. This one and this one. And I'll leave it for you to 156 00:12:07,257 --> 00:12:12,044 check that there's no other edge that has as few as two edges. Now in this case, 157 00:12:12,044 --> 00:12:16,539 we've got a very balanced split when we took the minimum cut. In general, that 158 00:12:16,539 --> 00:12:21,384 need not be true. Sometimes even a single vertex can define the minimum cut of a 159 00:12:21,384 --> 00:12:27,290 graph. And I encourage you to think about a concrete example that proves that. So 160 00:12:27,290 --> 00:12:31,303 why should you care about computing the minimum cut? Well this is one problem 161 00:12:31,303 --> 00:12:35,466 amongst a genre called graph partitioning where you're given a graph and you wanna 162 00:12:35,466 --> 00:12:39,479 break it into two or more pieces. And these kinds of graph partitioning problems 163 00:12:39,479 --> 00:12:43,442 come up all the time in a surprisingly diverse array of applications. So let me 164 00:12:43,442 --> 00:12:47,606 just mention a couple at a high level. So one very obvious one when you have a, when 165 00:12:47,606 --> 00:12:51,568 your graph is representing a physical network. What identifies something like a 166 00:12:51,568 --> 00:12:55,641 [inaudible] allows you to do is identify weaknesses in your network. Then perhaps 167 00:12:55,641 --> 00:12:59,662 it's your own network and you want to understand where you want to soup up the 168 00:12:59,662 --> 00:13:03,683 infrastructure. Because it's in some sense a hot spot of your network or a weak 169 00:13:03,683 --> 00:13:07,601 point. Or maybe there's someone else's network and you want to know where the 170 00:13:07,601 --> 00:13:11,724 weak spot is in their network. In fact there is some declassified documents about 171 00:13:11,724 --> 00:13:15,796 mm, about fifteen years ago or so which showed that the United States and Soviet 172 00:13:15,796 --> 00:13:19,664 Union Militaries, back during the Cold War, were actually quite interested in 173 00:13:19,664 --> 00:13:23,379 computing minimum cuts. Cuz they were looking for things like, for example, 174 00:13:23,379 --> 00:13:27,552 what's the most efficient way to disrupt the other Country's transportation 175 00:13:27,552 --> 00:13:33,481 network. Another application which is a big deal in social network analyst these 176 00:13:33,481 --> 00:13:38,893 days, is the idea of community detection. So the question is amongst a huge graph, 177 00:13:38,893 --> 00:13:42,809 like say the graph of everybody whose on Facebook or something like that, how can 178 00:13:42,809 --> 00:13:46,579 you identify small pockets of people that seem tightly knit. That seem closely 179 00:13:46,579 --> 00:13:50,785 related from which you like to infer that they're a community of some sort. You know 180 00:13:50,785 --> 00:13:54,411 maybe they all go to the same school, maybe they all have the same interest, 181 00:13:54,411 --> 00:13:58,258 maybe they're part of the same biological family Whatever. Now, it's, to some 182 00:13:58,258 --> 00:14:02,354 degree, still an open question, how to best define communities and social 183 00:14:02,354 --> 00:14:06,904 networks. But as a quick and dirty sort of first order heuristic, you can imagine 184 00:14:06,904 --> 00:14:11,113 looking for small regions, which, on the one hand, are highly interconnected 185 00:14:11,113 --> 00:14:15,529 amongst themselves. But quite weakly connected. To the rest of the graph, so 186 00:14:15,529 --> 00:14:20,734 subroutines like the minimum cut problem can be used for identifying these small 187 00:14:20,734 --> 00:14:26,004 densely interconnected, but then weakly connected to everybody else pockets of the 188 00:14:26,004 --> 00:14:31,335 graph. Finally cut problems are also used a lot in, in vision. So for example, one 189 00:14:31,335 --> 00:14:36,792 way you can use them is in what's called image segmentation. So here what's going 190 00:14:36,792 --> 00:14:41,166 on is your given as inputs two D array where each entry is a pixel from some 191 00:14:41,166 --> 00:14:45,766 image. And there's a graph which is very natural to define given a two D array of 192 00:14:45,766 --> 00:14:50,197 pixels, namely you have an edge between two pixels if their neighboring. So for 193 00:14:50,197 --> 00:14:54,456 two pixels that are immediately next to each other, left to right or top to 194 00:14:54,456 --> 00:14:59,187 bottom. You put an edge there. So that gives you, what's called a grid graph. And 195 00:14:59,187 --> 00:15:03,743 now unlike the basic minimum cut problem that we're talking about here. In image 196 00:15:03,743 --> 00:15:08,413 segmentation it's most natural to use edge weights, where the weight of an edge is 197 00:15:08,413 --> 00:15:13,140 basically how likely you expect those two pixels to be coming from a common object. 198 00:15:13,140 --> 00:15:17,103 Why not? You expect two neighboring pixels to come from the same object. Well, 199 00:15:17,103 --> 00:15:21,172 perhaps their color maps are almost exactly the same, and you just expect that 200 00:15:21,172 --> 00:15:25,084 they're part of the same thing. So once you've defined this grid graph with 201 00:15:25,084 --> 00:15:29,413 suitable edge weights, now you run a graph partitioning or mid-cut type sub-routine 202 00:15:29,413 --> 00:15:33,533 and the hope is that the cut that it identifies rips off one of the contiguous 203 00:15:33,533 --> 00:15:38,071 objects in the picture. And you do that a few times and you get the major objects in 204 00:15:38,071 --> 00:15:42,427 the given picture. So this list is by no means exhaustive of the applications of 205 00:15:42,427 --> 00:15:46,230 ming cut and graph partitioning sub routines, but I hope it serves as 206 00:15:46,396 --> 00:15:50,420 sufficient motivation to watch the rest of the lectures in this sequence.