1 00:00:03,200 --> 00:00:06,324 Welcome back. Our topic today is algorithms for 2 00:00:06,324 --> 00:00:11,774 processing undirected graphs which are a model that are widely used for many, many 3 00:00:11,774 --> 00:00:15,364 applications. We'll see plenty of examples and then 4 00:00:15,563 --> 00:00:20,881 we'll look at three fundamental algorithms for processing undirected graphs and 5 00:00:20,881 --> 00:00:26,065 consider some of the challenges of developing algorithms for, for this kind 6 00:00:26,065 --> 00:00:30,799 of structure. So as introduction we'll take a look at 7 00:00:31,051 --> 00:00:35,510 the basic ideas behind undirected graphs and applications. 8 00:00:35,510 --> 00:00:40,357 What is an undirected draft? It's the, the definition is very simple. 9 00:00:40,357 --> 00:00:44,120 It's a set a verticies connected pairwise by edges. 10 00:00:44,318 --> 00:00:49,550 So this is an example of an undirected graph that describes the Paris Metro. 11 00:00:49,748 --> 00:00:55,510 You've got stations and if there's a rail line between the stations there's an edge. 12 00:00:55,510 --> 00:00:58,952 So why do we study graphs and graph algorithms? 13 00:00:58,952 --> 00:01:04,530 Well there are literally thousands of practical applications where graphs are an 14 00:01:04,530 --> 00:01:09,625 appropriate models and we'll take a look at a few others in just a minute. 15 00:01:09,625 --> 00:01:14,790 Because there are so many applications, there's literally hundreds of graph 16 00:01:14,790 --> 00:01:20,505 algorithms known and these are sometimes quite sophisticated, as we'll see and also 17 00:01:20,505 --> 00:01:25,050 very important in understanding what's going on in an application. 18 00:01:25,252 --> 00:01:30,994 Even without the applications a graph is really an interesting and broadly useful 19 00:01:30,994 --> 00:01:36,870 abstraction to try and understand that's so simple to describe, but leads to quite 20 00:01:36,870 --> 00:01:40,045 complex and difficult to understand structures. 21 00:01:40,248 --> 00:01:45,584 In a general graph theory and graph algorithms is a very challenging branch of 22 00:01:45,584 --> 00:01:50,110 computer science and discreet math that we're just introducing now. 23 00:01:50,110 --> 00:01:55,772 So here's an example of a graph, In, in genetics or in genomics, 24 00:01:55,772 --> 00:02:03,547 Where if the network where the nodes are proteins and the edges are interactions 25 00:02:03,547 --> 00:02:09,305 among the proteins. And genomicists are trying to understand 26 00:02:09,305 --> 00:02:15,928 how these biological processes work. Need to understand the nature of 27 00:02:15,928 --> 00:02:20,712 connections like this. Here's another example, the internet 28 00:02:20,712 --> 00:02:27,466 itself, where, every node is a computer and every edge is or, or a node, every, a 29 00:02:27,466 --> 00:02:32,513 computer or a communications device, and every edge is connections. 30 00:02:32,810 --> 00:02:38,526 And of course, the Internet is driven by loops and bounds, so this is a huge craft, 31 00:02:38,748 --> 00:02:44,390 in this lot of needs, lot of interest understanding, properties of this craft. 32 00:02:44,728 --> 00:02:53,310 This is a, social graph having to do the way science is carried out. 33 00:02:53,649 --> 00:03:03,511 So it's the, the nodes are and scientist websites and the edges or, clicks 34 00:03:03,511 --> 00:03:08,357 connecting one to another. This one shows how scientists in different 35 00:03:08,357 --> 00:03:12,872 fields are interacting. And again interesting and important to 36 00:03:12,872 --> 00:03:18,316 understand properties of this graph. You're certainly familiar with Facebook. 37 00:03:18,515 --> 00:03:22,100 That's one of the hugest one of the biggest graphs. 38 00:03:22,100 --> 00:03:25,552 It's absolutely huge, And it's always changing. 39 00:03:25,552 --> 00:03:29,470 It's very dynamic and people want to understand its property. 40 00:03:29,714 --> 00:03:34,920 This is an example of a graph, that was used, in litigation, 41 00:03:35,165 --> 00:03:41,836 Where people were trying to understand, exactly, precisely, the communications 42 00:03:41,836 --> 00:03:48,263 patterns, in a particular corporate culture, that was of great interest to 43 00:03:48,263 --> 00:03:53,043 many people. Another similar example, this is people 44 00:03:53,043 --> 00:03:56,530 lobbying the FCC and how are they connected. 45 00:03:56,768 --> 00:04:03,660 So from the breadth and variety of these examples, you can see that number one. 46 00:04:03,663 --> 00:04:08,814 Graphs are very general model, and number two, graphs can be huge. 47 00:04:09,052 --> 00:04:16,184 This is another one showing the Framingham heart study and connections among people 48 00:04:16,422 --> 00:04:19,830 in the study and how it relates to obesity. 49 00:04:20,050 --> 00:04:26,654 So those examples in this list shows that it's graphs are very flexible and very 50 00:04:26,654 --> 00:04:32,818 dynamic model and that the graphs involved can be huge they could have complex 51 00:04:32,818 --> 00:04:39,056 properties. And so our challenge is to try to figure out efficient algorithms that 52 00:04:39,056 --> 00:04:43,972 can give us some insight into the structure of graphs like this. 53 00:04:43,972 --> 00:04:49,770 That's what we're going to be talking about for the rest of this lecture. 54 00:04:49,998 --> 00:04:57,308 So now back to a starting point which is we need some simple and clear and specific 55 00:04:57,308 --> 00:05:01,952 definitions about basic terms that we're talking about. 56 00:05:02,181 --> 00:05:08,653 And then we'll look at algorithms that work for a small example but also the same 57 00:05:08,653 --> 00:05:14,440 algorithms do what we need for big graphs of the type we just considered. 58 00:05:14,661 --> 00:05:20,263 So this is the terminology for that we'll use for graph processing. 59 00:05:20,484 --> 00:05:26,603 So we have a vertex which is a black dot in this graph and an edge which is black 60 00:05:26,603 --> 00:05:31,763 line connecting two vertices. And then a few more concepts that are 61 00:05:31,763 --> 00:05:36,408 important in many applications. So one is the idea of the path. 62 00:05:36,629 --> 00:05:40,610 That's just a sequence of vertices connected by edges. 63 00:05:40,804 --> 00:05:44,764 And the idea of a cycle, Which is a path who's first and last 64 00:05:44,764 --> 00:05:48,789 vertices are the same. So over on the left, you can see a cycle 65 00:05:48,789 --> 00:05:52,165 of length five, that connects these five vertices. 66 00:05:52,944 --> 00:05:56,385 And wherever you start, you go back to the same place. 67 00:05:56,580 --> 00:06:02,023 So we say the two vertices are connected if there's a path between them. 68 00:06:02,023 --> 00:06:07,618 And then that definition leads us to the idea of a graph dividing up into 69 00:06:07,618 --> 00:06:12,382 connective components, Which is subsets of the graph where each 70 00:06:12,382 --> 00:06:17,372 pair of vertices is connected. And, so one of the algorithms that we're 71 00:06:17,372 --> 00:06:22,816 going to look at today is one for identifying connected components in, in a 72 00:06:22,816 --> 00:06:25,162 graph. So that's just one example. 73 00:06:25,351 --> 00:06:30,524 Here's some examples of problems that might arise in graph processing that are 74 00:06:30,524 --> 00:06:33,867 easily understood just given these basic definitions. 75 00:06:33,867 --> 00:06:38,724 So the very first one is given two vertices, is there a path connecting those 76 00:06:38,724 --> 00:06:42,005 two vertices? Like in the Internet graph, you want to 77 00:06:42,005 --> 00:06:45,790 know, can I get from where I am to where I want to get? 78 00:06:45,969 --> 00:06:50,632 Or maybe you want the shortest path, The most efficient way to get between two 79 00:06:50,632 --> 00:06:53,801 vertices. You might want to know is there a cycle on 80 00:06:53,801 --> 00:06:57,508 the graph? If the graph maybe represents electrical 81 00:06:57,508 --> 00:07:00,736 kind activity a cycle might represent a short circuit, 82 00:07:00,736 --> 00:07:05,226 You would want to check for that. Or maybe you want to systematically go 83 00:07:05,226 --> 00:07:09,431 everywhere in the graph, So there's two related problems called the 84 00:07:09,431 --> 00:07:14,076 Euler tour and the Hamilton tour. Euler tour is, is there a cycle that uses 85 00:07:14,076 --> 00:07:18,093 each edge exactly once? Can I go look around the graph and touch 86 00:07:18,093 --> 00:07:21,482 every edge in it? And the one called the Hamilton tour, 87 00:07:21,482 --> 00:07:25,876 which says well I'm really just interested in getting to the vertices. 88 00:07:25,876 --> 00:07:29,140 Is there a cycle that uses each vertex exactly once? 89 00:07:29,140 --> 00:07:34,143 Or basic connectivity problems. So you want to know, is there some way to 90 00:07:34,143 --> 00:07:38,280 connect all the vertices? Is there a path between each pair of 91 00:07:38,280 --> 00:07:43,084 vertices in the graph or not? Or you might want to know what's called 92 00:07:43,084 --> 00:07:48,421 the minimal spanning tree, which is the shortest set of edges, or the best way to 93 00:07:48,421 --> 00:07:53,159 connect all of the vertices. And then various processing issues related 94 00:07:53,159 --> 00:07:56,628 to connectivity. For example, is there a vertex whose 95 00:07:56,628 --> 00:08:02,011 removal disconnects the graph? Drawing the graph. 96 00:08:02,011 --> 00:08:06,147 Can you draw the graph in the plane with no edges crossing? 97 00:08:06,357 --> 00:08:11,685 Some of the graph with nodes are correspond to places in the plan or cities 98 00:08:11,685 --> 00:08:15,821 on the Earth or whatever, But some of the other graphs are very 99 00:08:15,821 --> 00:08:19,396 abstract, may have nothing to do with geometry. 100 00:08:19,396 --> 00:08:24,933 You might want to know can you draw with no crossing edges or you might have two 101 00:08:24,933 --> 00:08:29,210 graphs, and a vertex named different to represent the same graph. 102 00:08:29,210 --> 00:08:35,577 So one of the biggest challenges in graph processing that we'll address today in 103 00:08:35,577 --> 00:08:42,260 this lecture somewhat is just to decide how difficult are these problems. 104 00:08:43,800 --> 00:08:49,078 Some of them are very easy, some of them are very difficult and some of them, 105 00:08:49,078 --> 00:08:52,151 actually, is unknown how difficult they are. 106 00:08:52,352 --> 00:08:56,628 There's such a broad variety of problems that are simply stated. 107 00:08:56,828 --> 00:09:02,774 One of the first things that we have to be aware of when doing graph processing is 108 00:09:02,774 --> 00:09:09,389 that we, we are entering into a forest of, of all different possibilities and, and we 109 00:09:09,389 --> 00:09:14,800 have to be careful that we're working on a problem that we can solve. 110 00:09:14,800 --> 00:09:20,477 And we'll try to give some insights into that during the lectures that we gave on 111 00:09:20,852 --> 00:09:23,735 graph-processing. So that's our introduction to