Yeah, thanks Gautam for inviting me, and it's great to be here. Nice to see that there are so many people for this talk. I thought this would be one of those fringe talks that only three or four people would be interested in, but it's great to see the room is getting full. So, what I'm going to talk about is largely efficiency issues when one is dealing with large-scale graphs. For me the graphs are typically not the social-network-style graphs, but largely ones from the linked data community. So we will see how these linked data graphs differ from social network graphs, how big they are, what some of the problems are that you see while you're trying to deal with these graphs, and how we can go about solving them. Much of the work that may come later in the talk is clearly not mine alone. I just haven't listed all my collaborators, but we can talk about it, and I can send you the papers if required. So, first, I don't need to give an introduction about what a graph is, right? Everybody knows what a graph structure looks like. It's one of the most general ways of representing information, right, one of the most flexible forms. But let's focus on what a graph database is. You can actually look at any relational database as a graph database, in some sense. You have foreign keys which link one record to another, so you can form a graph structure using these foreign key relationships. And within a tuple, between two values in the tuple, you can also form a relationship, because they're actually related. That's why they are called relational structures. So how is it different from a graph database, which is getting more popular nowadays? The key difference is in the way the data gets accessed. Right. In relational databases, the focus is always on indexed access to some tuple.
So you know this group of values is a relation, and I just want to access this entire group and then process it. On the other hand, in graph databases, the focus is more on: I am currently on this particular value, tell me all the values that are related to it. It need not be through a single tuple relationship. It could be normalized, denormalized, across foreign keys. I really have no control over what kind of relationship I'm looking for, but I can traverse these relationships from a given node. So this is the key difference. If you, again, put it back in the relational world, it can be seen as a huge number of joins that you need to perform, right? Which is not a fun task to do on relational databases, so you want to avoid these joins. On the other hand, this traversal is the only mode of access that you are allowed on graph databases. We will relax it a little later and say that we want to do both relational-style as well as navigational-style access, but this is the main difference between standard databases and graph databases. So, some examples of the kind of questions you could get on graph databases: find all friends of Gautam. So you know a node, Gautam, and you want to find all its relationships, keep only those relationships which say it's a friendship, and locate the other end of each. So this is a one-hop BFS with a certain kind of restriction, right, from the graph world. You can actually look at slightly more complicated questions. So there is another node called Srikanta, and I want to find out all the connections, not just individual single-hop connections, but anyone who can be reached from Srikanta. Say I want to contact him and send my resume to him and say, hey, please propagate it in your network. So you should know how valuable that person is in terms of how many people he can reach. Right? So look at Srikanta and look at all the reachable nodes from there. Right? These are some of the queries.
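To make the two example queries concrete, here is a minimal Python sketch. It is not from the talk: the edge list, the `friend` and `colleague` labels, every person other than Gautam and Srikanta, and the function names are all assumptions made for illustration. It shows a one-hop lookup restricted to one edge label, and a breadth-first search that collects everything reachable from a start node.

```python
from collections import deque

# Hypothetical directed, labelled edges: (source, label, target).
# Only Gautam and Srikanta come from the talk; the rest is invented.
EDGES = [
    ("Gautam", "friend", "Asha"),
    ("Gautam", "friend", "Ravi"),
    ("Gautam", "colleague", "Srikanta"),
    ("Srikanta", "friend", "Meera"),
    ("Meera", "colleague", "Ravi"),
    ("Ravi", "friend", "Asha"),
]


def build_adjacency(edges):
    """Index edges by source node so traversal does not scan the whole list."""
    adj = {}
    for source, label, target in edges:
        adj.setdefault(source, []).append((label, target))
    return adj


def one_hop(adj, node, label):
    """One-hop query: targets of edges leaving `node` that carry `label`."""
    return sorted({target for lbl, target in adj.get(node, []) if lbl == label})


def reachable(adj, start):
    """Breadth-first search: every node reachable from `start` over any label."""
    seen = {start}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        for _label, target in adj.get(current, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    seen.discard(start)  # report only the other nodes
    return seen


adj = build_adjacency(EDGES)
print(one_hop(adj, "Gautam", "friend"))
print(reachable(adj, "Srikanta"))
```

In a relational engine the second query would need one join per hop, with the number of hops unknown in advance, which is the cost the talk is pointing at.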
You don't often find such queries on relational databases, but they are extremely common when we are dealing with graph databases. >> Essentially they are recursive joins. >> They are recursive joins. >> Including [INAUDIBLE] everything. >> Yeah, right, so a recursive join is one way of looking at it if you have a self-join. But it could simply be a stream of joins. Okay? So it could be on the same table or across tables. You really don't have a requirement to stop that. So, it's not, again, something very new. In fact, I haven't put the dates, but you can easily guess. Logic databases were around well before, I think, probably 80% of this room was even born. Right? People have been working on it, and if you have met Ullman or whoever visited TCS, you can easily see that these guys >> Database. >> Yeah. They worked on Datalog, and even before that there was Prolog, which came from the AI side, and these things have a 30-year-long history, right? So that's where the real seeds of graph databases were sown, right. And then, of course, once the web came, it was very natural to see the web as a big graph, and you're all familiar with PageRank and Google's way of trying to rank pages, which is largely a graph query, right, as we will see soon. And then the XML wave hit the database community. XML is a tree and, if you relax it a little, a graph, all right? So, again, a huge amount of work came out during the XML activity, right? I mean XML database research. So whenever you look at graph papers from about 10 to 15 years ago, you will see they all refer back to XML, XQuery, XPath kinds of settings. So, although they are not strictly graphs, graph database research flourished during that time. But now there is an even bigger beast called linked data. Okay, which is what I'm really excited about, and which I'm going to talk about in this talk. There, the linked data graphs are really graphs. XML was mostly tree-structured, with some deviations from the tree.
But linked data is mostly graph-structured, with very little deviation from the graph structure toward a tree. Very little. So we are really seeing true graph requirements in databases now. So, just to give a brief introduction to what these graphs look like in the linked data setting, right? I was thinking about which good examples to give, and then I happened to catch the DVD of The Expendables. So I thought I would talk about the good old Bruce Willis and Dolph Lundgren, who are heroes in this movie, right? So you can form a graph with Bruce Willis, who is well known, and Dolph Lundgren, and you can have relationships: they were born in different cities, Idar-Oberstein and Stockholm. Then both of them have worked in one single movie, The Expendables, which is a movie, and you can construct further relationships, like this is a movie made in Hollywood, and it's an action movie, and these kinds of things. And both of them are action heroes, of whom Dolph knows martial arts while Bruce Willis knows how to handle a gun. Right. So this is a small snapshot of a fairly large graph that you can construct just from the IMDb data set, just from movies that are out there. Yeah? >> This looks very similar to the Google Knowledge Graph that they created. >> Yes. >> First two, three sentences of Wikipedia [INAUDIBLE]. >> Yes. In fact, Google Knowledge Graph is another example of a linked data graph. And, in fact, they have used not just Wikipedia; they have used IMDb as well. They used many of these almost-structured sources for extraction. >> So I'm not going to talk much about how exactly they build the graph. But it is exactly what you are saying. This is what the Google Knowledge Graph also looks like. >> But the problem with Google Knowledge Graph is, they relate each node with some static node.
They cannot operate in a dynamically changing environment, [INAUDIBLE] >> [INAUDIBLE] So if the query is related to dynamically changing events or some time-series type of data change. >> Yeah. >> They clearly [INAUDIBLE]. >> They have a problem using the knowledge graph correctly. You are perfectly right. In fact this is an active research area. I'm not going to talk much about it, but we can discuss it offline. In fact this is also one of my research areas, coincidentally: looking at these event kinds of data streams that are coming in, how you can represent them as graphs, and how you can query them. And clearly the problems only compound from what I'm talking about here, right? So they're much harder problems. Google is still catching up with us on that front. >> This one. >> Yeah. >> When I'm looking right here [INAUDIBLE] in one sentence and [INAUDIBLE] makes [INAUDIBLE] so I would like to [INAUDIBLE] So given such a graph that you have constructed from known facts, right? You can actually look at the graph as simply a bag of edges. You can allow multiple edges between nodes, right, even with the same label. It's perfectly okay; RDF does not prevent it. And every edge is represented as a triple, which essentially says what the source of the edge is, what the target of the edge is, and what the label on the edge is. Edges are always directed in the RDF setting. And again, when you look at the semantic web definition of RDF graphs, every node and edge has a unique identifier in the form of a URI, but that's only a minor detail which is not really important for looking at RDF data as a graph, right? It just gives you some way of identifying the nodes and edges.
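The "bag of edges" view described here can be sketched in a few lines of Python, reusing the movie example from earlier in the talk. This is only an illustration under stated assumptions: the `http://example.org/` namespace, the label names such as `bornIn` and `actedIn`, and the helper functions are hypothetical, and a `Counter` is used as the multiset simply to mirror the speaker's description of a bag.

```python
from collections import Counter
from typing import NamedTuple

EX = "http://example.org/"  # hypothetical namespace, not a real vocabulary


class Triple(NamedTuple):
    """One directed edge: where it starts, its label, and where it ends."""
    source: str
    label: str
    target: str


def t(source, label, target):
    """Build a triple whose nodes and edge label are all identified by URIs."""
    return Triple(EX + source, EX + label, EX + target)


# The graph is a bag (multiset) of triples, following the talk's description.
graph = Counter([
    t("Bruce_Willis", "bornIn", "Idar-Oberstein"),
    t("Dolph_Lundgren", "bornIn", "Stockholm"),
    t("Bruce_Willis", "actedIn", "The_Expendables"),
    t("Dolph_Lundgren", "actedIn", "The_Expendables"),
    t("The_Expendables", "type", "Movie"),
    t("Dolph_Lundgren", "knows", "Martial_Arts"),
    # Same source, label, and target again: the bag keeps both copies.
    t("Bruce_Willis", "actedIn", "The_Expendables"),
])


def outgoing(graph, source):
    """All edges leaving `source`, repeated according to their multiplicity."""
    return [edge for edge in graph.elements() if edge.source == source]


def sources_of(graph, label, target):
    """Distinct nodes that have a `label` edge pointing at `target`."""
    return sorted({e.source for e in graph if e.label == label and e.target == target})


print(len(outgoing(graph, EX + "Bruce_Willis")))
print(sources_of(graph, EX + "actedIn", EX + "The_Expendables"))
```

Because every edge is directed, `outgoing` only follows edges away from a node; finding who acted in a movie means matching on the target instead, which is what `sources_of` does.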