And so, clearly one of the big challenges is that graphs are really big. Big, massive, really massive; you can attach any of these fancy adjectives. And more important than the graphs just getting larger, the queries that you fire on these graphs are getting more and more interesting and exciting, and harder to evaluate. So one class is the standard pattern-like queries, which, as some of you already said, are similar to SQL with a few joins. Yes, SQL has joins, but here the system really has to be tuned for join queries. But you can actually go into more complicated things on these graphs, say queries with unbounded recursion: I just want to keep following edges transitively, and those have no really efficient solution, right? This is the kind of thing the SPARQL extensions people are talking about.

And then you have, on these graphs, mining and analytics kinds of queries. You are all already familiar with PageRank, basically the analytic where we traverse the entire graph to rank nodes based on their random-walk probabilities, right? You can also think of: I'll first fire one query, get a subgraph out, some ad hoc subset of the entire LOD, and then from this I will extract dense subgraphs, or some kind of k-core decomposition, or compute Steiner trees, right? If you are familiar with keyword search on graphs, this is basically what those systems do: they first fire some queries which figure out certain nodes, and then try to connect them up using standard [INAUDIBLE]. Again, these are not so easy to solve using relational databases.

The solution most of the current approaches use is: let's take this graph, put it in memory, and deal with it there. Okay? While that is a valid solution, given that memory is getting cheaper...

>> It's still not cheap enough to hold 30 billion triples.

Okay, so we need to come up with better strategies for dealing with it.

>> Is there a concept of, uh, [INAUDIBLE] sets and rules, in the [INAUDIBLE]?

>> Yes. So you can always look at frequent subgraphs.

>> Like, so in fact, there is another, as they say, some kind of...

>> [INAUDIBLE]

>> So you can relax, you can relax the labels on the edges, and say: I don't care about the edge labels, only the structure. Or we can say: I will keep just the labels, but relax on the subjects and objects that are there. So people have worked on these kinds of things.

>> Frequent subgraphs, and then also interesting subgraphs.

>> Interesting subgraphs, okay.

>> [INAUDIBLE]

>> Just to reiterate my point: so far I have never even mentioned how exactly these queries are evaluated.

>> [INAUDIBLE]

>> So, created: you can create any such graph in the good old way, where you take your relational database, perform all your joins, and store the result as a graph. Right, I mean, that's one cheap way of doing it. Beyond that, there are many efforts, both automated methods as well as manual hand-coding. If you go back and look at DBLP, for example, they maintain huge records of which paper appeared in which conference or journal, with which authors, and so on. And they wrote scripts to turn the structured relational tables they had into RDF, and then gave it out as an RDF dataset, which is essentially a graph. Right? So that's one method of doing it.
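To make the pattern-query case above concrete, here is a minimal sketch in Python using the rdflib library. The file name data.ttl and the ex: predicates are invented for illustration; the point is that two triple patterns sharing a variable are exactly a join in relational terms.

```python
# A pattern-like (join-style) query over an RDF graph with rdflib.
# "data.ttl" and the ex: predicates are hypothetical placeholders.
from rdflib import Graph

g = Graph()
g.parse("data.ttl", format="turtle")  # load some LOD-style triples

# Two triple patterns sharing ?film: in relational terms, a join on ?film.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?actor ?director WHERE {
        ?film ex:actedBy    ?actor .
        ?film ex:directedBy ?director .
    }
""")
for actor, director in results:
    print(actor, director)
```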
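The unbounded-recursion case can be written with a SPARQL 1.1 property path, which follows an edge transitively with no bound on the path length. Again a sketch, with invented data and predicate names:

```python
# Reachability via a SPARQL 1.1 property path: ex:knows+ follows
# ex:knows edges for one or more hops, i.e. unbounded recursion.
# "data.ttl", ex:knows and ex:alice are hypothetical.
from rdflib import Graph

g = Graph()
g.parse("data.ttl", format="turtle")

reachable = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?person WHERE {
        ex:alice ex:knows+ ?person .
    }
""")
for (person,) in reachable:
    print(person)
```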
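The random-walk intuition behind PageRank fits in a few lines. This is only a toy power-iteration sketch, not any particular system's implementation; dangling-node mass is dropped for brevity:

```python
# Toy PageRank by power iteration: with probability d a random walker
# follows an out-edge, otherwise it teleports to a uniform random node.
def pagerank(out_edges, d=0.85, iters=50):
    nodes = set(out_edges) | {v for vs in out_edges.values() for v in vs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in nodes}
        for u, outs in out_edges.items():
            for v in outs:               # spread u's rank along its out-edges
                nxt[v] += d * rank[u] / len(outs)
        rank = nxt
    return rank

print(pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))
```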
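And the extract-then-analyze pattern (pull out a subgraph, then run a k-core decomposition, or connect keyword hits with a Steiner tree) can be sketched with the networkx library; the toy graph and the "keyword hit" nodes below are made up:

```python
# k-core decomposition and an approximate Steiner tree on a toy graph,
# mimicking what keyword-search-on-graphs systems do. Node names are
# invented for illustration.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph([
    ("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
    ("carol", "dave"), ("dave", "erin"),
])

core = nx.k_core(G, k=2)       # every node inside has degree >= 2
print(sorted(core.nodes()))    # ['alice', 'bob', 'carol']

hits = ["alice", "erin"]       # nodes matched by some keyword query
tree = steiner_tree(G, hits)   # a cheap subtree connecting the hits
print(sorted(tree.edges()))
```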
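That DBLP-style construction (relational tables joined and dumped as triples) is essentially a short script. A sketch under an assumed, purely hypothetical schema of papers, authors, and authorship tables, emitting N-Triples:

```python
# Turning relational data into an RDF graph the "good old way":
# join the tables, emit one triple per fact. The database file,
# table schema, and namespace below are all hypothetical.
import sqlite3

EX = "http://example.org/"

con = sqlite3.connect("dblp.db")
rows = con.execute(
    "SELECT p.id, a.name, p.venue "
    "FROM papers p "
    "JOIN authorship s ON s.paper_id = p.id "
    "JOIN authors a ON a.id = s.author_id"
)

with open("dblp.nt", "w") as out:
    for paper_id, author, venue in rows:
        paper = f"<{EX}paper/{paper_id}>"
        out.write(f'{paper} <{EX}writtenBy> <{EX}author/{author.replace(" ", "_")}> .\n')
        out.write(f'{paper} <{EX}appearedIn> <{EX}venue/{venue.replace(" ", "_")}> .\n')
```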
YAGO, Freebase kind of efforts, or rather YAGO and DBpedia, sorry, are focusing more on: let's take some almost-structured data like Wikipedia, and apply a whole bunch of machine learning tools and natural language processing tools, so that you can extract out these facts, x is related to y in some way, and then put them into graph form. Right? And Cyc, again, hand-coded much of this knowledge. So there are many efforts by which these graphs were created. Again, I'm not going too much into any of those. I agree that these are challenges, but I'm looking at a slightly different challenge. [COUGH]

So, just to continue with this line and finish it quickly: queries are bigger, graphs are bigger. And you can say, hey, why not use some magic bullet like Hadoop, right? Google uses it to handle terabytes, so 30 billion triples should be easy, right? But the problem is that your graphs are not the same as the text pages Google uses it on. Your graphs have complex interconnectivity: who knows what the nodes are, what kinds of relationships are there. That's one. And some connections may not exist today, but tomorrow you can just add one edge. It seems like a very harmless little edge, saying Dolph Lundgren and Bruce Willis acted in The Expendables. They never acted together before. Suddenly, you put these two together and form a connection. So suppose that, for your Hadoop-style processing, you had partitioned the graph nicely: all the guys who worked with Bruce Willis over here, Dolph Lundgren's collaborators over there, the two stored separately. And suddenly The Expendables comes along, and all the partitioning falls apart, right? Which is not the case in the application settings where Hadoop is normally used: there your partitioning is fine, you don't need to worry about it anymore, the partitioning logically stays the same. But here things can get really messy, even in a single static snapshot, and even more so when you have this dynamism in your data set. So data partitioning solutions will also need rethinking.

Okay, so given all that, it's not like I'm the only guy who has thought about this and is teaching you. There's a whole bunch of people who have worked on it. There is huge research as well as industry work going on for developing graph data management tools for very different kinds of applications, both generic as well as in the analytics world. So, for example, there are transactional graph data management systems, like Neo4j, Jena, HyperGraphDB, RDF-3X. They really focus on this LOD kind of setting. Until now, you never had analytics queries there: a pattern just comes in, and you match it. Right. But we are saying now that SPARQL is getting richer with recursive, reasoning kinds of queries, so transactional GDMSs have to evolve to support a little more than just transactional pattern matching.

And then you have analytic GDMSs, analytic graph data management systems like Pregel and Giraph, which can be seen as Hadoop-style processing for graphs, which have [UNKNOWN] of reasoning. You can compute PageRank using Pregel. In fact, the Pregel paper makes that point: they designed Pregel so that they can do [UNKNOWN] on massive graphs.

>> [INAUDIBLE]

>> Pregel is not. But its open-source implementation, which is on top of Hadoop, is called Giraph.

>> [INAUDIBLE]

>> Giraph is open.

>> [INAUDIBLE]

>> Yeah, it's actually bad at this moment. But it's not as...
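The vertex-centric model that Pregel and Giraph use can be imitated in a few lines of plain Python. This is only a single-machine sketch of the superstep idea (each vertex consumes its inbox, updates its value, and sends messages along its out-edges), not Pregel's or Giraph's actual API; here it recomputes PageRank:

```python
# Single-machine sketch of Pregel-style supersteps for PageRank.
# Assumes every vertex appears as a key in out_edges. Not the real
# Pregel/Giraph API, just the computation model.
def vertex_centric_pagerank(out_edges, supersteps=30, d=0.85):
    n = len(out_edges)
    value = {v: 1.0 / n for v in out_edges}
    # Superstep 0: every vertex sends its initial share to its neighbours.
    inbox = {v: [] for v in out_edges}
    for v, outs in out_edges.items():
        for w in outs:
            inbox[w].append(value[v] / len(outs))
    for _ in range(supersteps):
        outbox = {v: [] for v in out_edges}
        for v in out_edges:              # this loop is what Pregel distributes
            value[v] = (1 - d) / n + d * sum(inbox[v])
            for w in out_edges[v]:
                outbox[w].append(value[v] / len(out_edges[v]))
        inbox = outbox
    return value

print(vertex_centric_pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))
```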
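And going back to the partitioning point: one harmless new edge can turn a perfect partitioning into one with cross-partition traffic. A toy illustration, with the actor names from the example above and everything else invented:

```python
# Toy illustration: a partitioning that looks perfect until one new
# edge (the "Expendables" edge) crosses it. Partition assignments and
# collaborator names are invented.
partition = {
    "bruce_willis": 0, "willis_costar": 0,
    "dolph_lundgren": 1, "lundgren_costar": 1,
}
edges = [("bruce_willis", "willis_costar"),
         ("dolph_lundgren", "lundgren_costar")]

def cut_size(edges, partition):
    # number of edges whose endpoints live in different partitions
    return sum(partition[u] != partition[v] for u, v in edges)

print(cut_size(edges, partition))  # 0: no cross-partition edges
edges.append(("bruce_willis", "dolph_lundgren"))  # The Expendables arrives
print(cut_size(edges, partition))  # 1: the nice partitioning is broken
```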
>> [INAUDIBLE]

>> GraphLab, yes. GraphLab I haven't put up here yet, because it's not really a GDMS. It is designed more for running machine learning applications on top of graphs, without really worrying about, exactly... So that's basically the focus.