>> [COUGH] Another thing many of these scalable graph data management solutions try, except one which is on the board, is this: since we know graphs are really complicated beasts which have very diverse kinds of relationships in them, and we also know that RAM is getting cheaper, let's assume that I can distribute this graph over many machines. Each machine has GBs of RAM, so let me just put the whole database completely in memory when you actually need to perform transactions on it, right? So, if you look at Neo4j, it loads the entire graph into memory when you first fire any query. Until then, it's peacefully sitting on the disk. So your first query can easily take a few hours while it loads everything into memory. And then after that, every query is, like, super fast. Zoop, zoop, zoop, zoop, it comes through. Right? So average query response times are extremely good, but the first cold-start queries are pretty bad. On the other hand, if you are trying to look at billion-node graphs, Neo4j really, really struggles, even on the fairly large servers that we have tried it on. So the ultimate solution is: let's put everything on disk, and just do it super fast, by some means, we don't know how, right? So there is one solution, the idea of Triple X, which is basically what most of my work depends on. Essentially, you look at triples and at what kinds of access patterns you have. You can access by source and look for the predicate and object. You can access by predicate and look for source and object. You can access by object and look for predicate and source, right? So let's create multiple indexes which store the same information redundantly in some sense, right? I just store it really compactly, in a very compressed format. And then I add some more indexes just to make life easier for a few other queries which do not follow this kind of pattern. >> [INAUDIBLE] No, BLINKS does everything in memory. BLINKS, BANKS, everything.
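The multiple-index idea described above (the same triples stored in several sort orders, one per access pattern) can be sketched roughly as follows. This is a toy in-memory illustration, not how Triple X itself is implemented: the class name `TripleIndex`, the `lookup` method, and the choice of exactly three orderings are assumptions made for the example, and the compression and on-disk layout the speaker mentions are left out entirely.

```python
from bisect import bisect_left

# Column orders for three sorted copies of the triple set (hypothetical names).
# Positions: 0 = source (subject), 1 = predicate, 2 = object.
ORDERS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


class TripleIndex:
    """Toy in-memory stand-in for redundant, sorted triple indexes."""

    def __init__(self, triples):
        unique = set(triples)
        # Store the same triples once per ordering, each copy sorted.
        self.indexes = {
            name: sorted(tuple(t[i] for i in order) for t in unique)
            for name, order in ORDERS.items()
        }

    def lookup(self, s=None, p=None, o=None):
        """Return all triples matching the bound positions (None = wildcard)."""
        pattern = (s, p, o)
        # Pick the ordering whose leading columns cover the most bound values.
        best_name, best_len = None, -1
        for name, order in ORDERS.items():
            bound = 0
            for pos in order:
                if pattern[pos] is None:
                    break
                bound += 1
            if bound > best_len:
                best_name, best_len = name, bound
        order = ORDERS[best_name]
        prefix = tuple(pattern[pos] for pos in order[:best_len])
        index = self.indexes[best_name]
        results = []
        # Binary search to the first key with this prefix, then scan the range.
        for i in range(bisect_left(index, prefix), len(index)):
            key = index[i]
            if key[:best_len] != prefix:
                break
            triple = [None, None, None]
            for value, pos in zip(key, order):
                triple[pos] = value
            # Defensive check for bound positions not covered by the prefix.
            if all(b is None or b == v for b, v in zip(pattern, triple)):
                results.append(tuple(triple))
        return results
```

With a handful of illustrative triples such as `("Bruce_Willis", "actedIn", "The_Expendables")`, a call like `TripleIndex(triples).lookup(p="actedIn", o="The_Expendables")` is answered from the predicate-first copy with one binary search and a short range scan, which is the point of keeping the redundant orderings.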
>> [INAUDIBLE] Even the index is in memory. >> Huh. BLINKS. So there, the focus is that even computing the [INAUDIBLE] in memory is expensive. So you really want to speed that up, so they maintain these fringe nodes and try to do it [CROSSTALK] really fast. Exactly. But everything is in memory. It doesn't ever go into the database, I mean, onto disk. >> This [INAUDIBLE]. >> Yeah, it's not mine. I just use it. >> [INAUDIBLE]. >> So the idea of Triple X was developed at the Max Planck Institute. And it's perhaps the fastest RDF store I have seen so far. So that's why it's called Triple Express, right? And they have added transactional support. And what I'm working on is adding analytic support on top of Triple X. So they have the transactional side, and we are adding the analytic side: trying to do mining operations, even recursive-reasoning kinds of things. >> [INAUDIBLE] Yes. >> [INAUDIBLE] It will surely make it faster, all right? But, again, things are so messy that constructing such communities [NOISE] is not as easy; they are not as clean as you would see in social networks. >> [INAUDIBLE]. >> Yes. >> And then [INAUDIBLE]. Yes? Yes. >> [CROSSTALK] [INAUDIBLE] Exactly, so you are basically trying to implicitly derive some types. I'm trying to move it up the hierarchy. You can do that, but at least I have tried it on YAGO and it fails quite miserably, because the communities turn out to be very small in size, and then you have a really high type hierarchy, a really deep type hierarchy, and each level does very little filtering. >> [INAUDIBLE] And then we will go [UNKNOWN] [COUGH] [COUGH] [UNKNOWN] could they use that [UNKNOWN] just like they used the one-way diagram? >> Yes. >> [UNKNOWN] Yeah, yeah. >> [UNKNOWN] You can do that, you can do that. But I can assure you, the performance will not be any better. I can assure you of this. >> [INAUDIBLE]. >> Yeah. >> Standard graphs. >> You can run some community detection mining algorithms and try to just group them. >> [INAUDIBLE].
>> But your queries will not be of that nature. So let's again go back to our good old The Expendables example. At some point Bruce Willis and Dolph Lundgren were in two different communities. Dolph was always in martial arts and largely in a European setting, right, and he was not in the big hits like Bruce Willis was. So, under any definition of your community structure, you would keep these two separated. And you have one node which connects these two. And your queries now are all about The Expendables. So if every time you have to touch this community and that community, [SOUND] then your community is not really useful, right? And you can easily extrapolate this to even more complicated settings. >> [INAUDIBLE] This is a, yeah. >> [INAUDIBLE] [INAUDIBLE] Yeah. >> [INAUDIBLE] So, we can discuss this further. I mean, let me just proceed; I can see that the time is way past. Now that we have convinced ourselves that linked open data is actually a graph with some additional little details like labels and so on, let's try to look at what the standard graph problems are, and how they are related to the linked open data setting. Okay. I mean, these are, at least the things that are in blue are your undergrad math, right? So you have reachability queries: you are given two nodes, and you want to find out if these two are connected. Okay. We have studied different solutions for it in graph algorithms courses. And then you have the case where you are not looking for connectivity alone; you're actually looking for how they're connected. Give me all the nodes that are in between: the shortest path between these two nodes. Clearly, if two nodes are reachable, then a shortest path can also be found. That's guaranteed, right? So in some sense these address the same setting; it's just that shortest path is a little more informative than reachability. And then you have totally arbitrary pattern queries. Right?
I can form any kind of pattern in my query template, which may or may not have wildcards. And then I want to find all instances of this in the graph. So these are the three main graph problems in querying. And many other problems that you see can be decomposed into some variant of these, or some group of these. Right? PageRank, for example: you can look at it as some variant of computing many, many reachability and shortest-path queries. Right? Steiner trees, for example: a well-known solution is to use shortest-path queries, okay. And if you link them back to SPARQL and RDF, you can see that reachability queries by themselves don't exist in SPARQL, except when you are looking at extensions like property paths, which are coming through. It says that, okay, if a can reach b, then pick that b as your binding variable and then start processing something underneath, looking for patterns. So, one of the first examples I gave was: find me all friends of, all connected nodes from, Shea Chantal. This was one of the examples I gave. I could add an additional constraint, additional structure on the query, saying: find all reachable nodes from Shea Chantal, and return me only those nodes which have a certain graph structure around them. Those who work in DCS and. >> [INAUDIBLE] Why don't we just say, from A, question mark X? Y and it. >> No, no, you also need star. If you just have question mark X, it is still one hop. No, but there is no star. >> There is no star. Star is coming in the property paths. The same thing existed in XML. So in XPath, in XQuery, in the beginning they did not have a star, but later they added a star. Stars are always pink. Okay. [COUGH]. So, since reachability is only now being added, clearly shortest path wouldn't have existed before, but now again people have realized shortest paths are required. So there are efforts to add shortest paths as part of these SPARQL extensions. And pattern queries have always been there.
All of SPARQL can be seen as a pattern query.
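To make the three query types from this part of the lecture concrete (reachability, shortest path, and pattern queries with wildcards), here is a minimal sketch over a list of labeled `(source, predicate, object)` edges. The function names `shortest_path`, `reachable`, and `match_pattern`, the `?x` convention for variables, and the plain breadth-first search are all assumptions chosen for illustration; a real SPARQL engine evaluates these very differently, and the traversal here ignores edge labels, roughly like following any edge zero or more times.

```python
from collections import deque


def shortest_path(edges, source, target):
    """Breadth-first search over directed edges, ignoring the labels.

    Returns the list of nodes on one shortest path, or None if the
    target cannot be reached from the source.
    """
    adjacency = {}
    for s, _p, o in edges:
        adjacency.setdefault(s, []).append(o)
    parent = {source: None}  # also serves as the visited set
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in adjacency.get(node, []):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


def reachable(edges, source, target):
    """Reachability is the yes/no version of the shortest-path question."""
    return shortest_path(edges, source, target) is not None


def match_pattern(edges, patterns, binding=None):
    """Yield variable bindings satisfying every triple pattern.

    Terms starting with '?' are variables (wildcards); anything else
    must match the edge value exactly. Naive nested-loop evaluation.
    """
    binding = binding or {}
    if not patterns:
        yield dict(binding)
        return
    first, rest = patterns[0], patterns[1:]
    for edge in edges:
        candidate = dict(binding)
        matched = True
        for term, value in zip(first, edge):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding.
                if candidate.setdefault(term, value) != value:
                    matched = False
                    break
            elif term != value:
                matched = False
                break
        if matched:
            yield from match_pattern(edges, rest, candidate)
```

Note how `reachable` is just `shortest_path` with the path thrown away, which mirrors the remark that the two problems address the same setting with different amounts of information returned. The combined query from the lecture (all nodes reachable from a start node that also have a certain structure around them) would correspond to filtering reachable nodes through `match_pattern`.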