>> [COUGH] Another thing many of these
scalable graph data management solutions
try, except one which is on the board,
is this: since we know graphs
are really complicated beasts, which have
very diverse kinds of relationships
in them,
and we also know that RAM is getting
cheaper, let's assume that I can
distribute this graph over many machines.
Each machine has gigabytes of RAM, so let
me just put the whole database
completely in memory when you actually
need to perform transactions on it,
right?
So, if you look at Neo4j, it loads the
entire graph into memory when you first
fire any query.
Until then, it's peacefully sitting on
the disk.
So your first query can easily take a few
hours while it loads everything into memory.
And then after that, every query is, like,
super fast.
Zoop, zoop, zoop, zoop, it comes through.
Right?
So, average query response times are
extremely good, but the first cold-start
queries are pretty bad.
But on the other hand, if you are trying
to look at billion-node graphs, Neo4j
really, really struggles, even on the fairly
large servers that we have tried it on.
So, the ultimate solution is: let's put
everything on disk, and just do it super
fast.
By some means, we don't know how, right?
So there is one solution, the idea of
RDF-3X,
which is basically what most of my work
depends on.
It is, essentially, to look at triples and
look at what kind of access patterns
you have.
So you can access by looking up the
source and looking for the predicate and object.
You can look up the predicate, and look for
source and object.
You can look up the object and look for
predicate and source, right?
So let's create multiple indexes which
are storing the same redundant
information in some sense, right?
I just store it really compactly in a
very compressed format.
And then I add some more indexes just to
make life easier for a few other queries
which do not follow this kind of pattern.
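
The idea of keeping the same triples in several sort orders, one per access pattern, can be sketched roughly as follows. This is only an illustration in plain Python: the toy triples, the names `ORDERINGS`, `build_indexes`, and `prefix_scan` are assumptions made for the example, and nothing here reflects the compression or the actual on-disk layout the speaker refers to.

```python
from bisect import bisect_left

# Hypothetical toy data: (source, predicate, object) triples.
TRIPLES = [
    ("Bruce_Willis", "actedIn", "The_Expendables"),
    ("Dolph_Lundgren", "actedIn", "The_Expendables"),
    ("Bruce_Willis", "actedIn", "Die_Hard"),
    ("Dolph_Lundgren", "bornIn", "Stockholm"),
]

# Each ordering lists which triple positions (0=S, 1=P, 2=O) form the sort key.
ORDERINGS = {"spo": (0, 1, 2), "pos": (1, 2, 0), "osp": (2, 0, 1)}


def build_indexes(triples):
    """Store the same triples redundantly, once per ordering, as sorted lists."""
    return {
        name: sorted(tuple(t[i] for i in order) for t in triples)
        for name, order in ORDERINGS.items()
    }


def prefix_scan(index, prefix):
    """Return every entry of a sorted index whose leading fields equal `prefix`."""
    prefix = tuple(prefix)
    matches = []
    # Binary search to the first candidate, then scan while the prefix matches.
    for i in range(bisect_left(index, prefix), len(index)):
        if index[i][: len(prefix)] != prefix:
            break
        matches.append(index[i])
    return matches


indexes = build_indexes(TRIPLES)
# "Given predicate and object, find the source" uses the P-O-S ordering.
cast = prefix_scan(indexes["pos"], ("actedIn", "The_Expendables"))
```

Each lookup picks the ordering whose leading fields are the ones already known, so every access pattern becomes a binary search followed by a short sequential scan.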
 >> [INAUDIBLE] No, BLINKS does
everything in memory.
BLINKS, BANKS, everything.
 >> [INAUDIBLE] Even the index is in
memory.
 >> Huh.
BLINKS.
So there, the focus is that even
computing the [INAUDIBLE] in memory is
expensive.
So, you really want to speed that up, so
they maintain these fringe nodes and try
to do it [CROSSTALK] really fast.
Exactly.
But everything is in memory.
It doesn't ever go into the database, I
mean, on disk.
 >> This [INAUDIBLE]. 
 >> Yeah, it's not mine. 
I just use it. 
 >> [INAUDIBLE]. 
 >> So the idea of RDF-3X was developed
at the Max Planck Institute.
And it's perhaps the fastest RDF store I
have seen so far.
So that's why it's called Triple eXpress,
right?
And they have added transactional
support.
And what I'm working on is to add
analytic support on top of RDF-3X.
So they did the transactional side and we
are adding the analytic side:
trying to do mining operations, or even
recursive reasoning kinds of things.
 >> [INAUDIBLE] 
Yes. 
 >> [INAUDIBLE] 
It will surely make it faster, all right?
But, again, things are so messy that
constructing such communities [NOISE] is
not as easy; they are not as clean as
you would see on social networks.
 >> [INAUDIBLE]. 
 >> Yes. 
 >> And then 
[INAUDIBLE]. 
Yes? 
Yes 
 >> [CROSSTALK] [INAUDIBLE] Exactly, so
you are basically trying to implicitly
derive some types,
and trying to move it up the hierarchy.
You can do that, but at least I have
tried it on YAGO and it fails quite
miserably.
Because the communities turn out to be
very small in size, and then you have a
really high type hierarchy, a really deep
type hierarchy, and each one has very
little filtering.
 >> [INAUDIBLE] And then we will go 
[UNKNOWN] [COUGH] [COUGH] [UNKNOWN] could 
they use that [UNKNOWN] just like they 
used the one-way diagram? 
 >> Yes. 
 >> [UNKNOWN] Yeah, yeah. 
 >> [UNKNOWN] You can do that, you can 
do that. 
But I can assure you, the performance
will not be any better.
I can assure you this. 
 >> [INAUDIBLE]. 
 >> Yeah. 
 >> Standard graphs. 
 >> You can run some community detection
mining algorithms and try to just group 
them. 
 >> [INAUDIBLE]. 
 >> But your queries will not be of that 
nature. 
So let's again go back to our good
old The Expendables example.
At some point Bruce Willis and Dolph
Lundgren were in two different
communities.
Dolph was always in martial arts and a
largely European setting, right, and he was
not in the big hits like Bruce Willis
was.
So, in any definition of your community 
structure, you would keep these two 
separated. 
And you add one node, which connects
these two.
And your queries now are all about The
Expendables.
So, every time you have to touch this 
community and that community [SOUND] then 
your community is not really useful, 
right? 
So you can easily extrapolate it to even 
more complicated settings. 
 >> [INAUDIBLE] This is a, yeah. 
 >> [INAUDIBLE] [INAUDIBLE] Yeah. 
 >> [INAUDIBLE] So, we can discuss this 
further. 
I mean, let me just proceed; I can
see that the time is way past.
And now that we have
convinced ourselves that linked open data
is actually a graph with some additional
little details like labels and so on,
let's try to look at what the
standard graph problems are,
and how they are related to the linked open
data setting.
Okay. 
I mean, these are, at least the
things that are in blue are your
undergrad math, right?
So you have reachability queries.
You are given two nodes, and you want to
find if these two are connected.
Okay.
So we have studied different solutions
for it in graph algorithms courses.
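
As a reminder of what a reachability query computes, here is a minimal sketch using breadth-first search over a directed adjacency list. The function name `is_reachable` and the tiny example graph are assumptions for illustration; this is the textbook approach, not one of the specialised reachability indexes discussed in the literature.

```python
from collections import deque


def is_reachable(graph, source, target):
    """Return True if `target` can be reached from `source` by following edges.

    `graph` maps each node to an iterable of its out-neighbours.
    """
    seen = {source}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return True
        for nxt in graph.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


# Hypothetical example graph.
EXAMPLE_GRAPH = {"a": ["b"], "b": ["c"], "c": [], "d": ["a"]}
connected = is_reachable(EXAMPLE_GRAPH, "a", "c")
```

The answer is a single yes or no; nothing about the connecting route is returned.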
And then you actually have the case
where you are not looking for connectivity
alone, you're actually looking for how
they're connected.
Give me all the nodes that are in
between:
the shortest path between these two
nodes.
Clearly if two nodes are reachable, then
the shortest path can also be found.
That's guaranteed.
Right?
So these are, in some sense, answering
somewhat the same question, just that
shortest path is a little more
informative than reachability.
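
The sense in which shortest path is "more informative" than reachability can be sketched with the same breadth-first search, extended to remember how each node was reached. The name `shortest_path` and the example graph are assumptions for illustration, and the sketch treats every edge as having the same length.

```python
from collections import deque


def shortest_path(graph, source, target):
    """Return a fewest-hops path from `source` to `target` as a node list.

    Returns None when `target` is unreachable, so the result also answers
    the plain reachability question.
    """
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            # Walk the parent pointers back to the source.
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in graph.get(node, ()):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None


# Hypothetical example graph with two equally short routes from "a" to "d".
PATH_GRAPH = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": ["e"]}
route = shortest_path(PATH_GRAPH, "a", "e")
```

A non-None result both confirms reachability and lists the nodes in between.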
And then you have totally arbitrary
pattern queries.
Right?
I can form any kind of pattern in my
query template, which could have
wildcards or may not have wildcards.
And then I want to find all instances
of this in the graph.
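
A pattern query with wildcards can be sketched as follows. Here a pattern is a list of triple templates in which any term starting with `?` is a wildcard variable; the function name `match_pattern` and the toy triples are assumptions for illustration, and the nested-loop evaluation is deliberately naive.

```python
# Hypothetical toy data: (source, predicate, object) triples.
MOVIE_TRIPLES = [
    ("Bruce_Willis", "actedIn", "The_Expendables"),
    ("Dolph_Lundgren", "actedIn", "The_Expendables"),
    ("Bruce_Willis", "actedIn", "Die_Hard"),
    ("Dolph_Lundgren", "bornIn", "Stockholm"),
]


def _unify(template, triple, binding):
    """Extend `binding` so that `template` equals `triple`, or return None."""
    extended = dict(binding)
    for term, value in zip(template, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def match_pattern(triples, templates):
    """Return all variable bindings under which every template occurs."""
    bindings = [{}]
    for template in templates:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = _unify(template, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


# "Which actors born in Stockholm acted in which movies?"
answers = match_pattern(
    MOVIE_TRIPLES,
    [("?actor", "actedIn", "?movie"), ("?actor", "bornIn", "Stockholm")],
)
```

A template without any wildcard simply checks whether that exact triple exists, in which case the result is a single empty binding.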
So these are the three main graph
problems in querying.
And many other problems that you see can
be decomposed into some variant of these,
or some group of these.
Right?
PageRank, for example, you can look at
as some variant of computing many,
many reachability and shortest path queries.
Right?
Steiner trees, for example: a well-known
solution is to use shortest path
queries, okay.
And if you link them back to SPARQL and
RDF, you can see that reachability
queries by themselves don't exist in SPARQL,
except when you are looking at the
extensions like property paths which are
coming through.
It says that okay, if a can reach b, then
pick that b as the binding for your
variable and then start processing something
underneath, looking for patterns.
So, one of the first examples I gave was:
find me all friends of, all connected
nodes from Shea Chantal.
This was one of the examples I gave.
I could add an additional constraint,
additional structure on the query, saying:
find all reachable nodes from Shea
Chantal, and return me only those nodes
which have a certain graph structure around
them.
Those who work in DCS and. 
 >> [INAUDIBLE] Why don't we just say,
from A, question mark X?
Y and it.
 >> No, no, you also need the star.
If you just have question mark X, it is
still one hop.
No, but there is no star.
 >> There is no star.
The star is coming in the property paths;
the same thing existed in XML.
So, in XPath, in XQuery, in the
beginning they did not have a star, but
later they added a star.
Stars are always a pain.
Okay. 
[COUGH]. 
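
The difference between a plain one-hop pattern and a starred path, and the earlier example of keeping only reachable nodes that have a certain structure around them, can be sketched as below. The function names, the `knows` and `worksAt` predicates, and the toy triples are all assumptions for illustration; the star is read here as "zero or more hops", which is how the `*` operator in SPARQL 1.1 property paths is defined.

```python
# Hypothetical toy data: (source, predicate, object) triples.
SOCIAL_TRIPLES = [
    ("alice", "knows", "bob"),
    ("bob", "knows", "carol"),
    ("carol", "knows", "alice"),
    ("carol", "worksAt", "MPI"),
    ("dave", "worksAt", "MPI"),
]


def one_hop(triples, start, predicate):
    """Nodes exactly one `predicate` edge away from `start` (no star)."""
    return {o for s, p, o in triples if s == start and p == predicate}


def star(triples, start, predicate):
    """Nodes reachable from `start` via zero or more `predicate` edges."""
    reached = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for nxt in one_hop(triples, node, predicate):
            if nxt not in reached:
                reached.add(nxt)
                frontier.append(nxt)
    return reached


def reachable_with_structure(triples, start, predicate, required_edge):
    """Reachable nodes that also have the outgoing edge `required_edge`.

    `required_edge` is a (predicate, object) pair standing in for the
    "certain graph structure around them" part of the query.
    """
    triple_set = set(triples)
    req_p, req_o = required_edge
    return {
        node
        for node in star(triples, start, predicate)
        if (node, req_p, req_o) in triple_set
    }


direct = one_hop(SOCIAL_TRIPLES, "alice", "knows")
closure = star(SOCIAL_TRIPLES, "alice", "knows")
filtered = reachable_with_structure(SOCIAL_TRIPLES, "alice", "knows", ("worksAt", "MPI"))
```

Without the star only `bob` is returned for `alice`; with it the whole cycle is reached, and the structural filter then keeps only `carol`, while `dave` is excluded because he is not reachable.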
So, since reachability is only now being
added, clearly shortest path wouldn't have
existed before, but now again people have
realized shortest paths are required.
So there are efforts for adding shortest
paths as part of these SPARQL extensions.
And pattern queries, always:
all of SPARQL can be seen as a pattern
query.