And so, clearly one of the big challenges is that graphs are really big. Big, massive, really massive; you can attach any of these fancy adjectives. And more important than the graphs just getting larger, the queries that you fire on these graphs are getting more and more interesting and exciting, and harder to evaluate. So one class is the standard pattern-like queries, which, as some of you already said, are similar to SQL with a few joins. Yes, SQL has joins, but here the system really has to be tuned for join queries. But you can actually go into more complicated things on these graphs, say queries with unbounded recursion: I just want to keep following edges transitively, and those have no really efficient solution, right? This is the kind of thing the SPARQL extensions people are talking about.

And then you have, on these graphs, mining and analytics kinds of queries. You are all already familiar with PageRank, basically the analytic where we traverse the entire graph to rank nodes based on their random-walk probabilities, right? You can also think of: I'll first fire one query, get a subgraph out, some ad hoc subset of the entire LOD, and then from this I will extract dense subgraphs, or some kind of k-core decomposition, or compute Steiner trees, right? If you are familiar with keyword search on graphs, this is basically what those systems do: they first fire some queries which figure out certain nodes, and then try to connect them up using standard [INAUDIBLE]. Again, these are not so easy to solve using relational databases.

The solution most of the current approaches use is: let's take this graph, put it in memory, and deal with it there. Okay? While that is a valid solution, given that memory is getting cheaper...

>> It's still not cheap enough to hold 30 billion triples.

Okay, so we need to come up with better strategies for dealing with it.

>> Is there a concept of, uh, [INAUDIBLE] sets and rules, in the [INAUDIBLE]?

>> Yes. So you can always look at frequent subgraphs.

>> Like, so in fact, there is another, as they say, some kind of...

>> [INAUDIBLE]

>> So you can relax, you can relax the labels on the edges, and say: I don't care about the edge labels, only the structure. Or we can say: I will keep just the labels, but relax on the subjects and objects that are there. So people have worked on these kinds of things.

>> Frequent subgraphs, and then also interesting subgraphs.

>> Interesting subgraphs, okay.

>> [INAUDIBLE]

>> Just to reiterate my point: so far I have never even mentioned how exactly these queries are evaluated.

>> [INAUDIBLE]

>> So, created: you can create any such graph in the good old way, where you take your relational database, perform all your joins, and store the result as a graph. Right, I mean, that's one cheap way of doing it. Beyond that, there are many efforts, both automated methods as well as manual hand-coding. If you go back and look at DBLP, for example, they maintain huge records of which paper appeared in which conference or journal, with which authors, and so on. And they wrote scripts to turn the structured relational tables they had into RDF, and then gave it out as an RDF dataset, which is essentially a graph. Right? So that's one method of doing it.
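To make the pattern-query case above concrete, here is a minimal sketch in Python using the rdflib library. The file name data.ttl and the ex: predicates are invented for illustration; the point is that two triple patterns sharing a variable are exactly a join in relational terms.

```python
# A pattern-like (join-style) query over an RDF graph with rdflib.
# "data.ttl" and the ex: predicates are hypothetical placeholders.
from rdflib import Graph

g = Graph()
g.parse("data.ttl", format="turtle")  # load some LOD-style triples

# Two triple patterns sharing ?film: in relational terms, a join on ?film.
results = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?actor ?director WHERE {
        ?film ex:actedBy    ?actor .
        ?film ex:directedBy ?director .
    }
""")
for actor, director in results:
    print(actor, director)
```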
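The unbounded-recursion case can be written with a SPARQL 1.1 property path, which follows an edge transitively with no bound on the path length. Again a sketch, with invented data and predicate names:

```python
# Reachability via a SPARQL 1.1 property path: ex:knows+ follows
# ex:knows edges for one or more hops, i.e. unbounded recursion.
# "data.ttl", ex:knows and ex:alice are hypothetical.
from rdflib import Graph

g = Graph()
g.parse("data.ttl", format="turtle")

reachable = g.query("""
    PREFIX ex: <http://example.org/>
    SELECT ?person WHERE {
        ex:alice ex:knows+ ?person .
    }
""")
for (person,) in reachable:
    print(person)
```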
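The random-walk intuition behind PageRank fits in a few lines. This is only a toy power-iteration sketch, not any particular system's implementation; dangling-node mass is dropped for brevity:

```python
# Toy PageRank by power iteration: with probability d a random walker
# follows an out-edge, otherwise it teleports to a uniform random node.
def pagerank(out_edges, d=0.85, iters=50):
    nodes = set(out_edges) | {v for vs in out_edges.values() for v in vs}
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iters):
        nxt = {u: (1 - d) / n for u in nodes}
        for u, outs in out_edges.items():
            for v in outs:               # spread u's rank along its out-edges
                nxt[v] += d * rank[u] / len(outs)
        rank = nxt
    return rank

print(pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))
```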
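And the extract-then-analyze pattern (pull out a subgraph, then run a k-core decomposition, or connect keyword hits with a Steiner tree) can be sketched with the networkx library; the toy graph and the "keyword hit" nodes below are made up:

```python
# k-core decomposition and an approximate Steiner tree on a toy graph,
# mimicking what keyword-search-on-graphs systems do. Node names are
# invented for illustration.
import networkx as nx
from networkx.algorithms.approximation import steiner_tree

G = nx.Graph([
    ("alice", "bob"), ("bob", "carol"), ("carol", "alice"),
    ("carol", "dave"), ("dave", "erin"),
])

core = nx.k_core(G, k=2)       # every node inside has degree >= 2
print(sorted(core.nodes()))    # ['alice', 'bob', 'carol']

hits = ["alice", "erin"]       # nodes matched by some keyword query
tree = steiner_tree(G, hits)   # a cheap subtree connecting the hits
print(sorted(tree.edges()))
```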
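That DBLP-style construction (relational tables joined and dumped as triples) is essentially a short script. A sketch under an assumed, purely hypothetical schema of papers, authors, and authorship tables, emitting N-Triples:

```python
# Turning relational data into an RDF graph the "good old way":
# join the tables, emit one triple per fact. The database file,
# table schema, and namespace below are all hypothetical.
import sqlite3

EX = "http://example.org/"

con = sqlite3.connect("dblp.db")
rows = con.execute(
    "SELECT p.id, a.name, p.venue "
    "FROM papers p "
    "JOIN authorship s ON s.paper_id = p.id "
    "JOIN authors a ON a.id = s.author_id"
)

with open("dblp.nt", "w") as out:
    for paper_id, author, venue in rows:
        paper = f"<{EX}paper/{paper_id}>"
        out.write(f'{paper} <{EX}writtenBy> <{EX}author/{author.replace(" ", "_")}> .\n')
        out.write(f'{paper} <{EX}appearedIn> <{EX}venue/{venue.replace(" ", "_")}> .\n')
```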
YAGO, Freebase kind of efforts, or rather YAGO and DBpedia, sorry, are focusing more on: let's take some almost-structured data like Wikipedia, and apply a whole bunch of machine learning tools and natural language processing tools, so that you can extract out these facts, x is related to y in some way, and then put them into graph form. Right? And Cyc, again, hand-coded much of this knowledge. So there are many efforts by which these graphs were created. Again, I'm not going too much into any of those. I agree that these are challenges, but I'm looking at a slightly different challenge. [COUGH]

So, just to continue with this line and finish it quickly: queries are bigger, graphs are bigger. And you can say, hey, why not use some magic bullet like Hadoop, right? Google uses it to handle terabytes, so 30 billion triples should be easy, right? But the problem is that your graphs are not the same as the text pages Google uses it on. Your graphs have complex interconnectivity: who knows what the nodes are, what kinds of relationships are there. That's one. And some connections may not exist today, but tomorrow you can just add one edge. It seems like a very harmless little edge, saying Dolph Lundgren and Bruce Willis acted in The Expendables. They never acted together before. Suddenly, you put these two together and form a connection. So suppose that, for your Hadoop-style processing, you had partitioned the graph nicely: all the guys who worked with Bruce Willis over here, Dolph Lundgren's collaborators over there, the two stored separately. And suddenly The Expendables comes along, and all the partitioning falls apart, right? Which is not the case in the application settings where Hadoop is normally used: there your partitioning is fine, you don't need to worry about it anymore, the partitioning logically stays the same. But here things can get really messy, even in a single static snapshot, and even more so when you have this dynamism in your data set. So data partitioning solutions will also need rethinking.

Okay, so given all that, it's not like I'm the only guy who has thought about this and is teaching you. There's a whole bunch of people who have worked on it. There is huge research as well as industry work going on for developing graph data management tools for very different kinds of applications, both generic as well as in the analytics world. So, for example, there are transactional graph data management systems, like Neo4j, Jena, HyperGraphDB, RDF-3X. They really focus on this LOD kind of setting. Until now, you never had analytics queries there: a pattern just comes in, and you match it. Right. But we are saying now that SPARQL is getting richer with recursive, reasoning kinds of queries, so transactional GDMSs have to evolve to support a little more than just transactional pattern matching.

And then you have analytic GDMSs, analytic graph data management systems like Pregel and Giraph, which can be seen as Hadoop-style processing for graphs, which have [UNKNOWN] of reasoning. You can compute PageRank using Pregel. In fact, the Pregel paper makes that point: they designed Pregel so that they can do [UNKNOWN] on massive graphs.

>> [INAUDIBLE]

>> Pregel is not. But its open-source implementation, which is on top of Hadoop, is called Giraph.

>> [INAUDIBLE]

>> Giraph is open.

>> [INAUDIBLE]

>> Yeah, it's actually bad at this moment. But it's not as...
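The vertex-centric model that Pregel and Giraph use can be imitated in a few lines of plain Python. This is only a single-machine sketch of the superstep idea (each vertex consumes its inbox, updates its value, and sends messages along its out-edges), not Pregel's or Giraph's actual API; here it recomputes PageRank:

```python
# Single-machine sketch of Pregel-style supersteps for PageRank.
# Assumes every vertex appears as a key in out_edges. Not the real
# Pregel/Giraph API, just the computation model.
def vertex_centric_pagerank(out_edges, supersteps=30, d=0.85):
    n = len(out_edges)
    value = {v: 1.0 / n for v in out_edges}
    # Superstep 0: every vertex sends its initial share to its neighbours.
    inbox = {v: [] for v in out_edges}
    for v, outs in out_edges.items():
        for w in outs:
            inbox[w].append(value[v] / len(outs))
    for _ in range(supersteps):
        outbox = {v: [] for v in out_edges}
        for v in out_edges:              # this loop is what Pregel distributes
            value[v] = (1 - d) / n + d * sum(inbox[v])
            for w in out_edges[v]:
                outbox[w].append(value[v] / len(out_edges[v]))
        inbox = outbox
    return value

print(vertex_centric_pagerank({"a": ["b"], "b": ["c"], "c": ["a", "b"]}))
```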
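And going back to the partitioning point: one harmless new edge can turn a perfect partitioning into one with cross-partition traffic. A toy illustration, with the actor names from the example above and everything else invented:

```python
# Toy illustration: a partitioning that looks perfect until one new
# edge (the "Expendables" edge) crosses it. Partition assignments and
# collaborator names are invented.
partition = {
    "bruce_willis": 0, "willis_costar": 0,
    "dolph_lundgren": 1, "lundgren_costar": 1,
}
edges = [("bruce_willis", "willis_costar"),
         ("dolph_lundgren", "lundgren_costar")]

def cut_size(edges, partition):
    # number of edges whose endpoints live in different partitions
    return sum(partition[u] != partition[v] for u, v in edges)

print(cut_size(edges, partition))  # 0: no cross-partition edges
edges.append(("bruce_willis", "dolph_lundgren"))  # The Expendables arrives
print(cut_size(edges, partition))  # 1: the nice partitioning is broken
```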
>> [INAUDIBLE]

>> GraphLab, yes. GraphLab I haven't put up here yet, because it's not really a GDMS. It is designed more for running machine learning applications on top of graphs, without really worrying about, exactly... So that's basically the focus.