Yeah, thanks Gautam for inviting me, and
it's great to be here.
Nice to see that there are so many people
for this talk.
I thought this talk would be one of those
fringe talks in which only three or four
people would be interested, but great to
see the room is getting full.
So, what I'm going to talk about is
largely efficiency issues when one is
dealing with large scale graphs, and, for
me the graphs are typically not
necessarily the social network style
graphs, but largely from the linked data
community.
So we will see what these linked data
graphs are, how they are different from
social network graphs,
and how big they are, and what are some
of the problems that you see while
you're trying to deal with these graphs,
and how we can go about
solving them.
And much of the work, that may come later 
in the talk, is clearly not just me alone 
who has worked on it. 
I just haven't listed all my 
collaborators. 
But we can talk about it. 
I can send you the papers if required. 
So, first, I don't need to give an
introduction about what a graph is,
right?
Everybody knows what a graph structure
looks like.
It's one of the most general ways of
representing information, right.
One of the most flexible forms.
But let's focus on what a graph
database is.
You can actually look at any relational
database as a graph database, in some
sense.
So you have foreign keys which link one 
record to another. 
So, no issues. 
So you can actually form a graph 
structure using these foreign key 
relationships. 
And within a tuple, between two values in
the tuple, you can actually form a
relationship, because they're actually
related.
That's why they are called relational
structures.
So how is it different from a graph
database, which is getting more popular
nowadays?
The key difference is in the way the data
gets accessed.
Right. 
In relational databases, the focus is
always on indexed access to some tuple.
So you know this group of values is a
relation, so I just want to access this
entire group and then process it.
On the other hand, in graph databases,
the focus is more on: I am currently on
this particular value, tell me all the
values that are related to it.
It need not be through a single tuple
relationship.
It could be normalized, de-normalized,
across foreign keys.
I really have no control on what kind of
relationship I'm looking for.
But I could traverse these relationships
from a given node.
So this is the key difference.
And this, if you again put it back in the
relational world, can be seen as a
huge number of joins that you need to
perform, right?
Which is not a fun task to do on
relational databases, so you want to
avoid these joins.
On the other hand this is the only mode 
of traversal access that you are allowed 
to do on graph databases. 
Although we will relax it a little later 
and say that we want to do both 
relational style as well as navigational 
style of access, but this is the main 
difference between standard databases and 
graph databases. 
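The contrast between the two access styles can be sketched roughly as
follows. This is only an illustration, not anything shown in the talk:
the `employees` and `departments` tables, their columns, and the `inv_`
prefix for reverse edges are all made up.

```python
from collections import defaultdict

# Hypothetical tables keyed by primary key. "dept" and "head" play the
# role of foreign keys linking one record to another.
employees = {
    "e1": {"name": "Asha", "dept": "d1"},
    "e2": {"name": "Ravi", "dept": "d1"},
}
departments = {
    "d1": {"title": "Research", "head": "e1"},
}


def fetch_tuple(table, key):
    """Relational-style access: indexed lookup of one whole tuple."""
    return table[key]


def build_adjacency(tables):
    """Turn every (key, column, value) cell into a labelled edge.

    Each cell yields a forward edge key -> value and a reverse edge
    value -> key, so relationships can be followed in both directions.
    """
    adj = defaultdict(list)
    for table in tables:
        for key, row in table.items():
            for column, value in row.items():
                adj[key].append((column, value))
                adj[value].append(("inv_" + column, key))
    return adj


def related(adj, value):
    """Navigational access: everything directly related to `value`,
    no matter which table or foreign key the link came from."""
    return sorted(adj[value])


adj = build_adjacency([employees, departments])
print(fetch_tuple(employees, "e1"))
print(related(adj, "d1"))
```

In the relational style, answering `related(adj, "d1")` would need one
join per foreign key that mentions the department; the adjacency
structure answers it with a single lookup.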
So some examples of the kind of
questions you could get on graph
databases: find all friends of Gautam.
So you know a node, Gautam, and you want
to find all its relationships, pick out
only those relationships which say it's
friendship, and locate the other end of
each.
So this is a one-hop BFS with a certain
kind of restriction, right, from the
graph world.
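A minimal sketch of that one-hop query, assuming the graph is stored as
a list of (source, label, target) edges. The edge data and the
`friend_of` and `colleague_of` labels are invented for illustration.

```python
# Hypothetical labelled, directed edges: (source, label, target).
EDGES = [
    ("Gautam", "friend_of", "Srikanta"),
    ("Gautam", "friend_of", "Meera"),
    ("Gautam", "colleague_of", "Priya"),
    ("Srikanta", "friend_of", "Arun"),
]


def one_hop(edges, node, label):
    """Return the far ends of all edges that leave `node` and carry
    `label`: a one-hop traversal restricted by edge label."""
    return [target for (source, lbl, target) in edges
            if source == node and lbl == label]


print(one_hop(EDGES, "Gautam", "friend_of"))  # ['Srikanta', 'Meera']
```

A real store would index edges by source node rather than scan the
whole list; the scan is kept here only to keep the sketch short.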
You can actually look at a little more
complicated issues.
So there is another node called
Srikanta, and I want to find out all the
connections, not just the individual
single-hop connections, but anyone who
can be reached from Srikanta.
So I want to contact him and probably
send out a resume to him and say, hey,
please propagate it in your network.
So you should know how valuable that
person is in terms of how many people he
can reach out to.
Right? 
So look at Srikanta and look at all the 
reachable nodes from there. 
Right? 
These are some of the queries
which you don't often find on relational
databases.
But these are extremely common when we 
are dealing with graph databases. 
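The reachability question can be sketched as a plain breadth-first
search over directed edges. The edge list below is made up, and the
function name `reachable` is just a label for this sketch.

```python
from collections import defaultdict, deque

# Hypothetical directed edges: (source, label, target).
EDGES = [
    ("Gautam", "knows", "Srikanta"),
    ("Srikanta", "knows", "Arun"),
    ("Arun", "knows", "Meera"),
    ("Meera", "knows", "Srikanta"),  # a cycle back to the start
]


def reachable(edges, start):
    """Return every node that can be reached from `start` by following
    edges in their direction, over any number of hops. The start node
    itself is left out of the result."""
    adj = defaultdict(list)
    for source, _label, target in edges:
        adj[source].append(target)

    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in seen:  # the `seen` set makes cycles harmless
                seen.add(nxt)
                queue.append(nxt)
    seen.discard(start)
    return seen


print(sorted(reachable(EDGES, "Srikanta")))  # ['Arun', 'Meera']
```

The size of the returned set is one crude measure of how many people a
node can reach out to.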
 >> Essentially they are recursive 
joins. 
 >> They are recursive joins. 
 >> Including [INAUDIBLE] everything. 
 >> Yeah, right, so a recursive join is
one way of looking at it if you have a
self join.
But it could simply be a stream of
joins.
Okay? 
So it could be on the same table or 
across tables. 
You really don't have a requirement to 
stop that. 
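Seen as a recursive self-join, reachability is a fixed point: keep
joining the newly derived pairs with the edge relation until nothing
new appears. The sketch below is one common way to evaluate that
(often called semi-naive evaluation); it is an illustration, not the
method used in the talk, and the relation `EDGE` is made up.

```python
# Hypothetical binary relation edge(x, y).
EDGE = {("a", "b"), ("b", "c"), ("c", "d")}


def transitive_closure(edge):
    """Evaluate the recursive rules
        reach(x, y) :- edge(x, y).
        reach(x, z) :- reach(x, y), edge(y, z).
    by repeated joins until no new pair is derived."""
    reach = set(edge)
    delta = set(edge)  # pairs derived in the previous round
    while delta:
        # Join only the new pairs with edge on the shared middle node.
        joined = {(x, z)
                  for (x, y) in delta
                  for (y2, z) in edge
                  if y == y2}
        delta = joined - reach
        reach |= delta
    return reach


print(sorted(transitive_closure(EDGE)))
```

Each pass of the loop corresponds to one more join in the stream of
joins; the number of passes depends on the data, not on the query text.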
So, again, it's not something very new.
In fact, I haven't put the dates, but you
can easily guess.
Logic databases were around well before,
I think, probably 80% of this room was
even born.
Right? 
People have been working on it, and if
you have met Ullman or whoever visited TCS
you can easily see that these guys
 >> Database. 
 >> Yeah. 
They worked on Datalog, and even before
that there was Prolog, which was from the
AI side, and these things have a 30-year
long history, right?
So that's where
the real grounds of graph databases were
sown, right.
And then, of course, once the web came,
it's very natural to see the web as a big
graph, and you're all familiar with
PageRank and Google's mode of trying to
rank pages,
which is largely a graph query, right, as
we will see soon.
And then the XML wave hit the database
community.
And XML is both a tree and, if you relax
it a little, a graph, alright?
So, again, a huge amount of work came
out during the XML activity, right?
I mean XML database research.
So whenever you look at many graph papers
from about 10 to 15 years
ago, you will see they all refer
back to XML, XQuery, XPath kind of
settings.
So, although they are not strictly
graphs, graph database research
flourished during that time.
But now there is an even bigger beast
called Linked Data.
Okay, which is what I'm really excited 
about, which I'm going to talk about, in 
this talk. 
There the linked data graphs are really
graphs.
XML was mostly tree-structured,
with some deviations from the tree.
But linked data is mostly
graph-structured, with very little
deviation from the graph structure, as in
moving into the tree.
Very little.
So we are really seeing true graph
requirements in databases now.
So, just to give a brief introduction of
what these graphs would look like in a
linked data setting, right?
So I was just thinking about which are
the good examples to give.
And then I happened to catch the DVD of
The Expendables.
So I thought I will talk about
the good old Bruce Willis and Dolph
Lundgren, who are heroes in this movie,
right?
So you can form such a graph where Bruce 
Willis is well known and Dolph Lundgren, 
and you can have relationships, like both
were born in different cities,
Idar-Oberstein and Stockholm.
And then both of them have worked in one
single movie, The Expendables, and it's a
movie, and you can even construct further
relationships like, this is a movie made
in Hollywood and it's an action movie and
these kind of things.
And both of them are action heroes, out
of which Dolph knows martial arts while
Bruce Willis knows how to handle a gun.
Right.
So this is a small snapshot
of a fairly large graph that you can
construct just from the IMDB data set,
just from movies that are out there.
Yeah? 
 >> This looks like it's very similar to
Google Knowledge Graph, that they
created.
 >> Yes. 
[COUGH] 
 >> First two, three sentences of 
Wikipedia [INAUDIBLE]. 
 >> Yes. 
In fact, Google Knowledge Graph is
another idea of a linked data graph that
you can see.
And, in fact, they have not just
used Wikipedia, they have used IMDB as
well; they used many of these almost
structured sources in order to extract it.
 >> So I'm not going to talk much about
how exactly they build the graph.
But this is exactly what you are saying.
This is what Google Knowledge Graph
also looks like.
 >> But the problem with Google 
Knowledge Graph is, they relate each node 
with some static node. 
They cannot operate on dynamically 
changing environment, [INAUDIBLE] 
 >> [COUGH] [INAUDIBLE] So if the query 
is written in dynamically changing events 
or some time series type of data change 
related. 
 >> Yeah. 
 >> They clearly [INAUDIBLE]. 
 >> They have a problem, using the 
knowledge graph correctly. 
You are perfectly right. 
In fact this is actually an active
research area.
I'm not going to talk much about it. 
But we can discuss this offline. 
In fact this is also one of my research 
areas, coincidentally. 
So looking at these event kind of
data streams that are coming in,
how you can represent them as graphs, how
you can query them, and clearly the
problems only compound from what I'm
talking about here, right?
So they're much harder problems.
Google is still catching up with us on
that front.
 >> This one. 
 >> Yeah. 
 >> When I'm looking right here 
[INAUDIBLE] in one sentence and 
[INAUDIBLE] makes [INAUDIBLE] so I would 
like to [INAUDIBLE] [COUGH] So given such 
a graph that you have constructed from 
known facts, right? 
So you can actually look at the graph as 
simply a bag of edges. 
So you can allow for multiple edges
between nodes, right, with the same
label also.
It's perfectly okay.
RDF does not prevent it.
And every edge is represented as a
triple, which essentially says, what is
the source of the edge, what is the
target of the edge, and what is the label
on the edge?
And edges are always directed in the RDF
setting.
And again, in the semantic web
definition, when you look at
RDF graphs, every node and edge has a
unique identifier in the form of a URI,
but it's only a minor detail which is not
really important for looking at RDF
data as a graph, right?
It just gives you some way of
identifying the nodes and edges.
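The bag-of-edges view can be sketched with the movie example from
earlier in the talk. The `http://example.org/` namespace and the
`bornIn`, `actedIn` and `type` labels are placeholders chosen for this
sketch, not a real vocabulary. A `Counter` is used so that repeated
triples are counted, following the bag description above; a set-based
store would simply keep one copy.

```python
from collections import Counter

EX = "http://example.org/"  # hypothetical namespace for illustration


def uri(name):
    """Build an identifier for a node or an edge label."""
    return EX + name


# The graph is a bag (multiset) of directed edges, each stored as a
# (source, label, target) triple of URIs.
graph = Counter()


def add_edge(source, label, target):
    graph[(uri(source), uri(label), uri(target))] += 1


def out_edges(node):
    """All (label, target) pairs on edges leaving `node`."""
    return sorted((label, target)
                  for (source, label, target) in graph
                  if source == uri(node))


add_edge("Bruce_Willis", "bornIn", "Idar-Oberstein")
add_edge("Dolph_Lundgren", "bornIn", "Stockholm")
add_edge("Bruce_Willis", "actedIn", "The_Expendables")
add_edge("Dolph_Lundgren", "actedIn", "The_Expendables")
add_edge("The_Expendables", "type", "Movie")
# The same edge added a second time is kept as a second occurrence.
add_edge("Bruce_Willis", "actedIn", "The_Expendables")

print(out_edges("Bruce_Willis"))
```

Because every edge is directed, `out_edges("The_Expendables")` returns
only the `type` edge; finding the actors would need a lookup by target.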