Now, given this RDF graph, what kind of
queries, I mean, not necessarily an RDF
graph; again, you can come back to the
previous setting of just looking at
graph databases.
What kind of queries do you have?
What kind of query languages do you
have?
In the original logic database world, we 
had Datalog. 
Which, for most settings, looked like a
relational, SQL kind of query language,
except for the recursive reasoning part,
which was in Datalog but was not at
that time in SQL, okay?
And this has generated, I don't know,
probably 20 or 30 really top-class
research papers.
Okay?
And this is a very low underestimate of
the papers that have come out.
There is a huge amount of work done on
how to do recursive reasoning in Datalog.
Okay? 
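As a rough illustration of the recursive reasoning part, here is a minimal Python sketch of the classic reachability program, reach(X,Y) :- edge(X,Y) and reach(X,Z) :- reach(X,Y), edge(Y,Z), evaluated naively until nothing new is derived. The edge relation and node names are made up for illustration, and real Datalog engines use far smarter evaluation strategies.

```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the two reachability rules."""
    reach = set(edges)  # rule 1: every edge is a reach fact
    while True:
        # rule 2: join reach with edge on the shared middle node
        derived = {(x, z) for (x, y) in reach for (y2, z) in edges if y == y2}
        if derived <= reach:  # fixpoint: no new facts
            return reach
        reach |= derived

# Hypothetical toy edge relation
edges = {("a", "b"), ("b", "c"), ("c", "d")}
closure = transitive_closure(edges)
```

The loop is the part that plain, non-recursive SQL could not express: the number of joins needed is not known in advance.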
And after that, again, in the XPath
setting, which came with the XML wave I
was talking about, which came into the
database world.
XPath also resulted in huge numbers of
papers, again for exactly the same kind
of problems.
Like, take the first query,
wikimedia//editions.
So what you're saying is, start from the
root of wikimedia and look at any
reachable node from wikimedia, and look
for those nodes which have editions as
their type.
So look at all of them and return those.
So essentially this XPath returns all
those paths which start from wikimedia
and end with editions, okay.
Editions is something which you have on
Wikipedia.
For example, different editions of the
same page, right?
So that's one annotation.
That's a type that you can add to the
node.
And you can actually have more 
constraints on it. 
You can specify the path, and you can say
I want a specific name property of that
node, which is down there; that's also
possible.
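A small sketch of these two kinds of path queries, using Python's standard xml.etree.ElementTree, which supports only a limited subset of XPath. The XML document, its element names other than wikimedia and editions, and the name attribute values are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; only "wikimedia" and "editions" come from the lecture
doc = ET.fromstring("""
<wikimedia>
  <projects>
    <project name="Wikipedia">
      <editions>
        <edition language="English"/>
        <edition language="German"/>
      </editions>
    </project>
    <project name="Wiktionary">
      <editions>
        <edition language="English"/>
      </editions>
    </project>
  </projects>
</wikimedia>
""")

# Counterpart of wikimedia//editions: every <editions> node below the root
all_editions = doc.findall(".//editions")

# Same path, with an extra constraint on a name property along the way
wikipedia_editions = doc.findall(".//project[@name='Wikipedia']/editions")
```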
So these are more general database-type
queries. Once the graph databases, the
real graph databases, came into being,
there was an API called Blueprints.
And on top of it, a language, Gremlin,
which is quite popular now for many graph
databases.
You can almost look at it as a JDBC for
graph traversals, okay?
So you pretty much have reachability
queries that you can ask.
You can ask for paths, you can ask for
graph structure queries.
All of them have the same kind of JDBC
style.
Okay, you have a cursor, you can get the
next one, without really
materializing.
So all of these are supported in the
Gremlin interface.
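To make the cursor style concrete, here is a sketch in plain Python. It is not the Gremlin or Blueprints API; the function name, the adjacency-list graph, and the node names are all hypothetical. A generator plays the role of the cursor: each call to next() hands back one more reachable node.

```python
from collections import deque

def reachable(graph, start):
    """Lazily yield the nodes reachable from start, breadth first."""
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
                yield neighbor  # hand back one result at a time

# Hypothetical toy graph as an adjacency list
graph = {"a": ["b", "c"], "b": ["d"], "c": ["d"], "d": []}
cursor = reachable(graph, "a")
first = next(cursor)   # fetch a single result
rest = list(cursor)    # drain the remaining results
```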
But one disadvantage of Gremlin has
always been that, if you want to merge
relational queries with graph
traversals, you need some hacking.
Right?
You need to write your own programs for
this.
So essentially think of this as wanting
to add graph traversal on top of your
standard SQL.
It requires some extra effort, right?
You need to put in different JDBC
statements.
Similarly, if you want to move from
Gremlin into some kind of SQL-like
query, then you need to do this extra
programming.
Which was not all that appreciated by
the SPARQL world, which is mainly for RDF
data.
Right? 
So there, they said, okay.
Most of the queries that we focus on in
SPARQL, which has a recursive
definition, we will not get into that.
Okay, SPARQL is a query language.
Right, that's basically what the SPARQL
query language is.
So the focus there was, now given that
there's an RDF graph,
most of my queries are going to be query
by pattern.
So I give you a pattern of the graph that
I'm interested in, a subgraph that I'm
interested in, okay?
And you retrieve all instances of this
sub-pattern in my data set.
This could be extended to have
templates of patterns, as in, you can
have variables, saying, okay, I'm leaving
some things unbound.
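A minimal sketch of a single pattern template, assuming triples are stored as plain Python tuples and that terms starting with "?" are the variables left open. The match function, the predicate names, and the data are hypothetical, chosen only to illustrate the idea.

```python
def match(pattern, triples):
    """Yield one binding dict per triple that fits the pattern.

    Terms starting with '?' are variables; anything else must match exactly.
    """
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # a variable used twice must take the same value both times
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield binding

# Hypothetical toy data
triples = [
    ("BruceWillis", "bornIn", "IdarOberstein"),
    ("DolphLundgren", "bornIn", "Stockholm"),
    ("BruceWillis", "actedIn", "TheExpendables"),
]
born = list(match(("?person", "bornIn", "?place"), triples))
```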
Okay, that's the one thing that SPARQL,
at least version 1.0, focused on.
And now there are extensions which are
trying to move toward graph traversal
support as well.
So, actually providing the paths and
trying to find reachability and so on.
Hopefully that's going to come
through in SPARQL 1.1 and 1.2.
People are working on this.
So what does a SPARQL query look like?
If you go back to the original Dolph
Lundgren and Bruce Willis graph, you can
ask a query like this: select ?name
?movie.
Which means I want the actor's name
and the movies where certain
conditions hold.
These conditions can be seen as
subgraph conditions, right?
So you want this guy to be an action
hero, that's what you're interested in.
And he should have acted in a movie,
so this movie is what you're
looking for.
And the name is what you're looking for.
The constraint that you're adding is
that this person you're looking at,
?name, should have worked with
someone who was born in Stockholm.
So if you look at the RDF graph that we
explained, clearly all you get is
Bruce Willis for this, Bruce Willis and
all his movies, not just The Expendables,
all the movies.
That's basically what you're getting out
of this.
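The query just described can be sketched as a conjunction of triple patterns that must all hold at once. The following is a hedged Python illustration, not a SPARQL engine: the solve function, the predicate names (type, name, actedIn, workedWith, bornIn), and the triples are assumptions that only loosely follow the example graph.

```python
def solve(patterns, triples, binding=None):
    """Yield every variable binding that satisfies all patterns together."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(first, triple):
            if term.startswith("?"):
                # keep an earlier binding of the variable, reject conflicts
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield from solve(rest, triples, extended)

# Hypothetical toy data
triples = [
    ("bruce", "type", "ActionHero"),
    ("bruce", "name", "Bruce Willis"),
    ("bruce", "actedIn", "The Expendables"),
    ("bruce", "actedIn", "Die Hard"),
    ("bruce", "workedWith", "dolph"),
    ("dolph", "type", "ActionHero"),
    ("dolph", "name", "Dolph Lundgren"),
    ("dolph", "actedIn", "The Expendables"),
    ("dolph", "bornIn", "Stockholm"),
]
query = [
    ("?p", "type", "ActionHero"),
    ("?p", "name", "?name"),
    ("?p", "actedIn", "?movie"),
    ("?p", "workedWith", "?other"),
    ("?other", "bornIn", "Stockholm"),
]
rows = sorted({(b["?name"], b["?movie"]) for b in solve(query, triples)})
```

With this toy data the only rows that come back pair Bruce Willis with each of his movies, since he is the only action hero recorded as having worked with someone born in Stockholm.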
So these are the query languages, yeah? 
 >> All these movies, provided the
last two conditions were also satisfied?
 >> No, the last two conditions are
largely on just the name, so I'm not
providing.
 >> No, I mean, Is it a must that, that 
he would have to work with a person and 
the person has to be born in Stockholm? 
 >> Yes. 
 >> So, only if a set of people are 
there and you have worked with Bruce 
Willis and was born in Stockholm? 
Only those conditions... 
 >> Exactly. 
Exactly. 
That's why I said in the previous
example only Bruce Willis was
there.
But, I mean, if
anybody has seen The Expendables, you
know there is a huge list of people that
will come out of this.
Right?
Pretty much Sylvester Stallone and Arnold
Schwarzenegger.
Every one of them.
 >> Who satisfies all these conditions? 
 >> Who satisfies all these conditions? 
So you can find all the movies and their 
names. 
 >> [INAUDIBLE]. 
 >> Not significantly different,
right?
So at least when SPARQL started, it
seemed very much like a SQL query.
All right.
But the point is that, if you
look at it from the SQL angle, this is
a bunch of joins, a huge bunch of joins.
And SQL tries to avoid them, or at least,
whenever you're in the world of
relational databases, you want to
minimize the joins.
Right?
So you come up with strategies for
materializing these and reusing them.
Similar ideas can be applied here, but
the key difference is the kind of
predicates that you have.
They could be huge in number, which is
again not the case in relational
databases.
Right?
How many tables are typically
there in your database?
Probably not more than 200 in the extreme
setting.
Okay.
 >> This, this looks like a SQL query. 
We're suppose that. 
 >> This looks like. 
 >> You take the, particularly 
[INAUDIBLE]. 
 >> Yes. 
 >> Then that's not there in SQL. 
 >> That's not there in SQL, but you can 
always turn it around and say, okay, I 
will also model predicates as another 
column. 
All right? 
 >> You have to have a database which
stores all possible predicates.
 >> Predicates as well. 
 >> You have that on a separate table. 
 >> Yes. 
 >> And then query that. 
 >> And then get back. 
 >> Yeah. 
 >> So there are some efforts for that
kind of thing also.
So you have this metadata;
you can query the
metadata, get the table names, and then
query.
So you can do that also, but this is
never the preferred model in the
relational setting.
 >> [CROSSTALK]. 
 >> Yeah, so you're breaking the 
relational ideas there. 
 >> [INAUDIBLE]. 
 >> Yes. 
 >> All of that data? 
 >> You are making a universal database
and then firing queries on it.
Right?
That's one way of looking at it from a
relational point of view.
You don't do all these normalizations.
You don't do anything.
You just make it one big, huge universal
relation.
 >> [INAUDIBLE]. 
 >> As you state, 
[INAUDIBLE]. 
 >> Yes. 
 >> [INAUDIBLE]. 
 >> Yeah. 
Everything falls apart. 
 >> [INAUDIBLE]. 
 >> Okay. 
 >> [INAUDIBLE]. 
 >> Yes. 
 >> Do you [INAUDIBLE]. 
I think you want to get to that, right? 
[INAUDIBLE] 
 >> Yes, so that's the meat of
the challenge, right?
So whenever you look
at these query languages, they don't talk
about how they have to be evaluated.
So at this point I have no clue how this
SPARQL query has to be evaluated, and I
don't care either.
These are declarative, right?
I don't really care about it.
But you really have to worry about these
kinds of issues, like, should I go for a
universal relation?
Which means I have to deal with null
values, storage issues, minimizing the
search space, a whole bunch of things, or
should I do some other trick?
Should I normalize only in certain cases,
not normalize at all, right?
These are the challenges which you will
find if you try directly translating
these graph-like queries into a
relational setting.
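To see where the null values come from, here is a small sketch that pivots a handful of triples into one wide universal relation, with one column per predicate. The data and predicate names are hypothetical, and the sketch assumes each subject has at most one value per predicate; a repeated predicate would overwrite the earlier value.

```python
# Hypothetical toy data
triples = [
    ("bruce", "name", "Bruce Willis"),
    ("bruce", "bornIn", "Idar-Oberstein"),
    ("dolph", "name", "Dolph Lundgren"),
    ("dolph", "bornIn", "Stockholm"),
    ("expendables", "title", "The Expendables"),
]

# One column per distinct predicate
predicates = sorted({p for _, p, _ in triples})

# One row per subject; columns that do not apply stay None (a null)
rows = {}
for s, p, o in triples:
    rows.setdefault(s, dict.fromkeys(predicates))[p] = o

null_count = sum(v is None for row in rows.values() for v in row.values())
```

Even with three predicates, a good share of the cells is empty; with a very large number of predicates, most of each row would be null.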
 >> [INAUDIBLE]. 
 >> Yes. 
 >> [INAUDIBLE]. 
 >> Yes. 
 >> [INAUDIBLE]. 
 >> I mean, see, all these normal forms
help not only the efficiency but
also keep the consistency in
some sense, right?
The same requirements hold here also.
In fact, as we will see in a couple of
slides, there is one extreme way, which
you might already have seen in the
example RDF graph that I was looking at.
You can look at this part, the triple
part.
So, I have Bruce Willis, born in
Idar-Oberstein.
The edge can be my table.
Just a triple pattern table.
So if you go back to your
normal form setting, it's almost like
BCNF.
Right?
Where you have just key and value,
nothing else.
And the predicate is encoded in the table
name in the relational setting, but now
you are explicitly storing it.
Right?
That's basically the way in which
graphs can be stored.
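A sketch of this storage scheme using Python's built-in sqlite3 module: a single table with subject, predicate, and object columns, where the predicate is stored as data rather than encoded in a table name. The table name, column names, and rows are assumptions for illustration. Note how each triple pattern in a query turns into one more self-join on the same table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (subject TEXT, predicate TEXT, object TEXT)")
# Hypothetical toy data
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        ("bruce", "name", "Bruce Willis"),
        ("bruce", "bornIn", "Idar-Oberstein"),
        ("bruce", "workedWith", "dolph"),
        ("dolph", "name", "Dolph Lundgren"),
        ("dolph", "bornIn", "Stockholm"),
    ],
)

# Names of people who worked with someone born in Stockholm:
# three triple patterns, hence three copies of the same table
rows = conn.execute("""
    SELECT n.object
    FROM triples AS n
    JOIN triples AS w ON w.subject = n.subject
    JOIN triples AS b ON b.subject = w.object
    WHERE n.predicate = 'name'
      AND w.predicate = 'workedWith'
      AND b.predicate = 'bornIn' AND b.object = 'Stockholm'
""").fetchall()
```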
 >> The edge list. 
 >> Which list? 
 >> Yeah, that list with the 
[INAUDIBLE]. 
 >> Exactly. 
 >> [INAUDIBLE]. 
 >> Yeah. 
 >> On the SPARQL queries.
What kind of queries are better suited
for SPARQL, the ones which are deeper,
in the sense that you have table a
related to b and then c and then d?
Or is it that b, c, d are all connected
to a directly?
Which kinds of
queries are better suited for SQL?
 >> So that depends entirely on
the application.
So let's not worry about SPARQL,
let's worry about SQL, okay?
So, what kind of queries are better
suited for SQL?
 >> Not SQL, SQL is only one side of it. 
 >> But that's only because your 
performance is weak. 
Suppose I go for main-memory
databases which support SQL.
Probably deeper, whatever, it's perfectly
fine, right?
So in the same setting, in the same way,
you cannot ask whether a language is
better suited for some kind of queries.
Ask whether your database, which really
implements SPARQL,
is better suited for this.
So in terms of the power of
the language, SPARQL is no more powerful
than SQL.
I mean, SQL is already Turing complete,
so you cannot get anything more
powerful than that.
So once you have that, SPARQL is no more
powerful.
So the reason why you come up with a new
language rather than just reusing SQL is
the ease of use and the way you think.
Right? 
You go to RDF and you look at
it from the relational setting, and the
way in which people think is different.
So, in XML, people thought in trees.
While in relational tables, they looked
at it in table format.
While in SPARQL, that is, in the RDF
world, people always look at it as graphs.
So, it's just the ease of use, not which 
is more powerful. 
Everything is equally powerful. 
It just simplifies your life, right?