Now, given this RDF graph, or not necessarily an RDF graph, you can go back to the previous setting of plain graph databases: what kind of queries do you have? What kind of query languages do you have? In the original logic database world, we had Datalog. For most settings it looked like a relational, SQL kind of query language, except for the recursive reasoning part, which was in Datalog and was not in SQL at that time, okay? And this has generated, I don't know, probably 20 or 30 really top-class research papers. Okay? And that is a very low underestimate of the papers that have come out. There is a huge body of work on how to do recursive reasoning in Datalog. Okay? After that came the XPath setting, with the XML wave I was talking about, which came into the database world. XPath also resulted in huge numbers of papers, again for exactly the same kind of problems. Take the first query, wikimedia//editions. What you're saying is: start from the root, wikimedia, look at any node reachable from wikimedia, and pick those nodes which have editions as their type. So look at all of them and return those. Essentially this XPath returns all those paths which start from wikimedia and end with editions, okay. Editions is something you have on Wikipedia, for example different editions of the same page, right? So that's one annotation, a type that you can add to the node. And you can actually put more constraints on it. You can specify the path, and you can say I want a specific name property on that node down there as well. So these are more general database-type queries. Once the real graph databases came into being, there was an API called Blueprints and, on top of it, a language, Gremlin, which is quite popular now for many graph databases.
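The recursive reasoning in Datalog and the descendant step in wikimedia//editions both come down to reachability over edges. Below is a minimal Python sketch of that idea, not how any particular engine evaluates it. The toy graph, the node names, and the helper names reachable and descendants_of_type are all assumptions made up for illustration.

```python
# Hypothetical toy data: parent -> children edges, plus a type label per node.
edges = {
    "wikimedia": ["wikipedia", "commons"],
    "wikipedia": ["page1", "page2"],
    "page1": ["ed1", "ed2"],
    "page2": ["ed3"],
    "commons": [],
}
node_type = {
    "wikipedia": "project",
    "commons": "project",
    "page1": "page",
    "page2": "page",
    "ed1": "editions",
    "ed2": "editions",
    "ed3": "editions",
}


def reachable(edges, start):
    """Worklist evaluation of the two Datalog-style rules
         reach(X) :- edge(start, X).
         reach(Y) :- reach(X), edge(X, Y).
    Returns every node reachable from `start` by one or more edges."""
    reach = set()
    frontier = [start]
    while frontier:
        node = frontier.pop()
        for child in edges.get(node, []):
            if child not in reach:
                reach.add(child)
                frontier.append(child)
    return reach


def descendants_of_type(edges, node_type, root, wanted):
    """Rough analogue of root//wanted: reachable nodes whose type is `wanted`."""
    return sorted(n for n in reachable(edges, root) if node_type.get(n) == wanted)


print(descendants_of_type(edges, node_type, "wikimedia", "editions"))
```

The loop stops once no new node can be added, which is the fixpoint that recursive Datalog rules describe.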
Gremlin you can almost look at as a JDBC for graph traversals, okay? So you pretty much have reachability queries that you can ask. You can ask for paths, you can ask graph structure queries. All of them have the same kind of JDBC style. Okay, you have a cursor, you can get the next one, if you are really materializing the results. All of these are supported in the Gremlin interface. But one disadvantage of Gremlin has always been that if you want to merge relational queries with graph traversals, you need some hacking. Right? You need to write your own programs for this. Think of it as wanting to add graph traversal on top of your standard SQL. It requires some extra effort, right? You need to put in different JDBC statements. Similarly, if you want to move from Gremlin into some kind of SQL-like query, you need to do this extra programming. That was not all that appreciated by the SPARQL world, which is mainly about RDF data. Right? SPARQL, whose name has a recursive definition that we will not get into, okay, is a query language. Right, that's basically what SPARQL is, a query language. So the focus there was: given that there is an RDF graph, most of my queries are going to be query by pattern. I give you a pattern of the graph that I'm interested in, a subgraph that I'm interested in, okay? And you retrieve all instances of this sub-pattern in my data set. This can be extended to templates of patterns, as in, you can have variables, saying okay, I'm leaving some things unbound. Okay, that's the one thing that SPARQL, at least version 1.0, focused on. And now there are extensions which are trying to move toward graph traversal support as well, so actually providing the paths, trying to find reachability, and so on. That's hopefully going to come through in SPARQL 1.1 and 1.2. People are working on this.
So what does a SPARQL query look like? If you go back to the original Dolph Lundgren and Bruce Willis graph, you can ask a query like this: SELECT ?name ?movie. That means I want the actor's name and the movies where certain conditions hold. These conditions can be seen as subgraph conditions, right? You want this guy to be an action hero, that's what you're interested in. And he should have acted in a movie, and this movie is what you're looking for, along with the name. The constraint that you're adding is that the person you're looking at, ?name, should have worked with someone who was born in Stockholm. So if you look at the RDF graph that we explained, clearly all you get for this is Bruce Willis: Bruce Willis and all his movies, not just The Expendables, all the movies. That's basically what you're getting out of this. So these are the query languages, yeah? >> All these movies were provided for, the last two conditions were also satisfied? >> No, the last two conditions are largely just on the name, so I'm not providing. >> No, I mean, is it a must that he would have to work with a person and that person has to be born in Stockholm? >> Yes. >> So, only if there is a set of people who have worked with Bruce Willis and were born in Stockholm? Only under those conditions... >> Exactly. Exactly. That's why I said in the previous example only Bruce Willis was there. But if anybody has seen The Expendables, you know there is a huge list of people that will come out of this. Right? Pretty much Sylvester Stallone and Arnold Schwarzenegger, every one of them. >> Who satisfies all these conditions? >> Who satisfies all these conditions. So you can find all the movies and their names. >> [INAUDIBLE]. >> Not significantly different, right? At least when SPARQL started, it seemed very much like a SQL query. All right.
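To make the query-by-pattern idea concrete, here is a small Python sketch that matches a list of triple patterns with ?variables against a set of triples. It is only an illustration under assumptions: the predicate names (type, name, actedIn, workedWith, bornIn), the identifiers bruce and dolph, and the toy triples are invented here and are not the lecture's actual graph, and a real SPARQL engine would not evaluate patterns this naively.

```python
# Hypothetical toy triples (subject, predicate, object); names are assumptions.
triples = [
    ("bruce", "type", "ActionHero"),
    ("bruce", "name", "Bruce Willis"),
    ("bruce", "bornIn", "Idar-Oberstein"),
    ("bruce", "actedIn", "The Expendables"),
    ("bruce", "actedIn", "Die Hard"),
    ("bruce", "workedWith", "dolph"),
    ("dolph", "type", "ActionHero"),
    ("dolph", "name", "Dolph Lundgren"),
    ("dolph", "bornIn", "Stockholm"),
    ("dolph", "actedIn", "The Expendables"),
]

# The subgraph pattern: terms starting with "?" are variables left unbound.
patterns = [
    ("?p", "type", "ActionHero"),
    ("?p", "name", "?name"),
    ("?p", "actedIn", "?movie"),
    ("?p", "workedWith", "?q"),
    ("?q", "bornIn", "Stockholm"),
]


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind the variable, or check it agrees with an earlier binding.
            if new.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return new


def query(triples, patterns):
    """Return all variable bindings that satisfy every pattern (nested-loop join)."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in triples
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


results = sorted({(b["?name"], b["?movie"]) for b in query(triples, patterns)})
print(results)
# [('Bruce Willis', 'Die Hard'), ('Bruce Willis', 'The Expendables')]
```

On this toy data only Bruce Willis survives the last two patterns, and he comes back with all of his movies, which mirrors the answer given in the discussion above.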
But the point is that, if you look at it from the SQL angle, this is a bunch of joins, a huge bunch of joins. And SQL tries to avoid them, or at least, whenever you're in the world of relational databases, you want to minimize the joins. Right? So you come up with strategies for materializing these and reusing them. Similar ideas can be applied here, but the key difference is the kind of predicates that you have. They could be huge in number, which again is not the case in relational databases. Right? Typically, how many tables are there in your database? Probably not more than 200 in the extreme setting. Okay. >> This, this looks like a SQL query. We're supposing that. >> This looks like. >> You take the, particularly [INAUDIBLE]. >> Yes. >> Then that's not there in SQL. >> That's not there in SQL, but you can always turn it around and say, okay, I will also model predicates as another column. All right? >> You have to have a database which stores all possible predicates. >> Predicates as well. >> You have that in a separate table. >> Yes. >> And then query that. >> And then get back. >> Yeah. >> So there are some efforts for that kind of thing also. You have this metadata, you can query the metadata, get the table names, and then query. So you can do that also, but this is never the preferred model in the relational setting. >> [CROSSTALK]. >> Yeah, so you're breaking the relational ideas there. >> [INAUDIBLE]. >> Yes. >> All of that data? >> You are making a universal database and then firing queries on it. Right? That's one way of looking at it from a relational point of view. You don't do all these normalizations. You don't do anything. You just make it one big, huge universal relation. >> [INAUDIBLE]. >> As you state, [INAUDIBLE]. >> Yes. >> [INAUDIBLE]. >> Yeah. Everything falls apart. >> [INAUDIBLE]. >> Okay. >> [INAUDIBLE]. >> Yes. >> Do you [INAUDIBLE]. I think you want to get to that, right?
[INAUDIBLE] >> Yes, so that's the meat of the challenge, right? Whenever you look at these query languages, they don't talk about how they have to be evaluated. At this point I have no clue how this SPARQL query has to be evaluated, and I don't care either. These are declarative, right? I don't really care about it. But you really do have to worry about these kinds of issues, like: should I go for a universal relation? Which means I have to deal with null values, storage issues, minimizing the search space, a whole bunch of things. Or should I do some other trick? Should I normalize only in certain cases and not in others, right? These are the challenges you will find if you try directly translating these graph-like queries into the relational setting. >> [INAUDIBLE]. >> Yes. >> [INAUDIBLE]. >> Yes. >> [INAUDIBLE]. >> I mean, see, all these normal forms help not only with efficiency but also with keeping consistency in some sense, right? The same requirements hold here also. In fact, as we will see in a couple of slides, there is one extreme way, which you might already have seen in the example RDF graph I was looking at. You can look at this part, the triple part. So I have Bruce Willis, born in, Idar-Oberstein. The edges can be my table, just a triple table. If you go back to your normal form setting, it's almost like BCNF. Right? Where you have just key and value, nothing else. The predicate is encoded in the table name in the relational setting, but now you are storing it explicitly. Right? That's basically the way in which graphs can be stored. >> The edge list. >> Which list? >> Yeah, that list with the [INAUDIBLE]. >> Exactly. >> [INAUDIBLE]. >> Yeah. >> On the SPARQL queries. What kind of queries are better suited for SPARQL: the deeper ones, in the sense that you have a chain from a to b and then c and then d? Or the ones where b, c, d are all connected to a directly?
Which kinds of queries are better suited for SPARQL? >> That depends entirely on the application. So let's not worry about SPARQL, let's worry about SQL, okay? What kind of queries are better suited for SQL? >> Not SQL, SQL is only one side of it. >> But that's only because your performance is weak. Suppose I go to main-memory databases which support SQL. Deeper queries, whatever, it's probably perfectly fine, right? So in the same way, you cannot ask which kind of queries the language is better suited for. You can ask whether the database which actually implements SPARQL is better suited for this. In terms of the power of the language, SPARQL is no more powerful than SQL. I mean, SQL is already Turing complete, so you cannot get anything more powerful than that. Once you have that, SPARQL is no more powerful. So the reason people came up with a new language rather than just reusing SQL is ease of use and the way you think. Right? If you take an RDF file and look at it from the relational setting, the way in which people think is different. In XML, people thought in trees. With relational tables, they looked at things in table format. While in SPARQL, that is, in the RDF world, people always look at it as graphs. So it's just the ease of use, not which is more powerful. Everything is equally powerful. It just simplifies your life, right?
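The remarks about a pattern query being a bunch of joins, and about storing the predicate explicitly in one triple table, can be sketched with SQLite from Python's standard library. This is a sketch under assumptions: the table name triples, its columns s, p, o, the predicate names, and the toy rows are all invented for illustration, and it says nothing about how a real RDF store lays out or optimizes its data.

```python
import sqlite3

# One table of (subject, predicate, object) rows; the predicate is stored
# explicitly as a column instead of being encoded in a table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE triples (s TEXT, p TEXT, o TEXT)")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [
        # Hypothetical toy data, not the lecture's actual graph.
        ("bruce", "type", "ActionHero"),
        ("bruce", "name", "Bruce Willis"),
        ("bruce", "actedIn", "The Expendables"),
        ("bruce", "actedIn", "Die Hard"),
        ("bruce", "workedWith", "dolph"),
        ("dolph", "type", "ActionHero"),
        ("dolph", "name", "Dolph Lundgren"),
        ("dolph", "bornIn", "Stockholm"),
        ("dolph", "actedIn", "The Expendables"),
    ],
)

# Each triple pattern becomes one more self-join on the same table.
sql = """
SELECT DISTINCT n.o AS name, m.o AS movie
FROM triples AS t
JOIN triples AS n ON n.s = t.s AND n.p = 'name'
JOIN triples AS m ON m.s = t.s AND m.p = 'actedIn'
JOIN triples AS w ON w.s = t.s AND w.p = 'workedWith'
JOIN triples AS b ON b.s = w.o AND b.p = 'bornIn' AND b.o = 'Stockholm'
WHERE t.p = 'type' AND t.o = 'ActionHero'
ORDER BY name, movie
"""
rows = conn.execute(sql).fetchall()
print(rows)
```

Five patterns turn into four self-joins plus a filter, which is why the choice of storage layout and join strategy matters far more than which language the query was written in.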