In this video, we'll demonstrate XPath by running a number of queries over our bookstore data. Let's first take a look at the data. We've expanded it slightly over what we've been using in previous videos, but it continues to have pretty much the same structure. We have a number of books. Books have attributes: ISBN, price, and sometimes an edition. They have a title subelement, and authors with first name and last name. So we have our first course book and our complete book, and our complete book also has a remark, as you may recall. Then I've added a couple more books. I've added Hector and Jeff's Database Hints by Jeffrey Ullman and Hector Garcia-Molina, with a remark, "an indispensable companion to your textbook." I've also added Jennifer's Economical Database Hints, where for a mere $25 you get some hints. And then, just to demonstrate certain expressions, I've inserted three magazines, two National Geographics and a Newsweek, and finally a magazine called Hector and Jeff's Database Hints. So with this data in mind, let's move to the queries. We'll start with simple queries and get more complicated as we proceed. In this window, we'll be putting our XPath expressions in the upper pane, then we'll execute the query and we'll see the results in the lower pane. The way XPath works, the first part of every expression specifies the document over which the XPath expression is to be evaluated. So we have the data that you saw in a document called bookstoreq.xml, and you'll see in each of our expressions that we begin by specifying that document and then move ahead to the rest of the XPath expression. Our first expression is a very simple path expression. It says navigate through the XML by going first to the root element called bookstore, then look at all the book subelements of bookstore, and finally at all the title subelements of those books. Let's run the query and we will see our results below.
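For readers who want to try the same navigation outside the demo tool, here is a minimal sketch in Python using the standard library's `xml.etree.ElementTree`, which understands a small subset of XPath. The inline XML is a hypothetical stand-in for the bookstore document, and the tag names (`Bookstore`, `Book`, `Magazine`, `Title`) are assumptions, since the transcript never spells them out.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bookstore document; tag names are assumed.
BOOKSTORE_XML = """
<Bookstore>
  <Book><Title>A First Course in Database Systems</Title></Book>
  <Book><Title>Database Systems: The Complete Book</Title></Book>
  <Book><Title>Hector and Jeff's Database Hints</Title></Book>
  <Book><Title>Jennifer's Economical Database Hints</Title></Book>
  <Magazine><Title>National Geographic</Title></Magazine>
  <Magazine><Title>National Geographic</Title></Magazine>
  <Magazine><Title>Newsweek</Title></Magazine>
  <Magazine><Title>Hector and Jeff's Database Hints</Title></Magazine>
</Bookstore>
"""

# fromstring returns the root element, which here is <Bookstore>.
bookstore = ET.fromstring(BOOKSTORE_XML)

# Relative to the root, "Book/Title" plays the role of the path
# bookstore, then book, then title described above.
book_titles = [title.text for title in bookstore.findall("Book/Title")]

for text in book_titles:
    print(text)
```

Only the four book titles are selected; the magazine titles are skipped because the middle step names `Book` explicitly.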
So as we can see, our result here is actually written in XML, so a little header appears, and then we see the four titles of books that are in our database. Now let's modify our path expression. Instead of only getting book titles, let's get book or magazine titles. We do that by extending our middle matching element here to use a sort of regular-expression-like syntax, book or magazine, and we put it in parentheses. So now it says match any path in the data that starts at the bookstore element, follows either a book or a magazine subelement, and then finally a title subelement. When we run the query, we see now that we get not only the titles of our books but also the titles of our magazines. So far, we've mentioned element names explicitly in our path expressions. But as I mentioned in the introductory video, we can also use what's known as a wildcard symbol, the symbol star. Star says to match any element name. So now we're going to start again with bookstore, match any element below bookstore, and finally find title elements below those elements. Now it so happens that the only elements below bookstore are books and magazines, so when we run the query, we will get exactly the same result. So far we've been navigating with the single slash operator, which tells us to go one element at a time. We're at a particular element and then we match subelements with the specific tag that we typed. There is also the double slash operator. As you recall from the introductory video, double slash says match myself or any descendants of myself, to any depth. So if I put double slash title, what we'll be matching is any title element anywhere at all in the XML tree. We run the query, and again we get exactly the same result, because we had already been getting all of the titles, which were subelements of books or magazines. Now let's get something a little different. Let's put slash, slash, star.
Now that's kind of a wild thing to put, because it says I'm going to match any element in the entire tree, and furthermore, it can be of any element type. Let's run the query and see what happens. What we get is a huge result. Let me just scroll down so you can see the result. In fact, what we're getting is every element at every level of the tree, including all of its subelements. So in fact, the first element in our result is our entire tree, because it's our bookstore element, and we'll go all the way down to the end of the bookstore. The next elements in our result are the children of the bookstore, so we get the book elements, and we're also going to get their children in the answer. And as we keep scrolling down, we'll see that we get every element of the entire database. That's not a useful query, but it does demonstrate the constructs: the double slash matching any element in the tree and the star matching any tag of any element. Now let's turn to attributes. Let's suppose we're interested in returning all the ISBN numbers in the database, so we'll go back to saying bookstore, book subelements, and then we'll get the attribute ISBN. So we type at sign and ISBN. Let's run the query, and we get an error. It turns out that attributes cannot be what is called serialized in order to return them in an XML-looking result. So what we need to do, actually, is obtain the data from the attribute itself, and once we do that, we'll see that we're getting the answer that we desire. So we'll ask for the data of the attribute, run the query, and now we see we have all the ISBN numbers. Now the attribute data is just strings, so we're returning the ISBN numbers as a set of strings with blanks between them. So some of these are sort of peculiarities of how XPath works. Again, we were not able to return an attribute because it didn't know how to structure the result, but once we extracted the data from the attribute, it returned it as string values.
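The wildcard, double-slash, and attribute steps just described can be approximated in Python's `xml.etree.ElementTree`. This is only a sketch under stated assumptions: the tag names, the `ISBN` attribute spelling, and the placeholder ISBN values are all hypothetical, and ElementTree has no alternation syntax like "book or magazine", so that step is emulated with an ordinary filter.

```python
import xml.etree.ElementTree as ET

# Hypothetical, shortened stand-in for the bookstore document.
# Tag names, the ISBN attribute name, and the ISBN values are assumed.
BOOKSTORE_XML = """
<Bookstore>
  <Book ISBN="isbn-1"><Title>A First Course in Database Systems</Title></Book>
  <Book ISBN="isbn-2"><Title>Jennifer's Economical Database Hints</Title></Book>
  <Magazine><Title>Newsweek</Title></Magazine>
</Bookstore>
"""
bookstore = ET.fromstring(BOOKSTORE_XML)

# "Book or magazine": ElementTree has no alternation, so filter children by tag.
book_or_mag = [child.findtext("Title") for child in bookstore
               if child.tag in ("Book", "Magazine")]

# "*" matches a child with any tag, then we step down to its Title.
star_titles = [t.text for t in bookstore.findall("*/Title")]

# ".//Title" matches Title elements at any depth below the root.
any_depth_titles = [t.text for t in bookstore.findall(".//Title")]

# iter() walks the root and every descendant, like slash-slash-star.
all_tags = [element.tag for element in bookstore.iter()]

# Attribute values come back as plain strings, much like asking for
# the data of the attribute rather than the attribute itself.
isbns = [book.get("ISBN") for book in bookstore.findall("Book")]

print(book_or_mag == star_titles == any_depth_titles)
print(all_tags)
print(isbns)
```

On this sample the three title queries agree, for the same reason given in the narration: the only elements below the bookstore are books and magazines.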
So far, we've only seen path expressions with no conditions involved, so let's throw in a condition. Let's say that we're interested in returning books that cost less than $90. So what we're going to do here is navigate to the book, and then we're going to use the square bracket, which says start evaluating a condition at the current point of the navigation. So the condition that I'm going to put is that the price of the book is less than 90. We'll run that query and we'll see that we have two books whose price... three books, I apologize, whose price is less than 90. Now here we returned the books that satisfied the condition. What if what we actually want to return is the titles of the books whose price is less than 90? What we can do is, after evaluating this condition on the book, we can actually continue our navigation. So we just put slash title here. It says find the books, only keep the ones that match the condition, and then continue navigating down to their titles and return those as the result of the query. We run the query and we see our result. Now another type of condition that we can put in square brackets is an existence condition instead of a comparison. If we put, for example, just the label remark inside our square brackets, that says that we should match books that have a remark. So putting an element name in square brackets is an existence condition on that subelement. Once we've isolated the books that have a remark, we'll return the titles of the books. We run the query and we discover that two of our books have a remark. You can go back and check the data and you'll see that's indeed the case. Let's get a little bit more complicated now. Let's get rid of this query and put in a whole new one. In this query we're going to find the titles of books where the price is less than 90 and where Ullman is one of the authors. So now we have in our square brackets a longer condition.
The price is less than 90 and there exists (implicitly, this is an exists) a subpath from the book, authors slash author slash last name, where the value of that is Ullman. If we satisfy both of those conditions, it will return the title of the book. So we run the query and we discover two books that are less than 90 where Ullman is one of the authors. Now let's expand the query by adding another condition. We want not only the last name of the author to be Ullman but the first name to be Jeffrey. So now we're looking for books where Jeffrey Ullman is one of the authors and the books are less than 90. So we run the query and we get the same result, not surprisingly, since Ullman is always paired with Jeffrey. But actually this is not doing quite what we're expecting, and I'm going to explain why by demonstrating some changes. Let's say that we change our query to look not for Jeffrey Ullman as an author but for Jeffrey Widom. Hopefully, we'll get no answers, but when we run the query we see we still get a book, "A First Course in Database Systems." The two authors of that book, if you look back at the data, are Jeffrey Ullman and Jennifer Widom. So let's see why that book was returned in this query. The reason is, if we look closely at this condition, what we're saying is we're looking for books where the price is less than 90, and there exists an authors, author, last name path where the value is Widom, and there exists an authors, author, first name path where the value is Jeffrey. Well, that in fact is true. We have one author whose last name is Widom and another author whose first name is Jeffrey. Let's try to formulate the correct query now. So instead of matching the entire path to the last name and then the entire path to the first name separately through the authors subelement, what we want to do is look at one author at a time, and within that author look at the last name and first name together.
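The difference between the two readings can be sketched in plain Python. This is an illustration of the logic, not the XPath engine's implementation: the tag names (`Authors`, `Author`, `First_Name`, `Last_Name`), the `Price` attribute, and the helper names `loose_match`, `same_author_match`, and `titles` are all hypothetical, and the prices 85 and 100 are illustrative values chosen only to be consistent with the narration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag names and prices are assumptions for illustration.
BOOKSTORE_XML = """
<Bookstore>
  <Book Price="85">
    <Title>A First Course in Database Systems</Title>
    <Authors>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
  <Book Price="100">
    <Title>Database Systems: The Complete Book</Title>
    <Authors>
      <Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
</Bookstore>
"""
bookstore = ET.fromstring(BOOKSTORE_XML)


def loose_match(book, first, last):
    """Two independent existence tests: some author has `last`,
    and some (possibly different) author has `first`."""
    authors = book.findall("Authors/Author")
    return (any(a.findtext("Last_Name") == last for a in authors)
            and any(a.findtext("First_Name") == first for a in authors))


def same_author_match(book, first, last):
    """One existence test: a single author has both `first` and `last`."""
    return any(a.findtext("First_Name") == first
               and a.findtext("Last_Name") == last
               for a in book.findall("Authors/Author"))


def titles(match, first, last, max_price=90):
    """Titles of books under max_price whose authors satisfy `match`."""
    return [b.findtext("Title") for b in bookstore.findall("Book")
            if float(b.get("Price")) < max_price and match(b, first, last)]


print(titles(loose_match, "Jeffrey", "Widom"))         # the surprising hit
print(titles(same_author_match, "Jeffrey", "Widom"))   # empty, as intended
print(titles(same_author_match, "Jeffrey", "Ullman"))
```

The loose version finds the first course book for "Jeffrey Widom" because the two `any` tests are satisfied by different authors; checking both names inside one `any` ties them to the same author.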
So to modify our query to do that, we're going to use a condition within the condition. Specifically, within the authors slash author, we'll look at the last name and the first name. This syntax error is temporary; once we finish the query, everything will look good. So we put a second bracket there, and let me show again what I've done. It says we're looking for books where the price is less than 90 and there exists an authors slash author subelement where the last name is Widom and the first name is Jeffrey. Hopefully, we'll get an empty answer here. We execute the query and indeed we do. Now our original goal was to have Jeffrey Ullman, so finally we'll change it back to Jeffrey Ullman, run the query, and now we get the correct answer. Incidentally, it's a very common mistake when we have a condition to put a slash before the condition. If we did that, we would get a syntax error. When we write the square bracket, it essentially acts like a slash, so when we reference a subelement name within a square bracket, we're implicitly navigating to that subelement. Next, we're going to try a similar query with a twist. We're going to try to find books where Ullman is an author and Widom is not an author. So we navigate to the books as usual, and we look for cases where there's an authors, author, last name equal to Ullman and there's an authors, author, last name not equal to Widom. Now you may already detect that this is not the correct query, but let's go ahead and run it. And we see that we got three books, but we know that for the first two books Widom is an author, so as you may have detected, this is not correct. What this asks for is books where there's an author whose last name is Ullman and there's some author whose last name is not Widom. Well, in fact, every book with Ullman as an author has some author whose last name is not Widom. That would be Ullman. So even if I took away this condition and ran the query again, I'd get exactly the same results. Well, actually I got a syntax error.
I forgot to erase the "and," so let's get rid of that, run the query, and now we do in fact get the exact same result. So as a reminder, we were trying to find books where Ullman is an author and Widom is not. In fact, we do not have a construct yet to write that query. A little later in the demo we'll see how we can, in a kind of tricky fashion, but with what we've seen so far with path expressions and conditions, we're unable to write that specific query. So far, we've seen two types of conditions in brackets: we saw comparisons, and we saw existence constraints, where we checked to see whether a particular subelement existed. As you might remember from the intro, we can also put numbers inside square brackets, and those numbers tell us to return the nth subelement. Specifically, if we look at this query, we're using slash, slash to navigate directly to authors elements, and then we want to return the second author subelement of each authors element. So we run the query and we'll see, if we look at our data, that Jennifer Widom, Jeffrey Ullman and Hector Garcia-Molina each appear once as the second author of a book or a magazine. If we change this to three, we'll be returning third authors only, and we can see only Jennifer Widom as a third author. If we change this to ten, hopefully we'll get an empty result, and in fact we do. Now let's take a look at some built-in functions and predicates. In this query, we're going to find all books where there's a remark about the book that contains the word great. So we're going to navigate using slash, slash directly to book elements, and within the book element we'll have a condition that invokes the built-in predicate contains, which I mentioned in the introductory video, which looks at two strings and checks whether the first string contains the second one.
So if we have a book where there's a remark, which is a string, that contains the word great, then the book matches the condition and we'll return the title of the book. We run the query and we see that we have one book that has a remark containing the word great. Our next query does something kind of new. I like to call this query a self-join, but that's probably only because I'm a relationally biased person. What it's actually doing is querying sort of two instances of our bookstore data at once and joining them together. So we'll see that our doc bookstore appears twice in this expression. Let me explain what this expression is doing. It's finding all magazines where there's a book that has the same title as the magazine, and here's how it does it. Our first path expression navigates to magazines, and then it extracts, in the condition, the title of the magazine. The magazine will match if the title equals some book title, and so to find the book titles, we need to go back to the top of the document, so we get a second instance of the document and we find book titles. Now when we have the equals here, this equals is implicitly existentially quantified. Did you follow that? Implicitly existentially quantified. That means that even though we're doing equals on what's effectively a set, the condition is satisfied if some element of the set is equal to the first title. Okay. There's a lot of implicit existential quantification going on in equality in XPath, and in XQuery as well, as we'll see later on. In any case, let's run the query, and we get back the fact that the magazine called "Hector and Jeff's Database Hints" has the same title as a book, and if you look back in the data, you'll see we do have a book of the same name. We saw one example of a built-in predicate, contains. This example shows another built-in function, in this case the name function, and it also shows our first example of a navigation axis. We're going to use the parent axis.
What this query is going to find is all elements whose parent element tag is not bookstore or book. Of course, this is just for demonstration purposes. It's not really that useful a query. But let me just walk through the construction of the query. So we're starting with our bookstore and then we're using //*, which finds all elements. We saw //* earlier; when we ran that query, we saw that it matched every element in the database. Now, since we've already put in bookstore, we're not going to match the bookstore element itself, but we'll match every element below the bookstore element. So what the condition looks for is the tag of the parent of the current element. It sees if it's bookstore or book, and we return the element if the parent tag is neither bookstore nor book. Here's how we find the parent tag. Name is a built-in function; name operates on an element and it returns the tag of that element. The element we want to look at is the parent of the current element, and the way we do that is with the parent navigation axis, which is parent colon, colon. Finally, the star is matching the tag of the parent. Here we say match any tag of the parent, extract the tag, and check if it's bookstore or book. So when we run the query, we'll see that we get back a lot of data, but all of it is elements in the database whose parent is not the bookstore or a book. Here's another example of a navigation axis. In this case, we're using following-sibling. Following-sibling says that if we are at a specific point in the tree, we should match every sibling, so every other element at the same level, that's later in the document, that follows the current element. So let's walk through this expression and see what we're doing. What this expression is looking for is all books and magazines that have a non-unique title. In other words, all books or magazines where some other book or magazine has the same title.
So we navigate down to book or magazine elements. This is what we saw in one of our earlier path expressions: we'll match any book or magazine element. Then we want to find one where the title is equal to some title of a later sibling. Now our books and magazines are all at the same level in our data, so when we do following-sibling, we're going to be matching all other books and magazines that appear after the current one. And again, this star says that we can match an element of any type. We could equivalently put book or magazine in here because we know they're all books or magazines, and we'll do that in a moment, but for now let's just focus on running the query. So we execute the query and we find two answers. We find "Hector and Jeff's Database Hints," which is a book, because we had a magazine of the same title, and we find "National Geographic," which is a magazine, because there's another magazine of the same title. So actually this query was somewhat incomplete, and that was our fault. The way we wrote the query, we said that we want to return book or magazine elements when a later one has the same title. So that doesn't actually return all of the ones with non-unique titles; it only returns the first instance of each one with a non-unique title. Let's modify the query to do the right thing. What we need to do is not only check whether the title equals the title of some following sibling book or magazine, but whether it might also equal a preceding one. So we add a second comparison on title, the same construct using the preceding-sibling axis, ending in title. Here we go, and now when we run the query, we see that we get "Hector and Jeff's Database Hints" and "National Geographic," but we also get another instance of "National Geographic" and another instance of "Hector and Jeff's Database Hints." So now we have the correct answer. We don't only get the first instance of duplicated titles; we get both of them.
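To make the first-instance issue concrete, here is a rough Python emulation of the two sibling checks. ElementTree does not support the following-sibling or preceding-sibling axes, so list slicing over the bookstore's children stands in for them; the tag names are assumed, and the document order (books first, then magazines) follows the description of the data at the start of the demo.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for the bookstore document; tag names are assumed.
BOOKSTORE_XML = """
<Bookstore>
  <Book><Title>A First Course in Database Systems</Title></Book>
  <Book><Title>Database Systems: The Complete Book</Title></Book>
  <Book><Title>Hector and Jeff's Database Hints</Title></Book>
  <Book><Title>Jennifer's Economical Database Hints</Title></Book>
  <Magazine><Title>National Geographic</Title></Magazine>
  <Magazine><Title>National Geographic</Title></Magazine>
  <Magazine><Title>Newsweek</Title></Magazine>
  <Magazine><Title>Hector and Jeff's Database Hints</Title></Magazine>
</Bookstore>
"""
bookstore = ET.fromstring(BOOKSTORE_XML)

# All children of the root are siblings of one another, in document order.
items = list(bookstore)
item_titles = [item.findtext("Title") for item in items]


def has_later_duplicate(i):
    """Emulates: title equals the title of some following sibling."""
    return item_titles[i] in item_titles[i + 1:]


def has_earlier_duplicate(i):
    """Emulates: title equals the title of some preceding sibling."""
    return item_titles[i] in item_titles[:i]


# Following siblings only: finds just the first instance of each duplicate.
first_instances = [(items[i].tag, item_titles[i])
                   for i in range(len(items)) if has_later_duplicate(i)]

# Following or preceding siblings: finds every instance.
all_instances = [(items[i].tag, item_titles[i])
                 for i in range(len(items))
                 if has_later_duplicate(i) or has_earlier_duplicate(i)]

print(first_instances)
print(all_instances)
```

With only the later-sibling check, the last occurrence of a repeated title never matches, because nothing after it carries the same title. Adding the earlier-sibling check picks up those remaining instances.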
Now, to show the use of the star: we were matching any book or magazine as the following sibling. What if all we're interested in is cases where there's a book that has the same title, but not a magazine? We can use the same construct with book in place of the star. In that case, we shouldn't get "National Geographic" anymore. Let's run the query, and indeed all we get is "Hector and Jeff's Database Hints" as a magazine, because that was the only instance where there was an actual book that had the same title, as opposed to matching books or magazines with the star. Don't take a look at this query yet. Let me explain what I'm doing before you try to untangle the syntax. As I mentioned earlier, XPath revolves around implicit existential quantification. So when we are looking, for example, for an author whose last name is Ullman, implicitly we will match the path if any author has the last name Ullman. And in general, most of XPath revolves around matching sets of values and then returning things if any element of that set matches the condition. What if we want to do universal quantification, in other words, for all? That turns out to be more complicated, but we can do it in a tricky fashion. So what I'd like to do with this query is find books where every author's first name includes J. If we wrote it in the fashion that we might be tempted to, where we just say book, authors slash author, first name contains J, then we'll get books where some author's first name contains J. To get books where all authors' first names contain J is more difficult, and the way we're going to do it is kind of a kluge: we're going to use the built-in function count. So here's what we're doing in this query. We're finding all books where the number of authors whose first name includes J is the same as the number of authors of the book without a condition, okay?
So, specifically, under book we count the number of matches of an authors, author subelement where the built-in predicate contains is true, where the first name contains J. So we are counting the number of authors whose first name contains J, and we're setting that equal to the count of the first name subelements. We'll run the query and we will find, indeed, that there are two books where all of the authors' first names include J. We can use a related trick to write the query we tried to write earlier but failed: to find books where Ullman is an author and Widom is not an author. With the implicit existential, what happened before is that we found books where there was an author whose last name was Ullman and there was an author whose last name was not Widom. And of course, we still got everything back. What we want to find is books where there's a last name that's Ullman and where none of the authors have the last name Widom. That's effectively, again, a universal quantification, a for all: for all of the authors, their last name is not Widom. Since we don't have a for-all construct in XPath, we're again going to use the count trick. So in this query, we're looking for books where one of the authors' last names is Ullman and the number of authors whose last name is Widom, using count again, is zero. So now we've expressed that query, we run it, and we get the correct answer. That concludes our demonstration of XPath. We've shown a large number of constructs and we've written some fairly complicated queries. On the other hand, we certainly have not covered the entire XPath language. If you're interested, there are many online materials. We'll also provide the data, and we encourage you to experiment on your own.
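As a starting point for that experimentation, the count trick can be sketched in Python. This mirrors the logic of the two queries rather than their XPath syntax, since `xml.etree.ElementTree` has no count or contains functions. The tag names and the helper `authors` are hypothetical, and the author lists follow what the narration says about each book.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; tag names are assumed, authors follow the narration.
BOOKSTORE_XML = """
<Bookstore>
  <Book>
    <Title>A First Course in Database Systems</Title>
    <Authors>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
  <Book>
    <Title>Database Systems: The Complete Book</Title>
    <Authors>
      <Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
  <Book>
    <Title>Hector and Jeff's Database Hints</Title>
    <Authors>
      <Author><First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name></Author>
      <Author><First_Name>Hector</First_Name><Last_Name>Garcia-Molina</Last_Name></Author>
    </Authors>
  </Book>
  <Book>
    <Title>Jennifer's Economical Database Hints</Title>
    <Authors>
      <Author><First_Name>Jennifer</First_Name><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
</Bookstore>
"""
books = ET.fromstring(BOOKSTORE_XML).findall("Book")


def authors(book):
    """All Author elements of one book."""
    return book.findall("Authors/Author")


# "For all authors, first name contains J", written as two equal counts.
all_first_names_have_j = [
    b.findtext("Title") for b in books
    if sum("J" in a.findtext("First_Name") for a in authors(b)) == len(authors(b))
]

# "Ullman is an author and no author is Widom": existence plus a zero count.
ullman_without_widom = [
    b.findtext("Title") for b in books
    if any(a.findtext("Last_Name") == "Ullman" for a in authors(b))
    and sum(a.findtext("Last_Name") == "Widom" for a in authors(b)) == 0
]

print(all_first_names_have_j)
print(ullman_without_widom)
```

Comparing counts is a workaround for the missing for-all: a condition holds for every author exactly when the number of authors satisfying it equals the total, and it holds for none exactly when that number is zero.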