In this video, we'll demonstrate
XPath by running a number of
queries over our bookstore data.
Let's first take a look at
the data, we've expanded it slightly
over what we've been using in
previous videos, but it
continues to have pretty much the same structure.
We have a number of books.
Books have attributes, ISBN, price,
sometimes in addition, they have
a title sub element, authors with first name and last name.
So we have our first course book and our complete book.
And our complete book also has a remark as you may recall.
Then I've added a couple more books.
I've added Hector and Jeff's
Database Hints by Jeffrey
Ullman and Hector Garcia Molina with
a remark, an indispensible companion to your textbook.
I've also
added Jennifer's Economical Database
Hints, for that at a
mere price of $25 you get some hints.
And then finally just to demonstrate
certain expressions, I've inserted
three magazines, two National Geographics and a Newsweek.
And finally, a magazine seen
called Hector and Jeff's Database Hints.
So with this data in mind, let's move to the queries.
We'll start with simple queries, and get more complicated as we proceed.
In this window, we'll
be putting our XPath expressions
in the upper pane, then we'll
execute the query and we'll see the results in the lower pane.
The way XPath works, the
first part of every expression
specifies the document document
over which the XPath expression is to be evaluated.
So we have the data that you saw in a document called bookstoreq.xml
and you'll see in each of
our expressions that we begin
by specifying that document, and
then move ahead to the rest of the XPath expression.
Our first expression is a very simple path expression.
It says navigate through the XML
by going first to the
root element called bookstore, then
look at all the books... so
the elements of book store
and finally all the titles of elements.
Let's run the query and we will see our results below.
So as we can see our results
here is actually written in XML little header appears.
And then we see the four titles
of books that are in our database.
Now let's modify our path expression.
Instead of only getting book
titles, let's get book or magazine titles.
We do that by extending
our middle matching element here
to use a sort of regular
expression like syntax, book or
magazine, and we put it in parentheses.
So now it says match any
path of the data that starts
at the bookstore element, follows either
a book or magazine sub-element, and
then finally a title sub-element.
When we run the query, we see
now that we get not only
the titles of our books, but
also the titles of our magazines.
So far, we've mentioned element names
explicitly in our path expressions.
But as I mentioned in the
introductory video, we can also
use what's known as a wild
card symbol, the symbol star.
Star says to match any element name.
So now we're going to start
again with bookstore, match any
element below bookstore, and finally,
find titles of elements below those any elements.
Now it so happens that the
only elements below bookstore are
books and magazines, so when
we run the query, we will get
exactly the same result.
So far we've been navigating with
the single slash operator which tells
us to go one element at a time.
We're at a particular element and
then we match sub-elements with the specific tag that we typed.
There is also the double slash operator.
As you recall from the introductory
video, double slash says
match myself or any
descendants of myself to any length.
So if I put a double
slash title, what we'll
be matching is any title
element anywhere at all in the XML tree.
We run the query, and again
we get exactly the same result
because we had already been getting
all of the titles which were
sub-elements of books or magazines.
Now let's get something a little different.
Let's put slash, slash, star.
Now that's kind of a
wild thing to put, because it
says I'm going to match any
element in the entire tree,
and furthermore, it can be
of any the element type, let's run the query and see what happens.
What we get is a huge result.
Let me just scroll down so you can see the result.
In fact, what we're getting
is every element at
every level of the tree,
including all of its sub elements.
So in fact, the first element
in our result is our entire
tree because it's our book store element.
And we'll go all the way down to the end of the book store.
The next element in our
result is some children of the book store, so we get the book elements.
And we're also going to get their children in the answer.
And as we keep scrolling
down, we'll see that we
get every element of the entire database.
That's not a useful query but
it does demonstrate the construct, the
double slash matching any element
in the tree and the
star matching any tag of any element.
Now let's turn to attributes.
Let's suppose we're interested in
returning all the ISBN number
in the database, so we'll go
back to saying book store,
books of elements and then
we'll get the attribute ISBN.
So we type at sign and ISBN.
Let's run the query and we get an error.
It turns out that attributes
cannot be what is called
serialized in order to
return them in an XML-looking result.
So what we need to do
actually is obtain the data
from the attribute itself, and once
we do that, we'll see that we're getting the answer that we desire.
So we'll ask for the data
of the attribute, run the query,
and now we see we have all its ISBN numbers.
Now the attribute data is
just strings, so we're returning
the ISBN numbers as a
set of strings with blanks between them.
So some of these are sort
of peculiarities of how XPath works.
Again, we were not able to
return an attribute because it
didn't know how to structure the
result, but once we extracted
the data from the attribute, it returned it as string values.
So far, we've only seen
path expressions with no conditions
involved, so let's throw in a condition.
Let's say that we're interested in
returning books that cost less than  $90.
So what we're going
to do here is navigate to
the book, and then we're
gonna use the square bracket which
says start evaluate a
condition at the current point of the navigation.
So the condition that I'm going
to put is that the
price of the book is less than 90.
We'll run that query
and we'll see that we have
two books whose price...three books,
I apologize, whose price
is less than 90.
Now here we return
the book that satisfied the condition.
What if what we actually want
to return is the title of the books whose price is less than 90?
What we can do is
after evaluating this condition on
the book, we can actually continue our navigation.
So we just put slash title here.
It says find the books, only
keep the ones that match
the condition, and then continue navigating
down to their titles and return that as the result of the query.
We run the query and we see our result.
Now another type of condition
that we can put in square brackets
is an existence condition instead of a comparison.
If we put, for example, just
the label remark inside our
square brackets, that says
that we should match books that have a remark.
So putting an element
name in square brackets is an
existence condition on that sub element existing.
Once we've isolated the books
that have a remark, we'll return the title of the books.
We run the query and we discover that two of our books have a remark.
You can go back and check the data and you'll see that's indeed the case.
Let's get a little bit more complicated now.
Let's get rid of this query and put in a whole new one.
In this query we're going
to find the titles of
books where the price is less
than 90 and where Ullman is one of the authors.
So now we have in
our square brackets a longer condition.
The price is less than ninety
and there exists, and implicitly
this is an exist, there exists
a sub part from the
book, author slash author
slash last name where the value of that is Ullman.
If we satisfy both of
those conditions it will return the
title of the book, so we
run the query and we discover
two books that are less
than 90 where Ullman is one of the authors.
Now let's expand the query by adding another condition.
We want not only the last
name of the author to be Ullman but the first name to be Jeffrey.
So now we're looking for books where Jeffrey Ullman is one of the authors,
and the books are less than 90.
So we run the query
and we get the same result,
not surprisingly, since Ullman is always paired with Jeffrey.
But actually this is not
doing quite what we're expecting,
and I'm gonna explain why by demonstrating some changes.
Let's say that we change
our query to not look for
Jeffrey Ullman as an author but to look for Jeffrey Widom.
Hopefully, we'll get no answers, but
when we run the query we
see still get a book, "The First Course In Database Systems."
So the two authors of
that book, if you look back
at the data, are Jeffrey Ullman and Jennifer Widom.
So let's see why that book was returned in this query.
The reason is, if we
look closely at this condition,
what we're saying is we're looking
for books where the price is less
than 90 and there exists
an author's author last
name path where the value is
Widom and there exists an
author's author first name
path where the value is Jeffrey.
Well, that in fact is true.
We have one author whose
last name is Widom and another
author whose first name is Jeffrey.
Let's try to formulate the correct that query now.
So instead of matching the entire
path to the last name and
then the entire path to the
first name separately through the
author's sub elements, what we
want to do so if we
want to look at each author
at a time and within that
author look at the last name and first name together.
So to modify our query to
do that wer're going to use a condition within the condition.
Specifically within the author
slash author, we'll look
at the last name and the first name.
This syntax error is temporary once
we finish the query, everything will look good.
So we put a
second bracket there and let
me show again what I've done,
that said we're looking for books
where the price is less than 90
and there exists an author
slash author sub element where
the last name is Widom and the first name is Jeffrey.
Hopefully, we'll get an empty answer here.
We execute the query and indeed we do.
Now our original goal was
to have Jeffrey Ullman, so finally
we'll change Jeffrey Ullman, run
the query, and now we get the correct answer.
Incidentally, it's a very
common mistake when we have
a condition to put a slash before the condition.
If we did that we would get a syntax error.
When we write the square
bracket, it essentially acts like
a slash so when we reference
a sub-element name within a
square bracket, we're implicitly navigating that sub-element.
Next, we're going to try a similar query with a twist.
We're going to try to find books
where Ullman is an author and Widom is not an author.
So we, we navigate the books
as usual and we look
for cases where there's an
authors author last name equals
Ullman and there's an authors
author last name not equal to Widom.
Now you may already detect that
the this is not the correct query, but let's go ahead and run.
And we see that we
got three books but we
know the first two books,
Widom is an author so as you may detected this is not correct.
What this asks for are books
where there's an author whose
last name is Ullman and there's
some author whose last name is not Widom.
well, in fact, every book with
Ullman as an author has some author whose last name is not Widom.
That would be Ullman.
So even if I took
away this condition and ran
the query again, I'll get exactly for the same results.
Well, actually I got a syntax error.
I forgot to erase
the and, so let's get rid
of that, run the query,
and now we do in fact get the exact same result.
So as a reminder, we were
trying to find books where the
last, where Ullman is an
author and Widom is not, in fact
we do not have construct yet to write that query.
A little later in the demo
we'll see how we can in
a kind of tricky fashion but
for what we've seen so far
with path expressions and conditions,
we're unable to write that specific query.
So far, we've seen two
types of conditions in brackets,
we saw comparisons and we
saw existence constraints where we
checked to see whether a particular sub element existed.
As you might remember from the
intro, we can also put numbers
inside in square brackets and those
numbers tell us to return the F sub element.
Specifically, if we look at
this query, we're using slash, slash
to navigate directly to authors
elements and then we want
to return the second author
sub element of each author's element.
So we run the query and we'll
see if we look that our
data that Jennifer Widom, Jeffrey
Ullman and Hector Garcia Melina
each appear once as the
second author of a book
or a magazine, if we
changed this to three, we'll
be returning third authors only
and we can see only Jennifer Widom as a third author.
If we change this to ten,
hopefully, we'll get an empty
result and in fact, we do.
Now let's take a look at some built-in functions and predicates.
In this query, we're going to
find all books where there's
a remark about the book that contains the word great.
So we're going to navigate using slash,
slash directly to book
elements and within the
book element, we'll have a condition
that invokes the built in
predicate contains, which I
mentioned in the introductory video, which
looks at two strings and checks
whether the first string contains the second one.
So if we have a book
where there's a remark which is
a string that contains the
word great, then the book
matches the condition and will return the title of the book.
We run the query and we
see that we have one book that
has the remark containing the word great.
Our next query does something kind of new.
I like to call this query
a self join but that's
probably only because I'm a relationally biased person.
But what it's actually doing
is querying sort of two
instances of our bookstore data
at once and joining them together.
So we'll see that our Doc
Bookstore appears twice in this expression.
Let me explain what this expression is doing.
It's finding all magazines where
there's a book that has
the same title as the
magazine and here's how it does it.
So our first path expression navigates
two magazines and then
it extracts in the condition the title of the magazines.
The magazine will match if
the title equals some book
title and so to
find the book titles, we need
to go back to the top
of the document so we get
a second incidence of the
document and we find book titles.
Now when we have the equals
here, this equals is implicitly be existentially quantified.
Did you follow that?
Implicitly existentially quantified.
That means that even though
we're doing equals on what's
effectively a set, the condition
is satisfied if some element
of the set is equal to the first title.
Okay.
There's a lot of implicit existential
quantification going on in
equality in XPath and
in XQuery, as well, as we'll see later on.
In any case, let's run the
query and we will get
back the fact that the
magazine called "Hector and Jeff's
Database Hints" has the same
title as a book, and
if you look back in the data,
you'll see we do have a book of the same name.
We saw one example of a built in predicate contains.
This example shows another built
in function, in this case
the name function, and it
also shows our first example of a navigation axis.
We're going to use the parent axis.
What this query is going
to find is all elements
whose parent element tag
is not bookstore or book.
Of course, this is just for demonstration purposes.
It's not really that useful of a query.
But let me just walk through the construction of the query.
So we're starting with our
bookstore and then we're using
// which finds all elements.
We saw // earlier when
we ran the query we saw
that it matched every element in the book, in the database.
Now, since we've already put in bookstore.
We're not going to match the
bookstore element itself but we'll
match every child of the bookstore element.
So what the condition looks for
is the tag of the
parent of the current element
and it sees if it's book
store or book and we
return the element if it's
neither book store or book at the parent tag.
Here's how we find the parent tag.
So name is a built
in function, name operates on
an element and it returns the tag of that element.
The element we want to
look at is the parent of
the current element and the
way we do that is with the
parent navigation axis which is parent colon, colon.
Finally, the star is matching the tags of the parents.
Well, here we say match any tag
of the parent, extract the tag
and check if it's book store or book.
So when we run the query,
we'll see that we get that
pack a lot of data but
all of them are elements
in the database whose parent is not the book store or book.
Here's another example of a navigation axis.
In this case, we're using following sibling.
Following sibling says if we
are at a specific point in the
tree you should match
every sibling so every
other element at the same
level that's later in
the document, that follows the current sibling.
So let's walk through this expression and see what we're doing.
What this expression is looking
for is all books and
magazines that have a non-unique title.
In other words, all books or magazines
where some other book or magazine
has this same title.
So we navigate down to
books or magazine elements, this
is what we saw in one of
our earlier path expressions, we'll
match any book or magazine
element and then we
want to find one where
the title is equal to
some title of a later sibling.
Now our books and magazines are
all at the same level in
our data so when we
do following sibling, we're going
to be matching all other
books and magazines that appear after the current one.
And again, this star says that
we can match on element of any type.
We could equivalently put book or
magazine in here because we
know they're all books or magazines,
and we'll do that in a moment,
but for now let's just focus on running the query.
So we execute the query and we find two answers.
We find "Hector And Jeff's Database
Hints," which is a book because
we had a magazine of the
same title and we find
"National Geographic," which is
a magazine because there's another magazine of the same title.
So actually this query was somewhat incomplete.
And that was our fault.
The way we wrote the query we
said that we want to return
book or magazine elements when a later one has the same title.
So that doesn't actually return
all of the ones with non-unique
titles, it only returns the
first instance of each one with a non unique title.
Let's modify the query to do the right thing.
What we need to do is not
only check whether the title
equals the following sibling title
of some book or magazine, but
whether it might also equal a proceeding one.
So we add title equals the
same construct using the proceeding
sibling axis slash, slash,
title...  Here we
go, and now when we
run the query, we see
that we get Hector and Jeff's Database
Hints and National Geographic, but
we also get another instance
of National Geographic and another
instance of Hector and Jeff's Database Hints.
So now we have the correct answer.
We don't only get the first
instance of duplicated titles, but we get both of them.
Now to show the use of
the star, we were matching
any book or magazine as the following sibling.
What if all we were interested
in is cases where there's
a book that has the
same title, but not a
magazine, and we can do the same thing here.
In that case, we shouldn't get "National Geographic" anymore.
Let's run the query and indeed
all we get in fact, is
"Hector and Jeff's Database Hints," as
a magazine because that was
the only instance where there
was an actual book that had
the same title as opposed to
matching books or magazines with the star.
Don't take a look at this query yet.
Let me explain what I'm doing
before you try to untangle the syntax to do it.
As I mentioned earlier, Xpath
revolves around implicit existential quantification.
So when we are looking for
example, for an author whose
name is Ullman, implicitly we will
match the path if any
author has the last name Ullman.
And in general, most of
XPath revolves around matching
sets of values and then
returning things if any element of that set matches the condition.
What if we want to
do universal quantification, in other words, for all.
That turns out to be more complicated, but we can do it in a tricky fashion.
So, what I'd like
to do with this query is we're
going to find books where every
author's first name includes
J. If we wrote
it in the fashion that we
might be tempted to, or we
just say book author/author first
name includes J, then we'll
get books where some authors
first name contains J. To
get books where all author's
first names contains J is
more difficult and the way
we're going to do it is, it's
kind of a kluge, we're
going to use the built in function count.
So here's what we're doing in this query.
We're finding all books where
the number of authors whose
first name includes J is
the same as the number
of authors of the book without
a condition, okay?
So, specifically under "book" we
count the number of
matches of an author's author
sub-element, where the built-in function,
the built-in predicate contains, is true,
where the first name contains J.
And so we are counting the number
of authors whose first name contains
J and we're setting that
equal to the count of the first name sub-elements.
We'll run the query and we
will find, indeed, that there
are two books where all of
the authors' first name includes J.
We can use a related trick
to write the query we tried
to write earlier but failed to
find books where Ullman is an author and Widom is not an author.
So with the implicit existential,
what happened before is that
we found books where there was
an author whose name was
Ullman and then there was
an author whose last name was not Widom.
And of course, we still got everything back.
What we want to find is
books where there's a last
name that's Ullman and where none
of the authors have the last name of Widom.
That's effectively, again, a universal
quantification for all.
For all of the authors, their last name is not Widom.
Since we don't have a for
all construct in XPath, we're again going to use the count trick.
So in this query, we're looking
for books where one of
the authors' last name is Ullman
and the number of authors using,
count again, the number of
authors whose last name is Widom is zero.
So now we've expressed that
query, we run it, and we get the correct answer.
That concludes our demonstration of XPath.
We've shown a large number of
constructs and we've written some fairly complicated queries.
On the other hand, we certainly have not covered the entire XPath language.
If you're interested in our many
online materials, we'll also provide
a data and we encourage you to experiment on your own.