This video introduces the basics of XML.
XML can be thought of as
a data model, an alternative to
the relational model, for structuring data.
In addition to introducing XML,
we will compare it to the
relational model, although it
is not critical to have watched
the relational model videos in order to get something out of this one.
The full name of XML is the Extensible Markup Language.
XML is a standard for
data representation and exchange, and
it was designed initially for exchanging
information on the Internet.
Now don't worry if you
can't read the little snippet in the corner of the video here.
You're not expected to at this point.
XML can be thought
of as a document format similar
to HTML, if you're familiar with HTML.
Most people are.
The big difference is that
the tags in an XML document
describe the content of the
data rather than how to
format the data, which is
what the tags in HTML tend to represent.
XML also has a streaming
format or a streaming standard,
and that's typically for the use
of XML in programs, for
emitting XML and consuming XML.
So now let's take a look at the XML data itself.
You see on the left side of the video a portion of an XML document.
The entire document is available
from the website for the course.
XML has three basic components.
Again, fairly similar to HTML.
The first is tagged elements.
So, for example let's take a look at this element here.
This is an element saying,
that the data here is a first name.
So we have an opening tag and we have a matching closing tag.
We also have nesting of elements.
So for example here we have an element that's author.
We have the opening tag here, the
closing tag here, and we
have a nesting of the first name and last name elements.
Even larger we have
a book element here with opening
and closing tags with a
nesting of numerous elements inside
and the entire document actually is
one element whose opening tag
is bookstore and the closing tag
isn't visible on the video here.
So that's what elements consist
of, an opening tag, text or
other sub-elements and a closing tag.
In addition we have
attributes. Each element
may have, within its opening
tag, a set of attributes, and let's
take a look at the book element here.
An attribute consists of
an attribute name, the equal
sign and then an attribute value.
So, our book element
right here has three attributes.
One called ISBN, one called
Price and one called
Edition. And any element
can have any number of
attributes as long as the attribute names are unique.
And finally, the third component of
XML is the text itself
which is depicted here in black.
So, within elements, we can have strings.
We have a string right
here, we have a title
here, here we have a remark.
And so, if you generally
think of XML as
a tree, the strings, or
the text, form the leaf elements of the tree.
So, again, those are the three major components of XML.
Looks a lot like HTML, except
the tags are describing the
content of the data, and not how to format it.
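Here's a small sketch of those three components, using Python's standard xml.etree.ElementTree module. The fragment is a hypothetical stand-in for the course document: the tag names (Bookstore, Book, Title, Authors, Author, First_Name, Last_Name) and the ISBN and Price values are assumptions made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment: tag names, ISBN and Price are placeholders,
# not the actual course file.
DOC = """<Bookstore>
  <Book ISBN="ISBN-0001" Price="85" Edition="3rd">
    <Title>A First Course in Database Systems</Title>
    <Authors>
      <Author>
        <First_Name>Jeffrey</First_Name>
        <Last_Name>Ullman</Last_Name>
      </Author>
    </Authors>
  </Book>
</Bookstore>"""

root = ET.fromstring(DOC)
book = root.find("Book")

# 1. Tagged elements: each has a tag name and may nest other elements.
print(root.tag)
print([child.tag for child in book])

# 2. Attributes: name/value pairs written inside the opening tag.
print(book.attrib)

# 3. Text: the strings sitting at the leaves of the tree.
print(book.findtext("Title"))
print(book.findtext("Authors/Author/Last_Name"))
```

Note that attribute values always come back as strings; turning "85" into a number is up to the program.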
Now let's spend some time comparing the relational model against XML.
Again, it's not critical that you
learn about the relational model and
you can skip this material if
you're not interested, but in many
cases when designing an application that's dealing
with data you might have to
make a decision whether you want
to use a relational database or whether
you want to store the data in XML.
So let's look at a
few different aspects of the
data and how it's used and
how it compares between relational and XML.
Let's start with the structure of the data itself.
So as we learned, the structure
in a relational model is basically a set of tables.
So we define the set of columns and we have a set of rows.
XML is generally, again it's
usually in a document or
a string format, but if you
think about the structure itself, the structure is hierarchical.
The nested elements induce a hierarchy or a tree.
There are constructs that actually allow
us to have links within
documents and so, you can
also have XML representing
a graph though, in general, it's
mostly thought of as a tree structure.
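To make the tree idea concrete, here's a minimal Python sketch that walks the nesting recursively. The document and its tag names are hypothetical, and the two helper functions, outline and height, are names invented just for this example.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; tag names are assumptions for illustration.
DOC = """<Bookstore>
  <Book>
    <Title>Database Systems: The Complete Book</Title>
    <Authors>
      <Author>
        <Last_Name>Widom</Last_Name>
      </Author>
    </Authors>
  </Book>
</Bookstore>"""

def outline(elem, depth=0):
    """Return one indented line per element, in document order."""
    lines = ["  " * depth + elem.tag]
    for child in elem:
        lines.extend(outline(child, depth + 1))
    return lines

def height(elem):
    """Count the element levels from elem down to its deepest descendant."""
    return 1 + max((height(child) for child in elem), default=0)

root = ET.fromstring(DOC)
print("\n".join(outline(root)))
print(height(root))
```

The indentation in the printed outline mirrors the nesting of opening and closing tags in the document.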
Next, let's talk about schemas.
In the relational model the schema is very important.
You fix your schema in
advance, when you design your database,
and then you add the data to conform to the schema.
Now, in XML, you have a lot more flexibility.
So the schema is flexible.
In fact, a lot of
people refer to XML as self-describing.
In other words, the
schema and the data are kind of mixed together.
The tags on elements are
telling you the kind of data
you'll have, and you can have a lot of irregularity.
Now I will say that
there are many mechanisms for introducing
schemas into XML but they're not required.
In the relational model schemas are absolutely required.
In XML they're more optional.
In particular, let's go
back and take a look
at our example, and we'll
see that we have sort of
some structure in our example,
but not everything is perfectly structured,
as it would be in the relational model.
So, coming back here and taking a look,
first of all, we have
the situation where in this
first book, we have an attribute called edition, the third edition.
Whereas in the second book
we only have two attributes, so there's no edition in this book.
Now in the relational model,
we would have to have a column
for edition, and we'd have one for every book.
Although of course we could have null editions for some books.
In XML, it's perfectly acceptable
to have some attributes for some
elements and those attributes don't appear in other elements.
Here's another example where we
have a component in one book
that's not in another and it's this remark component.
So here we have a book
where we happen to have a
remark and incidentally, you can
see that this
remark suggests that we buy
the complete book together with the first course.
The first course is a subset,
so it's not a very
good suggestion, although Amazon actually did make that one.
Anyway, enough of the asides.
We do see that we have
remark for the first book
and we have no remark for the
second book and that's not
a problem whatsoever in XML.
In the relational model, we
would again have to use null values for that case.
And the third example I
just wanted to give is the number of authors.
So this first book has two authors.
The second book - you can't see them all, but it has three authors.
Not a problem in XML.
Having different numbers of things
is perfectly standard.
So the main point being
that there's a lot of flexibility
in XML in terms of the schema.
You can create your database with
certain types of elements, later
add more elements, remove elements,
introduce inconsistencies in
the structure, and it's not a problem.
And again, I'll mention one more
time that there are mechanisms for
adding schema-like elements to
XML or schema-like specifications to XML.
We will be covering those in the next two videos actually.
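As a rough sketch of that contrast, the Python below forces two irregular Book elements into one fixed set of columns, the way a table would. The tag names, the ISBN and Price values, and the remark text are all placeholders, and to_row is a hypothetical helper.

```python
import xml.etree.ElementTree as ET

# Hypothetical data: the second book has no Edition and no Remark.
DOC = """<Bookstore>
  <Book ISBN="ISBN-0001" Price="85" Edition="3rd">
    <Title>A First Course in Database Systems</Title>
    <Remark>Placeholder remark text</Remark>
  </Book>
  <Book ISBN="ISBN-0002" Price="100">
    <Title>Database Systems: The Complete Book</Title>
  </Book>
</Bookstore>"""

COLUMNS = ("ISBN", "Price", "Edition", "Title", "Remark")

def to_row(book):
    # A missing attribute or sub-element comes back as None, which
    # plays the role of NULL in a fixed-column table.
    return (
        book.get("ISBN"),
        book.get("Price"),
        book.get("Edition"),
        book.findtext("Title"),
        book.findtext("Remark"),
    )

rows = [to_row(book) for book in ET.fromstring(DOC).iter("Book")]
for row in rows:
    print(dict(zip(COLUMNS, row)))
```

The XML itself never mentions the missing pieces. It's only when we commit to fixed columns that the None values show up.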
Next, let's talk about how this data is queried.
So for the relational model, we have relational algebra.
We have SQL.
These are pretty simple, nice languages, I would say.
It's a little bit of a
matter of opinion, but I'm going to give them a smiley face.
XML querying is a little trickier.
Now, one of the
factors here is that XML
is a lot newer than the
relational model and querying XML
is still settling down to some extent.
But I'm just gonna say, it's a little
less, so I'm gonna give
it a neutral face here, in
terms of how simple and
nice the languages are for querying
XML and we'll be spending some
time in later videos learning some of those languages.
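Just to give a flavor of what querying XML can look like, here's a sketch using the small subset of path expressions that Python's xml.etree.ElementTree supports. This is only an assumption-laden illustration with hypothetical tag names and placeholder attribute values; it isn't meant to stand in for the query languages covered in the later videos.

```python
import xml.etree.ElementTree as ET

# Hypothetical document; tag names and attribute values are placeholders.
DOC = """<Bookstore>
  <Book ISBN="ISBN-0001" Price="85" Edition="3rd">
    <Title>A First Course in Database Systems</Title>
    <Authors>
      <Author><Last_Name>Ullman</Last_Name></Author>
      <Author><Last_Name>Widom</Last_Name></Author>
    </Authors>
    <Remark>Placeholder remark text</Remark>
  </Book>
  <Book ISBN="ISBN-0002" Price="100">
    <Title>Database Systems: The Complete Book</Title>
    <Authors>
      <Author><Last_Name>Garcia-Molina</Last_Name></Author>
      <Author><Last_Name>Ullman</Last_Name></Author>
      <Author><Last_Name>Widom</Last_Name></Author>
    </Authors>
  </Book>
</Bookstore>"""

root = ET.fromstring(DOC)

# Titles of books that carry an Edition attribute.
with_edition = [b.findtext("Title") for b in root.findall("./Book[@Edition]")]

# Titles of books that have a Remark sub-element.
with_remark = [b.findtext("Title") for b in root.findall("./Book[Remark]")]

# Last names of every author anywhere in the document.
last_names = [e.text for e in root.findall(".//Author/Last_Name")]

print(with_edition)
print(with_remark)
print(last_names)
```

Each query is a path through the tree, optionally filtered by a condition in square brackets.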
Next, in our chart is the aspect of ordering.
So the relational model is
fundamentally an unordered model
and that can actually be considered a bad thing to some extent.
Sometimes in data applications it's nice to have ordering.
We learned the order by clause in SQL and that's a way to get order in query results.
But fundamentally, the data in our table, in our relational database, is a set of data, without an ordering within that set.
Now, in XML we do have, I would say, an implied ordering.
So XML, as I said, can be thought of as either a document model or a stream model.
In either case,
just the nature of the
XML being laid out in
a document as we have here
or being in a stream induces an order.
Very specifically, let's take a look at the authors here.
So here we have two authors,
and these authors are in an order in the document.
If we put those authors in a relational database, there would be no order.
They could come out in either
order unless we did
an order-by clause in our
query, whereas in XML,
implied by the document structure is an order.
And there's an order between these two books as well.
Sometimes that order is meaningful; sometimes it's not.
But it is available to be used in an application.
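Here's a minimal Python sketch of that difference. The tag names are hypothetical, and the Python set is only a loose stand-in for an unordered relation, not a real database table.

```python
import xml.etree.ElementTree as ET

# Hypothetical fragment; tag names are assumptions for illustration.
DOC = """<Book>
  <Authors>
    <Author><Last_Name>Ullman</Last_Name></Author>
    <Author><Last_Name>Widom</Last_Name></Author>
  </Authors>
</Book>"""

book = ET.fromstring(DOC)

# Iterating over the parsed document yields elements in document order.
in_document_order = [a.findtext("Last_Name") for a in book.iter("Author")]
print(in_document_order)  # ['Ullman', 'Widom']

# A relation is a set of rows: no position is stored with the data.
as_relation = {(name,) for name in in_document_order}

# Any order has to be imposed at query time, as ORDER BY does in SQL.
print(sorted(as_relation, reverse=True))
```

The first list comes for free from where the elements sit in the document; the second only exists because we asked for a sort.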
Lastly, let's talk about implementation.
As I mentioned in earlier
videos, the relational model has
been around for at least
35 years, and the systems
that implement it have been around almost as long.
They're very mature systems.
They implement the relational model
as the native model of the
systems and they're widely used.
Things with XML are
a little bit different, partly again because
XML hasn't been around as long.
But what's happening right now
in terms of XML and conventional
database systems is XML is typically an add-on.
So in most systems, XML
will be a layer over the relational database system.
You can enter data in
XML; you can query data in XML.
It will be translated to a relational implementation.
That's not necessarily a bad thing.
And it does allow you to
combine relational data and
XML in a single system, sometimes
even in a single query, but
it's not the native model of the system itself.
Now you might have noticed that the name of this video is "Well-formed-XML".
So well-formed XML is
actually the most flexible XML.
An XML document or
an XML stream is considered
well formed if it adheres
to the basic structural requirements of XML.
And there aren't many.
Just that we have a single
root element, as we discussed
before, a single bookstore in this
case; that all of
our tags are matching, we don't
have opening tags without closing
tags; and our tags
are properly nested, so we don't have interweaving of elements.
And finally, within each
element if we have attribute names, they're unique.
And that's about it.
That's all we require for an
XML document, or
a set of XML data to be considered well-formed.
And for many applications, that's all we're concerned about.
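Those requirements can be tried out with a short Python sketch that hands a few tiny documents to the standard library parser. The sample strings are made up, and is_well_formed is a hypothetical helper. Keep in mind that a real parser enforces a few more details of the XML specification than the rules listed here.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the standard library parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Made-up samples, one per requirement.
samples = {
    "ok": "<Bookstore><Book ISBN='x'><Title>T</Title></Book></Bookstore>",
    "two roots": "<Book/><Book/>",
    "missing closing tag": "<Bookstore><Book></Bookstore>",
    "interleaved tags": "<Book><Title>T</Book></Title>",
    "duplicate attribute": "<Book ISBN='1' ISBN='2'/>",
}

results = {label: is_well_formed(text) for label, text in samples.items()}
for label, ok in results.items():
    print(label, "->", "well-formed" if ok else "not well-formed")
```

Only the first sample passes; each of the others breaks exactly one of the requirements.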
In order to test whether a
document is well-formed, and specifically
to access the components of
the document in a program,
we have what's called an XML parser.
So, we'll take an XML
document here, and we'll
feed it to an XML parser,
and the parser will check the
basic structure of the document,
just to make sure that everything is okay.
If the document doesn't adhere to
these requirements up here,
the parser will just send an error saying it's not well-formed.
If the document does adhere
to the structure, then what comes out is parsed XML.
And, there's various standards
for how we show parsed XML.
One is called the Document Object
model, or DOM; it's a
programmatic interface for sort
of traversing the tree that's implied by XML.
Another popular one is SAX.
That's more of a stream model for XML.
So these are the ways in
which a program would access the
parsed XML when it comes out of the parser.
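To compare the two styles side by side, here's a Python sketch that pulls the same titles out of one document, first through a DOM tree and then through SAX callbacks. It uses the standard xml.dom.minidom and xml.sax modules; the document and the TitleHandler class are hypothetical.

```python
import xml.dom.minidom
import xml.sax

# Hypothetical document; tag names are assumptions for illustration.
DOC = (
    b"<Bookstore>"
    b"<Book><Title>A First Course in Database Systems</Title></Book>"
    b"<Book><Title>Database Systems: The Complete Book</Title></Book>"
    b"</Bookstore>"
)

# DOM: the whole document becomes an in-memory tree we can traverse.
dom = xml.dom.minidom.parseString(DOC)
dom_titles = [t.firstChild.data for t in dom.getElementsByTagName("Title")]

# SAX: the parser streams events to callbacks and builds no tree.
class TitleHandler(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_title = False
        self._buffer = []

    def startElement(self, name, attrs):
        if name == "Title":
            self._in_title = True
            self._buffer = []

    def characters(self, content):
        # Text may arrive in several chunks, so collect the pieces.
        if self._in_title:
            self._buffer.append(content)

    def endElement(self, name):
        if name == "Title":
            self.titles.append("".join(self._buffer))
            self._in_title = False

handler = TitleHandler()
xml.sax.parseString(DOC, handler)

print(dom_titles)
print(handler.titles)
```

The DOM version is shorter, but it holds the entire document in memory. The SAX version only keeps what the handler chooses to remember.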
So one issue that comes up,
because the XML data is used
frequently on the internet, is
how we display XML.
So one way to display XML is just
as we see it here, but very
often we want to format the
data that's in an XML
document or an XML stream
in a more intuitive way.
And actually there's a nice setup for doing that.
What we can do is use a
rule-based language to take
the XML and translate it automatically
to HTML, which we can then render in a browser.
A couple of popular languages
are Cascading Style Sheets, known
as CSS, or the Extensible
Stylesheet Language, known as XSL.
We're going to look a little bit
at XSL in a later video
in the context of querying XML.
We won't be covering CSS in this course.
But let's just understand how these
languages are used, what the basic structure is.
So the idea is that
we have an XML document
and then we send it to
an interpreter of CSS or
XSL, but we also have to have
the rules that we're going to use on that particular document.
And the rules are going to do things
like match patterns or add
extra commands and once
we send an XML document through
the interpreter we'll get an
HTML document out and then we can render that document in the browser.
Now, one thing I should mention is
that we'll also check with the
parser to make sure
that the document is well formed
as well before we translate it to HTML.
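The general shape of that pipeline can be sketched in a few lines of Python. To be clear, this is neither CSS nor XSL: it's a toy rule table, with hypothetical tag names, placeholder prices and made-up rule functions, that only shows the idea of matching elements against rules to produce HTML.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical document; tag names and prices are placeholders.
DOC = """<Bookstore>
  <Book Price="85"><Title>A First Course in Database Systems</Title></Book>
  <Book Price="100"><Title>Database Systems: The Complete Book</Title></Book>
</Bookstore>"""

def render_children(elem):
    return "".join(render(child) for child in elem)

def render(elem):
    # Look up the rule that matches this element's tag name.
    rule = RULES.get(elem.tag)
    if rule is None:  # no matching rule: just render the children
        return render_children(elem)
    return rule(elem)

def bookstore_rule(elem):
    return "<ul>" + render_children(elem) + "</ul>"

def book_rule(elem):
    price = escape(elem.get("Price", "unknown"))
    return "<li>" + render_children(elem) + " (price: " + price + ")</li>"

def title_rule(elem):
    return "<i>" + escape(elem.text or "") + "</i>"

RULES = {"Bookstore": bookstore_rule, "Book": book_rule, "Title": title_rule}

def to_html(xml_text):
    # Parsing first also rejects input that is not well-formed.
    return render(ET.fromstring(xml_text))

page = to_html(DOC)
print(page)
```

Changing how the books look means editing the rules, not the XML data.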
To conclude, XML is a
standard for data representation and exchange.
It can also be thought of as a data model.
Sort of a competitor to the
relational model for structuring the data in one's application.
It generally has a lot more
flexibility than the relational
model, which can be a plus and a minus, actually.
In this video we covered
well-formed XML, that is, XML
that adheres to basic structural requirements.
In the next video we will
cover valid XML, where we
actually do introduce a kind of
schema for XML.
The last thing I want to mention, is
that the formal specification for XML is quite enormous.
There are a lot of bells and whistles.
We're going to cover, in these
videos, the most important components
for understanding XML.