1
00:00:00,390 --> 00:00:02,410
In the previous video, we learned the basics of XML.

2
00:00:02,650 --> 00:00:04,240
In this video, we're

3
00:00:04,360 --> 00:00:05,670
going to learn about Document Type Descriptors,

4
00:00:06,300 --> 00:00:09,680
also known as DTDs, and also ID and IDREF attributes.

5
00:00:11,230 --> 00:00:12,690
We learned that well-formed XML

6
00:00:13,280 --> 00:00:14,630
is XML that adheres to

7
00:00:14,720 --> 00:00:16,520
basic structural requirements: a single

8
00:00:16,770 --> 00:00:18,600
root element, matched tags with

9
00:00:18,730 --> 00:00:20,410
proper nesting, and unique

10
00:00:20,820 --> 00:00:21,980
attributes within each element.

11
00:00:23,400 --> 00:00:25,620
Now we're going to learn about what's known as valid XML.

12
00:00:26,480 --> 00:00:27,700
Valid XML has to adhere

13
00:00:27,960 --> 00:00:29,440
to the same basic structural requirements

14
00:00:30,190 --> 00:00:31,890
as well-formed XML, but it

15
00:00:32,000 --> 00:00:34,140
also adheres to content-specific specifications.

16
00:00:35,260 --> 00:00:37,350
And we're going to learn two languages for those specifications.

17
00:00:38,540 --> 00:00:39,770
One of them is Document Type

18
00:00:39,990 --> 00:00:41,750
Descriptors or DTDs, and the

19
00:00:41,920 --> 00:00:43,970
other, a more powerful language, is XML schema.

20
00:00:44,840 --> 00:00:45,650
Specifications in XML

21
00:00:46,360 --> 00:00:48,910
schema are known as XSDs, for XML Schema Descriptions.

22
00:00:50,750 --> 00:00:51,860
So as a reminder, here's how

23
00:00:52,030 --> 00:00:53,790
things worked with well-formed XML documents.

24
00:00:54,390 --> 00:00:55,490
We sent the document to a

25
00:00:55,730 --> 00:00:56,780
parser and the parser would

26
00:00:57,140 --> 00:00:57,980
either return that the document

27
00:00:58,380 --> 00:01:00,920
was not well-formed or it would return parsed XML.

28
00:01:02,010 --> 00:01:03,740
Now let's consider what happens with valid XML.

29
00:01:03,990 --> 00:01:05,220
Now we use a validating

30
00:01:05,920 --> 00:01:06,960
XML parser, and we have

31
00:01:07,110 --> 00:01:08,270
an additional input to the

32
00:01:08,320 --> 00:01:09,540
process, which is a

33
00:01:10,050 --> 00:01:12,230
specification, either a DTD or an XSD.

34
00:01:12,960 --> 00:01:15,210
So that's also fed to the parser, along with the document.

35
00:01:15,490 --> 00:01:16,730
The parser can again

36
00:01:17,070 --> 00:01:18,400
say the document is

37
00:01:18,520 --> 00:01:20,970
not well-formed if it doesn't meet the basic structural requirements.

38
00:01:22,060 --> 00:01:23,030
It could also say that the

39
00:01:23,190 --> 00:01:24,470
document is not valid, meaning

40
00:01:24,750 --> 00:01:26,040
the structure of the document doesn't

41
00:01:26,390 --> 00:01:27,790
match the content-specific specification.

42
00:01:28,600 --> 00:01:30,230
If everything is good, then

43
00:01:30,330 --> 00:01:32,150
once again "parsed XML" is returned.

44
00:01:33,250 --> 00:01:35,540
Now let's talk about the document-type descriptors, or DTDs.

45
00:01:36,480 --> 00:01:37,220
We see a DTD in

46
00:01:37,410 --> 00:01:38,390
the lower-left corner of the

47
00:01:38,460 --> 00:01:39,290
video, but we won't look

48
00:01:39,570 --> 00:01:40,810
at it in any detail, because we'll

49
00:01:40,910 --> 00:01:43,110
be doing demos of DTDs a little later on.

50
00:01:44,080 --> 00:01:45,080
A DTD is a language

51
00:01:45,400 --> 00:01:47,730
that's kind of like a grammar, and

52
00:01:47,780 --> 00:01:49,220
what you can specify in that language is for

53
00:01:49,400 --> 00:01:50,590
a particular document what elements

54
00:01:51,250 --> 00:01:52,470
you want that document to contain,

55
00:01:52,860 --> 00:01:53,740
the tags of the elements,

56
00:01:54,580 --> 00:01:55,680
what attributes can be in

57
00:01:55,800 --> 00:01:58,620
the elements, how the different types of elements can be nested.

58
00:01:59,600 --> 00:02:00,740
Sometimes the ordering of the

59
00:02:00,800 --> 00:02:01,860
elements might want to be

60
00:02:01,940 --> 00:02:04,760
specified, and sometimes the number of occurrences of different elements.

61
00:02:06,170 --> 00:02:07,410
DTDs also allow the

62
00:02:07,780 --> 00:02:08,920
introduction of special types of

63
00:02:09,000 --> 00:02:10,650
attributes, called ID and IDREFS.

64
00:02:11,910 --> 00:02:13,040
And, effectively, what these allow you

65
00:02:13,190 --> 00:02:14,790
to do is specify pointers within

66
00:02:15,070 --> 00:02:17,270
a document, although these pointers are untyped.

67
00:02:19,030 --> 00:02:19,910
Before moving to the demo,

68
00:02:20,390 --> 00:02:21,240
let's talk a little bit about

69
00:02:21,450 --> 00:02:22,760
the positives and negatives about

70
00:02:22,980 --> 00:02:23,730
choosing to use a DTD

71
00:02:24,350 --> 00:02:26,120
or an XSD for one's XML data.

72
00:02:26,260 --> 00:02:27,440
After all, if you're

73
00:02:27,550 --> 00:02:28,870
building an application that encodes

74
00:02:29,220 --> 00:02:30,340
its data in XML, you'll have

75
00:02:30,520 --> 00:02:31,840
to decide whether you want the

76
00:02:32,020 --> 00:02:33,180
XML to just be well formed

77
00:02:33,690 --> 00:02:34,670
or whether you want to

78
00:02:34,940 --> 00:02:36,610
have specifications and require the

79
00:02:37,000 --> 00:02:38,810
XML to be valid to satisfy those specifications.

80
00:02:40,350 --> 00:02:41,080
So, let's put a few positives

81
00:02:41,810 --> 00:02:44,340
of choosing the latter, of requiring a DTD or an XSD.

82
00:02:44,430 --> 00:02:46,470
First of all, one of

83
00:02:46,580 --> 00:02:47,540
them is that when you write your

84
00:02:47,650 --> 00:02:49,290
program, you can assume

85
00:02:49,490 --> 00:02:51,910
that the data adheres to a specific structure.

86
00:02:52,560 --> 00:02:54,020
So programs can assume a

87
00:02:54,480 --> 00:02:56,400
structure and so the

88
00:02:56,520 --> 00:02:57,300
programs themselves are simpler because they don't

89
00:02:57,640 --> 00:03:00,200
have to be doing a lot of error checking on the data.

90
00:03:00,690 --> 00:03:01,710
They'll know that before the data

91
00:03:01,950 --> 00:03:03,480
reaches the program, it's been

92
00:03:03,620 --> 00:03:06,530
run through a validator and it does satisfy a particular structure.

93
00:03:07,250 --> 00:03:08,740
Second of all, we talked

94
00:03:08,840 --> 00:03:10,460
some time ago about

95
00:03:10,980 --> 00:03:12,440
the cascading style sheet language

96
00:03:13,130 --> 00:03:15,050
and the extensible style sheet languages.

97
00:03:15,920 --> 00:03:17,440
These are languages that take XML

98
00:03:17,880 --> 00:03:18,970
and they run rules on it

99
00:03:19,080 --> 00:03:21,010
to process it into a different form, often HTML.

100
00:03:22,290 --> 00:03:23,990
When you write those rules, if

101
00:03:24,170 --> 00:03:25,030
you know that the data

102
00:03:25,190 --> 00:03:26,600
has a certain structure, then those

103
00:03:26,790 --> 00:03:28,110
rules can be simpler, so like

104
00:03:28,440 --> 00:03:29,940
the programs they also can

105
00:03:30,170 --> 00:03:32,210
assume particular structure and it makes them simpler.

106
00:03:33,470 --> 00:03:34,440
Now, another use for DTDs

107
00:03:35,170 --> 00:03:36,430
or XSDs is as a

108
00:03:36,810 --> 00:03:38,410
specification language for conveying

109
00:03:39,140 --> 00:03:41,070
what XML might need to look like.

110
00:03:41,610 --> 00:03:43,220
So, as an example if you're

111
00:03:43,810 --> 00:03:45,300
performing data exchange using

112
00:03:45,590 --> 00:03:46,970
XML, maybe a company is

113
00:03:47,110 --> 00:03:48,630
going to receive purchase orders in

114
00:03:48,970 --> 00:03:50,120
XML, the company can

115
00:03:50,240 --> 00:03:51,320
actually use the DTD as

116
00:03:51,420 --> 00:03:52,950
a specification for what

117
00:03:53,150 --> 00:03:54,330
the XML needs to look

118
00:03:54,590 --> 00:03:56,910
like when it arrives at

119
00:03:56,990 --> 00:03:59,340
the program that's going to operate on it.

120
00:03:59,580 --> 00:04:01,050
Also documentation, it can

121
00:04:01,220 --> 00:04:02,370
be useful to use one of

122
00:04:02,430 --> 00:04:03,680
the specifications to just document

123
00:04:04,180 --> 00:04:05,600
what the data itself looks like.

124
00:04:06,420 --> 00:04:07,950
In general, really what

125
00:04:08,080 --> 00:04:10,070
we have here is the benefits of typing.

126
00:04:11,130 --> 00:04:12,930
We're talking about strongly typed data

127
00:04:13,450 --> 00:04:16,000
versus loosely-typed data, if you want to think of it that way.

128
00:04:17,940 --> 00:04:20,480
Now let's look at when we might prefer not to use a DTD.

129
00:04:21,030 --> 00:04:22,470
So what I'm going to describe down

130
00:04:22,650 --> 00:04:24,980
here is the benefits of not using a DTD.

131
00:04:25,460 --> 00:04:26,640
So the biggest benefit is flexibility.

132
00:04:27,840 --> 00:04:29,550
So a DTD makes your

133
00:04:30,110 --> 00:04:31,640
XML data have to conform to a specification.

134
00:04:33,150 --> 00:04:34,830
If you want more flexibility or

135
00:04:34,930 --> 00:04:36,020
you want ease of change

136
00:04:36,780 --> 00:04:37,470
in the way that the data is

137
00:04:37,750 --> 00:04:39,000
formatted without running into

138
00:04:39,140 --> 00:04:40,520
a lot of errors, then, if

139
00:04:40,590 --> 00:04:41,950
that's what you want,

140
00:04:42,180 --> 00:04:43,590
then the DTD can be constraining.

141
00:04:45,400 --> 00:04:46,660
Another fact is that DTDs can

142
00:04:46,820 --> 00:04:48,020
be fairly messy and this

143
00:04:48,080 --> 00:04:48,740
is not going to be obvious

144
00:04:49,140 --> 00:04:50,060
to you yet until we get

145
00:04:50,240 --> 00:04:52,160
into the demo, but if

146
00:04:52,990 --> 00:04:54,770
the data is irregular, very irregular, then

147
00:04:55,480 --> 00:04:56,890
specifying its structure can

148
00:04:57,090 --> 00:04:59,220
be hard, especially for irregular documents.

149
00:05:00,510 --> 00:05:01,920
Actually, when we see

150
00:05:02,660 --> 00:05:04,850
the schema language, we'll

151
00:05:04,990 --> 00:05:06,140
discover that XSDs can be,

152
00:05:06,810 --> 00:05:09,940
I would say, really messy, so they can actually get very large.

153
00:05:10,660 --> 00:05:11,730
It's possible to have a

154
00:05:11,770 --> 00:05:13,630
document where the specification of

155
00:05:13,700 --> 00:05:14,830
the structure of the document is

156
00:05:14,960 --> 00:05:16,270
much, much larger than the

157
00:05:16,330 --> 00:05:17,970
document itself, which seems not

158
00:05:18,160 --> 00:05:19,290
entirely intuitive, but when we get to

159
00:05:19,390 --> 00:05:21,230
learn about XSDs, I think you'll see how that can happen.

160
00:05:22,070 --> 00:05:23,480
So, overall, this is

161
00:05:23,780 --> 00:05:25,660
the benefits of no typing.

162
00:05:26,200 --> 00:05:28,050
It's really quite similar to

163
00:05:28,380 --> 00:05:30,040
the analogy in programming languages.

164
00:05:31,780 --> 00:05:32,870
The remainder of this video will

165
00:05:33,020 --> 00:05:35,170
teach about the DTDs themselves through a set of examples.

166
00:05:35,940 --> 00:05:36,810
We'll have a separate video

167
00:05:36,830 --> 00:05:39,320
for learning about XML schema and XSDs.

168
00:05:39,440 --> 00:05:41,540
So, here we are

169
00:05:41,660 --> 00:05:42,950
with our first document that we're

170
00:05:43,330 --> 00:05:44,730
going to look at with a document type descriptor.

171
00:05:45,790 --> 00:05:47,100
We have on the left the document itself.

172
00:05:47,610 --> 00:05:48,470
We have on the right the document-type

173
00:05:49,170 --> 00:05:50,190
descriptor, and then we have

174
00:05:50,330 --> 00:05:51,470
in the lower right a command

175
00:05:51,960 --> 00:05:54,000
line shell that we're going to use to validate the document.

176
00:05:55,150 --> 00:05:56,200
So this is similar data to

177
00:05:56,280 --> 00:05:57,260
what we saw on the last video,

178
00:05:57,490 --> 00:05:59,050
but let's go through it just to see what we have.

179
00:05:59,500 --> 00:06:00,960
We have an outermost element called

180
00:06:01,220 --> 00:06:03,760
bookstore, and we have two books in our bookstore.

181
00:06:04,830 --> 00:06:07,220
The first book has an ISBN number, price, and edition

182
00:06:08,270 --> 00:06:09,570
as attributes, and then it

183
00:06:09,650 --> 00:06:11,620
has a sub-element called title, another

184
00:06:12,010 --> 00:06:13,440
sub-element called authors with two

185
00:06:13,620 --> 00:06:15,560
authors underneath; first names and last names.

186
00:06:16,310 --> 00:06:17,940
The second book element is

187
00:06:18,050 --> 00:06:19,530
similar, except it doesn't have an edition.

188
00:06:20,670 --> 00:06:22,230
It also has, as we see, a remark.

189
00:06:23,320 --> 00:06:24,600
Now let's take a look at

190
00:06:24,840 --> 00:06:25,580
the DTD and I'm just going

191
00:06:25,620 --> 00:06:27,490
to walk through DTD, not

192
00:06:27,810 --> 00:06:29,080
too slowly, not too fast, and

193
00:06:29,190 --> 00:06:30,600
explain exactly what it's doing.

194
00:06:30,790 --> 00:06:31,510
So the start of the

195
00:06:31,960 --> 00:06:33,200
DTD says this is a

196
00:06:33,270 --> 00:06:35,040
DTD named bookstore and the

197
00:06:35,170 --> 00:06:36,390
root element is called bookstore,

198
00:06:37,070 --> 00:06:39,550
and now we have the first grammar-like construct.

199
00:06:40,800 --> 00:06:42,070
So these constructs, in fact, are

200
00:06:42,160 --> 00:06:43,900
a little bit like regular expressions if you know them.

201
00:06:44,530 --> 00:06:45,340
What this says is that

202
00:06:45,490 --> 00:06:47,120
a bookstore element has as

203
00:06:47,250 --> 00:06:48,600
its sub-element any number

204
00:06:49,110 --> 00:06:50,550
of elements that are called book or magazine.

205
00:06:51,280 --> 00:06:52,960
We have book or magazine.

206
00:06:53,660 --> 00:06:55,330
We don't have any magazines yet but we'll add one.

207
00:06:55,590 --> 00:06:58,150
And then this star says, zero or more instances.

208
00:06:58,690 --> 00:07:01,610
It's the Kleene closure operator for those of you familiar with regular expressions.
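
A minimal sketch of the root declaration described here, written as a complete document with an internal DTD subset. The element names and their capitalization are assumptions, since the transcript does not show the on-screen DTD, and the Book and Magazine declarations are placeholders so the example validates on its own.

```xml
<?xml version="1.0"?>
<!DOCTYPE Bookstore [
  <!-- Zero or more children, each of which is either a Book or a Magazine. -->
  <!ELEMENT Bookstore (Book | Magazine)*>
  <!-- Placeholder content models (assumed), only here to keep the sketch valid. -->
  <!ELEMENT Book (#PCDATA)>
  <!ELEMENT Magazine (#PCDATA)>
]>
<Bookstore>
  <Book>A book</Book>
  <Magazine>A magazine</Magazine>
  <Book>Another book</Book>
</Bookstore>
```

Because of the star, an empty Bookstore element would also be valid under this declaration.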

209
00:07:02,150 --> 00:07:04,110
Now let's talk about

210
00:07:04,340 --> 00:07:06,410
what the book element
has, so that's our next specification.

211
00:07:07,910 --> 00:07:09,240
The book element has a

212
00:07:09,390 --> 00:07:11,050
title followed by authors,

213
00:07:11,890 --> 00:07:12,940
followed by an optional remark.

214
00:07:13,730 --> 00:07:14,470
So now we don't have an

215
00:07:14,520 --> 00:07:15,450
"or", we have a comma, and

216
00:07:15,700 --> 00:07:16,690
that says that these are going to

217
00:07:16,770 --> 00:07:17,700
be in that order - title,

218
00:07:17,990 --> 00:07:18,970
authors, and remark and the

219
00:07:19,310 --> 00:07:20,880
question mark says that the remark is optional.

220
00:07:22,210 --> 00:07:24,260
Next we have the attributes of our book elements.

221
00:07:24,740 --> 00:07:26,050
So this bang attribute list

222
00:07:26,430 --> 00:07:27,200
says we're going to describe

223
00:07:27,640 --> 00:07:28,780
the attributes and we're going

224
00:07:28,850 --> 00:07:30,170
to have three of them: the ISBN,

225
00:07:31,380 --> 00:07:32,170
the price, and the edition.

226
00:07:33,070 --> 00:07:34,570
CDATA is the type of the attribute.

227
00:07:35,160 --> 00:07:35,620
It's just a string.

228
00:07:36,240 --> 00:07:37,500
And then required says that

229
00:07:37,720 --> 00:07:39,030
the attribute must be present, whereas

230
00:07:39,280 --> 00:07:40,840
implied says it doesn't have to be present.

231
00:07:41,420 --> 00:07:43,980
As you may remember, we have one book that doesn't have an edition.
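
The book declarations just described might look like the sketch below. Names, capitalization, and attribute values are assumptions for illustration, and Authors is simplified to plain text so the example stands alone.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!-- Comma means sequence: Title, then Authors, then an optional Remark. -->
  <!ELEMENT Book (Title, Authors, Remark?)>
  <!-- CDATA is a plain string; #REQUIRED must be present, #IMPLIED may be omitted. -->
  <!ATTLIST Book ISBN    CDATA #REQUIRED
                 Price   CDATA #REQUIRED
                 Edition CDATA #IMPLIED>
  <!ELEMENT Title (#PCDATA)>
  <!-- Simplified (assumed) content model for this sketch only. -->
  <!ELEMENT Authors (#PCDATA)>
  <!ELEMENT Remark (#PCDATA)>
]>
<!-- No Edition attribute and no Remark element: both are optional. -->
<Book ISBN="placeholder-isbn" Price="50">
  <Title>An example title</Title>
  <Authors>Simplified here</Authors>
</Book>
```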

232
00:07:45,230 --> 00:07:46,350
Our magazines are simply going

233
00:07:46,560 --> 00:07:47,460
to have titles and they're going

234
00:07:47,660 --> 00:07:49,410
to have attributes that are month and year.

235
00:07:49,890 --> 00:07:51,140
Again, we don't have any magazines yet.

236
00:07:51,950 --> 00:07:53,360
A title is going to

237
00:07:53,740 --> 00:07:55,270
consist of string data.

238
00:07:55,580 --> 00:07:58,000
So here we see our title of A First Course in Database Systems.

239
00:07:58,250 --> 00:08:00,980
You can think of that as the leaf data in the XML tree.

240
00:08:02,010 --> 00:08:03,420
And when you have a leaf that

241
00:08:03,680 --> 00:08:05,020
consists of text data, this is

242
00:08:05,360 --> 00:08:06,220
what you put in the DTD

243
00:08:06,700 --> 00:08:07,730
- just take my word for it:

244
00:08:08,090 --> 00:08:09,370
#PCDATA in parentheses.

245
00:08:10,780 --> 00:08:13,910
Now our authors are an element that still has structure.

246
00:08:14,310 --> 00:08:15,910
Our authors have a sub-element,

247
00:08:16,680 --> 00:08:17,820
author sub-elements or elements,

248
00:08:18,160 --> 00:08:19,530
and we're going to

249
00:08:19,640 --> 00:08:20,880
specify here that the

250
00:08:21,200 --> 00:08:22,490
authors element must have one

251
00:08:23,070 --> 00:08:24,280
or more author subelements.

252
00:08:25,230 --> 00:08:26,110
So that's what the plus

253
00:08:26,580 --> 00:08:28,990
is saying here, again taken from regular expressions.

254
00:08:29,540 --> 00:08:30,780
"Plus" means one or more instances.

255
00:08:32,160 --> 00:08:33,370
We have the remark, which

256
00:08:33,530 --> 00:08:35,470
is just going to be PCDATA or string data.

257
00:08:36,370 --> 00:08:37,590
We have our authors which consist

258
00:08:38,040 --> 00:08:39,990
of a first name sub-element and

259
00:08:40,200 --> 00:08:42,260
a last-name sub-element, and in that order.

260
00:08:42,860 --> 00:08:45,280
And then finally, our first names and last names are also strings.

261
00:08:46,180 --> 00:08:47,380
So, this is the entire

262
00:08:47,670 --> 00:08:48,650
DTD and it describes

263
00:08:49,500 --> 00:08:50,340
in detail the structure

264
00:08:51,640 --> 00:08:52,190
of our document.
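
Putting the pieces of the walkthrough together, the whole DTD might look like the sketch below, shown as an internal subset with a small two-book document. Element and attribute names, their capitalization, the ISBN and price values, and the second book's title and author are all assumptions; only the overall structure follows the walkthrough.

```xml
<?xml version="1.0"?>
<!DOCTYPE Bookstore [
  <!ELEMENT Bookstore (Book | Magazine)*>
  <!ELEMENT Book (Title, Authors, Remark?)>
  <!ATTLIST Book ISBN    CDATA #REQUIRED
                 Price   CDATA #REQUIRED
                 Edition CDATA #IMPLIED>
  <!ELEMENT Magazine (Title)>
  <!ATTLIST Magazine Month CDATA #REQUIRED
                     Year  CDATA #REQUIRED>
  <!ELEMENT Title (#PCDATA)>
  <!-- Plus means one or more Author children. -->
  <!ELEMENT Authors (Author+)>
  <!ELEMENT Remark (#PCDATA)>
  <!ELEMENT Author (First_Name, Last_Name)>
  <!ELEMENT First_Name (#PCDATA)>
  <!ELEMENT Last_Name (#PCDATA)>
]>
<Bookstore>
  <Book ISBN="placeholder-1" Price="85" Edition="3rd">
    <Title>A First Course in Database Systems</Title>
    <Authors>
      <Author>
        <First_Name>Jeffrey</First_Name>
        <Last_Name>Ullman</Last_Name>
      </Author>
      <Author>
        <First_Name>Jennifer</First_Name>
        <Last_Name>Widom</Last_Name>
      </Author>
    </Authors>
  </Book>
  <Book ISBN="placeholder-2" Price="100">
    <Title>A second example title</Title>
    <Authors>
      <Author>
        <First_Name>Example</First_Name>
        <Last_Name>Author</Last_Name>
      </Author>
    </Authors>
    <Remark>A placeholder remark.</Remark>
  </Book>
</Bookstore>
```

With libxml2's xmllint, a command along the lines of `xmllint --valid --noout bookstore.xml` (the file name is an assumption) checks the document against its DTD and prints nothing when it is valid.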

265
00:08:53,260 --> 00:08:54,410
Now we have a command, we're

266
00:08:54,530 --> 00:08:55,960
using something called xmllint,

267
00:08:57,020 --> 00:08:59,420
that will check to see if the document meets the structure.

268
00:09:00,900 --> 00:09:02,050
We'll just run that command

269
00:09:02,210 --> 00:09:03,770
here with a couple of options, and

270
00:09:03,870 --> 00:09:04,830
it doesn't give us any output

271
00:09:05,150 --> 00:09:06,880
which actually means that our document is correct.

272
00:09:09,490 --> 00:09:12,010
We'll be making some edits and seeing what happens when we run the command on a document that is not correct.

273
00:09:13,140 --> 00:09:14,100
So let's make our first edit,

274
00:09:14,780 --> 00:09:16,050
let's say that we decide that

275
00:09:16,140 --> 00:09:17,330
we want the edition attribute

276
00:09:17,710 --> 00:09:20,150
of our books to be "required" rather than "implied".

277
00:09:21,330 --> 00:09:22,840
So we'll change the DTD.

278
00:09:23,090 --> 00:09:26,170
We'll save the file and now we run our command.

279
00:09:27,700 --> 00:09:28,780
So as expected we got an

280
00:09:28,870 --> 00:09:29,990
error, and the error said

281
00:09:30,310 --> 00:09:32,520
that one of our book elements does not have attribute edition.

282
00:09:33,310 --> 00:09:36,390
Now that edition is required, every book element ought to have it.

283
00:09:36,730 --> 00:09:39,160
So let's add an edition to our second book.

284
00:09:39,380 --> 00:09:41,150
Let's say that it's

285
00:09:41,280 --> 00:09:42,900
the second edition, save the

286
00:09:43,030 --> 00:09:44,640
file, we'll validate our

287
00:09:44,790 --> 00:09:47,040
document again, and now everything is good. Let's

288
00:09:48,350 --> 00:09:49,210
do an edit to the document

289
00:09:49,760 --> 00:09:51,030
this time to see what

290
00:09:51,180 --> 00:09:52,060
happens when we change the

291
00:09:52,130 --> 00:09:53,800
order of the first name and the last name.

292
00:09:54,860 --> 00:09:57,380
So we've swapped Jeffrey Ullman to be Ullman Jeffrey.

293
00:09:58,680 --> 00:10:00,370
We validate our document, and now

294
00:10:00,700 --> 00:10:01,630
we see we got an error

295
00:10:02,050 --> 00:10:03,830
because the elements are not in the correct order.

296
00:10:04,700 --> 00:10:06,070
In this case, let's undo that

297
00:10:06,460 --> 00:10:07,720
change, rather than change our DTD.

298
00:10:09,290 --> 00:10:10,500
Let's try another edit to our document.

299
00:10:11,280 --> 00:10:12,960
Let's add a remark to our first book.

300
00:10:13,350 --> 00:10:14,430
But what we'll do is

301
00:10:14,640 --> 00:10:16,080
we'll leave the remark empty, so

302
00:10:16,380 --> 00:10:17,840
we'll add an opening and then

303
00:10:18,050 --> 00:10:21,960
directly a closing tag, and let's see if that validates.

304
00:10:24,210 --> 00:10:24,760
So, it did validate.

305
00:10:25,210 --> 00:10:26,370
And in fact when we have

306
00:10:26,680 --> 00:10:27,610
PCDATA as the type

307
00:10:27,870 --> 00:10:30,470
of an element it's perfectly acceptable to have an empty element.
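
A small sketch of this point: an element declared with #PCDATA content may be left empty and still validate. The element names are assumptions and the book is trimmed to two children to keep the example short.

```xml
<?xml version="1.0"?>
<!DOCTYPE Book [
  <!-- Trimmed (assumed) content model, just enough to show the empty Remark. -->
  <!ELEMENT Book (Title, Remark?)>
  <!ELEMENT Title (#PCDATA)>
  <!ELEMENT Remark (#PCDATA)>
]>
<Book>
  <Title>An example title</Title>
  <!-- An opening tag followed directly by a closing tag is valid for #PCDATA. -->
  <Remark></Remark>
</Book>
```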

308
00:10:32,390 --> 00:10:34,180
As a final change, let's add a magazine to our database.

309
00:10:34,860 --> 00:10:37,010
You'll have to bear with me as I type.

310
00:10:37,460 --> 00:10:38,230
I'm always a little bit slow.

311
00:10:39,080 --> 00:10:40,310
So we see over here that

312
00:10:40,430 --> 00:10:41,480
when we have a magazine there are

313
00:10:41,560 --> 00:10:44,160
two required attributes, the month and the year.

314
00:10:44,520 --> 00:10:45,670
So, let's say the month is

315
00:10:45,910 --> 00:10:47,510
January and the year,

316
00:10:48,100 --> 00:10:50,210
let's make that 2011,

317
00:10:50,960 --> 00:10:53,280
and then we have a title for our magazine.

318
00:10:53,940 --> 00:10:53,940
Here.

319
00:10:54,170 --> 00:10:54,770
We'll go down here.

320
00:10:55,730 --> 00:10:58,110
Our title, let's make it National Geographic.

321
00:11:00,520 --> 00:11:01,960
We'll close the tag, title tag.

322
00:11:03,660 --> 00:11:05,240
And then, sorry again about my typing.

323
00:11:05,610 --> 00:11:06,920
Let's go ahead and validate the document.

324
00:11:08,390 --> 00:11:10,790
We saw premature end of something or other.

325
00:11:11,810 --> 00:11:13,150
We forgot our closing tag for

326
00:11:13,220 --> 00:11:16,370
magazine, let's put that in.

327
00:11:17,720 --> 00:11:19,570
My terrible typing, and here we go.

328
00:11:19,900 --> 00:11:21,410
Let's validate, and we're done.
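
The magazine added in this step could be sketched as follows, using the values from the demo (January, 2011, National Geographic). The element and attribute names are assumptions, and the bookstore is reduced to magazines only so the example stands alone.

```xml
<?xml version="1.0"?>
<!DOCTYPE Bookstore [
  <!-- Reduced (assumed) root model: magazines only, for this sketch. -->
  <!ELEMENT Bookstore (Magazine)*>
  <!ELEMENT Magazine (Title)>
  <!-- Both attributes are required on every Magazine. -->
  <!ATTLIST Magazine Month CDATA #REQUIRED
                     Year  CDATA #REQUIRED>
  <!ELEMENT Title (#PCDATA)>
]>
<Bookstore>
  <Magazine Month="January" Year="2011">
    <Title>National Geographic</Title>
  </Magazine>
</Bookstore>
```

Leaving out the closing Magazine tag, as happened in the demo, makes the document not well-formed, so the parser reports that before any validity checking.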

329
00:11:23,040 --> 00:11:25,390
Now we're going to learn about ID and IDREF attributes.

330
00:11:26,770 --> 00:11:27,660
The document on the left side

331
00:11:28,310 --> 00:11:29,420
contains the same data as

332
00:11:29,560 --> 00:11:31,370
our previous document but completely restructured.

333
00:11:32,410 --> 00:11:33,780
Instead of having authors as

334
00:11:33,990 --> 00:11:35,050
subelements of book elements,

335
00:11:35,640 --> 00:11:37,020
we're going to have our authors listed separately,

336
00:11:37,590 --> 00:11:40,650
and then effectively point from the books to the authors of the book.

337
00:11:41,550 --> 00:11:42,320
We'll take a look at the

338
00:11:42,400 --> 00:11:43,600
data first, and then

339
00:11:43,830 --> 00:11:46,100
we'll look at the DTD that describes the data.

340
00:11:47,110 --> 00:11:48,250
Let's actually start with the

341
00:11:48,370 --> 00:11:50,990
author, so our bookstore element

342
00:11:51,430 --> 00:11:54,110
here has two subelements that are books and three that are authors.

343
00:11:55,060 --> 00:11:56,560
So, looking at the authors, we have

344
00:11:56,910 --> 00:11:57,970
the first name and last name

345
00:11:58,140 --> 00:11:59,830
as sub-elements as usual, but

346
00:11:59,950 --> 00:12:01,950
we've added what we call the ident attribute.

347
00:12:02,380 --> 00:12:03,400
That's not a keyword; we've just

348
00:12:03,590 --> 00:12:04,720
called the attribute ident, and

349
00:12:05,260 --> 00:12:06,520
then for each of the three authors,

350
00:12:07,050 --> 00:12:08,510
we've given a string value

351
00:12:08,830 --> 00:12:10,000
to that attribute that we're going

352
00:12:10,180 --> 00:12:12,090
to use effectively for the pointers in the book.

353
00:12:12,940 --> 00:12:15,130
So we have our three authors, now let's take a look at the books.

354
00:12:16,210 --> 00:12:18,110
Our book has the ISBN number and price.

355
00:12:18,420 --> 00:12:19,750
I've taken the edition out for now.

356
00:12:21,320 --> 00:12:22,560
Then it has a special attribute called authors.

357
00:12:23,820 --> 00:12:25,200
Authors is an IDREFS

358
00:12:25,840 --> 00:12:27,060
attribute, and its value

359
00:12:27,690 --> 00:12:28,780
can refer to one or

360
00:12:28,980 --> 00:12:30,770
more strings that are ID

361
00:12:31,290 --> 00:12:32,220
attributes in another element.

362
00:12:32,620 --> 00:12:33,510
So that's what we're doing here.

363
00:12:33,660 --> 00:12:35,840
We're referring to the two author elements here.

364
00:12:36,770 --> 00:12:39,520
And in our second book we're referring to the three author elements.

365
00:12:40,440 --> 00:12:41,490
We still have the title subelement

366
00:12:41,700 --> 00:12:43,890
and we still have the remark subelement.

367
00:12:44,910 --> 00:12:45,830
And furthermore, we have one

368
00:12:46,270 --> 00:12:47,750
other cute thing here, which is,

369
00:12:47,870 --> 00:12:49,640
instead of referring to

370
00:12:49,810 --> 00:12:51,080
the book by name within the

371
00:12:51,150 --> 00:12:52,310
remark when we're talking about

372
00:12:52,570 --> 00:12:55,070
the other book, we have another type of pointer.

373
00:12:56,010 --> 00:12:57,470
So we'll specify that the

374
00:12:57,620 --> 00:12:59,350
ISBN is an ID

375
00:12:59,880 --> 00:13:01,420
for books and then this

376
00:13:01,640 --> 00:13:02,930
is an IDREF attribute

377
00:13:03,610 --> 00:13:06,010
that's referring to the id of the other book.

378
00:13:07,830 --> 00:13:10,770
The DTD on the right describes the structure of this document.

379
00:13:11,630 --> 00:13:12,690
This time our bookstore is

380
00:13:12,920 --> 00:13:14,160
going to contain zero or more

381
00:13:14,310 --> 00:13:16,190
books followed by zero or more authors.

382
00:13:17,380 --> 00:13:18,570
Our books contain a title and

383
00:13:18,770 --> 00:13:20,200
an optional remark as subelements, and

384
00:13:20,830 --> 00:13:22,520
now they contain three attributes,

385
00:13:22,970 --> 00:13:24,430
the ISBN, which is

386
00:13:24,560 --> 00:13:26,360
now a special type of

387
00:13:26,720 --> 00:13:28,420
attribute called an ID, the

388
00:13:28,610 --> 00:13:29,980
price, which is a string

389
00:13:30,100 --> 00:13:31,090
value as usual and the

390
00:13:31,360 --> 00:13:32,480
authors which is the special type

391
00:13:32,770 --> 00:13:34,680
called IDREFS. Let's keep

392
00:13:34,850 --> 00:13:37,270
going. Our title is just a string value as usual.

393
00:13:37,820 --> 00:13:41,040
A remark, here this is actually an interesting construct.

394
00:13:41,550 --> 00:13:43,710
A remark consists of the

395
00:13:43,810 --> 00:13:44,930
PCDATA, which is a string,

396
00:13:46,020 --> 00:13:47,320
or a book reference and then

397
00:13:47,580 --> 00:13:48,830
zero or more instances of those.

398
00:13:50,090 --> 00:13:51,100
This is the type of construct

399
00:13:51,160 --> 00:13:52,150
that can be used to mix

400
00:13:52,730 --> 00:13:54,700
strings and sub elements within an element.

401
00:13:55,190 --> 00:13:56,260
So anytime you want an

402
00:13:56,350 --> 00:13:57,330
element that might have some

403
00:13:57,630 --> 00:14:00,110
strings and then another element and then more string values,

404
00:14:00,890 --> 00:14:01,390
that's how it's done.

405
00:14:01,820 --> 00:14:04,550
PCDATA or the element type, zero or more.
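
A minimal sketch of this mixed-content construct. The element name BookRef is an assumption, and its attribute is left out here so the example stays self-contained.

```xml
<?xml version="1.0"?>
<!DOCTYPE Remark [
  <!-- Mixed content: #PCDATA must come first and the group must end with a star. -->
  <!ELEMENT Remark (#PCDATA | BookRef)*>
  <!-- An element with no content at all (name assumed). -->
  <!ELEMENT BookRef EMPTY>
]>
<Remark>Some text, then a reference <BookRef/> and then more text.</Remark>
```

The same declaration also accepts a remark that is pure text, or one that is completely empty.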

406
00:14:05,970 --> 00:14:07,710
Then we have our book reference

407
00:14:08,020 --> 00:14:09,660
which is actually an empty element. It's

408
00:14:09,910 --> 00:14:11,190
only interesting because it has

409
00:14:11,390 --> 00:14:12,270
an attribute. So let's go

410
00:14:12,390 --> 00:14:13,200
back here. We see our book

411
00:14:13,460 --> 00:14:14,530
ref here. It actually doesn't

412
00:14:14,770 --> 00:14:16,340
have any data or sub

413
00:14:16,490 --> 00:14:17,470
elements, but it has an

414
00:14:17,720 --> 00:14:20,050
attribute called book, and that is an IDREF.

415
00:14:20,990 --> 00:14:22,660
That means it refers to an

416
00:14:22,740 --> 00:14:24,470
ID attribute of another

417
00:14:26,020 --> 00:14:26,020
element.

418
00:14:27,400 --> 00:14:28,630
Now we have our authors: the first

419
00:14:28,850 --> 00:14:30,290
name and the last name, and

420
00:14:30,460 --> 00:14:32,830
our author attributes have again

421
00:14:33,180 --> 00:14:34,990
an ID and we're calling it the ident.

422
00:14:35,890 --> 00:14:38,190
And finally the first name and last name are string values.

423
00:14:39,390 --> 00:14:40,850
This may seem overwhelming but the

424
00:14:40,900 --> 00:14:42,310
key points in this DTD

425
00:14:43,450 --> 00:14:43,850
are the ID attributes.

426
00:14:44,310 --> 00:14:45,650
So the ID attributes, the ISBN

427
00:14:46,510 --> 00:14:47,900
attributes in the book, and

428
00:14:48,280 --> 00:14:50,550
the ident, wherever it

429
00:14:50,660 --> 00:14:51,580
went, ident attribute in the author

430
00:14:52,490 --> 00:14:53,810
are special attributes, and by

431
00:14:53,930 --> 00:14:54,830
the way, they do need to be

432
00:14:54,940 --> 00:14:56,240
unique values for those attributes,

433
00:14:57,210 --> 00:14:58,620
and they're special in that

434
00:14:58,750 --> 00:15:00,530
IDREFS attributes can refer

435
00:15:01,000 --> 00:15:02,820
to them, and that will be checked as well.

436
00:15:03,520 --> 00:15:04,330
Now, I did want to

437
00:15:04,640 --> 00:15:05,450
point out that the book

438
00:15:05,810 --> 00:15:07,590
reference here says ID ref singular.

439
00:15:08,430 --> 00:15:09,810
When you have a singular

440
00:15:09,900 --> 00:15:10,990
ID ref then the string has

441
00:15:11,190 --> 00:15:12,750
to be exactly one ID value.

442
00:15:13,580 --> 00:15:15,010
When you have the plural ID refs,

443
00:15:15,660 --> 00:15:16,890
then the string of the

444
00:15:17,190 --> 00:15:18,750
attribute is one or

445
00:15:19,010 --> 00:15:20,890
more ID ref value, I'm

446
00:15:21,380 --> 00:15:23,790
sorry, one or more ID values separated by spaces.

447
00:15:24,390 --> 00:15:26,400
So it's a little bit clunky, but it does seem to work.

448
00:15:27,710 --> 00:15:30,400
Now let's go to our command line, and let's validate the document.

449
00:15:31,440 --> 00:15:32,740
So the document is in fact valid.

450
00:15:33,070 --> 00:15:33,950
That's what it means when we

451
00:15:34,050 --> 00:15:35,320
get nothing back, and let's

452
00:15:35,650 --> 00:15:36,680
make some changes, as we did

453
00:15:36,890 --> 00:15:38,640
before, to explore what structure

454
00:15:39,100 --> 00:15:41,590
is imposed and what's checked with this DTD in the presence

455
00:15:42,200 --> 00:15:42,830
of IDs and ID refs.

456
00:15:44,600 --> 00:15:45,860
As a first change, let's change

457
00:15:46,310 --> 00:15:47,620
this ID, this identifier

458
00:15:48,310 --> 00:15:50,520
HG to JU.

459
00:15:51,040 --> 00:15:52,030
That should actually cause a couple of problems

460
00:15:52,050 --> 00:15:53,060
when we do that. Let's

461
00:15:53,330 --> 00:15:55,130
validate the document and see what happens.

462
00:15:56,610 --> 00:15:58,160
And we do in fact get two different errors.

463
00:15:58,940 --> 00:16:00,320
The first error says that

464
00:16:00,580 --> 00:16:02,690
we have two instances of "JU".

465
00:16:03,070 --> 00:16:04,090
As you can see here, we

466
00:16:04,260 --> 00:16:06,030
now have JU twice where

467
00:16:06,400 --> 00:16:07,660
ID values do have to be unique.

468
00:16:08,070 --> 00:16:09,830
They have to be globally unique throughout the document.

469
00:16:10,890 --> 00:16:11,880
The second error that occurred

470
00:16:12,290 --> 00:16:14,300
when we changed HG to JU

471
00:16:14,450 --> 00:16:16,360
is we effectively have a dangling pointer.

472
00:16:17,270 --> 00:16:19,180
We refer to HG here

473
00:16:19,400 --> 00:16:21,290
in this ID refs attribute but there's

474
00:16:21,490 --> 00:16:23,720
no longer an element whose ID value is HG.

475
00:16:24,260 --> 00:16:25,420
So that's an error as well.

476
00:16:25,840 --> 00:16:26,980
So let's change it back to

477
00:16:27,720 --> 00:16:29,560
HG just so our document is valid again.

478
00:16:31,100 --> 00:16:33,880
Now let's make another change. Let's take our book reference.

479
00:16:34,760 --> 00:16:37,550
We can see that our book reference is referring to the other book.

480
00:16:37,760 --> 00:16:38,790
We're in the Complete Book here

481
00:16:39,190 --> 00:16:40,340
and the comment, the remark is

482
00:16:40,460 --> 00:16:41,490
referring to the First Course

483
00:16:41,750 --> 00:16:44,260
through the ISBN number, but let's

484
00:16:44,470 --> 00:16:46,980
change this string instead to refer to HG.

485
00:16:47,550 --> 00:16:49,160
So now we're actually referring

486
00:16:49,530 --> 00:16:51,230
to an author rather than another book.

487
00:16:51,870 --> 00:16:53,090
Let's check if the document validates.

488
00:16:54,230 --> 00:16:54,680
In fact it does.

489
00:16:55,440 --> 00:16:56,560
And that shows that the

490
00:16:56,640 --> 00:16:58,800
pointers, when you have a DTD, are untyped.

491
00:16:59,800 --> 00:17:00,860
So it does check to make

492
00:17:01,040 --> 00:17:01,830
sure that this is an

493
00:17:02,070 --> 00:17:03,560
ID of another element, but we

494
00:17:03,720 --> 00:17:05,340
weren't able to specify that

495
00:17:05,500 --> 00:17:06,480
it should be a book element

496
00:17:07,190 --> 00:17:08,490
in our DTD, and since we're

497
00:17:08,630 --> 00:17:09,880
not able to specify it, of

498
00:17:10,040 --> 00:17:11,450
course it's not possible to check it.

499
00:17:11,910 --> 00:17:12,660
We will see that in XML

500
00:17:13,220 --> 00:17:14,620
schema, we can have typed

501
00:17:14,860 --> 00:17:16,740
pointers but it's not possible to have them in DTDs.

502
00:17:17,960 --> 00:17:19,070
The last change I'm going to

503
00:17:19,160 --> 00:17:20,260
show is to add a

504
00:17:20,660 --> 00:17:22,100
second book reference within our remark.

505
00:17:22,810 --> 00:17:23,950
So as I pointed out over

506
00:17:24,170 --> 00:17:25,550
here, when we write PC data

507
00:17:26,340 --> 00:17:27,490
or an element type

508
00:17:28,140 --> 00:17:29,500
followed by the Kleene closure, the

509
00:17:29,610 --> 00:17:31,180
zero or more star, that

510
00:17:31,350 --> 00:17:33,690
means we can freely mix text and sub-elements.

511
00:17:34,310 --> 00:17:36,470
So just right in the middle here, let's put a book reference.

512
00:17:39,710 --> 00:17:41,220
And we can put, let's say

513
00:17:41,410 --> 00:17:45,450
book equals JU, and that

514
00:17:45,670 --> 00:17:46,430
will be the end of our reference

515
00:17:46,920 --> 00:17:48,360
there. And now we

516
00:17:48,620 --> 00:17:49,790
see that we have text followed

517
00:17:50,270 --> 00:17:51,430
by a subelement followed by more

518
00:17:51,670 --> 00:17:53,170
text, and so on.

519
00:17:53,310 --> 00:17:55,590
That should validate fine, and in fact it does.

520
00:17:56,650 --> 00:17:58,320
That completes our demonstration of

521
00:17:58,810 --> 00:18:00,120
XML documents with DTDs.