1 00:00:00,390 --> 00:00:02,410 In the previous video, we learned the basics of XML. 2 00:00:02,650 --> 00:00:04,240 In this video, we're 3 00:00:04,360 --> 00:00:05,670 going to learn about Document Type Descriptors, 4 00:00:06,300 --> 00:00:09,680 also known as DTDs, and also ID and ID ref attributes. 5 00:00:11,230 --> 00:00:12,690 We learned that well-formed XML 6 00:00:13,280 --> 00:00:14,630 is XML that adheres to 7 00:00:14,720 --> 00:00:16,520 basic structural requirements: a single 8 00:00:16,770 --> 00:00:18,600 root element, matched tags with 9 00:00:18,730 --> 00:00:20,410 proper nesting, and unique 10 00:00:20,820 --> 00:00:21,980 attributes within each element. 11 00:00:23,400 --> 00:00:25,620 Now we're going to learn about what's known as valid XML. 12 00:00:26,480 --> 00:00:27,700 Valid XML has to adhere 13 00:00:27,960 --> 00:00:29,440 to the same basic structural requirements 14 00:00:30,190 --> 00:00:31,890 as well-formed XML, but it 15 00:00:32,000 --> 00:00:34,140 also adheres to content specific specifications. 16 00:00:35,260 --> 00:00:37,350 And we're going to learn two languages for those specifications. 17 00:00:38,540 --> 00:00:39,770 One of them is Document Type 18 00:00:39,990 --> 00:00:41,750 Descriptors or DTDs, and the 19 00:00:41,920 --> 00:00:43,970 other, a more powerful language, is XML schema. 20 00:00:44,840 --> 00:00:45,650 Specifications in XML 21 00:00:46,360 --> 00:00:48,910 schema are known as XSDs, for XML Schema Descriptions. 22 00:00:50,750 --> 00:00:51,860 So as a reminder, here's how 23 00:00:52,030 --> 00:00:53,790 things worked with well-formed XML documents. 24 00:00:54,390 --> 00:00:55,490 We sent the document to a 25 00:00:55,730 --> 00:00:56,780 parser and the parser would 26 00:00:57,140 --> 00:00:57,980 either return that the document 27 00:00:58,380 --> 00:01:00,920 was not well-formed or it would return parsed XML. 28 00:01:02,010 --> 00:01:03,740 Now let's consider what happens with valid XML. 
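As a rough sketch of the well-formedness check just recapped, the snippet below feeds a few tiny documents to a non-validating parser. It assumes Python's standard `xml.etree.ElementTree` module rather than whatever parser the video uses, and the element and attribute names (`Bookstore`, `Book`, `Title`, `Price`) are hypothetical stand-ins.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text: str) -> bool:
    """Return True if the text parses, i.e. meets the basic structural requirements."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


# Hypothetical sample documents, one per structural requirement.
samples = {
    "well-formed": "<Bookstore><Book Price='85'><Title>A</Title></Book></Bookstore>",
    "improper nesting": "<Bookstore><Book><Title>A</Book></Title></Bookstore>",
    "two root elements": "<Book/><Book/>",
    "duplicate attribute": "<Book Price='85' Price='100'/>",
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```

Only the first sample parses; the other three each break one of the three requirements listed above. A check like this says nothing about validity, which needs a DTD or XSD as an extra input.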
29 00:01:03,990 --> 00:01:05,220 Now we use a validating 30 00:01:05,920 --> 00:01:06,960 XML parser, and we have 31 00:01:07,110 --> 00:01:08,270 an additional input to the 32 00:01:08,320 --> 00:01:09,540 process, which is a 33 00:01:10,050 --> 00:01:12,230 specification, either a DTD or an XSD. 34 00:01:12,960 --> 00:01:15,210 So that's also fed to the parser, along with the document. 35 00:01:15,490 --> 00:01:16,730 The parser can again 36 00:01:17,070 --> 00:01:18,400 say the document is 37 00:01:18,520 --> 00:01:20,970 not well formed if it doesn't meet the basic structural requirements. 38 00:01:22,060 --> 00:01:23,030 It could also say that the 39 00:01:23,190 --> 00:01:24,470 document is not valid, meaning 40 00:01:24,750 --> 00:01:26,040 the structure of the document doesn't 41 00:01:26,390 --> 00:01:27,790 match the content specific specification. 42 00:01:28,600 --> 00:01:30,230 If everything is good, then 43 00:01:30,330 --> 00:01:32,150 once again "parsed XML" is returned. 44 00:01:33,250 --> 00:01:35,540 Now let's talk about the document-type descriptors, or DTDs. 45 00:01:36,480 --> 00:01:37,220 We see a DTD in 46 00:01:37,410 --> 00:01:38,390 the lower-left corner of the 47 00:01:38,460 --> 00:01:39,290 video, but we won't look 48 00:01:39,570 --> 00:01:40,810 at it in any detail, because we'll 49 00:01:40,910 --> 00:01:43,110 be doing demos of DTDs a little later on. 50 00:01:44,080 --> 00:01:45,080 A DTD is a language 51 00:01:45,400 --> 00:01:47,730 that's kind of like a grammar, and 52 00:01:47,780 --> 00:01:49,220 what you can specify in that language is for 53 00:01:49,400 --> 00:01:50,590 a particular document what elements 54 00:01:51,250 --> 00:01:52,470 you want that document to contain, 55 00:01:52,860 --> 00:01:53,740 the tags of the elements, 56 00:01:54,580 --> 00:01:55,680 what attributes can be in 57 00:01:55,800 --> 00:01:58,620 the elements, how the different types of elements can be nested. 
58 00:01:59,600 --> 00:02:00,740 Sometimes the ordering of the 59 00:02:00,800 --> 00:02:01,860 elements might need to be 60 00:02:01,940 --> 00:02:04,760 specified, and sometimes the number of occurrences of different elements. 61 00:02:06,170 --> 00:02:07,410 DTDs also allow the 62 00:02:07,780 --> 00:02:08,920 introduction of special types of 63 00:02:09,000 --> 00:02:10,650 attributes, called ID and IDREFs. 64 00:02:11,910 --> 00:02:13,040 And, effectively, what these allow you 65 00:02:13,190 --> 00:02:14,790 to do is specify pointers within 66 00:02:15,070 --> 00:02:17,270 a document, although these pointers are untyped. 67 00:02:19,030 --> 00:02:19,910 Before moving to the demo, 68 00:02:20,390 --> 00:02:21,240 let's talk a little bit about 69 00:02:21,450 --> 00:02:22,760 the positives and negatives of 70 00:02:22,980 --> 00:02:23,730 choosing to use a DTD 71 00:02:24,350 --> 00:02:26,120 or an XSD for one's XML data. 72 00:02:26,260 --> 00:02:27,440 After all, if you're 73 00:02:27,550 --> 00:02:28,870 building an application that encodes 74 00:02:29,220 --> 00:02:30,340 its data in XML, you'll have 75 00:02:30,520 --> 00:02:31,840 to decide whether you want the 76 00:02:32,020 --> 00:02:33,180 XML to just be well-formed 77 00:02:33,690 --> 00:02:34,670 or whether you want to 78 00:02:34,940 --> 00:02:36,610 have specifications and require the 79 00:02:37,000 --> 00:02:38,810 XML to be valid, satisfying those specifications. 80 00:02:40,350 --> 00:02:41,080 So, let's list a few positives 81 00:02:41,810 --> 00:02:44,340 of choosing the latter, of requiring a DTD or an XSD. 82 00:02:44,430 --> 00:02:46,470 First of all, one of 83 00:02:46,580 --> 00:02:47,540 them is that when you write your 84 00:02:47,650 --> 00:02:49,290 program, you can assume 85 00:02:49,490 --> 00:02:51,910 that the data adheres to a specific structure.
86 00:02:52,560 --> 00:02:54,020 So programs can assume a 87 00:02:54,480 --> 00:02:56,400 structure and so the 88 00:02:56,520 --> 00:02:57,300 programs themselves are simpler because they don't 89 00:02:57,640 --> 00:03:00,200 have to be doing a lot of error checking on the data. 90 00:03:00,690 --> 00:03:01,710 They'll know that before the data 91 00:03:01,950 --> 00:03:03,480 reaches the program, it's been 92 00:03:03,620 --> 00:03:06,530 run through a validator and it does satisfy a particular structure. 93 00:03:07,250 --> 00:03:08,740 Second of all, we talked 94 00:03:08,840 --> 00:03:10,460 some time ago about 95 00:03:10,980 --> 00:03:12,440 the cascading style sheet language 96 00:03:13,130 --> 00:03:15,050 and the extensible style sheet languages. 97 00:03:15,920 --> 00:03:17,440 These are languages that take XML 98 00:03:17,880 --> 00:03:18,970 and they run rules on it 99 00:03:19,080 --> 00:03:21,010 to process it into a different form, often HTML. 100 00:03:22,290 --> 00:03:23,990 When you write those rules, if 101 00:03:24,170 --> 00:03:25,030 you know that the data 102 00:03:25,190 --> 00:03:26,600 has a certain structure, then those 103 00:03:26,790 --> 00:03:28,110 rules can be simpler, so like 104 00:03:28,440 --> 00:03:29,940 the programs they also can 105 00:03:30,170 --> 00:03:32,210 assume a particular structure and it makes them simpler. 106 00:03:33,470 --> 00:03:34,440 Now, another use for DTDs 107 00:03:35,170 --> 00:03:36,430 or XSDs is as a 108 00:03:36,810 --> 00:03:38,410 specification language for conveying 109 00:03:39,140 --> 00:03:41,070 what XML might need to look like.
110 00:03:41,610 --> 00:03:43,220 So, as an example if you're 111 00:03:43,810 --> 00:03:45,300 performing data exchange using 112 00:03:45,590 --> 00:03:46,970 XML, maybe a company is 113 00:03:47,110 --> 00:03:48,630 going to receive purchase orders in 114 00:03:48,970 --> 00:03:50,120 XML, the company can 115 00:03:50,240 --> 00:03:51,320 actually use the DTD as 116 00:03:51,420 --> 00:03:52,950 a specification for what 117 00:03:53,150 --> 00:03:54,330 the XML needs to look 118 00:03:54,590 --> 00:03:56,910 like when it arrives at 119 00:03:56,990 --> 00:03:59,340 the program that's going to operate on it. 120 00:03:59,580 --> 00:04:01,050 Also documentation: it can 121 00:04:01,220 --> 00:04:02,370 be useful to use one of 122 00:04:02,430 --> 00:04:03,680 the specifications to just document 123 00:04:04,180 --> 00:04:05,600 what the data itself looks like. 124 00:04:06,420 --> 00:04:07,950 In general, really what 125 00:04:08,080 --> 00:04:10,070 we have here is the benefits of typing. 126 00:04:11,130 --> 00:04:12,930 We're talking about strongly typed data 127 00:04:13,450 --> 00:04:16,000 versus loosely typed data, if you want to think of it that way. 128 00:04:17,940 --> 00:04:20,480 Now let's look at when we might prefer not to use a DTD. 129 00:04:21,030 --> 00:04:22,470 So what I'm going to describe down 130 00:04:22,650 --> 00:04:24,980 here are the benefits of not using a DTD. 131 00:04:25,460 --> 00:04:26,640 So the biggest benefit is flexibility. 132 00:04:27,840 --> 00:04:29,550 So a DTD makes your 133 00:04:30,110 --> 00:04:31,640 XML data have to conform to a specification.
134 00:04:33,150 --> 00:04:34,830 If you want more flexibility or 135 00:04:34,930 --> 00:04:36,020 you want ease of change 136 00:04:36,780 --> 00:04:37,470 in the way that the data is 137 00:04:37,750 --> 00:04:39,000 formatted without running into 138 00:04:39,140 --> 00:04:40,520 a lot of errors, then, if 139 00:04:40,590 --> 00:04:41,950 that's what you want, 140 00:04:42,180 --> 00:04:43,590 then the DTD can be constraining. 141 00:04:45,400 --> 00:04:46,660 Another fact is that DTDs can 142 00:04:46,820 --> 00:04:48,020 be fairly messy and this 143 00:04:48,080 --> 00:04:48,740 is not going to be obvious 144 00:04:49,140 --> 00:04:50,060 to you until we get 145 00:04:50,240 --> 00:04:52,160 into the demo, but if 146 00:04:52,990 --> 00:04:54,770 the data is irregular, very irregular, then 147 00:04:55,480 --> 00:04:56,890 specifying its structure can 148 00:04:57,090 --> 00:04:59,220 be hard, especially for irregular documents. 149 00:05:00,510 --> 00:05:01,920 Actually, when we see 150 00:05:02,660 --> 00:05:04,850 the schema language, we'll 151 00:05:04,990 --> 00:05:06,140 discover that XSDs can be, 152 00:05:06,810 --> 00:05:09,940 I would say, really messy, so they can actually get very large. 153 00:05:10,660 --> 00:05:11,730 It's possible to have a 154 00:05:11,770 --> 00:05:13,630 document where the specification of 155 00:05:13,700 --> 00:05:14,830 the structure of the document is 156 00:05:14,960 --> 00:05:16,270 much, much larger than the 157 00:05:16,330 --> 00:05:17,970 document itself, which seems not 158 00:05:18,160 --> 00:05:19,290 entirely intuitive, but when we get to 159 00:05:19,390 --> 00:05:21,230 learn about XSDs, I think you'll see how that can happen. 160 00:05:22,070 --> 00:05:23,480 So, overall, these are 161 00:05:23,780 --> 00:05:25,660 the benefits of no typing. 162 00:05:26,200 --> 00:05:28,050 It's really quite similar to 163 00:05:28,380 --> 00:05:30,040 the analogy in programming languages.
164 00:05:31,780 --> 00:05:32,870 The remainder of this video will 165 00:05:33,020 --> 00:05:35,170 teach about the DTDs themselves through a set of examples. 166 00:05:35,940 --> 00:05:36,810 We'll have a separate video 167 00:05:36,830 --> 00:05:39,320 for learning about XML schema and XSDs. 168 00:05:39,440 --> 00:05:41,540 So, here we are 169 00:05:41,660 --> 00:05:42,950 with our first document that we're 170 00:05:43,330 --> 00:05:44,730 going to look at with a document type descriptor. 171 00:05:45,790 --> 00:05:47,100 We have on the left the document itself. 172 00:05:47,610 --> 00:05:48,470 We have on the right the document type 173 00:05:49,170 --> 00:05:50,190 descriptor, and then we have 174 00:05:50,330 --> 00:05:51,470 in the lower right a command 175 00:05:51,960 --> 00:05:54,000 line shell that we're going to use to validate the document. 176 00:05:55,150 --> 00:05:56,200 So this is similar data to 177 00:05:56,280 --> 00:05:57,260 what we saw in the last video, 178 00:05:57,490 --> 00:05:59,050 but let's go through it just to see what we have. 179 00:05:59,500 --> 00:06:00,960 We have an outermost element called 180 00:06:01,220 --> 00:06:03,760 bookstore, and we have two books in our bookstore. 181 00:06:04,830 --> 00:06:07,220 The first book has an ISBN number, price, and edition 182 00:06:08,270 --> 00:06:09,570 as attributes, and then it 183 00:06:09,650 --> 00:06:11,620 has a sub-element called title, another 184 00:06:12,010 --> 00:06:13,440 sub-element called authors with two 185 00:06:13,620 --> 00:06:15,560 authors underneath, with first names and last names. 186 00:06:16,310 --> 00:06:17,940 The second book element is 187 00:06:18,050 --> 00:06:19,530 similar, except it doesn't have an edition. 188 00:06:20,670 --> 00:06:22,230 It also has, as we see, a remark.
189 00:06:23,320 --> 00:06:24,600 Now let's take a look at 190 00:06:24,840 --> 00:06:25,580 the DTD and I'm just going 191 00:06:25,620 --> 00:06:27,490 to walk through the DTD, not 192 00:06:27,810 --> 00:06:29,080 too slowly, not too fast, and 193 00:06:29,190 --> 00:06:30,600 explain exactly what it's doing. 194 00:06:30,790 --> 00:06:31,510 So the start of the 195 00:06:31,960 --> 00:06:33,200 DTD says this is a 196 00:06:33,270 --> 00:06:35,040 DTD named bookstore and the 197 00:06:35,170 --> 00:06:36,390 root element is called bookstore, 198 00:06:37,070 --> 00:06:39,550 and now we have the first grammar-like construct. 199 00:06:40,800 --> 00:06:42,070 So these constructs, in fact, are 200 00:06:42,160 --> 00:06:43,900 a little bit like regular expressions if you know them. 201 00:06:44,530 --> 00:06:45,340 What this says is that 202 00:06:45,490 --> 00:06:47,120 a bookstore element has as 203 00:06:47,250 --> 00:06:48,600 its sub-elements any number 204 00:06:49,110 --> 00:06:50,550 of elements that are called book or magazine. 205 00:06:51,280 --> 00:06:52,960 We have book or magazine. 206 00:06:53,660 --> 00:06:55,330 We don't have any magazines yet but we'll add one. 207 00:06:55,590 --> 00:06:58,150 And then this star says, zero or more instances. 208 00:06:58,690 --> 00:07:01,610 It's the Kleene closure operator for those of you familiar with regular expressions. 209 00:07:02,150 --> 00:07:04,110 Now let's talk about 210 00:07:04,340 --> 00:07:06,410 what the book element has, so that's our next specification. 211 00:07:07,910 --> 00:07:09,240 The book element has a 212 00:07:09,390 --> 00:07:11,050 title followed by authors, 213 00:07:11,890 --> 00:07:12,940 followed by an optional remark.
214 00:07:13,730 --> 00:07:14,470 So now we don't have an 215 00:07:14,520 --> 00:07:15,450 "or", we have a comma, and 216 00:07:15,700 --> 00:07:16,690 that says that these are going to 217 00:07:16,770 --> 00:07:17,700 be in that order: title, 218 00:07:17,990 --> 00:07:18,970 authors, and remark, and the 219 00:07:19,310 --> 00:07:20,880 question mark says that the remark is optional. 220 00:07:22,210 --> 00:07:24,260 Next we have the attributes of our book elements. 221 00:07:24,740 --> 00:07:26,050 So this !ATTLIST 222 00:07:26,430 --> 00:07:27,200 says we're going to describe 223 00:07:27,640 --> 00:07:28,780 the attributes and we're going 224 00:07:28,850 --> 00:07:30,170 to have three of them: the ISBN, 225 00:07:31,380 --> 00:07:32,170 the price, and the edition. 226 00:07:33,070 --> 00:07:34,570 CDATA is the type of the attribute. 227 00:07:35,160 --> 00:07:35,620 It's just a string. 228 00:07:36,240 --> 00:07:37,500 And then #REQUIRED says that 229 00:07:37,720 --> 00:07:39,030 the attribute must be present, whereas 230 00:07:39,280 --> 00:07:40,840 #IMPLIED says it doesn't have to be present. 231 00:07:41,420 --> 00:07:43,980 As you may remember, we have one book that doesn't have an edition. 232 00:07:45,230 --> 00:07:46,350 Our magazines are simply going 233 00:07:46,560 --> 00:07:47,460 to have titles and they're going 234 00:07:47,660 --> 00:07:49,410 to have attributes that are month and year. 235 00:07:49,890 --> 00:07:51,140 Again, we don't have any magazines yet. 236 00:07:51,950 --> 00:07:53,360 A title is going to 237 00:07:53,740 --> 00:07:55,270 consist of string data. 238 00:07:55,580 --> 00:07:58,000 So here we see our title, A First Course in Database Systems. 239 00:07:58,250 --> 00:08:00,980 You can think of that as the leaf data in the XML tree.
240 00:08:02,010 --> 00:08:03,420 And when you have a leaf that 241 00:08:03,680 --> 00:08:05,020 consists of text data, this is 242 00:08:05,360 --> 00:08:06,220 what you put in the DTD 243 00:08:06,700 --> 00:08:07,730 (just take my word for it): 244 00:08:08,090 --> 00:08:09,370 #PCDATA in parentheses. 245 00:08:10,780 --> 00:08:13,910 Now our authors are an element that still has structure. 246 00:08:14,310 --> 00:08:15,910 Our authors element has 247 00:08:16,680 --> 00:08:17,820 author sub-elements, 248 00:08:18,160 --> 00:08:19,530 and we're going to 249 00:08:19,640 --> 00:08:20,880 specify here that the 250 00:08:21,200 --> 00:08:22,490 authors element must have one 251 00:08:23,070 --> 00:08:24,280 or more author subelements. 252 00:08:25,230 --> 00:08:26,110 So that's what the plus 253 00:08:26,580 --> 00:08:28,990 is saying here, again taken from regular expressions. 254 00:08:29,540 --> 00:08:30,780 "Plus" means one or more instances. 255 00:08:32,160 --> 00:08:33,370 We have the remark, which 256 00:08:33,530 --> 00:08:35,470 is just going to be #PCDATA, or string data. 257 00:08:36,370 --> 00:08:37,590 We have our author elements, which consist 258 00:08:38,040 --> 00:08:39,990 of a first name sub-element and 259 00:08:40,200 --> 00:08:42,260 a last name sub-element, in that order. 260 00:08:42,860 --> 00:08:45,280 And then finally, our first names and last names are also strings. 261 00:08:46,180 --> 00:08:47,380 So, this is the entire 262 00:08:47,670 --> 00:08:48,650 DTD and it describes 263 00:08:49,500 --> 00:08:50,340 in detail the structure 264 00:08:51,640 --> 00:08:52,190 of our document. 265 00:08:53,260 --> 00:08:54,410 Now we have a command, we're 266 00:08:54,530 --> 00:08:55,960 using something called xmllint, 267 00:08:57,020 --> 00:08:59,420 that will check to see if the document meets the structure.
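Since the content models walked through above are described as being a little like regular expressions, here is a minimal sketch that takes the comparison literally. It is not how xmllint or any validating parser works; it only rewrites the narrated element rules as Python `re` patterns over the sequence of child tag names. The exact tag spellings (`Bookstore`, `First_Name`, and so on) are assumptions, since the on-screen DTD is not reproduced in the transcript.

```python
import re
import xml.etree.ElementTree as ET

# Narrated content models, rewritten as regexes over space-terminated child tags.
# Tag spellings are assumed; leaf (#PCDATA) elements are simply not listed.
CONTENT_MODELS = {
    "Bookstore": r"((Book|Magazine) )*",  # (Book | Magazine)*
    "Book": r"Title Authors (Remark )?",  # (Title, Authors, Remark?)
    "Authors": r"(Author )+",             # (Author)+
    "Author": r"First_Name Last_Name ",   # (First_Name, Last_Name)
    "Magazine": r"Title ",                # (Title)
}


def violations(element):
    """Yield (tag, child sequence) for each element whose children break its model."""
    model = CONTENT_MODELS.get(element.tag)
    if model is not None:
        sequence = "".join(child.tag + " " for child in element)
        if re.fullmatch(model, sequence) is None:
            yield element.tag, sequence.strip()
    for child in element:
        yield from violations(child)


TEMPLATE = (
    "<Bookstore><Book><Title>A First Course in Database Systems</Title>"
    "<Authors><Author>{names}</Author></Authors></Book></Bookstore>"
)
in_order = "<First_Name>Jeffrey</First_Name><Last_Name>Ullman</Last_Name>"
swapped = "<Last_Name>Ullman</Last_Name><First_Name>Jeffrey</First_Name>"

print(list(violations(ET.fromstring(TEMPLATE.format(names=in_order)))))
print(list(violations(ET.fromstring(TEMPLATE.format(names=swapped)))))
```

The first call prints an empty list; the second reports the `Author` element, because a comma in the model fixes the order of first name and last name. Attribute rules such as required versus implied are deliberately left out of this sketch.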
268 00:09:00,900 --> 00:09:02,050 We'll just run that command 269 00:09:02,210 --> 00:09:03,770 here with a couple of options, and 270 00:09:03,870 --> 00:09:04,830 it doesn't give us any output 271 00:09:05,150 --> 00:09:06,880 which actually means that our document is correct. 272 00:09:09,490 --> 00:09:12,010 We'll be making some edits and seeing what happens when we run the command on a document that is not correct. 273 00:09:13,140 --> 00:09:14,100 So let's make our first edit, 274 00:09:14,780 --> 00:09:16,050 let's say that we decide that 275 00:09:16,140 --> 00:09:17,330 we want the edition attribute 276 00:09:17,710 --> 00:09:20,150 of our books to be "required" rather than "implied". 277 00:09:21,330 --> 00:09:22,840 So we'll change the DTD. 278 00:09:23,090 --> 00:09:26,170 We'll save the file and now we run our command. 279 00:09:27,700 --> 00:09:28,780 So as expected we got an 280 00:09:28,870 --> 00:09:29,990 error, and the error said 281 00:09:30,310 --> 00:09:32,520 that one of our book elements does not have attribute edition. 282 00:09:33,310 --> 00:09:36,390 Now that edition is required, every book element ought to have it. 283 00:09:36,730 --> 00:09:39,160 So let's add an edition to our second book. 284 00:09:39,380 --> 00:09:41,150 Let's say that it's 285 00:09:41,280 --> 00:09:42,900 the second edition, save the 286 00:09:43,030 --> 00:09:44,640 file, we'll validate our 287 00:09:44,790 --> 00:09:47,040 document again, and now everything is good. Let's 288 00:09:48,350 --> 00:09:49,210 do an edit to the document 289 00:09:49,760 --> 00:09:51,030 this time to see what 290 00:09:51,180 --> 00:09:52,060 happens when we change the 291 00:09:52,130 --> 00:09:53,800 order of the first name and the last name. 292 00:09:54,860 --> 00:09:57,380 So we've swapped Jeffrey Ullman to be Ullman Jeffrey.
293 00:09:58,680 --> 00:10:00,370 We validate our document, and now 294 00:10:00,700 --> 00:10:01,630 we see we got an error 295 00:10:02,050 --> 00:10:03,830 because the elements are not in the correct order. 296 00:10:04,700 --> 00:10:06,070 In this case, let's undo that 297 00:10:06,460 --> 00:10:07,720 change, rather than change our DTD. 298 00:10:09,290 --> 00:10:10,500 Let's try another edit to our document. 299 00:10:11,280 --> 00:10:12,960 Let's add a remark to our first book. 300 00:10:13,350 --> 00:10:14,430 But what we'll do is 301 00:10:14,640 --> 00:10:16,080 we'll leave the remark empty, so 302 00:10:16,380 --> 00:10:17,840 we'll add an opening and then 303 00:10:18,050 --> 00:10:21,960 directly a closing tag, and let's see if that validates. 304 00:10:24,210 --> 00:10:24,760 So, it did validate. 305 00:10:25,210 --> 00:10:26,370 And in fact when we have 306 00:10:26,680 --> 00:10:27,610 #PCDATA as the type 307 00:10:27,870 --> 00:10:30,470 of an element it's perfectly acceptable to have an empty element. 308 00:10:32,390 --> 00:10:34,180 As a final change, let's add a magazine to our database. 309 00:10:34,860 --> 00:10:37,010 You'll have to bear with me as I type. 310 00:10:37,460 --> 00:10:38,230 I'm always a little bit slow. 311 00:10:39,080 --> 00:10:40,310 So we see over here that 312 00:10:40,430 --> 00:10:41,480 when we have a magazine there are 313 00:10:41,560 --> 00:10:44,160 two required attributes, the month and the year. 314 00:10:44,520 --> 00:10:45,670 So, let's say the month is 315 00:10:45,910 --> 00:10:47,510 January and the year, 316 00:10:48,100 --> 00:10:50,210 let's make that 2011, 317 00:10:50,960 --> 00:10:53,280 and then we have a title for our magazine. 318 00:10:53,940 --> 00:10:53,940 Here. 319 00:10:54,170 --> 00:10:54,770 We'll go down here. 320 00:10:55,730 --> 00:10:58,110 Our title, let's make it National Geographic. 321 00:11:00,520 --> 00:11:01,960 We'll close the tag, title tag.
322 00:11:03,660 --> 00:11:05,240 And then, sorry again about my typing. 323 00:11:05,610 --> 00:11:06,920 Let's go ahead and validate the document. 324 00:11:08,390 --> 00:11:10,790 We saw premature end of something or other. 325 00:11:11,810 --> 00:11:13,150 We forgot our closing tag for 326 00:11:13,220 --> 00:11:16,370 magazine, let's put that in. 327 00:11:17,720 --> 00:11:19,570 My terrible typing, and here we go. 328 00:11:19,900 --> 00:11:21,410 Let's validate, and we're done. 329 00:11:23,040 --> 00:11:25,390 Now we're going to learn about ID and IDREF attributes. 330 00:11:26,770 --> 00:11:27,660 The document on the left side 331 00:11:28,310 --> 00:11:29,420 contains the same data as 332 00:11:29,560 --> 00:11:31,370 our previous document but completely restructured. 333 00:11:32,410 --> 00:11:33,780 Instead of having authors as 334 00:11:33,990 --> 00:11:35,050 subelements of book elements, 335 00:11:35,640 --> 00:11:37,020 we're going to have our authors listed separately, 336 00:11:37,590 --> 00:11:40,650 and then effectively point from the books to the authors of the book. 337 00:11:41,550 --> 00:11:42,320 We'll take a look at the 338 00:11:42,400 --> 00:11:43,600 data first, and then 339 00:11:43,830 --> 00:11:46,100 we'll look at the DTD that describes the data. 340 00:11:47,110 --> 00:11:48,250 Let's actually start with the 341 00:11:48,370 --> 00:11:50,990 authors, so our bookstore element 342 00:11:51,430 --> 00:11:54,110 here has two subelements that are books and three that are authors. 343 00:11:55,060 --> 00:11:56,560 So, looking at the authors, we have 344 00:11:56,910 --> 00:11:57,970 the first name and last name 345 00:11:58,140 --> 00:11:59,830 as sub-elements as usual, but 346 00:11:59,950 --> 00:12:01,950 we've added what we call the ident attribute.
347 00:12:02,380 --> 00:12:03,400 That's not a keyword; we've just 348 00:12:03,590 --> 00:12:04,720 called the attribute ident, and 349 00:12:05,260 --> 00:12:06,520 then for each of the three authors, 350 00:12:07,050 --> 00:12:08,510 we've given a string value 351 00:12:08,830 --> 00:12:10,000 to that attribute that we're going 352 00:12:10,180 --> 00:12:12,090 to use effectively for the pointers in the book. 353 00:12:12,940 --> 00:12:15,130 So we have our three authors, now let's take a look at the books. 354 00:12:16,210 --> 00:12:18,110 Our book has the ISBN number and price. 355 00:12:18,420 --> 00:12:19,750 I've taken the edition out for now. 356 00:12:21,320 --> 00:12:22,560 And it has a special attribute called authors. 357 00:12:23,820 --> 00:12:25,200 Authors is an IDREFS 358 00:12:25,840 --> 00:12:27,060 attribute, and its value 359 00:12:27,690 --> 00:12:28,780 can refer to one or 360 00:12:28,980 --> 00:12:30,770 more strings that are ID 361 00:12:31,290 --> 00:12:32,220 attributes in other elements. 362 00:12:32,620 --> 00:12:33,510 So that's what we're doing here. 363 00:12:33,660 --> 00:12:35,840 We're referring to the two author elements here. 364 00:12:36,770 --> 00:12:39,520 And in our second book we're referring to the three author elements. 365 00:12:40,440 --> 00:12:41,490 We still have the title subelement 366 00:12:41,700 --> 00:12:43,890 and we still have the remark subelement. 367 00:12:44,910 --> 00:12:45,830 And furthermore, we have one 368 00:12:46,270 --> 00:12:47,750 other cute thing here, which is, 369 00:12:47,870 --> 00:12:49,640 instead of referring to 370 00:12:49,810 --> 00:12:51,080 the book by name within the 371 00:12:51,150 --> 00:12:52,310 remark when we're talking about 372 00:12:52,570 --> 00:12:55,070 the other book, we have another type of pointer.
373 00:12:56,010 --> 00:12:57,470 So we'll specify that the 374 00:12:57,620 --> 00:12:59,350 ISBN is an ID 375 00:12:59,880 --> 00:13:01,420 for books and then this 376 00:13:01,640 --> 00:13:02,930 is an IDREF attribute 377 00:13:03,610 --> 00:13:06,010 that's referring to the ID of the other book. 378 00:13:07,830 --> 00:13:10,770 The DTD on the right describes the structure of this document. 379 00:13:11,630 --> 00:13:12,690 This time our bookstore is 380 00:13:12,920 --> 00:13:14,160 going to contain zero or more 381 00:13:14,310 --> 00:13:16,190 books followed by zero or more authors. 382 00:13:17,380 --> 00:13:18,570 Our books contain a title and 383 00:13:18,770 --> 00:13:20,200 an optional remark as subelements and 384 00:13:20,830 --> 00:13:22,520 now they contain three attributes, 385 00:13:22,970 --> 00:13:24,430 the ISBN, which is 386 00:13:24,560 --> 00:13:26,360 now a special type of 387 00:13:26,720 --> 00:13:28,420 attribute called an ID, the 388 00:13:28,610 --> 00:13:29,980 price, which is a string 389 00:13:30,100 --> 00:13:31,090 value as usual, and the 390 00:13:31,360 --> 00:13:32,480 authors, which is the special type 391 00:13:32,770 --> 00:13:34,680 called IDREFS. Let's keep 392 00:13:34,850 --> 00:13:37,270 going, our title is just a string value as usual. 393 00:13:37,820 --> 00:13:41,040 A remark, here this is actually an interesting construct. 394 00:13:41,550 --> 00:13:43,710 A remark consists of 395 00:13:43,810 --> 00:13:44,930 #PCDATA, which is string, 396 00:13:46,020 --> 00:13:47,320 or a book reference, and then 397 00:13:47,580 --> 00:13:48,830 zero or more instances of those. 398 00:13:50,090 --> 00:13:51,100 This is the type of construct 399 00:13:51,160 --> 00:13:52,150 that can be used to mix 400 00:13:52,730 --> 00:13:54,700 strings and subelements within an element.
401 00:13:55,190 --> 00:13:56,260 So anytime you want an 402 00:13:56,350 --> 00:13:57,330 element that might have some 403 00:13:57,630 --> 00:14:00,110 strings and then another element and then more string values, 404 00:14:00,890 --> 00:14:01,390 that's how it's done. 405 00:14:01,820 --> 00:14:04,550 #PCDATA or the element type, zero or more. 406 00:14:05,970 --> 00:14:07,710 Then we have our book reference, 407 00:14:08,020 --> 00:14:09,660 which is actually an empty element. It's 408 00:14:09,910 --> 00:14:11,190 only interesting because it has 409 00:14:11,390 --> 00:14:12,270 an attribute. So let's go 410 00:14:12,390 --> 00:14:13,200 back here; we see our book 411 00:14:13,460 --> 00:14:14,530 ref here. It actually doesn't 412 00:14:14,770 --> 00:14:16,340 have any data or sub- 413 00:14:16,490 --> 00:14:17,470 elements, but it has an 414 00:14:17,720 --> 00:14:20,050 attribute called book, and that is an IDREF. 415 00:14:20,990 --> 00:14:22,660 That means it refers to an 416 00:14:22,740 --> 00:14:24,470 ID attribute of another 417 00:14:26,020 --> 00:14:26,020 element. 418 00:14:27,400 --> 00:14:28,630 Now we have our authors with the first 419 00:14:28,850 --> 00:14:30,290 name and the last name, and 420 00:14:30,460 --> 00:14:32,830 our author attributes have again 421 00:14:33,180 --> 00:14:34,990 an ID, and we're calling it the ident. 422 00:14:35,890 --> 00:14:38,190 And finally the first name and last name are string values. 423 00:14:39,390 --> 00:14:40,850 This may seem overwhelming but the 424 00:14:40,900 --> 00:14:42,310 key points in this DTD 425 00:14:43,450 --> 00:14:43,850 are the ID attributes.
426 00:14:44,310 --> 00:14:45,650 So the ID attributes, the ISBN 427 00:14:46,510 --> 00:14:47,900 attribute in the book, and 428 00:14:48,280 --> 00:14:50,550 the ident, wherever it 429 00:14:50,660 --> 00:14:51,580 went, ident attribute in the author 430 00:14:52,490 --> 00:14:53,810 are special attributes, and by 431 00:14:53,930 --> 00:14:54,830 the way, there do need to be 432 00:14:54,940 --> 00:14:56,240 unique values for those attributes, 433 00:14:57,210 --> 00:14:58,620 and they're special in that 434 00:14:58,750 --> 00:15:00,530 IDREFS attributes can refer 435 00:15:01,000 --> 00:15:02,820 to them, and that will be checked as well. 436 00:15:03,520 --> 00:15:04,330 Now, I did want to 437 00:15:04,640 --> 00:15:05,450 point out that the book 438 00:15:05,810 --> 00:15:07,590 reference here says IDREF, singular. 439 00:15:08,430 --> 00:15:09,810 When you have a singular 440 00:15:09,900 --> 00:15:10,990 IDREF then the string has 441 00:15:11,190 --> 00:15:12,750 to be exactly one ID value. 442 00:15:13,580 --> 00:15:15,010 When you have the plural IDREFS, 443 00:15:15,660 --> 00:15:16,890 then the string of the 444 00:15:17,190 --> 00:15:18,750 attribute is one or 445 00:15:19,010 --> 00:15:20,890 more IDREF values, I'm 446 00:15:21,380 --> 00:15:23,790 sorry, one or more ID values separated by spaces. 447 00:15:24,390 --> 00:15:26,400 So it's a little bit clunky, but it does seem to work. 448 00:15:27,710 --> 00:15:30,400 Now let's go to our command line, and let's validate the document. 449 00:15:31,440 --> 00:15:32,740 So the document is in fact valid. 450 00:15:33,070 --> 00:15:33,950 That's what it means when we 451 00:15:34,050 --> 00:15:35,320 get nothing back, and let's 452 00:15:35,650 --> 00:15:36,680 make some changes, as we did 453 00:15:36,890 --> 00:15:38,640 before, to explore what structure 454 00:15:39,100 --> 00:15:41,590 is imposed and what's checked with this DTD in the presence 455 00:15:42,200 --> 00:15:42,830 of IDs and IDREFs.
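To make the two ID-related checks concrete (ID values unique across the whole document, and every IDREF or IDREFS value resolving to some ID), here is a hand-rolled sketch using Python's standard `xml.etree.ElementTree`. It only imitates what a validating parser does with the DTD. The attribute typing table, the tag spellings, the placeholder ISBN strings, and the third author identifier `A3` are all assumptions for illustration; only `HG` and `JU` come from the narration.

```python
import xml.etree.ElementTree as ET

# Attribute typing as narrated; (element, attribute) spellings are assumed.
ID_ATTRS = {("Book", "ISBN"), ("Author", "Ident")}
IDREF_ATTRS = {("BookRef", "book")}    # singular: exactly one ID value
IDREFS_ATTRS = {("Book", "Authors")}   # plural: ID values separated by spaces


def check_ids(root):
    """Return error strings for duplicate IDs and for references to missing IDs."""
    errors = []
    seen = set()
    for el in root.iter():
        for name, value in el.attrib.items():
            if (el.tag, name) in ID_ATTRS:
                if value in seen:
                    errors.append(f"duplicate ID {value}")
                seen.add(value)
    for el in root.iter():
        for name, value in el.attrib.items():
            if (el.tag, name) in IDREF_ATTRS:
                refs = [value]
            elif (el.tag, name) in IDREFS_ATTRS:
                refs = value.split()
            else:
                continue
            errors.extend(f"dangling reference {r}" for r in refs if r not in seen)
    return errors


def bookstore(first_author_id="HG", remark_ref="ISBN-1"):
    """Build a stripped-down hypothetical bookstore (author names omitted)."""
    return ET.fromstring(f"""
<Bookstore>
  <Book ISBN="ISBN-1" Authors="JU A3"><Title>First Course</Title></Book>
  <Book ISBN="ISBN-2" Authors="HG JU A3">
    <Title>Complete Book</Title>
    <Remark>Bundled with <BookRef book="{remark_ref}"/></Remark>
  </Book>
  <Author Ident="{first_author_id}"/>
  <Author Ident="JU"/>
  <Author Ident="A3"/>
</Bookstore>""")


print(check_ids(bookstore()))
print(check_ids(bookstore(first_author_id="JU")))
print(check_ids(bookstore(remark_ref="HG")))
```

The first call reports nothing. Renaming `HG` to `JU` produces both a duplicate ID and a dangling reference. Pointing the book reference at an author's identifier reports nothing, because the check, like a DTD, has no notion of which kind of element an ID belongs to.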
456 00:15:44,600 --> 00:15:45,860 As a first change, let's change 457 00:15:46,310 --> 00:15:47,620 this ID, this identifier 458 00:15:48,310 --> 00:15:50,520 HG to JU. 459 00:15:51,040 --> 00:15:52,030 That should actually cause a couple of problems 460 00:15:52,050 --> 00:15:53,060 when we do that. Let's 461 00:15:53,330 --> 00:15:55,130 validate the document and see what happens. 462 00:15:56,610 --> 00:15:58,160 And we do in fact get two different errors. 463 00:15:58,940 --> 00:16:00,320 The first error says that 464 00:16:00,580 --> 00:16:02,690 we have two instances of "JU". 465 00:16:03,070 --> 00:16:04,090 As you can see here, we 466 00:16:04,260 --> 00:16:06,030 now have JU twice, whereas 467 00:16:06,400 --> 00:16:07,660 ID values do have to be unique. 468 00:16:08,070 --> 00:16:09,830 They have to be globally unique throughout the document. 469 00:16:10,890 --> 00:16:11,880 The second error that occurred 470 00:16:12,290 --> 00:16:14,300 when we changed HG to JU 471 00:16:14,450 --> 00:16:16,360 is we effectively have a dangling pointer. 472 00:16:17,270 --> 00:16:19,180 We refer to HG here 473 00:16:19,400 --> 00:16:21,290 in this IDREFS attribute but there's 474 00:16:21,490 --> 00:16:23,720 no longer an element whose ID value is HG. 475 00:16:24,260 --> 00:16:25,420 So that's an error as well. 476 00:16:25,840 --> 00:16:26,980 So let's change it back to 477 00:16:27,720 --> 00:16:29,560 HG just so our document is valid again. 478 00:16:31,100 --> 00:16:33,880 Now let's make another change, let's take our book reference. 479 00:16:34,760 --> 00:16:37,550 We can see that our book reference is referring to the other book. 480 00:16:37,760 --> 00:16:38,790 We're in the complete book here 481 00:16:39,190 --> 00:16:40,340 and the comment, the remark, is 482 00:16:40,460 --> 00:16:41,490 referring to the first course 483 00:16:41,750 --> 00:16:44,260 through the ISBN number, but let's 484 00:16:44,470 --> 00:16:46,980 change this string instead to refer to HG.
485 00:16:47,550 --> 00:16:49,160 So now we're actually referring 486 00:16:49,530 --> 00:16:51,230 to an author rather than another book. 487 00:16:51,870 --> 00:16:53,090 Let's check if the document validates. 488 00:16:54,230 --> 00:16:54,680 In fact it does. 489 00:16:55,440 --> 00:16:56,560 And that shows that the 490 00:16:56,640 --> 00:16:58,800 pointers when you have a DTD are untyped. 491 00:16:59,800 --> 00:17:00,860 So it does check to make 492 00:17:01,040 --> 00:17:01,830 sure that this is an 493 00:17:02,070 --> 00:17:03,560 ID of another element, but we 494 00:17:03,720 --> 00:17:05,340 weren't able to specify that 495 00:17:05,500 --> 00:17:06,480 it should be a book element 496 00:17:07,190 --> 00:17:08,490 in our DTD, and since we're 497 00:17:08,630 --> 00:17:09,880 not able to specify it, of 498 00:17:10,040 --> 00:17:11,450 course it's not possible to check it. 499 00:17:11,910 --> 00:17:12,660 We will see that in XML 500 00:17:13,220 --> 00:17:14,620 schema, we can have typed 501 00:17:14,860 --> 00:17:16,740 pointers but it's not possible to have them in DTDs. 502 00:17:17,960 --> 00:17:19,070 The last change I'm going to 503 00:17:19,160 --> 00:17:20,260 show is to add a 504 00:17:20,660 --> 00:17:22,100 second book reference within our remark. 505 00:17:22,810 --> 00:17:23,950 So as I pointed out over 506 00:17:24,170 --> 00:17:25,550 here, when we write #PCDATA 507 00:17:26,340 --> 00:17:27,490 or an element type 508 00:17:28,140 --> 00:17:29,500 followed by the Kleene closure, the 509 00:17:29,610 --> 00:17:31,180 zero-or-more star, that 510 00:17:31,350 --> 00:17:33,690 means we can freely mix text and sub-elements. 511 00:17:34,310 --> 00:17:36,470 So just right in the middle here, let's put a book reference.
512 00:17:39,710 --> 00:17:41,220 And we can put, let's say 513 00:17:41,410 --> 00:17:45,450 book equals JU, and that 514 00:17:45,670 --> 00:17:46,430 will be the end of our reference 515 00:17:46,920 --> 00:17:48,360 there and now we 516 00:17:48,620 --> 00:17:49,790 see that we have text followed 517 00:17:50,270 --> 00:17:51,430 by a subelement followed by more 518 00:17:51,670 --> 00:17:53,170 text and so on. 519 00:17:53,310 --> 00:17:55,590 That should validate fine, and in fact it does. 520 00:17:56,650 --> 00:17:58,320 That completes our demonstration of 521 00:17:58,810 --> 00:18:00,120 XML documents with DTDs.
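For readers who want to see what "text followed by a subelement followed by more text" looks like to a program, here is a small sketch using Python's standard `xml.etree.ElementTree`, which exposes mixed content through each element's `text` and each child's `tail`. The remark wording and the `ISBN-1` value are hypothetical, and the tag names `Remark` and `BookRef` are assumed spellings.

```python
import xml.etree.ElementTree as ET


def mixed_content(element):
    """Return an element's content as an ordered list of text runs and child tags."""
    parts = []
    if element.text:
        parts.append(element.text)
    for child in element:
        parts.append(f"<{child.tag}>")
        if child.tail:  # text that follows the child, before the next sibling
            parts.append(child.tail)
    return parts


# Hypothetical remark mixing text with one empty BookRef element.
remark = ET.fromstring(
    '<Remark>Buy this book bundled with <BookRef book="ISBN-1"/>'
    " - a great deal!</Remark>"
)
empty_remark = ET.fromstring("<Remark></Remark>")

print(mixed_content(remark))
# ['Buy this book bundled with ', '<BookRef>', ' - a great deal!']
print(mixed_content(empty_remark))
# []
```

The empty remark yields an empty list, which matches the earlier observation that an element declared as text content may also be empty.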