1 00:00:00,270 --> 00:00:01,540 This video introduces a very 2 00:00:01,850 --> 00:00:04,100 important type of constraint known as referential integrity. 3 00:00:05,270 --> 00:00:07,190 As a reminder, integrity constraints in 4 00:00:07,300 --> 00:00:08,900 a database restrict the allowable 5 00:00:09,500 --> 00:00:10,990 data beyond what's already restricted 6 00:00:11,170 --> 00:00:12,590 by the structure and types of the database. 7 00:00:13,820 --> 00:00:15,180 Now the term referential integrity 8 00:00:16,010 --> 00:00:17,630 refers to integrity of references 9 00:00:18,630 --> 00:00:19,210 that appear in the database. 10 00:00:19,800 --> 00:00:21,340 In a relational database, a reference 11 00:00:21,760 --> 00:00:23,260 from one couple to another occurs 12 00:00:23,710 --> 00:00:25,810 through specifying values, and integrity, 13 00:00:26,530 --> 00:00:29,080 referential integrity says that those values are valid. 14 00:00:29,370 --> 00:00:31,030 It's sort of equivalent to 15 00:00:31,160 --> 00:00:32,800 having no dangling pointers if 16 00:00:32,960 --> 00:00:35,170 we used pointers in the database; we'll see a number of examples. 17 00:00:36,350 --> 00:00:37,460 So let's look at our 18 00:00:37,860 --> 00:00:39,470 simple example database with the 19 00:00:39,530 --> 00:00:41,290 students, colleges and students 20 00:00:41,610 --> 00:00:44,150 applying to colleges, that we've been using for all our demos. 21 00:00:45,690 --> 00:00:46,740 Let's suppose for example, that we 22 00:00:46,860 --> 00:00:47,970 had a student, 123, who had 23 00:00:48,790 --> 00:00:51,030 applied to Stanford, for some 24 00:00:51,200 --> 00:00:53,210 major, let's say CS, with a decision of yes. 25 00:00:53,500 --> 00:00:55,160 When we have this 26 00:00:55,410 --> 00:00:56,940 tuple, the value in 27 00:00:57,120 --> 00:00:58,840 the SID field here of 28 00:00:59,020 --> 00:01:01,340 123 presumably refers to 29 00:01:01,550 --> 00:01:04,180 a value here of an actual student. 30 00:01:04,700 --> 00:01:06,160 So maybe 123 is student Mary 31 00:01:06,470 --> 00:01:07,830 with some GPA in some high school. 32 00:01:08,890 --> 00:01:10,450 And furthermore, the Stanford value 33 00:01:11,110 --> 00:01:12,310 in the C name or college name 34 00:01:12,680 --> 00:01:14,440 attribute is presumably referring 35 00:01:15,460 --> 00:01:17,340 to the college name in the college table. 36 00:01:17,730 --> 00:01:19,780 So we would expect to have a Stanford value here. 37 00:01:20,790 --> 00:01:22,530 Referential integrity talks about 38 00:01:22,920 --> 00:01:24,940 these values here, referencing the 39 00:01:25,150 --> 00:01:27,000 corresponding values in the other tables. 40 00:01:27,890 --> 00:01:29,480 Specifically if we have 41 00:01:29,710 --> 00:01:32,010 referential integrity from a 42 00:01:32,600 --> 00:01:33,840 attribute A say of 43 00:01:33,950 --> 00:01:35,120 a relation R to an 44 00:01:35,410 --> 00:01:36,970 attribute B of another relation S. 45 00:01:37,650 --> 00:01:38,590 What that's saying is that 46 00:01:38,830 --> 00:01:40,150 every value that appears 47 00:01:40,680 --> 00:01:41,710 in the A column of relation 48 00:01:42,150 --> 00:01:43,960 R must have some 49 00:01:44,160 --> 00:01:45,510 corresponding value in the 50 00:01:45,610 --> 00:01:48,180 B column of relation S. So 51 00:01:48,400 --> 00:01:49,330 if we take a look at our 52 00:01:49,580 --> 00:01:50,960 example, we would say 53 00:01:51,200 --> 00:01:52,100 then that we would have referential 54 00:01:52,790 --> 00:01:54,380 integrity from the SID 55 00:01:54,610 --> 00:01:57,380 column of apply, to the SID column of students. 56 00:01:58,410 --> 00:01:59,870 We would expect every value that 57 00:02:00,030 --> 00:02:01,460 appears in this column to 58 00:02:01,650 --> 00:02:02,910 also have a value in the 59 00:02:03,030 --> 00:02:04,900 other column, and similarly we 60 00:02:05,140 --> 00:02:07,210 all have referential integrity from the 61 00:02:07,340 --> 00:02:08,900 college name attribute been applied 62 00:02:09,160 --> 00:02:10,820 to the college name attribute, in college. 63 00:02:11,640 --> 00:02:12,860 Again we want every 64 00:02:13,150 --> 00:02:14,340 value that appears in this 65 00:02:14,700 --> 00:02:16,680 column, to also appear in this column. 66 00:02:18,360 --> 00:02:19,110 Now we might have a violation, 67 00:02:19,830 --> 00:02:21,390 for example, if we had 68 00:02:22,190 --> 00:02:24,670 a, say, 555 applying to 69 00:02:24,910 --> 00:02:27,890 Stanford for some major and some decision. 70 00:02:28,790 --> 00:02:30,300 If we have no 555 over 71 00:02:30,920 --> 00:02:32,340 in the student table, then this 72 00:02:32,610 --> 00:02:35,060 here would be considered a referential integrity violation. 73 00:02:36,330 --> 00:02:37,340 Similarly we might have 1 74 00:02:37,410 --> 00:02:39,050 2 3, which is valid 75 00:02:39,480 --> 00:02:40,610 because we have a 1 76 00:02:40,700 --> 00:02:42,000 2 3 student here, but if 77 00:02:42,200 --> 00:02:43,420 1 2 3 is applying to Yale 78 00:02:44,120 --> 00:02:45,370 and we don't have Yale over here, 79 00:02:46,020 --> 00:02:47,940 then again we have a referential integrity violation. 80 00:02:49,450 --> 00:02:50,360 Now let me mention that referential 81 00:02:51,160 --> 00:02:52,840 integrity is directional, so 82 00:02:53,070 --> 00:02:56,350 we talk about, this SID 83 00:02:56,650 --> 00:02:59,090 here, referencing the SID in the student table. 84 00:02:59,790 --> 00:03:01,220 We could have referential integrity 85 00:03:01,720 --> 00:03:04,230 in the other direction, but that's saying something different. 86 00:03:05,090 --> 00:03:06,290 That would be saying that every 87 00:03:06,640 --> 00:03:08,450 student must apply somewhere, 88 00:03:09,420 --> 00:03:10,220 every value of her student 89 00:03:10,690 --> 00:03:12,810 ID must appear in the apply table. 90 00:03:13,250 --> 00:03:14,750 And this particular example, we 91 00:03:14,920 --> 00:03:15,900 probably would not have that 92 00:03:16,110 --> 00:03:17,840 be the case, we would probably be 93 00:03:18,050 --> 00:03:18,970 able to have students who hadn't 94 00:03:19,290 --> 00:03:21,860 yet applied anywhere, or colleges where no one had applied yet. 95 00:03:22,500 --> 00:03:24,260 And the most sensible direction for 96 00:03:24,410 --> 00:03:26,400 this referential integrity in this 97 00:03:26,580 --> 00:03:27,580 case is from the apply 98 00:03:27,980 --> 00:03:29,280 to the student and the apply to the college. 99 00:03:29,770 --> 00:03:32,370 But again, we could have it in both directions if we so wished. 100 00:03:33,740 --> 00:03:36,150 Now just a few more details of referential integrity constraints. 101 00:03:37,100 --> 00:03:38,850 The referencing attribute, in this 102 00:03:39,020 --> 00:03:40,330 case, the referencing attribute is 103 00:03:40,550 --> 00:03:42,930 A, is often called the foreign key. 104 00:03:43,450 --> 00:03:45,150 And in fact, referential integrity is 105 00:03:45,460 --> 00:03:47,470 often referred to as foreign key constraints. 106 00:03:48,440 --> 00:03:49,540 And even in the SQL syntax, 107 00:03:50,020 --> 00:03:52,430 we'll see that the term foreign key is used. 108 00:03:52,840 --> 00:03:54,470 Second, when we have 109 00:03:54,950 --> 00:03:56,430 a referenced attribute, in 110 00:03:56,530 --> 00:03:57,430 this case now we're talking about 111 00:03:57,640 --> 00:03:58,750 attribute B, so we have 112 00:03:59,000 --> 00:04:01,010 R.A to S.B, the 113 00:04:01,420 --> 00:04:03,060 referenced attribute is usually required 114 00:04:03,590 --> 00:04:04,980 to be the primary key for 115 00:04:05,090 --> 00:04:06,830 the table, for it's table, 116 00:04:07,250 --> 00:04:08,780 or at least specified as unique. 117 00:04:09,600 --> 00:04:12,370 And that's more about efficient implementation than anything else. 118 00:04:12,870 --> 00:04:15,550 But it is a requirement in the SQL standard, and in most systems. 119 00:04:16,910 --> 00:04:18,110 Third, it is possible 120 00:04:18,450 --> 00:04:19,710 to have foreign keys that consist 121 00:04:20,090 --> 00:04:21,360 of multiple attributes, just like 122 00:04:21,630 --> 00:04:23,110 keys themselves can be multiple attributes. 123 00:04:23,940 --> 00:04:25,300 Let's say for example that in 124 00:04:25,430 --> 00:04:27,050 our college relation, the college 125 00:04:27,440 --> 00:04:28,500 name together with the state, 126 00:04:28,930 --> 00:04:30,320 form a key, not the college name individually. 127 00:04:31,530 --> 00:04:32,560 If that were the case then 128 00:04:32,710 --> 00:04:33,960 our apply table would presumably 129 00:04:34,610 --> 00:04:36,210 have one more column that 130 00:04:36,400 --> 00:04:37,490 specified the state so we 131 00:04:37,600 --> 00:04:39,590 knew which college a student was applying to. 132 00:04:39,780 --> 00:04:41,090 And in this case we 133 00:04:41,190 --> 00:04:42,560 would have these two attributes together 134 00:04:43,450 --> 00:04:45,240 as the foreign key referencing the 135 00:04:45,310 --> 00:04:46,510 college name and state together 136 00:04:48,050 --> 00:04:48,650 in the college table. 137 00:04:49,000 --> 00:04:50,560 And we'll see an example of 138 00:04:50,800 --> 00:04:53,810 multikey, multiattribute foreign key constraints in our demo. 139 00:04:56,110 --> 00:04:58,200 Now let's talk about the enforcement of referential integrity constraints. 140 00:04:59,390 --> 00:05:00,590 First, let's think about what operations 141 00:05:01,310 --> 00:05:03,100 can occur to the database to violate a constraint. 142 00:05:04,140 --> 00:05:05,340 It's not every possible modification. 143 00:05:06,310 --> 00:05:07,520 So again, let's suppose we 144 00:05:07,680 --> 00:05:10,630 have R.A referencing S.B. So, 145 00:05:11,250 --> 00:05:12,410 for example, here we have 146 00:05:12,620 --> 00:05:15,200 apply referencing students and we have apply referencing college. 147 00:05:16,300 --> 00:05:17,520 So certainly if we insert 148 00:05:18,210 --> 00:05:19,330 a tuple into the referencing 149 00:05:19,980 --> 00:05:20,830 relation, so if we inserted 150 00:05:21,390 --> 00:05:22,510 a tuple into the apply relation 151 00:05:23,310 --> 00:05:24,750 that could potentially violate the 152 00:05:24,840 --> 00:05:26,380 referential integrity if the 153 00:05:26,480 --> 00:05:27,770 value, say an SID or 154 00:05:27,860 --> 00:05:30,240 C name, didn't have matching values in the reference relations. 155 00:05:31,450 --> 00:05:32,770 Sort of conversely, if we 156 00:05:32,960 --> 00:05:34,120 delete from a reference 157 00:05:34,440 --> 00:05:35,580 relation, say we delete a 158 00:05:35,630 --> 00:05:36,780 student, then that could 159 00:05:36,860 --> 00:05:38,400 cause a violation because say 160 00:05:38,600 --> 00:05:39,330 we have the 123 value here and 161 00:05:39,440 --> 00:05:42,000 the app apply tuple 162 00:05:42,460 --> 00:05:43,620 that was referring to 123 163 00:05:43,740 --> 00:05:45,060 after this is gone would then 164 00:05:45,270 --> 00:05:46,630 have effectively a dangling pointer. 165 00:05:47,430 --> 00:05:48,820 And of course, if we 166 00:05:49,150 --> 00:05:50,960 updated the referencing value either 167 00:05:51,220 --> 00:05:52,290 of these columns that could cause 168 00:05:52,580 --> 00:05:53,650 a violation if the new 169 00:05:53,830 --> 00:05:54,920 value didn't exist in the 170 00:05:55,000 --> 00:05:56,730 reference table, or if 171 00:05:56,860 --> 00:05:58,750 we update the referenced values 172 00:05:58,910 --> 00:06:00,660 that could also cause a violation. 173 00:06:00,980 --> 00:06:03,210 In the SQL standard and 174 00:06:03,300 --> 00:06:05,120 in all implementations, if we 175 00:06:05,260 --> 00:06:06,800 have an insertion into the 176 00:06:06,940 --> 00:06:09,010 referencing table or an 177 00:06:09,230 --> 00:06:11,010 update to the referencing table that 178 00:06:11,170 --> 00:06:12,560 causes a violation to the 179 00:06:12,700 --> 00:06:13,940 integrity constraint, the referential 180 00:06:14,470 --> 00:06:16,080 integrity constraint, then an 181 00:06:16,230 --> 00:06:17,590 error is generated, and that 182 00:06:17,980 --> 00:06:19,100 modification is not allowed, just 183 00:06:19,300 --> 00:06:21,270 like the violation of other types of constraints. 184 00:06:22,690 --> 00:06:24,310 For the reference table, however, table 185 00:06:25,070 --> 00:06:27,090 S in our case table student and table college. 186 00:06:28,320 --> 00:06:29,840 If there are modifications that violate 187 00:06:30,280 --> 00:06:31,610 the constraint, if the referential 188 00:06:32,200 --> 00:06:34,120 integrity was defined initially with 189 00:06:34,400 --> 00:06:35,840 some special options then it's 190 00:06:36,060 --> 00:06:37,390 possible for the database system 191 00:06:37,810 --> 00:06:39,790 to automatically modify the 192 00:06:39,940 --> 00:06:42,240 referencing table so that the constraint is not violated. 193 00:06:42,830 --> 00:06:44,310 So let's talk about that in a little more detail. 194 00:06:45,870 --> 00:06:48,730 Let's start by talking about deletions from the referenced table. 195 00:06:49,230 --> 00:06:50,020 So let's says we have our 196 00:06:50,130 --> 00:06:51,970 student 123 here, maybe 197 00:06:52,220 --> 00:06:53,240 123 has applied a couple 198 00:06:53,630 --> 00:06:54,860 of places, and then we 199 00:06:55,110 --> 00:06:57,610 have our student 123 here in the student table. 200 00:06:58,120 --> 00:07:00,080 So right now referential integrity is good. 201 00:07:00,930 --> 00:07:01,320 Everything's okay. 202 00:07:02,030 --> 00:07:04,470 But let's suppose that we delete the tuple with 123. 203 00:07:04,610 --> 00:07:06,950 So there's actually three possible 204 00:07:07,380 --> 00:07:08,570 things that can happen, depending on 205 00:07:08,700 --> 00:07:09,480 how we set up the referential 206 00:07:09,970 --> 00:07:11,260 integrity constraint in the first place. 207 00:07:12,170 --> 00:07:13,880 So the default behavior is 208 00:07:14,370 --> 00:07:16,230 what's called restrict, so restrict 209 00:07:16,650 --> 00:07:18,020 is actually a key word, but 210 00:07:18,330 --> 00:07:19,440 that's the default, and it 211 00:07:19,620 --> 00:07:20,720 says that if we do 212 00:07:20,930 --> 00:07:22,000 a deletion to the reference 213 00:07:22,490 --> 00:07:23,740 table and a constraint 214 00:07:24,290 --> 00:07:25,640 becomes violated then we generate 215 00:07:26,100 --> 00:07:26,910 a air just like I said 216 00:07:27,120 --> 00:07:28,880 before and the modification is disallowed. 217 00:07:30,200 --> 00:07:31,800 The two other options are a little bit more interesting. 218 00:07:32,790 --> 00:07:33,940 One of them is called set null. 219 00:07:34,360 --> 00:07:36,030 And what set null says is 220 00:07:36,190 --> 00:07:37,450 if we delete a tuple 221 00:07:38,000 --> 00:07:39,790 in a reference table, then we 222 00:07:39,990 --> 00:07:41,570 don't generate an error. 223 00:07:42,310 --> 00:07:43,290 Rather, we take the referencing 224 00:07:44,090 --> 00:07:45,030 tuples, in this case these 225 00:07:45,220 --> 00:07:46,270 two tuples with 123, and 226 00:07:46,440 --> 00:07:48,270 we take their SIDs 227 00:07:48,560 --> 00:07:50,900 and we replace those SIDs with NULL. 228 00:07:51,960 --> 00:07:53,660 And this is considered acceptable from 229 00:07:53,870 --> 00:07:55,130 a referential integrity point of 230 00:07:55,250 --> 00:07:56,530 view to have nulls in 231 00:07:56,690 --> 00:07:58,810 the foreign key column, so that will occur automatically. 232 00:08:00,520 --> 00:08:02,060 The third possibility is what's 233 00:08:02,280 --> 00:08:03,580 called cascade, so let's 234 00:08:03,920 --> 00:08:05,320 set up a little more data let's 235 00:08:05,540 --> 00:08:06,950 say we have 234, who's applied 236 00:08:07,720 --> 00:08:09,500 to Stanford, and we 237 00:08:09,650 --> 00:08:11,730 have Stanford over here, and 238 00:08:11,970 --> 00:08:13,060 now let's say that we 239 00:08:13,320 --> 00:08:14,610 again delete this tuple, so 240 00:08:15,200 --> 00:08:16,120 that would leave us with a 241 00:08:16,500 --> 00:08:19,490 referential integrity constraint violation here with the Stanford value. 242 00:08:20,570 --> 00:08:22,300 So what cascade says for 243 00:08:22,510 --> 00:08:24,070 the on delete case, is 244 00:08:24,180 --> 00:08:25,450 that if we delete this tuple 245 00:08:25,970 --> 00:08:27,590 then we'll simply delete any tuple 246 00:08:27,970 --> 00:08:30,850 that has a referencing value, so this tuple will be deleted as well. 247 00:08:31,810 --> 00:08:32,950 Now the reason is called Cascade 248 00:08:33,980 --> 00:08:34,820 is because sometimes you can 249 00:08:34,960 --> 00:08:35,880 actually set up a whole 250 00:08:36,040 --> 00:08:37,610 chain of referential integrity constraints, 251 00:08:38,230 --> 00:08:40,010 so we have apply referencing college 252 00:08:40,440 --> 00:08:41,530 here but maybe we've had 253 00:08:41,730 --> 00:08:43,520 some other referencing ply and 254 00:08:43,700 --> 00:08:45,170 maybe even another table referencing that 255 00:08:45,370 --> 00:08:46,960 one, and if we 256 00:08:47,140 --> 00:08:48,530 say deleted a tuple from 257 00:08:48,770 --> 00:08:49,960 college that caused us to 258 00:08:50,140 --> 00:08:51,100 delete a tuple from a 259 00:08:51,390 --> 00:08:52,530 ply, if there was a 260 00:08:52,760 --> 00:08:53,910 tuple up here referencing that 261 00:08:54,110 --> 00:08:57,110 we might get a delete there and then a further delete, and then so on. 262 00:08:57,790 --> 00:08:59,150 Typically, cascading will only 263 00:08:59,370 --> 00:09:00,410 go one step, but we'll see 264 00:09:00,530 --> 00:09:01,560 an example in our demo, where 265 00:09:01,760 --> 00:09:04,040 we'll set it up where a cascade will go some distance. 266 00:09:05,330 --> 00:09:07,520 Now, updates have a similar three options. 267 00:09:07,980 --> 00:09:09,490 Let's erase all this stuff here. 268 00:09:10,850 --> 00:09:12,450 Let's again set up some example data. 269 00:09:12,900 --> 00:09:14,730 So let's say our student 123 270 00:09:14,910 --> 00:09:18,670 was applied to Stanford and we have 1 "23" over here. 271 00:09:20,410 --> 00:09:21,700 If we tried to update, say 272 00:09:21,940 --> 00:09:23,260 this "123" to be the value 273 00:09:23,640 --> 00:09:26,130 "456", there strict command would 274 00:09:26,260 --> 00:09:27,540 say that's not allowed because that 275 00:09:27,690 --> 00:09:28,640 would leave us with a steg 276 00:09:28,770 --> 00:09:30,340 when pointer and will generate an error. 277 00:09:31,490 --> 00:09:33,420 The set null command will 278 00:09:33,750 --> 00:09:35,140 similar to the delete if 279 00:09:35,310 --> 00:09:36,270 this is changed to four, five, 280 00:09:36,580 --> 00:09:39,500 six Set any 123 values to null. 281 00:09:39,730 --> 00:09:42,340 So in this case we change 123 to 456 in the student. 282 00:09:43,050 --> 00:09:44,520 And then we would change 283 00:09:45,090 --> 00:09:46,100 123 over here to null. 284 00:09:47,610 --> 00:09:49,300 Probably the most interesting case is 285 00:09:49,620 --> 00:09:51,500 the cascade case for the update. 286 00:09:52,250 --> 00:09:53,360 Cascade says that if we 287 00:09:53,490 --> 00:09:55,730 update a reference value then 288 00:09:55,900 --> 00:09:58,310 we'll make the same update to the referencing value. 289 00:09:58,670 --> 00:09:59,850 So, let's say we have 290 00:10:00,010 --> 00:10:01,100 Stanford over here in fact 291 00:10:01,340 --> 00:10:02,140 we have to if we have 292 00:10:02,380 --> 00:10:04,210 the value the referential integrity 293 00:10:04,640 --> 00:10:06,240 constraint being correct, and now 294 00:10:06,620 --> 00:10:07,730 let's say that somebody comes 295 00:10:08,010 --> 00:10:10,000 along and says, "I think Stanford is spelled wrong. 296 00:10:10,310 --> 00:10:12,290 It's actually Stanford, well, that's 297 00:10:12,600 --> 00:10:13,850 actually a common mispelling for Stanford. 298 00:10:14,410 --> 00:10:15,040 The first one was correct. 299 00:10:15,780 --> 00:10:16,890 But if someone makes this change, 300 00:10:18,040 --> 00:10:19,400 if we have the cascade option 301 00:10:20,210 --> 00:10:21,990 for the referential integrity constraints between 302 00:10:22,290 --> 00:10:23,250 apply C name and college 303 00:10:23,530 --> 00:10:24,910 C name then that update 304 00:10:25,190 --> 00:10:27,050 will be propagated to any referencing values. 305 00:10:27,880 --> 00:10:29,580 So in that case automatically this 306 00:10:29,780 --> 00:10:30,940 Stanford and any other 307 00:10:31,350 --> 00:10:32,350 Stanfords in the apply 308 00:10:32,710 --> 00:10:34,860 table will be updated automatically to Stanford. 309 00:10:36,680 --> 00:10:38,940 Now let's take a look at referential [xx] integrity in action. 310 00:10:40,640 --> 00:10:41,580 Let's create our three tables. 311 00:10:42,410 --> 00:10:43,830 We create the college table with 312 00:10:43,990 --> 00:10:45,220 the college name as primary key 313 00:10:45,390 --> 00:10:47,870 and the student table with the student ID as primary key. 314 00:10:48,690 --> 00:10:49,670 That allows us to have referential 315 00:10:50,250 --> 00:10:51,440 integrity constraints in the 316 00:10:51,670 --> 00:10:53,560 apply table that reference those two attributes. 317 00:10:54,710 --> 00:10:55,870 When we create the table apply, 318 00:10:56,730 --> 00:10:57,780 now we're going to specify 319 00:10:57,990 --> 00:10:59,060 the attributes, and thhe key 320 00:10:59,360 --> 00:11:01,000 word "references" says that 321 00:11:01,120 --> 00:11:02,480 we're setting up a referential integrity 322 00:11:02,930 --> 00:11:04,410 constraint from attribute "Student 323 00:11:04,890 --> 00:11:07,610 ID" to the "Student ID" attribute of the student table. 324 00:11:08,570 --> 00:11:10,250 And similarly, we extend the 325 00:11:10,310 --> 00:11:11,290 declaration of the "College 326 00:11:11,760 --> 00:11:13,230 Name" attribute to have a 327 00:11:13,550 --> 00:11:15,900 referential integrity constraint to 328 00:11:16,010 --> 00:11:17,860 the "College Name" attribute of the college table. 329 00:11:18,660 --> 00:11:20,120 We'll go ahead and create those tables. 330 00:11:21,380 --> 00:11:26,210 Now, let's try to populate our tables. 331 00:11:27,330 --> 00:11:30,130 If we make a mistake, we try to put our apply values in first, but there's no data in the student table or the college table. 332 00:11:30,570 --> 00:11:31,670 So, when we try to insert, 333 00:11:31,980 --> 00:11:33,760 for example, student123 applying to 334 00:11:33,860 --> 00:11:35,080 Stanford, we'll get a 335 00:11:35,420 --> 00:11:37,040 referential integrity violation because there's 336 00:11:37,280 --> 00:11:38,940 no student123, and there's no 337 00:11:39,100 --> 00:11:41,100 college Standford, and similarly for 338 00:11:41,280 --> 00:11:42,610 student 234 applying to Berkley. 339 00:11:43,570 --> 00:11:44,690 So, we see the errors and 340 00:11:44,940 --> 00:11:46,070 what we need to do is 341 00:11:46,580 --> 00:11:48,020 first insert the tuples for 342 00:11:48,210 --> 00:11:49,870 the students and the colleges and 343 00:11:49,990 --> 00:11:51,400 then insert the applied tuples afterwards. 344 00:11:53,400 --> 00:11:54,440 So let's insert the two 345 00:11:54,700 --> 00:11:55,390 students, 1, 2, 3 and 346 00:11:55,460 --> 00:11:56,500 two, three, four, and the 347 00:11:56,790 --> 00:11:58,050 two colleges Stanford and Berkeley, 348 00:11:58,980 --> 00:11:59,960 no problem doing that. 349 00:12:00,380 --> 00:12:01,790 And now we'll go ahead, 350 00:12:02,240 --> 00:12:03,530 and again, insert the apply 351 00:12:03,980 --> 00:12:05,460 tuples, and this time everything 352 00:12:05,850 --> 00:12:08,090 should work just fine, and it does. 353 00:12:09,700 --> 00:12:11,430 In addition to inserts into 354 00:12:11,780 --> 00:12:14,090 the referencing table, we also have to worry about updates. 355 00:12:14,970 --> 00:12:17,920 So, as a reminder let's take a look at the students that we have. 356 00:12:18,280 --> 00:12:19,620 We have students 123 and 357 00:12:19,760 --> 00:12:21,810 234 and in the 358 00:12:22,100 --> 00:12:23,720 apply we have students 123 359 00:12:23,920 --> 00:12:26,230 and 234 each applying to one college. 360 00:12:27,360 --> 00:12:28,330 Now we're going to update 361 00:12:28,540 --> 00:12:29,540 our applied tuple that has 362 00:12:29,760 --> 00:12:31,110 student ID 123. 363 00:12:31,490 --> 00:12:32,890 Our first update tries to set 364 00:12:33,080 --> 00:12:33,810 the student ID to 345, but 365 00:12:33,950 --> 00:12:36,220 we'll get a referential integrity 366 00:12:36,670 --> 00:12:38,770 violation because there's no student with ID 345. 367 00:12:38,960 --> 00:12:41,520 Our second update will be more successful. 368 00:12:42,950 --> 00:12:45,420 It will update 123's application to have student ID 234. 369 00:12:45,740 --> 00:12:47,870 We'll go ahead and execute the update. 370 00:12:48,240 --> 00:12:49,790 And we see that the second one did succeed. 371 00:12:51,470 --> 00:12:52,370 So far, we've looked at modifications 372 00:12:53,150 --> 00:12:54,980 to the referencing table but we 373 00:12:55,160 --> 00:12:57,800 also have to worry about modifications for the referenced tables. 374 00:12:58,690 --> 00:13:00,880 In our case that's the student table and the college table. 375 00:13:01,700 --> 00:13:03,220 For example, let's suppose we 376 00:13:03,390 --> 00:13:04,580 try to delete from the 377 00:13:04,660 --> 00:13:06,440 college table where the college name is Stanford. 378 00:13:07,460 --> 00:13:08,640 If we try to delete that couple 379 00:13:08,800 --> 00:13:09,880 we'll get an error because we 380 00:13:09,990 --> 00:13:11,040 do have an applied couple 381 00:13:11,700 --> 00:13:13,200 that has a value Stamford and 382 00:13:13,290 --> 00:13:15,190 is therefore referencing the couple we're trying to delete. 383 00:13:16,730 --> 00:13:18,590 Similarly tried to delete some student couples. 384 00:13:19,330 --> 00:13:20,340 Let's go back and look at 385 00:13:20,450 --> 00:13:21,690 our apply relation, and we 386 00:13:21,760 --> 00:13:23,020 now see that both of the 387 00:13:23,150 --> 00:13:24,500 student IDs in apply are "234", 388 00:13:25,140 --> 00:13:26,310 so it should be ok 389 00:13:26,560 --> 00:13:27,700 to delete the student couple 390 00:13:28,130 --> 00:13:29,750 with 123, nothing is referencing it. 391 00:13:30,160 --> 00:13:31,080 But it should not be okay 392 00:13:31,500 --> 00:13:33,460 to delete the couple with student 393 00:13:33,890 --> 00:13:35,240 ID 234 and indeed when we 394 00:13:35,360 --> 00:13:36,350 run the command we see that 395 00:13:36,470 --> 00:13:38,770 the first one generated an error and the second one succeeded. 396 00:13:40,990 --> 00:13:42,950 How about updating a referenced table. 397 00:13:43,650 --> 00:13:44,610 Let's say that we decide we'd 398 00:13:44,760 --> 00:13:46,030 rather have Berkley called Bezerkly 399 00:13:46,990 --> 00:13:48,120 So we try to update that 400 00:13:48,230 --> 00:13:49,480 college name, but when we 401 00:13:49,570 --> 00:13:50,570 run the command, we get an 402 00:13:50,660 --> 00:13:51,530 error because we do have 403 00:13:51,700 --> 00:13:54,240 an apply tuple that's referencing the value as Berkley. 404 00:13:55,870 --> 00:13:57,140 And finally, although we've been 405 00:13:57,290 --> 00:13:58,520 talking about data level modifications, 406 00:14:00,680 --> 00:14:02,280 referential integrity constraints also 407 00:14:02,680 --> 00:14:04,270 place restrictions on dropping tables. 408 00:14:05,150 --> 00:14:06,030 For example, if we try 409 00:14:06,270 --> 00:14:07,370 tried to drop the student table, 410 00:14:08,150 --> 00:14:09,470 we would again get a referential integrity 411 00:14:09,900 --> 00:14:10,990 constraint because that would leave 412 00:14:11,200 --> 00:14:12,240 data in the apply table 413 00:14:12,620 --> 00:14:15,970 referencing non-existent data in what would now be a non-existing table. 414 00:14:17,330 --> 00:14:18,340 You can see that the errors says 415 00:14:18,410 --> 00:14:19,450 that you cannot drop a table 416 00:14:19,660 --> 00:14:22,440 student because other objects are currently depending on it. 417 00:14:23,060 --> 00:14:24,630 So when we have referential integrity constraints, 418 00:14:25,230 --> 00:14:26,170 if we wanted to drop the tables, 419 00:14:26,610 --> 00:14:27,660 we'd have to first drop the 420 00:14:27,920 --> 00:14:30,050 applied table and then drop the table that it's referencing. 421 00:14:31,650 --> 00:14:32,370 Now we're going to set up 422 00:14:32,490 --> 00:14:33,460 the apply table to experiment 423 00:14:34,110 --> 00:14:35,770 with some of the automatic mechanisms 424 00:14:36,250 --> 00:14:37,830 for handling referential integrity violations. 425 00:14:38,990 --> 00:14:40,220 Specifically, we still have the 426 00:14:40,400 --> 00:14:42,130 same referential integrity constraints from 427 00:14:42,440 --> 00:14:43,580 student id to the student table 428 00:14:43,980 --> 00:14:44,960 and from college names to the 429 00:14:45,010 --> 00:14:46,970 college table, but for 430 00:14:47,250 --> 00:14:48,650 the student ID referential integrity 431 00:14:49,080 --> 00:14:49,880 constraint we're and we're going 432 00:14:50,110 --> 00:14:51,250 to specify that if a student 433 00:14:51,860 --> 00:14:53,220 is deleted, then we're going 434 00:14:53,470 --> 00:14:55,040 to set any referencing values to 435 00:14:55,190 --> 00:14:56,200 "null", and we do that 436 00:14:56,350 --> 00:14:58,150 with the keywords "on delete" which 437 00:14:58,350 --> 00:14:59,060 tells us what to do when 438 00:14:59,240 --> 00:15:00,100 there's a delete to the referenced 439 00:15:00,790 --> 00:15:02,690 table, we use the set null option. 440 00:15:03,830 --> 00:15:05,150 For the college name reference 441 00:15:05,930 --> 00:15:07,190 we're going to specify that if 442 00:15:07,420 --> 00:15:08,690 the college is updated in 443 00:15:08,890 --> 00:15:10,110 the college table and that says 444 00:15:10,240 --> 00:15:12,360 on update we'll use the cascade option. 445 00:15:13,200 --> 00:15:14,520 As a reminder what that does 446 00:15:14,580 --> 00:15:15,450 is that if we change the college 447 00:15:15,900 --> 00:15:17,180 name then we'll propagate that 448 00:15:17,350 --> 00:15:18,810 change to any college names 449 00:15:19,090 --> 00:15:20,530 that reference it in the apply table. 450 00:15:21,360 --> 00:15:22,890 Now I could have specified two more options. 451 00:15:23,330 --> 00:15:24,600 I could have specified an on 452 00:15:24,990 --> 00:15:26,450 update option for the 453 00:15:26,600 --> 00:15:27,820 student ID and an on 454 00:15:28,300 --> 00:15:29,400 delete option for the college 455 00:15:29,850 --> 00:15:31,030 name, so there could be four all together. 456 00:15:31,950 --> 00:15:32,830 Because I left those out 457 00:15:33,100 --> 00:15:34,160 those both will use the 458 00:15:34,270 --> 00:15:35,280 default which is the 459 00:15:35,430 --> 00:15:37,130 restrict option which says 460 00:15:37,770 --> 00:15:39,610 if you perform a modification 461 00:15:40,080 --> 00:15:41,310 that generates a referential integrity 462 00:15:41,760 --> 00:15:42,930 violation then an error will 463 00:15:43,120 --> 00:15:44,400 be generated and the modification 464 00:15:44,710 --> 00:15:47,160 will not be performed, just as we saw in the previous examples. 465 00:15:48,760 --> 00:15:49,900 So let's go ahead and let's 466 00:15:50,110 --> 00:15:51,120 create the table and then 467 00:15:51,260 --> 00:15:52,720 let's experiment with some modifications. 468 00:15:54,220 --> 00:15:56,660 Let's start by adding a couple more students to our student table. 469 00:15:57,450 --> 00:15:58,370 So then, if we take a look, 470 00:15:58,570 --> 00:15:59,870 our applied table is currently empty 471 00:16:00,210 --> 00:16:01,980 because we just created it, our 472 00:16:02,120 --> 00:16:03,450 college table has Stanford and 473 00:16:03,630 --> 00:16:05,370 Berkely, and our student table 474 00:16:05,510 --> 00:16:09,620 now has three tuples, student IDs "123", "234", and "235". 475 00:16:09,800 --> 00:16:13,300 We'll insert five tuples into 476 00:16:13,470 --> 00:16:14,670 the apply table, and all 477 00:16:14,910 --> 00:16:15,920 of them are going to be 478 00:16:16,080 --> 00:16:17,560 valid with respect to referential integrity. 479 00:16:18,050 --> 00:16:19,540 We're only going to insert students 480 00:16:20,560 --> 00:16:21,270 that are "123", "234", or "345", 481 00:16:21,380 --> 00:16:23,920 and they're only going to apply to Stanford or Berkley. 482 00:16:24,800 --> 00:16:26,110 So, we've inserted those values, 483 00:16:26,410 --> 00:16:29,120 and now we'll perform some modifications to see what happens. 484 00:16:30,530 --> 00:16:31,880 We're going to delete, from the 485 00:16:31,990 --> 00:16:34,520 student table, all students whose ID is greater than 200. 486 00:16:35,080 --> 00:16:36,440 Going back and looking at 487 00:16:36,480 --> 00:16:37,440 that table, we'll see that student 488 00:16:37,830 --> 00:16:40,320 "234" and "345" are going to be deleted. 489 00:16:41,270 --> 00:16:42,840 Now remember, we specified on 490 00:16:43,190 --> 00:16:44,710 delete set null for the 491 00:16:44,830 --> 00:16:46,360 apply referential integrity constraints. 492 00:16:47,050 --> 00:16:48,160 Specifically, when we look 493 00:16:48,340 --> 00:16:49,820 at our apply table the references 494 00:16:50,840 --> 00:16:51,660 to the students that are about 495 00:16:52,020 --> 00:16:53,390 to be deleted should be automatically 496 00:16:54,140 --> 00:16:56,440 set to null when we run the delete command. 497 00:16:57,090 --> 00:16:58,300 So, we'll go ahead. 498 00:16:59,020 --> 00:17:00,450 We'll perform the deletion. 499 00:17:01,490 --> 00:17:02,230 We'll take a look at the 500 00:17:02,320 --> 00:17:03,850 apply table when we We 501 00:17:04,030 --> 00:17:05,140 see that those values have indeed 502 00:17:05,490 --> 00:17:06,890 been set to null, and if 503 00:17:06,980 --> 00:17:07,670 we take a look at the student 504 00:17:08,030 --> 00:17:10,030 table, we'll see that we only have student123 left. 505 00:17:10,500 --> 00:17:13,880 Now, let's test our cascaded update. 506 00:17:14,520 --> 00:17:15,450 As a reminder, when we set 507 00:17:15,660 --> 00:17:16,990 up the college name referential integrity 508 00:17:17,420 --> 00:17:18,550 constraint, we said that 509 00:17:18,660 --> 00:17:19,910 if we update the college name 510 00:17:20,130 --> 00:17:21,450 in the college table, we should 511 00:17:21,800 --> 00:17:23,510 propogate those updates using cascade 512 00:17:23,730 --> 00:17:26,000 to any references in the applied table. 513 00:17:26,690 --> 00:17:27,820 So, we're once again going to 514 00:17:27,940 --> 00:17:28,930 attempt to change the name 515 00:17:29,170 --> 00:17:31,120 of Berkely to Bezerkly. 516 00:17:31,250 --> 00:17:32,060 This time it should allow us 517 00:17:32,150 --> 00:17:33,090 to do it, and it should 518 00:17:33,350 --> 00:17:36,310 change any applications to Berkely to now specify Berzerkly. 519 00:17:37,340 --> 00:17:38,080 So, we'll go ahead, and run 520 00:17:38,320 --> 00:17:39,400 the command, and we'll look 521 00:17:40,090 --> 00:17:41,270 at the apply, and we will 522 00:17:41,530 --> 00:17:42,980 see, once we refresh, that indeed 523 00:17:44,020 --> 00:17:46,400 Berkley has now been changed to Bezerkly . 524 00:17:47,130 --> 00:17:48,580 This example is a doozy. 525 00:17:49,250 --> 00:17:52,200 It's going to show a whole bunch of features that we haven't seen in previous examples. 526 00:17:53,230 --> 00:17:54,650 We're gonna use a simple table T 527 00:17:54,870 --> 00:17:56,270 with just three attributes A, B 528 00:17:56,420 --> 00:17:57,610 and C, and we're going to 529 00:17:57,780 --> 00:17:58,560 say that A and B together 530 00:17:59,100 --> 00:18:00,270 are a primary key for the table. 531 00:18:01,410 --> 00:18:02,390 In the example, we're going to 532 00:18:02,460 --> 00:18:04,040 demonstrate referential integrity within 533 00:18:04,440 --> 00:18:06,730 a single table, so intra-table referential integrity. 534 00:18:07,880 --> 00:18:09,770 We're going to demonstrate referential integrity 535 00:18:10,210 --> 00:18:11,490 involving multiple we'll attribute 536 00:18:11,950 --> 00:18:13,430 foreign keys and primary keys 537 00:18:13,470 --> 00:18:14,570 and we're going to demonstrate 538 00:18:15,170 --> 00:18:16,680 a real cascading where we're 539 00:18:17,050 --> 00:18:18,590 going to have a cascaded delete 540 00:18:19,000 --> 00:18:21,030 that's going to propagate across numerous couples. 541 00:18:22,090 --> 00:18:23,190 So typically one thinks of 542 00:18:23,320 --> 00:18:24,930 referential integrity as having 543 00:18:25,340 --> 00:18:26,630 a referencing table and then 544 00:18:26,770 --> 00:18:28,180 the referenced value exists in 545 00:18:28,270 --> 00:18:29,560 a different table but there's 546 00:18:29,730 --> 00:18:30,760 nothing wrong with having referential 547 00:18:31,350 --> 00:18:32,630 integrity within a single table. 548 00:18:33,390 --> 00:18:34,440 For example, in the one 549 00:18:34,820 --> 00:18:35,840 attribute case I might have 550 00:18:36,420 --> 00:18:37,770 attribute B where every value 551 00:18:38,160 --> 00:18:39,840 in B must also appear in 552 00:18:39,920 --> 00:18:41,210 value A and that would 553 00:18:41,380 --> 00:18:43,280 be a referential integrity constraint within the table. 554 00:18:44,510 --> 00:18:45,350 In this case things, are slightly 555 00:18:45,750 --> 00:18:46,990 more complicated because I'm using 556 00:18:47,330 --> 00:18:49,660 pairs of attributes for my referential integrity constraint. 557 00:18:50,760 --> 00:18:51,990 Specifically, I'm going to 558 00:18:52,270 --> 00:18:54,020 have attributes B and 559 00:18:54,220 --> 00:18:56,710 C together reference attributes A and B together. 560 00:18:57,830 --> 00:18:59,410 Syntactically, to declare a 561 00:18:59,790 --> 00:19:01,660 multi attribute referential integrity constraint 562 00:19:02,190 --> 00:19:03,770 in a table definition, I have 563 00:19:04,000 --> 00:19:05,130 to put it at the end because 564 00:19:05,400 --> 00:19:06,840 I can't attach it to a single attribute. 565 00:19:07,290 --> 00:19:08,080 Just like I do when I 566 00:19:08,220 --> 00:19:11,210 have, say, multi-attribute keys which are also demonstrated here. 567 00:19:12,000 --> 00:19:13,940 The syntax is at the end of the table definition. 568 00:19:14,930 --> 00:19:15,690 I'd say that I'm going to 569 00:19:15,810 --> 00:19:17,590 have foreign key constraint and 570 00:19:17,850 --> 00:19:19,080 attributes B and C 571 00:19:19,270 --> 00:19:20,700 together have a referential integrity 572 00:19:21,140 --> 00:19:22,630 constraint to attributes A 573 00:19:22,750 --> 00:19:25,010 and B of the same table Then, 574 00:19:25,210 --> 00:19:27,920 in addition, I'm going to specify, "on delete cascade". 575 00:19:29,040 --> 00:19:30,010 That means, if I delete 576 00:19:30,720 --> 00:19:31,930 an attribute, if I 577 00:19:32,020 --> 00:19:34,000 delete a tuple, then any 578 00:19:34,330 --> 00:19:35,940 other tuple that's referencing the 579 00:19:36,100 --> 00:19:37,020 AB values of that tuple 580 00:19:37,360 --> 00:19:40,410 with it's BC values will automatically be deleted. 581 00:19:41,990 --> 00:19:42,880 So, let's run the table 582 00:19:43,200 --> 00:19:44,330 creation, insert our data, 583 00:19:44,850 --> 00:19:45,870 and then, let's take a look 584 00:19:46,010 --> 00:19:47,180 at the table and predict what's 585 00:19:47,400 --> 00:19:49,060 going to happen when we actually delete a tuple. 586 00:19:49,290 --> 00:19:50,630 So, here's the contents 587 00:19:51,340 --> 00:19:52,320 of table T after the insertions. 588 00:19:53,420 --> 00:19:55,730 So, we see that A and B together do form a key. 589 00:19:56,250 --> 00:19:57,300 All of the AB pairs are 590 00:19:57,350 --> 00:19:59,320 unique, and furthermore, every 591 00:19:59,840 --> 00:20:01,220 BC pair has a 592 00:20:01,320 --> 00:20:03,680 corresponding AB value in the same table. 593 00:20:04,720 --> 00:20:06,040 So, every tuple except the 594 00:20:06,360 --> 00:20:08,150 first, the BC pair, references 595 00:20:08,920 --> 00:20:10,480 the AB pair of the preceding tuple. 596 00:20:10,840 --> 00:20:12,180 So, we have '1-1' here referencing 597 00:20:12,880 --> 00:20:14,630 '1-1' in tuple one. 598 00:20:15,490 --> 00:20:16,660 our two one here references 599 00:20:17,230 --> 00:20:19,570 the two one in table two and so on. 600 00:20:20,040 --> 00:20:21,390 So our referential integrity constraints 601 00:20:22,130 --> 00:20:23,110 are intact for this table. 602 00:20:25,500 --> 00:20:26,390 What we're going to delete 603 00:20:26,650 --> 00:20:27,850 the first tupple by running 604 00:20:28,130 --> 00:20:30,640 a command that says delete the tupple whose A value is one. 605 00:20:31,410 --> 00:20:33,300 When we delete tupple one because 606 00:20:33,800 --> 00:20:35,570 we have the cascaded delete set-up 607 00:20:35,960 --> 00:20:36,900 we will need to delete 608 00:20:37,470 --> 00:20:40,020 any tuple whose reference values are one one. 609 00:20:40,170 --> 00:20:41,660 So that will be couple two. 610 00:20:41,940 --> 00:20:43,360 So after deleting couple one 611 00:20:43,720 --> 00:20:47,040 the referential integrity constraint enforcement will delete couple two. 612 00:20:47,780 --> 00:20:49,160 When couple two is deleted the 613 00:20:49,360 --> 00:20:50,440 two one value will be gone 614 00:20:50,900 --> 00:20:51,890 So tuple 3 which references 615 00:20:52,490 --> 00:20:53,990 tuple 2 will be deleted, then 616 00:20:54,150 --> 00:20:55,450 tuple 4 which references 3 617 00:20:55,620 --> 00:20:56,820 will be deleted, and so on 618 00:20:57,390 --> 00:20:58,710 until the entire table is empty. 619 00:20:59,640 --> 00:21:01,910 So here's our delete command to delete the first tuple. 620 00:21:02,540 --> 00:21:03,250 We run the command. 621 00:21:04,150 --> 00:21:05,140 We go back and we look 622 00:21:05,340 --> 00:21:06,280 at the table, and when we refresh, 623 00:21:06,880 --> 00:21:08,290 we see that the table is indeed empty. 624 00:21:10,020 --> 00:21:12,050 That concludes our demonstration of referential integrity constraints. 625 00:21:13,060 --> 00:21:14,270 Referential integrity is actually 626 00:21:14,610 --> 00:21:17,270 extremely common in deployments of relational databases. 627 00:21:18,290 --> 00:21:19,580 The natural way to design a 628 00:21:19,830 --> 00:21:21,310 relational schema often has values 629 00:21:21,740 --> 00:21:23,150 in columns of one table referring 630 00:21:23,600 --> 00:21:24,620 to values of columns of another, 631 00:21:25,370 --> 00:21:26,600 and by setting up referential integrity 632 00:21:27,040 --> 00:21:28,190 constraints the system itself 633 00:21:28,510 --> 00:21:29,730 will monitor the database and 634 00:21:30,020 --> 00:21:31,490 make sure that it always remains consistent.