1 00:00:00,600 --> 00:00:01,920 This video covers functional dependencies. 2 00:00:02,920 --> 00:00:05,360 First a quick recap of relational design by decomposition. 3 00:00:06,690 --> 00:00:07,870 The idea is that the application 4 00:00:08,440 --> 00:00:10,140 designer writes mega relations 5 00:00:10,630 --> 00:00:12,220 that contain all the information that 6 00:00:12,370 --> 00:00:15,130 we want to have and properties of the data that we're storing. 7 00:00:15,340 --> 00:00:16,780 And then the system will automatically 8 00:00:17,150 --> 00:00:19,370 decompose those based on the properties that are specified. 9 00:00:20,530 --> 00:00:21,820 The final set of decomposed relations 10 00:00:22,280 --> 00:00:23,410 will satisfy what's called the 11 00:00:23,690 --> 00:00:25,030 normal form, and normal forms 12 00:00:25,260 --> 00:00:26,380 are good relations in the sense 13 00:00:26,620 --> 00:00:27,930 that they have no anomalies and they 14 00:00:28,070 --> 00:00:29,510 don't lose information from what 15 00:00:29,730 --> 00:00:31,440 was specified in the original mega relations. 16 00:00:32,580 --> 00:00:34,000 Now the properties themselves are defined 17 00:00:34,460 --> 00:00:36,340 either as functional dependencies in 18 00:00:36,440 --> 00:00:37,480 which case the system will 19 00:00:37,850 --> 00:00:39,070 generate Boyce-Codd Normal Form relations 20 00:00:39,610 --> 00:00:41,690 or multi-value dependencies, which 21 00:00:42,340 --> 00:00:45,520 will then yield fourth normal form relations. 22 00:00:46,450 --> 00:00:47,820 So this video, as you 23 00:00:47,910 --> 00:00:49,860 can tell, is about functional dependencies themselves. 24 00:00:50,460 --> 00:00:51,920 And let me say that functional 25 00:00:52,200 --> 00:00:53,070 dependencies are actually a generally 26 00:00:53,730 --> 00:00:56,770 useful concept in databases not only for relational design. 27 00:00:57,680 --> 00:00:59,530 So for functional dependencies, 28 00:00:59,610 --> 00:01:01,240 as we'll see soon, are a generalization 29 00:01:01,430 --> 00:01:03,240 of the notion of keys and they 30 00:01:03,520 --> 00:01:04,910 allow the system to, for 31 00:01:05,200 --> 00:01:06,200 example, store the data more 32 00:01:06,430 --> 00:01:08,260 efficiently when the system knows about functional dependencies. 33 00:01:09,130 --> 00:01:10,240 Compression schemes can be 34 00:01:10,370 --> 00:01:11,980 used based on functional dependencies for 35 00:01:12,110 --> 00:01:13,640 storage, and also functional 36 00:01:14,000 --> 00:01:15,510 dependencies, as a generalization of 37 00:01:15,660 --> 00:01:16,940 keys, can be used to 38 00:01:17,210 --> 00:01:19,440 reason about queries and for query optimization. 39 00:01:20,550 --> 00:01:23,120 Which is a reminder of a very important aspect of database systems. 40 00:01:23,710 --> 00:01:26,770 Which allows declarative queries to be executed by the system efficiently. 41 00:01:28,440 --> 00:01:29,710 By the way a third use of 42 00:01:29,960 --> 00:01:31,670 functional dependencies is for exam 43 00:01:32,150 --> 00:01:33,730 questions in database courses because 44 00:01:34,240 --> 00:01:36,480 there is a very nice theory functional dependencies as you'll see. 45 00:01:37,020 --> 00:01:38,880 It's quite easy to write questions about them. 46 00:01:39,690 --> 00:01:40,650 The remainder of the video will 47 00:01:40,940 --> 00:01:42,640 cover functional dependencies in general 48 00:01:42,990 --> 00:01:44,200 as a general concept and not 49 00:01:44,380 --> 00:01:46,350 specifically to relational design and 50 00:01:46,440 --> 00:01:47,740 then later videos will tie functional 51 00:01:48,140 --> 00:01:49,800 dependencies back to design by decomposition. 52 00:01:51,610 --> 00:01:52,790 As always we'll be using as 53 00:01:53,020 --> 00:01:54,210 a sample a college application 54 00:01:54,850 --> 00:01:56,010 database and this case I've 55 00:01:56,160 --> 00:01:58,700 expanded the information that we're including quite a bit. 56 00:01:59,140 --> 00:02:00,270 We'll be using these same two 57 00:02:00,370 --> 00:02:02,110 relations as examples throughout 58 00:02:02,460 --> 00:02:04,780 this video and subsequent videos on relational design. 59 00:02:05,370 --> 00:02:06,330 In this case we're gonna 60 00:02:06,440 --> 00:02:07,580 look at two relations, one with 61 00:02:07,820 --> 00:02:09,340 information about students and then 62 00:02:09,470 --> 00:02:10,900 a separate one with information about where they're applying. 63 00:02:11,840 --> 00:02:13,150 The student information will have social 64 00:02:13,400 --> 00:02:14,740 security number, the student's name, 65 00:02:15,060 --> 00:02:16,530 their address and then three 66 00:02:16,990 --> 00:02:18,490 attributes about their high school. We'll assume 67 00:02:18,630 --> 00:02:19,960 there are unique codes for high 68 00:02:20,130 --> 00:02:21,200 schools, but they also 69 00:02:21,670 --> 00:02:22,590 have a name and are in 70 00:02:22,670 --> 00:02:23,830 a city, finally the student's 71 00:02:24,250 --> 00:02:25,490 GPA, and a priority 72 00:02:25,680 --> 00:02:27,530 field for admissions that we'll see in a moment. 73 00:02:28,500 --> 00:02:30,040 For applications, we'll have the 74 00:02:30,110 --> 00:02:31,510 student's social security number, the 75 00:02:31,620 --> 00:02:32,830 college name they're applying to, 76 00:02:33,160 --> 00:02:35,690 the state of the college, the date of application and the major. 77 00:02:36,830 --> 00:02:37,480 Not all of these attributes 78 00:02:37,680 --> 00:02:38,570 will even be used in this 79 00:02:38,760 --> 00:02:39,500 video, but like I said, 80 00:02:39,800 --> 00:02:42,060 this will permeate several videos as our running example. 81 00:02:43,320 --> 00:02:44,830 To motivate functional dependencies, let's 82 00:02:45,010 --> 00:02:45,990 focus on the student relation 83 00:02:46,400 --> 00:02:48,670 and specifically on the GPA and priority attributes. 84 00:02:49,560 --> 00:02:50,630 Let's suppose that a student's 85 00:02:50,850 --> 00:02:52,720 priority is determined by their GPA. 86 00:02:53,620 --> 00:02:54,660 For example, we might have 87 00:02:55,160 --> 00:02:56,410 a rule that says if GPA 88 00:02:57,360 --> 00:03:00,620 is 3.8, greater than 3.8 then priority is 1. 89 00:03:01,350 --> 00:03:02,980 If the GPA is 90 00:03:03,220 --> 00:03:06,020 say, in between, let's say, 3.3 and 3.8. 91 00:03:06,130 --> 00:03:09,660 Then we'll set priority 92 00:03:10,210 --> 00:03:11,770 to be equal 2, and let's 93 00:03:12,090 --> 00:03:13,330 say if the GPA, then, is 94 00:03:13,650 --> 00:03:16,940 less than 3.3, then the priority value is three. 95 00:03:17,620 --> 00:03:18,930 So, if this relationship is guaranteed 96 00:03:19,620 --> 00:03:20,960 in our data then what we 97 00:03:21,060 --> 00:03:22,150 can say is that any two 98 00:03:22,480 --> 00:03:23,500 tuples that have the same 99 00:03:23,980 --> 00:03:26,100 priority are guaranteed to have the same GPA. 100 00:03:26,820 --> 00:03:28,000 And let's formalize that concept. 101 00:03:28,320 --> 00:03:29,350 So I'm going to write 102 00:03:29,420 --> 00:03:31,650 a little logical statement here to formalize the concept. 103 00:03:32,200 --> 00:03:33,030 I'm going to use the for all 104 00:03:33,240 --> 00:03:35,020 symbol from predicate logic and 105 00:03:35,240 --> 00:03:37,190 I'm going to say if we have any pair of tuples. 106 00:03:37,610 --> 00:03:38,820 So for all T or U, 107 00:03:39,040 --> 00:03:40,400 those are tuples in the 108 00:03:40,490 --> 00:03:43,580 student relation, then if 109 00:03:43,770 --> 00:03:44,760 the student, if the T 110 00:03:44,980 --> 00:03:46,370 and U have the same priorities 111 00:03:47,620 --> 00:03:48,860 I'm sorry the same, let me 112 00:03:49,050 --> 00:03:51,300 fix that, they have the the same GPA. 113 00:03:52,440 --> 00:03:54,750 So if T.GPA equals U.GPA, 114 00:03:56,420 --> 00:03:57,240 then and this is the 115 00:03:57,350 --> 00:03:59,230 logical implication symbol, then 116 00:03:59,450 --> 00:04:01,930 T.priority will equal U.priority. 117 00:04:03,910 --> 00:04:05,840 So this logical statement is in 118 00:04:05,940 --> 00:04:07,030 fact a definition of a functional 119 00:04:07,240 --> 00:04:08,390 dependency, and we would 120 00:04:08,520 --> 00:04:10,250 write that functional dependency as 121 00:04:10,640 --> 00:04:12,960 GPA arrow priority, so 122 00:04:13,100 --> 00:04:14,690 that says the GPA determines 123 00:04:15,320 --> 00:04:17,090 the priority, any 2 tuples 124 00:04:17,390 --> 00:04:18,950 with the same GPA must have the same priority. 125 00:04:19,990 --> 00:04:21,240 So that was a specific example. 126 00:04:22,230 --> 00:04:23,420 Now let's generalize our definition. 127 00:04:24,120 --> 00:04:25,530 So, let me replace GPA 128 00:04:26,130 --> 00:04:27,710 and priority here with just 129 00:04:27,990 --> 00:04:29,920 two attributes A and B 130 00:04:29,940 --> 00:04:31,040 E of say a relation 131 00:04:31,380 --> 00:04:33,910 R. And then we'll also need to modify our definition. 132 00:04:35,000 --> 00:04:37,890 So you can see, I've erased the specific attributes and relation. 133 00:04:38,700 --> 00:04:39,760 And I'll just say for every. 134 00:04:39,960 --> 00:04:41,170 T and U in our 135 00:04:41,320 --> 00:04:43,400 relation, R. If T 136 00:04:43,790 --> 00:04:45,230 dot A equals U 137 00:04:45,450 --> 00:04:46,820 dot A then T dot 138 00:04:47,140 --> 00:04:48,260 B equals U dot B 139 00:04:48,520 --> 00:04:50,320 and that's the definition of 140 00:04:50,700 --> 00:04:51,990 the functional dependency A determines B 141 00:04:52,350 --> 00:04:54,100 for a relation R. Actually 142 00:04:54,740 --> 00:04:56,100 I'm gonna generalize this definition even further 143 00:04:56,450 --> 00:04:57,650 because functional dependencies don't 144 00:04:57,780 --> 00:04:58,550 always have to have one 145 00:04:58,920 --> 00:05:01,050 attribute on each side, they can actually have a set of attributes. 146 00:05:02,060 --> 00:05:04,030 So now I write A1 A2 147 00:05:04,850 --> 00:05:06,160 dot dot dot AN on 148 00:05:06,290 --> 00:05:07,300 the left hand side, these will 149 00:05:07,490 --> 00:05:08,570 all be attributes of relation 150 00:05:08,960 --> 00:05:09,820 R. And on the right 151 00:05:10,070 --> 00:05:11,570 hand side, B1 B2 comma 152 00:05:11,940 --> 00:05:14,450 BM, again, attributes 153 00:05:14,590 --> 00:05:16,460 of R. Modifying the formal 154 00:05:16,760 --> 00:05:18,020 definition in red, now I 155 00:05:18,190 --> 00:05:19,640 can't use the dot notation anymore, 156 00:05:20,000 --> 00:05:21,510 so I'll use a square bracket and 157 00:05:21,630 --> 00:05:23,430 I'll write A1 through An 158 00:05:24,160 --> 00:05:25,930 equals U square bracket A1 159 00:05:26,110 --> 00:05:28,610 through AN, so what 160 00:05:28,780 --> 00:05:29,800 I'm saying here in this 161 00:05:29,980 --> 00:05:30,900 case is that the two 162 00:05:31,360 --> 00:05:33,020 tuples T and U 163 00:05:33,380 --> 00:05:34,730 have the same values for 164 00:05:34,930 --> 00:05:36,160 all of the attributes A-1 through 165 00:05:36,310 --> 00:05:37,980 A-N, and if they do, 166 00:05:38,310 --> 00:05:39,340 then they will also though have 167 00:05:39,650 --> 00:05:40,920 the same values for B-1 through B-M. 168 00:05:40,940 --> 00:05:43,170 We'll be getting to 169 00:05:43,280 --> 00:05:44,760 some concrete examples soon. 170 00:05:45,800 --> 00:05:48,010 Just one last bit of notation before we move on. 171 00:05:48,320 --> 00:05:49,650 For simplicity I'm going 172 00:05:49,900 --> 00:05:50,740 to often in the video 173 00:05:51,150 --> 00:05:52,370 abbreviate a list of 174 00:05:52,450 --> 00:05:54,420 attributes or set of attributes by using a bar. 175 00:05:54,650 --> 00:05:55,970 So I'll write A bar to 176 00:05:56,230 --> 00:05:58,060 indicate a set of attributes A 177 00:05:58,640 --> 00:05:59,640 and B bar to indicate 178 00:06:00,030 --> 00:06:02,210 a set of attributes B. And again, this is just for convenience. 179 00:06:03,710 --> 00:06:06,560 So we've seen the motivation for a functional dependency in a relation. 180 00:06:06,870 --> 00:06:08,170 A functional dependency for a 181 00:06:08,390 --> 00:06:09,810 relation is based on knowledge of 182 00:06:10,000 --> 00:06:11,330 the real world data that's being captured. 183 00:06:12,240 --> 00:06:13,610 And when we specify one, just 184 00:06:13,800 --> 00:06:15,100 like specifying keys, all instances 185 00:06:15,900 --> 00:06:17,950 of the relation must adhere to the functional dependency. 186 00:06:19,020 --> 00:06:20,620 So just to summarize functional dependencies, 187 00:06:21,330 --> 00:06:23,820 we say that attribute, a 188 00:06:23,870 --> 00:06:25,740 set of attributes A functionally 189 00:06:26,120 --> 00:06:26,930 determines a set of attributes 190 00:06:27,650 --> 00:06:29,570 B, if again any 191 00:06:29,790 --> 00:06:30,850 time tuples agree in their 192 00:06:31,000 --> 00:06:32,920 A values, they also agree in their B values. 193 00:06:33,740 --> 00:06:34,510 And let's say that our relation 194 00:06:35,060 --> 00:06:36,260 here R has [xx] tuples 195 00:06:36,760 --> 00:06:37,740 A the tuples and B 196 00:06:38,120 --> 00:06:39,080 and also a few more 197 00:06:39,340 --> 00:06:40,450 attributes we'll call those C. 198 00:06:41,190 --> 00:06:42,210 So, let me draw a picture 199 00:06:42,460 --> 00:06:43,660 of a relation now here that 200 00:06:43,810 --> 00:06:45,200 has those attributes in it. 201 00:06:45,370 --> 00:06:46,760 So, we'll have here.. 202 00:06:47,790 --> 00:06:50,270 lets just three columns, but again these are multiple attributes. 203 00:06:51,180 --> 00:06:52,130 And these are the attributes 204 00:06:52,540 --> 00:06:53,700 A, these are the attributes B, 205 00:06:54,070 --> 00:06:55,370 and these are the attributes C 206 00:06:56,180 --> 00:06:56,930 And if we put in a couple 207 00:06:57,250 --> 00:06:58,860 of tuples, then what we'll 208 00:06:59,110 --> 00:07:00,530 say is, if we have 209 00:07:00,880 --> 00:07:02,500 two tuples here, that have 210 00:07:02,770 --> 00:07:04,820 and I'm gonna use a bar even for the values in the tuples. 211 00:07:05,620 --> 00:07:07,010 If we have two tuples whose A 212 00:07:07,080 --> 00:07:08,850 values are the same then their 213 00:07:09,050 --> 00:07:10,620 B values must also be the same. 214 00:07:10,900 --> 00:07:13,580 And we're going to be using this type of template for some reasoning later on. 215 00:07:14,190 --> 00:07:15,310 But we're not saying their C values 216 00:07:15,530 --> 00:07:16,520 have to be the same so we 217 00:07:16,680 --> 00:07:19,260 could have C1 and and different C values here as well. 218 00:07:19,920 --> 00:07:21,140 But again, if we specify this 219 00:07:21,370 --> 00:07:22,660 functional dependency, we are saying 220 00:07:22,920 --> 00:07:24,170 that every instance of our 221 00:07:24,280 --> 00:07:26,170 relation must satisfy the 222 00:07:26,250 --> 00:07:27,170 condition that if the A 223 00:07:27,310 --> 00:07:29,630 values are the same, then the B values are also the same. 224 00:07:30,970 --> 00:07:32,140 Finally, let's come back to our example. 225 00:07:32,690 --> 00:07:33,540 And I think when we start writing 226 00:07:33,860 --> 00:07:35,260 functional dependencies for our actual 227 00:07:35,510 --> 00:07:38,060 relations it'll give you a good idea of what they're really capturing. 228 00:07:38,750 --> 00:07:39,620 So let's write a few 229 00:07:39,910 --> 00:07:41,070 functional dependencies for our student 230 00:07:41,500 --> 00:07:42,630 relation based on what we 231 00:07:42,750 --> 00:07:43,730 expect to be true in 232 00:07:43,890 --> 00:07:45,990 the real world in the data that we're capturing in the relation. 233 00:07:46,910 --> 00:07:48,630 So here's a first example, Social 234 00:07:49,060 --> 00:07:52,920 Security number functionally determines S-name, the student's name. 235 00:07:53,600 --> 00:07:54,500 So what we say, we have 236 00:07:54,600 --> 00:07:56,280 multiple tuples about a 237 00:07:56,560 --> 00:07:57,300 particular student and they have 238 00:07:57,470 --> 00:07:59,450 the same social security number, say 239 00:07:59,750 --> 00:08:01,340 two tuples about student 240 00:08:02,100 --> 00:08:03,600 123, we're expecting them to 241 00:08:03,660 --> 00:08:04,520 have the same name in fact, 242 00:08:04,700 --> 00:08:06,300 we're requiring them to have the same name. 243 00:08:06,850 --> 00:08:08,030 And presumably because 1 to 244 00:08:08,170 --> 00:08:09,600 3 is sort of identifying the 245 00:08:09,660 --> 00:08:10,760 student that would be 246 00:08:10,910 --> 00:08:12,330 a natural functional dependency that 247 00:08:12,460 --> 00:08:14,250 would hold in this case and 248 00:08:14,380 --> 00:08:15,770 similarly we would expect social security 249 00:08:16,370 --> 00:08:17,730 number to determine address, although 250 00:08:17,990 --> 00:08:19,270 we're already making an assumption 251 00:08:19,710 --> 00:08:21,110 about the real world here, if 252 00:08:21,320 --> 00:08:22,690 we have this particular functional dependency, 253 00:08:23,550 --> 00:08:24,890 then we're saying a student doesn't move. 254 00:08:25,210 --> 00:08:26,110 They don't have multiple addresses. 255 00:08:26,730 --> 00:08:28,480 Every tuple that describes that 256 00:08:28,720 --> 00:08:29,780 student by their social security 257 00:08:30,180 --> 00:08:31,350 number will have the same address. 258 00:08:32,870 --> 00:08:35,520 Let's go to the high school and see what might be going on there. 259 00:08:35,780 --> 00:08:36,690 So I mentioned that the 260 00:08:36,860 --> 00:08:38,250 high school code, what I'm 261 00:08:38,650 --> 00:08:39,470 trying to capture there is 262 00:08:39,570 --> 00:08:40,760 a unique code for each 263 00:08:40,980 --> 00:08:41,930 high school that might be filled 264 00:08:42,140 --> 00:08:44,050 in college application then we 265 00:08:44,180 --> 00:08:45,250 would expect the high school code 266 00:08:45,580 --> 00:08:46,760 to determine the high school name. 267 00:08:47,870 --> 00:08:48,920 Every time we have the particular 268 00:08:49,380 --> 00:08:50,440 high school code, maybe for different 269 00:08:50,720 --> 00:08:51,520 students, it would have the same 270 00:08:51,880 --> 00:08:53,910 name and also it would have the same city. 271 00:08:54,310 --> 00:08:55,870 So that's an example of a 272 00:08:56,030 --> 00:08:58,530 functional dependency with two attributes on the right hand side. 273 00:08:59,700 --> 00:09:00,710 Now, let's look at one 274 00:09:00,930 --> 00:09:02,550 that's a little more complicated, which 275 00:09:02,880 --> 00:09:04,120 is one that has two attributes 276 00:09:04,510 --> 00:09:05,650 on the left hand side instead. 277 00:09:06,130 --> 00:09:07,750 That actually turns out to be a more interesting case. 278 00:09:08,700 --> 00:09:09,850 In fact, in this particular case, 279 00:09:10,160 --> 00:09:11,770 we can probably reverse the arrow 280 00:09:12,090 --> 00:09:13,790 and have a functional dependency in the other direction. 281 00:09:14,900 --> 00:09:16,310 If we have a combination of 282 00:09:16,370 --> 00:09:17,430 high school name and high school 283 00:09:17,750 --> 00:09:19,010 city, I'm going to 284 00:09:19,110 --> 00:09:20,260 assume that's unique, that there aren't, 285 00:09:20,930 --> 00:09:22,080 there's never two high 286 00:09:22,260 --> 00:09:23,890 schools with the same name in the same city. 287 00:09:24,410 --> 00:09:25,410 And if that's the case, 288 00:09:25,890 --> 00:09:26,920 if that's unique, then we would 289 00:09:27,020 --> 00:09:29,140 expect a functional dependency to the high school code. 290 00:09:29,790 --> 00:09:30,810 Any time we have the same 291 00:09:31,110 --> 00:09:32,340 name and city, we're talking about 292 00:09:32,530 --> 00:09:34,340 the same high school so we should have the same code. 293 00:09:35,600 --> 00:09:36,800 What other examples do we have? 294 00:09:37,560 --> 00:09:38,520 If we assume that there's one 295 00:09:38,880 --> 00:09:40,310 GPA for each student then 296 00:09:40,410 --> 00:09:41,680 we'd have the social security number 297 00:09:41,930 --> 00:09:44,010 determines the GPA and we 298 00:09:44,200 --> 00:09:45,570 already talked about GPA determines 299 00:09:46,060 --> 00:09:48,390 priority and another 300 00:09:48,690 --> 00:09:50,030 example, actually if we 301 00:09:50,130 --> 00:09:51,300 put these two together we should 302 00:09:51,460 --> 00:09:52,300 see well if we have the 303 00:09:52,370 --> 00:09:55,110 same social security number twice we should have the same priority. 304 00:09:56,140 --> 00:09:56,870 And you may be thinking 305 00:09:56,970 --> 00:09:57,770 well, that's kind of a 306 00:09:58,100 --> 00:10:00,170 transitive rule if it takes these two and produces that one. 307 00:10:00,370 --> 00:10:00,930 And indeed it is. 308 00:10:01,180 --> 00:10:03,190 And we'll talk about rules for functional dependencies later. 309 00:10:03,470 --> 00:10:05,430 And there may be more in this case. 310 00:10:06,890 --> 00:10:09,440 Now let's take a look at functional dependencies for our apply relation. 311 00:10:10,270 --> 00:10:11,270 Actually, this one is a 312 00:10:11,430 --> 00:10:12,740 little trickier, it's even possible 313 00:10:13,230 --> 00:10:14,970 there are no functional dependencies at all. 314 00:10:15,550 --> 00:10:18,450 It really depends on the real world data, the real world constraints. 315 00:10:19,610 --> 00:10:21,260 One possibility for example is 316 00:10:21,440 --> 00:10:22,620 that every college has a 317 00:10:22,650 --> 00:10:25,190 particular single date on which it receives its application. 318 00:10:25,940 --> 00:10:27,050 So if that were the case, then 319 00:10:27,180 --> 00:10:29,730 we'd have the college name determines the date. 320 00:10:30,150 --> 00:10:31,550 In other words, every application 321 00:10:32,190 --> 00:10:34,450 for a particular college must have the same date. 322 00:10:35,460 --> 00:10:36,820 Another constraint might be that 323 00:10:36,990 --> 00:10:38,270 students are only allowed to 324 00:10:38,420 --> 00:10:41,280 apply to a single major at each college they apply to. 325 00:10:42,020 --> 00:10:43,530 So if that were the case, this 326 00:10:43,690 --> 00:10:44,760 is another one with two attributes 327 00:10:45,150 --> 00:10:46,640 on the left hand sid, we'd 328 00:10:46,860 --> 00:10:48,210 say that the social security number, 329 00:10:48,520 --> 00:10:51,110 together with the college, implies the major. 330 00:10:51,680 --> 00:10:52,770 In other words, we cannot have 331 00:10:53,190 --> 00:10:54,550 a student and college combination 332 00:10:55,610 --> 00:10:57,810 with two different majors and that captured that constraint. 333 00:10:58,940 --> 00:10:59,950 Maybe we have a constraint 334 00:11:00,190 --> 00:11:03,580 that students are only allowed to apply to colleges in one state. 335 00:11:04,030 --> 00:11:05,410 That seems rather unlikely, but I 336 00:11:05,500 --> 00:11:06,760 was struggling to find functional 337 00:11:07,170 --> 00:11:08,270 dependencies for this case. 338 00:11:08,950 --> 00:11:10,220 In that case, we'd have this 339 00:11:10,420 --> 00:11:11,750 function dependency, again, saying a 340 00:11:11,950 --> 00:11:13,820 student could only apply to colleges in a single state. 341 00:11:14,780 --> 00:11:16,630 For the apply relation specifically, again, 342 00:11:16,990 --> 00:11:18,310 it's really the real world constraints 343 00:11:18,910 --> 00:11:21,120 that drive which functional dependencies hold for the relation. 344 00:11:21,850 --> 00:11:23,050 But it's important to understand those 345 00:11:23,210 --> 00:11:24,110 constraints so they can be 346 00:11:24,220 --> 00:11:26,080 translated to functional dependencies, which 347 00:11:26,270 --> 00:11:28,420 then can drive a good relational design. 348 00:11:29,200 --> 00:11:30,230 So I've alluded a few 349 00:11:30,330 --> 00:11:31,290 times to the fact that functional 350 00:11:31,680 --> 00:11:33,520 dependencies generalize the notion of keys. 351 00:11:33,840 --> 00:11:35,580 And let's make that connection explicit now. 352 00:11:36,230 --> 00:11:37,030 Let's suppose we have a relation 353 00:11:37,550 --> 00:11:39,440 R and R has no duplicate tuples. 354 00:11:40,240 --> 00:11:42,680 R has some attributes A and it has some other attributes. 355 00:11:43,170 --> 00:11:45,120 Let's call those B. And let's 356 00:11:45,380 --> 00:11:46,830 suppose that we have a functional dependency that 357 00:11:47,090 --> 00:11:49,060 A determines all of the attributes in the relation. 358 00:11:50,190 --> 00:11:51,290 Now let me draw a picture of that. 359 00:11:52,320 --> 00:11:53,570 So here's our relation R with 360 00:11:54,030 --> 00:11:55,680 attributes A and attributes B. 361 00:11:56,050 --> 00:11:57,500 And now let's suppose that 362 00:11:57,680 --> 00:11:58,990 we have two tuples with the 363 00:11:59,080 --> 00:12:00,550 same values for A. Now 364 00:12:00,740 --> 00:12:02,330 we'll just write those as little a bar. 365 00:12:03,000 --> 00:12:04,880 And now let's see what happens with the B values. 366 00:12:05,310 --> 00:12:07,740 We'll make them B1 bar and B2 bar. 367 00:12:08,780 --> 00:12:09,760 Because we have the functional 368 00:12:10,160 --> 00:12:12,060 dependency, the equal values 369 00:12:12,460 --> 00:12:15,130 here for A say that B1 and B2 have to be equal. 370 00:12:15,410 --> 00:12:16,540 So B1 equals B2. 371 00:12:16,950 --> 00:12:18,610 Let's just erase the little one and two. 372 00:12:19,350 --> 00:12:21,080 But now we've generated duplicate tuples. 373 00:12:21,230 --> 00:12:22,170 So what the functional 374 00:12:22,750 --> 00:12:23,780 dependency tells us in 375 00:12:24,000 --> 00:12:24,970 that case is that these two tuples 376 00:12:25,340 --> 00:12:26,560 are actually the same, or rather 377 00:12:26,750 --> 00:12:28,030 we cannot have two separate 378 00:12:28,440 --> 00:12:29,800 tuples with the same A values. 379 00:12:30,680 --> 00:12:31,830 So we cannot have two separate 380 00:12:32,370 --> 00:12:33,310 tuples with the same A values 381 00:12:33,770 --> 00:12:34,930 is exactly what it means 382 00:12:35,350 --> 00:12:37,520 for A to be a key. 383 00:12:38,830 --> 00:12:39,490 Now again, this is only the case when we have 384 00:12:39,730 --> 00:12:41,260 no duplicates in R. But 385 00:12:41,480 --> 00:12:42,790 if we have no duplicates, if 386 00:12:43,090 --> 00:12:44,320 a set of attributes determines 387 00:12:44,930 --> 00:12:48,380 all the other attributes, then those attributes are a key. 388 00:12:49,650 --> 00:12:52,000 So here are a few other definitions related to functional dependencies. 389 00:12:52,980 --> 00:12:54,970 We have a notion of a trivial functional dependency. 390 00:12:55,970 --> 00:12:56,890 A functional dependency is trivial 391 00:12:57,540 --> 00:12:58,930 A to B if B 392 00:12:59,380 --> 00:13:00,830 is a subset of A. And 393 00:13:00,950 --> 00:13:02,790 let's draw a little picture of what that means. 394 00:13:03,820 --> 00:13:04,890 So here we have our attributes 395 00:13:05,390 --> 00:13:06,600 A, and then we're saying 396 00:13:06,980 --> 00:13:07,960 and that's all of the attributes 397 00:13:08,410 --> 00:13:09,520 here, and what we're saying is 398 00:13:09,650 --> 00:13:10,760 that B is a subset 399 00:13:11,420 --> 00:13:12,450 of A. So in other words, 400 00:13:12,840 --> 00:13:14,080 some portion of these attributes 401 00:13:14,670 --> 00:13:15,720 here, we'll just mark that in 402 00:13:15,840 --> 00:13:17,260 purple here are attributes 403 00:13:17,780 --> 00:13:18,930 B. Well, it's pretty obvious 404 00:13:19,550 --> 00:13:20,620 that if two tuples have 405 00:13:20,800 --> 00:13:22,440 the same values across their 406 00:13:22,670 --> 00:13:24,040 entire expanse here for A's, 407 00:13:24,690 --> 00:13:25,880 then obviously they're also going 408 00:13:26,100 --> 00:13:27,250 to have the same values for 409 00:13:27,790 --> 00:13:29,330 just this portion here, the B portion. 410 00:13:29,830 --> 00:13:31,650 So that's why it's called the trivial functional dependency. 411 00:13:33,170 --> 00:13:34,400 So a nontrivial functional dependency 412 00:13:35,140 --> 00:13:37,280 is a functional dependency that's not a trivial one. 413 00:13:37,740 --> 00:13:40,120 By the way, FD is a common abbreviation for functional dependency. 414 00:13:41,200 --> 00:13:42,280 So if we have A determines 415 00:13:42,810 --> 00:13:43,950 B, then that is 416 00:13:44,050 --> 00:13:45,300 non-trivial if it's not 417 00:13:45,640 --> 00:13:46,520 the case that B is a 418 00:13:46,780 --> 00:13:48,550 subset of A. Going to 419 00:13:48,720 --> 00:13:50,280 our picture, let's have here 420 00:13:50,660 --> 00:13:52,200 our attributes A. And now 421 00:13:52,540 --> 00:13:53,540 we're saying there are some 422 00:13:53,860 --> 00:13:55,220 attributes in B that are 423 00:13:55,320 --> 00:13:56,410 not part of A. So 424 00:13:56,660 --> 00:13:57,710 we can say, well maybe B 425 00:13:57,920 --> 00:13:59,050 is partially part of A, 426 00:13:59,550 --> 00:14:00,760 but there's some portion that is 427 00:14:00,920 --> 00:14:02,050 not part of A. So 428 00:14:02,200 --> 00:14:04,070 let's say that these are our B attributes here. 429 00:14:04,260 --> 00:14:05,500 So now our functional dependency 430 00:14:06,240 --> 00:14:07,300 is actually saying something. 431 00:14:08,000 --> 00:14:08,920 It's saying we have two 432 00:14:09,130 --> 00:14:10,830 attributes that agree in these 433 00:14:11,130 --> 00:14:12,780 values, then they're also 434 00:14:13,180 --> 00:14:15,020 going to agree in these values over here. 435 00:14:15,690 --> 00:14:17,310 And the last definition is 436 00:14:17,550 --> 00:14:19,380 a completely nontrivial functional dependency, 437 00:14:20,220 --> 00:14:21,570 and that's A determines B 438 00:14:22,220 --> 00:14:24,670 where A and B have no intersection at all. 439 00:14:25,330 --> 00:14:26,610 So in that case, again, going 440 00:14:26,870 --> 00:14:28,490 back to our picture, we'll have 441 00:14:28,710 --> 00:14:30,490 our A attributes here and 442 00:14:30,540 --> 00:14:31,900 then our B attributes are going 443 00:14:32,150 --> 00:14:34,130 to be some portion of the remaining attributes. 444 00:14:34,910 --> 00:14:35,990 And here we're saying a lot. 445 00:14:36,500 --> 00:14:37,520 We're saying if these two have 446 00:14:37,740 --> 00:14:40,220 the same value, then these two have the same value as well. 447 00:14:40,930 --> 00:14:41,960 And the reality is that completely 448 00:14:42,530 --> 00:14:43,980 nontrivial functional dependencies are the 449 00:14:44,320 --> 00:14:45,650 ones that we're most interested in specifying. 450 00:14:47,400 --> 00:14:48,330 I mentioned that there are some 451 00:14:48,470 --> 00:14:49,460 rules that apply to all 452 00:14:49,670 --> 00:14:52,040 functional dependencies and I'll give a couple of those right now. 453 00:14:52,470 --> 00:14:53,820 The first one is the splitting rule. 454 00:14:54,460 --> 00:14:55,510 The splitting rules says that if 455 00:14:55,650 --> 00:14:56,490 we have a set of attributes 456 00:14:56,630 --> 00:14:58,060 that determine another set of 457 00:14:58,250 --> 00:14:59,200 attributes - and this time I'm 458 00:14:59,420 --> 00:15:00,460 going to write it out instead of using 459 00:15:00,760 --> 00:15:02,310 the bar notation - then 460 00:15:02,590 --> 00:15:05,320 we also have...this implies that 461 00:15:05,470 --> 00:15:07,610 we have A determines B-1 462 00:15:07,940 --> 00:15:10,470 and A determines B-2 and so on. 463 00:15:11,110 --> 00:15:13,970 In other words, we can split the right side of the functional dependency. 464 00:15:14,550 --> 00:15:15,530 And if you think about it, 465 00:15:15,630 --> 00:15:16,650 this is pretty If we say 466 00:15:17,260 --> 00:15:18,290 that the when the A value 467 00:15:18,670 --> 00:15:19,710 are the same all of the 468 00:15:19,830 --> 00:15:20,820 B values have to be the 469 00:15:20,960 --> 00:15:22,200 same then certainly when the 470 00:15:22,340 --> 00:15:24,890 A values are the same the B values have to be the same independently. 471 00:15:25,720 --> 00:15:28,580 Now you might wonder if the splitting rule also goes the other way. 472 00:15:28,900 --> 00:15:30,020 So let's say we have, I'll 473 00:15:30,280 --> 00:15:31,420 put the left hand side, I'll 474 00:15:31,620 --> 00:15:33,270 write out the attributes explicitly so 475 00:15:33,620 --> 00:15:35,070 let's say we have A-1 476 00:15:35,290 --> 00:15:37,100 through A-N determines B then is 477 00:15:37,250 --> 00:15:38,420 it from that the case 478 00:15:38,860 --> 00:15:41,070 that A-1 determines B and 479 00:15:41,220 --> 00:15:44,020 A-2 determines B on its own, and so on. 480 00:15:44,630 --> 00:15:45,670 Well, the answer to that is no. 481 00:15:46,260 --> 00:15:49,340 And I'll give a simple example from our college application database. 482 00:15:50,330 --> 00:15:51,360 So let's say that we 483 00:15:51,490 --> 00:15:53,540 have the functional dependency high school 484 00:15:53,830 --> 00:15:55,560 name and high school 485 00:15:55,940 --> 00:15:57,760 city determines high school code. 486 00:15:58,120 --> 00:15:59,230 We talked about that one before. 487 00:16:00,420 --> 00:16:01,740 Oops, here, that's an arrow there. 488 00:16:02,280 --> 00:16:03,510 So that says that when we 489 00:16:03,670 --> 00:16:04,940 have a particular name and 490 00:16:05,120 --> 00:16:06,320 city combination for a high 491 00:16:06,490 --> 00:16:08,000 school, that's identifying a single 492 00:16:08,350 --> 00:16:10,860 high school, and so it will always have the same code. 493 00:16:11,450 --> 00:16:12,480 So that makes a lot of sense 494 00:16:13,030 --> 00:16:14,070 but it is not the case, 495 00:16:14,600 --> 00:16:15,610 I'll put a big X here, 496 00:16:16,210 --> 00:16:17,550 necessarily, that if we 497 00:16:17,680 --> 00:16:18,780 split the left hand side 498 00:16:19,400 --> 00:16:21,940 that high school name alone will determine a high school code. 499 00:16:22,630 --> 00:16:23,830 So for example, I would 500 00:16:24,060 --> 00:16:24,960 venture to guess that there 501 00:16:25,000 --> 00:16:26,380 are a lot of Jefferson High Schools 502 00:16:26,810 --> 00:16:28,130 all over the country and they 503 00:16:28,260 --> 00:16:29,750 won't all will be the same high school. 504 00:16:30,360 --> 00:16:31,780 So it's really the combination of attributes 505 00:16:32,090 --> 00:16:34,010 on the left that together functionally 506 00:16:34,440 --> 00:16:35,430 determine the right hand side 507 00:16:36,170 --> 00:16:37,340 and so we do not then have 508 00:16:37,630 --> 00:16:39,130 the splitting rule as a general principle. 509 00:16:39,510 --> 00:16:42,520 The combining rule is the inverse of the splitting rule. 510 00:16:42,890 --> 00:16:44,020 It says if we have A 511 00:16:44,220 --> 00:16:45,800 determines B-1 and we have 512 00:16:46,070 --> 00:16:47,070 A determines B-2 and so 513 00:16:47,900 --> 00:16:48,960 on up to A determines 514 00:16:49,340 --> 00:16:51,140 B-N, then we also 515 00:16:51,620 --> 00:16:54,640 have A determines B-1 through B-N together. 516 00:16:57,370 --> 00:16:59,240 Next, we have two trivial dependency rules. 517 00:16:59,620 --> 00:17:01,530 Let me remind remind you what a trivial dependency is. 518 00:17:01,810 --> 00:17:03,450 It's A determines B where 519 00:17:03,620 --> 00:17:04,810 B is a subset of A. 520 00:17:05,340 --> 00:17:06,730 So in other words, every left hand 521 00:17:06,940 --> 00:17:08,430 side determines itself or any 522 00:17:08,870 --> 00:17:10,040 subset of itself, and that's 523 00:17:10,220 --> 00:17:11,990 what drives the two trivial dependency rules. 524 00:17:12,680 --> 00:17:13,700 One of them says that if 525 00:17:13,960 --> 00:17:15,250 we have A determines B, 526 00:17:16,020 --> 00:17:17,950 then we also have A determines 527 00:17:18,700 --> 00:17:19,810 A union B, so in 528 00:17:19,890 --> 00:17:21,030 other words we can add to 529 00:17:21,190 --> 00:17:22,030 the right hand side of every 530 00:17:22,290 --> 00:17:24,210 dependency what's already on the left hand side. 531 00:17:24,700 --> 00:17:25,480 Sort of as a converse, 532 00:17:26,160 --> 00:17:27,200 we can also say that if 533 00:17:27,440 --> 00:17:29,930 A determines B then A 534 00:17:30,260 --> 00:17:32,340 determines A intersect B. 535 00:17:33,380 --> 00:17:34,790 Actually this one is also 536 00:17:35,340 --> 00:17:36,440 implied by the splitting rule. 537 00:17:36,630 --> 00:17:39,350 So we have two different rules in this case that are doing the same thing. 538 00:17:40,230 --> 00:17:42,910 And finally the transitive rule, which is the one we alluded to earlier. 539 00:17:43,560 --> 00:17:44,640 It says if we have A 540 00:17:44,780 --> 00:17:46,070 determines B and and 541 00:17:46,170 --> 00:17:47,990 we have B determines C, then 542 00:17:48,250 --> 00:17:49,560 we have A determines C. 543 00:17:50,540 --> 00:17:51,500 Now all of these rules 544 00:17:51,820 --> 00:17:53,100 can actually be proven formally 545 00:17:53,670 --> 00:17:54,400 and I'm going to go through 546 00:17:54,740 --> 00:17:55,950 a sketch of this particular one. 547 00:17:56,910 --> 00:17:58,950 So here's my relation R and 548 00:17:59,060 --> 00:18:00,650 I'll have attributes A, B, 549 00:18:01,070 --> 00:18:03,440 C and then let's let D be the left of our attributes. 550 00:18:04,360 --> 00:18:05,630 And my goal is to prove 551 00:18:06,190 --> 00:18:08,140 that A determines C. And 552 00:18:08,290 --> 00:18:09,280 the information I have to 553 00:18:09,410 --> 00:18:11,460 prove that is these two functional dependencies here. 554 00:18:12,520 --> 00:18:13,390 So to prove that A determines 555 00:18:14,020 --> 00:18:15,030 C, I have to 556 00:18:15,180 --> 00:18:16,320 prove that if two tupples 557 00:18:16,810 --> 00:18:18,570 have the same A values--we'll put 558 00:18:18,780 --> 00:18:21,490 little bars there--then they also have the same C values. 559 00:18:22,330 --> 00:18:25,060 So I need to show that these two C values here are going to be the same. 560 00:18:25,530 --> 00:18:27,040 So you can see what's going to happen. 561 00:18:27,520 --> 00:18:28,900 Using the first functional dependency, 562 00:18:29,570 --> 00:18:30,740 because these two A values 563 00:18:31,090 --> 00:18:33,530 are the same, I know their B values must be the same. 564 00:18:34,250 --> 00:18:35,550 And then I just use the second 565 00:18:35,890 --> 00:18:37,320 functional dependency and because 566 00:18:37,640 --> 00:18:38,660 the two B values are the 567 00:18:38,760 --> 00:18:39,810 same, I then know that 568 00:18:39,980 --> 00:18:40,920 the two C values are the 569 00:18:41,020 --> 00:18:42,020 same, and that has 570 00:18:42,160 --> 00:18:43,990 shown that this functional dependency holds. 571 00:18:44,290 --> 00:18:45,420 And you can do a 572 00:18:45,460 --> 00:18:47,550 similar thing to prove the other rules to yourself. 573 00:18:48,770 --> 00:18:51,200 Now, I'm going to introduce the concept of closure of attributes. 574 00:18:51,980 --> 00:18:52,940 Let's suppose we're given a 575 00:18:52,990 --> 00:18:54,330 relation, a set of 576 00:18:54,390 --> 00:18:55,860 functional dependencies for that 577 00:18:56,040 --> 00:18:57,230 relation, and then a set 578 00:18:57,530 --> 00:18:59,400 of attributes A bar that are part of that relation. 579 00:19:01,150 --> 00:19:02,570 I'm interested in finding all attributes 580 00:19:03,060 --> 00:19:04,620 B of the relation such 581 00:19:04,980 --> 00:19:06,370 that A bar functionally determines 582 00:19:07,010 --> 00:19:08,030 B. And this is what's 583 00:19:08,230 --> 00:19:09,530 called the closure, and I'll 584 00:19:09,660 --> 00:19:11,600 show in a bit why we might want to compute that. 585 00:19:12,570 --> 00:19:15,040 Notationally the closure is written using the + sign. 586 00:19:15,460 --> 00:19:17,800 So the closure of A bar is A bar +. 587 00:19:17,960 --> 00:19:19,010 Let me be a little 588 00:19:19,130 --> 00:19:20,010 more explicit, let me write 589 00:19:20,430 --> 00:19:21,640 out A bar, because remember 590 00:19:22,000 --> 00:19:24,570 whenever we write bar, we're actually talking about a list of attributes. 591 00:19:25,550 --> 00:19:26,420 So we're going to write it 592 00:19:26,520 --> 00:19:27,200 as A-1 through A-N, and I'm interested 593 00:19:28,560 --> 00:19:30,670 in computing the closure of that set of attributes. 594 00:19:31,490 --> 00:19:32,820 In other words, the entire set 595 00:19:33,190 --> 00:19:34,150 of attributes that are functionally 596 00:19:34,690 --> 00:19:36,720 determined by the attributes A-1 through A-N. 597 00:19:36,760 --> 00:19:39,470 And I'm going to give an algorithm for doing that. 598 00:19:40,340 --> 00:19:42,520 My algorithm says start with the set itself. 599 00:19:43,100 --> 00:19:44,190 So I'm going to start with 600 00:19:45,100 --> 00:19:46,510 A1 through AN, except I'm not going 601 00:19:48,310 --> 00:19:49,110 to close that. 602 00:19:49,300 --> 00:19:50,470 I'm going to leave a little space there. 603 00:19:50,720 --> 00:19:53,660 And then I'm going to repeat until there's no change. 604 00:19:54,730 --> 00:19:56,470 I'm going to add attributes to 605 00:19:56,690 --> 00:19:58,430 that set until I get to the closure. 606 00:19:59,250 --> 00:19:59,910 So I'm going to repeat. 607 00:20:00,920 --> 00:20:02,700 If A determines B 608 00:20:03,180 --> 00:20:03,990 - and now we'll put bars 609 00:20:04,430 --> 00:20:06,290 in here - and all 610 00:20:06,590 --> 00:20:07,830 of A are in the set, 611 00:20:09,010 --> 00:20:10,330 then add B to the set. 612 00:20:12,130 --> 00:20:13,280 So I might have my 613 00:20:13,670 --> 00:20:15,440 attributes here, A1 through AN, 614 00:20:15,550 --> 00:20:16,570 and it might be the case that 615 00:20:17,370 --> 00:20:18,710 A, you know, 4 determines 616 00:20:19,040 --> 00:20:20,250 attributes C and D, 617 00:20:20,630 --> 00:20:22,100 so I'll add C and D to the set. 618 00:20:22,650 --> 00:20:22,710 I repeat. 619 00:20:23,180 --> 00:20:24,090 Maybe there's a C goes 620 00:20:24,440 --> 00:20:26,350 to E and I'll add E to the set, and so on. 621 00:20:26,550 --> 00:20:29,460 And when there's no more change, then I've computed the complete closure. 622 00:20:30,640 --> 00:20:31,520 Now if you happen to be someone 623 00:20:31,830 --> 00:20:32,780 who likes to think in terms of 624 00:20:32,910 --> 00:20:34,330 rules instead, what we're effectively 625 00:20:34,950 --> 00:20:36,080 doing is applying the combining 626 00:20:36,810 --> 00:20:38,440 and transitive rules to the 627 00:20:38,720 --> 00:20:40,320 attributes in the set until there's no change. 628 00:20:41,500 --> 00:20:43,270 So let's run an example of the closure algorithm. 629 00:20:43,930 --> 00:20:44,890 Let's go to our complete student 630 00:20:45,290 --> 00:20:47,390 table and let's add three functional dependencies. 631 00:20:48,350 --> 00:20:49,790 One that says that student's social 632 00:20:50,050 --> 00:20:51,410 security number determines their name, 633 00:20:51,760 --> 00:20:53,000 address and GPA and that 634 00:20:53,050 --> 00:20:55,400 would be a normal... GPA determines 635 00:20:55,810 --> 00:20:56,850 priority, and high school 636 00:20:57,100 --> 00:20:58,920 code determines high school name and high school city. 637 00:20:59,360 --> 00:21:00,490 Again, these are all examples we 638 00:21:00,580 --> 00:21:01,730 gave earlier that would be 639 00:21:01,850 --> 00:21:04,010 natural functional dependencies for this particular relation. 640 00:21:05,470 --> 00:21:06,520 Let's suppose that we're interested 641 00:21:07,050 --> 00:21:08,450 in computing the closure of 642 00:21:08,650 --> 00:21:09,920 the two attributes - social security 643 00:21:10,350 --> 00:21:11,560 number and high school code. 644 00:21:12,540 --> 00:21:13,480 So in other words, we want 645 00:21:13,760 --> 00:21:15,500 to find all attributes in 646 00:21:15,640 --> 00:21:16,820 the student relation that are functionally 647 00:21:17,430 --> 00:21:19,560 determined by these two attributes. 648 00:21:20,530 --> 00:21:22,270 So the algorithm says start with 649 00:21:22,400 --> 00:21:24,080 the two attributes themselves, social security 650 00:21:24,520 --> 00:21:25,480 number and high school code, 651 00:21:26,250 --> 00:21:27,650 and then add attributes that 652 00:21:27,770 --> 00:21:29,490 are functionally determined by ones in the set. 653 00:21:30,290 --> 00:21:31,080 So if we start with our 654 00:21:31,320 --> 00:21:33,080 first functional dependency here, because 655 00:21:33,400 --> 00:21:34,710 we have social security number, that 656 00:21:34,890 --> 00:21:35,790 allows us to add the 657 00:21:35,850 --> 00:21:37,710 student name, the address, 658 00:21:40,120 --> 00:21:44,240 the GPA... and that's it for that one. 659 00:21:44,500 --> 00:21:45,520 Now let's repeat again. 660 00:21:46,500 --> 00:21:47,990 Because we have the GPA, our 661 00:21:48,120 --> 00:21:49,760 second functional dependency tells 662 00:21:49,980 --> 00:21:53,560 us we can add the priorityAnd that's it for that one. 663 00:21:54,310 --> 00:21:55,340 And then since we have the 664 00:21:55,420 --> 00:21:56,390 high school code in this set, 665 00:21:56,710 --> 00:21:57,790 our third one tells us 666 00:21:57,920 --> 00:21:58,830 that we can add the high school 667 00:21:59,140 --> 00:22:00,050 name and the high school 668 00:22:00,350 --> 00:22:01,680 city and at that 669 00:22:01,920 --> 00:22:03,030 point we certainly know we're done 670 00:22:03,270 --> 00:22:05,800 because we've actually got the entire relation at this point. 671 00:22:06,800 --> 00:22:08,910 Now I didn't pick this particular example at random. 672 00:22:09,960 --> 00:22:11,510 We took two attributes, social security 673 00:22:11,910 --> 00:22:13,050 number and high school code, we 674 00:22:13,140 --> 00:22:14,580 computed their closure, and we 675 00:22:14,780 --> 00:22:16,440 discovered that they together functionally 676 00:22:16,950 --> 00:22:19,430 determine all attributes of the student relation. 677 00:22:20,400 --> 00:22:21,460 Now remember just a few slides 678 00:22:21,750 --> 00:22:23,150 ago, when a set 679 00:22:23,280 --> 00:22:24,640 of attributes functionally determine all the 680 00:22:24,920 --> 00:22:27,790 attributes, then those attributes together form a key for the relation. 681 00:22:28,420 --> 00:22:29,390 And in fact if you think 682 00:22:29,600 --> 00:22:31,100 about it, social security number and 683 00:22:31,210 --> 00:22:33,770 high school code together are a natural key for the relation. 684 00:22:34,350 --> 00:22:35,410 A student might go to 685 00:22:35,480 --> 00:22:36,870 multiple high schools, but there's 686 00:22:37,070 --> 00:22:38,160 no reason to have more 687 00:22:38,560 --> 00:22:39,990 than one tupple with the 688 00:22:40,080 --> 00:22:40,940 combination of a particular 689 00:22:41,430 --> 00:22:42,730 student and the high school they attended. 690 00:22:44,100 --> 00:22:45,250 So let's formalize a bit this 691 00:22:45,420 --> 00:22:47,530 relationship between the notion of closure and keys. 692 00:22:48,290 --> 00:22:49,570 Let's suppose we're interested in 693 00:22:49,790 --> 00:22:51,270 answering the question... is a 694 00:22:51,340 --> 00:22:52,610 particular set of attributes 695 00:22:52,710 --> 00:22:55,610 a key for a relation we can use closure to do that. 696 00:22:55,850 --> 00:22:56,690 Remember, we have the relation, 697 00:22:57,300 --> 00:22:58,120 and we have a set of 698 00:22:58,200 --> 00:22:59,480 functional dependencies for the relation, 699 00:23:00,440 --> 00:23:01,610 so what we do is we 700 00:23:01,790 --> 00:23:04,090 compute the closure of 701 00:23:04,490 --> 00:23:05,510 A that's going to give 702 00:23:05,580 --> 00:23:07,150 us a set of attributes and 703 00:23:07,660 --> 00:23:09,840 if that equals all attributes, then 704 00:23:11,080 --> 00:23:11,430 A is the key. 705 00:23:12,630 --> 00:23:14,070 As a more general question, let's 706 00:23:14,260 --> 00:23:15,120 suppose we're given a set 707 00:23:15,270 --> 00:23:16,560 of functional dependencies for a 708 00:23:16,840 --> 00:23:17,850 relation, how can we use 709 00:23:18,090 --> 00:23:19,680 it to find all of the keys for the relation? 710 00:23:20,350 --> 00:23:21,600 So use the same basic idea 711 00:23:22,150 --> 00:23:23,470 of computing closure, but this 712 00:23:23,720 --> 00:23:24,490 time we have to do it for 713 00:23:24,930 --> 00:23:26,370 every subset of the attributes. 714 00:23:27,690 --> 00:23:29,440 So let's call each subset A 715 00:23:30,010 --> 00:23:31,270 bar and we just 716 00:23:31,500 --> 00:23:34,290 check if the closure of A bar determines all attributes. 717 00:23:35,010 --> 00:23:36,410 And if it does, then it's a key. 718 00:23:37,220 --> 00:23:39,100 And by considering every subset of 719 00:23:39,180 --> 00:23:40,210 the attributes in R, then 720 00:23:40,370 --> 00:23:41,970 we're considering every possible key 721 00:23:42,210 --> 00:23:43,480 and we'll just check for 722 00:23:43,680 --> 00:23:45,020 each one whether it's actually a key. 723 00:23:45,830 --> 00:23:47,090 Now that seems fairly inefficient 724 00:23:47,650 --> 00:23:49,560 and actually we can be a little more efficient than that. 725 00:23:50,030 --> 00:23:51,570 We can consider these subsets in 726 00:23:52,160 --> 00:23:53,490 increasing size order. 727 00:23:54,320 --> 00:23:55,720 So for example, if 728 00:23:55,920 --> 00:23:57,910 we determine that A and 729 00:23:58,150 --> 00:23:59,900 B together determine all 730 00:24:00,220 --> 00:24:02,130 attributes functionally determine all attributes in the relation. 731 00:24:03,020 --> 00:24:04,120 That tells us AB is 732 00:24:04,300 --> 00:24:05,820 a key, and that actually 733 00:24:06,060 --> 00:24:07,370 also tells us that every 734 00:24:07,900 --> 00:24:09,630 superset of AB is also a key. 735 00:24:09,880 --> 00:24:11,250 And if you think about it, that fills out naturally. 736 00:24:12,210 --> 00:24:13,600 So the real algorithm would 737 00:24:13,820 --> 00:24:16,510 say let's start with single attributes and determine if they are key. 738 00:24:16,940 --> 00:24:18,240 If a single attribute say 739 00:24:18,570 --> 00:24:19,530 C, is a key then so is 740 00:24:19,640 --> 00:24:20,940 every super set of C, and 741 00:24:21,190 --> 00:24:22,170 then we go to pairs of 742 00:24:22,370 --> 00:24:24,030 attributes and so on. 743 00:24:24,140 --> 00:24:25,080 Finally let's talk about how we 744 00:24:25,210 --> 00:24:27,740 specify the set of functional dependencies for a relation. 745 00:24:28,490 --> 00:24:29,740 First I'll define a notion of 746 00:24:29,840 --> 00:24:32,300 one set of functional dependencies following from another one. 747 00:24:32,910 --> 00:24:33,570 So let's suppose we have a 748 00:24:33,600 --> 00:24:34,740 relation R and we have 749 00:24:34,950 --> 00:24:35,980 two sets of functional 750 00:24:36,310 --> 00:24:38,380 dependencies that aren't identical, S-1 and S-2. 751 00:24:39,410 --> 00:24:41,410 We see that S2 follows from 752 00:24:41,590 --> 00:24:42,890 S1 if every relation instance 753 00:24:43,350 --> 00:24:45,760 satisfying S1 also satisfies S2. 754 00:24:47,140 --> 00:24:49,040 As a simple example, suppose S-2 755 00:24:50,020 --> 00:24:51,550 is social security number determines 756 00:24:52,040 --> 00:24:54,120 priority, and suppose S-1 757 00:24:55,260 --> 00:24:56,690 is the two functional dependencies 758 00:24:57,460 --> 00:24:59,050 - social security number determines GPA, 759 00:25:00,100 --> 00:25:01,150 and GPA determines priority. 760 00:25:02,350 --> 00:25:03,800 Then it's certainly the case that 761 00:25:04,160 --> 00:25:07,200 in an... for this example, S-2 follows from S-1. 762 00:25:08,040 --> 00:25:08,960 Every time we have a 763 00:25:09,000 --> 00:25:11,200 relation that satisfies social security 764 00:25:11,550 --> 00:25:13,000 number determines GPA and GPA 765 00:25:13,140 --> 00:25:15,050 determines priority, then that 766 00:25:15,230 --> 00:25:16,820 relation will also satisfy social 767 00:25:17,140 --> 00:25:18,550 security number determines priority and 768 00:25:18,630 --> 00:25:21,430 we kind of proved that actually in an earlier part of this video. 769 00:25:22,590 --> 00:25:23,390 So one question you might 770 00:25:23,640 --> 00:25:25,040 have is how do we test whether 771 00:25:25,200 --> 00:25:27,310 one set of functional dependencies follows from another. 772 00:25:28,150 --> 00:25:29,340 That really boils down to 773 00:25:29,540 --> 00:25:32,400 testing whether one functional dependency follows from a set. 774 00:25:32,810 --> 00:25:33,590 So and let me just make 775 00:25:33,770 --> 00:25:36,120 this A bar, B bar here to make clear they can be sets of attributes. 776 00:25:36,920 --> 00:25:37,790 Because if we have S-1 and 777 00:25:37,810 --> 00:25:39,740 S-2, then we just check whether 778 00:25:40,000 --> 00:25:41,720 every functional dependency in S-2 779 00:25:42,070 --> 00:25:43,680 follows from the functional dependencies in S-1. 780 00:25:44,700 --> 00:25:46,890 There's actually two ways of going about this test. 781 00:25:48,090 --> 00:25:49,750 One of the ways is to use the closure. 782 00:25:50,750 --> 00:25:52,970 We'll compute A+ based on 783 00:25:53,120 --> 00:25:54,600 the functional dependencies that are 784 00:25:54,760 --> 00:25:55,910 in S and then we'll 785 00:25:56,180 --> 00:25:59,000 check if B is in the set. 786 00:26:00,240 --> 00:26:01,830 Reminder - computing the 787 00:26:01,930 --> 00:26:03,650 closure tells us every attribute 788 00:26:04,020 --> 00:26:05,660 that's functionally determined by the 789 00:26:05,860 --> 00:26:07,080 attributes in A based on 790 00:26:07,310 --> 00:26:08,490 the functional dependencies in S. 791 00:26:09,230 --> 00:26:10,710 If those include B, then 792 00:26:11,250 --> 00:26:12,530 A determines B does indeed 793 00:26:13,050 --> 00:26:14,630 follow from S. The 794 00:26:14,690 --> 00:26:16,330 second oneway to check is based 795 00:26:16,600 --> 00:26:18,080 on a set of axioms, a 796 00:26:18,180 --> 00:26:19,700 set of rules called Armstrong's axioms. 797 00:26:20,570 --> 00:26:21,650 We saw some rules for functional 798 00:26:22,000 --> 00:26:23,530 dependencies earlier, but Armstrong's 799 00:26:24,280 --> 00:26:25,590 Axioms are a specific set 800 00:26:25,850 --> 00:26:27,510 of rules that are what's called complete. 801 00:26:28,470 --> 00:26:29,770 It's guaranteed that if one 802 00:26:30,170 --> 00:26:31,560 thing about functional dependencies can be 803 00:26:31,700 --> 00:26:32,990 proved from another, then it 804 00:26:33,090 --> 00:26:35,400 can be proved using the Armstrong's Axioms. 805 00:26:36,320 --> 00:26:37,660 I'm not going to cover Armstrong's Axioms 806 00:26:38,210 --> 00:26:39,130 in the videos, but you can 807 00:26:39,380 --> 00:26:41,400 look at any of the recommended readings and find them there. 808 00:26:42,400 --> 00:26:43,640 So you might wonder why did 809 00:26:43,780 --> 00:26:45,290 I introduce this notion of one 810 00:26:45,740 --> 00:26:47,250 set of functional dependencies following from 811 00:26:47,430 --> 00:26:48,260 another and for that matter, 812 00:26:48,440 --> 00:26:49,830 why did I introduce trivial and 813 00:26:49,990 --> 00:26:51,480 non-trivial functional dependencies. 814 00:26:52,630 --> 00:26:53,430 Well, I'm going to sum 815 00:26:53,690 --> 00:26:55,650 up in one sentence what 816 00:26:55,940 --> 00:26:56,970 we're looking for when we specify 817 00:26:57,590 --> 00:26:59,540 the set of functional dependencies for a relation. 818 00:26:59,780 --> 00:27:00,990 So we have a notion of the 819 00:27:01,120 --> 00:27:02,640 real world data, we have 820 00:27:02,800 --> 00:27:03,880 our attributes but we need 821 00:27:04,100 --> 00:27:06,020 to specify the functional dependencies in 822 00:27:06,100 --> 00:27:08,180 order to get a good designer for some of the reasons that I mentioned. 823 00:27:08,990 --> 00:27:10,120 What we would like to find 824 00:27:10,440 --> 00:27:11,930 is a minimal set of 825 00:27:12,040 --> 00:27:13,960 completely non-trivial functional dependencies 826 00:27:15,040 --> 00:27:16,210 such that all functional dependencies 827 00:27:16,750 --> 00:27:18,310 that hold on the relation follow 828 00:27:18,810 --> 00:27:20,290 from using the technical definition I 829 00:27:20,420 --> 00:27:22,100 gave, the dependencies in this set. 830 00:27:22,750 --> 00:27:23,870 Wow, that seems like some 831 00:27:24,000 --> 00:27:25,820 very complicated thing, but the 832 00:27:26,080 --> 00:27:27,180 fact is when you start 833 00:27:27,700 --> 00:27:29,010 specifying functional dependencies, you'll discover 834 00:27:29,460 --> 00:27:32,200 that you will actually get this definition pretty naturally. 835 00:27:33,940 --> 00:27:35,100 So to conclude, functional dependencies are 836 00:27:35,230 --> 00:27:37,070 a generally useful concept in database systems. 837 00:27:37,580 --> 00:27:38,850 They 're a key ingredient 838 00:27:39,500 --> 00:27:40,740 of doing relational design by decomposition, 839 00:27:41,940 --> 00:27:43,040 because we use the functional dependencies 840 00:27:43,580 --> 00:27:44,930 to get Boyce-Codd Normal Form which 841 00:27:45,070 --> 00:27:45,910 is what we'll cover in the next 842 00:27:46,120 --> 00:27:47,160 video, but they're also 843 00:27:47,470 --> 00:27:48,690 useful for the system to 844 00:27:48,780 --> 00:27:49,750 determine how to store data, 845 00:27:50,120 --> 00:27:51,220 to compress data and also 846 00:27:51,690 --> 00:27:52,890 to reason about query processing.