1 00:00:00,490 --> 00:00:01,700 Now that we've learned about functional dependencies, 2 00:00:02,390 --> 00:00:03,410 let's talk about how they're used 3 00:00:03,690 --> 00:00:06,050 to create relations that are in Boyce Codd Normal Form. 4 00:00:06,830 --> 00:00:08,870 Very quick reminder about relational design by decomposition. 5 00:00:10,240 --> 00:00:11,780 The database designer creates mega 6 00:00:12,170 --> 00:00:13,370 relations that contain all the 7 00:00:13,690 --> 00:00:14,800 information to be captured and specifies 8 00:00:15,350 --> 00:00:16,530 properties of the data to be captured. 9 00:00:17,470 --> 00:00:18,830 The system uses the properties to 10 00:00:18,950 --> 00:00:20,280 decompose the relations into smaller 11 00:00:20,690 --> 00:00:22,210 ones and those final decomposed 12 00:00:22,630 --> 00:00:24,700 relations satisfy what is known as a Normal Form. 13 00:00:25,270 --> 00:00:27,320 They don't have anomalies and they don't lose information. 14 00:00:28,500 --> 00:00:29,650 Functional dependencies are used to 15 00:00:29,920 --> 00:00:31,080 create relations in Boys 16 00:00:31,490 --> 00:00:32,910 Codd Normal Form and multi-doubt 17 00:00:33,330 --> 00:00:34,680 value dependencies are used to 18 00:00:34,820 --> 00:00:35,870 create relations in Fourth Normal 19 00:00:37,410 --> 00:00:37,410 Form. 20 00:00:39,240 --> 00:00:39,720 This video talks about the process of using 21 00:00:39,960 --> 00:00:42,530 functional dependencies to create relations in Boyce Codd Normal Form. 22 00:00:43,610 --> 00:00:44,720 Let's start by defining what 23 00:00:44,930 --> 00:00:47,130 it means to do a decomposition of a relational schema. 24 00:00:48,020 --> 00:00:49,010 Let's suppose we have a relation 25 00:00:49,280 --> 00:00:51,210 R with a set of attributes A-1 through A-N. 26 00:00:51,490 --> 00:00:54,860 We can decompose R into two relations. 27 00:00:55,360 --> 00:00:56,580 We'll call them R-1 and R-2 28 00:00:56,730 --> 00:00:58,910 such that R-1 has... I'll just 29 00:00:59,280 --> 00:01:01,150 label them B-1 through B-K 30 00:01:02,260 --> 00:01:05,090 and C-1 through C 31 00:01:05,590 --> 00:01:07,610 and let's say, let 32 00:01:07,760 --> 00:01:09,100 me use the notation for the 33 00:01:09,230 --> 00:01:10,450 list of attributes A, B 34 00:01:10,670 --> 00:01:12,020 and C that I've been using in other videos. 35 00:01:13,000 --> 00:01:14,590 So R1 and R2 are 36 00:01:14,710 --> 00:01:16,350 a decomposition of R. First 37 00:01:16,650 --> 00:01:18,380 of all if the attributes 38 00:01:18,880 --> 00:01:20,730 that capture B union C 39 00:01:21,380 --> 00:01:22,230 are equal to the set of 40 00:01:22,320 --> 00:01:23,530 attributes we started with A. 41 00:01:23,730 --> 00:01:24,880 In other words recovering all of 42 00:01:24,990 --> 00:01:26,600 the attributes, and furthermore, 43 00:01:27,890 --> 00:01:29,560 this is the tricky part, R1 44 00:01:29,980 --> 00:01:32,520 natural join R2 equals 45 00:01:33,050 --> 00:01:34,450 R. So, let me draw this pictorially. 46 00:01:35,930 --> 00:01:37,880 So here's our relation R and 47 00:01:38,120 --> 00:01:40,810 all of our attributes together are the A attributes. 48 00:01:41,680 --> 00:01:44,110 And then we're going to decompose R into R1 and R2. 49 00:01:44,410 --> 00:01:45,730 So, let's say this first 50 00:01:45,980 --> 00:01:47,490 set of attributes here are 51 00:01:47,660 --> 00:01:49,560 the B attributes and the 52 00:01:50,280 --> 00:01:51,520 second bunch of attributes here 53 00:01:51,630 --> 00:01:53,650 with some overlap are the C attributes. 54 00:01:54,700 --> 00:01:57,080 So, now R-1 consists of 55 00:01:57,590 --> 00:01:59,700 this portion of R and 56 00:01:59,860 --> 00:02:02,270 the purple part here now is R2. 57 00:02:03,700 --> 00:02:05,190 So, clearly the B 58 00:02:05,660 --> 00:02:07,700 and C attributes are equal 59 00:02:08,010 --> 00:02:10,610 to the original attributes and then 60 00:02:10,800 --> 00:02:12,440 is the join of R-1 and 61 00:02:12,750 --> 00:02:14,950 R-2 giving us R. Now remember, all of this is logical. 62 00:02:15,510 --> 00:02:17,140 We don't have R itself 63 00:02:17,710 --> 00:02:18,590 and we don't have the 64 00:02:18,670 --> 00:02:19,620 data and we don't have R-1 65 00:02:19,800 --> 00:02:21,940 and R-2, so everything is being done at the schema level. 66 00:02:22,670 --> 00:02:24,200 And we will explore later how 67 00:02:24,460 --> 00:02:25,830 we can guarantee that this 68 00:02:26,030 --> 00:02:28,790 join does return R and not something else. 69 00:02:29,920 --> 00:02:30,790 Now just using a little bit 70 00:02:30,930 --> 00:02:32,320 more relational algebra here, let 71 00:02:32,450 --> 00:02:33,990 me mention that R-1 can be 72 00:02:34,120 --> 00:02:35,470 defined as the projection 73 00:02:36,180 --> 00:02:37,890 on the B attributes of R 74 00:02:38,750 --> 00:02:40,160 and then in purple, R-2 75 00:02:40,450 --> 00:02:42,350 is the projection of the 76 00:02:42,440 --> 00:02:43,800 C attributes of R. So 77 00:02:43,930 --> 00:02:44,960 again all of this 78 00:02:45,200 --> 00:02:46,390 is logical, but the idea is 79 00:02:46,560 --> 00:02:48,440 that when we do the projection, if 80 00:02:48,630 --> 00:02:49,730 there are duplicates that are 81 00:02:49,900 --> 00:02:51,570 present simply because we have 82 00:02:51,910 --> 00:02:53,010 say, different values in the 83 00:02:53,090 --> 00:02:54,510 remaining attributes, those duplicates 84 00:02:54,800 --> 00:02:56,080 don't have to be retained in the projection. 85 00:02:57,250 --> 00:02:58,970 We saw in some 86 00:02:59,130 --> 00:03:00,660 of our examples in other videos 87 00:03:01,160 --> 00:03:02,860 where we had redundancy because we 88 00:03:02,930 --> 00:03:06,030 were capturing information multiple times that we didn't need to. 89 00:03:06,220 --> 00:03:06,900 And we're going to see that 90 00:03:07,030 --> 00:03:08,190 what Boyce Codd Normal Form really 91 00:03:08,400 --> 00:03:09,840 does is separate the relations 92 00:03:10,450 --> 00:03:13,200 so that we capture each piece of information exactly once. 93 00:03:13,770 --> 00:03:16,570 I know that's very abstract now, but when we see examples, we'll see how that works. 94 00:03:17,460 --> 00:03:18,360 So let's look at two 95 00:03:18,610 --> 00:03:20,270 possible decompositions of the 96 00:03:20,340 --> 00:03:22,430 student relation and see which ones are correct. 97 00:03:23,240 --> 00:03:24,250 So let's start with a 98 00:03:24,620 --> 00:03:26,510 decomposition where we take... we're 99 00:03:26,640 --> 00:03:29,460 gonna decompose student into S-1 100 00:03:29,520 --> 00:03:30,570 and S-2, and in S-1 we'll 101 00:03:31,180 --> 00:03:32,620 put the social security number, 102 00:03:33,190 --> 00:03:35,690 name, address, let me 103 00:03:35,810 --> 00:03:36,830 abbreviate a little bit here 104 00:03:37,530 --> 00:03:38,720 and the high school code but 105 00:03:38,850 --> 00:03:41,620 no more high school information, the GPA and the priority. 106 00:03:42,880 --> 00:03:44,650 And then in relation 2, 107 00:03:45,110 --> 00:03:46,280 we'll put the high school 108 00:03:46,500 --> 00:03:48,200 code and we'll put 109 00:03:48,400 --> 00:03:49,780 the high school name and the high school city. 110 00:03:50,200 --> 00:03:51,090 So you can see what I've 111 00:03:51,230 --> 00:03:52,480 done done here is I've separated out 112 00:03:52,690 --> 00:03:54,730 the high school information into a separate relation. 113 00:03:55,690 --> 00:03:56,900 So first of all, is this 114 00:03:57,100 --> 00:03:58,840 a correct decomposition in the 115 00:03:58,920 --> 00:04:00,430 sense that A union B 116 00:04:00,740 --> 00:04:02,070 equals C. Certainly all of 117 00:04:02,150 --> 00:04:03,100 the attributes are still present 118 00:04:04,010 --> 00:04:04,920 and furthermore, if you think 119 00:04:05,260 --> 00:04:06,170 about it and we'll formalize 120 00:04:06,650 --> 00:04:08,810 this concept later S1 join 121 00:04:09,220 --> 00:04:11,630 S2 and that's going to occur by 122 00:04:11,800 --> 00:04:13,430 the way based on this highest school code value. 123 00:04:14,260 --> 00:04:15,420 S1 join S2 in this 124 00:04:16,110 --> 00:04:17,520 case will, for the 125 00:04:17,630 --> 00:04:19,120 data that you would expect, equal student. 126 00:04:19,760 --> 00:04:21,480 Again, we'll formalize that momentarily. 127 00:04:22,720 --> 00:04:23,480 Now let's look at a second 128 00:04:23,630 --> 00:04:24,900 possible decomposition of the 129 00:04:24,940 --> 00:04:27,490 student relation again into two relations S1 and S2. 130 00:04:27,760 --> 00:04:30,250 In the first one we'll put the first bunch of attributes. 131 00:04:30,830 --> 00:04:31,830 So we'll put the social security 132 00:04:32,320 --> 00:04:33,980 number, the student's name, 133 00:04:34,370 --> 00:04:36,060 their address, their high school code. 134 00:04:36,910 --> 00:04:39,540 Let's say high school name and high school city. 135 00:04:40,370 --> 00:04:43,230 And then in the second relation, we'll put again the student name. 136 00:04:44,230 --> 00:04:45,150 We'll put the high school name 137 00:04:46,190 --> 00:04:47,520 and we'll put, say the 138 00:04:47,620 --> 00:04:49,470 GPA and lastly the priority. 139 00:04:50,750 --> 00:04:52,130 So again, is this a decomposition? 140 00:04:53,340 --> 00:04:54,670 Well, certainly again we have 141 00:04:54,860 --> 00:04:56,130 the case that the A union 142 00:04:56,600 --> 00:04:57,580 B equals C, in other 143 00:04:57,730 --> 00:04:58,910 words we've captured all of 144 00:04:58,990 --> 00:04:59,880 the attributes of the student 145 00:05:00,150 --> 00:05:01,370 relation in our decomposed relation 146 00:05:02,450 --> 00:05:03,600 and do we think it's 147 00:05:03,840 --> 00:05:05,330 the case that if we join S1 and 148 00:05:05,730 --> 00:05:06,970 S2 then we'll get 149 00:05:07,210 --> 00:05:08,690 the student relation back and I'll 150 00:05:08,870 --> 00:05:09,840 put a question mark here and 151 00:05:09,910 --> 00:05:11,350 you know, of course, the answer is going to be no. 152 00:05:12,190 --> 00:05:13,410 When we join back we'll 153 00:05:13,540 --> 00:05:14,540 be joining in this case 154 00:05:14,920 --> 00:05:16,150 on the student name here 155 00:05:17,110 --> 00:05:18,950 and the high school name and 156 00:05:19,170 --> 00:05:20,160 likely these are not 157 00:05:20,510 --> 00:05:21,710 unique values so when we 158 00:05:21,810 --> 00:05:22,870 join back, we may be 159 00:05:23,190 --> 00:05:25,590 getting information together that doesn't really belong together. 160 00:05:26,070 --> 00:05:27,330 And, again, we'll be formalizing 161 00:05:27,800 --> 00:05:29,700 that and seeing additional examples momentarily. 162 00:05:31,330 --> 00:05:33,840 So now let's dig a little further into the actual process of decomposition. 163 00:05:35,290 --> 00:05:37,370 So, first of all we definitely want good decomposition. 164 00:05:38,270 --> 00:05:39,220 So, as we saw a good 165 00:05:39,440 --> 00:05:42,660 decomposition must capture all of the attributes of course. 166 00:05:42,950 --> 00:05:44,440 But the more important property is 167 00:05:44,780 --> 00:05:46,300 that this reassembly by the 168 00:05:46,390 --> 00:05:48,660 join produces the original relation. 169 00:05:49,980 --> 00:05:51,250 Sometimes that's called, by 170 00:05:51,410 --> 00:05:53,280 the way, a lossless join property. 171 00:05:54,340 --> 00:05:55,400 But the second thing that we 172 00:05:55,500 --> 00:05:56,670 want is not only that 173 00:05:56,850 --> 00:05:58,760 we have a good decomposition but 174 00:05:58,950 --> 00:06:00,620 that the relations that we 175 00:06:00,720 --> 00:06:02,430 decompose into are good relations. 176 00:06:03,510 --> 00:06:04,760 And those relations are going to 177 00:06:04,870 --> 00:06:06,540 be the ones that are in Boyce-Codd Normal Form. 178 00:06:07,180 --> 00:06:08,270 So let me first define 179 00:06:08,980 --> 00:06:10,500 formally Boyce-Codd Normal Form 180 00:06:11,070 --> 00:06:12,090 and then we'll go back to figure 181 00:06:12,450 --> 00:06:14,610 out an algorithm for automatically decomposing 182 00:06:15,300 --> 00:06:16,610 relations using good decompositions 183 00:06:17,750 --> 00:06:20,210 into decomposed relations that are in Boyce-Codd Normal Form. 184 00:06:21,210 --> 00:06:22,680 So here's the formal definition of 185 00:06:22,750 --> 00:06:23,900 when a relation is in Boyce-Codd 186 00:06:24,290 --> 00:06:25,800 Normal Form, usually abbreviated B, 187 00:06:25,940 --> 00:06:28,330 C and F. A 188 00:06:28,440 --> 00:06:29,900 relation R with functional dependencies is in 189 00:06:30,200 --> 00:06:31,670 Boyce-Codd Normal Form if every 190 00:06:32,120 --> 00:06:33,820 functional dependencies is such 191 00:06:34,140 --> 00:06:36,550 that it's left-hand side is a key, ok. 192 00:06:37,230 --> 00:06:38,380 Let's see what happens when it's 193 00:06:38,560 --> 00:06:39,510 not the case that the 194 00:06:39,570 --> 00:06:40,520 left hand side of a functional 195 00:06:40,680 --> 00:06:41,320 dependency is not the key 196 00:06:41,590 --> 00:06:43,060 and we'll see why that's a bad design. 197 00:06:44,390 --> 00:06:45,760 So here's our relation R and 198 00:06:46,070 --> 00:06:47,460 here's a set attributes A on 199 00:06:47,620 --> 00:06:48,300 the left side of the functional 200 00:06:48,640 --> 00:06:50,690 dependency, attribute B and the rest. 201 00:06:51,820 --> 00:06:52,730 And let's just put in some values. 202 00:06:53,210 --> 00:06:54,480 So let's suppose that we have 203 00:06:54,980 --> 00:06:57,320 two tuples here with the same A value. 204 00:06:58,380 --> 00:06:59,500 Then by our functional dependency, 205 00:07:00,000 --> 00:07:01,030 we're going to have the same B 206 00:07:01,280 --> 00:07:02,840 value and the rest can be anything. 207 00:07:04,060 --> 00:07:05,290 What has happened here is 208 00:07:05,400 --> 00:07:06,620 that we've captured the piece 209 00:07:07,320 --> 00:07:09,960 of information the connection between A and B twice. 210 00:07:11,270 --> 00:07:12,730 And the reason that's allowed to 211 00:07:12,980 --> 00:07:14,780 happen is because A is not a key. 212 00:07:15,670 --> 00:07:16,610 if A were a key, we 213 00:07:16,750 --> 00:07:17,830 would not be allowed to 214 00:07:17,980 --> 00:07:19,120 have the situation where we have 215 00:07:19,380 --> 00:07:21,020 these two tuples both present in the relation. 216 00:07:21,910 --> 00:07:24,350 So this relation is not in voice cod normal form. 217 00:07:24,990 --> 00:07:26,610 And this functional dependency here 218 00:07:26,830 --> 00:07:28,600 is what we would call a B C and F violation. 219 00:07:29,980 --> 00:07:31,250 That violation is causing 220 00:07:31,730 --> 00:07:33,560 us to have redundancy in our 221 00:07:33,700 --> 00:07:35,290 relation and that also 222 00:07:35,860 --> 00:07:36,970 give us as we've seen the 223 00:07:37,170 --> 00:07:38,780 update anomalies and deletion anomalies. 224 00:07:38,970 --> 00:07:40,910 Let me clarify a 225 00:07:41,010 --> 00:07:42,100 little bit the requirement that the 226 00:07:42,170 --> 00:07:43,450 left-hand side of functional dependencies 227 00:07:44,040 --> 00:07:45,230 have to be key, so that's 228 00:07:45,590 --> 00:07:47,320 what tells us we're in Boyce-Codd normal form. 229 00:07:47,810 --> 00:07:48,930 Now I'm not saying that the 230 00:07:49,000 --> 00:07:50,020 left-hand side of every functional 231 00:07:50,320 --> 00:07:51,140 dependency has to be declared 232 00:07:51,790 --> 00:07:52,930 as the primary key for a 233 00:07:53,180 --> 00:07:55,000 relation, only that it is, in fact, a key. 234 00:07:55,540 --> 00:07:56,550 And, as you might recall, the definition 235 00:07:57,470 --> 00:07:58,570 of a key is an 236 00:07:58,830 --> 00:08:00,300 attribute that determines all other 237 00:08:00,610 --> 00:08:02,410 attributes, if you're thinking about functional 238 00:08:02,730 --> 00:08:04,070 dependencies, or if you 239 00:08:04,150 --> 00:08:05,140 don't have any duplicates in your 240 00:08:05,240 --> 00:08:06,590 relation, then a key is a 241 00:08:06,710 --> 00:08:08,570 value that is never duplicated across tacets. 242 00:08:09,650 --> 00:08:10,580 So if you think about it for 243 00:08:10,680 --> 00:08:11,800 a second, you'll realize that whenever 244 00:08:12,590 --> 00:08:13,710 a set of attributes is a 245 00:08:13,760 --> 00:08:15,070 key, so is any 246 00:08:15,300 --> 00:08:17,150 superset of those attributes. 247 00:08:17,590 --> 00:08:18,260 So if "A" is a key, 248 00:08:18,550 --> 00:08:21,170 then so is ac and so is a,c,d and so on. 249 00:08:21,650 --> 00:08:22,810 So sometimes you'll see in 250 00:08:22,910 --> 00:08:24,090 the definition of Boyce Codd normal 251 00:08:24,540 --> 00:08:26,190 form this wording not 252 00:08:26,870 --> 00:08:28,220 is a key but will be 253 00:08:28,590 --> 00:08:29,960 contains a key which in 254 00:08:30,250 --> 00:08:31,330 fact is exactly the same 255 00:08:31,770 --> 00:08:33,420 thing or sometimes it 256 00:08:33,490 --> 00:08:34,480 will even say is a 257 00:08:34,820 --> 00:08:35,970 super key, and a super 258 00:08:36,350 --> 00:08:39,330 key is a key or a super set of a key. 259 00:08:39,880 --> 00:08:41,130 Again, all of those are 260 00:08:41,420 --> 00:08:42,460 saying exactly the same thing, 261 00:08:42,690 --> 00:08:43,960 but I just wanted to clarify because 262 00:08:44,180 --> 00:08:45,420 different wording and sometimes different 263 00:08:45,670 --> 00:08:46,950 notation is used for that concept. 264 00:08:48,100 --> 00:08:49,210 So far, things have been pretty abstract. 265 00:08:49,680 --> 00:08:51,140 Let's try to get a bit more concrete here. 266 00:08:51,600 --> 00:08:52,930 Let's look at two examples and 267 00:08:53,120 --> 00:08:55,010 determine if those examples are in "B", "C", and "F". 268 00:08:55,850 --> 00:08:56,980 Remember to determine is something 269 00:08:57,190 --> 00:08:58,060 is in B, C, and F we 270 00:08:58,150 --> 00:09:00,530 need the relational schema and a set of functional dependencies. 271 00:09:01,570 --> 00:09:02,690 So here we have our student 272 00:09:03,050 --> 00:09:04,050 relation and this is a 273 00:09:04,100 --> 00:09:05,390 set of functional dependencies we had 274 00:09:05,600 --> 00:09:07,280 in earlier examples, where the 275 00:09:07,490 --> 00:09:10,260 social security number is determining the name, address, and GPA. 276 00:09:10,800 --> 00:09:11,430 That means that if there's two 277 00:09:11,720 --> 00:09:13,020 tuples with the same social 278 00:09:13,270 --> 00:09:15,680 security number they will have the same name, address, and GPA. 279 00:09:16,250 --> 00:09:19,250 That's the same student and they only live in one place. 280 00:09:19,780 --> 00:09:21,840 The GPA determines the 281 00:09:21,940 --> 00:09:23,050 priority, so any two 282 00:09:23,330 --> 00:09:24,390 students with the same GPA will 283 00:09:24,690 --> 00:09:26,560 have the same priority and, finally, 284 00:09:26,650 --> 00:09:28,720 the high school code determines the high school name and city. 285 00:09:29,250 --> 00:09:30,290 So the high school is a 286 00:09:30,340 --> 00:09:32,240 unique identifier for a 287 00:09:32,350 --> 00:09:33,800 particular high school in a city. 288 00:09:34,490 --> 00:09:35,200 So those are our three functional 289 00:09:35,530 --> 00:09:37,220 dependencies in order to 290 00:09:37,400 --> 00:09:38,570 test whether this is relation 291 00:09:38,850 --> 00:09:39,930 is in normal form with 292 00:09:40,070 --> 00:09:41,000 respect to the functional dependencies 293 00:09:41,570 --> 00:09:42,380 we need to know what the 294 00:09:42,440 --> 00:09:43,640 key of the relation is 295 00:09:43,850 --> 00:09:44,900 or the set of keys of the 296 00:09:45,010 --> 00:09:46,700 relation and we worked 297 00:09:47,010 --> 00:09:48,130 on this in an earlier video 298 00:09:48,480 --> 00:09:50,420 using the closure idea, so 299 00:09:50,580 --> 00:09:51,950 I'll just remind you now, 300 00:09:52,570 --> 00:09:54,260 that for this relation, this 301 00:09:54,430 --> 00:09:56,210 set of functional dependencies, there's one 302 00:09:56,540 --> 00:09:57,950 key or one minimal key 303 00:09:58,600 --> 00:09:59,520 and that's the social security 304 00:10:00,020 --> 00:10:01,040 number together with the high 305 00:10:01,210 --> 00:10:03,230 school code, those two attributes 306 00:10:03,690 --> 00:10:05,450 do functionally determine all other 307 00:10:05,720 --> 00:10:06,890 attributes in the relation and, therefore, 308 00:10:07,270 --> 00:10:09,020 they are together, forming a key. 309 00:10:10,200 --> 00:10:11,070 So now, to check if we're 310 00:10:11,220 --> 00:10:12,390 in Boyce-Codd Normal Form, we 311 00:10:12,480 --> 00:10:14,180 have to ask the question, "Does 312 00:10:14,720 --> 00:10:16,330 every functional dependency have 313 00:10:16,610 --> 00:10:18,010 a key on its left-hand side?" 314 00:10:18,600 --> 00:10:19,970 and the answer, of course, 315 00:10:20,620 --> 00:10:23,280 is no, not all. 316 00:10:23,640 --> 00:10:25,080 In fact, the reality is 317 00:10:25,400 --> 00:10:27,230 that no functional dependency, in 318 00:10:27,310 --> 00:10:28,900 this case, has the key on the left hand side. 319 00:10:29,460 --> 00:10:30,700 We have three left hand sides 320 00:10:30,990 --> 00:10:33,880 and no of them have or contain our one key. 321 00:10:34,560 --> 00:10:35,520 If you've given any thought at 322 00:10:35,610 --> 00:10:38,360 all to this database design, you will see that it's not a good one. 323 00:10:38,590 --> 00:10:40,100 It's combining too much information in 324 00:10:40,200 --> 00:10:41,360 one place, which is our 325 00:10:41,460 --> 00:10:42,470 basic idea, that we start 326 00:10:42,720 --> 00:10:43,970 with a mega relation and break it down. 327 00:10:44,310 --> 00:10:45,270 And so what we're going to 328 00:10:45,410 --> 00:10:46,990 do is use these functional dependencies, 329 00:10:47,680 --> 00:10:49,530 and specifically the fact that 330 00:10:49,650 --> 00:10:50,910 those are BCNF or Boyce 331 00:10:51,260 --> 00:10:52,980 Codd Normal Form violations, to break 332 00:10:53,340 --> 00:10:55,480 this relation down into one that is a better design. 333 00:10:56,620 --> 00:10:57,240 Now let's look at a second 334 00:10:57,580 --> 00:10:59,220 example, our apply relation to 335 00:10:59,290 --> 00:11:01,020 see if this one is in Boyce Codd Normal Form. 336 00:11:01,890 --> 00:11:02,770 So in this case as a 337 00:11:02,820 --> 00:11:03,810 reminder, we have social security 338 00:11:04,180 --> 00:11:06,390 number, college, state, date and major. 339 00:11:06,720 --> 00:11:07,490 So the date is the date 340 00:11:07,690 --> 00:11:08,770 of application; the major is 341 00:11:08,910 --> 00:11:09,780 major the student is applying 342 00:11:10,360 --> 00:11:12,050 for at that particular college and 343 00:11:12,140 --> 00:11:14,070 we'll have one functional dependency which 344 00:11:14,230 --> 00:11:15,580 effectively says in English that 345 00:11:15,730 --> 00:11:17,300 each student may apply to 346 00:11:17,760 --> 00:11:20,290 each college only once and for one major. 347 00:11:21,390 --> 00:11:22,720 Now let's compute the key for 348 00:11:22,860 --> 00:11:23,850 this relation, or keys for 349 00:11:23,980 --> 00:11:25,360 this relation based on the functional dependency. 350 00:11:26,210 --> 00:11:27,650 Well, it's pretty straightforward that these 351 00:11:28,030 --> 00:11:29,190 three attributes form a key 352 00:11:29,770 --> 00:11:31,000 because they determine the other 353 00:11:31,240 --> 00:11:32,540 attributes in the relation and therefore 354 00:11:32,800 --> 00:11:34,390 they determine all the attributes of the relation. 355 00:11:35,530 --> 00:11:36,730 Furthermore, we can see 356 00:11:36,980 --> 00:11:38,000 that our one and only functional 357 00:11:38,330 --> 00:11:39,830 dependency obviously has a 358 00:11:39,870 --> 00:11:40,900 key on its left hand side 359 00:11:41,520 --> 00:11:43,220 and so so this relation is in 360 00:11:43,350 --> 00:11:44,850 fact already in Boyce-Codd normal 361 00:11:45,320 --> 00:11:46,320 form and we'll see there's 362 00:11:46,530 --> 00:11:47,960 no way to decompose this 363 00:11:48,130 --> 00:11:49,680 relation further into a better design. 364 00:11:50,870 --> 00:11:51,790 So here we are again, talking 365 00:11:52,110 --> 00:11:53,910 about our decomposition process, so 366 00:11:54,020 --> 00:11:55,630 now we know what good relations are. 367 00:11:56,050 --> 00:11:56,880 They are in BCNF. 368 00:11:57,650 --> 00:11:59,080 And we saw earlier what good 369 00:11:59,480 --> 00:12:01,000 decompositions are, so now 370 00:12:01,410 --> 00:12:02,300 we're going to give an algorithm 371 00:12:02,980 --> 00:12:04,440 that's going to perform good decompositions 372 00:12:05,620 --> 00:12:06,960 and those decompositions are going to 373 00:12:07,210 --> 00:12:10,040 yield decomposed relations that are in Boyce Codd normal form. 374 00:12:11,160 --> 00:12:11,860 So here's the algorithm. 375 00:12:12,340 --> 00:12:13,130 The input is a relation 376 00:12:13,670 --> 00:12:14,630 r and the set of 377 00:12:14,710 --> 00:12:15,850 functional dependencies for that relation, 378 00:12:16,430 --> 00:12:17,490 and the output is going 379 00:12:17,740 --> 00:12:20,160 to be our good decomposition into good relations. 380 00:12:21,480 --> 00:12:22,930 So let's just go through it step by step. 381 00:12:23,330 --> 00:12:24,240 The first thing we're going to 382 00:12:24,330 --> 00:12:25,700 do is compute keys from 383 00:12:25,910 --> 00:12:26,650 r and were going to 384 00:12:26,750 --> 00:12:28,430 do that using functional dependencies as 385 00:12:28,580 --> 00:12:29,750 we have seen, and we saw 386 00:12:29,920 --> 00:12:32,070 in the actual algorithm for doing that in a previous video. 387 00:12:33,040 --> 00:12:34,030 Then we are going to start doing 388 00:12:34,250 --> 00:12:36,290 the decomposition process and we're 389 00:12:36,660 --> 00:12:37,960 going to repeat the decomposition 390 00:12:38,320 --> 00:12:39,770 until all of the relations are in BCNF. 391 00:12:40,410 --> 00:12:41,540 though we're going to take r and 392 00:12:41,780 --> 00:12:42,910 break it up into smaller relations 393 00:12:43,410 --> 00:12:44,860 and we might further break those 394 00:12:45,430 --> 00:12:46,610 into smaller relations, and so 395 00:12:46,870 --> 00:12:48,500 on, until every relation is good. 396 00:12:49,280 --> 00:12:50,670 So the breaking down process says 397 00:12:50,940 --> 00:12:52,480 pick a relation that's bad. 398 00:12:52,840 --> 00:12:53,540 So we're going to pick a relation, 399 00:12:54,050 --> 00:12:55,340 r prime in our current set, 400 00:12:55,600 --> 00:12:57,430 and again, we're starting with only r in our set. 401 00:12:58,550 --> 00:12:59,990 And we're going define a situation 402 00:13:00,610 --> 00:13:02,280 where that relation has a functional 403 00:13:02,700 --> 00:13:03,790 dependency and I guess, 404 00:13:04,110 --> 00:13:05,330 technically speaking, this would be 405 00:13:05,530 --> 00:13:07,180 with lines on top, that 406 00:13:07,380 --> 00:13:08,800 violates Boyce-Codd Normal Form 407 00:13:09,360 --> 00:13:10,750 and that violation is what's going 408 00:13:10,950 --> 00:13:12,070 to guide us to do the 409 00:13:12,160 --> 00:13:13,700 decomposition into better relations. 410 00:13:14,700 --> 00:13:16,550 So our decomposition then,in the 411 00:13:16,930 --> 00:13:18,160 second step, is going take our 412 00:13:18,380 --> 00:13:19,520 prime and and it's going 413 00:13:19,740 --> 00:13:21,160 to put the attributes involved 414 00:13:21,860 --> 00:13:23,260 in the functional dependency into one 415 00:13:23,540 --> 00:13:24,980 relation, and then it's 416 00:13:25,110 --> 00:13:26,540 going to keep the left-hand side 417 00:13:26,750 --> 00:13:28,310 of the functional dependency and put 418 00:13:28,580 --> 00:13:30,000 the rest of the attributes along 419 00:13:30,200 --> 00:13:32,120 with that left-hand side into a separate relation. 420 00:13:32,760 --> 00:13:33,810 So let's draw this as a picture. 421 00:13:34,660 --> 00:13:35,860 So here's our relation, r prime 422 00:13:36,230 --> 00:13:37,340 and of course the first time through 423 00:13:37,510 --> 00:13:38,910 our loop, that relation would have 424 00:13:39,090 --> 00:13:40,820 to be r self, and then 425 00:13:41,080 --> 00:13:42,270 because we have a functional 426 00:13:42,520 --> 00:13:43,770 dependency from A to 427 00:13:43,890 --> 00:13:44,920 B and A is not 428 00:13:45,310 --> 00:13:46,240 a key - that means 429 00:13:46,520 --> 00:13:48,020 it's BCNF violation - we're 430 00:13:48,160 --> 00:13:50,400 going to decompose this into R1 and R2. 431 00:13:50,680 --> 00:13:52,280 So R1 will contain 432 00:13:53,200 --> 00:13:54,500 the attributes in the functional dependency. 433 00:13:55,270 --> 00:13:56,290 R2 will contain the left-hand 434 00:13:56,840 --> 00:13:58,920 side of the functional dependency and the rest. 435 00:13:59,760 --> 00:14:00,990 We can see clearly that this 436 00:14:01,220 --> 00:14:02,540 is a decomposition that keeps all 437 00:14:02,820 --> 00:14:03,850 attributes, and what we'll 438 00:14:04,060 --> 00:14:05,060 see soon is that this is 439 00:14:05,160 --> 00:14:06,180 also a good one in 440 00:14:06,350 --> 00:14:07,760 that logically the join of 441 00:14:07,890 --> 00:14:09,490 these two relations is guaranteed 442 00:14:09,880 --> 00:14:11,120 to give us back what we had originally. 443 00:14:12,390 --> 00:14:13,990 Now the remaining two lines of 444 00:14:14,080 --> 00:14:16,160 the algorithm compute after the 445 00:14:16,230 --> 00:14:18,150 decomposition the set of 446 00:14:18,340 --> 00:14:19,480 functional dependencies for the decomposed 447 00:14:20,120 --> 00:14:21,560 relations and then the keys for those. 448 00:14:22,460 --> 00:14:23,550 I'm going to come back to this 449 00:14:23,860 --> 00:14:26,060 particular line here after doing an example. 450 00:14:27,310 --> 00:14:28,840 This is our same example, although squished 451 00:14:29,360 --> 00:14:30,450 to give me more space to write. 452 00:14:31,240 --> 00:14:32,250 And, as a reminder, in this 453 00:14:32,440 --> 00:14:33,650 example, we've computed a couple 454 00:14:33,910 --> 00:14:35,580 of the times that the key, given 455 00:14:35,920 --> 00:14:37,540 the functional dependencies, is the 456 00:14:37,640 --> 00:14:38,570 combination of the social 457 00:14:38,920 --> 00:14:40,240 security number and the high school code. 458 00:14:40,930 --> 00:14:41,800 So our goal is to take 459 00:14:42,350 --> 00:14:44,150 the student relation and iteratively break 460 00:14:44,420 --> 00:14:45,920 it down into smaller relations until 461 00:14:46,250 --> 00:14:48,420 all of the relations are in Boyce-Codd normal form. 462 00:14:48,800 --> 00:14:50,170 So let's start the iterative process. 463 00:14:51,170 --> 00:14:52,770 We pick some functional dependency 464 00:14:53,050 --> 00:14:54,610 that violates Boyce-Codd normal form 465 00:14:54,790 --> 00:14:56,200 and we use it guide our decomposition. 466 00:14:57,570 --> 00:14:58,760 So all three of these 467 00:14:59,000 --> 00:15:01,060 functional dependencies actually violate Boyce-Codd 468 00:15:01,660 --> 00:15:03,880 normal form because none of them have a key on the left-hand side. 469 00:15:04,100 --> 00:15:06,390 So let's start with the high school code one. 470 00:15:07,090 --> 00:15:08,760 So, to do the decomposition, 471 00:15:09,410 --> 00:15:10,620 based on this violating functional 472 00:15:10,970 --> 00:15:12,560 dependency, we're going to create two relations. 473 00:15:13,440 --> 00:15:14,520 The first one is going to 474 00:15:14,610 --> 00:15:16,140 contain just the three attributes 475 00:15:16,550 --> 00:15:18,050 that are in the functional dependency itself, 476 00:15:18,610 --> 00:15:20,890 so it's high school code, high school name and high school city. 477 00:15:21,960 --> 00:15:23,090 And the second one is going 478 00:15:23,360 --> 00:15:25,520 to have all remaining attributes in 479 00:15:25,730 --> 00:15:27,120 the relation, plus the left 480 00:15:27,440 --> 00:15:28,540 hand side, so we'll have 481 00:15:28,750 --> 00:15:31,530 the social security number, the name, the address. 482 00:15:32,640 --> 00:15:33,590 We will have the high school 483 00:15:33,830 --> 00:15:34,910 code because it's on the left-hand 484 00:15:35,190 --> 00:15:36,410 side of the functional dependency we're using, 485 00:15:36,710 --> 00:15:37,580 but we won't have the name 486 00:15:37,810 --> 00:15:38,530 and city from the right side 487 00:15:38,550 --> 00:15:40,890 hand side and I'll have a GPA and the priority. 488 00:15:42,370 --> 00:15:43,200 Now at this point our algorithm 489 00:15:43,670 --> 00:15:46,280 computes the functional dependencies for the decomposed those relations. 490 00:15:46,820 --> 00:15:48,200 For this particular example, we 491 00:15:48,310 --> 00:15:49,330 can just see what they 492 00:15:49,680 --> 00:15:51,040 are, they're the same functional dependencies 493 00:15:51,610 --> 00:15:53,430 that we had for the non-decomposed relation. 494 00:15:54,400 --> 00:15:55,130 Sometimes there's a little bit of 495 00:15:55,200 --> 00:15:57,420 computation you have to do and I'm going to talk about that in a bit. 496 00:15:57,920 --> 00:15:58,860 But, in this case, we can 497 00:15:59,050 --> 00:16:00,490 see, for example, for our relation 498 00:16:01,140 --> 00:16:02,460 S1, the only functional 499 00:16:02,800 --> 00:16:04,950 dependency is this functional dependency here. 500 00:16:05,720 --> 00:16:06,920 That tells us that high 501 00:16:07,150 --> 00:16:09,150 school code here is a key for S1. 502 00:16:09,400 --> 00:16:11,010 So our only functional 503 00:16:11,380 --> 00:16:12,540 dependency has a key on 504 00:16:12,670 --> 00:16:14,070 the left-hand side and that 505 00:16:14,330 --> 00:16:15,280 tells us that this relation 506 00:16:15,810 --> 00:16:16,880 is in Boyce-Codd normal form. 507 00:16:17,060 --> 00:16:19,450 So we're done with that one, but we're not done with S2. 508 00:16:20,680 --> 00:16:22,000 So for S2 the key is 509 00:16:22,290 --> 00:16:23,750 still the social security number and 510 00:16:23,890 --> 00:16:25,420 high school code together, so we 511 00:16:25,560 --> 00:16:26,840 still have these two functional dependencies 512 00:16:27,470 --> 00:16:28,960 that are Boyce-Codd normal form violations. 513 00:16:29,990 --> 00:16:31,650 So let's take the GPA priority 514 00:16:32,200 --> 00:16:35,120 one and let's guide that to decompose S3 further. 515 00:16:36,250 --> 00:16:37,820 We'll decompose S2 into S3 516 00:16:38,130 --> 00:16:39,490 and S4. 517 00:16:39,750 --> 00:16:41,600 S3 will contain the GPA and 518 00:16:41,940 --> 00:16:43,940 priority from the functional 519 00:16:44,250 --> 00:16:45,610 dependency we're using, and then 520 00:16:45,930 --> 00:16:46,980 S4 will take the remaining 521 00:16:47,770 --> 00:16:49,540 attributes in S-2 together 522 00:16:50,010 --> 00:16:51,500 with the left hand side of the GPA. 523 00:16:51,760 --> 00:16:52,820 So, we'll keep our social security 524 00:16:53,190 --> 00:16:56,170 number, name, address, high 525 00:16:56,450 --> 00:16:59,730 school code and GPA, but we don't keep the priority. 526 00:17:01,150 --> 00:17:02,580 So at this point, S-2 is 527 00:17:02,810 --> 00:17:05,890 completely gone and let's take a look at the remaining relations. 528 00:17:07,030 --> 00:17:08,040 S-3 now just has one 529 00:17:08,340 --> 00:17:10,290 functional dependency that applies and 530 00:17:10,390 --> 00:17:12,180 the left-hand side is now a key. 531 00:17:12,910 --> 00:17:14,070 And so now we're done with S-2. 532 00:17:14,450 --> 00:17:15,390 It's in Boyce Codd Normal 533 00:17:15,820 --> 00:17:16,780 Form, but we're not done, 534 00:17:17,000 --> 00:17:18,410 I'm sorry, we're done with S-3, 535 00:17:18,600 --> 00:17:21,710 but we're not done yet with S-4. 536 00:17:21,820 --> 00:17:23,320 S-4 still has social security and 537 00:17:23,510 --> 00:17:24,510 high school code as its 538 00:17:24,690 --> 00:17:25,470 key, and so we still 539 00:17:25,700 --> 00:17:28,650 have a violating functional dependency, so let's decompose S-4 further. 540 00:17:30,000 --> 00:17:31,920 We decompose into S-5 and S-6. 541 00:17:33,150 --> 00:17:34,790 S-5 contains the attributes in 542 00:17:35,170 --> 00:17:36,270 the functional dependency that we're using 543 00:17:36,590 --> 00:17:37,610 now, so it's the social security 544 00:17:37,990 --> 00:17:40,040 number, name, address and GPA 545 00:17:41,220 --> 00:17:42,360 and then as 6 contains 546 00:17:42,730 --> 00:17:44,640 the remaining attributes plus the 547 00:17:44,710 --> 00:17:45,650 left hand side, so that's the social 548 00:17:45,970 --> 00:17:47,930 security number and the high school code. 549 00:17:49,370 --> 00:17:50,390 And I will just tell you 550 00:17:50,620 --> 00:17:51,500 right now because you might 551 00:17:51,730 --> 00:17:52,550 be getting bored with this example, 552 00:17:53,110 --> 00:17:54,510 we're done with S-4 S5 553 00:17:55,050 --> 00:17:56,670 and S6 are now both 554 00:17:56,780 --> 00:17:58,980 in Boyce Codd normal form. 555 00:17:59,690 --> 00:18:01,440 So this is our final schema. 556 00:18:02,050 --> 00:18:04,170 It contains 4 relations, S1, 557 00:18:04,780 --> 00:18:05,930 with the information about high schools, 558 00:18:06,820 --> 00:18:08,380 S3 with the information 559 00:18:08,650 --> 00:18:10,250 about GPA's and priorities, S5 560 00:18:10,900 --> 00:18:11,900 has student with their name, 561 00:18:12,250 --> 00:18:13,550 address and GPA and S6, 562 00:18:13,740 --> 00:18:15,550 a student with the high school they went to. 563 00:18:15,710 --> 00:18:16,730 And, if you think about 564 00:18:16,940 --> 00:18:18,310 it, this really is a good 565 00:18:18,610 --> 00:18:20,010 schema design and it's what's 566 00:18:20,200 --> 00:18:21,330 produced automatically by the 567 00:18:21,670 --> 00:18:24,620 BCNF decomposition algorithm, using our functional dependencies. 568 00:18:25,540 --> 00:18:27,460 Let me mention a few more things about the algorithm. 569 00:18:28,160 --> 00:18:29,110 First of all, I left this 570 00:18:29,320 --> 00:18:30,420 step kind of mysterious, which 571 00:18:30,540 --> 00:18:33,560 is how we compute the functional dependencies for the decomposed relations. 572 00:18:34,700 --> 00:18:35,900 We can't just take the functional 573 00:18:36,180 --> 00:18:37,110 dependencies we already have for the 574 00:18:37,210 --> 00:18:38,840 bigger relation, and throw away 575 00:18:39,030 --> 00:18:40,210 the ones that don't apply exclusively 576 00:18:40,930 --> 00:18:41,680 to one or the other of the 577 00:18:41,990 --> 00:18:43,560 decomposed, we actually have to 578 00:18:43,640 --> 00:18:45,090 compute the functional dependencies that 579 00:18:45,220 --> 00:18:47,450 are implied and that apply to these relations. 580 00:18:48,290 --> 00:18:49,350 So in the video on 581 00:18:49,650 --> 00:18:50,870 functional dependencies, we learned all 582 00:18:51,100 --> 00:18:52,460 of implied functional dependencies, 583 00:18:53,110 --> 00:18:53,990 and we would use the closure 584 00:18:55,190 --> 00:18:56,270 as we did in that video 585 00:18:56,590 --> 00:18:58,250 to determine the implied functional dependencies. 586 00:18:58,940 --> 00:18:59,860 Now the reality is, for many 587 00:19:00,160 --> 00:19:01,380 examples, it's pretty obvious, 588 00:19:02,240 --> 00:19:03,670 we saw in the previous example it is. 589 00:19:04,300 --> 00:19:05,120 And, the other thing is, that 590 00:19:05,280 --> 00:19:06,350 this is being done by a computer 591 00:19:06,670 --> 00:19:07,620 so, we don't actually have to 592 00:19:07,790 --> 00:19:09,970 worry about it except when we're doing exercises for our class. 593 00:19:11,670 --> 00:19:14,290 Second, let me mention that there is little nondeterminism in this algorithm. 594 00:19:14,750 --> 00:19:16,080 It says "pick any of 595 00:19:16,250 --> 00:19:17,040 our relations with a violating 596 00:19:17,650 --> 00:19:19,060 functional dependency and use that 597 00:19:19,190 --> 00:19:20,850 to guide the next decomposition step". 598 00:19:21,690 --> 00:19:22,420 So, the fact is, that you 599 00:19:22,480 --> 00:19:24,460 can get a different answer, depending 600 00:19:24,830 --> 00:19:26,580 on which one you choose at this point in time. 601 00:19:27,270 --> 00:19:28,620 All of the results will be 602 00:19:28,860 --> 00:19:31,070 BCNF decomposition but they might not be the same one. 603 00:19:31,230 --> 00:19:32,080 And, in fact, if you go 604 00:19:32,210 --> 00:19:33,250 back to the example that I 605 00:19:33,390 --> 00:19:34,480 did, and you pick the 606 00:19:34,590 --> 00:19:35,600 functional dependencies in a different 607 00:19:35,930 --> 00:19:37,870 order, you might get a different final schema. 608 00:19:38,160 --> 00:19:39,620 But, again, it will be in BCNF. 609 00:19:40,770 --> 00:19:42,540 And lastly, some presentations of 610 00:19:42,610 --> 00:19:44,400 the BCNF decomposition algorithm actually 611 00:19:44,490 --> 00:19:46,180 have another step here, which 612 00:19:46,380 --> 00:19:49,550 is to extend the functional dependency that is used for the decomposition. 613 00:19:50,480 --> 00:19:51,810 So we're using A to 614 00:19:51,960 --> 00:19:53,050 B but if we have 615 00:19:53,290 --> 00:19:54,020 A to B we also have 616 00:19:54,470 --> 00:19:55,340 A to B and we can add 617 00:19:55,590 --> 00:19:56,720 more attributes, those in the 618 00:19:56,860 --> 00:19:58,410 closure of A. If you 619 00:19:58,470 --> 00:19:59,830 remember the closure and that's 620 00:20:00,050 --> 00:20:01,540 also a correct functional dependency. 621 00:20:02,240 --> 00:20:04,350 By doing this extension, we 622 00:20:04,440 --> 00:20:06,210 will still get a correct BCNF answer, 623 00:20:06,840 --> 00:20:07,700 but we'll tend to get relations 624 00:20:08,180 --> 00:20:09,280 that are larger than the ones 625 00:20:09,450 --> 00:20:11,100 we get if we don't do the extension first. 626 00:20:12,290 --> 00:20:14,030 In some cases, larger relations are 627 00:20:14,130 --> 00:20:15,210 better because you don't need 628 00:20:15,470 --> 00:20:16,380 to join them back when you're 629 00:20:16,480 --> 00:20:17,890 doing queries, but that can 630 00:20:18,120 --> 00:20:19,830 depend on the query load on the data base. 631 00:20:21,230 --> 00:20:23,450 So to conclude, does BCNF guarantee a good decomposition? 632 00:20:24,410 --> 00:20:25,330 Well of course the answer is 633 00:20:25,390 --> 00:20:26,310 yes or I wouldn't have spent 634 00:20:26,520 --> 00:20:27,890 all this time teaching you about it. 635 00:20:28,670 --> 00:20:30,030 Does it remove the anomalies that 636 00:20:30,150 --> 00:20:31,030 we looked at in our first 637 00:20:31,310 --> 00:20:33,560 example in another video of bad relational design? 638 00:20:34,040 --> 00:20:34,940 Yes, it does remove anomalies. 639 00:20:35,720 --> 00:20:37,580 When we have multiple instances of 640 00:20:37,650 --> 00:20:38,930 the same piece of information being 641 00:20:39,170 --> 00:20:40,570 captured, that's what's squeezed 642 00:20:41,160 --> 00:20:42,680 out by the decomposition into BCNF, 643 00:20:43,250 --> 00:20:44,590 and that's fairly easy to 644 00:20:44,780 --> 00:20:46,120 see through the examples that we've done already. 645 00:20:47,030 --> 00:20:48,470 It's a little less obvious seeing 646 00:20:48,760 --> 00:20:51,240 why BCNF composition does allow 647 00:20:51,620 --> 00:20:52,740 us to have a breakdown of 648 00:20:52,820 --> 00:20:55,750 a relation that logically reconstructs the original relation. 649 00:20:56,180 --> 00:20:57,550 So let's look at that a little bit more. 650 00:20:57,810 --> 00:20:58,810 So we're taking a relation 651 00:20:58,990 --> 00:21:00,540 in R, we're producing R1 and 652 00:21:00,730 --> 00:21:02,520 R2, and we want to 653 00:21:02,660 --> 00:21:03,730 guarantee that when we 654 00:21:03,860 --> 00:21:06,120 join R1 and R2, we get R back. 655 00:21:06,770 --> 00:21:09,250 We don't get too few tuples and we don't get too many tuples. 656 00:21:10,370 --> 00:21:12,020 Too few is pretty easy to see. 657 00:21:12,350 --> 00:21:13,560 If we break R into R1 658 00:21:13,770 --> 00:21:15,340 and R2 and their projections of 659 00:21:15,540 --> 00:21:16,380 R then when we join 660 00:21:16,700 --> 00:21:18,570 them back, certainly all the data is still present. 661 00:21:19,460 --> 00:21:22,190 If they're too many tuples, that's a little bit more complicated to see. 662 00:21:22,330 --> 00:21:23,810 Let's just use a simple abstract example. 663 00:21:25,300 --> 00:21:26,910 So here's a relation R with three attributes. 664 00:21:28,120 --> 00:21:32,650 Let's have two tuples: 1, 2, 3 and 4, 2, 5. 665 00:21:32,730 --> 00:21:34,360 Let's suppose that we decompose R 666 00:21:34,750 --> 00:21:37,160 into R1 and R2. 667 00:21:37,280 --> 00:21:38,450 R1 is going to contain 668 00:21:39,140 --> 00:21:40,060 AB, and R2 is going to contain 669 00:21:41,660 --> 00:21:41,660 BC. 670 00:21:44,580 --> 00:21:47,570 So let's fill in the data: in R1 we have 1, 2, 4, 2. 671 00:21:47,650 --> 00:21:48,590 And in R2 we have 2, 3, 2, 5. 672 00:21:49,830 --> 00:21:51,010 Now let's see what happens when 673 00:21:51,150 --> 00:21:53,230 we join R1 and R2 back together. 674 00:21:54,270 --> 00:21:56,370 When we do that, you might see what's going to happen. 675 00:21:58,270 --> 00:22:03,290 We're actually going to get four tuples. 676 00:22:03,630 --> 00:22:05,920 We're going to get "1,2,3", "1,2,5", "4,2,3" and "4,2,5". 677 00:22:06,030 --> 00:22:08,550 That is not same as our original relation, so what happened? 678 00:22:09,920 --> 00:22:12,850 Well, what happened is we didn't decompose based on a functional dependency. 679 00:22:13,870 --> 00:22:14,850 The only way we would decompose 680 00:22:15,640 --> 00:22:16,540 with "B" as the shared 681 00:22:16,890 --> 00:22:18,040 attribute were if have 682 00:22:18,100 --> 00:22:19,490 a functional dependency from B 683 00:22:19,700 --> 00:22:21,950 to A or B to C. And we don't. 684 00:22:22,330 --> 00:22:23,820 In both cases, 685 00:22:24,430 --> 00:22:25,480 there's two values of B that 686 00:22:25,620 --> 00:22:26,840 are the same where here the 687 00:22:26,990 --> 00:22:28,020 A values are not the same 688 00:22:28,410 --> 00:22:29,790 and here the C values are not the same. 689 00:22:30,430 --> 00:22:31,650 So neither of these functional dependencies 690 00:22:32,210 --> 00:22:32,990 hold N, B, C, and 691 00:22:33,130 --> 00:22:34,630 F would not perform this decompostion. 692 00:22:35,880 --> 00:22:36,790 So in fact B, C, and 693 00:22:36,790 --> 00:22:39,190 F only performs decompositions when 694 00:22:39,620 --> 00:22:42,070 we will get the property that they're joined back. 695 00:22:42,450 --> 00:22:43,780 Again that's called a lossless join. 696 00:22:44,970 --> 00:22:45,810 So, B, C, and F seems 697 00:22:46,040 --> 00:22:47,190 great, we just list all our 698 00:22:47,430 --> 00:22:48,440 attributes a few functional dependencies 699 00:22:49,230 --> 00:22:51,850 and the systems churns on and out comes a wonderful schema. 700 00:22:52,190 --> 00:22:54,180 And in general that actually is what happens. 701 00:22:54,580 --> 00:22:56,540 In our student example, that worked very well. 702 00:22:57,110 --> 00:22:59,090 There are, however, shortcomings of 703 00:22:59,300 --> 00:23:01,390 B C and F and we will discuss those in a later video