1
00:00:00,490 --> 00:00:01,700
Now that we've learned about functional dependencies,

2
00:00:02,390 --> 00:00:03,410
let's talk about how they're used

3
00:00:03,690 --> 00:00:06,050
to create relations that are in Boyce Codd Normal Form.

4
00:00:06,830 --> 00:00:08,870
Very quick reminder about relational design by decomposition.

5
00:00:10,240 --> 00:00:11,780
The database designer creates mega

6
00:00:12,170 --> 00:00:13,370
relations that contain all the

7
00:00:13,690 --> 00:00:14,800
information to be captured and specifies

8
00:00:15,350 --> 00:00:16,530
properties of the data to be captured.

9
00:00:17,470 --> 00:00:18,830
The system uses the properties to

10
00:00:18,950 --> 00:00:20,280
decompose the relations into smaller

11
00:00:20,690 --> 00:00:22,210
ones and those final decomposed

12
00:00:22,630 --> 00:00:24,700
relations satisfy what is known as a Normal Form.

13
00:00:25,270 --> 00:00:27,320
They don't have anomalies and they don't lose information.

14
00:00:28,500 --> 00:00:29,650
Functional dependencies are used to

15
00:00:29,920 --> 00:00:31,080
create relations in Boys

16
00:00:31,490 --> 00:00:32,910
Codd Normal Form and multi-doubt

17
00:00:33,330 --> 00:00:34,680
value dependencies are used to

18
00:00:34,820 --> 00:00:35,870
create relations in Fourth Normal

19
00:00:37,410 --> 00:00:37,410
Form.

20
00:00:39,240 --> 00:00:39,720
This video talks about the process of using

21
00:00:39,960 --> 00:00:42,530
functional dependencies to create relations in Boyce Codd Normal Form.

22
00:00:43,610 --> 00:00:44,720
Let's start by defining what

23
00:00:44,930 --> 00:00:47,130
it means to do a decomposition of a relational schema.

24
00:00:48,020 --> 00:00:49,010
Let's suppose we have a relation

25
00:00:49,280 --> 00:00:51,210
R with a set of attributes A-1 through A-N.

26
00:00:51,490 --> 00:00:54,860
We can decompose R into two relations.

27
00:00:55,360 --> 00:00:56,580
We'll call them R-1 and R-2

28
00:00:56,730 --> 00:00:58,910
such that R-1 has... I'll just

29
00:00:59,280 --> 00:01:01,150
label them B-1 through B-K

30
00:01:02,260 --> 00:01:05,090
and C-1 through C

31
00:01:05,590 --> 00:01:07,610
and let's say, let

32
00:01:07,760 --> 00:01:09,100
me use the notation for the

33
00:01:09,230 --> 00:01:10,450
list of attributes A, B

34
00:01:10,670 --> 00:01:12,020
and C that I've been using in other videos.

35
00:01:13,000 --> 00:01:14,590
So R1 and R2 are

36
00:01:14,710 --> 00:01:16,350
a decomposition of R.  First

37
00:01:16,650 --> 00:01:18,380
of all if the attributes

38
00:01:18,880 --> 00:01:20,730
that capture B union C

39
00:01:21,380 --> 00:01:22,230
are equal to the set of

40
00:01:22,320 --> 00:01:23,530
attributes we started with A.

41
00:01:23,730 --> 00:01:24,880
In other words recovering all of

42
00:01:24,990 --> 00:01:26,600
the attributes, and furthermore,

43
00:01:27,890 --> 00:01:29,560
this is the tricky part, R1

44
00:01:29,980 --> 00:01:32,520
natural join R2 equals

45
00:01:33,050 --> 00:01:34,450
R. So, let me draw this pictorially.

46
00:01:35,930 --> 00:01:37,880
So here's our relation R and

47
00:01:38,120 --> 00:01:40,810
all of our attributes together are the A attributes.

48
00:01:41,680 --> 00:01:44,110
And then we're going to decompose R into R1 and R2.

49
00:01:44,410 --> 00:01:45,730
So, let's say this first

50
00:01:45,980 --> 00:01:47,490
set of attributes here are

51
00:01:47,660 --> 00:01:49,560
the B attributes and the

52
00:01:50,280 --> 00:01:51,520
second bunch of attributes here

53
00:01:51,630 --> 00:01:53,650
with some overlap are the C attributes.

54
00:01:54,700 --> 00:01:57,080
So, now R-1 consists of

55
00:01:57,590 --> 00:01:59,700
this portion of R and

56
00:01:59,860 --> 00:02:02,270
the purple part here now is R2.

57
00:02:03,700 --> 00:02:05,190
So, clearly the B

58
00:02:05,660 --> 00:02:07,700
and C attributes are equal

59
00:02:08,010 --> 00:02:10,610
to the original attributes and then

60
00:02:10,800 --> 00:02:12,440
is the join of R-1 and

61
00:02:12,750 --> 00:02:14,950
R-2 giving us R. Now remember, all of this is logical.

62
00:02:15,510 --> 00:02:17,140
We don't have R itself

63
00:02:17,710 --> 00:02:18,590
and we don't have the

64
00:02:18,670 --> 00:02:19,620
data and we don't have R-1

65
00:02:19,800 --> 00:02:21,940
and R-2, so everything is being done at the schema level.

66
00:02:22,670 --> 00:02:24,200
And we will explore later how

67
00:02:24,460 --> 00:02:25,830
we can guarantee that this

68
00:02:26,030 --> 00:02:28,790
join does return R and not something else.

69
00:02:29,920 --> 00:02:30,790
Now just using a little bit

70
00:02:30,930 --> 00:02:32,320
more relational algebra here, let

71
00:02:32,450 --> 00:02:33,990
me mention that R-1 can be

72
00:02:34,120 --> 00:02:35,470
defined as the projection

73
00:02:36,180 --> 00:02:37,890
on the B attributes of R

74
00:02:38,750 --> 00:02:40,160
and then in purple, R-2

75
00:02:40,450 --> 00:02:42,350
is the projection of the

76
00:02:42,440 --> 00:02:43,800
C attributes of R.  So

77
00:02:43,930 --> 00:02:44,960
again all of this

78
00:02:45,200 --> 00:02:46,390
is logical, but the idea is

79
00:02:46,560 --> 00:02:48,440
that when we do the projection, if

80
00:02:48,630 --> 00:02:49,730
there are duplicates that are

81
00:02:49,900 --> 00:02:51,570
present simply because we have

82
00:02:51,910 --> 00:02:53,010
say, different values in the

83
00:02:53,090 --> 00:02:54,510
remaining attributes, those duplicates

84
00:02:54,800 --> 00:02:56,080
don't have to be retained in the projection.

85
00:02:57,250 --> 00:02:58,970
We saw in some

86
00:02:59,130 --> 00:03:00,660
of our examples in other videos

87
00:03:01,160 --> 00:03:02,860
where we had redundancy because we

88
00:03:02,930 --> 00:03:06,030
were capturing information multiple times that we didn't need to.

89
00:03:06,220 --> 00:03:06,900
And we're going to see that

90
00:03:07,030 --> 00:03:08,190
what Boyce Codd Normal Form really

91
00:03:08,400 --> 00:03:09,840
does is separate the relations

92
00:03:10,450 --> 00:03:13,200
so that we capture each piece of information exactly once.

93
00:03:13,770 --> 00:03:16,570
I know that's very abstract now, but when we see examples, we'll see how that works.

94
00:03:17,460 --> 00:03:18,360
So let's look at two

95
00:03:18,610 --> 00:03:20,270
possible decompositions of the

96
00:03:20,340 --> 00:03:22,430
student relation and see which ones are correct.

97
00:03:23,240 --> 00:03:24,250
So let's start with a

98
00:03:24,620 --> 00:03:26,510
decomposition where we take... we're

99
00:03:26,640 --> 00:03:29,460
gonna decompose student into S-1

100
00:03:29,520 --> 00:03:30,570
and S-2, and in S-1 we'll

101
00:03:31,180 --> 00:03:32,620
put the social security number,

102
00:03:33,190 --> 00:03:35,690
name, address, let me

103
00:03:35,810 --> 00:03:36,830
abbreviate a little bit here

104
00:03:37,530 --> 00:03:38,720
and the high school code but

105
00:03:38,850 --> 00:03:41,620
no more high school information, the GPA and the priority.

106
00:03:42,880 --> 00:03:44,650
And then in relation 2,

107
00:03:45,110 --> 00:03:46,280
we'll put the high school

108
00:03:46,500 --> 00:03:48,200
code and we'll put

109
00:03:48,400 --> 00:03:49,780
the high school name and the high school city.

110
00:03:50,200 --> 00:03:51,090
So you can see what I've

111
00:03:51,230 --> 00:03:52,480
done done here is I've separated out

112
00:03:52,690 --> 00:03:54,730
the high school information into a separate relation.

113
00:03:55,690 --> 00:03:56,900
So first of all, is this

114
00:03:57,100 --> 00:03:58,840
a correct decomposition in the

115
00:03:58,920 --> 00:04:00,430
sense that A union B

116
00:04:00,740 --> 00:04:02,070
equals C. Certainly all of

117
00:04:02,150 --> 00:04:03,100
the attributes are still present

118
00:04:04,010 --> 00:04:04,920
and furthermore, if you think

119
00:04:05,260 --> 00:04:06,170
about it and we'll formalize

120
00:04:06,650 --> 00:04:08,810
this concept later S1 join

121
00:04:09,220 --> 00:04:11,630
S2 and that's going to occur by

122
00:04:11,800 --> 00:04:13,430
the way based on this highest school code value.

123
00:04:14,260 --> 00:04:15,420
S1 join S2 in this

124
00:04:16,110 --> 00:04:17,520
case will, for the

125
00:04:17,630 --> 00:04:19,120
data that you would expect, equal student.

126
00:04:19,760 --> 00:04:21,480
Again, we'll formalize that momentarily.

127
00:04:22,720 --> 00:04:23,480
Now let's look at a second

128
00:04:23,630 --> 00:04:24,900
possible decomposition of the

129
00:04:24,940 --> 00:04:27,490
student relation again into two relations S1 and S2.

130
00:04:27,760 --> 00:04:30,250
In the first one we'll put the first bunch of attributes.

131
00:04:30,830 --> 00:04:31,830
So we'll put the social security

132
00:04:32,320 --> 00:04:33,980
number, the student's name,

133
00:04:34,370 --> 00:04:36,060
their address, their high school code.

134
00:04:36,910 --> 00:04:39,540
Let's say high school name and high school city.

135
00:04:40,370 --> 00:04:43,230
And then in the second relation, we'll put again the student name.

136
00:04:44,230 --> 00:04:45,150
We'll put the high school name

137
00:04:46,190 --> 00:04:47,520
and we'll put, say the

138
00:04:47,620 --> 00:04:49,470
GPA and lastly the priority.

139
00:04:50,750 --> 00:04:52,130
So again, is this a decomposition?

140
00:04:53,340 --> 00:04:54,670
Well, certainly again we have

141
00:04:54,860 --> 00:04:56,130
the case that the A union

142
00:04:56,600 --> 00:04:57,580
B equals C, in other

143
00:04:57,730 --> 00:04:58,910
words we've captured all of

144
00:04:58,990 --> 00:04:59,880
the attributes of the student

145
00:05:00,150 --> 00:05:01,370
relation in our decomposed relation

146
00:05:02,450 --> 00:05:03,600
and do we think it's

147
00:05:03,840 --> 00:05:05,330
the case that if we join S1 and

148
00:05:05,730 --> 00:05:06,970
S2 then we'll get

149
00:05:07,210 --> 00:05:08,690
the student relation back and I'll

150
00:05:08,870 --> 00:05:09,840
put a question mark here and

151
00:05:09,910 --> 00:05:11,350
you know, of course, the answer is going to be no.

152
00:05:12,190 --> 00:05:13,410
When we join back we'll

153
00:05:13,540 --> 00:05:14,540
be joining in this case

154
00:05:14,920 --> 00:05:16,150
on the student name here

155
00:05:17,110 --> 00:05:18,950
and the high school name and

156
00:05:19,170 --> 00:05:20,160
likely these are not

157
00:05:20,510 --> 00:05:21,710
unique values so when we

158
00:05:21,810 --> 00:05:22,870
join back, we may be

159
00:05:23,190 --> 00:05:25,590
getting information together that doesn't really belong together.

160
00:05:26,070 --> 00:05:27,330
And, again, we'll be formalizing

161
00:05:27,800 --> 00:05:29,700
that and seeing additional examples momentarily.

162
00:05:31,330 --> 00:05:33,840
So now let's dig a little further into the actual process of decomposition.

163
00:05:35,290 --> 00:05:37,370
So, first of all we definitely want good decomposition.

164
00:05:38,270 --> 00:05:39,220
So, as we saw a good

165
00:05:39,440 --> 00:05:42,660
decomposition must capture all of the attributes of course.

166
00:05:42,950 --> 00:05:44,440
But the more important property is

167
00:05:44,780 --> 00:05:46,300
that this reassembly by the

168
00:05:46,390 --> 00:05:48,660
join produces the original relation.

169
00:05:49,980 --> 00:05:51,250
Sometimes that's called, by

170
00:05:51,410 --> 00:05:53,280
the way, a lossless join property.

171
00:05:54,340 --> 00:05:55,400
But the second thing that we

172
00:05:55,500 --> 00:05:56,670
want is not only that

173
00:05:56,850 --> 00:05:58,760
we have a good decomposition but

174
00:05:58,950 --> 00:06:00,620
that the relations that we

175
00:06:00,720 --> 00:06:02,430
decompose into are good relations.

176
00:06:03,510 --> 00:06:04,760
And those relations are going to

177
00:06:04,870 --> 00:06:06,540
be the ones that are in Boyce-Codd Normal Form.

178
00:06:07,180 --> 00:06:08,270
So let me first define

179
00:06:08,980 --> 00:06:10,500
formally Boyce-Codd Normal Form

180
00:06:11,070 --> 00:06:12,090
and then we'll go back to figure

181
00:06:12,450 --> 00:06:14,610
out an algorithm for automatically decomposing

182
00:06:15,300 --> 00:06:16,610
relations using good decompositions

183
00:06:17,750 --> 00:06:20,210
into decomposed relations that are in Boyce-Codd Normal Form.

184
00:06:21,210 --> 00:06:22,680
So here's the formal definition of

185
00:06:22,750 --> 00:06:23,900
when a relation is in Boyce-Codd

186
00:06:24,290 --> 00:06:25,800
Normal Form, usually abbreviated B,

187
00:06:25,940 --> 00:06:28,330
C and F.  A

188
00:06:28,440 --> 00:06:29,900
relation R with functional dependencies is in

189
00:06:30,200 --> 00:06:31,670
Boyce-Codd Normal Form if every

190
00:06:32,120 --> 00:06:33,820
functional dependencies is such

191
00:06:34,140 --> 00:06:36,550
that it's left-hand side is a key, ok.

192
00:06:37,230 --> 00:06:38,380
Let's see what happens when it's

193
00:06:38,560 --> 00:06:39,510
not the case that the

194
00:06:39,570 --> 00:06:40,520
left hand side of a functional

195
00:06:40,680 --> 00:06:41,320
dependency is not the key

196
00:06:41,590 --> 00:06:43,060
and we'll see why that's a bad design.

197
00:06:44,390 --> 00:06:45,760
So here's our relation R and

198
00:06:46,070 --> 00:06:47,460
here's a set attributes A on

199
00:06:47,620 --> 00:06:48,300
the left side of the functional

200
00:06:48,640 --> 00:06:50,690
dependency, attribute B and the rest.

201
00:06:51,820 --> 00:06:52,730
And let's just put in some values.

202
00:06:53,210 --> 00:06:54,480
So let's suppose that we have

203
00:06:54,980 --> 00:06:57,320
two tuples here with the same A value.

204
00:06:58,380 --> 00:06:59,500
Then by our functional dependency,

205
00:07:00,000 --> 00:07:01,030
we're going to have the same B

206
00:07:01,280 --> 00:07:02,840
value and the rest can be anything.

207
00:07:04,060 --> 00:07:05,290
What has happened here is

208
00:07:05,400 --> 00:07:06,620
that we've captured the piece

209
00:07:07,320 --> 00:07:09,960
of information the connection between A and B twice.

210
00:07:11,270 --> 00:07:12,730
And the reason that's allowed to

211
00:07:12,980 --> 00:07:14,780
happen is because A is not a key.

212
00:07:15,670 --> 00:07:16,610
if A were a key, we

213
00:07:16,750 --> 00:07:17,830
would not be allowed to

214
00:07:17,980 --> 00:07:19,120
have the situation where we have

215
00:07:19,380 --> 00:07:21,020
these two tuples both present in the relation.

216
00:07:21,910 --> 00:07:24,350
So this relation is not in voice cod normal form.

217
00:07:24,990 --> 00:07:26,610
And this functional dependency here

218
00:07:26,830 --> 00:07:28,600
is what we would call a B C and F violation.

219
00:07:29,980 --> 00:07:31,250
That violation is causing

220
00:07:31,730 --> 00:07:33,560
us to have redundancy in our

221
00:07:33,700 --> 00:07:35,290
relation and that also

222
00:07:35,860 --> 00:07:36,970
give us as we've seen the

223
00:07:37,170 --> 00:07:38,780
update anomalies and deletion anomalies.

224
00:07:38,970 --> 00:07:40,910
Let me clarify a

225
00:07:41,010 --> 00:07:42,100
little bit the requirement that the

226
00:07:42,170 --> 00:07:43,450
left-hand side of functional dependencies

227
00:07:44,040 --> 00:07:45,230
have to be key, so that's

228
00:07:45,590 --> 00:07:47,320
what tells us we're in Boyce-Codd normal form.

229
00:07:47,810 --> 00:07:48,930
Now I'm not saying that the

230
00:07:49,000 --> 00:07:50,020
left-hand side of every functional

231
00:07:50,320 --> 00:07:51,140
dependency has to be declared

232
00:07:51,790 --> 00:07:52,930
as the primary key for a

233
00:07:53,180 --> 00:07:55,000
relation, only that it is, in fact, a key.

234
00:07:55,540 --> 00:07:56,550
And, as you might recall, the definition

235
00:07:57,470 --> 00:07:58,570
of a key is an

236
00:07:58,830 --> 00:08:00,300
attribute that determines all other

237
00:08:00,610 --> 00:08:02,410
attributes, if you're thinking about functional

238
00:08:02,730 --> 00:08:04,070
dependencies, or if you

239
00:08:04,150 --> 00:08:05,140
don't have any duplicates in your

240
00:08:05,240 --> 00:08:06,590
relation, then a key is a

241
00:08:06,710 --> 00:08:08,570
value that is never duplicated across tacets.

242
00:08:09,650 --> 00:08:10,580
So if you think about it for

243
00:08:10,680 --> 00:08:11,800
a second, you'll realize that whenever

244
00:08:12,590 --> 00:08:13,710
a set of attributes is a

245
00:08:13,760 --> 00:08:15,070
key, so is any

246
00:08:15,300 --> 00:08:17,150
superset of those attributes.

247
00:08:17,590 --> 00:08:18,260
So if "A" is a key,

248
00:08:18,550 --> 00:08:21,170
then so is ac and so is a,c,d and so on.

249
00:08:21,650 --> 00:08:22,810
So sometimes you'll see in

250
00:08:22,910 --> 00:08:24,090
the definition of Boyce Codd normal

251
00:08:24,540 --> 00:08:26,190
form this wording not

252
00:08:26,870 --> 00:08:28,220
is a key but will be

253
00:08:28,590 --> 00:08:29,960
contains a key which in

254
00:08:30,250 --> 00:08:31,330
fact is exactly the same

255
00:08:31,770 --> 00:08:33,420
thing or sometimes it

256
00:08:33,490 --> 00:08:34,480
will even say is a

257
00:08:34,820 --> 00:08:35,970
super key, and a super

258
00:08:36,350 --> 00:08:39,330
key is a key or a super set of a key.

259
00:08:39,880 --> 00:08:41,130
Again, all of those are

260
00:08:41,420 --> 00:08:42,460
saying exactly the same thing,

261
00:08:42,690 --> 00:08:43,960
but I just wanted to clarify because

262
00:08:44,180 --> 00:08:45,420
different wording and sometimes different

263
00:08:45,670 --> 00:08:46,950
notation is used for that concept.

264
00:08:48,100 --> 00:08:49,210
So far, things have been pretty abstract.

265
00:08:49,680 --> 00:08:51,140
Let's try to get a bit more concrete here.

266
00:08:51,600 --> 00:08:52,930
Let's look at two examples and

267
00:08:53,120 --> 00:08:55,010
determine if those examples are in "B", "C", and "F".

268
00:08:55,850 --> 00:08:56,980
Remember to determine is something

269
00:08:57,190 --> 00:08:58,060
is in B, C, and F we

270
00:08:58,150 --> 00:09:00,530
need the relational schema and a set of functional dependencies.

271
00:09:01,570 --> 00:09:02,690
So here we have our student

272
00:09:03,050 --> 00:09:04,050
relation and this is a

273
00:09:04,100 --> 00:09:05,390
set of functional dependencies we had

274
00:09:05,600 --> 00:09:07,280
in earlier examples, where the

275
00:09:07,490 --> 00:09:10,260
social security number is determining the name, address, and GPA.

276
00:09:10,800 --> 00:09:11,430
That means that if there's two

277
00:09:11,720 --> 00:09:13,020
tuples with the same social

278
00:09:13,270 --> 00:09:15,680
security number they will have the same name, address, and GPA.

279
00:09:16,250 --> 00:09:19,250
That's the same student and they only live in one place.

280
00:09:19,780 --> 00:09:21,840
The GPA determines the

281
00:09:21,940 --> 00:09:23,050
priority, so any two

282
00:09:23,330 --> 00:09:24,390
students with the same GPA will

283
00:09:24,690 --> 00:09:26,560
have the same priority and, finally,

284
00:09:26,650 --> 00:09:28,720
the high school code determines the high school name and city.

285
00:09:29,250 --> 00:09:30,290
So the high school is a

286
00:09:30,340 --> 00:09:32,240
unique identifier for a

287
00:09:32,350 --> 00:09:33,800
particular high school in a city.

288
00:09:34,490 --> 00:09:35,200
So those are our three functional

289
00:09:35,530 --> 00:09:37,220
dependencies in order to

290
00:09:37,400 --> 00:09:38,570
test whether this is relation

291
00:09:38,850 --> 00:09:39,930
is in normal form with

292
00:09:40,070 --> 00:09:41,000
respect to the functional dependencies

293
00:09:41,570 --> 00:09:42,380
we need to know what the

294
00:09:42,440 --> 00:09:43,640
key of the relation is

295
00:09:43,850 --> 00:09:44,900
or the set of keys of the

296
00:09:45,010 --> 00:09:46,700
relation and we worked

297
00:09:47,010 --> 00:09:48,130
on this in an earlier video

298
00:09:48,480 --> 00:09:50,420
using the closure idea, so

299
00:09:50,580 --> 00:09:51,950
I'll just remind you now,

300
00:09:52,570 --> 00:09:54,260
that for this relation, this

301
00:09:54,430 --> 00:09:56,210
set of functional dependencies, there's one

302
00:09:56,540 --> 00:09:57,950
key or one minimal key

303
00:09:58,600 --> 00:09:59,520
and that's the social security

304
00:10:00,020 --> 00:10:01,040
number together with the high

305
00:10:01,210 --> 00:10:03,230
school code, those two attributes

306
00:10:03,690 --> 00:10:05,450
do functionally determine all other

307
00:10:05,720 --> 00:10:06,890
attributes in the relation and, therefore,

308
00:10:07,270 --> 00:10:09,020
they are together, forming a key.

309
00:10:10,200 --> 00:10:11,070
So now, to check if we're

310
00:10:11,220 --> 00:10:12,390
in Boyce-Codd Normal Form, we

311
00:10:12,480 --> 00:10:14,180
have to ask the question, "Does

312
00:10:14,720 --> 00:10:16,330
every functional dependency have

313
00:10:16,610 --> 00:10:18,010
a key on its left-hand side?"

314
00:10:18,600 --> 00:10:19,970
and the answer, of course,

315
00:10:20,620 --> 00:10:23,280
is no, not all.

316
00:10:23,640 --> 00:10:25,080
In fact, the reality is

317
00:10:25,400 --> 00:10:27,230
that no functional dependency, in

318
00:10:27,310 --> 00:10:28,900
this case, has the key on the left hand side.

319
00:10:29,460 --> 00:10:30,700
We have three left hand sides

320
00:10:30,990 --> 00:10:33,880
and no of them have or contain our one key.

321
00:10:34,560 --> 00:10:35,520
If you've given any thought at

322
00:10:35,610 --> 00:10:38,360
all to this database design, you will see that it's not a good one.

323
00:10:38,590 --> 00:10:40,100
It's combining too much information in

324
00:10:40,200 --> 00:10:41,360
one place, which is our

325
00:10:41,460 --> 00:10:42,470
basic idea, that we start

326
00:10:42,720 --> 00:10:43,970
with a mega relation and break it down.

327
00:10:44,310 --> 00:10:45,270
And so what we're going to

328
00:10:45,410 --> 00:10:46,990
do is use these functional dependencies,

329
00:10:47,680 --> 00:10:49,530
and specifically the fact that

330
00:10:49,650 --> 00:10:50,910
those are BCNF or Boyce

331
00:10:51,260 --> 00:10:52,980
Codd Normal Form violations, to break

332
00:10:53,340 --> 00:10:55,480
this relation down into one that is a better design.

333
00:10:56,620 --> 00:10:57,240
Now let's look at a second

334
00:10:57,580 --> 00:10:59,220
example, our apply relation to

335
00:10:59,290 --> 00:11:01,020
see if this one is in Boyce Codd Normal Form.

336
00:11:01,890 --> 00:11:02,770
So in this case as a

337
00:11:02,820 --> 00:11:03,810
reminder, we have social security

338
00:11:04,180 --> 00:11:06,390
number, college, state, date and major.

339
00:11:06,720 --> 00:11:07,490
So the date is the date

340
00:11:07,690 --> 00:11:08,770
of application; the major is

341
00:11:08,910 --> 00:11:09,780
major the student is applying

342
00:11:10,360 --> 00:11:12,050
for at that particular college and

343
00:11:12,140 --> 00:11:14,070
we'll have one functional dependency which

344
00:11:14,230 --> 00:11:15,580
effectively says in English that

345
00:11:15,730 --> 00:11:17,300
each student may apply to

346
00:11:17,760 --> 00:11:20,290
each college only once and for one major.

347
00:11:21,390 --> 00:11:22,720
Now let's compute the key for

348
00:11:22,860 --> 00:11:23,850
this relation, or keys for

349
00:11:23,980 --> 00:11:25,360
this relation based on the functional dependency.

350
00:11:26,210 --> 00:11:27,650
Well, it's pretty straightforward that these

351
00:11:28,030 --> 00:11:29,190
three attributes form a key

352
00:11:29,770 --> 00:11:31,000
because they determine the other

353
00:11:31,240 --> 00:11:32,540
attributes in the relation and therefore

354
00:11:32,800 --> 00:11:34,390
they determine all the attributes of the relation.

355
00:11:35,530 --> 00:11:36,730
Furthermore, we can see

356
00:11:36,980 --> 00:11:38,000
that our one and only functional

357
00:11:38,330 --> 00:11:39,830
dependency obviously has a

358
00:11:39,870 --> 00:11:40,900
key on its left hand side

359
00:11:41,520 --> 00:11:43,220
and so so this relation is in

360
00:11:43,350 --> 00:11:44,850
fact already in Boyce-Codd normal

361
00:11:45,320 --> 00:11:46,320
form and we'll see there's

362
00:11:46,530 --> 00:11:47,960
no way to decompose this

363
00:11:48,130 --> 00:11:49,680
relation further into a better design.

364
00:11:50,870 --> 00:11:51,790
So here we are again, talking

365
00:11:52,110 --> 00:11:53,910
about our decomposition process, so

366
00:11:54,020 --> 00:11:55,630
now we know what good relations are.

367
00:11:56,050 --> 00:11:56,880
They are in BCNF.

368
00:11:57,650 --> 00:11:59,080
And we saw earlier what good

369
00:11:59,480 --> 00:12:01,000
decompositions are, so now

370
00:12:01,410 --> 00:12:02,300
we're going to give an algorithm

371
00:12:02,980 --> 00:12:04,440
that's going to perform good decompositions

372
00:12:05,620 --> 00:12:06,960
and those decompositions are going to

373
00:12:07,210 --> 00:12:10,040
yield decomposed relations that are in Boyce Codd normal form.

374
00:12:11,160 --> 00:12:11,860
So here's the algorithm.

375
00:12:12,340 --> 00:12:13,130
The input is a relation

376
00:12:13,670 --> 00:12:14,630
r and the set of

377
00:12:14,710 --> 00:12:15,850
functional dependencies for that relation,

378
00:12:16,430 --> 00:12:17,490
and the output is going

379
00:12:17,740 --> 00:12:20,160
to be our good decomposition into good relations.

380
00:12:21,480 --> 00:12:22,930
So let's just go through it step by step.

381
00:12:23,330 --> 00:12:24,240
The first thing we're going to

382
00:12:24,330 --> 00:12:25,700
do is compute keys from

383
00:12:25,910 --> 00:12:26,650
r and were going to

384
00:12:26,750 --> 00:12:28,430
do that using functional dependencies as

385
00:12:28,580 --> 00:12:29,750
we have seen, and we saw

386
00:12:29,920 --> 00:12:32,070
in the actual algorithm for doing that in a previous video.

387
00:12:33,040 --> 00:12:34,030
Then we are going to start doing

388
00:12:34,250 --> 00:12:36,290
the decomposition process and we're

389
00:12:36,660 --> 00:12:37,960
going to repeat the decomposition

390
00:12:38,320 --> 00:12:39,770
until all of the relations are in BCNF.

391
00:12:40,410 --> 00:12:41,540
though we're going to take r and

392
00:12:41,780 --> 00:12:42,910
break it up into smaller relations

393
00:12:43,410 --> 00:12:44,860
and we might further break those

394
00:12:45,430 --> 00:12:46,610
into smaller relations, and so

395
00:12:46,870 --> 00:12:48,500
on, until every relation is good.

396
00:12:49,280 --> 00:12:50,670
So the breaking down process says

397
00:12:50,940 --> 00:12:52,480
pick a relation that's bad.

398
00:12:52,840 --> 00:12:53,540
So we're going to pick a relation,

399
00:12:54,050 --> 00:12:55,340
r prime in our current set,

400
00:12:55,600 --> 00:12:57,430
and again, we're starting with only r in our set.

401
00:12:58,550 --> 00:12:59,990
And we're going define a situation

402
00:13:00,610 --> 00:13:02,280
where that relation has a functional

403
00:13:02,700 --> 00:13:03,790
dependency and I guess,

404
00:13:04,110 --> 00:13:05,330
technically speaking, this would be

405
00:13:05,530 --> 00:13:07,180
with lines on top, that

406
00:13:07,380 --> 00:13:08,800
violates Boyce-Codd Normal Form

407
00:13:09,360 --> 00:13:10,750
and that violation is what's going

408
00:13:10,950 --> 00:13:12,070
to guide us to do the

409
00:13:12,160 --> 00:13:13,700
decomposition into better relations.

410
00:13:14,700 --> 00:13:16,550
So our decomposition then,in the

411
00:13:16,930 --> 00:13:18,160
second step, is going take our

412
00:13:18,380 --> 00:13:19,520
prime and and it's going

413
00:13:19,740 --> 00:13:21,160
to put the attributes involved

414
00:13:21,860 --> 00:13:23,260
in the functional dependency into one

415
00:13:23,540 --> 00:13:24,980
relation, and then it's

416
00:13:25,110 --> 00:13:26,540
going to keep the left-hand side

417
00:13:26,750 --> 00:13:28,310
of the functional dependency and put

418
00:13:28,580 --> 00:13:30,000
the rest of the attributes along

419
00:13:30,200 --> 00:13:32,120
with that left-hand side into a separate relation.

420
00:13:32,760 --> 00:13:33,810
So let's draw this as a picture.

421
00:13:34,660 --> 00:13:35,860
So here's our relation, r prime

422
00:13:36,230 --> 00:13:37,340
and of course the first time through

423
00:13:37,510 --> 00:13:38,910
our loop, that relation would have

424
00:13:39,090 --> 00:13:40,820
to be r self, and then

425
00:13:41,080 --> 00:13:42,270
because we have a functional

426
00:13:42,520 --> 00:13:43,770
dependency from A to

427
00:13:43,890 --> 00:13:44,920
B and A is not

428
00:13:45,310 --> 00:13:46,240
a key - that means

429
00:13:46,520 --> 00:13:48,020
it's BCNF violation  - we're

430
00:13:48,160 --> 00:13:50,400
going to decompose this into R1 and R2.

431
00:13:50,680 --> 00:13:52,280
So R1 will contain

432
00:13:53,200 --> 00:13:54,500
the attributes in the functional dependency.

433
00:13:55,270 --> 00:13:56,290
R2 will contain the left-hand

434
00:13:56,840 --> 00:13:58,920
side of the functional dependency and the rest.

435
00:13:59,760 --> 00:14:00,990
We can see clearly that this

436
00:14:01,220 --> 00:14:02,540
is a decomposition that keeps all

437
00:14:02,820 --> 00:14:03,850
attributes, and what we'll

438
00:14:04,060 --> 00:14:05,060
see soon is that this is

439
00:14:05,160 --> 00:14:06,180
also a good one in

440
00:14:06,350 --> 00:14:07,760
that logically the join of

441
00:14:07,890 --> 00:14:09,490
these two relations is guaranteed

442
00:14:09,880 --> 00:14:11,120
to give us back what we had originally.

443
00:14:12,390 --> 00:14:13,990
Now the remaining two lines of

444
00:14:14,080 --> 00:14:16,160
the algorithm compute after the

445
00:14:16,230 --> 00:14:18,150
decomposition the set of

446
00:14:18,340 --> 00:14:19,480
functional dependencies for the decomposed

447
00:14:20,120 --> 00:14:21,560
relations and then the keys for those.

448
00:14:22,460 --> 00:14:23,550
I'm going to come back to this

449
00:14:23,860 --> 00:14:26,060
particular line here after doing an example.

450
00:14:27,310 --> 00:14:28,840
This is our same example, although squished

451
00:14:29,360 --> 00:14:30,450
to give me more space to write.

452
00:14:31,240 --> 00:14:32,250
And, as a reminder, in this

453
00:14:32,440 --> 00:14:33,650
example, we've computed a couple

454
00:14:33,910 --> 00:14:35,580
of the times that the key, given

455
00:14:35,920 --> 00:14:37,540
the functional dependencies, is the

456
00:14:37,640 --> 00:14:38,570
combination of the social

457
00:14:38,920 --> 00:14:40,240
security number and the high school code.

458
00:14:40,930 --> 00:14:41,800
So our goal is to take

459
00:14:42,350 --> 00:14:44,150
the student relation and iteratively break

460
00:14:44,420 --> 00:14:45,920
it down into smaller relations until

461
00:14:46,250 --> 00:14:48,420
all of the relations are in Boyce-Codd normal form.

462
00:14:48,800 --> 00:14:50,170
So let's start the iterative process.

463
00:14:51,170 --> 00:14:52,770
We pick some functional dependency

464
00:14:53,050 --> 00:14:54,610
that violates Boyce-Codd normal form

465
00:14:54,790 --> 00:14:56,200
and we use it guide our decomposition.

466
00:14:57,570 --> 00:14:58,760
So all three of these

467
00:14:59,000 --> 00:15:01,060
functional dependencies actually violate Boyce-Codd

468
00:15:01,660 --> 00:15:03,880
normal form because none of them have a key on the left-hand side.

469
00:15:04,100 --> 00:15:06,390
So let's start with the high school code one.

470
00:15:07,090 --> 00:15:08,760
So, to do the decomposition,

471
00:15:09,410 --> 00:15:10,620
based on this violating functional

472
00:15:10,970 --> 00:15:12,560
dependency, we're going to create two relations.

473
00:15:13,440 --> 00:15:14,520
The first one is going to

474
00:15:14,610 --> 00:15:16,140
contain just the three attributes

475
00:15:16,550 --> 00:15:18,050
that are in the functional dependency itself,

476
00:15:18,610 --> 00:15:20,890
so it's high school code, high school name and high school city.

477
00:15:21,960 --> 00:15:23,090
And the second one is going

478
00:15:23,360 --> 00:15:25,520
to have all remaining attributes in

479
00:15:25,730 --> 00:15:27,120
the relation, plus the left

480
00:15:27,440 --> 00:15:28,540
hand side, so we'll have

481
00:15:28,750 --> 00:15:31,530
the social security number, the name, the address.

482
00:15:32,640 --> 00:15:33,590
We will have the high school

483
00:15:33,830 --> 00:15:34,910
code because it's on the left-hand

484
00:15:35,190 --> 00:15:36,410
side of the functional dependency we're using,

485
00:15:36,710 --> 00:15:37,580
but we won't have the name

486
00:15:37,810 --> 00:15:38,530
and city from the right side

487
00:15:38,550 --> 00:15:40,890
hand side and I'll have a GPA and the priority.

488
00:15:42,370 --> 00:15:43,200
Now at this point our algorithm

489
00:15:43,670 --> 00:15:46,280
computes the functional dependencies for the decomposed those relations.

490
00:15:46,820 --> 00:15:48,200
For this particular example, we

491
00:15:48,310 --> 00:15:49,330
can just see what they

492
00:15:49,680 --> 00:15:51,040
are, they're the same functional dependencies

493
00:15:51,610 --> 00:15:53,430
that we had for the non-decomposed relation.

494
00:15:54,400 --> 00:15:55,130
Sometimes there's a little bit of

495
00:15:55,200 --> 00:15:57,420
computation you have to do and I'm going to talk about that in a bit.

496
00:15:57,920 --> 00:15:58,860
But, in this case, we can

497
00:15:59,050 --> 00:16:00,490
see, for example, for our relation

498
00:16:01,140 --> 00:16:02,460
S1, the only functional

499
00:16:02,800 --> 00:16:04,950
dependency is this functional dependency here.

500
00:16:05,720 --> 00:16:06,920
That tells us that high

501
00:16:07,150 --> 00:16:09,150
school code here is a key for S1.

502
00:16:09,400 --> 00:16:11,010
So our only functional

503
00:16:11,380 --> 00:16:12,540
dependency has a key on

504
00:16:12,670 --> 00:16:14,070
the left-hand side and that

505
00:16:14,330 --> 00:16:15,280
tells us that this relation

506
00:16:15,810 --> 00:16:16,880
is in Boyce-Codd normal form.

507
00:16:17,060 --> 00:16:19,450
So we're done with that one, but we're not done with S2.

508
00:16:20,680 --> 00:16:22,000
So for S2 the key is

509
00:16:22,290 --> 00:16:23,750
still the social security number and

510
00:16:23,890 --> 00:16:25,420
high school code together, so we

511
00:16:25,560 --> 00:16:26,840
still have these two functional dependencies

512
00:16:27,470 --> 00:16:28,960
that are Boyce-Codd normal form violations.

513
00:16:29,990 --> 00:16:31,650
So let's take the GPA priority

514
00:16:32,200 --> 00:16:35,120
one and let's guide that to decompose S3 further.

515
00:16:36,250 --> 00:16:37,820
We'll decompose S2 into S3

516
00:16:38,130 --> 00:16:39,490
and S4.

517
00:16:39,750 --> 00:16:41,600
S3 will contain the GPA and

518
00:16:41,940 --> 00:16:43,940
priority from the functional

519
00:16:44,250 --> 00:16:45,610
dependency we're using, and then

520
00:16:45,930 --> 00:16:46,980
S4 will take the remaining

521
00:16:47,770 --> 00:16:49,540
attributes in S-2 together

522
00:16:50,010 --> 00:16:51,500
with the left hand side of the GPA.

523
00:16:51,760 --> 00:16:52,820
So, we'll keep our social security

524
00:16:53,190 --> 00:16:56,170
number, name, address, high

525
00:16:56,450 --> 00:16:59,730
school code and GPA, but we don't keep the priority.

526
00:17:01,150 --> 00:17:02,580
So at this point, S-2 is

527
00:17:02,810 --> 00:17:05,890
completely gone and let's take a look at the remaining relations.

528
00:17:07,030 --> 00:17:08,040
S-3 now just has one

529
00:17:08,340 --> 00:17:10,290
functional dependency that applies and

530
00:17:10,390 --> 00:17:12,180
the left-hand side is now a key.

531
00:17:12,910 --> 00:17:14,070
And so now we're done with S-2.

532
00:17:14,450 --> 00:17:15,390
It's in Boyce Codd Normal

533
00:17:15,820 --> 00:17:16,780
Form, but we're not done,

534
00:17:17,000 --> 00:17:18,410
I'm sorry, we're done with S-3,

535
00:17:18,600 --> 00:17:21,710
but we're not done yet with S-4.

536
00:17:21,820 --> 00:17:23,320
S-4 still has social security and

537
00:17:23,510 --> 00:17:24,510
high school code as its

538
00:17:24,690 --> 00:17:25,470
key, and so we still

539
00:17:25,700 --> 00:17:28,650
have a violating functional dependency, so let's decompose S-4 further.

540
00:17:30,000 --> 00:17:31,920
We decompose into S-5 and S-6.

541
00:17:33,150 --> 00:17:34,790
S-5 contains the attributes in

542
00:17:35,170 --> 00:17:36,270
the functional dependency that we're using

543
00:17:36,590 --> 00:17:37,610
now, so it's the social security

544
00:17:37,990 --> 00:17:40,040
number, name, address and GPA

545
00:17:41,220 --> 00:17:42,360
and then as 6 contains

546
00:17:42,730 --> 00:17:44,640
the remaining attributes plus the

547
00:17:44,710 --> 00:17:45,650
left hand side, so that's the social

548
00:17:45,970 --> 00:17:47,930
security number and the high school code.

549
00:17:49,370 --> 00:17:50,390
And I will just tell you

550
00:17:50,620 --> 00:17:51,500
right now because you might

551
00:17:51,730 --> 00:17:52,550
be getting bored with this example,

552
00:17:53,110 --> 00:17:54,510
we're done with S-4 S5

553
00:17:55,050 --> 00:17:56,670
and S6 are now both

554
00:17:56,780 --> 00:17:58,980
in Boyce Codd normal form.

555
00:17:59,690 --> 00:18:01,440
So this is our final schema.

556
00:18:02,050 --> 00:18:04,170
It contains 4 relations, S1,

557
00:18:04,780 --> 00:18:05,930
with the information about high schools,

558
00:18:06,820 --> 00:18:08,380
S3 with the information

559
00:18:08,650 --> 00:18:10,250
about GPA's and priorities, S5

560
00:18:10,900 --> 00:18:11,900
has student with their name,

561
00:18:12,250 --> 00:18:13,550
address and GPA and S6,

562
00:18:13,740 --> 00:18:15,550
a student with the high school they went to.

563
00:18:15,710 --> 00:18:16,730
And, if you think about

564
00:18:16,940 --> 00:18:18,310
it, this really is a good

565
00:18:18,610 --> 00:18:20,010
schema design and it's what's

566
00:18:20,200 --> 00:18:21,330
produced automatically by the

567
00:18:21,670 --> 00:18:24,620
BCNF decomposition algorithm, using our functional dependencies.

568
00:18:25,540 --> 00:18:27,460
Let me mention a few more things about the algorithm.

569
00:18:28,160 --> 00:18:29,110
First of all, I left this

570
00:18:29,320 --> 00:18:30,420
step kind of mysterious, which

571
00:18:30,540 --> 00:18:33,560
is how we compute the functional dependencies for the decomposed relations.

572
00:18:34,700 --> 00:18:35,900
We can't just take the functional

573
00:18:36,180 --> 00:18:37,110
dependencies we already have for the

574
00:18:37,210 --> 00:18:38,840
bigger relation, and throw away

575
00:18:39,030 --> 00:18:40,210
the ones that don't apply exclusively

576
00:18:40,930 --> 00:18:41,680
to one or the other of the

577
00:18:41,990 --> 00:18:43,560
decomposed, we actually have to

578
00:18:43,640 --> 00:18:45,090
compute the functional dependencies that

579
00:18:45,220 --> 00:18:47,450
are implied and that apply to these relations.

580
00:18:48,290 --> 00:18:49,350
So in the video on

581
00:18:49,650 --> 00:18:50,870
functional dependencies, we learned all

582
00:18:51,100 --> 00:18:52,460
of implied functional dependencies,

583
00:18:53,110 --> 00:18:53,990
and we would use the closure

584
00:18:55,190 --> 00:18:56,270
as we did in that video

585
00:18:56,590 --> 00:18:58,250
to determine the implied functional dependencies.

586
00:18:58,940 --> 00:18:59,860
Now the reality is, for many

587
00:19:00,160 --> 00:19:01,380
examples, it's pretty obvious,

588
00:19:02,240 --> 00:19:03,670
we saw in the previous example it is.

589
00:19:04,300 --> 00:19:05,120
And, the other thing is, that

590
00:19:05,280 --> 00:19:06,350
this is being done by a computer

591
00:19:06,670 --> 00:19:07,620
so, we don't actually have to

592
00:19:07,790 --> 00:19:09,970
worry about it except when we're doing exercises for our class.

593
00:19:11,670 --> 00:19:14,290
Second, let me mention that there is little nondeterminism in this algorithm.

594
00:19:14,750 --> 00:19:16,080
It says "pick any of

595
00:19:16,250 --> 00:19:17,040
our relations with a violating

596
00:19:17,650 --> 00:19:19,060
functional dependency and use that

597
00:19:19,190 --> 00:19:20,850
to guide the next decomposition step".

598
00:19:21,690 --> 00:19:22,420
So, the fact is, that you

599
00:19:22,480 --> 00:19:24,460
can get a different answer, depending

600
00:19:24,830 --> 00:19:26,580
on which one you choose at this point in time.

601
00:19:27,270 --> 00:19:28,620
All of the results will be

602
00:19:28,860 --> 00:19:31,070
BCNF decomposition but they might not be the same one.

603
00:19:31,230 --> 00:19:32,080
And, in fact, if you go

604
00:19:32,210 --> 00:19:33,250
back to the example that I

605
00:19:33,390 --> 00:19:34,480
did, and you pick the

606
00:19:34,590 --> 00:19:35,600
functional dependencies in a different

607
00:19:35,930 --> 00:19:37,870
order, you might get a different final schema.

608
00:19:38,160 --> 00:19:39,620
But, again, it will be in BCNF.

609
00:19:40,770 --> 00:19:42,540
And lastly, some presentations of

610
00:19:42,610 --> 00:19:44,400
the BCNF decomposition algorithm actually

611
00:19:44,490 --> 00:19:46,180
have another step here, which

612
00:19:46,380 --> 00:19:49,550
is to extend the functional dependency that is used for the decomposition.

613
00:19:50,480 --> 00:19:51,810
So we're using A to

614
00:19:51,960 --> 00:19:53,050
B but if we have

615
00:19:53,290 --> 00:19:54,020
A to B we also have

616
00:19:54,470 --> 00:19:55,340
A to B and we can add

617
00:19:55,590 --> 00:19:56,720
more attributes, those in the

618
00:19:56,860 --> 00:19:58,410
closure of A. If you

619
00:19:58,470 --> 00:19:59,830
remember the closure and that's

620
00:20:00,050 --> 00:20:01,540
also a correct functional dependency.

621
00:20:02,240 --> 00:20:04,350
By doing this extension, we

622
00:20:04,440 --> 00:20:06,210
will still get a correct BCNF answer,

623
00:20:06,840 --> 00:20:07,700
but we'll tend to get relations

624
00:20:08,180 --> 00:20:09,280
that are larger than the ones

625
00:20:09,450 --> 00:20:11,100
we get if we don't do the extension first.

626
00:20:12,290 --> 00:20:14,030
In some cases, larger relations are

627
00:20:14,130 --> 00:20:15,210
better because you don't need

628
00:20:15,470 --> 00:20:16,380
to join them back when you're

629
00:20:16,480 --> 00:20:17,890
doing queries, but that can

630
00:20:18,120 --> 00:20:19,830
depend on the query load on the data base.

631
00:20:21,230 --> 00:20:23,450
So to conclude, does BCNF guarantee a good decomposition?

632
00:20:24,410 --> 00:20:25,330
Well of course the answer is

633
00:20:25,390 --> 00:20:26,310
yes or I wouldn't have spent

634
00:20:26,520 --> 00:20:27,890
all this time teaching you about it.

635
00:20:28,670 --> 00:20:30,030
Does it remove the anomalies that

636
00:20:30,150 --> 00:20:31,030
we looked at in our first

637
00:20:31,310 --> 00:20:33,560
example in another video of bad relational design?

638
00:20:34,040 --> 00:20:34,940
Yes, it does remove anomalies.

639
00:20:35,720 --> 00:20:37,580
When we have multiple instances of

640
00:20:37,650 --> 00:20:38,930
the same piece of information being

641
00:20:39,170 --> 00:20:40,570
captured, that's what's squeezed

642
00:20:41,160 --> 00:20:42,680
out by the decomposition into BCNF,

643
00:20:43,250 --> 00:20:44,590
and that's fairly easy to

644
00:20:44,780 --> 00:20:46,120
see through the examples that we've done already.

645
00:20:47,030 --> 00:20:48,470
It's a little less obvious seeing

646
00:20:48,760 --> 00:20:51,240
why BCNF composition does allow

647
00:20:51,620 --> 00:20:52,740
us to have a breakdown of

648
00:20:52,820 --> 00:20:55,750
a relation that logically reconstructs the original relation.

649
00:20:56,180 --> 00:20:57,550
So let's look at that a little bit more.

650
00:20:57,810 --> 00:20:58,810
So we're taking a relation

651
00:20:58,990 --> 00:21:00,540
in R, we're producing R1 and

652
00:21:00,730 --> 00:21:02,520
R2, and we want to

653
00:21:02,660 --> 00:21:03,730
guarantee that when we

654
00:21:03,860 --> 00:21:06,120
join R1 and R2, we get R back.

655
00:21:06,770 --> 00:21:09,250
We don't get too few tuples and we don't get too many tuples.

656
00:21:10,370 --> 00:21:12,020
Too few is pretty easy to see.

657
00:21:12,350 --> 00:21:13,560
If we break R into R1

658
00:21:13,770 --> 00:21:15,340
and R2 and their projections of

659
00:21:15,540 --> 00:21:16,380
R then when we join

660
00:21:16,700 --> 00:21:18,570
them back, certainly all the data is still present.

661
00:21:19,460 --> 00:21:22,190
If they're too many tuples, that's a little bit more complicated to see.

662
00:21:22,330 --> 00:21:23,810
Let's just use a simple abstract example.

663
00:21:25,300 --> 00:21:26,910
So here's a relation R with three attributes.

664
00:21:28,120 --> 00:21:32,650
Let's have two tuples: 1, 2, 3 and 4, 2, 5.

665
00:21:32,730 --> 00:21:34,360
Let's suppose that we decompose R

666
00:21:34,750 --> 00:21:37,160
into R1 and R2.

667
00:21:37,280 --> 00:21:38,450
R1 is going to contain

668
00:21:39,140 --> 00:21:40,060
AB, and R2 is going to contain

669
00:21:41,660 --> 00:21:41,660
BC.

670
00:21:44,580 --> 00:21:47,570
So let's fill in the data: in R1 we have 1, 2, 4, 2.

671
00:21:47,650 --> 00:21:48,590
And in R2 we have 2, 3, 2, 5.

672
00:21:49,830 --> 00:21:51,010
Now let's see what happens when

673
00:21:51,150 --> 00:21:53,230
we join R1 and R2 back together.

674
00:21:54,270 --> 00:21:56,370
When we do that, you might see what's going to happen.

675
00:21:58,270 --> 00:22:03,290
We're actually going to get four tuples.

676
00:22:03,630 --> 00:22:05,920
We're going to get "1,2,3", "1,2,5", "4,2,3" and "4,2,5".

677
00:22:06,030 --> 00:22:08,550
That is not same as our original relation, so what happened?

678
00:22:09,920 --> 00:22:12,850
Well, what happened is we didn't decompose based on a functional dependency.

679
00:22:13,870 --> 00:22:14,850
The only way we would decompose

680
00:22:15,640 --> 00:22:16,540
with "B" as the shared

681
00:22:16,890 --> 00:22:18,040
attribute were if have

682
00:22:18,100 --> 00:22:19,490
a functional dependency from B

683
00:22:19,700 --> 00:22:21,950
to A or B to C. And we don't.

684
00:22:22,330 --> 00:22:23,820
In both cases,

685
00:22:24,430 --> 00:22:25,480
there's two values of B that

686
00:22:25,620 --> 00:22:26,840
are the same where here the

687
00:22:26,990 --> 00:22:28,020
A values are not the same

688
00:22:28,410 --> 00:22:29,790
and here the C values are not the same.

689
00:22:30,430 --> 00:22:31,650
So neither of these functional dependencies

690
00:22:32,210 --> 00:22:32,990
hold N, B, C, and

691
00:22:33,130 --> 00:22:34,630
F would not perform this decompostion.

692
00:22:35,880 --> 00:22:36,790
So in fact B, C, and

693
00:22:36,790 --> 00:22:39,190
F only performs decompositions when

694
00:22:39,620 --> 00:22:42,070
we will get the property that they're joined back.

695
00:22:42,450 --> 00:22:43,780
Again that's called a lossless join.

696
00:22:44,970 --> 00:22:45,810
So, B, C, and F seems

697
00:22:46,040 --> 00:22:47,190
great, we just list all our

698
00:22:47,430 --> 00:22:48,440
attributes a few functional dependencies

699
00:22:49,230 --> 00:22:51,850
and the systems churns on and out comes a wonderful schema.

700
00:22:52,190 --> 00:22:54,180
And in general that actually is what happens.

701
00:22:54,580 --> 00:22:56,540
In our student example, that worked very well.

702
00:22:57,110 --> 00:22:59,090
There are, however, shortcomings of

703
00:22:59,300 --> 00:23:01,390
B C and F and we will discuss those in a later video