Now that we've learned about functional dependencies,
let's talk about how they're used
to create relations that are in Boyce Codd Normal Form.
Very quick reminder about relational design by decomposition.
The database designer creates mega
relations that contain all the
information to be captured and specifies
properties of the data to be captured.
The system uses the properties to
decompose the relations into smaller
ones and those final decomposed
relations satisfy what is known as a Normal Form.
They don't have anomalies and they don't lose information.
Functional dependencies are used to
create relations in Boys
Codd Normal Form and multi-doubt
value dependencies are used to
create relations in Fourth Normal
Form.
This video talks about the process of using
functional dependencies to create relations in Boyce Codd Normal Form.
Let's start by defining what
it means to do a decomposition of a relational schema.
Let's suppose we have a relation
R with a set of attributes A-1 through A-N.
We can decompose R into two relations.
We'll call them R-1 and R-2
such that R-1 has... I'll just
label them B-1 through B-K
and C-1 through C
and let's say, let
me use the notation for the
list of attributes A, B
and C that I've been using in other videos.
So R1 and R2 are
a decomposition of R.  First
of all if the attributes
that capture B union C
are equal to the set of
attributes we started with A.
In other words recovering all of
the attributes, and furthermore,
this is the tricky part, R1
natural join R2 equals
R. So, let me draw this pictorially.
So here's our relation R and
all of our attributes together are the A attributes.
And then we're going to decompose R into R1 and R2.
So, let's say this first
set of attributes here are
the B attributes and the
second bunch of attributes here
with some overlap are the C attributes.
So, now R-1 consists of
this portion of R and
the purple part here now is R2.
So, clearly the B
and C attributes are equal
to the original attributes and then
is the join of R-1 and
R-2 giving us R. Now remember, all of this is logical.
We don't have R itself
and we don't have the
data and we don't have R-1
and R-2, so everything is being done at the schema level.
And we will explore later how
we can guarantee that this
join does return R and not something else.
Now just using a little bit
more relational algebra here, let
me mention that R-1 can be
defined as the projection
on the B attributes of R
and then in purple, R-2
is the projection of the
C attributes of R.  So
again all of this
is logical, but the idea is
that when we do the projection, if
there are duplicates that are
present simply because we have
say, different values in the
remaining attributes, those duplicates
don't have to be retained in the projection.
We saw in some
of our examples in other videos
where we had redundancy because we
were capturing information multiple times that we didn't need to.
And we're going to see that
what Boyce Codd Normal Form really
does is separate the relations
so that we capture each piece of information exactly once.
I know that's very abstract now, but when we see examples, we'll see how that works.
So let's look at two
possible decompositions of the
student relation and see which ones are correct.
So let's start with a
decomposition where we take... we're
gonna decompose student into S-1
and S-2, and in S-1 we'll
put the social security number,
name, address, let me
abbreviate a little bit here
and the high school code but
no more high school information, the GPA and the priority.
And then in relation 2,
we'll put the high school
code and we'll put
the high school name and the high school city.
So you can see what I've
done done here is I've separated out
the high school information into a separate relation.
So first of all, is this
a correct decomposition in the
sense that A union B
equals C. Certainly all of
the attributes are still present
and furthermore, if you think
about it and we'll formalize
this concept later S1 join
S2 and that's going to occur by
the way based on this highest school code value.
S1 join S2 in this
case will, for the
data that you would expect, equal student.
Again, we'll formalize that momentarily.
Now let's look at a second
possible decomposition of the
student relation again into two relations S1 and S2.
In the first one we'll put the first bunch of attributes.
So we'll put the social security
number, the student's name,
their address, their high school code.
Let's say high school name and high school city.
And then in the second relation, we'll put again the student name.
We'll put the high school name
and we'll put, say the
GPA and lastly the priority.
So again, is this a decomposition?
Well, certainly again we have
the case that the A union
B equals C, in other
words we've captured all of
the attributes of the student
relation in our decomposed relation
and do we think it's
the case that if we join S1 and
S2 then we'll get
the student relation back and I'll
put a question mark here and
you know, of course, the answer is going to be no.
When we join back we'll
be joining in this case
on the student name here
and the high school name and
likely these are not
unique values so when we
join back, we may be
getting information together that doesn't really belong together.
And, again, we'll be formalizing
that and seeing additional examples momentarily.
So now let's dig a little further into the actual process of decomposition.
So, first of all we definitely want good decomposition.
So, as we saw a good
decomposition must capture all of the attributes of course.
But the more important property is
that this reassembly by the
join produces the original relation.
Sometimes that's called, by
the way, a lossless join property.
But the second thing that we
want is not only that
we have a good decomposition but
that the relations that we
decompose into are good relations.
And those relations are going to
be the ones that are in Boyce-Codd Normal Form.
So let me first define
formally Boyce-Codd Normal Form
and then we'll go back to figure
out an algorithm for automatically decomposing
relations using good decompositions
into decomposed relations that are in Boyce-Codd Normal Form.
So here's the formal definition of
when a relation is in Boyce-Codd
Normal Form, usually abbreviated B,
C and F.  A
relation R with functional dependencies is in
Boyce-Codd Normal Form if every
functional dependencies is such
that it's left-hand side is a key, ok.
Let's see what happens when it's
not the case that the
left hand side of a functional
dependency is not the key
and we'll see why that's a bad design.
So here's our relation R and
here's a set attributes A on
the left side of the functional
dependency, attribute B and the rest.
And let's just put in some values.
So let's suppose that we have
two tuples here with the same A value.
Then by our functional dependency,
we're going to have the same B
value and the rest can be anything.
What has happened here is
that we've captured the piece
of information the connection between A and B twice.
And the reason that's allowed to
happen is because A is not a key.
if A were a key, we
would not be allowed to
have the situation where we have
these two tuples both present in the relation.
So this relation is not in voice cod normal form.
And this functional dependency here
is what we would call a B C and F violation.
That violation is causing
us to have redundancy in our
relation and that also
give us as we've seen the
update anomalies and deletion anomalies.
Let me clarify a
little bit the requirement that the
left-hand side of functional dependencies
have to be key, so that's
what tells us we're in Boyce-Codd normal form.
Now I'm not saying that the
left-hand side of every functional
dependency has to be declared
as the primary key for a
relation, only that it is, in fact, a key.
And, as you might recall, the definition
of a key is an
attribute that determines all other
attributes, if you're thinking about functional
dependencies, or if you
don't have any duplicates in your
relation, then a key is a
value that is never duplicated across tacets.
So if you think about it for
a second, you'll realize that whenever
a set of attributes is a
key, so is any
superset of those attributes.
So if "A" is a key,
then so is ac and so is a,c,d and so on.
So sometimes you'll see in
the definition of Boyce Codd normal
form this wording not
is a key but will be
contains a key which in
fact is exactly the same
thing or sometimes it
will even say is a
super key, and a super
key is a key or a super set of a key.
Again, all of those are
saying exactly the same thing,
but I just wanted to clarify because
different wording and sometimes different
notation is used for that concept.
So far, things have been pretty abstract.
Let's try to get a bit more concrete here.
Let's look at two examples and
determine if those examples are in "B", "C", and "F".
Remember to determine is something
is in B, C, and F we
need the relational schema and a set of functional dependencies.
So here we have our student
relation and this is a
set of functional dependencies we had
in earlier examples, where the
social security number is determining the name, address, and GPA.
That means that if there's two
tuples with the same social
security number they will have the same name, address, and GPA.
That's the same student and they only live in one place.
The GPA determines the
priority, so any two
students with the same GPA will
have the same priority and, finally,
the high school code determines the high school name and city.
So the high school is a
unique identifier for a
particular high school in a city.
So those are our three functional
dependencies in order to
test whether this is relation
is in normal form with
respect to the functional dependencies
we need to know what the
key of the relation is
or the set of keys of the
relation and we worked
on this in an earlier video
using the closure idea, so
I'll just remind you now,
that for this relation, this
set of functional dependencies, there's one
key or one minimal key
and that's the social security
number together with the high
school code, those two attributes
do functionally determine all other
attributes in the relation and, therefore,
they are together, forming a key.
So now, to check if we're
in Boyce-Codd Normal Form, we
have to ask the question, "Does
every functional dependency have
a key on its left-hand side?"
and the answer, of course,
is no, not all.
In fact, the reality is
that no functional dependency, in
this case, has the key on the left hand side.
We have three left hand sides
and no of them have or contain our one key.
If you've given any thought at
all to this database design, you will see that it's not a good one.
It's combining too much information in
one place, which is our
basic idea, that we start
with a mega relation and break it down.
And so what we're going to
do is use these functional dependencies,
and specifically the fact that
those are BCNF or Boyce
Codd Normal Form violations, to break
this relation down into one that is a better design.
Now let's look at a second
example, our apply relation to
see if this one is in Boyce Codd Normal Form.
So in this case as a
reminder, we have social security
number, college, state, date and major.
So the date is the date
of application; the major is
major the student is applying
for at that particular college and
we'll have one functional dependency which
effectively says in English that
each student may apply to
each college only once and for one major.
Now let's compute the key for
this relation, or keys for
this relation based on the functional dependency.
Well, it's pretty straightforward that these
three attributes form a key
because they determine the other
attributes in the relation and therefore
they determine all the attributes of the relation.
Furthermore, we can see
that our one and only functional
dependency obviously has a
key on its left hand side
and so so this relation is in
fact already in Boyce-Codd normal
form and we'll see there's
no way to decompose this
relation further into a better design.
So here we are again, talking
about our decomposition process, so
now we know what good relations are.
They are in BCNF.
And we saw earlier what good
decompositions are, so now
we're going to give an algorithm
that's going to perform good decompositions
and those decompositions are going to
yield decomposed relations that are in Boyce Codd normal form.
So here's the algorithm.
The input is a relation
r and the set of
functional dependencies for that relation,
and the output is going
to be our good decomposition into good relations.
So let's just go through it step by step.
The first thing we're going to
do is compute keys from
r and were going to
do that using functional dependencies as
we have seen, and we saw
in the actual algorithm for doing that in a previous video.
Then we are going to start doing
the decomposition process and we're
going to repeat the decomposition
until all of the relations are in BCNF.
though we're going to take r and
break it up into smaller relations
and we might further break those
into smaller relations, and so
on, until every relation is good.
So the breaking down process says
pick a relation that's bad.
So we're going to pick a relation,
r prime in our current set,
and again, we're starting with only r in our set.
And we're going define a situation
where that relation has a functional
dependency and I guess,
technically speaking, this would be
with lines on top, that
violates Boyce-Codd Normal Form
and that violation is what's going
to guide us to do the
decomposition into better relations.
So our decomposition then,in the
second step, is going take our
prime and and it's going
to put the attributes involved
in the functional dependency into one
relation, and then it's
going to keep the left-hand side
of the functional dependency and put
the rest of the attributes along
with that left-hand side into a separate relation.
So let's draw this as a picture.
So here's our relation, r prime
and of course the first time through
our loop, that relation would have
to be r self, and then
because we have a functional
dependency from A to
B and A is not
a key - that means
it's BCNF violation  - we're
going to decompose this into R1 and R2.
So R1 will contain
the attributes in the functional dependency.
R2 will contain the left-hand
side of the functional dependency and the rest.
We can see clearly that this
is a decomposition that keeps all
attributes, and what we'll
see soon is that this is
also a good one in
that logically the join of
these two relations is guaranteed
to give us back what we had originally.
Now the remaining two lines of
the algorithm compute after the
decomposition the set of
functional dependencies for the decomposed
relations and then the keys for those.
I'm going to come back to this
particular line here after doing an example.
This is our same example, although squished
to give me more space to write.
And, as a reminder, in this
example, we've computed a couple
of the times that the key, given
the functional dependencies, is the
combination of the social
security number and the high school code.
So our goal is to take
the student relation and iteratively break
it down into smaller relations until
all of the relations are in Boyce-Codd normal form.
So let's start the iterative process.
We pick some functional dependency
that violates Boyce-Codd normal form
and we use it guide our decomposition.
So all three of these
functional dependencies actually violate Boyce-Codd
normal form because none of them have a key on the left-hand side.
So let's start with the high school code one.
So, to do the decomposition,
based on this violating functional
dependency, we're going to create two relations.
The first one is going to
contain just the three attributes
that are in the functional dependency itself,
so it's high school code, high school name and high school city.
And the second one is going
to have all remaining attributes in
the relation, plus the left
hand side, so we'll have
the social security number, the name, the address.
We will have the high school
code because it's on the left-hand
side of the functional dependency we're using,
but we won't have the name
and city from the right side
hand side and I'll have a GPA and the priority.
Now at this point our algorithm
computes the functional dependencies for the decomposed those relations.
For this particular example, we
can just see what they
are, they're the same functional dependencies
that we had for the non-decomposed relation.
Sometimes there's a little bit of
computation you have to do and I'm going to talk about that in a bit.
But, in this case, we can
see, for example, for our relation
S1, the only functional
dependency is this functional dependency here.
That tells us that high
school code here is a key for S1.
So our only functional
dependency has a key on
the left-hand side and that
tells us that this relation
is in Boyce-Codd normal form.
So we're done with that one, but we're not done with S2.
So for S2 the key is
still the social security number and
high school code together, so we
still have these two functional dependencies
that are Boyce-Codd normal form violations.
So let's take the GPA priority
one and let's guide that to decompose S3 further.
We'll decompose S2 into S3
and S4.
S3 will contain the GPA and
priority from the functional
dependency we're using, and then
S4 will take the remaining
attributes in S-2 together
with the left hand side of the GPA.
So, we'll keep our social security
number, name, address, high
school code and GPA, but we don't keep the priority.
So at this point, S-2 is
completely gone and let's take a look at the remaining relations.
S-3 now just has one
functional dependency that applies and
the left-hand side is now a key.
And so now we're done with S-2.
It's in Boyce Codd Normal
Form, but we're not done,
I'm sorry, we're done with S-3,
but we're not done yet with S-4.
S-4 still has social security and
high school code as its
key, and so we still
have a violating functional dependency, so let's decompose S-4 further.
We decompose into S-5 and S-6.
S-5 contains the attributes in
the functional dependency that we're using
now, so it's the social security
number, name, address and GPA
and then as 6 contains
the remaining attributes plus the
left hand side, so that's the social
security number and the high school code.
And I will just tell you
right now because you might
be getting bored with this example,
we're done with S-4 S5
and S6 are now both
in Boyce Codd normal form.
So this is our final schema.
It contains 4 relations, S1,
with the information about high schools,
S3 with the information
about GPA's and priorities, S5
has student with their name,
address and GPA and S6,
a student with the high school they went to.
And, if you think about
it, this really is a good
schema design and it's what's
produced automatically by the
BCNF decomposition algorithm, using our functional dependencies.
Let me mention a few more things about the algorithm.
First of all, I left this
step kind of mysterious, which
is how we compute the functional dependencies for the decomposed relations.
We can't just take the functional
dependencies we already have for the
bigger relation, and throw away
the ones that don't apply exclusively
to one or the other of the
decomposed, we actually have to
compute the functional dependencies that
are implied and that apply to these relations.
So in the video on
functional dependencies, we learned all
of implied functional dependencies,
and we would use the closure
as we did in that video
to determine the implied functional dependencies.
Now the reality is, for many
examples, it's pretty obvious,
we saw in the previous example it is.
And, the other thing is, that
this is being done by a computer
so, we don't actually have to
worry about it except when we're doing exercises for our class.
Second, let me mention that there is little nondeterminism in this algorithm.
It says "pick any of
our relations with a violating
functional dependency and use that
to guide the next decomposition step".
So, the fact is, that you
can get a different answer, depending
on which one you choose at this point in time.
All of the results will be
BCNF decomposition but they might not be the same one.
And, in fact, if you go
back to the example that I
did, and you pick the
functional dependencies in a different
order, you might get a different final schema.
But, again, it will be in BCNF.
And lastly, some presentations of
the BCNF decomposition algorithm actually
have another step here, which
is to extend the functional dependency that is used for the decomposition.
So we're using A to
B but if we have
A to B we also have
A to B and we can add
more attributes, those in the
closure of A. If you
remember the closure and that's
also a correct functional dependency.
By doing this extension, we
will still get a correct BCNF answer,
but we'll tend to get relations
that are larger than the ones
we get if we don't do the extension first.
In some cases, larger relations are
better because you don't need
to join them back when you're
doing queries, but that can
depend on the query load on the data base.
So to conclude, does BCNF guarantee a good decomposition?
Well of course the answer is
yes or I wouldn't have spent
all this time teaching you about it.
Does it remove the anomalies that
we looked at in our first
example in another video of bad relational design?
Yes, it does remove anomalies.
When we have multiple instances of
the same piece of information being
captured, that's what's squeezed
out by the decomposition into BCNF,
and that's fairly easy to
see through the examples that we've done already.
It's a little less obvious seeing
why BCNF composition does allow
us to have a breakdown of
a relation that logically reconstructs the original relation.
So let's look at that a little bit more.
So we're taking a relation
in R, we're producing R1 and
R2, and we want to
guarantee that when we
join R1 and R2, we get R back.
We don't get too few tuples and we don't get too many tuples.
Too few is pretty easy to see.
If we break R into R1
and R2 and their projections of
R then when we join
them back, certainly all the data is still present.
If they're too many tuples, that's a little bit more complicated to see.
Let's just use a simple abstract example.
So here's a relation R with three attributes.
Let's have two tuples: 1, 2, 3 and 4, 2, 5.
Let's suppose that we decompose R
into R1 and R2.
R1 is going to contain
AB, and R2 is going to contain
BC.
So let's fill in the data: in R1 we have 1, 2, 4, 2.
And in R2 we have 2, 3, 2, 5.
Now let's see what happens when
we join R1 and R2 back together.
When we do that, you might see what's going to happen.
We're actually going to get four tuples.
We're going to get "1,2,3", "1,2,5", "4,2,3" and "4,2,5".
That is not same as our original relation, so what happened?
Well, what happened is we didn't decompose based on a functional dependency.
The only way we would decompose
with "B" as the shared
attribute were if have
a functional dependency from B
to A or B to C. And we don't.
In both cases,
there's two values of B that
are the same where here the
A values are not the same
and here the C values are not the same.
So neither of these functional dependencies
hold N, B, C, and
F would not perform this decompostion.
So in fact B, C, and
F only performs decompositions when
we will get the property that they're joined back.
Again that's called a lossless join.
So, B, C, and F seems
great, we just list all our
attributes a few functional dependencies
and the systems churns on and out comes a wonderful schema.
And in general that actually is what happens.
In our student example, that worked very well.
There are, however, shortcomings of
B C and F and we will discuss those in a later video