This video covers functional dependencies.
First a quick recap of relational design by decomposition.
The idea is that the application
designer writes mega relations
that contain all the information that
we want to have and properties of the data that we're storing.
And then the system will automatically
decompose those based on the properties that are specified.
The final set of decomposed relations
will satisfy what's called the
normal form, and normal forms
are good relations in the sense
that they have no anomalies and they
don't lose information from what
was specified in the original mega relations.
Now the properties themselves are defined
either as functional dependencies in
which case the system will
generate Boyce-Codd Normal Form relations
or multi-value dependencies, which
will then yield fourth normal form relations.
So this video, as you
can tell, is about functional dependencies themselves.
And let me say that functional
dependencies are actually a generally
useful concept in databases not only for relational design.
So for functional dependencies,
as we'll see soon, are a generalization
of the notion of keys and they
allow the system to, for
example, store the data more
efficiently when the system knows about functional dependencies.
Compression schemes can be
used based on functional dependencies for
storage, and also functional
dependencies, as a generalization of
keys, can be used to
reason about queries and for query optimization.
Which is a reminder of a very important aspect of database systems.
Which allows declarative queries to be executed by the system efficiently.
By the way a third use of
functional dependencies is for exam
questions in database courses because
there is a very nice theory functional dependencies as you'll see.
It's quite easy to write questions about them.
The remainder of the video will
cover functional dependencies in general
as a general concept and not
specifically to relational design and
then later videos will tie functional
dependencies back to design by decomposition.
As always we'll be using as
a sample a college application
database and this case I've
expanded the information that we're including quite a bit.
We'll be using these same two
relations as examples throughout
this video and subsequent videos on relational design.
In this case we're gonna
look at two relations, one with
information about students and then
a separate one with information about where they're applying.
The student information will have social
security number, the student's name,
their address and then three
attributes about their high school. We'll assume
there are unique codes for high
schools, but they also
have a name and are in
a city, finally the student's
GPA, and a priority
field for admissions that we'll see in a moment.
For applications, we'll have the
student's social security number, the
college name they're applying to,
the state of the college, the date of application and the major.
Not all of these attributes
will even be used in this
video, but like I said,
this will permeate several videos as our running example.
To motivate functional dependencies, let's
focus on the student relation
and specifically on the GPA and priority attributes.
Let's suppose that a student's
priority is determined by their GPA.
For example, we might have
a rule that says if GPA
is 3.8, greater than 3.8 then priority is 1.
If the GPA is
say, in between, let's say, 3.3 and 3.8.
Then we'll set priority
to be equal 2, and let's
say if the GPA, then, is
less than 3.3, then the priority value is three.
So, if this relationship is guaranteed
in our data then what we
can say is that any two
tuples that have the same
priority are guaranteed to have the same GPA.
And let's formalize that concept.
So I'm going to write
a little logical statement here to formalize the concept.
I'm going to use the for all
symbol from predicate logic and
I'm going to say if we have any pair of tuples.
So for all T or U,
those are tuples in the
student relation, then if
the student, if the T
and U have the same priorities
I'm sorry the same, let me
fix that, they have the the same GPA.
So if T.GPA equals U.GPA,
then and this is the
logical implication symbol, then
T.priority will equal U.priority.
So this logical statement is in
fact a definition of a functional
dependency, and we would
write that functional dependency as
GPA arrow priority, so
that says the GPA determines
the priority, any 2 tuples
with the same GPA must have the same priority.
So that was a specific example.
Now let's generalize our definition.
So, let me replace GPA
and priority here with just
two attributes A and B
E of say a relation
R. And then we'll also need to modify our definition.
So you can see, I've erased the specific attributes and relation.
And I'll just say for every.
T and U in our
relation, R. If T
dot A equals U
dot A then T dot
B equals U dot B
and that's the definition of
the functional dependency A determines B
for a relation R. 
Actually
I'm gonna generalize this definition even further
because functional dependencies don't
always have to have one
attribute on each side, they can actually have a set of attributes.
So now I write A1 A2
dot dot dot AN on
the left hand side, these will
all be attributes of relation
R. And on the right
hand side, B1 B2 comma
BM, again, attributes
of R. 
Modifying the formal
definition in red, now I
can't use the dot notation anymore,
so I'll use a square bracket and
I'll write A1 through An
equals U square bracket A1
through AN, so what
I'm saying here in this
case is that the two
tuples T and U
have the same values for
all of the attributes A-1 through
A-N, and if they do,
then they will also though have
the same values for B-1 through B-M.
We'll be getting to
some concrete examples soon.
Just one last bit of notation before we move on.
For simplicity I'm going
to often in the video
abbreviate a list of
attributes or set of attributes by using a bar.
So I'll write A bar to
indicate a set of attributes A
and B bar to indicate
a set of attributes B. And again, this is just for convenience.
So we've seen the motivation for a functional dependency in a relation.
A functional dependency for a
relation is based on knowledge of
the real world data that's being captured.
And when we specify one, just
like specifying keys, all instances
of the relation must adhere to the functional dependency.
So just to summarize functional dependencies,
we say that attribute, a
set of attributes A functionally
determines a set of attributes
B, if again any
time tuples agree in their
A values, they also agree in their B values.
And let's say that our relation
here R has [xx] tuples
A the tuples and B
and also a few more
attributes we'll call those C.
So, let me draw a picture
of a relation now here that
has those attributes in it.
So, we'll have here..
lets just three columns, but again these are multiple attributes.
And these are the attributes
A, these are the attributes B,
and these are the attributes C
And if we put in a couple
of tuples, then what we'll
say is, if we have
two tuples here, that have
and I'm gonna use a bar even for the values in the tuples.
If we have two tuples whose A
values are the same then their
B values must also be the same.
And we're going to be using this type of template for some reasoning later on.
But we're not saying their C values
have to be the same so we
could have C1 and and different C values here as well.
But again, if we specify this
functional dependency, we are saying
that every instance of our
relation must satisfy the
condition that if the A
values are the same, then the B values are also the same.
Finally, let's come back to our example.
And I think when we start writing
functional dependencies for our actual
relations it'll give you a good idea of what they're really capturing.
So let's write a few
functional dependencies for our student
relation based on what we
expect to be true in
the real world in the data that we're capturing in the relation.
So here's a first example, Social
Security number functionally determines S-name, the student's name.
So what we say, we have
multiple tuples about a
particular student and they have
the same social security number, say
two tuples about student
123, we're expecting them to
have the same name in fact,
we're requiring them to have the same name.
And presumably because 1 to
3 is sort of identifying the
student that would be
a natural functional dependency that
would hold in this case and
similarly we would expect social security
number to determine address, although
we're already making an assumption
about the real world here, if
we have this particular functional dependency,
then we're saying a student doesn't move.
They don't have multiple addresses.
Every tuple that describes that
student by their social security
number will have the same address.
Let's go to the high school and see what might be going on there.
So I mentioned that the
high school code, what I'm
trying to capture there is
a unique code for each
high school that might be filled
in college application then we
would expect the high school code
to determine the high school name.
Every time we have the particular
high school code, maybe for different
students, it would have the same
name and also it would have the same city.
So that's an example of a
functional dependency with two attributes on the right hand side.
Now, let's look at one
that's a little more complicated, which
is one that has two attributes
on the left hand side instead.
That actually turns out to be a more interesting case.
In fact, in this particular case,
we can probably reverse the arrow
and have a functional dependency in the other direction.
If we have a combination of
high school name and high school
city, I'm going to
assume that's unique, that there aren't,
there's never two high
schools with the same name in the same city.
And if that's the case,
if that's unique, then we would
expect a functional dependency to the high school code.
Any time we have the same
name and city, we're talking about
the same high school so we should have the same code.
What other examples do we have?
If we assume that there's one
GPA for each student then
we'd have the social security number
determines the GPA and we
already talked about GPA determines
priority and another
example, actually if we
put these two together we should
see well if we have the
same social security number twice we should have the same priority.
And you may be thinking
well, that's kind of a
transitive rule if it takes these two and produces that one.
And indeed it is.
And we'll talk about rules for functional dependencies later.
And there may be more in this case.
Now let's take a look at functional dependencies for our apply relation.
Actually, this one is a
little trickier, it's even possible
there are no functional dependencies at all.
It really depends on the real world data, the real world constraints.
One possibility for example is
that every college has a
particular single date on which it receives its application.
So if that were the case, then
we'd have the college name determines the date.
In other words, every application
for a particular college must have the same date.
Another constraint might be that
students are only allowed to
apply to a single major at each college they apply to.
So if that were the case, this
is another one with two attributes
on the left hand sid, we'd
say that the social security number,
together with the college, implies the major.
In other words, we cannot have
a student and college combination
with two different majors and that captured that constraint.
Maybe we have a constraint
that students are only allowed to apply to colleges in one state.
That seems rather unlikely, but I
was struggling to find functional
dependencies for this case.
In that case, we'd have this
function dependency, again, saying a
student could only apply to colleges in a single state.
For the apply relation specifically, again,
it's really the real world constraints
that drive which functional dependencies hold for the relation.
But it's important to understand those
constraints so they can be
translated to functional dependencies, which
then can drive a good relational design.
So I've alluded a few
times to the fact that functional
dependencies generalize the notion of keys.
And let's make that connection explicit now.
Let's suppose we have a relation
R and R has no duplicate tuples.
R has some attributes A and it has some other attributes.
Let's call those B.  And let's
suppose that we have a functional dependency that
A determines all of the attributes in the relation.
Now let me draw a picture of that.
So here's our relation R with
attributes A and attributes B.
And now let's suppose that
we have two tuples with the
same values for A.  Now
we'll just write those as little a bar.
And now let's see what happens with the B values.
We'll make them B1 bar and B2 bar.
Because we have the functional
dependency, the equal values
here for A say that B1 and B2 have to be equal.
So B1 equals B2.
Let's just erase the little one and two.
But now we've generated duplicate tuples.
So what the functional
dependency tells us in
that case is that these two tuples
are actually the same, or rather
we cannot have two separate
tuples with the same A values.
So we cannot have two separate
tuples with the same A values
is exactly what it means
for A to be a key.
Now again, this is only the case when we have
no duplicates in R. But
if we have no duplicates, if
a set of attributes determines
all the other attributes, then those attributes are a key.
So here are a few other definitions related to functional dependencies.
We have a notion of a trivial functional dependency.
A functional dependency is trivial
A to B if B
is a subset of A. And
let's draw a little picture of what that means.
So here we have our attributes
A, and then we're saying
and that's all of the attributes
here, and what we're saying is
that B is a subset
of A. So in other words,
some portion of these attributes
here, we'll just mark that in
purple here are attributes
B. Well, it's pretty obvious
that if two tuples have
the same values across their
entire expanse here for A's,
then obviously they're also going
to have the same values for
just this portion here, the B portion.
So that's why it's called the trivial functional dependency.
So a nontrivial functional dependency
is a functional dependency that's not a trivial one.
By the way, FD is a common abbreviation for functional dependency.
So if we have A determines
B, then that is
non-trivial if it's not
the case that B is a
subset of A.  Going to
our picture, let's have here
our attributes A. And now
we're saying there are some
attributes in B that are
not part of A. So
we can say, well maybe B
is partially part of A,
but there's some portion that is
not part of A. So
let's say that these are our B attributes here.
So now our functional dependency
is actually saying something.
It's saying we have two
attributes that agree in these
values, then they're also
going to agree in these values over here.
And the last definition is
a completely nontrivial functional dependency,
and that's A determines B
where A and B have no intersection at all.
So in that case, again, going
back to our picture, we'll have
our A attributes here and
then our B attributes are going
to be some portion of the remaining attributes.
And here we're saying a lot.
We're saying if these two have
the same value, then these two have the same value as well.
And the reality is that completely
nontrivial functional dependencies are the
ones that we're most interested in specifying.
I mentioned that there are some
rules that apply to all
functional dependencies and I'll give a couple of those right now.
The first one is the splitting rule.
The splitting rules says that if
we have a set of attributes
that determine another set of
attributes - and this time I'm
going to write it out instead of using
the bar notation - then
we also have...this implies that
we have A determines B-1
and A determines B-2 and so on.
In other words, we can split the right side of the functional dependency.
And if you think about it,
this is pretty If we say
that the when the A value
are the same all of the
B values have to be the
same then certainly when the
A values are the same the B values have to be the same independently.
Now you might wonder if the splitting rule also goes the other way.
So let's say we have, I'll
put the left hand side, I'll
write out the attributes explicitly so
let's say we have A-1
through A-N determines B then is
it from that the case
that A-1 determines B and
A-2 determines B on its own, and so on.
Well, the answer to that is no.
And I'll give a simple example from our college application database.
So let's say that we
have the functional dependency high school
name and high school
city determines high school code.
We talked about that one before.
Oops, here, that's an arrow there.
So that says that when we
have a particular name and
city combination for a high
school, that's identifying a single
high school, and so it will always have the same code.
So that makes a lot of sense
but it is not the case,
I'll put a big X here,
necessarily, that if we
split the left hand side
that high school name alone will determine a high school code.
So for example, I would
venture to guess that there
are a lot of Jefferson High Schools
all over the country and they
won't all will be the same high school.
So it's really the combination of attributes
on the left that together functionally
determine the right hand side
and so we do not then have
the splitting rule as a general principle.
The combining rule is the inverse of the splitting rule.
It says if we have A
determines B-1 and we have
A determines B-2 and so
on up to A determines
B-N, then we also
have A determines B-1 through B-N together.
Next, we have two trivial dependency rules.
Let me remind remind you what a trivial dependency is.
It's A determines B where
B is a subset of A.
So in other words, every left hand
side determines itself or any
subset of itself, and that's
what drives the two trivial dependency rules.
One of them says that if
we have A determines B,
then we also have A determines
A union B, so in
other words we can add to
the right hand side of every
dependency what's already on the left hand side.
Sort of as a converse,
we can also say that if
A determines B then A
determines A intersect B.
Actually this one is also
implied by the splitting rule.
So we have two different rules in this case that are doing the same thing.
And finally the transitive rule, which is the one we alluded to earlier.
It says if we have A
determines B and and
we have B determines C, then
we have A determines C.
Now all of these rules
can actually be proven formally
and I'm going to go through
a sketch of this particular one.
So here's my relation R and
I'll have attributes A, B,
C and then let's let D be the left of our attributes.
And my goal is to prove
that A determines C. And
the information I have to
prove that is these two functional dependencies here.
So to prove that A determines
C, I have to
prove that if two tupples
have the same A values--we'll put
little bars there--then they also have the same C values.
So I need to show that these two C values here are going to be the same.
So you can see what's going to happen.
Using the first functional dependency,
because these two A values
are the same, I know their B values must be the same.
And then I just use the second
functional dependency and because
the two B values are the
same, I then know that
the two C values are the
same, and that has
shown that this functional dependency holds.
And you can do a
similar thing to prove the other rules to yourself.
Now, I'm going to introduce the concept of closure of attributes.
Let's suppose we're given a
relation, a set of
functional dependencies for that
relation, and then a set
of attributes A bar that are part of that relation.
I'm interested in finding all attributes
B of the relation such
that A bar functionally determines
B.  And this is what's
called the closure, and I'll
show in a bit why we might want to compute that.
Notationally the closure is written using the +  sign.
So the closure of A bar is A bar +.
Let me be a little
more explicit, let me write
out A bar, because remember
whenever we write bar, we're actually talking about a list of attributes.
So we're going to write it
as A-1 through A-N, and I'm interested
in computing the closure of that set of attributes.
In other words, the entire set
of attributes that are functionally
determined by the attributes A-1 through A-N.
And I'm going to give an algorithm for doing that.
My algorithm says start with the set itself.
So I'm going to start with
A1 through AN, except I'm not going
to close that.
I'm going to leave a little space there.
And then I'm going to repeat until there's no change.
I'm going to add attributes to
that set until I get to the closure.
So I'm going to repeat.
If A determines B
- and now we'll put bars
in here - and all
of A are in the set,
then add B to the set.
So I might have my
attributes here, A1 through AN,
and it might be the case that
A, you know, 4 determines
attributes C and D,
so I'll add C and D to the set.
I repeat.
Maybe there's a C goes
to E and I'll add E to the set, and so on.
And when there's no more change, then I've computed the complete closure.
Now if you happen to be someone
who likes to think in terms of
rules instead, what we're effectively
doing is applying the combining
and transitive rules to the
attributes in the set until there's no change.
So let's run an example of the closure algorithm.
Let's go to our complete student
table and let's add three functional dependencies.
One that says that student's social
security number determines their name,
address and GPA and that
would be a normal... GPA determines
priority, and high school
code determines high school name and high school city.
Again, these are all examples we
gave earlier that would be
natural functional dependencies for this particular relation.
Let's suppose that we're interested
in computing the closure of
the two attributes - social security
number and high school code.
So in other words, we want
to find all attributes in
the student relation that are functionally
determined by these two attributes.
So the algorithm says start with
the two attributes themselves, social security
number and high school code,
and then add attributes that
are functionally determined by ones in the set.
So if we start with our
first functional dependency here, because
we have social security number, that
allows us to add the
student name, the address,
the GPA...  and that's it for that one.
Now let's repeat again.
Because we have the GPA, our
second functional dependency tells
us we can add the priorityAnd that's it for that one.
And then since we have the
high school code in this set,
our third one tells us
that we can add the high school
name and the high school
city and at that
point we certainly know we're done
because we've actually got the entire relation at this point.
Now I didn't pick this particular example at random.
We took two attributes, social security
number and high school code, we
computed their closure, and we
discovered that they together functionally
determine all attributes of the student relation.
Now remember just a few slides
ago, when a set
of attributes functionally determine all the
attributes, then those attributes together form a key for the relation.
And in fact if you think
about it, social security number and
high school code together are a natural key for the relation.
A student might go to
multiple high schools, but there's
no reason to have more
than one tupple with the
combination of a particular
student and the high school they attended.
So let's formalize a bit this
relationship between the notion of closure and keys.
Let's suppose we're interested in
answering the question... is a
particular set of attributes
a key for a relation we can use closure to do that.
Remember, we have the relation,
and we have a set of
functional dependencies for the relation,
so what we do is we
compute the closure of
A that's going to give
us a set of attributes and
if that equals all attributes, then
A is the key.
As a more general question, let's
suppose we're given a set
of functional dependencies for a
relation, how can we use
it to find all of the keys for the relation?
So use the same basic idea
of computing closure, but this
time we have to do it for
every subset of the attributes.
So let's call each subset A
bar and we just
check if the closure of A bar determines all attributes.
And if it does, then it's a key.
And by considering every subset of
the attributes in R, then
we're considering every possible key
and we'll just check for
each one whether it's actually a key.
Now that seems fairly inefficient
and actually we can be a little more efficient than that.
We can consider these subsets in
increasing size order.
So for example, if
we determine that A and
B together determine all
attributes functionally determine all attributes in the relation.
That tells us AB is
a key, and that actually
also tells us that every
superset of AB is also a key.
And if you think about it, that fills out naturally.
So the real algorithm would
say let's start with single attributes and determine if they are key.
If a single attribute say
C, is a key then so is
every super set of C, and
then we go to pairs of
attributes and so on.
Finally let's talk about how we
specify the set of functional dependencies for a relation.
First I'll define a notion of
one set of functional dependencies following from another one.
So let's suppose we have a
relation R and we have
two sets of functional
dependencies that aren't identical, S-1 and S-2.
We see that S2 follows from
S1 if every relation instance
satisfying S1 also satisfies S2.
As a simple example, suppose S-2
is social security number determines
priority, and suppose S-1
is the two functional dependencies
-  social security number determines GPA,
and GPA determines priority.
Then it's certainly the case that
in an... for this example, S-2 follows from S-1.
Every time we have a
relation that satisfies social security
number determines GPA and GPA
determines priority, then that
relation will also satisfy social
security number determines priority and
we kind of proved that actually in an earlier part of this video.
So one question you might
have is how do we test whether
one set of functional dependencies follows from another.
That really boils down to
testing whether one functional dependency follows from a set.
So and let me just make
this A bar, B bar here to make clear they can be sets of attributes.
Because if we have S-1 and
S-2, then we just check whether
every functional dependency in S-2
follows from the functional dependencies in S-1.
There's actually two ways of going about this test.
One of the ways is to use the closure.
We'll compute A+ based on
the functional dependencies that are
in S and then we'll
check if B is in the set.
Reminder - computing the
closure tells us every attribute
that's functionally determined by the
attributes in A based on
the functional dependencies in S.
If those include B, then
A determines B does indeed
follow from S. The
second oneway to check is based
on a set of axioms, a
set of rules called Armstrong's axioms.
We saw some rules for functional
dependencies earlier, but Armstrong's
Axioms are a specific set
of rules that are what's called complete.
It's guaranteed that if one
thing about functional dependencies can be
proved from another, then it
can be proved using the Armstrong's Axioms.
I'm not going to cover Armstrong's Axioms
in the videos, but you can
look at any of the recommended readings and find them there.
So you might wonder why did
I introduce this notion of one
set of functional dependencies following from
another and for that matter,
why did I introduce trivial and
non-trivial functional dependencies.
Well, I'm going to sum
up in one sentence what
we're looking for when we specify
the set of functional dependencies for a relation.
So we have a notion of the
real world data, we have
our attributes but we need
to specify the functional dependencies in
order to get a good designer for some of the reasons that I mentioned.
What we would like to find
is a minimal set of
completely non-trivial functional dependencies
such that all functional dependencies
that hold on the relation follow
from using the technical definition I
gave, the dependencies in this set.
Wow, that seems like some
very complicated thing, but the
fact is when you start
specifying functional dependencies, you'll discover
that you will actually get this definition pretty naturally.
So to conclude, functional dependencies are
a generally useful concept in database systems.
They 're a key ingredient
of doing relational design by decomposition,
because we use the functional dependencies
to get Boyce-Codd Normal Form which
is what we'll cover in the next
video, but they're also
useful for the system to
determine how to store data,
to compress data and also
to reason about query processing.