This video covers functional dependencies. First a quick recap of relational design by decomposition. The idea is that the application designer writes mega relations that contain all the information that we want to have and properties of the data that we're storing. And then the system will automatically decompose those based on the properties that are specified. The final set of decomposed relations will satisfy what's called the normal form, and normal forms are good relations in the sense that they have no anomalies and they don't lose information from what was specified in the original mega relations. Now the properties themselves are defined either as functional dependencies in which case the system will generate Boyce-Codd Normal Form relations or multi-value dependencies, which will then yield fourth normal form relations. So this video, as you can tell, is about functional dependencies themselves. And let me say that functional dependencies are actually a generally useful concept in databases not only for relational design. So for functional dependencies, as we'll see soon, are a generalization of the notion of keys and they allow the system to, for example, store the data more efficiently when the system knows about functional dependencies. Compression schemes can be used based on functional dependencies for storage, and also functional dependencies, as a generalization of keys, can be used to reason about queries and for query optimization. Which is a reminder of a very important aspect of database systems. Which allows declarative queries to be executed by the system efficiently. By the way a third use of functional dependencies is for exam questions in database courses because there is a very nice theory functional dependencies as you'll see. It's quite easy to write questions about them. The remainder of the video will cover functional dependencies in general as a general concept and not specifically to relational design and then later videos will tie functional dependencies back to design by decomposition. As always we'll be using as a sample a college application database and this case I've expanded the information that we're including quite a bit. We'll be using these same two relations as examples throughout this video and subsequent videos on relational design. In this case we're gonna look at two relations, one with information about students and then a separate one with information about where they're applying. The student information will have social security number, the student's name, their address and then three attributes about their high school. We'll assume there are unique codes for high schools, but they also have a name and are in a city, finally the student's GPA, and a priority field for admissions that we'll see in a moment. For applications, we'll have the student's social security number, the college name they're applying to, the state of the college, the date of application and the major. Not all of these attributes will even be used in this video, but like I said, this will permeate several videos as our running example. To motivate functional dependencies, let's focus on the student relation and specifically on the GPA and priority attributes. Let's suppose that a student's priority is determined by their GPA. For example, we might have a rule that says if GPA is 3.8, greater than 3.8 then priority is 1. If the GPA is say, in between, let's say, 3.3 and 3.8. Then we'll set priority to be equal 2, and let's say if the GPA, then, is less than 3.3, then the priority value is three. So, if this relationship is guaranteed in our data then what we can say is that any two tuples that have the same priority are guaranteed to have the same GPA. And let's formalize that concept. So I'm going to write a little logical statement here to formalize the concept. I'm going to use the for all symbol from predicate logic and I'm going to say if we have any pair of tuples. So for all T or U, those are tuples in the student relation, then if the student, if the T and U have the same priorities I'm sorry the same, let me fix that, they have the the same GPA. So if T.GPA equals U.GPA, then and this is the logical implication symbol, then T.priority will equal U.priority. So this logical statement is in fact a definition of a functional dependency, and we would write that functional dependency as GPA arrow priority, so that says the GPA determines the priority, any 2 tuples with the same GPA must have the same priority. So that was a specific example. Now let's generalize our definition. So, let me replace GPA and priority here with just two attributes A and B E of say a relation R. And then we'll also need to modify our definition. So you can see, I've erased the specific attributes and relation. And I'll just say for every. T and U in our relation, R. If T dot A equals U dot A then T dot B equals U dot B and that's the definition of the functional dependency A determines B for a relation R. Actually I'm gonna generalize this definition even further because functional dependencies don't always have to have one attribute on each side, they can actually have a set of attributes. So now I write A1 A2 dot dot dot AN on the left hand side, these will all be attributes of relation R. And on the right hand side, B1 B2 comma BM, again, attributes of R. Modifying the formal definition in red, now I can't use the dot notation anymore, so I'll use a square bracket and I'll write A1 through An equals U square bracket A1 through AN, so what I'm saying here in this case is that the two tuples T and U have the same values for all of the attributes A-1 through A-N, and if they do, then they will also though have the same values for B-1 through B-M. We'll be getting to some concrete examples soon. Just one last bit of notation before we move on. For simplicity I'm going to often in the video abbreviate a list of attributes or set of attributes by using a bar. So I'll write A bar to indicate a set of attributes A and B bar to indicate a set of attributes B. And again, this is just for convenience. So we've seen the motivation for a functional dependency in a relation. A functional dependency for a relation is based on knowledge of the real world data that's being captured. And when we specify one, just like specifying keys, all instances of the relation must adhere to the functional dependency. So just to summarize functional dependencies, we say that attribute, a set of attributes A functionally determines a set of attributes B, if again any time tuples agree in their A values, they also agree in their B values. And let's say that our relation here R has [xx] tuples A the tuples and B and also a few more attributes we'll call those C. So, let me draw a picture of a relation now here that has those attributes in it. So, we'll have here.. lets just three columns, but again these are multiple attributes. And these are the attributes A, these are the attributes B, and these are the attributes C And if we put in a couple of tuples, then what we'll say is, if we have two tuples here, that have and I'm gonna use a bar even for the values in the tuples. If we have two tuples whose A values are the same then their B values must also be the same. And we're going to be using this type of template for some reasoning later on. But we're not saying their C values have to be the same so we could have C1 and and different C values here as well. But again, if we specify this functional dependency, we are saying that every instance of our relation must satisfy the condition that if the A values are the same, then the B values are also the same. Finally, let's come back to our example. And I think when we start writing functional dependencies for our actual relations it'll give you a good idea of what they're really capturing. So let's write a few functional dependencies for our student relation based on what we expect to be true in the real world in the data that we're capturing in the relation. So here's a first example, Social Security number functionally determines S-name, the student's name. So what we say, we have multiple tuples about a particular student and they have the same social security number, say two tuples about student 123, we're expecting them to have the same name in fact, we're requiring them to have the same name. And presumably because 1 to 3 is sort of identifying the student that would be a natural functional dependency that would hold in this case and similarly we would expect social security number to determine address, although we're already making an assumption about the real world here, if we have this particular functional dependency, then we're saying a student doesn't move. They don't have multiple addresses. Every tuple that describes that student by their social security number will have the same address. Let's go to the high school and see what might be going on there. So I mentioned that the high school code, what I'm trying to capture there is a unique code for each high school that might be filled in college application then we would expect the high school code to determine the high school name. Every time we have the particular high school code, maybe for different students, it would have the same name and also it would have the same city. So that's an example of a functional dependency with two attributes on the right hand side. Now, let's look at one that's a little more complicated, which is one that has two attributes on the left hand side instead. That actually turns out to be a more interesting case. In fact, in this particular case, we can probably reverse the arrow and have a functional dependency in the other direction. If we have a combination of high school name and high school city, I'm going to assume that's unique, that there aren't, there's never two high schools with the same name in the same city. And if that's the case, if that's unique, then we would expect a functional dependency to the high school code. Any time we have the same name and city, we're talking about the same high school so we should have the same code. What other examples do we have? If we assume that there's one GPA for each student then we'd have the social security number determines the GPA and we already talked about GPA determines priority and another example, actually if we put these two together we should see well if we have the same social security number twice we should have the same priority. And you may be thinking well, that's kind of a transitive rule if it takes these two and produces that one. And indeed it is. And we'll talk about rules for functional dependencies later. And there may be more in this case. Now let's take a look at functional dependencies for our apply relation. Actually, this one is a little trickier, it's even possible there are no functional dependencies at all. It really depends on the real world data, the real world constraints. One possibility for example is that every college has a particular single date on which it receives its application. So if that were the case, then we'd have the college name determines the date. In other words, every application for a particular college must have the same date. Another constraint might be that students are only allowed to apply to a single major at each college they apply to. So if that were the case, this is another one with two attributes on the left hand sid, we'd say that the social security number, together with the college, implies the major. In other words, we cannot have a student and college combination with two different majors and that captured that constraint. Maybe we have a constraint that students are only allowed to apply to colleges in one state. That seems rather unlikely, but I was struggling to find functional dependencies for this case. In that case, we'd have this function dependency, again, saying a student could only apply to colleges in a single state. For the apply relation specifically, again, it's really the real world constraints that drive which functional dependencies hold for the relation. But it's important to understand those constraints so they can be translated to functional dependencies, which then can drive a good relational design. So I've alluded a few times to the fact that functional dependencies generalize the notion of keys. And let's make that connection explicit now. Let's suppose we have a relation R and R has no duplicate tuples. R has some attributes A and it has some other attributes. Let's call those B. And let's suppose that we have a functional dependency that A determines all of the attributes in the relation. Now let me draw a picture of that. So here's our relation R with attributes A and attributes B. And now let's suppose that we have two tuples with the same values for A. Now we'll just write those as little a bar. And now let's see what happens with the B values. We'll make them B1 bar and B2 bar. Because we have the functional dependency, the equal values here for A say that B1 and B2 have to be equal. So B1 equals B2. Let's just erase the little one and two. But now we've generated duplicate tuples. So what the functional dependency tells us in that case is that these two tuples are actually the same, or rather we cannot have two separate tuples with the same A values. So we cannot have two separate tuples with the same A values is exactly what it means for A to be a key. Now again, this is only the case when we have no duplicates in R. But if we have no duplicates, if a set of attributes determines all the other attributes, then those attributes are a key. So here are a few other definitions related to functional dependencies. We have a notion of a trivial functional dependency. A functional dependency is trivial A to B if B is a subset of A. And let's draw a little picture of what that means. So here we have our attributes A, and then we're saying and that's all of the attributes here, and what we're saying is that B is a subset of A. So in other words, some portion of these attributes here, we'll just mark that in purple here are attributes B. Well, it's pretty obvious that if two tuples have the same values across their entire expanse here for A's, then obviously they're also going to have the same values for just this portion here, the B portion. So that's why it's called the trivial functional dependency. So a nontrivial functional dependency is a functional dependency that's not a trivial one. By the way, FD is a common abbreviation for functional dependency. So if we have A determines B, then that is non-trivial if it's not the case that B is a subset of A. Going to our picture, let's have here our attributes A. And now we're saying there are some attributes in B that are not part of A. So we can say, well maybe B is partially part of A, but there's some portion that is not part of A. So let's say that these are our B attributes here. So now our functional dependency is actually saying something. It's saying we have two attributes that agree in these values, then they're also going to agree in these values over here. And the last definition is a completely nontrivial functional dependency, and that's A determines B where A and B have no intersection at all. So in that case, again, going back to our picture, we'll have our A attributes here and then our B attributes are going to be some portion of the remaining attributes. And here we're saying a lot. We're saying if these two have the same value, then these two have the same value as well. And the reality is that completely nontrivial functional dependencies are the ones that we're most interested in specifying. I mentioned that there are some rules that apply to all functional dependencies and I'll give a couple of those right now. The first one is the splitting rule. The splitting rules says that if we have a set of attributes that determine another set of attributes - and this time I'm going to write it out instead of using the bar notation - then we also have...this implies that we have A determines B-1 and A determines B-2 and so on. In other words, we can split the right side of the functional dependency. And if you think about it, this is pretty If we say that the when the A value are the same all of the B values have to be the same then certainly when the A values are the same the B values have to be the same independently. Now you might wonder if the splitting rule also goes the other way. So let's say we have, I'll put the left hand side, I'll write out the attributes explicitly so let's say we have A-1 through A-N determines B then is it from that the case that A-1 determines B and A-2 determines B on its own, and so on. Well, the answer to that is no. And I'll give a simple example from our college application database. So let's say that we have the functional dependency high school name and high school city determines high school code. We talked about that one before. Oops, here, that's an arrow there. So that says that when we have a particular name and city combination for a high school, that's identifying a single high school, and so it will always have the same code. So that makes a lot of sense but it is not the case, I'll put a big X here, necessarily, that if we split the left hand side that high school name alone will determine a high school code. So for example, I would venture to guess that there are a lot of Jefferson High Schools all over the country and they won't all will be the same high school. So it's really the combination of attributes on the left that together functionally determine the right hand side and so we do not then have the splitting rule as a general principle. The combining rule is the inverse of the splitting rule. It says if we have A determines B-1 and we have A determines B-2 and so on up to A determines B-N, then we also have A determines B-1 through B-N together. Next, we have two trivial dependency rules. Let me remind remind you what a trivial dependency is. It's A determines B where B is a subset of A. So in other words, every left hand side determines itself or any subset of itself, and that's what drives the two trivial dependency rules. One of them says that if we have A determines B, then we also have A determines A union B, so in other words we can add to the right hand side of every dependency what's already on the left hand side. Sort of as a converse, we can also say that if A determines B then A determines A intersect B. Actually this one is also implied by the splitting rule. So we have two different rules in this case that are doing the same thing. And finally the transitive rule, which is the one we alluded to earlier. It says if we have A determines B and and we have B determines C, then we have A determines C. Now all of these rules can actually be proven formally and I'm going to go through a sketch of this particular one. So here's my relation R and I'll have attributes A, B, C and then let's let D be the left of our attributes. And my goal is to prove that A determines C. And the information I have to prove that is these two functional dependencies here. So to prove that A determines C, I have to prove that if two tupples have the same A values--we'll put little bars there--then they also have the same C values. So I need to show that these two C values here are going to be the same. So you can see what's going to happen. Using the first functional dependency, because these two A values are the same, I know their B values must be the same. And then I just use the second functional dependency and because the two B values are the same, I then know that the two C values are the same, and that has shown that this functional dependency holds. And you can do a similar thing to prove the other rules to yourself. Now, I'm going to introduce the concept of closure of attributes. Let's suppose we're given a relation, a set of functional dependencies for that relation, and then a set of attributes A bar that are part of that relation. I'm interested in finding all attributes B of the relation such that A bar functionally determines B. And this is what's called the closure, and I'll show in a bit why we might want to compute that. Notationally the closure is written using the + sign. So the closure of A bar is A bar +. Let me be a little more explicit, let me write out A bar, because remember whenever we write bar, we're actually talking about a list of attributes. So we're going to write it as A-1 through A-N, and I'm interested in computing the closure of that set of attributes. In other words, the entire set of attributes that are functionally determined by the attributes A-1 through A-N. And I'm going to give an algorithm for doing that. My algorithm says start with the set itself. So I'm going to start with A1 through AN, except I'm not going to close that. I'm going to leave a little space there. And then I'm going to repeat until there's no change. I'm going to add attributes to that set until I get to the closure. So I'm going to repeat. If A determines B - and now we'll put bars in here - and all of A are in the set, then add B to the set. So I might have my attributes here, A1 through AN, and it might be the case that A, you know, 4 determines attributes C and D, so I'll add C and D to the set. I repeat. Maybe there's a C goes to E and I'll add E to the set, and so on. And when there's no more change, then I've computed the complete closure. Now if you happen to be someone who likes to think in terms of rules instead, what we're effectively doing is applying the combining and transitive rules to the attributes in the set until there's no change. So let's run an example of the closure algorithm. Let's go to our complete student table and let's add three functional dependencies. One that says that student's social security number determines their name, address and GPA and that would be a normal... GPA determines priority, and high school code determines high school name and high school city. Again, these are all examples we gave earlier that would be natural functional dependencies for this particular relation. Let's suppose that we're interested in computing the closure of the two attributes - social security number and high school code. So in other words, we want to find all attributes in the student relation that are functionally determined by these two attributes. So the algorithm says start with the two attributes themselves, social security number and high school code, and then add attributes that are functionally determined by ones in the set. So if we start with our first functional dependency here, because we have social security number, that allows us to add the student name, the address, the GPA... and that's it for that one. Now let's repeat again. Because we have the GPA, our second functional dependency tells us we can add the priorityAnd that's it for that one. And then since we have the high school code in this set, our third one tells us that we can add the high school name and the high school city and at that point we certainly know we're done because we've actually got the entire relation at this point. Now I didn't pick this particular example at random. We took two attributes, social security number and high school code, we computed their closure, and we discovered that they together functionally determine all attributes of the student relation. Now remember just a few slides ago, when a set of attributes functionally determine all the attributes, then those attributes together form a key for the relation. And in fact if you think about it, social security number and high school code together are a natural key for the relation. A student might go to multiple high schools, but there's no reason to have more than one tupple with the combination of a particular student and the high school they attended. So let's formalize a bit this relationship between the notion of closure and keys. Let's suppose we're interested in answering the question... is a particular set of attributes a key for a relation we can use closure to do that. Remember, we have the relation, and we have a set of functional dependencies for the relation, so what we do is we compute the closure of A that's going to give us a set of attributes and if that equals all attributes, then A is the key. As a more general question, let's suppose we're given a set of functional dependencies for a relation, how can we use it to find all of the keys for the relation? So use the same basic idea of computing closure, but this time we have to do it for every subset of the attributes. So let's call each subset A bar and we just check if the closure of A bar determines all attributes. And if it does, then it's a key. And by considering every subset of the attributes in R, then we're considering every possible key and we'll just check for each one whether it's actually a key. Now that seems fairly inefficient and actually we can be a little more efficient than that. We can consider these subsets in increasing size order. So for example, if we determine that A and B together determine all attributes functionally determine all attributes in the relation. That tells us AB is a key, and that actually also tells us that every superset of AB is also a key. And if you think about it, that fills out naturally. So the real algorithm would say let's start with single attributes and determine if they are key. If a single attribute say C, is a key then so is every super set of C, and then we go to pairs of attributes and so on. Finally let's talk about how we specify the set of functional dependencies for a relation. First I'll define a notion of one set of functional dependencies following from another one. So let's suppose we have a relation R and we have two sets of functional dependencies that aren't identical, S-1 and S-2. We see that S2 follows from S1 if every relation instance satisfying S1 also satisfies S2. As a simple example, suppose S-2 is social security number determines priority, and suppose S-1 is the two functional dependencies - social security number determines GPA, and GPA determines priority. Then it's certainly the case that in an... for this example, S-2 follows from S-1. Every time we have a relation that satisfies social security number determines GPA and GPA determines priority, then that relation will also satisfy social security number determines priority and we kind of proved that actually in an earlier part of this video. So one question you might have is how do we test whether one set of functional dependencies follows from another. That really boils down to testing whether one functional dependency follows from a set. So and let me just make this A bar, B bar here to make clear they can be sets of attributes. Because if we have S-1 and S-2, then we just check whether every functional dependency in S-2 follows from the functional dependencies in S-1. There's actually two ways of going about this test. One of the ways is to use the closure. We'll compute A+ based on the functional dependencies that are in S and then we'll check if B is in the set. Reminder - computing the closure tells us every attribute that's functionally determined by the attributes in A based on the functional dependencies in S. If those include B, then A determines B does indeed follow from S. The second oneway to check is based on a set of axioms, a set of rules called Armstrong's axioms. We saw some rules for functional dependencies earlier, but Armstrong's Axioms are a specific set of rules that are what's called complete. It's guaranteed that if one thing about functional dependencies can be proved from another, then it can be proved using the Armstrong's Axioms. I'm not going to cover Armstrong's Axioms in the videos, but you can look at any of the recommended readings and find them there. So you might wonder why did I introduce this notion of one set of functional dependencies following from another and for that matter, why did I introduce trivial and non-trivial functional dependencies. Well, I'm going to sum up in one sentence what we're looking for when we specify the set of functional dependencies for a relation. So we have a notion of the real world data, we have our attributes but we need to specify the functional dependencies in order to get a good designer for some of the reasons that I mentioned. What we would like to find is a minimal set of completely non-trivial functional dependencies such that all functional dependencies that hold on the relation follow from using the technical definition I gave, the dependencies in this set. Wow, that seems like some very complicated thing, but the fact is when you start specifying functional dependencies, you'll discover that you will actually get this definition pretty naturally. So to conclude, functional dependencies are a generally useful concept in database systems. They 're a key ingredient of doing relational design by decomposition, because we use the functional dependencies to get Boyce-Codd Normal Form which is what we'll cover in the next video, but they're also useful for the system to determine how to store data, to compress data and also to reason about query processing.