in this video, we'll be giving a demo of constraint of several types, as a reminder constraints also known as integrity constraints impose restrictions on the allowable states of a database, beyond those that are imposed by the schema that's been defined and the types of the attributes. We have a number of different types of constraints, we have non null constraints which specified that a particular attribute cannot have no values, we have key constraints that talk about uniqueness in columns or sets of columns, we have attribute base and tuple base constraints which specify, a restrictions on the values and attributes or the values across attributes in particular tuples and finally we have general insertions which are quite powerful, they allow you to specify constraints across an entire database. As we'll see in the demo, not all of these constraint types are fully implemented. There are some limits on the attribute base and tuple base constraints in systems as compared to the sequel standard and general assertions have not been implemented yet in any database system, but we will give examples what they look like, had they been implemented. A very important type of constraint, is referential integrity or foreign key constraints and those will be covered in the next video. For a demonstration of constraints we'll be returning to the same simple college admissions database that we use for our SQL demos, we have three tables one with a few colleges, one with a number of students and finally a table that has information about students applying to colleges. Let's start by creating a table with a non-null constraint. So non-null is a pretty simple type of constraint. If we decide that our GPA values in our database must not take on the null value when we create the table we just add the key words not null in the declaration with that attribute. Let's run the creation of the table, let me mention right up front we're going to be seeing a lot of this word affected, this misspelling here which gets on my nerve but I'm not going to mention it again. Okay, so let's do some insertions and updates just to experiment with a not null constraint. We'll start by asserting three tuples, the first one has no null values at all, the second one has a null value for the high school size which should be allowed, and the third one has a null value for the GPA which should not be allowed. Let's run these three insert commands together and we see in fact the first two succeeded where the third one generated an error. If we go and look at the table, we'll see that indeed we got our first two tuples including the null for the high school size but there was no third tuple inserted. Now we'll try a couple of update commands. Both of them are going to set the GPA to null, the first one for the student with ID 123 and the second for the student with ID 456. If we look at our data, we see that we do have a student with ID 123, so when we try to update that GPA to null, we should get an error. But we don't have a student whose ID is 456, so even though we're going to run a command that tries to set GPAs to null, because there's no matching data, no data will attempt to be updated and we won't get an error. Let's run the query, the two updates, and we see indeed that the first one caused the constraint violation and the second one did not. Now let's take a look at key constraints. I've dropped the previous version of the student table and now we will create a new one where we're declaring the student ID to be what's called a primary key. As you may remember, a key constraint specifies that the values in the column that's declared as a key must be unique. So let's go ahead and create the table and now let's experiment with inserting and updating some data. We'll attempt to insert three students: first one, 123 Amy; second, 234 Bob and third one, 123 Craig. Since the third insert will generate a key violation because there will be two copies of 123 in the ID column, that one should generate an error. We run the queries, the inserts, and indeed the first two are fine and the third one has a key error. If we go and look at the data itself, we'll see that the first two are inserted and the third one wasn't. Now let's take a look at updates. The first update is very simple. It tries to set Bob's ID to 123. Since Amy already has ID 12 three that should generate and error and when we run the update command indeed it does. Now we're going to do something a little bit trickier. We're gonna run an update command that subtracts 111 from each student ID, now you might wonder why did I choose 111, let's take a look, if we subtract 111 from Bob's ID; two, three, four will turn into one, two, three and will have a key violation, on the other hand if the command first updates Amy's student ID to 12, then we won't have a key violation when Bob's in turned into two three, into one two three. So whether we get a key violation in this case could depend on what order the system chooses to execute the update. So let's just run it and let's see what happens. Well, things look good. We didn't get an error. Let's go look back and refresh the table and we see indeed that both of the update succeeded without a violation, so now let's set the as the student ID back to what they were by adding 111, let's run it see what happens, well this time we got an error. So we got a constraint violation error, a key violation, and nothing was updated. That's presumably because the system again updated Amy's ID first and that generated an error with the one two three for Amy. So this sort of demonstrates it one it can be pretty tricky when key violations or other types of constraint violations are detected and when they aren't, now we did mention earlier that there's a notion of for constraint checking so if an application have knowledge that it would rather have constraints checked after a bunch of changes rather than in the middle, the for constraint checking can be use for that purpose and this demo we're doing immediate constraint checking. You might have noticed in the previous example that I use the term primary key when I declared the student ID as a key. In the SQL standard and in every database system only one primary key is allowed per table, that's why it's called primary, and often the table will be organized based on that key, making it efficient to do look ups on that, for values for that particular key. So if we decided we wanted to declare two primary keys in our table, the student ID and the student name, we would get an error, now that's not to say we're not allow to have multiple keys in a table, in fact we can have as many as we want, only one of them can be declared as primary but we can declare any number of attributes or sets of attributes to be unique and that's again declaring a key constraint, it says we can only have one, we must have unique values in that column, so let's create our table with the student name now also a key along with the student ID and we'll do a few updates just to check that. So we'll attempt to insert five students, 1-2-3 Amy, 2-3-4 Bob, so far so good. When we try 3,4,5 Amy we should get an error because we have now declared that the name must be a key as well as the student ID so we won't be allowed to have 2 Amy. 4,5,6 door should be good. 5,6,7 Amy should again generate an error, we ran the query and indeed we get two errors. So far we seen only keys that are one attribute but as you know we can have keys that spans several attributes, that's not to say that each attribute is the key individually but rather the combination of values for all of the attributes must be unique in each tuple. So let's suppose that our college name is not unique on its own but college name and state together are expected to be unique, now syntactically we can't put the primary key statement with the attribute anymore because it involves multiple attributes, so the syntax is to list the attributes first in the Create Table command, then use the keywords primary key and put the list of attributes constituting the key in parentheses. So let's create the table. Now let's insert some data. I've tried to pick a college name that's kind of generic Mason, I don't know if I've succeeded but we'll try to answer the Mason college in California, a Mason college in New York those should succeed because the two columns together need to be unique but not the individual column and then we should get an error when we try to generate a third tuple with Mason California. We run the query, we run the inserts and indeed we do. Now lets use multi-attribute keys to declare some interesting constraints. We're going to create our apply table and we're going to have two key constraints. The first one says that the combination of student ID and college name must be unique in each tuple. What that's really saying is that each student can apply to each college only one time. We're also going to say that the combination of student ID and major must be unique in each tuple. That means that each student can apply to each major only once. Now, a student can still apply to several colleges and several majors, but only one time for each. So let's create the table and then let's try inserting some data. We'll insert quite a number of tuples and lets take a look at what we expect to happen. Our first tuple says 123 applies to Stanford and CS and then 123 also applies to Berkeley and EE, no problem. 123 tries to apply again to Stanford and that should be an error because that's the second instance of the combination of 123 Stanford. On the other hand, 234 should be able to apply to Stanford. 123 comes back and wants to go to MIT but tries once again to major in EE. That should generate an error because the combination of 123 and EE already appears in our second tuple. And finally 123 applies to MIT but in biology and that should work just fine. So we'll run the query and we'll find indeed the first two tuples and the fourth and the sixth were fine but the third tuple generated an error because of the second application to Stanford and the fifth because of the second application to EE. Let's go take a look at the data. And here we see in the apply relation that we did indeed insert the four tuples, but not the two tuples that generated the key error. Now we'll try a sneaky update command. We'll try to take our fourth tuple, and we'll identify it by having the college name equal to MIT, and we'll try to be sneaky and change the the biology major to CS. That will then violate the constraint of the uniqueness of 123 CS so if all goes well that update will be disallow, here is the update command, setting the major to CS with the college name is MIT, rerun the command and indeed it generates an error. The last thing we'll show in this example is how NULL values work with keys, so we'll try to insert two tuples again using 1,2,3, where both the college name and the major are null. So, as a reminder, the first and second attributes need to be unique in the first and third attributes need to be unique, so if NULLs counts for keys so it will generate an error, what we'll see is that we actually don't get an error and we in fact do have the data in the table with the NULL values, so the sequel standard and most database systems do allow repeated NULL values even in column that are declared as unique, for primary key declared columns most systems though not all, do not permit repeated NULL values in them. That completes our demonstration of key constraints, now let's look at attribute base check constraints. Lets create our table again with four students and this time we'll add two constraints to two of the attributes. For the GPA we're going to add the keyword check and a condition that looks kinda like the where clause in the SQL query. This condition specifies that GPAs must be less than or equal to 4.0 and greater than zero. We'll also put a check constraint on the high school size, saying that the size of the high school must be less than five thousand. So these are examples of, sort of, sanity checks, that are mostly use for catching data entry errors, saying that the attribute values must be within the expected range. Lets create the table and now we'll take a look at some data. This time we'll insert two tuples. It will be pretty easy to see how these constraints work. The first one inserts Amy with a reasonable GPA and a reasonable high school size; the second one inserts Bob with a reasonable high school size but his GPA looks a little out of whack. We run the query and the first row is inserted but the second one isn't. We take a look at the data and we see that Amy has been inserted. Now to test the constraints on the size of high school, we'll try to run an update command that multiplies all high school sizes by six. Here's the command, and when we run it, we get an error. So attribute based constraints allow us to associate a condition with a specific attribute and that condition is checked whenever we insert a tuple or update a tuple to make sure that all of the values in that attribute satisfy the constraint. A slightly more general notion is tuple based constraints. Tuple based constraints are also checked when ever a tuple is inserted or updated, but they're allowed to talk about relationships between different values in each tuple. And because we don't associate them with a specific attribute, the check itself is put at the end of the declaration of of the table. So, we start by declaring all of the attributes and then afterwards we put the keyword check again, and then the condition inside parentheses. Now this condition may look at first a little bit odd to you. It says that for each apply tuple either the decision is null or the college name is not Stanford or the major is not CS. Why don't you think about that for a second and think about what it might be saying. Now if you're good in Boolean Logic, you might have written this down using logical expressions and use some of De Morgan's laws and turned your or's and not's into implications. If not, I'll just tell you that what this is saying is that there are no people who have applied to Stanford and been admitted to CS at Stanford. Specifically, either they haven't been admitted or the college is not Stanford or the major is not CS. We'll create the table and then we'll experiment with some data. First we'll try to insert three tuples. The first one has a student applying to Stanford CS but not being admitted, second they apply to CS but it says MIT and they are admitted and then finally will generate a constraint violation by having the student apply to Stanford CS and be admitted. We run the query and, as expected the first two tuples are inserted and the third generates a violation. Now let's try some update statements. So we have a student who applied to Stanford CS and was not admitted. And with a student who applied to MITCS and was admitted. So first we'll try to take that Standford student and change the decision to yes, that's not going to work. So then we'll try taking the students admission to MIT and converting that to be an admission to Stanford and that shouldn't work either. We try all of those and neither of them succeed and both cases are tuple based constraintless check and the check condition was violated. Before I do my last set of examples, I did want to explain one thing in case you're trying these constraints at home. The constraints that I've shown so far were perfectly well in SQLite and in post risks. In my SQL as of the time of this video the check constraints both the attribute based and tuple based check constraints are accepted syntactically[sp?] by the MySQL system, but they're not enforced. So it can be a bit deceptive because you may create the tables exactly as I've done in my SQL, but then you will be allowed to insert enough data and violate the constraints. So again I recommend for trying check constraints for now SQLite or post[xx]. If you've been a shrewd observer of what we've done so far, it might have occurred to you that we had some redundancy. Specifically, the attribute base check constraints that we' ve showed can be used to enforce some other types of constraints. Very specifically if we want to have a not null constraint we can just write not null. That's a built in type of constraint. But that's equivalent to adding an attribute based check constraint that for the GPA for example checks that the GPA is not null. As a reminder is not null is a key word in the SQL language. Let's create this table and let's try to insert a tuple with a null value. We have student Amy again with a null GPA and that generates an error. A little more challenging and interesting is to try to implement key constraints using attribute based check constraints. So here's an attempt at doing so. Let's just consider a very simple table. We'll call it T and it will have one attribute A. And we'll try to write a check constraint that specifies that A is a key for T. So, here is my attempt at doing so. I declare the attribute and then in at my check I say that the value of A is not in select A from T. In other words the value of that or say attempting to insert or update is unique in table T. Well first I'm gonna tell you that I'm not allowed to execute that, there's various reasons that I can't execute it. One simple one is that I'm trying to declare a table T and refer to it before it has been declared. Another issue with declaring it is the sub query in the check constraint, we'll talk about that in a moment. There's actually a third problem with this constraint, which is we need to think about when it's being checked. If we say first attempt to insert the value A and then check the constraint, then the constraint will be violated based on the existence of itself. So this is clearly not going to work. There is in fact a different expression that might work if it weren't for a couple of other obstacles. Here's an expression that doesn't have the problem of whether we check it before or after we insert A. This is an expression of a key Constraint in a way you might not have thought of. What this says is that the number of distinct values for an attribute A must be equal to the number of tuples in the table. In other words every tuple has a distinct value for A. Now there was one small issue here which is null values because, as we mentioned, unique key constraints allow multiple instances of null. But, if we don't worry about nulls then this is expression really is a different way of saying that A is a key. We run the query and it doesn't allow it. Again we have the same problem that we're referring to table T within the check constraint that we're putting in the definition of table T. By the way, that can be overcome. Some systems do allow constraints to be declared or added to tables after the table has been specified. So that would go away. But no systems that I know of allow sub queries and especially not aggregation within check constraints. Let's pursue a little further the question of subqueries and check constraints. The key example's a little bit contrived because of course we can declare key constraints directly. But in some cases are very natural constraint that we might want to express a check constraint using sub query. And I've set up a situation right here. We create our student table as usual but when we create our apply table we want to have a constraint that says that any student ID that appears in the apply table is a valid student. In other words, there is a student coupled with that student ID. Now, we can write that as a check constraint. This is syntactically valid in the SQL standard. We specify that the student ID here in the apply table is in the set of student IDs in the student table but currently no SQL system actually supports sub queries and check constraints. Now for this, the civic type of constraint it happens to fall into a class that is known as referential integrity where we say that this student ID is referencing a student in the other, is referencing the student id in the student table and therefore any student id and apply must also exist in the student and in another video we will referential integrity in some detail. But, not every check constraint with a subquery falls in the class of referential integrity constraints. The example I gave for keys doesn't and neither does the one here. Now, this is admittedly a little contrived. But what this says is that every college's enrollment must be bigger than any high school and so we write that by writing the check constraint in the college table that the enrollment is greater than the maximum high school size from the student table. Now, again, no system currently will support this. However, it is in the SQL standard. Now one thing I want to mention about check constraints with subqueries is that they can be kind of deceptive. And we can take a look at the apply table again. Supposing this was in fact supported by a system. It would check whenever we inserted a tuple into apply or it updated a student ID in apply that the constraint holds. But what it will not check is when things change in student. So we could write this constraint and every time we do an insert or update and apply it could be verified but somebody could go and change the student table and delete a student ID and then what we feel as the constraint here is no longer actually holding. So it can be tricky to use those subqueries. Again, when we do referential integrity as in this example, the referential integrity system will take care of making sure the constraint holds. But when we have an example like the one with the enrollments, if we say change the high school size in the student table, it would not activate this constraint checking that's specified with the college table. The last type of constraint I am going to show are general assertions. General assertions are very powerful and they are in the SQL standard, but unfortunately they currently are not supported by any database system. The first assertion I'm going to write is coming back to the issue of trying to declare or trying to enforce a key constraint without using the built-in facilities. Let me just write the command here. It says we're going to create a assertion called key. Notice that this assertion is not associated with a specific table. Although this assertion only talks about table T Assertions can refer to any number of tables in the database. The way an assertion works is we create an assertion and we give it a name, and the reason for this is so we can delete it later if we wish. Then the keyword check appears. And then we write a condition. And the conditions can be quite complicated. The assertion is saying that this condition, written in SQL-like language must always be true on the database. So the particular condition that I've put in this first example is a condition we use to check whether attribute A is a key in table T. It says that the number of distinct values in attribute A must be equal to the number tuples in T. Now I can try to run this, but I guarantee you that it's not supported. Let's look at some other example assertions we might write if they were supported. Here's an example that implements this referential integrity that I was describing earlier. This referential integrity constraint is saying that the student IDs in the apply table must also exist in the student table. Now when we write an assertion of that form, we tend to often write it in the negative form, specifically we say that it's not the case that something bad happens. It's not the case that there's some tuple in apply where the student ID is not in the student table. You can try on your own to write this in a more positive fashion, but you'll actually find that using SQL constructs it's not possible. It's actually very common for assertions to specify the bad thing in a subquery and then write not exists. As a final example, let's suppose that we require that the average GPA of students who are accepted to college is greater than 3. 0. We can write that as an assertion pretty much exactly as I just described it. We take the average GPA of students where their ID is among the IDs in the apply relation where the decision was yes, and our assertion states that that average must be greater than 3.0 So so far I've described how assertions are created. Let me just briefly mention how they are checked or how they would be checked if they were implemented. Any system that implements this very general form of assertion must determine every possible change to the database that could violate the assertion. In this case, modifying a GPA, modifying a student ID, inserting or deleting from students or apply could all potentially violate the constraint. And in that case, after each of those types of modifications to the database, the system would need to check the constraint, make sure that it's still satisfied, and if not, generate an error and disallow the database change. So I've only talked about creating assertions. Let me just talk very briefly about how a system would enforce general assertions of this form if it supported them. What the system needs to do, is monitor every possible change to the database that could cause the assertion to become violated. So we take a look at this particular assertion, it could become violated if we changed the student GPA. If we inserted a student, even if we deleted a student, or if we inserted an application that was now having a decision of yes, or updated the application status. So all of those changes have to be monitored by the system, the constraint has to be checked after the change, and if the constraint is no longer satisfied, an error is generated, and the change is undone. That concludes our discussion of constraints with the exception of referential integrity which is covered in a separate video, and the related topic of triggers, which is also covered in a separate video.