In this demo, we'll be learning some more features of the SQL language. Specifically, we'll be learning about table variables and about set operators. We already learned the basic select statement. which can be quite powerful for writing queries but we'll learn some constructs in these demos that will give us even more expressive power. The first construct is table variables. Table variables are in the FROM clause, and they actually serve two uses. One is simply to make queries more readable, as we'll see. But a second purpose is to rename relations that are used in the FROM clause, particularly when we have two instances of the same relation. This is exactly what we needed in the relational algebra when we wrote joins that included two instances of the same relation. The second construct we'll be learning, actually a set of constructs, in this video, are the set operators. And we'll be learning the same three set operators we had in relational algebra: the union operator, the intersect operator, and the except operator which is the minus operator. We'll be doing a demo and the demo will use the same college admissions database that we've been using in previous demos where we have tables about college information, student information and students applying to colleges. Let's move to the demo. Let's start with a big join query that we'll use to introduce table variables. This query involves all three relations. It joins the three relations on their shared attributes and then it selects a bunch of information. So here we see the result of that query. So the main point of this query is not the result but just to show you how table variables are used in the FROM clause. We can add two, each of our relation names, a variable. We'll use S for student and C for college and A for apply. And then everywhere else in the query, instead of writing the full relation name we can just use the variable. In this case we're not changing the expressiveness, we're not changing the outcome of the query, we're really just making it a bit more readable and we can do the same thing here in this left clause. We'll take S and A and so on. Then we'll run the query and we'll get exactly the same result, no change. Now let's look at where table variables are actually useful. What we want to get in this query is all pairs of students who have the same GPA. This is kind of similar to the relational algebra query we did where we found all pairs of colleges that are in the same state. In order to do that we need to have two instances of the student relation. So we'll call one instance, S1, and we'll call the other instance S2. And the FROM will do the cross-product of those two, so it will consider every every possible pair of students from the student relation. From all those pairs, we'll take the pairs where the student had the same GPA and will return the ID, name, and GPA for each of the two students. So let's go ahead and execute the query; and here we can see the result. Now, this result is exactly what we wrote. It literally is every pair of students that have the same GPA, but it might not be what we intended. Amy and Amy, the same student. Well Amy has the same GPA as herself, but more likely we just wanted different students who had the same GPA. So to do that we'll add an and that says these are two different students. The SIDs of the students are different. Now, let's run the query and see what happens. Now, we see that we no longer have Amy and Amy, and every student is paired with a different student. We do have two Amy's here, but don't be alarmed, this Amy is 123 and this Amy is 654. So things are looking quite a bit better, but there's still one thing that we might not want in the result of the query which is that we have Amy paired with Doris and then we have Doris paired with Amy. So we're actually getting every pair of students twice in the two different orders. As it turns out, that's very easy to fix. We only need to erase one character to make that work. Maybe you can think about what that character is. Here it is. Instead of looking at not equals, we'll just make it less than. And then we'll get every pair of students only once because we'll always be listing the one with the smaller SID first and finally we get the answer that we probably intended in the first place. Now let's take a look at the set operators and we'll start with union. Just like in our relational algebra video let's use the union operator to generate a list that includes names of colleges together with names of students. So here's the query that will do it for us and we go ahead and execute the query and we see our result. Now I left the schema as having the C name in the first part of the union and the S name in the second. SQL allowed me to do that and it chose to use the C name to label the result. If I want to unify the schemas of the two sides of the union and give a new label for the result, I use the "as" as we saw earlier for re-naming attributes in the result of queries. So I'll add "as name" to both sides of the union, run the query, and now I see name in the result. Now one thing you might have noticed is that this result is actually sorted. We didn't ask for it to be sorted, but for some reason the system sorted it for us. And I can actually explain why that happened. I'll also mention that if I ran this same query on another system, it might not come out sorted. In fact, it will not come out sorted because I tried it. Here's the deal. The union operator in SQL by default eliminates duplicates in its results. So if we have two Amy's, which in fact we do, we only get one Amy in our result. And similarly, for Craig, we have two of those as well. So that's the default, and it so happens the system I'm using today, which is called SQL Lite, eliminates duplicates gets by sorting the result. So, it sorts the result, looks for adjacent pairs that are the same and eliminates all but one of those, and then it gives us the answer. But again, I want to emphasize that's not something one can count on when one runs the same query on a different system or even on the same system on a different day. Now, if we want to have the duplicates in our result, that's something we can do quite easily. We add to union the word all that will turn the set operator into what's technically a multi-set operator that retains duplicates. We run the query. Well, the first thing we notice is it's not sorted anymore. That's because it didn't need to eliminate the duplicates. But if we look closely, we'll also see that the duplicates are now there. We have two Amys, for example, and we have two Craigs as well. If we want this result to be sorted and to guarantee that the other one's sorted, we would add an order by clause. So we can just say order by name. We run the query and now we have the result in sorted order. Our next query demonstrates the intersect operator. This query is going to get the IDs of all students who have applied to both CS for a major and EE for a major. So, very simple query. We get the IDs of students who applied to CS, the IDs of students who applied to EE, and then we perform the intersect operator on the result of those two queries. We execute it, and we find that there are indeed two students who applied to CS and EE. Some database systems don't support the intersect operator. They don't lose any expressive power. We just have to write our queries in different ways. So, this next query is computing exactly the same thing. The sIDs of students who have applied to both CS and EE, but this time we're doing it by doing two instances of the apply relation. One of these self joins, so we have to use table variables again, so we take every pair of apply tuples, we look at cases where it's the same student, and in one case, they're applying for CS, in the other case they're applying for EE, and we'll return the sID of those students. So we run the query and we get sort of the same answer, but not exactly, because we have a whole bunch of duplicates now that we didn't get when we did it with an intersect operator. Now, where did those duplicates come from? Let's take a look at the apply relation itself. Here, we see that student 123 applied to that there are indeed two students who applied to CS and EE. Some database systems don't support the intersect operator. They don't lose any expressive power. We just have to write our queries in different ways. So this next query is computing exactly the same thing, the SIDs of students who have applied to both CS and EE, but this time we're doing it by doing two instances of the apply relation, one of these self joins, so we have to use table variables again. So we take every pair of apply tuples, we look at cases where it's the same student, and in one case, they're applying for CS. In the other case, they're applying for EE, and we'll return the SID of those students. So we run the query and we get sort of the same answer, but not exactly because we have a whole bunch of duplicates now that we didn't get when we did it with an intersect operator. Now where did those duplicates come from? Let's take a look at the apply relation itself. Here we see that student 123 applied to CS and to EE and to CS again and to EE again. And we're gonna get all pairs of 123 tuples, where one pair pair of the tuples is CS and the other is EE. So, we'll get CS with EE, CS with EE and so on. Going back to our query result. Here it is. We can see that we got the four 123's when we ran the query. Well, that's easy to get rid of. We just write select distinct and that will get rid of duplicates and now we're back to our original query result. Now instead of finding students who applied to both CS and EE, let's find students who applied to CS but did not apply to EE. For that we need the difference operator. It's called difference in relational Algebra. Sometimes, it's called minus. The word that's used in the SQL standard is the word except. So here's our query. We find the student IDs who applied to CS and then we take away from those the IDs of students who applied to EE. We run the query and we find that there are three students who applied to CS and not to EE. Some database systems don't support the except operator either and here, things get a little tricky. So, let's try to rewrite that query without using the except operator. So as a reminder, we want to find students who applied to CS but did not apply to EE. So, here's my attempt at writing that query. I again do a self join of apply with apply, and I find all pairs where it's the same student we're talking about and the major in one of the tuples of CS and the major in the other one is not EE. Well, it looks pretty good. Let's see what happens. Whoa, we got a lot of results. Okay, well that's probably just that problem with duplicates again, so let's just add distinct and go for it. It still seems like a lot of results. Let's go back to our previous query that uses except and then we found that there were three students in the result where here we're still getting five in the result. Well, if we think about exactly what we wrote, what we wrote is finding all pairs of apply records where it's the same student and they applied to CS in one of the pairs and they didn't apply to EE in the other. So, it could be, for example, biology or geology. But the problem is that when we consider these pairs, that doesn't mean there's not another pair with the same student where they applied to CS and EE. All this is actually finding is students who applied to CS and also applied to another major that's not EE. So, that's quite different from the query we're shooting for. And actually, the interesting thing is that with the constructs we've seen so far in SQL, it's not possible to write the query we had earlier without using the except operator. But in later videos we will see additional constructs in SQL that do allow us to write that query.