This is the second of two videos about the relational algebra. In the first video, we learned about the select and project operators in various types of joins. This video will cover set operators, union difference and intersection, the renaming operator, and different notations for expressions of relational algebra. Just as a reminder, we apply a relational algebra query or expression to a set of relations and we get as a result of that expression a relation as our answer. For our examples, we're using an imaginary database about college admissions. We have a relation of colleges, a relation of students, and a relation with information about students applying to colleges. We'll keep at the bottom of the video these three depictions of those relations with a few abbreviations used so that names aren't too long. Let's move ahead to our first operator. The first of three set operators is the union operator, and it's a standard set union that you learned about in elementary school. Let's suppose, for example, that we want a list of the college and student names in our database. So we just want those as list. For example, we might want Stanford and Susan and Cornell and Mary and John and so on. Now you might think we can generate this list by using one of the operators we've already learned for combining information from multiple relations, such as the cross-product operator or the natural join operator. The problem with those operators is that they kind of combine information from multiple relations horizontally. They might take a tuple T1 from one relation and tuple T2 from the other and kind of match them. But that's not what we want to do here. We want to combine the information vertically to create our list. And to do that we're going to use is the union operator. So in order to get a list of the college names and the student names, we'll project the college name from the college relation. That gives us a list of college names. We'll similarly project the student name from the student relation, and we've got those two lists and we'll just apply the union operator between them and that will give us our result. Now, technically, in relational algebra in order to union two lists they have to have the same schema, that means that same attribute name and these don't, but we'll correct that later. For now, you get the basic idea of the union operator. Our next set operator is the difference operator, and this one can be extremely useful. As an example, let's suppose we want to find the IDs of students who didn't apply to any colleges. It sounds like a complicated query, but we'll actually write it in a very simple fashion. We'll start by projecting the student ID from the student relation itself and that will give us all of this student IDs. Then lets project the student ID from the apply relation and that gives us the IDs of all students who have applied somewhere. All we need to do is take the difference operator, written with the minus sign, and that gives us the result of our query. It will take all IDs of the students and then subtract the ones who have applied somewhere. Suppose instead that we wanted the names of the students who didn't apply anywhere, not just their IDs. So that's a little bit more complicated. You might think, "Oh, just add student name to the projection list here," but if we do that, then we're trying to subtract a set that has just IDs from a set that has the pair of ID names. And we can't have the student name here because the student name isn't part of the apply relation. So there is a nice trick, however, that's going to do what we want. Let me erase these here. What we're going to do is we're going to take this whole expression here which gives us the student IDs who didn't apply anywhere and watch this. Pretty clever. We're gonna do a natural join with the student relation. And now, that's called a join back. So we've taken the IDs, a certain select set of IDs and we've joined them back to the student relation. That's going to give us a schema that's the student relation itself, and then we're just going to add to that a projection of the student name. And that will give us our desired answer. The last of the three set operators is the intersection operator. So let's suppose we want to find names that are both a college name and a student name. So perhaps, Washington is the name of a student and a college. To find those, we're going to do something similar to what we've done in the previous examples. Let's start by getting the college names. Then let's get the student names, and then what we're going to do is just perform an intersection of those two expressions and that will give us the result that we want. Now like our previous example, technically speaking, the two expressions on the two sides of the intersection ought to have the same schema and again, I'll show you just a little bit later, how we take care of that. Now, it turns out that intersection actually doesn't add any expressive power to our language and I'm going to show that actually in two different ways. The first way is that if we have two expressions, let's say E1 and E2 and we perform their intersection, that is exactly equivalent to writing E1 minus, using the difference operator, E1 minus E2. Now if you're not convinced of that immediately, let's go back to Venn diagrams, again a concept you probably learned early in your schooling. So let's make a picture of two circles. And let's say that the first circle Circle represents the result of expression E1 and the second rear circle represents the result of expression E2. Now if we take the entire circle E1. Let's shade that in purple. And then we take the result, so that's E1 here, and then we take E1, the result of the expression E1 minus E2 here, we'll write that in green, so that's everything in E1 that's not in E2, that's this. Okay? And if we take the purple minus the green you will see that we actually do get the intersection here. So that's a simple property of set Operations but what that's telling us is that this intersection operator here isn't giving us more expressive power because any expression that we can write in this fashion, we can equivalently right with the difference operator in this fashion. Let me show you a completely different way in which intersection doesn't add any expressive power. So, let's go back to E1 intersect E2 and as a reminder for this to be completely correct these have to have the same schema as equal between the two. E1 intersect E2 turns out to be exactly the same as E1 natural join E2 in this particular case because the schema is the same. Remember what natural join does. Natural join says that you match up all columns that are equal and you eliminate duplicate values of columns. So I'll just let you work out for yourself that this is indeed an equivalence and a second reason that the intersection doesn't add any expressive power. Nevertheless, the intersection can be very useful to use in queries. Our last operator is the rename operator. The rename operator is necessary to express certain queries in relational algebra. Let me first show the form of the operator and then we'll see it in use. The rename operator uses the Greek symbol rho. And like all of our other operators, it applies to the result of any expression of relational algebra. And what the rename operator does is it reassigns the schema in the result of E. So we compute E, we get a relation as a result, and it says that I'm going to call the result of that, relation R with attributes A1 through An and then when this expression itself is embedded in more complex expression, we can use this schema to describe the result of the E. Again we'll see shortly why that's useful. There are a couple of the abbreviations that are used in the rename operator, this form is the general form here. One abbreviation is if we just want to use the same attribute names that came out of E, but change the relation name, we write row sub R applied to E, and similarly, if we want to use just the attribute names, if we want to change, I'm sorry, just the attribute names then we write attribute list here and it would keep the same relation name. This form of course has to have a list of attributes or we would not be able to distinguish it from the previous form. But again these are just abbreviations and the general form is the one up here. Okay, so now let's see the rename operator in use. The first use of the rename operator is something I alluded to earlier in this video which is the fact that when we do the set operators, the union, difference, and intersect operators, we do expect the schemas on the two the sides of the operator to match, and in a couple of our examples they didn't match, and the rename operator will allow us to fix that. So, for example, if we're doing the list of college and student names, and let me just remind you how we wrote that query. We took the C name from college and we took the s name from students and we did the big union of those. Now, to make this technically correct, these two attribute names would have to be the same. So we're just going to apply the rename operator. Let's say that we're gonna rename the result of this first expression to say the relation name C with attribute name. And let's make the result of the second expression similarly be the relation C with attribute name. And now we have two matching schemas and then we can properly perform the union operator. Again, this is just a syntactic necessity to have well-formed relational algebra expressions. Now, the second use of the rename operator is a little more complicated and quite a bit more important actually which is disambiguation in self joins and you probably have no idea what I'm talking about when I say that, but let me give an example. Let's suppose that we wanted to have a query that finds pairs of colleges in the same state. Now, think about that. So we want to have, for example, Stanford and Berkeley and Berkeley and UCLA and so on. So that, as you can see, unlike the union operator, we're looking for this horizontal joining here. So we're going to have to combine essentially two instances of the college relation. And that's exactly what we're going to do. We're effectively going to do college join college making the state equal. So, let's work on that a little bit. So, what we wanna do is we wanna have college and we want to, let's just start with, say, the cross-product of college. And then we want to somehow say, "Well, the state equals the state." But that's not gonna work. Which state are these? And how do we describe the two instances of college? So what we're going to do and let me just erase this, is we're going to rename those two instances of colleges so they have different names. So we're going to take the first instance of college here and we're going to apply a rename operator to that. And we'll call it C1 and we'll say that that has name1, state1, and enrollment1. And then we'll take the second instance here. We'll call it C2, so N2, S2, E2 of college and now we have two different relations. So what we can do is we can take the cross-product of those two like that, and then we can select where S1 equals S2, okay? And that gives us pairs of college in the same state. Actually, let me show you an even trickier, simpler way of doing this. Let's take away the selection operator here, okay? And let's take away this. And let's make this into a natural join. Now that's not gonna work quite yet because the natural join requires attribute names to be the same, and we don't have any attribute names that are the same. So the last little trick we're gonna do is we're gonna make those two attribute names, S, be the same. And now when we do the natural join, it's gonna require equality on those two S's and everything is gonna be great. Okay? Now, things are still a little bit more complicated. One problem with this query is that we are going to get colleges paired with themselves. So we're going to get from this, for example, Stanford Stanford. If you think about it, right? Berkley Berkeley, as well as Stanford Berkeley. Now, that's not really what we want presumably. Presumably we actually want different colleges. but that's pretty easy to handle, actually. Let's put a selection condition here so that the name one is not equal to name two. Great. We took care of that. So in that case we will no longer get Stanford Standford and Berkeley Berkeley. Ah, but there's still one more problem. We'll get Stanford Berkeley but we'll also get Berkeley Stanford. Now, let me pause for a moment and see if you can think of a simple way to solve this problem. Actually, there's a surprisingly simple way, kind of clever. We're gonna take away this not equals and we're going replace it with a less than. And now we'll only get pairs where the first one is less than the second. So Stanford and Berkeley goes away and we get Berkley Stanford. And this is our final query for what we wanted to do here. Now what I really wanted to show, aside from some of the uses of relational algebra, is the fact that the rename operator was for this query absolutely necessary. We could not have done this query without the rename operator. Now we've seen all the operators of relational algebra. Before we wrap up the video I did want to mention that there are some other notations that can be used for relational algebra expressions. So far we've just been writing our expressions in a standard form with relation names and operators between those names and applying to those names. But sometimes people prefer to write using a more linear notation of assignment statements and sometimes people like to write the expressions as trees. So I'm just gonna briefly show a couple of examples of those and then we'll wrap up. So assignment statements are a way to break down relational algebra expressions into their parts. Let's do the same query we just finished as a big expression which is the pairs of colleges that are on the same state. We'll start by writing two assignment statements that do the rename of the two instances of the college relation. So we'll start with C1 colon equals and we'll use a rename operator and now we use the abbreviated form that just lists attribute names. So we'll see say C one, S, E one of college and we'll similarly say that C2 gets the rename, and we'll call it C2SE2 of college, and remember we use the same S here so that we can do the natural join. So, now we'll say that college pairs gets C1 natural join C2, and then finally we'll do our selection condition. So our final answer will be the selection where N1 is less than N2 of CP. And again, this is equivalent to the expression that we saw on the earlier slide. It's just a notation that sometimes people prefer to modularize their expressions. The second alternate notation I'm going to show is expression trees. And expression trees are actually commonly used in relational algebra. They allow you to visualize the structure of the expression a little bit better. And as it turns out when SQL is compiled in database systems, it's often compiled into an expression tree that looks very much like what I'm gonna show you right now. So for this example let's suppose that we want to find the GPAs of students who are applying to CS in California. So that's going to involve all three relations because we're looking at the state is in California, and we're looking at the student GPA's and we're looking at them applying to CS. So what we're going to do is we're going to make a little tree notation here where we're going to first do the natural join of these three relations. So technically the expression I'm going to show you is going to stop down here. It's not going to actually have the tables. So the leaves of the expression are going to be the three relations: college, students, and apply. And in relational algebra trees, the leaves are always relation names. And we're going to do the natural join of those three which as a reminder enforces equality of the college name against the college name here against the college name here, and the student ID here and the student ID here. That enforcement means that we get triples that are talking about a student applying to a particular college. And then we're going to apply to that, and so that's going to be written as a new note above this one in the tree. The selection condition that says that the state equals California and the major equals CS. And finally, we'll put on top of that the projection that gets the GPA. okay? Now actually this expression is exactly equivalent to if we wrote it linearly, project the GPA, select etc. of the three college join student, join apply. I'm just abbreviating here. That would be an equivalent expression. But again, people often like to use the tree notation because it does allow you to visualize the structure of the expression, and it is used inside implementations of the SQL language. Let me finish up by summarizing relational algebra. Let's start with the core constructs of the language. So a relation name is a query in relational algebra, and then we use operators that combine relations and filter relations. So we have the select operator that applies a condition to the result of an expression. We have the project operator that gives us a set of attributes that we take from the result of an expression. We have the expression one cross-product expression two. And again those can be any expressions. Then we have expression one union expression two. And we have expression one minus expression two. And finally we have the rename operator that takes an expression and renames the result of that, the schema in the result of that expression. Now, you probably noticed that I skipped a few of our favorite operators, but this is the core of the language and all the other operators are actually abbreviations that don't increase the expressive power of the language but they can be very useful performing queries. And the abbreviations that we learned were expression one natural join expression two. They were expression one theta join expression two. And, finally, expression one intersect expression two. All of those where we had a method of rewriting them using the core operators. Just a small aside about parentheses. A parentheses are used in relational expressions for, relational algebraic expressions, for disambiguation, similarly to arithmetic expressions. I was a little cavalier about whether I included parentheses or not, but as you write your relational algebra expressions you will see that it's pretty straightforward to figure out when disambiguation is needed. So to conclude relational algebra, entirely it's a formal language. It's based on sets, set operators and other operators that combine data from multiple relations. It takes relations as input, it produces relations as answers and it does form the formal foundation of implemented relational database management.