This is the second of two videos about the relational algebra.
In the first video, we learned
about the select and project operators in various types of joins.
This video will cover set operators,
union difference and intersection, the
renaming operator, and different
notations for expressions of relational algebra.
Just as a reminder, we apply
a relational algebra query or
expression to a set
of relations and we get
as a result of that expression
a relation as our answer.
For our examples, we're using
an imaginary database about college admissions.
We have a relation of colleges,
a relation of students, and a
relation with information about students applying to colleges.
We'll keep at the bottom of
the video these three depictions of
those relations with a few
abbreviations used so that names aren't too long.
Let's move ahead to our first operator.
The first of three set operators
is the union operator, and it's
a standard set union that
you learned about in elementary school.
Let's suppose, for example, that we
want a list of the
college and student names in our database.
So we just want those as list.
For example, we might want
Stanford and Susan
and Cornell and Mary
and John and so on.
Now you might think
we can generate this list by
using one of the operators we've
already learned for combining information
from multiple relations, such as
the cross-product operator or the natural join operator.
The problem with those operators
is that they kind of combine information
from multiple relations horizontally.
They might take a tuple T1
from one relation and tuple T2
from the other and kind of match them.
But that's not what we want to do here.
We want to combine the information vertically to create our list.
And to do that we're going to use is the union operator.
So in order to get a list
of the college names and the
student names, we'll project the college name from the college relation.
That gives us a list of college names.
We'll similarly project the student
name from the student relation,
and we've got those two lists
and we'll just apply the union
operator between them and that will give us our result.
Now, technically, in relational algebra
in order to union two lists
they have to have the same schema,
that means that same attribute name
and these don't, but we'll correct that later.
For now, you get the basic idea of the union operator.
Our next set operator is
the difference operator, and this one can be extremely useful.
As an example, let's suppose
we want to find the IDs
of students who didn't apply to any colleges.
It sounds like a complicated query, but
we'll actually write it in a very simple fashion.
We'll start by projecting the student
ID from the student relation
itself and that will
give us all of this student IDs.
Then lets project the student
ID from the apply relation
and that gives us the IDs
of all students who have applied somewhere.
All we need to do is
take the difference operator, written
with the minus sign, and that gives us the result of our query.
It will take all IDs of
the students and then subtract
the ones who have applied somewhere.
Suppose instead that we
wanted the names of the students
who didn't apply anywhere, not just their IDs.
So that's a little bit more complicated.
You might think, "Oh, just add
student name to the projection
list here," but if we
do that, then we're trying to
subtract a set that has
just IDs from a set that has the pair of ID names.
And we can't have
the student name here because the
student name isn't part of the apply relation.
So there is a nice trick, however, that's going to do what we want.
Let me erase these here.
What we're going to do is
we're going to take this whole
expression here which gives us
the student IDs who didn't apply anywhere and watch this.
Pretty clever.
We're gonna do a natural join
with the student relation.
And now, that's called a join back.
So we've taken the IDs,
a certain select set of IDs
and we've joined them back to the student relation.
That's going to give us a
schema that's the student relation
itself, and then we're just
going to add to that a
projection of the student name.
And that will give us our desired answer.
The last of the three set operators is the intersection operator.
So let's suppose we want
to find names that are both
a college name and a student name.
So perhaps, Washington is the name of a student and a college.
To find those, we're going to
do something similar to what we've done in the previous examples.
Let's start by getting the college names.
Then let's get the student names,
and then what we're going to do
is just perform an intersection of
those two expressions and that will give us the result that we want.
Now like our previous example,
technically speaking, the two
expressions on the two sides
of the intersection ought to have
the same schema and again, I'll
show you just a little bit
later, how we take care of that.
Now, it turns out
that intersection actually doesn't add
any expressive power to our
language and I'm going
to show that actually in two different ways.
The first way is that
if we have two expressions, let's
say E1 and E2 and
we perform their intersection, that
is exactly equivalent to
writing E1 minus, using the
difference operator, E1 minus E2.
Now if you're not
convinced of that immediately, let's go
back to Venn diagrams, again
a concept you probably learned early in your schooling.
So let's make a picture of two circles.
And let's say that the first
circle Circle represents the result of
expression E1 and the
second rear circle represents the
result of expression E2.
Now if we
take the entire circle E1.
Let's shade that in purple.
And then we take the
result, so that's E1 here,
and then we take E1, the
result of the expression E1
minus E2 here, we'll write
that in green, so that's everything
in E1 that's not in
E2, that's this. Okay?
And if we take the purple minus
the green you will see
that we actually do get the intersection here.
So that's a simple property
of set Operations but what
that's telling us is that
this intersection operator here isn't
giving us more expressive power because
any expression that we can
write in this fashion, we can equivalently
right with the difference operator in this fashion.
Let me show you a completely
different way in which intersection
doesn't add any expressive power.
So, let's go back to
E1 intersect E2 and as
a reminder for this to be
completely correct these have to
have the same schema as equal between the two.
E1 intersect E2 turns out
to be exactly the
same as E1 natural join
E2 in this particular case
because the schema is the same.
Remember what natural join does.
Natural join says that you
match up all columns that
are equal and you eliminate duplicate values of columns.
So I'll just let you work
out for yourself that this
is indeed an equivalence and
a second reason that the
intersection doesn't add any expressive power.
Nevertheless, the intersection can be
very useful to use in queries.
Our last operator is the rename operator.
The rename operator is necessary
to express certain queries in relational algebra.
Let me first show the form of the operator and then we'll see it in use.
The rename operator uses the Greek symbol rho.
And like all of our
other operators, it applies to
the result of any expression of relational algebra.
And what the rename operator does
is it reassigns the schema
in the result of E. So
we compute E, we get
a relation as a result,
and it says that I'm
going to call the result of
that, relation R with
attributes A1 through An and
then when this expression itself
is embedded in more complex expression,
we can use this schema to
describe the result of the
E. Again we'll see shortly why that's useful.
There are a couple of the abbreviations
that are used in the rename
operator, this form is the general form here.
One abbreviation is if we
just want to use the
same attribute names that came
out of E, but change the
relation name, we write row
sub R applied to E,
and similarly, if we want
to use just the attribute
names, if we
want to change, I'm sorry, just
the attribute names then we
write attribute list here
and it would keep the same relation name.
This form of course has
to have a list of attributes
or we would not be able to
distinguish it from the previous form.
But again these are just abbreviations
and the general form is the one up here.
Okay, so now let's see the rename operator in use.
The first use of the rename
operator is something I alluded
to earlier in this video
which is the fact that when
we do the set operators,
the union, difference, and intersect
operators, we do expect
the schemas on the two
the sides of the operator to
match, and in a couple
of our examples they didn't match,
and the rename operator will allow us to fix that.
So, for example, if we're
doing the list of college and
student names, and let me
just remind you how we wrote that query.
We took the C name from
college and we
took the s name from students
and we did the big union of those.
Now, to make this technically
correct, these two attribute names would have to be the same.
So we're just going to apply the rename operator.
Let's say that we're gonna
rename the result of this
first expression to say
the relation name C with attribute name.
And let's make the
result of the second expression similarly
be the relation C with attribute name.
And now we have two
matching schemas and then
we can properly perform the union operator.
Again, this is just a
syntactic necessity to have
well-formed relational algebra expressions.
Now, the second use of
the rename operator is a
little more complicated and quite
a bit more important actually which
is disambiguation in self
joins and you probably have no
idea what I'm talking
about when I say that, but let me give an example.
Let's suppose that we wanted
to have a query that finds
pairs of colleges in the same state.
Now, think about that.
So we want to have, for example,
Stanford and Berkeley and
Berkeley and UCLA and so on.
So that, as you
can see, unlike the union
operator, we're looking for this
horizontal joining here. So
we're going to have to combine essentially
two instances of the college relation.
And that's exactly what we're going to do.
We're effectively going to do college join
college making the state equal.
So, let's work on that a little bit.
So, what we wanna
do is we wanna have college
and we want to, let's just
start with, say, the cross-product of college.
And then we want to somehow say,
"Well, the state equals the state."
But that's not gonna work.
Which state are these?
And how do we describe the two instances of college?
So what we're going to
do and let me just
erase this, is we're going
to rename those two instances
of colleges so they have different names.
So we're going to take the first
instance of college here and
we're going to apply a rename operator to that.
And we'll call it C1 and we'll
say that that has name1, state1, and enrollment1.
And then we'll take the second instance here.
We'll call it C2, so N2,
S2, E2 of college and
now we have two different relations.
So what we can do is
we can take the cross-product
of those two like that,
and then we can select where
S1 equals S2, okay?
And that gives us pairs of college in the same state.
Actually, let me show you
an even trickier, simpler way of doing this.
Let's take away the selection operator
here, okay?
And let's take away this.
And let's make this into a natural join.
Now that's not gonna work quite
yet because the natural join
requires attribute names to
be the same, and we don't
have any attribute names that are the same.
So the last little trick we're
gonna do is we're
gonna make those two attribute names,
S, be the same.
And now when we
do the natural join, it's gonna
require equality on those two
S's and everything is gonna be great.
Okay?
Now, things are still a little bit more complicated.
One problem with this query
is that we are going to
get colleges paired with themselves.
So we're going to get from
this, for example, Stanford Stanford.
If you think about it, right?
Berkley Berkeley, as well as Stanford Berkeley.
Now, that's not really what we want presumably.
Presumably we actually want different colleges.
but that's pretty easy to handle, actually.
Let's put a selection condition here
so that the name one is not equal to name two.
Great. We took care of that.
So in that case we will
no longer get Stanford Standford and Berkeley Berkeley.
Ah, but there's still one more problem.
We'll get Stanford Berkeley but we'll  also get Berkeley Stanford.
Now, let me
pause for a moment and see
if you can think of a
simple way to solve this problem.
Actually, there's a surprisingly simple way, kind of clever.
We're gonna take away this not
equals and we're going replace it with a less than.
And now we'll only
get pairs where the first one is less than the second.
So Stanford and Berkeley goes away and we get Berkley Stanford.
And this is our final query
for what we wanted to do here.
Now what I really wanted to
show, aside from some of the
uses of relational algebra, is
the fact that the rename operator
was for this query absolutely necessary.
We could not have done this query without the rename operator.
Now we've seen all the operators of relational algebra.
Before we wrap up the
video I did want
to mention that there are
some other notations that can
be used for relational algebra expressions.
So far we've just been writing
our expressions in a standard form
with relation names and operators
between those names and applying to those names.
But sometimes people prefer to
write using a more
linear notation of assignment statements
and sometimes people like to write the expressions as trees.
So I'm just gonna briefly show a couple of examples of those and then we'll wrap up.
So assignment statements are a
way to break down relational
algebra expressions into their parts.
Let's do the same query we
just finished as a big
expression which is the
pairs of colleges that are on the same state.
We'll start by writing two assignment
statements that do the rename
of the two instances of the college relation.
So we'll start with
C1 colon equals and we'll
use a rename operator and now
we use the abbreviated form that just lists attribute names.
So we'll see say C one,
S, E one of
college and we'll similarly
say that C2 gets the
rename, and we'll call it
C2SE2 of college,
and remember we use the
same S here so that we can do the natural join.
So, now we'll say
that college pairs gets C1
natural join C2, and
then finally we'll do our selection condition.
So our final answer will be
the selection where N1 is
less than N2 of CP.
And again, this is equivalent to
the expression that we saw on the earlier slide.
It's just a notation that sometimes
people prefer to modularize their expressions.
The second alternate notation I'm going to show is expression trees.
And expression trees are actually commonly used in relational algebra.
They allow you to visualize the structure
of the expression a little bit better.
And as it turns out when SQL
is compiled in database systems,
it's often compiled into an
expression tree that looks very
much like what I'm gonna show you right now.
So for this example let's suppose
that we want to find the
GPAs of students who are applying to CS in California.
So that's going to
involve all three relations because
we're looking at the
state is in California, and
we're looking at the student GPA's
and we're looking at them applying to CS.
So what we're going
to do is we're going to
make a little tree notation here
where we're going to first do
the natural join of these three relations.
So technically the expression
I'm going to show you is going to stop down here.
It's not going to actually have the tables.
So the leaves of the expression are
going to be the three relations: college, students, and apply.
And in relational algebra trees, the
leaves are always relation names.
And we're going to do the natural
join of those three which
as a reminder enforces equality of
the college name against the
college name here against the
college name here, and the
student ID here and the student ID here.
That enforcement means that we
get triples that are talking
about a student applying to a particular college.
And then we're going to apply to
that, and so that's going to
be written as a new note above this one in the tree.
The selection condition that says
that the state equals California and the major equals CS.
And finally, we'll put on
top of that the projection that gets the GPA.
okay?
Now actually this expression is
exactly equivalent to if
we wrote it linearly, project the
GPA, select etc. of
the three college join student, join apply.
I'm just abbreviating here.
That would be an equivalent expression.
But again, people often like
to use the tree notation
because it does allow you to
visualize the structure of the
expression, and it is used
inside implementations of the SQL language.
Let me finish up by summarizing relational algebra.
Let's start with the core constructs of the language.
So a relation name is
a query in relational algebra,
and then we use operators that combine relations and filter relations.
So we have the select operator
that applies a condition to the result of an expression.
We have the project operator that
gives us a set of
attributes that we take from the result of an expression.
We have the expression one
cross-product expression two.
And again those can be any expressions.
Then we have expression one
union expression two.
And we have expression one minus expression two.
And finally we have
the rename operator that takes
an expression and renames the
result of that, the
schema in the result of that expression.
Now, you probably noticed that
I skipped a few of our
favorite operators, but this
is the core of the language
and all the other operators are
actually abbreviations that don't
increase the expressive power of
the language but they can be very useful performing queries.
And the abbreviations that we
learned were expression one
natural join expression two.
They were expression one
theta join expression two.
And, finally, expression one intersect expression two.
All of those where we had
a method of rewriting them using the core operators.
Just a small aside about parentheses.
A parentheses are used in
relational expressions for, relational
algebraic expressions, for disambiguation, similarly to arithmetic expressions.
I was a little cavalier about whether
I included parentheses or not,
but as you write your relational
algebra expressions you will
see that it's pretty straightforward
to figure out when disambiguation is needed.
So to conclude relational algebra,
entirely it's a formal language.
It's based on sets, set
operators and other operators
that combine data from multiple relations.
It takes relations as input,
it produces relations as answers
and it does form the formal
foundation of implemented relational database management.