This is the first of seven
videos where we're going to learn the SQL language.
The videos are largely going to
be live demos of SQL queries
and updates running on an actual database.
The first video is going
to focus on the basics of the SELECT statement.
As a reminder, the SELECT statement
selects a set of attributes from
a set of relations satisfying a particular condition.
We will see in the demo that
even with the these three clauses,
we can write quite powerful queries.
All of the seven demos are
going to be using the simple
college admissions database that we
learned about in the relational algebra videos.
As a reminder, we have three relations.
We have the college relation: college
relation contains information about the
name of the colleges, the state, and the enrollment of those colleges.
We have the student relation, which
contains student IDs, their names,
their GPA, and the size of the high school that they come from.
And finally, the application information, that
tells us that a particular student
applied to a particular college
for a particular major and there
was a decision of that application
Now as a reminder, in the
relational model, when we underline
attributes, that means we're designating a key for the relation.
So, the underlying attributes in
our example say that the
knowledge name is going to be unique within the college relation.
The student's idea is unique within
the student relation and in
the applied relation, the combination of these three attributes is unique.
That means that student can, if
he or she wishes, apply to a
college many times, or apply
for a major many times, but
can only apply to a
college for a particular major once.
Let's turn to the demo.
Let's start by looking at the actual data that we're going to be querying over.
We have a set of four
colleges: Stanford,
Berkeley, MIT and Cornell.
We have a bunch of students.
And a reminder, each student has an
ID, a name, a GPA, and a size of high school.
And finally, we have a set
of application records where a
student with a particular ID
applies to a college for a
particular major, and there's a
yes or no decision on that application.
So let's go to our first SQL query.
This query is going to find
the ID, name, and GPA of
students whose GPA is greater than 3.6.
So, very simple, it's the
basic SELECT FROM WHERE structure.
The SELECT gives our table name,
the WHERE gives our filtering condition
and the SELECT tells us what we want to get out of the query.
We'll execute that query and
we will find here all of
our students with a GPA greater than 3.6.
Now, it's not necessary
to include the GPA in
the result to the query even if we filter on the GPA.
So, I could just take GPA away
from the SELECT clause, run the
query again and now, we
see the same result but without the GPA.
Okay.
Let's go to our second query.
Our second query is going to combine two relations.
In this query, we're going to
find the names of the students
and the majors for which they've applied.
So, now, we're involving both the
student table and the
apply table and the
condition we see here is the
join condition that tells us
we want to combine students with
apply records that have the same student ID.
This is what would happen automatically in
a natural join of the
relational algebra, but in SQL
we need to always write the
join condition explicitly, and finally
we get the student name and the major.
And if we execute the
query, we get, expectedly, a
bunch of students and the majors that they've applied for.
Now, we do notice here that we have several duplicate values.
We have two copies of Amy
applying to CS and two copies of Craig applying to Bio-Engineering.
As we discussed in the relational
algebra video, in relational algebra
which underlies SQL, it's by
default the set model; we don't have duplicates.
But in the SQL language we
do have duplicates, it's based on a multi-set model.
If we don't like the duplicates in
our results SQL provides us a convenient way to get rid of them.
We simply add the keyword,
"distinct", to our query after
the word, "select", we execute, and
now we get the same result
but with the duplicate values eliminated.
Our next query is going
to be a little more complicated; it's
going to find the names
and GPAs of students whose
size high school is less
than a thousand, they've applied to
CS at Stanford, and we're going
to get the decision associated with that.
So again we have two
relations, two tables involved, the student and the apply.
We have the join condition, making
sure we're talking about the same
student and the student and apply tuples.
Very important to remember that one.
We are going to filter the result
based on size high school, major,
and the college to which they're applying.
So let's run this query and
we will see the result that
we have two students who
have applied to CS at Stanford from a small high school.
Our next query is again a join of two relations.
This time we're going to find all large
campuses that have someone applying to that campus in CS.
So this time we're going to join the college table and the apply table.
And again, we need to
be careful to make sure we
only join tuples that are talking about the same college.
So we have college.cname
equals apply.cname.
We have an enrollment that's greater
than 20,000 and a major that equals CS.
Let's run this query.
Oops, we got an error!
Well, actually I knew that was
coming, but I wanted to show you what happens here.
So the error is that we
have an ambiguous column name,
and that's the one right here, the C name.
So I haven't pointed it
out explicitly, but whenever I've
referred to attributes where there's
an attribute from both of
the relations we're querying, I prefaced
it with the name of
the relation that we cared about, the college here in the apply.
So the attribute name here
in the select clause is actually
ambiguous because there's a
C name attribute in college and there's one there in apply.
Now we happen to set those equal,
but in order for the query to
actually run we have to choose
So let's just say we're
going to take that C name from college.
Now, everything should be fine, and here we go.
So those are the colleges where we
have at least one
CS major and their enrollment is greater than 20,000.
Again, we see duplicates
so if we don't like
the two copies of Berkeley, we
simply add distinct and we run the query again.
And now we have Berkeley and Cornell.
Now, let's do a query with a bigger result.
This time we're finally going to join all three of our relations.
Student, college and apply.
And we're going to apply the
joint conditions that ensure that
we're talking about the same student and the same college.
And then from the result
of that big cross-product, that
big join, we're going to
get the student ID, their name,
their GPA, the college that
they're applying to and the enrollment of that college.
So just a whole bunch of
information associated with this students' applications.
And we execute this and here
we get the result with all the attributes that we asked for.
Now, one thing I haven't
mentioned yet is the order
of the results that we get when we run SQL queries.
SO SQL is, at its heart, an unordered model.
That means that we can get
the results of our queries in
any order, and in fact,
we could run a query today
and get our results in a particular order.
And then run the query tomorrow and get a different order.
And that's permitted with the
specification of SQL on relational databases.
If we care about the order
of our result SQL provides a
clause that we can ask for a
result to be sorted by
a particular attribute or set of attributes. So
let's say we want our application information here
sorted by descending GPA.
Then we add another clause called the order by clause.
We tell the attribute we'd like
to be ordering by and then
if we want it to be descending we write DESC.
The default behavior is actually ascending.
So if we run this query
now we get our results by
descending the GPA we
see all the 3.9's, 3.8, 3.7, and so forth.
Now we might still want
to further sort within all the
3.9s if we want
to do that we can specify another
attribute to sort each group by.
So, for example, if we
decide from that we
want to sort by enrollment
and ascending, we won't put
anything because ascending is the default.
And we execute.
Now we still have GPA
as descending as our primary
sort order and then within each
of those will be sorting by ascending enrollment.
This query introduces the like predicate.
Like is a built-in operator
in SQL that allows us
to do simple string matching on attribute values.
Let's suppose, for example, that we
wanted to find all students
who were applying for a major that had to do with bio.
Instead of listing all the
biology majors we can
simply pattern match the major
against the special string here
which says, match any major
where there's some set of characters,
followed by bio, followed by
some set of characters we execute
the query, and we'll find the
students who have applied for various
bio type majors.
Now, I want to introduce another construct.
I'm going to use the same query to
do it, which is the construct select star.
So far, we've always listed
explicitly the attributes that we
want to get in the result of a query.
But if we simply want to get
all attributes, then we can just write select star.
And when we do that, we
don't project away any attributes,
but we get all the attributes in the result of the from and where expression.
While we're at it, let's do a gigantic query.
We'll just do the cross-product
and student college without any
combination, and we'll do
select star to get all the attributes out.
So, here goes, and you can
see, we get all the attributes
and we get a whole lot of tuples as well.
Our last query is going to
demonstrate the ability to use
arithmetic within SQL clauses.
So we see here a query
that selects all the information
from the student relation but adds
to it a scaled GPA where
we're going to boost the student's
GPA if they're from a big
high school and reduce it if they're from a small one.
Specifically, we'll take their GPA, multiply
it by the size high school divided by a thousand.
So, let's run this
query and you can see
that we have the whole student table
here with an additional column that
has scaled their GPA based on the size of their high school.
Now, if we don't like the
label on this column, we
could change it and so
I'll use this query as an
example to demonstrate the 'as'
feature which allows us
to change the labeling of the schema in a query result.
Let's say as scaled GPA,
and we should get the same
result with a more nicely labeled attribute.
That concludes our video introducing the basic select statement.
We'll see many other features in
the upcoming six videos on SQL.