This is the first of seven videos where we're going to learn the SQL language. The videos are largely going to be live demos of SQL queries and updates running on an actual database. The first video is going to focus on the basics of the SELECT statement. As a reminder, the SELECT statement selects a set of attributes from a set of relations satisfying a particular condition. We will see in the demo that even with the these three clauses, we can write quite powerful queries. All of the seven demos are going to be using the simple college admissions database that we learned about in the relational algebra videos. As a reminder, we have three relations. We have the college relation: college relation contains information about the name of the colleges, the state, and the enrollment of those colleges. We have the student relation, which contains student IDs, their names, their GPA, and the size of the high school that they come from. And finally, the application information, that tells us that a particular student applied to a particular college for a particular major and there was a decision of that application Now as a reminder, in the relational model, when we underline attributes, that means we're designating a key for the relation. So, the underlying attributes in our example say that the knowledge name is going to be unique within the college relation. The student's idea is unique within the student relation and in the applied relation, the combination of these three attributes is unique. That means that student can, if he or she wishes, apply to a college many times, or apply for a major many times, but can only apply to a college for a particular major once. Let's turn to the demo. Let's start by looking at the actual data that we're going to be querying over. We have a set of four colleges: Stanford, Berkeley, MIT and Cornell. We have a bunch of students. And a reminder, each student has an ID, a name, a GPA, and a size of high school. And finally, we have a set of application records where a student with a particular ID applies to a college for a particular major, and there's a yes or no decision on that application. So let's go to our first SQL query. This query is going to find the ID, name, and GPA of students whose GPA is greater than 3.6. So, very simple, it's the basic SELECT FROM WHERE structure. The SELECT gives our table name, the WHERE gives our filtering condition and the SELECT tells us what we want to get out of the query. We'll execute that query and we will find here all of our students with a GPA greater than 3.6. Now, it's not necessary to include the GPA in the result to the query even if we filter on the GPA. So, I could just take GPA away from the SELECT clause, run the query again and now, we see the same result but without the GPA. Okay. Let's go to our second query. Our second query is going to combine two relations. In this query, we're going to find the names of the students and the majors for which they've applied. So, now, we're involving both the student table and the apply table and the condition we see here is the join condition that tells us we want to combine students with apply records that have the same student ID. This is what would happen automatically in a natural join of the relational algebra, but in SQL we need to always write the join condition explicitly, and finally we get the student name and the major. And if we execute the query, we get, expectedly, a bunch of students and the majors that they've applied for. Now, we do notice here that we have several duplicate values. We have two copies of Amy applying to CS and two copies of Craig applying to Bio-Engineering. As we discussed in the relational algebra video, in relational algebra which underlies SQL, it's by default the set model; we don't have duplicates. But in the SQL language we do have duplicates, it's based on a multi-set model. If we don't like the duplicates in our results SQL provides us a convenient way to get rid of them. We simply add the keyword, "distinct", to our query after the word, "select", we execute, and now we get the same result but with the duplicate values eliminated. Our next query is going to be a little more complicated; it's going to find the names and GPAs of students whose size high school is less than a thousand, they've applied to CS at Stanford, and we're going to get the decision associated with that. So again we have two relations, two tables involved, the student and the apply. We have the join condition, making sure we're talking about the same student and the student and apply tuples. Very important to remember that one. We are going to filter the result based on size high school, major, and the college to which they're applying. So let's run this query and we will see the result that we have two students who have applied to CS at Stanford from a small high school. Our next query is again a join of two relations. This time we're going to find all large campuses that have someone applying to that campus in CS. So this time we're going to join the college table and the apply table. And again, we need to be careful to make sure we only join tuples that are talking about the same college. So we have college.cname equals apply.cname. We have an enrollment that's greater than 20,000 and a major that equals CS. Let's run this query. Oops, we got an error! Well, actually I knew that was coming, but I wanted to show you what happens here. So the error is that we have an ambiguous column name, and that's the one right here, the C name. So I haven't pointed it out explicitly, but whenever I've referred to attributes where there's an attribute from both of the relations we're querying, I prefaced it with the name of the relation that we cared about, the college here in the apply. So the attribute name here in the select clause is actually ambiguous because there's a C name attribute in college and there's one there in apply. Now we happen to set those equal, but in order for the query to actually run we have to choose So let's just say we're going to take that C name from college. Now, everything should be fine, and here we go. So those are the colleges where we have at least one CS major and their enrollment is greater than 20,000. Again, we see duplicates so if we don't like the two copies of Berkeley, we simply add distinct and we run the query again. And now we have Berkeley and Cornell. Now, let's do a query with a bigger result. This time we're finally going to join all three of our relations. Student, college and apply. And we're going to apply the joint conditions that ensure that we're talking about the same student and the same college. And then from the result of that big cross-product, that big join, we're going to get the student ID, their name, their GPA, the college that they're applying to and the enrollment of that college. So just a whole bunch of information associated with this students' applications. And we execute this and here we get the result with all the attributes that we asked for. Now, one thing I haven't mentioned yet is the order of the results that we get when we run SQL queries. SO SQL is, at its heart, an unordered model. That means that we can get the results of our queries in any order, and in fact, we could run a query today and get our results in a particular order. And then run the query tomorrow and get a different order. And that's permitted with the specification of SQL on relational databases. If we care about the order of our result SQL provides a clause that we can ask for a result to be sorted by a particular attribute or set of attributes. So let's say we want our application information here sorted by descending GPA. Then we add another clause called the order by clause. We tell the attribute we'd like to be ordering by and then if we want it to be descending we write DESC. The default behavior is actually ascending. So if we run this query now we get our results by descending the GPA we see all the 3.9's, 3.8, 3.7, and so forth. Now we might still want to further sort within all the 3.9s if we want to do that we can specify another attribute to sort each group by. So, for example, if we decide from that we want to sort by enrollment and ascending, we won't put anything because ascending is the default. And we execute. Now we still have GPA as descending as our primary sort order and then within each of those will be sorting by ascending enrollment. This query introduces the like predicate. Like is a built-in operator in SQL that allows us to do simple string matching on attribute values. Let's suppose, for example, that we wanted to find all students who were applying for a major that had to do with bio. Instead of listing all the biology majors we can simply pattern match the major against the special string here which says, match any major where there's some set of characters, followed by bio, followed by some set of characters we execute the query, and we'll find the students who have applied for various bio type majors. Now, I want to introduce another construct. I'm going to use the same query to do it, which is the construct select star. So far, we've always listed explicitly the attributes that we want to get in the result of a query. But if we simply want to get all attributes, then we can just write select star. And when we do that, we don't project away any attributes, but we get all the attributes in the result of the from and where expression. While we're at it, let's do a gigantic query. We'll just do the cross-product and student college without any combination, and we'll do select star to get all the attributes out. So, here goes, and you can see, we get all the attributes and we get a whole lot of tuples as well. Our last query is going to demonstrate the ability to use arithmetic within SQL clauses. So we see here a query that selects all the information from the student relation but adds to it a scaled GPA where we're going to boost the student's GPA if they're from a big high school and reduce it if they're from a small one. Specifically, we'll take their GPA, multiply it by the size high school divided by a thousand. So, let's run this query and you can see that we have the whole student table here with an additional column that has scaled their GPA based on the size of their high school. Now, if we don't like the label on this column, we could change it and so I'll use this query as an example to demonstrate the 'as' feature which allows us to change the labeling of the schema in a query result. Let's say as scaled GPA, and we should get the same result with a more nicely labeled attribute. That concludes our video introducing the basic select statement. We'll see many other features in the upcoming six videos on SQL.