1 00:00:00,500 --> 00:00:01,300 In the next set of SQL 2 00:00:01,730 --> 00:00:03,160 queries, we'll see examples 3 00:00:03,630 --> 00:00:06,330 of using sub queries in the from and the select clauses. 4 00:00:07,660 --> 00:00:08,900 In the previous video, we 5 00:00:09,060 --> 00:00:11,410 introduced sub queries but we only used them in the where clause. 6 00:00:12,000 --> 00:00:13,750 In the condition of queries, we 7 00:00:13,890 --> 00:00:14,890 wrote sub queries that generated 8 00:00:15,510 --> 00:00:16,760 sets that we use for comparisons. 9 00:00:18,440 --> 00:00:19,160 In the next set of queries, 10 00:00:19,580 --> 00:00:20,350 we're going to see that we can 11 00:00:20,490 --> 00:00:22,090 also use sub queries, select 12 00:00:22,420 --> 00:00:24,370 expressions in the from 13 00:00:24,620 --> 00:00:26,880 clause of queries as well as in the select clause. 14 00:00:27,920 --> 00:00:28,920 If we use a sub query 15 00:00:29,010 --> 00:00:30,080 in the front, what we're really 16 00:00:30,280 --> 00:00:31,670 doing is running and that's the 17 00:00:31,820 --> 00:00:32,740 select statement that's going to 18 00:00:32,840 --> 00:00:33,860 generate one of the 19 00:00:33,930 --> 00:00:35,770 tables that we use in the rest of the query. 20 00:00:36,830 --> 00:00:37,830 If we use a sub query 21 00:00:38,250 --> 00:00:39,350 in the select, then what 22 00:00:39,470 --> 00:00:41,160 we're doing is writing a select 23 00:00:41,360 --> 00:00:42,670 expression, a sub select expression, 24 00:00:43,430 --> 00:00:45,220 that produces the value that comes out of the query. 25 00:00:45,860 --> 00:00:46,740 So let's see this in action 26 00:00:47,590 --> 00:00:48,570 and as usual we'll be using 27 00:00:48,930 --> 00:00:50,180 our simple college admissions database 28 00:00:51,010 --> 00:00:52,860 with a college, student and apply tables. 29 00:00:54,940 --> 00:00:56,060 Our first query is going to 30 00:00:56,190 --> 00:00:57,380 lead to demonstrating the use 31 00:00:57,590 --> 00:00:58,430 of a sub query in the from 32 00:00:58,770 --> 00:00:59,940 clause. You might remember 33 00:01:00,320 --> 00:01:01,760 from a previous video that we 34 00:01:01,960 --> 00:01:03,510 had the concept of scaling a 35 00:01:03,780 --> 00:01:05,910 student's GPA based on the size of their high school. 36 00:01:06,640 --> 00:01:08,080 Specifically, we took the GPA 37 00:01:08,510 --> 00:01:11,000 and multiplied it by the size of high school divided by a thousand. 38 00:01:12,050 --> 00:01:12,810 What we're going to do in 39 00:01:12,870 --> 00:01:13,890 this query is we're going to 40 00:01:14,000 --> 00:01:15,610 return all students where scaling 41 00:01:16,170 --> 00:01:17,600 their GPA changes its value 42 00:01:17,960 --> 00:01:19,670 by more than one, specifically either 43 00:01:19,900 --> 00:01:21,260 the scale GPA minus the 44 00:01:21,330 --> 00:01:22,510 GPA is greater than one 45 00:01:22,700 --> 00:01:25,650 or the GPA minus the scale GPA is greater than one. 46 00:01:26,320 --> 00:01:27,500 So let's run the query, and 47 00:01:27,960 --> 00:01:29,060 we find a number of 48 00:01:29,160 --> 00:01:30,480 students whose GPA is modified 49 00:01:31,180 --> 00:01:32,720 by more than one by scaling it. 50 00:01:33,350 --> 00:01:34,400 Now let's take a look at this query. 51 00:01:34,740 --> 00:01:36,210 We have this expression that scales 52 00:01:36,670 --> 00:01:37,900 the GPA and we actually 53 00:01:38,050 --> 00:01:39,820 have to write the expression out three times. 54 00:01:40,660 --> 00:01:43,050 Now remember, this could in fact be a much more complicated calculation. 55 00:01:43,850 --> 00:01:45,460 A smart database system 56 00:01:45,820 --> 00:01:46,710 might look at the query and 57 00:01:46,850 --> 00:01:47,760 recognize that it can do 58 00:01:47,880 --> 00:01:49,160 the computation once and use 59 00:01:49,350 --> 00:01:51,250 it in all three places, but maybe 60 00:01:51,500 --> 00:01:52,910 we're not using a smart database system. 61 00:01:53,200 --> 00:01:54,400 And furthermore, for the 62 00:01:54,560 --> 00:01:57,100 query itself, we might 63 00:01:57,300 --> 00:01:59,520 like to simplify things and only write that expression once. 64 00:02:00,540 --> 00:02:03,370 So, simplifying the where cause is actually fairly easy. 65 00:02:04,120 --> 00:02:05,160 We can just use the absolute 66 00:02:05,780 --> 00:02:07,910 value function that's built into most SQL implementation. 67 00:02:09,030 --> 00:02:10,490 So we write absolute value 68 00:02:10,900 --> 00:02:12,310 function and we apply 69 00:02:12,790 --> 00:02:14,370 it to this computation here. 70 00:02:15,340 --> 00:02:16,960 We delete the second clause 71 00:02:17,360 --> 00:02:19,380 of the query, and we run the query. 72 00:02:19,720 --> 00:02:22,100 And we should get exactly the same result, and we do. 73 00:02:22,330 --> 00:02:23,640 So, that's good, but we 74 00:02:24,630 --> 00:02:25,900 still have to use 75 00:02:26,040 --> 00:02:27,910 the two computations here of the scale GPA. 76 00:02:29,060 --> 00:02:29,720 So, what I'm going to do 77 00:02:29,820 --> 00:02:30,840 now is I'm going to 78 00:02:31,030 --> 00:02:32,110 put in the from clause 79 00:02:33,200 --> 00:02:34,620 a sub-query and that sub-query 80 00:02:34,950 --> 00:02:37,030 is going to among things compute the scale GPA. 81 00:02:37,390 --> 00:02:38,640 When we put a 82 00:02:38,840 --> 00:02:39,830 sub query in the from clause, 83 00:02:40,190 --> 00:02:41,690 what we're doing is creating a 84 00:02:42,340 --> 00:02:43,440 select from where statement that 85 00:02:43,580 --> 00:02:45,140 produces a relation, but in 86 00:02:45,210 --> 00:02:46,110 the rest of the query we can 87 00:02:46,290 --> 00:02:47,150 treat that just like a 88 00:02:47,480 --> 00:02:48,390 relation, so we can refer 89 00:02:48,870 --> 00:02:50,940 to the scale GPA in the rest of the query. 90 00:02:51,890 --> 00:02:54,540 Alright, so let me just do a little editing here. 91 00:02:55,340 --> 00:02:56,370 So, I'm going to take this select 92 00:02:57,400 --> 00:02:59,370 from here, this portion 93 00:02:59,870 --> 00:03:01,350 of the query, like that, and 94 00:03:01,480 --> 00:03:02,560 I'm going to turn it into its own 95 00:03:02,950 --> 00:03:04,630 subquery, and put it in the from clause. 96 00:03:04,880 --> 00:03:06,600 Now I'm going to 97 00:03:06,680 --> 00:03:08,800 give it a name G and technically, 98 00:03:09,660 --> 00:03:10,400 I might be able to leave 99 00:03:10,480 --> 00:03:13,270 that out, but most SQL implementations require it. 100 00:03:13,330 --> 00:03:14,490 This says compute the select 101 00:03:14,960 --> 00:03:16,330 from where expression and call 102 00:03:16,490 --> 00:03:17,840 the result G. And now 103 00:03:18,230 --> 00:03:19,020 in the rest of the query, 104 00:03:20,010 --> 00:03:21,200 anytime I refer to G, 105 00:03:21,400 --> 00:03:24,200 I'm referring to the result of this sub-query in the from. 106 00:03:24,820 --> 00:03:26,700 Specifically, I now have 107 00:03:26,840 --> 00:03:31,280 scaled GPA and if 108 00:03:31,610 --> 00:03:32,550 I want, I can put that 109 00:03:32,680 --> 00:03:35,020 it comes from G, although, I don't need to since it's not ambiguous. 110 00:03:35,920 --> 00:03:38,890 And I can use that scaled GPA in the rest of my query. 111 00:03:40,040 --> 00:03:41,140 Now, I'm still missing a select. 112 00:03:41,870 --> 00:03:43,060 So, I'll just put select star 113 00:03:43,500 --> 00:03:44,340 here and I'll get all attributes 114 00:03:44,890 --> 00:03:46,180 back and now, I have 115 00:03:46,300 --> 00:03:47,300 a complete query where I 116 00:03:47,400 --> 00:03:48,410 computed the scale GPA 117 00:03:48,700 --> 00:03:50,790 in the from clause and used it elsewhere in the query. 118 00:03:51,740 --> 00:03:53,570 I run this one and I 119 00:03:53,690 --> 00:03:55,320 get again the exact same result. 120 00:03:56,400 --> 00:03:57,690 We're going to see more examples of 121 00:03:57,760 --> 00:03:58,940 using sub-queries in the from 122 00:03:59,280 --> 00:04:01,080 clause in the later video on aggregation. 123 00:04:03,790 --> 00:04:05,320 Our second major example is going 124 00:04:05,600 --> 00:04:07,840 to lead to using a sub-query in the select clause. 125 00:04:08,810 --> 00:04:10,830 Let's start by explaining what this query does. 126 00:04:11,640 --> 00:04:12,520 What we're going to do is 127 00:04:12,690 --> 00:04:13,470 we're going to find colleges 128 00:04:14,490 --> 00:04:15,530 and we're going to pair those colleges 129 00:04:16,640 --> 00:04:18,790 with the highest GPA among their applicants. 130 00:04:19,970 --> 00:04:21,350 So this requires joining all three 131 00:04:21,570 --> 00:04:23,280 of our relations: college, apply, and student. 132 00:04:23,800 --> 00:04:24,880 We have here the join conditions. 133 00:04:25,820 --> 00:04:27,300 We always need to remember to include those. 134 00:04:27,840 --> 00:04:28,820 And we're going to return 135 00:04:29,450 --> 00:04:31,100 those college names, states, and 136 00:04:31,250 --> 00:04:32,790 GPAs where the GPA 137 00:04:33,260 --> 00:04:34,670 is the highest (and I'm 138 00:04:34,830 --> 00:04:35,870 using the greater than or equal 139 00:04:36,140 --> 00:04:37,830 to all construct here) among 140 00:04:38,510 --> 00:04:39,910 the GPAs of the 141 00:04:40,190 --> 00:04:42,760 students who applied to that same college. 142 00:04:43,910 --> 00:04:43,910 Okay? 143 00:04:44,680 --> 00:04:45,510 You might want to pause 144 00:04:45,530 --> 00:04:46,380 the video and just make sure 145 00:04:46,570 --> 00:04:48,000 you understand what the query is doing. 146 00:04:48,440 --> 00:04:49,280 But again and what it's going to 147 00:04:49,570 --> 00:04:52,390 return is colleges with the highest GPA among their applicants. 148 00:04:53,420 --> 00:04:55,010 Let's run the query, and we 149 00:04:55,530 --> 00:04:57,170 see that most colleges, 150 00:04:57,640 --> 00:04:59,320 all the colleges in fact, have 151 00:04:59,730 --> 00:05:01,280 a applicant with a 3.9 GPA. 152 00:05:01,610 --> 00:05:02,490 That happens to be the 153 00:05:02,720 --> 00:05:03,990 highest in the database, and all 154 00:05:04,300 --> 00:05:05,940 of the colleges have an applicant with that GPA. 155 00:05:06,820 --> 00:05:08,400 Now we see some duplicates here. 156 00:05:08,610 --> 00:05:09,720 If we don't like duplicates, we use 157 00:05:09,910 --> 00:05:12,300 our usual trick of putting 158 00:05:12,580 --> 00:05:14,420 distinct, and now we 159 00:05:14,520 --> 00:05:15,750 have our four colleges in 160 00:05:15,820 --> 00:05:18,110 our database and the highest GPA of their applicants. 161 00:05:20,510 --> 00:05:21,700 Now let's see how we 162 00:05:21,860 --> 00:05:24,500 can write this query using a sub-query in the select clause. 163 00:05:25,720 --> 00:05:26,700 What a sub-query in a select 164 00:05:26,930 --> 00:05:28,200 clause does is performs a 165 00:05:28,260 --> 00:05:29,790 computation and as long 166 00:05:29,910 --> 00:05:31,710 as that computation returns exactly one 167 00:05:31,950 --> 00:05:34,350 value, that value is used in the result tuple. 168 00:05:35,270 --> 00:05:37,600 So let me just start typing here a little bit. 169 00:05:37,760 --> 00:05:38,300 What we're going to do here 170 00:05:38,300 --> 00:05:39,270 is we're going to select 171 00:05:39,910 --> 00:05:41,320 the college name and state 172 00:05:42,590 --> 00:05:43,370 from the college. 173 00:05:43,690 --> 00:05:45,110 Let me just pull this down here. 174 00:05:47,180 --> 00:05:48,810 from the college, and then 175 00:05:48,960 --> 00:05:49,950 we're going to use a modified 176 00:05:50,590 --> 00:05:51,720 version of this query to 177 00:05:51,850 --> 00:05:54,810 compute the highest GPA for that particular college. 178 00:05:55,410 --> 00:05:56,370 So we're going to have college name, 179 00:05:56,650 --> 00:05:57,840 state, and here comes the 180 00:05:58,050 --> 00:05:59,540 sub-query in the from clause. 181 00:06:00,090 --> 00:06:01,430 So let's just put 182 00:06:02,200 --> 00:06:04,460 parentheses around there, okay? 183 00:06:04,690 --> 00:06:06,730 And we're going to in 184 00:06:07,140 --> 00:06:08,570 the sub-query just compute the 185 00:06:08,740 --> 00:06:09,620 GPA, and it's going to be 186 00:06:09,720 --> 00:06:11,470 the highest GPA for this college. 187 00:06:12,530 --> 00:06:14,850 So we don't need to include college in our from list. 188 00:06:15,860 --> 00:06:17,130 And now we have college here, 189 00:06:17,540 --> 00:06:19,630 and the rest of this is the same query precisely. 190 00:06:20,790 --> 00:06:21,710 The last thing we're going to 191 00:06:21,800 --> 00:06:24,830 do is label this attribute as GPA. 192 00:06:25,520 --> 00:06:26,290 And now we're all set. 193 00:06:26,510 --> 00:06:27,510 You may want to pause 194 00:06:27,620 --> 00:06:28,790 the video and look carefully to make 195 00:06:29,000 --> 00:06:30,560 sure you understand how the query works. 196 00:06:31,430 --> 00:06:33,000 But again, it's taking the...for each 197 00:06:33,290 --> 00:06:35,700 college the name, the state, and then finding the highest GPA. 198 00:06:36,750 --> 00:06:37,750 So let's go for it. 199 00:06:37,890 --> 00:06:39,990 We run the query and indeed we get the results. 200 00:06:40,210 --> 00:06:41,400 It happens to have sorted differently 201 00:06:42,040 --> 00:06:44,110 then last time,but it is exactly the same result. 202 00:06:45,580 --> 00:06:47,300 Now let's suppose that 203 00:06:47,430 --> 00:06:48,580 instead of wanting to pair 204 00:06:48,910 --> 00:06:49,880 every college with the highest 205 00:06:50,230 --> 00:06:51,660 GPA of the applicants, we wanted 206 00:06:52,020 --> 00:06:52,930 to instead pair the colleges 207 00:06:53,490 --> 00:06:54,530 with the names of the applicants. 208 00:06:55,160 --> 00:06:56,240 So we can do something similar. 209 00:06:56,710 --> 00:06:58,110 We have the college, name, and state. 210 00:06:58,530 --> 00:06:59,620 And then in the sub-query 211 00:07:00,080 --> 00:07:01,440 in the from clause, we'll find 212 00:07:01,800 --> 00:07:04,090 the names of the students who applied to that college. 213 00:07:05,070 --> 00:07:06,270 So everything looks pretty much 214 00:07:06,580 --> 00:07:07,460 the same except we don't 215 00:07:07,760 --> 00:07:11,160 need this big sub-query, this big portion here. 216 00:07:13,050 --> 00:07:14,030 We write it like this. 217 00:07:15,100 --> 00:07:16,480 We'll call it the student name. 218 00:07:18,520 --> 00:07:19,610 And that looks pretty similar. 219 00:07:19,970 --> 00:07:20,860 Again, we've got the college, 220 00:07:21,030 --> 00:07:23,520 name, state, and then all the students who applied to that college. 221 00:07:24,000 --> 00:07:25,900 We run the query and we get an error. 222 00:07:26,860 --> 00:07:29,570 The error was expected, and I'm using it to illustrate a point. 223 00:07:30,400 --> 00:07:31,750 When you write a sub-query in 224 00:07:31,850 --> 00:07:33,130 the select clause, it's critical 225 00:07:33,640 --> 00:07:35,650 that that sub-query return exactly one 226 00:07:35,910 --> 00:07:37,150 value, because the result 227 00:07:37,600 --> 00:07:40,100 of that is being used to fill in just one cell of the result. 228 00:07:41,090 --> 00:07:42,340 In our earlier example, we were 229 00:07:42,850 --> 00:07:44,330 computing the highest GPA for 230 00:07:44,420 --> 00:07:46,180 each college and there's exactly one of those. 231 00:07:46,950 --> 00:07:47,750 In this case, we have several 232 00:07:48,290 --> 00:07:50,170 students applying to colleges, so 233 00:07:50,290 --> 00:07:51,290 we wouldn't know, when we get 234 00:07:51,670 --> 00:07:52,580 a bunch of values in the 235 00:07:52,640 --> 00:07:54,940 result, which one to put in the tuple that's being constructed. 236 00:07:56,180 --> 00:07:57,340 And in fact, our result here says 237 00:07:57,650 --> 00:07:58,900 subquery returns more than 238 00:07:59,040 --> 00:08:00,570 one row, and that's the error.