1 00:00:01,140 --> 00:00:03,780 This is the second of two videos about the relational algebra. 2 00:00:04,990 --> 00:00:06,160 In the first video, we learned 3 00:00:06,430 --> 00:00:09,250 about the select and project operators in various types of joins. 4 00:00:10,270 --> 00:00:11,500 This video will cover set operators, 5 00:00:12,350 --> 00:00:14,680 union difference and intersection, the 6 00:00:14,880 --> 00:00:16,550 renaming operator, and different 7 00:00:16,820 --> 00:00:19,010 notations for expressions of relational algebra. 8 00:00:20,370 --> 00:00:21,600 Just as a reminder, we apply 9 00:00:21,980 --> 00:00:23,690 a relational algebra query or 10 00:00:23,820 --> 00:00:24,930 expression to a set 11 00:00:25,110 --> 00:00:26,220 of relations and we get 12 00:00:26,660 --> 00:00:27,500 as a result of that expression 13 00:00:28,070 --> 00:00:29,070 a relation as our answer. 14 00:00:30,210 --> 00:00:31,270 For our examples, we're using 15 00:00:31,640 --> 00:00:33,420 an imaginary database about college admissions. 16 00:00:34,070 --> 00:00:35,420 We have a relation of colleges, 17 00:00:36,150 --> 00:00:37,480 a relation of students, and a 18 00:00:37,780 --> 00:00:40,400 relation with information about students applying to colleges. 19 00:00:41,450 --> 00:00:42,650 We'll keep at the bottom of 20 00:00:42,750 --> 00:00:44,760 the video these three depictions of 21 00:00:44,840 --> 00:00:45,780 those relations with a few 22 00:00:46,030 --> 00:00:48,000 abbreviations used so that names aren't too long. 23 00:00:49,160 --> 00:00:50,550 Let's move ahead to our first operator. 24 00:00:52,590 --> 00:00:53,760 The first of three set operators 25 00:00:54,440 --> 00:00:55,860 is the union operator, and it's 26 00:00:55,960 --> 00:00:57,160 a standard set union that 27 00:00:57,280 --> 00:00:58,840 you learned about in elementary school. 28 00:01:00,230 --> 00:01:01,160 Let's suppose, for example, that we 29 00:01:01,250 --> 00:01:02,290 want a list of the 30 00:01:02,390 --> 00:01:04,110 college and student names in our database. 31 00:01:04,590 --> 00:01:05,670 So we just want those as list. 32 00:01:05,980 --> 00:01:07,220 For example, we might want 33 00:01:07,770 --> 00:01:10,040 Stanford and Susan 34 00:01:10,740 --> 00:01:14,340 and Cornell and Mary 35 00:01:15,350 --> 00:01:16,540 and John and so on. 36 00:01:17,980 --> 00:01:19,280 Now you might think 37 00:01:19,550 --> 00:01:20,690 we can generate this list by 38 00:01:20,800 --> 00:01:21,990 using one of the operators we've 39 00:01:22,140 --> 00:01:23,430 already learned for combining information 40 00:01:23,990 --> 00:01:25,480 from multiple relations, such as 41 00:01:25,580 --> 00:01:28,020 the cross-product operator or the natural join operator. 42 00:01:29,200 --> 00:01:30,100 The problem with those operators 43 00:01:30,830 --> 00:01:31,970 is that they kind of combine information 44 00:01:32,560 --> 00:01:33,750 from multiple relations horizontally. 45 00:01:34,960 --> 00:01:36,550 They might take a tuple T1 46 00:01:36,640 --> 00:01:37,840 from one relation and tuple T2 47 00:01:38,120 --> 00:01:39,730 from the other and kind of match them. 48 00:01:40,320 --> 00:01:41,570 But that's not what we want to do here. 49 00:01:41,740 --> 00:01:44,530 We want to combine the information vertically to create our list. 50 00:01:45,010 --> 00:01:47,140 And to do that we're going to use is the union operator. 51 00:01:48,240 --> 00:01:49,180 So in order to get a list 52 00:01:49,380 --> 00:01:50,390 of the college names and the 53 00:01:50,460 --> 00:01:53,460 student names, we'll project the college name from the college relation. 54 00:01:55,190 --> 00:01:57,310 That gives us a list of college names. 55 00:01:57,630 --> 00:01:58,880 We'll similarly project the student 56 00:01:59,280 --> 00:02:00,540 name from the student relation, 57 00:02:02,080 --> 00:02:03,120 and we've got those two lists 58 00:02:03,600 --> 00:02:04,700 and we'll just apply the union 59 00:02:04,940 --> 00:02:06,930 operator between them and that will give us our result. 60 00:02:07,660 --> 00:02:09,280 Now, technically, in relational algebra 61 00:02:10,090 --> 00:02:11,060 in order to union two lists 62 00:02:11,370 --> 00:02:12,620 they have to have the same schema, 63 00:02:13,210 --> 00:02:14,530 that means that same attribute name 64 00:02:14,930 --> 00:02:16,550 and these don't, but we'll correct that later. 65 00:02:17,140 --> 00:02:19,180 For now, you get the basic idea of the union operator. 66 00:02:20,700 --> 00:02:22,170 Our next set operator is 67 00:02:22,280 --> 00:02:24,580 the difference operator, and this one can be extremely useful. 68 00:02:25,180 --> 00:02:26,440 As an example, let's suppose 69 00:02:26,840 --> 00:02:28,190 we want to find the IDs 70 00:02:28,360 --> 00:02:29,920 of students who didn't apply to any colleges. 71 00:02:30,830 --> 00:02:32,250 It sounds like a complicated query, but 72 00:02:32,460 --> 00:02:34,440 we'll actually write it in a very simple fashion. 73 00:02:35,420 --> 00:02:36,840 We'll start by projecting the student 74 00:02:37,280 --> 00:02:38,510 ID from the student relation 75 00:02:39,020 --> 00:02:39,970 itself and that will 76 00:02:40,150 --> 00:02:41,800 give us all of this student IDs. 77 00:02:42,930 --> 00:02:44,090 Then lets project the student 78 00:02:44,640 --> 00:02:45,660 ID from the apply relation 79 00:02:47,600 --> 00:02:48,590 and that gives us the IDs 80 00:02:48,770 --> 00:02:50,200 of all students who have applied somewhere. 81 00:02:51,250 --> 00:02:52,160 All we need to do is 82 00:02:52,290 --> 00:02:53,870 take the difference operator, written 83 00:02:54,100 --> 00:02:56,640 with the minus sign, and that gives us the result of our query. 84 00:02:57,360 --> 00:02:58,540 It will take all IDs of 85 00:02:58,710 --> 00:02:59,840 the students and then subtract 86 00:03:00,330 --> 00:03:02,370 the ones who have applied somewhere. 87 00:03:03,430 --> 00:03:04,950 Suppose instead that we 88 00:03:05,050 --> 00:03:06,000 wanted the names of the students 89 00:03:06,500 --> 00:03:08,500 who didn't apply anywhere, not just their IDs. 90 00:03:09,360 --> 00:03:10,590 So that's a little bit more complicated. 91 00:03:11,350 --> 00:03:12,500 You might think, "Oh, just add 92 00:03:12,780 --> 00:03:14,040 student name to the projection 93 00:03:14,540 --> 00:03:15,720 list here," but if we 94 00:03:15,840 --> 00:03:16,920 do that, then we're trying to 95 00:03:17,070 --> 00:03:18,170 subtract a set that has 96 00:03:18,310 --> 00:03:20,620 just IDs from a set that has the pair of ID names. 97 00:03:20,810 --> 00:03:22,580 And we can't have 98 00:03:22,610 --> 00:03:23,890 the student name here because the 99 00:03:23,960 --> 00:03:25,690 student name isn't part of the apply relation. 100 00:03:26,870 --> 00:03:29,460 So there is a nice trick, however, that's going to do what we want. 101 00:03:29,810 --> 00:03:30,700 Let me erase these here. 102 00:03:31,530 --> 00:03:32,750 What we're going to do is 103 00:03:32,920 --> 00:03:34,030 we're going to take this whole 104 00:03:34,230 --> 00:03:35,500 expression here which gives us 105 00:03:35,680 --> 00:03:38,370 the student IDs who didn't apply anywhere and watch this. 106 00:03:38,720 --> 00:03:38,960 Pretty clever. 107 00:03:39,710 --> 00:03:40,860 We're gonna do a natural join 108 00:03:42,070 --> 00:03:42,730 with the student relation. 109 00:03:43,960 --> 00:03:45,350 And now, that's called a join back. 110 00:03:45,730 --> 00:03:47,040 So we've taken the IDs, 111 00:03:47,160 --> 00:03:48,180 a certain select set of IDs 112 00:03:48,460 --> 00:03:50,500 and we've joined them back to the student relation. 113 00:03:51,030 --> 00:03:51,800 That's going to give us a 114 00:03:52,050 --> 00:03:53,650 schema that's the student relation 115 00:03:54,100 --> 00:03:55,550 itself, and then we're just 116 00:03:55,690 --> 00:03:56,510 going to add to that a 117 00:03:57,210 --> 00:03:58,710 projection of the student name. 118 00:03:59,340 --> 00:04:00,770 And that will give us our desired answer. 119 00:04:03,460 --> 00:04:06,400 The last of the three set operators is the intersection operator. 120 00:04:07,390 --> 00:04:08,220 So let's suppose we want 121 00:04:08,470 --> 00:04:09,330 to find names that are both 122 00:04:09,560 --> 00:04:11,140 a college name and a student name. 123 00:04:11,860 --> 00:04:14,510 So perhaps, Washington is the name of a student and a college. 124 00:04:15,250 --> 00:04:16,250 To find those, we're going to 125 00:04:16,350 --> 00:04:18,410 do something similar to what we've done in the previous examples. 126 00:04:19,010 --> 00:04:20,390 Let's start by getting the college names. 127 00:04:24,720 --> 00:04:25,670 Then let's get the student names, 128 00:04:29,640 --> 00:04:30,530 and then what we're going to do 129 00:04:30,720 --> 00:04:32,430 is just perform an intersection of 130 00:04:32,550 --> 00:04:35,420 those two expressions and that will give us the result that we want. 131 00:04:36,120 --> 00:04:37,280 Now like our previous example, 132 00:04:37,910 --> 00:04:39,360 technically speaking, the two 133 00:04:39,600 --> 00:04:40,640 expressions on the two sides 134 00:04:41,000 --> 00:04:42,220 of the intersection ought to have 135 00:04:42,380 --> 00:04:43,530 the same schema and again, I'll 136 00:04:43,610 --> 00:04:44,410 show you just a little bit 137 00:04:44,560 --> 00:04:46,060 later, how we take care of that. 138 00:04:47,060 --> 00:04:48,980 Now, it turns out 139 00:04:49,220 --> 00:04:51,220 that intersection actually doesn't add 140 00:04:51,650 --> 00:04:52,830 any expressive power to our 141 00:04:53,030 --> 00:04:54,250 language and I'm going 142 00:04:54,300 --> 00:04:55,820 to show that actually in two different ways. 143 00:04:56,620 --> 00:04:57,780 The first way is that 144 00:04:57,970 --> 00:04:59,360 if we have two expressions, let's 145 00:04:59,640 --> 00:05:01,300 say E1 and E2 and 146 00:05:02,020 --> 00:05:04,210 we perform their intersection, that 147 00:05:04,530 --> 00:05:06,380 is exactly equivalent to 148 00:05:06,610 --> 00:05:08,650 writing E1 minus, using the 149 00:05:08,720 --> 00:05:11,360 difference operator, E1 minus E2. 150 00:05:12,590 --> 00:05:13,700 Now if you're not 151 00:05:14,160 --> 00:05:15,330 convinced of that immediately, let's go 152 00:05:15,530 --> 00:05:16,980 back to Venn diagrams, again 153 00:05:17,290 --> 00:05:19,340 a concept you probably learned early in your schooling. 154 00:05:19,960 --> 00:05:22,120 So let's make a picture of two circles. 155 00:05:23,470 --> 00:05:24,470 And let's say that the first 156 00:05:24,940 --> 00:05:26,270 circle Circle represents the result of 157 00:05:26,360 --> 00:05:28,040 expression E1 and the 158 00:05:28,100 --> 00:05:29,740 second rear circle represents the 159 00:05:29,810 --> 00:05:31,300 result of expression E2. 160 00:05:32,380 --> 00:05:33,390 Now if we 161 00:05:33,810 --> 00:05:35,340 take the entire circle E1. 162 00:05:36,800 --> 00:05:38,210 Let's shade that in purple. 163 00:05:39,980 --> 00:05:41,070 And then we take the 164 00:05:41,210 --> 00:05:42,380 result, so that's E1 here, 165 00:05:43,170 --> 00:05:45,410 and then we take E1, the 166 00:05:45,500 --> 00:05:46,760 result of the expression E1 167 00:05:46,890 --> 00:05:48,490 minus E2 here, we'll write 168 00:05:48,700 --> 00:05:49,760 that in green, so that's everything 169 00:05:50,170 --> 00:05:51,260 in E1 that's not in 170 00:05:51,380 --> 00:05:54,690 E2, that's this. Okay? 171 00:05:55,620 --> 00:05:57,160 And if we take the purple minus 172 00:05:57,870 --> 00:05:59,150 the green you will see 173 00:05:59,570 --> 00:06:02,040 that we actually do get the intersection here. 174 00:06:02,440 --> 00:06:03,420 So that's a simple property 175 00:06:04,040 --> 00:06:05,510 of set Operations but what 176 00:06:05,770 --> 00:06:06,810 that's telling us is that 177 00:06:07,020 --> 00:06:08,530 this intersection operator here isn't 178 00:06:08,830 --> 00:06:10,170 giving us more expressive power because 179 00:06:10,760 --> 00:06:11,620 any expression that we can 180 00:06:11,780 --> 00:06:13,420 write in this fashion, we can equivalently 181 00:06:14,010 --> 00:06:15,700 right with the difference operator in this fashion. 182 00:06:17,130 --> 00:06:18,150 Let me show you a completely 183 00:06:18,470 --> 00:06:19,630 different way in which intersection 184 00:06:20,170 --> 00:06:21,360 doesn't add any expressive power. 185 00:06:22,350 --> 00:06:23,370 So, let's go back to 186 00:06:23,890 --> 00:06:27,810 E1 intersect E2 and as 187 00:06:27,920 --> 00:06:28,780 a reminder for this to be 188 00:06:28,880 --> 00:06:30,130 completely correct these have to 189 00:06:30,280 --> 00:06:33,210 have the same schema as equal between the two. 190 00:06:34,320 --> 00:06:35,590 E1 intersect E2 turns out 191 00:06:35,720 --> 00:06:37,880 to be exactly the 192 00:06:37,970 --> 00:06:40,040 same as E1 natural join 193 00:06:40,660 --> 00:06:42,330 E2 in this particular case 194 00:06:42,910 --> 00:06:44,600 because the schema is the same. 195 00:06:45,140 --> 00:06:46,420 Remember what natural join does. 196 00:06:46,810 --> 00:06:48,270 Natural join says that you 197 00:06:48,430 --> 00:06:50,300 match up all columns that 198 00:06:50,400 --> 00:06:52,560 are equal and you eliminate duplicate values of columns. 199 00:06:52,950 --> 00:06:53,660 So I'll just let you work 200 00:06:53,920 --> 00:06:55,290 out for yourself that this 201 00:06:55,580 --> 00:06:57,340 is indeed an equivalence and 202 00:06:57,460 --> 00:06:59,060 a second reason that the 203 00:06:59,190 --> 00:07:01,210 intersection doesn't add any expressive power. 204 00:07:01,860 --> 00:07:03,340 Nevertheless, the intersection can be 205 00:07:03,520 --> 00:07:04,950 very useful to use in queries. 206 00:07:07,430 --> 00:07:09,100 Our last operator is the rename operator. 207 00:07:10,000 --> 00:07:11,240 The rename operator is necessary 208 00:07:12,270 --> 00:07:14,000 to express certain queries in relational algebra. 209 00:07:14,780 --> 00:07:17,350 Let me first show the form of the operator and then we'll see it in use. 210 00:07:18,170 --> 00:07:20,410 The rename operator uses the Greek symbol rho. 211 00:07:21,250 --> 00:07:22,130 And like all of our 212 00:07:22,280 --> 00:07:23,790 other operators, it applies to 213 00:07:23,880 --> 00:07:25,940 the result of any expression of relational algebra. 214 00:07:26,670 --> 00:07:28,180 And what the rename operator does 215 00:07:28,830 --> 00:07:30,520 is it reassigns the schema 216 00:07:31,310 --> 00:07:32,400 in the result of E. So 217 00:07:32,610 --> 00:07:33,530 we compute E, we get 218 00:07:33,680 --> 00:07:34,650 a relation as a result, 219 00:07:34,930 --> 00:07:36,340 and it says that I'm 220 00:07:36,730 --> 00:07:37,910 going to call the result of 221 00:07:37,980 --> 00:07:39,780 that, relation R with 222 00:07:40,150 --> 00:07:40,940 attributes A1 through An and 223 00:07:41,000 --> 00:07:43,680 then when this expression itself 224 00:07:44,150 --> 00:07:45,570 is embedded in more complex expression, 225 00:07:46,430 --> 00:07:48,170 we can use this schema to 226 00:07:48,580 --> 00:07:49,480 describe the result of the 227 00:07:49,680 --> 00:07:51,870 E. Again we'll see shortly why that's useful. 228 00:07:52,910 --> 00:07:53,570 There are a couple of the abbreviations 229 00:07:54,370 --> 00:07:55,290 that are used in the rename 230 00:07:55,640 --> 00:07:57,990 operator, this form is the general form here. 231 00:07:58,770 --> 00:08:00,360 One abbreviation is if we 232 00:08:00,510 --> 00:08:01,390 just want to use the 233 00:08:01,450 --> 00:08:02,750 same attribute names that came 234 00:08:02,990 --> 00:08:04,160 out of E, but change the 235 00:08:04,230 --> 00:08:05,590 relation name, we write row 236 00:08:05,940 --> 00:08:07,460 sub R applied to E, 237 00:08:08,750 --> 00:08:09,880 and similarly, if we want 238 00:08:10,220 --> 00:08:12,440 to use just the attribute 239 00:08:12,820 --> 00:08:14,550 names, if we 240 00:08:14,650 --> 00:08:15,740 want to change, I'm sorry, just 241 00:08:16,010 --> 00:08:17,330 the attribute names then we 242 00:08:17,470 --> 00:08:18,810 write attribute list here 243 00:08:19,310 --> 00:08:20,800 and it would keep the same relation name. 244 00:08:21,760 --> 00:08:22,930 This form of course has 245 00:08:23,110 --> 00:08:24,040 to have a list of attributes 246 00:08:24,530 --> 00:08:25,310 or we would not be able to 247 00:08:25,380 --> 00:08:26,870 distinguish it from the previous form. 248 00:08:27,150 --> 00:08:27,990 But again these are just abbreviations 249 00:08:28,710 --> 00:08:30,430 and the general form is the one up here. 250 00:08:31,360 --> 00:08:34,150 Okay, so now let's see the rename operator in use. 251 00:08:36,470 --> 00:08:37,350 The first use of the rename 252 00:08:37,740 --> 00:08:38,770 operator is something I alluded 253 00:08:39,070 --> 00:08:40,100 to earlier in this video 254 00:08:40,650 --> 00:08:41,540 which is the fact that when 255 00:08:41,860 --> 00:08:42,930 we do the set operators, 256 00:08:43,540 --> 00:08:45,880 the union, difference, and intersect 257 00:08:46,350 --> 00:08:47,840 operators, we do expect 258 00:08:48,310 --> 00:08:49,370 the schemas on the two 259 00:08:49,620 --> 00:08:50,780 the sides of the operator to 260 00:08:50,900 --> 00:08:51,860 match, and in a couple 261 00:08:52,030 --> 00:08:53,200 of our examples they didn't match, 262 00:08:53,570 --> 00:08:55,680 and the rename operator will allow us to fix that. 263 00:08:56,420 --> 00:08:58,150 So, for example, if we're 264 00:08:58,300 --> 00:08:59,470 doing the list of college and 265 00:08:59,590 --> 00:09:00,520 student names, and let me 266 00:09:00,610 --> 00:09:02,040 just remind you how we wrote that query. 267 00:09:02,590 --> 00:09:03,720 We took the C name from 268 00:09:03,890 --> 00:09:05,780 college and we 269 00:09:05,920 --> 00:09:07,040 took the s name from students 270 00:09:09,250 --> 00:09:11,020 and we did the big union of those. 271 00:09:11,860 --> 00:09:12,950 Now, to make this technically 272 00:09:13,240 --> 00:09:15,520 correct, these two attribute names would have to be the same. 273 00:09:16,350 --> 00:09:18,310 So we're just going to apply the rename operator. 274 00:09:19,270 --> 00:09:20,610 Let's say that we're gonna 275 00:09:20,790 --> 00:09:22,150 rename the result of this 276 00:09:22,390 --> 00:09:23,840 first expression to say 277 00:09:24,060 --> 00:09:26,350 the relation name C with attribute name. 278 00:09:26,870 --> 00:09:28,370 And let's make the 279 00:09:28,450 --> 00:09:30,500 result of the second expression similarly 280 00:09:31,040 --> 00:09:32,850 be the relation C with attribute name. 281 00:09:33,290 --> 00:09:34,400 And now we have two 282 00:09:34,580 --> 00:09:36,140 matching schemas and then 283 00:09:36,300 --> 00:09:38,050 we can properly perform the union operator. 284 00:09:38,690 --> 00:09:39,650 Again, this is just a 285 00:09:39,690 --> 00:09:41,060 syntactic necessity to have 286 00:09:41,170 --> 00:09:42,830 well-formed relational algebra expressions. 287 00:09:44,690 --> 00:09:46,260 Now, the second use of 288 00:09:46,360 --> 00:09:47,700 the rename operator is a 289 00:09:47,970 --> 00:09:49,860 little more complicated and quite 290 00:09:50,150 --> 00:09:52,000 a bit more important actually which 291 00:09:52,260 --> 00:09:54,210 is disambiguation in self 292 00:09:54,550 --> 00:09:55,530 joins and you probably have no 293 00:09:55,800 --> 00:09:56,910 idea what I'm talking 294 00:09:57,220 --> 00:09:59,040 about when I say that, but let me give an example. 295 00:10:00,130 --> 00:10:01,320 Let's suppose that we wanted 296 00:10:01,710 --> 00:10:02,810 to have a query that finds 297 00:10:03,620 --> 00:10:04,870 pairs of colleges in the same state. 298 00:10:05,980 --> 00:10:06,720 Now, think about that. 299 00:10:06,910 --> 00:10:07,950 So we want to have, for example, 300 00:10:08,550 --> 00:10:11,300 Stanford and Berkeley and 301 00:10:11,980 --> 00:10:14,100 Berkeley and UCLA and so on. 302 00:10:14,530 --> 00:10:16,060 So that, as you 303 00:10:16,180 --> 00:10:17,720 can see, unlike the union 304 00:10:18,160 --> 00:10:19,500 operator, we're looking for this 305 00:10:19,820 --> 00:10:21,480 horizontal joining here. So 306 00:10:21,710 --> 00:10:23,180 we're going to have to combine essentially 307 00:10:23,690 --> 00:10:25,620 two instances of the college relation. 308 00:10:26,270 --> 00:10:28,930 And that's exactly what we're going to do. 309 00:10:28,990 --> 00:10:30,720 We're effectively going to do college join 310 00:10:31,120 --> 00:10:32,930 college making the state equal. 311 00:10:33,260 --> 00:10:35,020 So, let's work on that a little bit. 312 00:10:35,600 --> 00:10:36,560 So, what we wanna 313 00:10:36,690 --> 00:10:37,820 do is we wanna have college 314 00:10:39,820 --> 00:10:41,020 and we want to, let's just 315 00:10:41,300 --> 00:10:43,500 start with, say, the cross-product of college. 316 00:10:45,250 --> 00:10:46,690 And then we want to somehow say, 317 00:10:47,080 --> 00:10:48,510 "Well, the state equals the state." 318 00:10:49,270 --> 00:10:50,620 But that's not gonna work. 319 00:10:50,980 --> 00:10:52,130 Which state are these? 320 00:10:52,350 --> 00:10:54,520 And how do we describe the two instances of college? 321 00:10:55,620 --> 00:10:56,670 So what we're going to 322 00:10:56,840 --> 00:10:57,630 do and let me just 323 00:10:57,860 --> 00:10:59,030 erase this, is we're going 324 00:10:59,300 --> 00:11:00,910 to rename those two instances 325 00:11:01,430 --> 00:11:03,000 of colleges so they have different names. 326 00:11:03,710 --> 00:11:04,720 So we're going to take the first 327 00:11:05,010 --> 00:11:06,540 instance of college here and 328 00:11:07,040 --> 00:11:09,110 we're going to apply a rename operator to that. 329 00:11:09,480 --> 00:11:11,350 And we'll call it C1 and we'll 330 00:11:11,640 --> 00:11:12,990 say that that has name1, state1, and enrollment1. 331 00:11:16,580 --> 00:11:18,600 And then we'll take the second instance here. 332 00:11:18,720 --> 00:11:20,340 We'll call it C2, so N2, 333 00:11:21,140 --> 00:11:24,250 S2, E2 of college and 334 00:11:24,790 --> 00:11:25,800 now we have two different relations. 335 00:11:26,870 --> 00:11:27,770 So what we can do is 336 00:11:27,940 --> 00:11:29,040 we can take the cross-product 337 00:11:29,480 --> 00:11:31,010 of those two like that, 338 00:11:31,880 --> 00:11:33,570 and then we can select where 339 00:11:34,560 --> 00:11:36,030 S1 equals S2, okay? 340 00:11:38,010 --> 00:11:40,040 And that gives us pairs of college in the same state. 341 00:11:41,130 --> 00:11:42,080 Actually, let me show you 342 00:11:42,220 --> 00:11:44,760 an even trickier, simpler way of doing this. 343 00:11:45,480 --> 00:11:46,930 Let's take away the selection operator 344 00:11:47,510 --> 00:11:49,140 here, okay? 345 00:11:49,280 --> 00:11:50,610 And let's take away this. 346 00:11:52,610 --> 00:11:54,730 And let's make this into a natural join. 347 00:11:55,150 --> 00:11:56,240 Now that's not gonna work quite 348 00:11:56,590 --> 00:11:57,980 yet because the natural join 349 00:11:58,190 --> 00:11:59,520 requires attribute names to 350 00:11:59,650 --> 00:12:00,560 be the same, and we don't 351 00:12:00,740 --> 00:12:02,180 have any attribute names that are the same. 352 00:12:03,020 --> 00:12:04,100 So the last little trick we're 353 00:12:04,150 --> 00:12:05,890 gonna do is we're 354 00:12:06,190 --> 00:12:07,550 gonna make those two attribute names, 355 00:12:07,990 --> 00:12:09,650 S, be the same. 356 00:12:10,470 --> 00:12:12,010 And now when we 357 00:12:12,120 --> 00:12:13,390 do the natural join, it's gonna 358 00:12:13,660 --> 00:12:15,000 require equality on those two 359 00:12:15,190 --> 00:12:16,970 S's and everything is gonna be great. 360 00:12:18,060 --> 00:12:18,060 Okay? 361 00:12:18,710 --> 00:12:21,570 Now, things are still a little bit more complicated. 362 00:12:22,250 --> 00:12:23,750 One problem with this query 363 00:12:24,520 --> 00:12:25,230 is that we are going to 364 00:12:25,530 --> 00:12:27,090 get colleges paired with themselves. 365 00:12:27,590 --> 00:12:28,650 So we're going to get from 366 00:12:28,810 --> 00:12:30,710 this, for example, Stanford Stanford. 367 00:12:31,990 --> 00:12:33,150 If you think about it, right? 368 00:12:33,880 --> 00:12:37,540 Berkley Berkeley, as well as Stanford Berkeley. 369 00:12:41,450 --> 00:12:44,240 Now, that's not really what we want presumably. 370 00:12:44,820 --> 00:12:46,360 Presumably we actually want different colleges. 371 00:12:47,380 --> 00:12:48,960 but that's pretty easy to handle, actually. 372 00:12:49,460 --> 00:12:51,050 Let's put a selection condition here 373 00:12:51,820 --> 00:12:54,440 so that the name one is not equal to name two. 374 00:12:56,370 --> 00:12:57,030 Great. We took care of that. 375 00:12:57,710 --> 00:12:58,620 So in that case we will 376 00:12:58,740 --> 00:13:00,790 no longer get Stanford Standford and Berkeley Berkeley. 377 00:13:02,000 --> 00:13:03,160 Ah, but there's still one more problem. 378 00:13:04,400 --> 00:13:06,900 We'll get Stanford Berkeley but we'll also get Berkeley Stanford. 379 00:13:09,430 --> 00:13:12,160 Now, let me 380 00:13:12,310 --> 00:13:13,190 pause for a moment and see 381 00:13:13,260 --> 00:13:14,070 if you can think of a 382 00:13:14,150 --> 00:13:15,650 simple way to solve this problem. 383 00:13:17,880 --> 00:13:20,060 Actually, there's a surprisingly simple way, kind of clever. 384 00:13:20,880 --> 00:13:22,350 We're gonna take away this not 385 00:13:22,570 --> 00:13:24,620 equals and we're going replace it with a less than. 386 00:13:24,940 --> 00:13:27,070 And now we'll only 387 00:13:27,450 --> 00:13:29,370 get pairs where the first one is less than the second. 388 00:13:30,070 --> 00:13:32,740 So Stanford and Berkeley goes away and we get Berkley Stanford. 389 00:13:34,200 --> 00:13:35,800 And this is our final query 390 00:13:36,310 --> 00:13:37,300 for what we wanted to do here. 391 00:13:37,960 --> 00:13:39,010 Now what I really wanted to 392 00:13:39,190 --> 00:13:40,330 show, aside from some of the 393 00:13:40,390 --> 00:13:42,270 uses of relational algebra, is 394 00:13:42,370 --> 00:13:43,440 the fact that the rename operator 395 00:13:43,980 --> 00:13:46,210 was for this query absolutely necessary. 396 00:13:46,770 --> 00:13:49,110 We could not have done this query without the rename operator. 397 00:13:50,760 --> 00:13:52,980 Now we've seen all the operators of relational algebra. 398 00:13:53,830 --> 00:13:54,730 Before we wrap up the 399 00:13:54,800 --> 00:13:55,670 video I did want 400 00:13:55,720 --> 00:13:56,790 to mention that there are 401 00:13:56,880 --> 00:13:58,040 some other notations that can 402 00:13:58,300 --> 00:13:59,690 be used for relational algebra expressions. 403 00:14:00,780 --> 00:14:01,690 So far we've just been writing 404 00:14:01,970 --> 00:14:03,290 our expressions in a standard form 405 00:14:03,620 --> 00:14:04,850 with relation names and operators 406 00:14:05,540 --> 00:14:07,280 between those names and applying to those names. 407 00:14:08,060 --> 00:14:09,440 But sometimes people prefer to 408 00:14:09,560 --> 00:14:10,570 write using a more 409 00:14:10,690 --> 00:14:12,180 linear notation of assignment statements 410 00:14:13,060 --> 00:14:14,860 and sometimes people like to write the expressions as trees. 411 00:14:15,860 --> 00:14:18,860 So I'm just gonna briefly show a couple of examples of those and then we'll wrap up. 412 00:14:19,710 --> 00:14:21,310 So assignment statements are a 413 00:14:21,370 --> 00:14:22,790 way to break down relational 414 00:14:23,250 --> 00:14:24,510 algebra expressions into their parts. 415 00:14:25,260 --> 00:14:26,570 Let's do the same query we 416 00:14:26,700 --> 00:14:27,590 just finished as a big 417 00:14:27,740 --> 00:14:28,850 expression which is the 418 00:14:28,940 --> 00:14:30,790 pairs of colleges that are on the same state. 419 00:14:31,620 --> 00:14:32,680 We'll start by writing two assignment 420 00:14:33,220 --> 00:14:34,160 statements that do the rename 421 00:14:34,630 --> 00:14:37,020 of the two instances of the college relation. 422 00:14:37,710 --> 00:14:39,130 So we'll start with 423 00:14:39,190 --> 00:14:40,510 C1 colon equals and we'll 424 00:14:40,620 --> 00:14:42,350 use a rename operator and now 425 00:14:42,680 --> 00:14:44,990 we use the abbreviated form that just lists attribute names. 426 00:14:45,320 --> 00:14:46,570 So we'll see say C one, 427 00:14:47,410 --> 00:14:49,400 S, E one of 428 00:14:49,490 --> 00:14:51,460 college and we'll similarly 429 00:14:52,500 --> 00:14:54,650 say that C2 gets the 430 00:14:54,850 --> 00:14:55,810 rename, and we'll call it 431 00:14:56,210 --> 00:14:58,210 C2SE2 of college, 432 00:14:59,180 --> 00:15:00,250 and remember we use the 433 00:15:00,480 --> 00:15:02,410 same S here so that we can do the natural join. 434 00:15:03,260 --> 00:15:04,470 So, now we'll say 435 00:15:04,910 --> 00:15:07,330 that college pairs gets C1 436 00:15:07,850 --> 00:15:09,490 natural join C2, and 437 00:15:10,300 --> 00:15:11,810 then finally we'll do our selection condition. 438 00:15:12,330 --> 00:15:13,860 So our final answer will be 439 00:15:14,140 --> 00:15:16,520 the selection where N1 is 440 00:15:16,840 --> 00:15:18,450 less than N2 of CP. 441 00:15:19,870 --> 00:15:21,280 And again, this is equivalent to 442 00:15:21,350 --> 00:15:23,220 the expression that we saw on the earlier slide. 443 00:15:23,940 --> 00:15:25,170 It's just a notation that sometimes 444 00:15:25,580 --> 00:15:27,490 people prefer to modularize their expressions. 445 00:15:29,640 --> 00:15:32,700 The second alternate notation I'm going to show is expression trees. 446 00:15:33,090 --> 00:15:35,990 And expression trees are actually commonly used in relational algebra. 447 00:15:36,400 --> 00:15:37,820 They allow you to visualize the structure 448 00:15:38,300 --> 00:15:39,880 of the expression a little bit better. 449 00:15:40,750 --> 00:15:42,400 And as it turns out when SQL 450 00:15:42,680 --> 00:15:44,000 is compiled in database systems, 451 00:15:44,380 --> 00:15:45,560 it's often compiled into an 452 00:15:45,630 --> 00:15:46,620 expression tree that looks very 453 00:15:46,830 --> 00:15:48,510 much like what I'm gonna show you right now. 454 00:15:49,360 --> 00:15:50,660 So for this example let's suppose 455 00:15:51,100 --> 00:15:52,150 that we want to find the 456 00:15:52,580 --> 00:15:54,680 GPAs of students who are applying to CS in California. 457 00:15:55,880 --> 00:15:57,180 So that's going to 458 00:15:57,400 --> 00:15:58,790 involve all three relations because 459 00:15:59,170 --> 00:16:00,400 we're looking at the 460 00:16:00,460 --> 00:16:02,230 state is in California, and 461 00:16:02,510 --> 00:16:03,560 we're looking at the student GPA's 462 00:16:04,580 --> 00:16:06,530 and we're looking at them applying to CS. 463 00:16:07,220 --> 00:16:08,380 So what we're going 464 00:16:08,600 --> 00:16:09,320 to do is we're going to 465 00:16:09,480 --> 00:16:10,890 make a little tree notation here 466 00:16:11,670 --> 00:16:12,730 where we're going to first do 467 00:16:12,840 --> 00:16:14,440 the natural join of these three relations. 468 00:16:14,990 --> 00:16:16,320 So technically the expression 469 00:16:16,800 --> 00:16:18,280 I'm going to show you is going to stop down here. 470 00:16:18,450 --> 00:16:19,450 It's not going to actually have the tables. 471 00:16:20,280 --> 00:16:21,570 So the leaves of the expression are 472 00:16:21,650 --> 00:16:23,790 going to be the three relations: college, students, and apply. 473 00:16:24,600 --> 00:16:26,100 And in relational algebra trees, the 474 00:16:26,260 --> 00:16:28,180 leaves are always relation names. 475 00:16:28,910 --> 00:16:29,530 And we're going to do the natural 476 00:16:30,040 --> 00:16:31,120 join of those three which 477 00:16:31,550 --> 00:16:33,780 as a reminder enforces equality of 478 00:16:34,160 --> 00:16:35,660 the college name against the 479 00:16:35,740 --> 00:16:36,820 college name here against the 480 00:16:36,880 --> 00:16:37,810 college name here, and the 481 00:16:37,880 --> 00:16:39,560 student ID here and the student ID here. 482 00:16:40,210 --> 00:16:41,440 That enforcement means that we 483 00:16:41,610 --> 00:16:42,590 get triples that are talking 484 00:16:42,970 --> 00:16:44,710 about a student applying to a particular college. 485 00:16:45,680 --> 00:16:47,750 And then we're going to apply to 486 00:16:47,830 --> 00:16:49,090 that, and so that's going to 487 00:16:49,180 --> 00:16:51,330 be written as a new note above this one in the tree. 488 00:16:51,980 --> 00:16:53,820 The selection condition that says 489 00:16:54,230 --> 00:16:59,000 that the state equals California and the major equals CS. 490 00:17:02,960 --> 00:17:04,230 And finally, we'll put on 491 00:17:04,420 --> 00:17:06,410 top of that the projection that gets the GPA. 492 00:17:08,450 --> 00:17:08,450 okay? 493 00:17:08,820 --> 00:17:10,820 Now actually this expression is 494 00:17:11,060 --> 00:17:12,220 exactly equivalent to if 495 00:17:12,390 --> 00:17:14,140 we wrote it linearly, project the 496 00:17:14,210 --> 00:17:15,780 GPA, select etc. of 497 00:17:16,690 --> 00:17:19,200 the three college join student, join apply. 498 00:17:19,610 --> 00:17:20,700 I'm just abbreviating here. 499 00:17:20,890 --> 00:17:22,090 That would be an equivalent expression. 500 00:17:22,470 --> 00:17:24,130 But again, people often like 501 00:17:24,280 --> 00:17:25,990 to use the tree notation 502 00:17:26,490 --> 00:17:27,280 because it does allow you to 503 00:17:27,350 --> 00:17:28,460 visualize the structure of the 504 00:17:28,790 --> 00:17:29,780 expression, and it is used 505 00:17:30,520 --> 00:17:32,730 inside implementations of the SQL language. 506 00:17:34,560 --> 00:17:36,580 Let me finish up by summarizing relational algebra. 507 00:17:37,300 --> 00:17:39,930 Let's start with the core constructs of the language. 508 00:17:40,790 --> 00:17:42,220 So a relation name is 509 00:17:42,800 --> 00:17:44,120 a query in relational algebra, 510 00:17:45,000 --> 00:17:47,700 and then we use operators that combine relations and filter relations. 511 00:17:48,730 --> 00:17:49,810 So we have the select operator 512 00:17:50,440 --> 00:17:52,610 that applies a condition to the result of an expression. 513 00:17:53,940 --> 00:17:55,750 We have the project operator that 514 00:17:55,870 --> 00:17:56,820 gives us a set of 515 00:17:57,040 --> 00:17:59,310 attributes that we take from the result of an expression. 516 00:18:00,720 --> 00:18:02,750 We have the expression one 517 00:18:03,650 --> 00:18:05,290 cross-product expression two. 518 00:18:05,750 --> 00:18:06,790 And again those can be any expressions. 519 00:18:08,250 --> 00:18:09,760 Then we have expression one 520 00:18:10,300 --> 00:18:11,560 union expression two. 521 00:18:13,040 --> 00:18:15,930 And we have expression one minus expression two. 522 00:18:16,120 --> 00:18:17,890 And finally we have 523 00:18:18,310 --> 00:18:20,010 the rename operator that takes 524 00:18:20,890 --> 00:18:22,580 an expression and renames the 525 00:18:22,670 --> 00:18:24,320 result of that, the 526 00:18:24,670 --> 00:18:25,880 schema in the result of that expression. 527 00:18:29,430 --> 00:18:30,660 Now, you probably noticed that 528 00:18:30,750 --> 00:18:31,940 I skipped a few of our 529 00:18:32,210 --> 00:18:33,550 favorite operators, but this 530 00:18:33,610 --> 00:18:34,470 is the core of the language 531 00:18:35,240 --> 00:18:36,730 and all the other operators are 532 00:18:36,840 --> 00:18:38,250 actually abbreviations that don't 533 00:18:38,620 --> 00:18:40,040 increase the expressive power of 534 00:18:40,120 --> 00:18:42,660 the language but they can be very useful performing queries. 535 00:18:43,350 --> 00:18:44,820 And the abbreviations that we 536 00:18:45,010 --> 00:18:46,720 learned were expression one 537 00:18:47,450 --> 00:18:49,010 natural join expression two. 538 00:18:49,700 --> 00:18:51,290 They were expression one 539 00:18:51,720 --> 00:18:53,620 theta join expression two. 540 00:18:55,490 --> 00:18:58,250 And, finally, expression one intersect expression two. 541 00:18:58,730 --> 00:18:59,640 All of those where we had 542 00:18:59,860 --> 00:19:02,990 a method of rewriting them using the core operators. 543 00:19:03,730 --> 00:19:05,260 Just a small aside about parentheses. 544 00:19:06,180 --> 00:19:07,530 A parentheses are used in 545 00:19:07,630 --> 00:19:09,180 relational expressions for, relational 546 00:19:09,600 --> 00:19:13,410 algebraic expressions, for disambiguation, similarly to arithmetic expressions. 547 00:19:13,900 --> 00:19:15,310 I was a little cavalier about whether 548 00:19:15,520 --> 00:19:16,940 I included parentheses or not, 549 00:19:17,450 --> 00:19:18,370 but as you write your relational 550 00:19:18,900 --> 00:19:20,410 algebra expressions you will 551 00:19:20,650 --> 00:19:21,990 see that it's pretty straightforward 552 00:19:22,500 --> 00:19:24,240 to figure out when disambiguation is needed. 553 00:19:25,240 --> 00:19:26,630 So to conclude relational algebra, 554 00:19:27,050 --> 00:19:28,290 entirely it's a formal language. 555 00:19:28,940 --> 00:19:30,210 It's based on sets, set 556 00:19:30,350 --> 00:19:31,600 operators and other operators 557 00:19:32,100 --> 00:19:33,560 that combine data from multiple relations. 558 00:19:34,510 --> 00:19:35,980 It takes relations as input, 559 00:19:36,430 --> 00:19:37,590 it produces relations as answers 560 00:19:38,440 --> 00:19:39,640 and it does form the formal 561 00:19:40,100 --> 00:19:43,210 foundation of implemented relational database management.