1 00:00:00,630 --> 00:00:03,430 Now, we'll learn about the joined family of operators in SQL. 2 00:00:04,590 --> 00:00:06,520 Going to our select from Where statement. 3 00:00:07,280 --> 00:00:09,080 In the From clause, we list tables. 4 00:00:10,420 --> 00:00:11,880 separated by commas and that's 5 00:00:12,160 --> 00:00:13,840 implicitly a clause product 6 00:00:14,300 --> 00:00:15,650 of those labels, but it's 7 00:00:15,790 --> 00:00:17,510 also possible to have explicit 8 00:00:18,240 --> 00:00:19,630 join of tables and this 9 00:00:19,850 --> 00:00:22,240 follows the relational Algebra style of join. 10 00:00:23,110 --> 00:00:24,490 There's a few different types. 11 00:00:25,090 --> 00:00:27,870 One of them is what's called the inner join on a condition. 12 00:00:28,430 --> 00:00:29,760 And we'll see that that's 13 00:00:30,190 --> 00:00:31,200 equivalent to what in 14 00:00:31,370 --> 00:00:32,570 relational Algebra we were 15 00:00:32,680 --> 00:00:35,510 calling the theta join where the theta here is a condition. 16 00:00:36,150 --> 00:00:37,340 So, it's effectively taking the 17 00:00:37,410 --> 00:00:38,660 clause product, but then 18 00:00:38,840 --> 00:00:39,950 applying the condition and only 19 00:00:40,130 --> 00:00:41,240 keeping the tupples in the 20 00:00:41,420 --> 00:00:43,160 clause product that satisfy the condition. 21 00:00:44,400 --> 00:00:45,540 The separate type of join 22 00:00:45,970 --> 00:00:47,390 we can use in SQL is 23 00:00:47,610 --> 00:00:48,930 the natural join and that 24 00:00:49,120 --> 00:00:50,320 is in fact exactly the natural 25 00:00:50,790 --> 00:00:52,380 join in relational Algebra 26 00:00:53,260 --> 00:00:54,980 where it equates columns across 27 00:00:55,400 --> 00:00:56,710 tables of the same name 28 00:00:57,430 --> 00:00:58,900 so it requires the values in 29 00:00:58,980 --> 00:01:00,300 those columns to be same to 30 00:01:00,450 --> 00:01:01,230 keep the tupples in the clause 31 00:01:01,600 --> 00:01:02,580 product and then it 32 00:01:02,670 --> 00:01:04,830 also eliminates the duplicate columns that are created. 33 00:01:05,250 --> 00:01:07,040 We'll see this very clearly when we get to the demo. 34 00:01:08,060 --> 00:01:09,320 The third type of join operator 35 00:01:09,840 --> 00:01:11,630 and sequel is again, interjoin 36 00:01:12,670 --> 00:01:13,940 but with a special clause called 37 00:01:14,300 --> 00:01:16,420 using and listing attributes and 38 00:01:17,250 --> 00:01:17,930 that's kind of again the natural 39 00:01:18,510 --> 00:01:20,220 join except you explicitly list 40 00:01:20,620 --> 00:01:22,150 the attributes that you want to be equated. 41 00:01:23,950 --> 00:01:25,040 And finally, the fourth type and actually, 42 00:01:25,450 --> 00:01:27,250 the most interesting type is 43 00:01:27,720 --> 00:01:29,640 the other join and there's 44 00:01:29,890 --> 00:01:31,170 a left outer join, right outer 45 00:01:31,380 --> 00:01:33,000 join and full outer join and 46 00:01:33,150 --> 00:01:35,080 this is again combining tupples 47 00:01:35,880 --> 00:01:38,080 similar to the theta 48 00:01:38,360 --> 00:01:39,530 join except when tupples 49 00:01:40,010 --> 00:01:41,180 don't match the theta condition. 50 00:01:42,000 --> 00:01:44,720 They're still added to the result and patted with no values. 51 00:01:45,750 --> 00:01:46,690 Now, I will say right off 52 00:01:46,980 --> 00:01:48,060 that none of these operators 53 00:01:48,550 --> 00:01:50,740 are actually adding expressive part of SQL. 54 00:01:51,320 --> 00:01:53,610 All of them can be expressed using other constructs. 55 00:01:54,600 --> 00:01:55,470 But they can be quite useful 56 00:01:55,970 --> 00:01:57,480 in formulating queries and especially 57 00:01:58,180 --> 00:01:59,490 the outer join is a fairly 58 00:01:59,790 --> 00:02:01,590 complicated to express without the 59 00:02:01,990 --> 00:02:04,110 outer join operator itself. 60 00:02:04,470 --> 00:02:05,420 So, as usual, we'll be doing 61 00:02:05,700 --> 00:02:07,040 our demo with our simple college 62 00:02:07,390 --> 00:02:08,900 admissions database with college 63 00:02:09,340 --> 00:02:10,800 tables, student table and applied table. 64 00:02:11,220 --> 00:02:12,440 So, let's move ahead to the demo. 65 00:02:14,330 --> 00:02:15,290 As usual, we'll have four 66 00:02:15,710 --> 00:02:17,260 colleges, a bunch of 67 00:02:17,330 --> 00:02:20,020 students and students applying to colleges. 68 00:02:21,000 --> 00:02:22,170 Let's start with the simple parade 69 00:02:22,450 --> 00:02:24,200 that we've seen before which matches 70 00:02:24,980 --> 00:02:28,270 students names with majors to which they've applied . 71 00:02:28,370 --> 00:02:29,430 So, that combines the student in 72 00:02:29,640 --> 00:02:31,090 apply relation, making sure the 73 00:02:31,270 --> 00:02:32,110 student ID is the same 74 00:02:32,400 --> 00:02:34,130 across the two relations and gives 75 00:02:34,330 --> 00:02:36,720 us the names and major back. 76 00:02:37,030 --> 00:02:38,560 Now, if you remember your relational Algebra, 77 00:02:39,340 --> 00:02:40,600 you can see clearly that 78 00:02:40,810 --> 00:02:43,060 this is a join of the student on apply relation. 79 00:02:43,430 --> 00:02:44,620 Actually, a natural join, but we'll 80 00:02:44,710 --> 00:02:46,240 come to them in We're gonna 81 00:02:46,530 --> 00:02:47,680 first rewrite it using the 82 00:02:48,010 --> 00:02:49,020 equivalent of a theta join 83 00:02:49,270 --> 00:02:50,750 operator which is called inner 84 00:02:51,010 --> 00:02:52,960 join in SQL and so, 85 00:02:53,160 --> 00:02:54,310 this does the theta join 86 00:02:54,630 --> 00:02:56,270 or the combination of student and 87 00:02:56,330 --> 00:02:57,960 apply on a specific condition, so 88 00:02:58,150 --> 00:02:59,520 we'll change the "Where" to 89 00:02:59,770 --> 00:03:01,450 "on" and it is 90 00:03:01,780 --> 00:03:04,330 effectively the cross product of the two tables. 91 00:03:05,250 --> 00:03:06,150 But then when it does the 92 00:03:06,260 --> 00:03:07,540 cross product, it checks this 93 00:03:07,700 --> 00:03:09,160 condition and only keeps the 94 00:03:10,030 --> 00:03:11,060 tuples that satisfy the condition. 95 00:03:11,840 --> 00:03:14,170 So, let's run that query and of course, we get the same result. 96 00:03:14,650 --> 00:03:16,420 The two queries we saw are exactly 97 00:03:16,700 --> 00:03:18,410 equivalent, we're just expressing them a little bit differently. 98 00:03:19,900 --> 00:03:21,680 Now the inner join is 99 00:03:21,950 --> 00:03:24,450 the default join operator in a SQL. 100 00:03:24,810 --> 00:03:25,770 So we can actually take away 101 00:03:26,000 --> 00:03:27,310 the word "inner" and when we 102 00:03:27,420 --> 00:03:28,500 run that, we again get the 103 00:03:28,580 --> 00:03:31,510 same result, because join is an abbreviation for inner join. 104 00:03:33,640 --> 00:03:34,570 happens when we have a joint 105 00:03:34,920 --> 00:03:36,400 operator with additional conditions 106 00:03:37,010 --> 00:03:39,160 besides the one on the two tables. 107 00:03:40,000 --> 00:03:41,740 So, this is also a query that we've seen before. 108 00:03:42,500 --> 00:03:43,550 This times the name and 109 00:03:44,240 --> 00:03:45,590 GPA of students who 110 00:03:45,820 --> 00:03:48,150 came from a high school with less than a thousand students. 111 00:03:48,840 --> 00:03:51,200 They've applied to major in Computer Science at Stanford. 112 00:03:52,100 --> 00:03:54,820 So, we ran the query and we find just to a students in our result. 113 00:03:55,660 --> 00:03:58,060 So, now let's rewrite that using the join operator. 114 00:03:58,940 --> 00:04:00,390 So, we type join instead 115 00:04:00,810 --> 00:04:01,880 of comma, the comma being 116 00:04:02,050 --> 00:04:04,090 the cross product and the 117 00:04:04,230 --> 00:04:06,100 join condition is again combining 118 00:04:06,700 --> 00:04:08,580 the student and apply records 119 00:04:08,930 --> 00:04:10,160 where the student ID matches 120 00:04:10,910 --> 00:04:12,980 and the rest of this becomes our Where condition. 121 00:04:13,860 --> 00:04:15,890 Go ahead and run the query and we get the same result. 122 00:04:17,040 --> 00:04:18,210 Now, it turns out that 123 00:04:18,370 --> 00:04:19,850 we can actually put all 124 00:04:20,160 --> 00:04:21,580 of these conditions into our 125 00:04:21,910 --> 00:04:23,310 On clause, so we can 126 00:04:23,670 --> 00:04:24,960 make this Where back into an 127 00:04:25,460 --> 00:04:27,280 And, and our On 128 00:04:27,520 --> 00:04:29,380 clause now is the And of all all three conditions. 129 00:04:30,220 --> 00:04:31,870 We run the query and we get the same result. 130 00:04:32,760 --> 00:04:33,950 Now, you're probably thinking how 131 00:04:34,220 --> 00:04:35,270 do I know what to put in 132 00:04:35,450 --> 00:04:36,430 the On clause and what 133 00:04:36,640 --> 00:04:38,010 do I put in the Where clause 134 00:04:39,080 --> 00:04:40,650 because this are obviously equivalent. 135 00:04:41,480 --> 00:04:44,230 Well, first of all there are many equivalent queries in SQL. 136 00:04:44,350 --> 00:04:45,810 We can write things in different ways. 137 00:04:46,400 --> 00:04:48,030 In theory, SQL query processor 138 00:04:48,550 --> 00:04:49,640 should execute them all in 139 00:04:49,670 --> 00:04:51,490 the most efficient possible way, but 140 00:04:51,730 --> 00:04:53,980 the join clause in 141 00:04:54,170 --> 00:04:55,450 particular is often used 142 00:04:55,780 --> 00:04:56,800 as a hint to the query processor 143 00:04:57,660 --> 00:04:58,530 on how to execute the query. 144 00:04:59,260 --> 00:05:00,280 So, if we put 145 00:05:00,450 --> 00:05:02,340 all of these in the On condition. 146 00:05:02,680 --> 00:05:03,650 We're sort of saying as the 147 00:05:03,750 --> 00:05:04,680 query processor does the join, 148 00:05:04,990 --> 00:05:07,740 it should be all the conditions when we make this aware. 149 00:05:08,540 --> 00:05:09,530 It's sort of a hint saying 150 00:05:09,720 --> 00:05:11,060 here's the condition that really 151 00:05:11,300 --> 00:05:12,530 applies to the combination of 152 00:05:12,580 --> 00:05:13,620 the tuples and the rest 153 00:05:13,670 --> 00:05:15,880 of the conditions apply to separate attributes. 154 00:05:17,350 --> 00:05:18,330 Now let's take at what happens