1 00:00:00,470 --> 00:00:03,040 In this video, we'll be learning about null values in SQL. 2 00:00:04,390 --> 00:00:05,720 As usual, we'll be considering a 3 00:00:05,890 --> 00:00:07,330 simple college admissions database, and 4 00:00:07,590 --> 00:00:08,670 we'll be exploring the select 5 00:00:09,040 --> 00:00:10,470 from where query expressions. 6 00:00:11,570 --> 00:00:12,850 Now instead of extending what 7 00:00:13,000 --> 00:00:14,490 we can write in queries, in 8 00:00:14,570 --> 00:00:17,090 this video we'll be looking at extending what's in the data. 9 00:00:17,920 --> 00:00:19,860 Specifically, in relational databases, 10 00:00:20,750 --> 00:00:22,660 unless specified otherwise, any value 11 00:00:23,110 --> 00:00:25,330 in an attribute can take on the special value null. 12 00:00:26,320 --> 00:00:27,470 Null is usually used to 13 00:00:27,590 --> 00:00:29,620 mean that the value is undefined or unknown. 14 00:00:30,520 --> 00:00:31,600 For example, we might have 15 00:00:31,740 --> 00:00:33,260 a student whose GPA is unknown 16 00:00:33,670 --> 00:00:34,830 or who doesn't have a GPA, 17 00:00:35,100 --> 00:00:37,390 and we would represent that in our data with null. 18 00:00:38,430 --> 00:00:39,810 Or we might create our 19 00:00:40,050 --> 00:00:41,460 reply relation with the decisions 20 00:00:42,150 --> 00:00:43,310 not yet decided, and those 21 00:00:43,550 --> 00:00:45,000 decision values would also be null. 22 00:00:45,770 --> 00:00:46,570 So what we're going to look 23 00:00:46,720 --> 00:00:48,230 at in our demo queries is what 24 00:00:48,470 --> 00:00:49,370 happens when we have null 25 00:00:49,630 --> 00:00:51,390 values and we run queries over the database. 26 00:00:51,860 --> 00:00:54,440 Here is the set of students in our database. 27 00:00:55,380 --> 00:00:56,540 In order to explore what happens 28 00:00:56,920 --> 00:00:58,720 when we have null values, let's insert 29 00:00:59,240 --> 00:01:01,110 two new students, Kevin and 30 00:01:01,220 --> 00:01:03,650 Laurie, each of which have a null GPA. 31 00:01:05,800 --> 00:01:06,710 We go back and we look 32 00:01:06,830 --> 00:01:08,040 at our student table and now 33 00:01:08,230 --> 00:01:09,360 we see Kevin and Laurie as 34 00:01:09,490 --> 00:01:10,940 part of that table, and in 35 00:01:11,050 --> 00:01:14,180 this particular interface null values are depicted as blanks. 36 00:01:15,390 --> 00:01:16,140 Now let's run some queries. 37 00:01:17,990 --> 00:01:20,950 Our first query finds students whose GPA is greater than 3.5. 38 00:01:21,120 --> 00:01:22,990 It's a simple query to write. 39 00:01:23,700 --> 00:01:26,520 We run the query and we find a number of students in our result. 40 00:01:27,260 --> 00:01:28,310 We don't have Kevin and Laurie, 41 00:01:28,810 --> 00:01:29,910 because their GPAs are null, 42 00:01:30,610 --> 00:01:32,430 so we can't determine that they're greater than 3.5. 43 00:01:33,580 --> 00:01:35,450 Let's run another query 44 00:01:35,860 --> 00:01:37,980 that says GPA less than or equal to 3.5. 45 00:01:38,900 --> 00:01:39,910 We'll get a different set of students, 46 00:01:40,600 --> 00:01:43,110 as we see, but we still don't get Kevin and Laurie. 47 00:01:44,110 --> 00:01:45,660 Now let's run a query 48 00:01:45,830 --> 00:01:46,760 where we ask for the GPA 49 00:01:49,380 --> 00:01:51,010 to be greater than 3.5 or less than or equal to 3.5. 50 00:01:51,070 --> 00:01:53,130 And this is the one that's a little bit funny. 51 00:01:53,790 --> 00:01:54,680 Actually, we talked about this 52 00:01:54,780 --> 00:01:57,530 very same query back when we were talking about the relational model. 53 00:01:58,550 --> 00:01:59,420 When you look at the query, 54 00:01:59,650 --> 00:02:00,720 you would think that you 55 00:02:00,780 --> 00:02:02,050 would get every student in the 56 00:02:02,130 --> 00:02:04,290 database, because everybody's GPA obviously 57 00:02:04,740 --> 00:02:06,820 is either greater than 3.5 or less or equal to 3.5. 58 00:02:06,850 --> 00:02:08,920 Or, from a more logical point 59 00:02:09,210 --> 00:02:10,270 of view, this is an 60 00:02:10,350 --> 00:02:12,030 expression that's always true. 61 00:02:13,140 --> 00:02:14,250 But you might not be 62 00:02:14,380 --> 00:02:15,360 surprised when we run the 63 00:02:15,440 --> 00:02:16,760 query, we still don't get 64 00:02:17,530 --> 00:02:18,660 Kevin or Laurie, because we can't 65 00:02:18,840 --> 00:02:20,100 determine that their GPA satisfies 66 00:02:20,810 --> 00:02:21,650 either of these conditions. 67 00:02:22,610 --> 00:02:23,630 So even when we have 68 00:02:23,900 --> 00:02:25,240 a query that looks like the 69 00:02:25,310 --> 00:02:27,360 where clause is a tautology, 70 00:02:27,510 --> 00:02:28,830 meaning a logical expression that's always 71 00:02:29,140 --> 00:02:31,430 true, we still don't get all the data in the result. 72 00:02:32,570 --> 00:02:33,780 If we want this query to 73 00:02:33,940 --> 00:02:35,210 return all students in the 74 00:02:35,290 --> 00:02:36,380 database, we can do 75 00:02:36,520 --> 00:02:37,930 that by adding one more clause 76 00:02:38,480 --> 00:02:41,650 to the where, which is or GPA is null. 77 00:02:42,710 --> 00:02:44,280 Now this is truly a condition 78 00:02:44,570 --> 00:02:45,660 that will be matched by every tuple, 79 00:02:46,120 --> 00:02:48,690 because either the GPA is greater 80 00:02:48,910 --> 00:02:50,550 than 3.5, less than or equal 3.5, or it's a null value. 81 00:02:50,980 --> 00:02:51,850 "Is null" is a keyword 82 00:02:52,310 --> 00:02:54,680 phrase in SQL that matches values that are null. 83 00:02:55,460 --> 00:02:56,380 Now when we run the query, 84 00:02:56,800 --> 00:02:57,880 we finally get all of 85 00:02:57,990 --> 00:03:00,160 the students in the database, including Kevin and Laurie. 86 00:03:01,650 --> 00:03:02,490 Just to drive the point home 87 00:03:02,720 --> 00:03:04,390 a bit further, let's continue looking 88 00:03:04,670 --> 00:03:05,660 at queries where we have 89 00:03:05,850 --> 00:03:07,190 null values involved in where clauses. 90 00:03:08,630 --> 00:03:09,370 In this query, we're looking for students 91 00:03:09,820 --> 00:03:10,910 where their GPA is greater than 92 00:03:11,070 --> 00:03:13,840 3.5 or their size high school is less than 1600. 93 00:03:13,970 --> 00:03:16,300 So let's run the query and see who we get. 94 00:03:17,240 --> 00:03:19,380 We get all of these students, including Kevin. 95 00:03:20,230 --> 00:03:21,370 So although his GPA is 96 00:03:21,570 --> 00:03:22,530 null and therefore we can't 97 00:03:22,900 --> 00:03:24,320 determine that it satisfies this 98 00:03:24,470 --> 00:03:25,690 part of the condition, because since 99 00:03:25,920 --> 00:03:27,200 his high school size is 1500, 100 00:03:27,390 --> 00:03:28,590 he does satisfy the second 101 00:03:28,840 --> 00:03:31,130 part of the condition and he does come out in our result. 102 00:03:32,490 --> 00:03:33,610 Now let's add one more 103 00:03:33,760 --> 00:03:35,270 condition, which is or size 104 00:03:36,000 --> 00:03:39,130 high school is greater than or equal to 1600. 105 00:03:39,490 --> 00:03:40,560 So now we have again 106 00:03:41,220 --> 00:03:43,040 something in the condition that looks like a tautology. 107 00:03:43,460 --> 00:03:44,910 It looks like it should always be true. 108 00:03:45,400 --> 00:03:47,360 Everyone's high school is either smaller or larger than 1600. 109 00:03:47,440 --> 00:03:50,140 Furthermore, in this case, we 110 00:03:50,260 --> 00:03:52,850 have no null values for 111 00:03:52,900 --> 00:03:54,270 the high school size, so when 112 00:03:54,470 --> 00:03:55,790 we run this query, we should 113 00:03:56,040 --> 00:03:57,500 get back everybody, and indeed 114 00:03:57,870 --> 00:04:00,070 when we run the query, we do get back all of the students. 115 00:04:01,570 --> 00:04:04,540 Now, I'm not going to go into details here because you can certainly read about it. 116 00:04:04,900 --> 00:04:06,250 The way the where clause is 117 00:04:06,370 --> 00:04:07,520 evaluated when null values 118 00:04:07,980 --> 00:04:09,190 are involved is using a 119 00:04:09,430 --> 00:04:11,130 three valued logic where every 120 00:04:11,510 --> 00:04:14,180 expression is either true or false or unknown. 121 00:04:15,430 --> 00:04:16,670 These logical values are combined 122 00:04:17,200 --> 00:04:18,210 to finally yield a value 123 00:04:18,540 --> 00:04:21,170 for the where expression that determines whether a couple is in the result. 124 00:04:22,800 --> 00:04:23,740 Our last series of queries 125 00:04:24,090 --> 00:04:25,650 explores the interaction between null 126 00:04:25,920 --> 00:04:27,540 values and aggregate functions, specifically 127 00:04:28,120 --> 00:04:29,100 in this case the count function. 128 00:04:29,530 --> 00:04:30,650 There are actually quite a 129 00:04:30,760 --> 00:04:32,350 few subtleties about null values 130 00:04:32,750 --> 00:04:34,770 and aggregation and also null values and sub-queries. 131 00:04:35,730 --> 00:04:36,480 We're not going to do an exhaustive 132 00:04:37,080 --> 00:04:38,130 exploration of these subtleties 133 00:04:38,610 --> 00:04:39,650 here but just give one example. 134 00:04:40,700 --> 00:04:41,980 And we encourage you if you 135 00:04:42,250 --> 00:04:43,260 are using a database with null 136 00:04:43,490 --> 00:04:44,590 values to be very careful 137 00:04:44,930 --> 00:04:45,990 when you write your queries to make 138 00:04:46,150 --> 00:04:47,230 sure you understand how the null 139 00:04:47,450 --> 00:04:49,020 values are going to influence the result. 140 00:04:50,250 --> 00:04:51,690 OK, so our query here is 141 00:04:52,460 --> 00:04:53,270 counting the number of students 142 00:04:53,950 --> 00:04:55,290 whose GPA is not null for starters. 143 00:04:55,880 --> 00:04:56,860 So we run the query and 144 00:04:57,010 --> 00:04:59,710 we see that there twelve students who have a non-null GPA. 145 00:05:00,690 --> 00:05:01,370 Those are the 12 students we had 146 00:05:01,550 --> 00:05:02,870 originally in the database before we 147 00:05:03,010 --> 00:05:04,860 added Kevin and Lorrie with their null GPA's. 148 00:05:06,620 --> 00:05:07,660 Now instead of just counting 149 00:05:08,080 --> 00:05:10,290 the students, let's count the distinct GPAs. 150 00:05:12,300 --> 00:05:14,910 So among these twelve students, how many different GPAs are represented? 151 00:05:16,040 --> 00:05:17,020 It turns out there are seven 152 00:05:17,240 --> 00:05:19,670 distinct GPAs among those that are not null. 153 00:05:20,920 --> 00:05:22,710 Let's further drop the 154 00:05:22,800 --> 00:05:24,240 not null condition and just 155 00:05:24,450 --> 00:05:25,910 count the the distinct GPAs among the students. 156 00:05:27,050 --> 00:05:28,250 That will tell us whether count distinct 157 00:05:29,100 --> 00:05:31,250 includes nulls or not in its count. 158 00:05:32,230 --> 00:05:33,240 We'll run the query and we 159 00:05:33,380 --> 00:05:34,430 get the same answer so we 160 00:05:34,610 --> 00:05:36,360 still have 7 GPAs and 161 00:05:36,650 --> 00:05:38,220 we are not counting the null values. 162 00:05:39,560 --> 00:05:40,840 Now, let's do one 163 00:05:41,510 --> 00:05:42,690 last query, which is to take 164 00:05:43,090 --> 00:05:44,330 away the and just look 165 00:05:44,630 --> 00:05:45,910 at the distinct GPAs themselves. 166 00:05:46,540 --> 00:05:47,800 So, we had seven when we did the count. 167 00:05:48,590 --> 00:05:50,310 We run the query now and we get eight. 168 00:05:51,120 --> 00:05:51,330 What happened? 169 00:05:51,780 --> 00:05:54,030 Well this first tuple is hard with the visualization. 170 00:05:54,110 --> 00:05:55,530 This first answer is 171 00:05:55,720 --> 00:05:57,140 empty and that's actually the null value. 172 00:05:58,170 --> 00:05:59,700 So when we right select the distinct GPA. 173 00:06:00,100 --> 00:06:01,300 We do include the null 174 00:06:01,590 --> 00:06:03,530 value in our result if there is a null GPA. 175 00:06:04,440 --> 00:06:07,320 However, when we count the distinct values, we don't include the null. 176 00:06:07,510 --> 00:06:09,320 And that's really just one 177 00:06:09,610 --> 00:06:11,010 example of the type of 178 00:06:11,910 --> 00:06:13,230 subtleties and possibly unexpected 179 00:06:13,810 --> 00:06:15,910 behavior we might get when we have null values. 180 00:06:16,590 --> 00:06:17,680 So again, I encourage you to 181 00:06:17,740 --> 00:06:19,140 be very careful when writing queries 182 00:06:19,660 --> 00:06:20,760 over databases that include nulls 183 00:06:21,130 --> 00:06:23,410 that you know exactly what you're going to get in your result.