1 00:00:00,280 --> 00:00:01,520 In this final video about transactions, 2 00:00:02,480 --> 00:00:04,550 we'll focus on the concept of isolation levels. 3 00:00:05,690 --> 00:00:07,070 As a reminder transactions are a 4 00:00:07,360 --> 00:00:08,790 solution for both the concurrency control 5 00:00:09,410 --> 00:00:10,630 and system failure problem in databases. 6 00:00:11,470 --> 00:00:12,640 A transaction is a 7 00:00:12,690 --> 00:00:13,550 sequence of one or more 8 00:00:13,670 --> 00:00:15,050 operations that's treated as a unit. 9 00:00:15,970 --> 00:00:17,160 Transactions appear to run in 10 00:00:17,240 --> 00:00:18,550 isolation and if the 11 00:00:18,630 --> 00:00:20,270 system fails, each transaction changes 12 00:00:20,700 --> 00:00:22,870 are reflected either entirely or not at all. 13 00:00:23,750 --> 00:00:26,550 On this video we are going to focus on the isolation portion of transactions. 14 00:00:27,930 --> 00:00:30,740 As a reminder we can have multiple clients operating on the same database. 15 00:00:31,610 --> 00:00:34,290 And each client will submit a sequence of transactions. 16 00:00:35,470 --> 00:00:37,170 So we have this client with T1, T2. 17 00:00:37,460 --> 00:00:40,220 Here we have T9, T10, T11 and so forth. 18 00:00:41,160 --> 00:00:43,710 And each transaction itself is a sequence of statements. 19 00:00:45,910 --> 00:00:47,440 Serializeability says that it's 20 00:00:47,720 --> 00:00:49,150 okay for the system to 21 00:00:49,570 --> 00:00:50,990 inter-leave the execution of these 22 00:00:51,090 --> 00:00:52,280 statements with the statements that 23 00:00:52,420 --> 00:00:53,510 are being performed by other clients; 24 00:00:54,450 --> 00:00:56,040 however, the behavior against the 25 00:00:56,370 --> 00:00:57,550 database must be equivalent 26 00:00:58,340 --> 00:01:00,410 to the transactions themselves executing 27 00:01:00,930 --> 00:01:01,950 in some serial order. 28 00:01:02,700 --> 00:01:03,700 For example, in this case, 29 00:01:04,070 --> 00:01:05,950 the system may enforce behavior that's 30 00:01:06,100 --> 00:01:08,580 equivalent to say, doing transaction T9 first. 31 00:01:09,350 --> 00:01:11,450 Then maybe T1, T2, and then T10, 32 00:01:12,540 --> 00:01:12,950 and so on. 33 00:01:14,050 --> 00:01:15,690 Serializability give us understandable behavior 34 00:01:16,180 --> 00:01:17,450 and consistency but it does 35 00:01:17,700 --> 00:01:19,190 have some overhead involved in 36 00:01:19,290 --> 00:01:20,390 the locking protocols that are used 37 00:01:20,700 --> 00:01:22,000 and it does reduce concurrency. 38 00:01:23,800 --> 00:01:24,920 As a result of the overhead and 39 00:01:25,010 --> 00:01:26,720 reduced concurrency, systems do 40 00:01:26,950 --> 00:01:28,530 offer weaker isolation levels. 41 00:01:29,130 --> 00:01:30,480 In the sequel standard, there are 42 00:01:30,740 --> 00:01:33,000 three levels: read uncommitted, read 43 00:01:33,260 --> 00:01:35,570 committed, and repeatable read. And 44 00:01:35,840 --> 00:01:37,510 these isolation levels have lower 45 00:01:37,900 --> 00:01:39,340 overhead and allow higher concurrency 46 00:01:40,340 --> 00:01:41,240 but of course at a cost 47 00:01:41,780 --> 00:01:43,050 which is lower consistency guarantees. 48 00:01:44,780 --> 00:01:46,300 I've listed the three alternative isolation 49 00:01:46,920 --> 00:01:48,180 levels from the weaker to 50 00:01:48,370 --> 00:01:49,920 the stronger and to 51 00:01:50,150 --> 00:01:51,100 complete the picture at the 52 00:01:51,170 --> 00:01:52,040 bottom we have a fourth 53 00:01:52,200 --> 00:01:54,840 one which is serializable, which is what we've been talking about already. 54 00:01:56,210 --> 00:01:57,090 Before we proceed to learn 55 00:01:57,350 --> 00:01:58,620 about the different isolation, let me 56 00:01:58,710 --> 00:01:59,750 mention a couple of 57 00:01:59,870 --> 00:02:01,020 things: First of all, 58 00:02:01,220 --> 00:02:03,140 the Isolation level is per a transaction. 59 00:02:03,520 --> 00:02:05,010 So each client could set 60 00:02:05,320 --> 00:02:07,230 different isolation levels for each of its transactions, 61 00:02:07,900 --> 00:02:08,160 if it wishes. 62 00:02:08,730 --> 00:02:10,380 Second of all, isolation levels 63 00:02:10,880 --> 00:02:11,710 are in the eye of the 64 00:02:11,780 --> 00:02:13,550 beholder, and let me show you what I mean by that. 65 00:02:13,890 --> 00:02:15,560 So each client submits transaction 66 00:02:16,100 --> 00:02:18,560 to the database and might set the isolation level for that transaction. 67 00:02:19,700 --> 00:02:22,570 That isolation level only affects that transaction itself. 68 00:02:23,030 --> 00:02:25,000 It does not affect the behavior of 69 00:02:25,230 --> 00:02:26,840 any other transactions that are running concurrently. 70 00:02:27,860 --> 00:02:29,380 So, for example, our client 71 00:02:29,820 --> 00:02:31,290 on the left might set its 72 00:02:31,500 --> 00:02:32,850 transaction to be a repeatable 73 00:02:33,170 --> 00:02:34,340 read, while our client 74 00:02:34,730 --> 00:02:36,000 on the right will set it's 75 00:02:36,160 --> 00:02:37,930 transaction to read uncommitted and 76 00:02:38,030 --> 00:02:39,180 those properties will be guaranteed 77 00:02:39,690 --> 00:02:41,880 for each of those transactions and won't affect the other. 78 00:02:42,640 --> 00:02:44,050 By the way, the isolation levels 79 00:02:44,370 --> 00:02:45,750 really are specific to reads. 80 00:02:46,200 --> 00:02:47,940 They specify what values might 81 00:02:48,170 --> 00:02:49,100 be seen in the transaction, 82 00:02:50,000 --> 00:02:51,150 as we'll see when we get into the details. 83 00:02:52,530 --> 00:02:54,970 So let's start by defining a concept called dirty reads. 84 00:02:56,130 --> 00:02:57,100 A data item in the database 85 00:02:57,300 --> 00:02:58,370 is dirty if it's been 86 00:02:58,560 --> 00:03:00,900 written by a transaction that has not yet committed. 87 00:03:01,960 --> 00:03:03,190 So for example, here are 88 00:03:03,310 --> 00:03:04,590 two transactions, I'll call them 89 00:03:04,740 --> 00:03:06,010 T1 and T2, and by 90 00:03:06,160 --> 00:03:07,490 the way throughout the rest of 91 00:03:07,590 --> 00:03:08,530 this video I'm going to put 92 00:03:08,710 --> 00:03:10,530 transactions in boxes, and you 93 00:03:10,800 --> 00:03:12,240 can assume implicitly that there 94 00:03:12,310 --> 00:03:13,790 is a commit at the end of each box. 95 00:03:14,100 --> 00:03:15,120 I'm not going to write it each time. 96 00:03:16,130 --> 00:03:17,990 So our first transaction is updating 97 00:03:18,620 --> 00:03:20,050 Standford's enrollment, adding 1000 98 00:03:20,100 --> 00:03:22,650 to it, and our second transaction is 99 00:03:22,820 --> 00:03:24,850 reading the average enrollment in the college table. 100 00:03:25,300 --> 00:03:26,710 We're using our usual database 101 00:03:27,320 --> 00:03:28,950 of students applying to colleges. 102 00:03:29,970 --> 00:03:31,740 So after this enrollment, Standford's 103 00:03:32,550 --> 00:03:34,260 enrollment has 1,000 added to 104 00:03:34,380 --> 00:03:36,770 it, but before the transaction commits at 105 00:03:36,920 --> 00:03:39,070 that point in time the value is what's known as dirty. 106 00:03:40,400 --> 00:03:41,940 If our second transaction here 107 00:03:42,160 --> 00:03:43,810 reads this value then it 108 00:03:43,990 --> 00:03:45,070 might be reading a value that 109 00:03:45,210 --> 00:03:46,760 never actually exists in the database. 110 00:03:47,350 --> 00:03:48,570 And why is that, because 111 00:03:48,930 --> 00:03:50,320 before this transaction commits 112 00:03:50,740 --> 00:03:51,820 there could be a system failure 113 00:03:52,610 --> 00:03:53,800 and the transaction could be rolled 114 00:03:54,110 --> 00:03:56,640 back as we described before and all of it's changes undone. 115 00:03:57,500 --> 00:03:58,820 Meanwhile, however, the second transaction 116 00:03:59,420 --> 00:04:01,380 may have read that value before it was undone. 117 00:04:02,690 --> 00:04:03,840 Here's another example now we 118 00:04:03,960 --> 00:04:05,910 have three transactions T1, T2, 119 00:04:06,110 --> 00:04:07,340 T3 our first transaction 120 00:04:08,010 --> 00:04:09,480 is modifying the GPA of 121 00:04:09,630 --> 00:04:10,730 student's who's high school size 122 00:04:11,220 --> 00:04:12,410 is sufficiently large. 123 00:04:13,170 --> 00:04:14,560 Our second transaction is finding 124 00:04:14,910 --> 00:04:17,280 the GPA of student number 123. 125 00:04:17,580 --> 00:04:18,860 And our third transaction 126 00:04:19,650 --> 00:04:20,800 is modifying the high school 127 00:04:21,180 --> 00:04:22,470 size of student 234. 128 00:04:23,520 --> 00:04:25,170 So if this GPA here 129 00:04:25,420 --> 00:04:26,960 in transaction T2 is read 130 00:04:27,360 --> 00:04:29,140 before the commit of 131 00:04:29,250 --> 00:04:30,910 transaction T1, then that 132 00:04:31,010 --> 00:04:32,690 would be a dirty value for the GPA. 133 00:04:33,030 --> 00:04:34,090 Again, because of this 134 00:04:34,420 --> 00:04:36,610 first transaction, doesn't commit, then 135 00:04:36,880 --> 00:04:38,110 that value will be rolled back. 136 00:04:39,040 --> 00:04:39,980 There's a second case where we 137 00:04:40,110 --> 00:04:41,290 might have dirty data read, 138 00:04:41,440 --> 00:04:42,770 in this trio of transactions 139 00:04:43,690 --> 00:04:45,160 and that's the size high school here. 140 00:04:45,750 --> 00:04:46,990 Because notice that here we're 141 00:04:47,180 --> 00:04:48,410 modifying a high school size, 142 00:04:49,170 --> 00:04:50,390 so if this size of high 143 00:04:50,600 --> 00:04:51,830 school is read before the commit 144 00:04:52,250 --> 00:04:53,790 point of the third transaction, that 145 00:04:54,040 --> 00:04:55,410 would also be a dirty data item. 146 00:04:56,420 --> 00:04:57,890 One clarification about dirty reads, 147 00:04:58,340 --> 00:04:59,320 is that there is no such thing 148 00:04:59,760 --> 00:05:01,330 as a dirty read within the same transaction. 149 00:05:02,720 --> 00:05:04,310 In T3 for example, 150 00:05:04,940 --> 00:05:06,210 after we've modified the size 151 00:05:06,500 --> 00:05:07,660 high school, we might read 152 00:05:08,050 --> 00:05:09,350 the size high school later in 153 00:05:09,440 --> 00:05:12,320 the same transaction and that's not considered a dirty read. 154 00:05:12,910 --> 00:05:13,790 So a read is only dirty 155 00:05:14,100 --> 00:05:15,520 when it reads a uncommitted 156 00:05:15,890 --> 00:05:17,560 value that was modified by a different transaction. 157 00:05:19,160 --> 00:05:21,770 So here's our first isolation level, and it's our weakest one. 158 00:05:22,060 --> 00:05:23,620 It's called Read I'm Committed, and 159 00:05:23,720 --> 00:05:24,570 what is says is that a 160 00:05:24,620 --> 00:05:25,830 transaction that has this 161 00:05:25,980 --> 00:05:28,130 isolation level may perform dirty reads. 162 00:05:28,480 --> 00:05:30,020 It may read values that have 163 00:05:30,200 --> 00:05:32,800 been modified by a different transaction and not yet committed. 164 00:05:33,780 --> 00:05:34,630 So lets take a look at an example. 165 00:05:35,450 --> 00:05:36,050 It's our same example. 166 00:05:36,530 --> 00:05:38,180 We've dropped the third transaction, so 167 00:05:38,300 --> 00:05:39,900 our first transaction is modifying 168 00:05:40,650 --> 00:05:41,780 GPAs in the student table 169 00:05:42,290 --> 00:05:43,860 and our second transaction is 170 00:05:44,020 --> 00:05:45,360 reading average of those GPAs. 171 00:05:46,450 --> 00:05:47,850 So if these transactions are 172 00:05:48,210 --> 00:05:49,960 serializable, then it'll be 173 00:05:50,190 --> 00:05:51,510 the behavior's guaranteed to be 174 00:05:51,600 --> 00:05:52,830 equivalent to either T1 followed 175 00:05:53,390 --> 00:05:55,390 by T2 or T2 followed by T1. 176 00:05:56,320 --> 00:05:57,550 So either the second transaction 177 00:05:58,060 --> 00:05:59,070 will see all the GPAs before 178 00:05:59,200 --> 00:06:00,930 they were updated, or 179 00:06:01,090 --> 00:06:03,150 it will see all the GPAs after they were updated. 180 00:06:04,220 --> 00:06:05,990 As a reminder we don't know which order these will occur in. 181 00:06:06,270 --> 00:06:08,690 Only that the behavior will be equivalent to one of those orders. 182 00:06:09,890 --> 00:06:10,990 Now let's suppose we add 183 00:06:11,260 --> 00:06:13,020 to our second transaction a specification 184 00:06:13,480 --> 00:06:15,390 that it has isolation level read uncommitted. 185 00:06:16,290 --> 00:06:17,480 And by the way, this very long 186 00:06:17,840 --> 00:06:19,600 sentence is how we specify the 187 00:06:19,770 --> 00:06:21,220 isolation level in the SQL standard. 188 00:06:22,310 --> 00:06:23,660 Now when we don't specify an 189 00:06:23,770 --> 00:06:24,820 isolation level, as we haven't 190 00:06:25,160 --> 00:06:26,700 here, the default is serializable. 191 00:06:27,860 --> 00:06:28,750 Although in most of our 192 00:06:28,840 --> 00:06:30,030 examples, it won't actually matter 193 00:06:30,460 --> 00:06:33,250 what the first transaction's isolation level is, as we'll see. 194 00:06:34,020 --> 00:06:34,720 We're going to be focusing on the 195 00:06:34,790 --> 00:06:36,160 data that's read in the 196 00:06:36,270 --> 00:06:38,970 second transaction and typically written in the first transaction. 197 00:06:40,120 --> 00:06:41,430 Okay, so let's see what's going on here. 198 00:06:41,630 --> 00:06:42,690 Again this is T1 and T2 199 00:06:42,930 --> 00:06:45,730 and our first transaction is updating the GPAs. 200 00:06:46,510 --> 00:06:47,380 And now we've said in our 201 00:06:47,490 --> 00:06:48,690 second that it's okay for 202 00:06:48,850 --> 00:06:50,370 this average to read dirty 203 00:06:50,740 --> 00:06:51,830 values, in other words, to 204 00:06:52,110 --> 00:06:53,830 see uncommitted GPA modifications. 205 00:06:55,160 --> 00:06:56,220 In that case, as the 206 00:06:56,330 --> 00:06:58,000 average is computed, it could 207 00:06:58,050 --> 00:06:59,150 be computed right in the 208 00:06:59,470 --> 00:07:00,440 middle of the set of 209 00:07:00,540 --> 00:07:02,350 modifications being performed by T1. 210 00:07:03,700 --> 00:07:06,470 In that case, we certainly don't have serializable behavior. 211 00:07:06,750 --> 00:07:08,150 We don't have T1 followed by 212 00:07:08,320 --> 00:07:09,720 T2, since T2 is 213 00:07:09,930 --> 00:07:11,120 reading some values that are 214 00:07:11,210 --> 00:07:12,700 in the middle of T1, and similarly 215 00:07:13,210 --> 00:07:14,720 we don't have T2 followed by T1. 216 00:07:15,680 --> 00:07:16,670 It might be that for our 217 00:07:16,920 --> 00:07:18,420 particular application, we just don't 218 00:07:18,610 --> 00:07:20,160 care that much about having exact consistency. 219 00:07:21,330 --> 00:07:22,210 It may be that we don't 220 00:07:22,400 --> 00:07:23,470 mind if our average is 221 00:07:23,590 --> 00:07:24,720 computed with some old values 222 00:07:25,200 --> 00:07:26,420 and some new values, we might 223 00:07:26,580 --> 00:07:27,530 not even mind if we compute 224 00:07:28,020 --> 00:07:30,010 in our average an increased GPA 225 00:07:30,550 --> 00:07:33,030 that ends up being undone when a transaction rolls back. 226 00:07:33,770 --> 00:07:34,790 So if we're just looking for a 227 00:07:34,950 --> 00:07:36,700 rough approximate GPA we can 228 00:07:36,870 --> 00:07:38,310 use this isolation level, and 229 00:07:38,500 --> 00:07:40,460 we'll have increased concurrency decreased 230 00:07:40,980 --> 00:07:43,060 overhead better performance overall with 231 00:07:43,280 --> 00:07:45,570 the understanding that it will have reduced consistency guarantees. 232 00:07:46,990 --> 00:07:48,070 Let's go one step up to 233 00:07:48,140 --> 00:07:50,430 the next isolation level which is called "read committed". 234 00:07:51,130 --> 00:07:52,510 As you can probably guess this one 235 00:07:52,920 --> 00:07:55,420 specifies that transactions may not perform dirty reads. 236 00:07:56,020 --> 00:07:57,120 They may only read data values 237 00:07:57,670 --> 00:07:59,210 whose updates by other transactions 238 00:07:59,870 --> 00:08:01,050 have been committed to the database. 239 00:08:02,320 --> 00:08:03,730 Now this isolation level is stronger 240 00:08:04,360 --> 00:08:05,840 but it still doesn't guarantee global 241 00:08:06,200 --> 00:08:08,300 serializability. Let's take a look through an example. 242 00:08:09,380 --> 00:08:11,030 Our first transaction, T1, is 243 00:08:11,170 --> 00:08:12,650 the same one, modifying the GPA 244 00:08:13,270 --> 00:08:14,420 for students from large high schools. 245 00:08:15,430 --> 00:08:17,330 Our second transaction is the 246 00:08:17,420 --> 00:08:19,090 one where we are going reset the isolation level. 247 00:08:19,390 --> 00:08:20,280 In this case to read committed. 248 00:08:21,270 --> 00:08:22,380 And it is going to perform two statements. 249 00:08:23,040 --> 00:08:23,930 One of them is going to 250 00:08:24,040 --> 00:08:25,530 read the average GPA from 251 00:08:25,690 --> 00:08:26,950 the student table, and the other 252 00:08:27,070 --> 00:08:29,390 is going to read the maximum GPA from the student table. 253 00:08:30,730 --> 00:08:31,760 So let's look at one behavior 254 00:08:32,250 --> 00:08:35,180 that's consistent with isolation level, but we will see is not serializable. 255 00:08:35,350 --> 00:08:37,120 Let's suppose that this 256 00:08:37,440 --> 00:08:38,760 average GPA is read 257 00:08:39,150 --> 00:08:41,010 before transaction T1, but 258 00:08:41,140 --> 00:08:43,900 the max GPA is computed after transaction T1. 259 00:08:44,110 --> 00:08:45,550 So the average will take, 260 00:08:45,890 --> 00:08:47,080 will not take into account the 261 00:08:47,150 --> 00:08:48,310 increases but the max will 262 00:08:48,580 --> 00:08:50,620 take account, will take into account the increases. 263 00:08:51,750 --> 00:08:54,230 So, let's see if this is equivalent to any serial order. 264 00:08:54,710 --> 00:08:56,550 Is it equivalent to T1 followed by T2? 265 00:08:57,510 --> 00:08:58,740 Well it's certainly not because 266 00:08:59,390 --> 00:09:01,090 T2's first statement is reading 267 00:09:01,570 --> 00:09:02,680 the state of the table 268 00:09:03,160 --> 00:09:04,770 before T1 and not the 269 00:09:04,830 --> 00:09:06,540 state of the table afterward, although 270 00:09:06,780 --> 00:09:09,390 C2 second statement is reading the state of the table afterward. 271 00:09:10,530 --> 00:09:11,560 Similarly it's not equivalent 272 00:09:12,450 --> 00:09:14,050 to T2 followed by T1 because 273 00:09:14,370 --> 00:09:15,740 T2 is reading in its 274 00:09:16,210 --> 00:09:18,520 second statement the state of the database after T1. 275 00:09:19,260 --> 00:09:21,000 So there's no equivalent serial order. 276 00:09:21,650 --> 00:09:23,830 But again, perhaps that's not needed for the particular application. 277 00:09:24,960 --> 00:09:26,380 And by using Read Committed we 278 00:09:26,540 --> 00:09:29,750 do get somewhat more performance then we would have if we were serializable. 279 00:09:31,040 --> 00:09:33,420 Our next isolation level is called Repeatable Read. 280 00:09:33,740 --> 00:09:36,110 And it's our strongest one before we get to Serializable. 281 00:09:37,340 --> 00:09:38,740 In Repeatable Read, a transaction 282 00:09:39,160 --> 00:09:41,710 may not perform dirty reads just like in read committed. 283 00:09:42,060 --> 00:09:43,560 And furthermore, there is 284 00:09:43,630 --> 00:09:44,770 an additional constraint that if 285 00:09:44,900 --> 00:09:47,590 an item is read multiple times, it can't change value. 286 00:09:48,370 --> 00:09:49,100 You might remember in our previous 287 00:09:49,450 --> 00:09:50,380 example, we read the GPA 288 00:09:51,040 --> 00:09:52,700 multiple times, and it did change value. 289 00:09:53,380 --> 00:09:55,010 So if we were using Repeatable Read 290 00:09:55,250 --> 00:09:56,930 for the consistency level there, then 291 00:09:57,070 --> 00:09:58,720 the behavior that I described couldn't occur. 292 00:09:59,740 --> 00:10:00,970 So, even with this stronger 293 00:10:01,520 --> 00:10:02,930 condition, we still don't have 294 00:10:03,520 --> 00:10:06,940 a guarantee of global serializability, and we'll again see that through an example. 295 00:10:07,670 --> 00:10:09,700 Our examples are getting a little more complicated, here. 296 00:10:10,490 --> 00:10:12,010 So we have our two transactions T1, 297 00:10:12,620 --> 00:10:14,470 T2, our first transaction is 298 00:10:14,580 --> 00:10:16,020 still modifying the GPA (I 299 00:10:16,100 --> 00:10:17,240 took away the condition about the 300 00:10:17,310 --> 00:10:18,470 high school size, just to keep 301 00:10:18,600 --> 00:10:20,380 things simple) and our second 302 00:10:20,690 --> 00:10:22,290 statement in our first transaction is 303 00:10:22,310 --> 00:10:23,380 modifying the high school size 304 00:10:23,760 --> 00:10:25,400 of the student with ID 123. 305 00:10:25,650 --> 00:10:28,510 So we first modified GPA's and then a high school size. 306 00:10:29,590 --> 00:10:30,910 In our second transaction, and 307 00:10:31,030 --> 00:10:31,970 that's the one we're setting as 308 00:10:32,080 --> 00:10:33,350 Repeatable read, we are 309 00:10:33,510 --> 00:10:34,830 going to read the average GPA, 310 00:10:35,440 --> 00:10:37,160 as we usually do, and this 311 00:10:37,330 --> 00:10:38,110 time we are going to read the 312 00:10:38,310 --> 00:10:39,810 average of the high school sizes. 313 00:10:40,690 --> 00:10:42,420 Incidentally, our first transaction is 314 00:10:42,720 --> 00:10:44,460 serializable, as they always are by default. 315 00:10:45,720 --> 00:10:46,570 Let's look at a behavior 316 00:10:46,960 --> 00:10:48,430 where the first statement reading 317 00:10:48,690 --> 00:10:51,070 the average GPA is executed 318 00:10:51,620 --> 00:10:53,140 before transaction T1, or sees 319 00:10:53,470 --> 00:10:54,570 the values before T1, while 320 00:10:55,260 --> 00:10:56,470 our second statement, the high 321 00:10:56,680 --> 00:10:58,030 school size, sees the values 322 00:10:58,600 --> 00:10:59,970 after transaction T1. 323 00:11:01,030 --> 00:11:01,830 So let's check our conditions. 324 00:11:02,620 --> 00:11:04,060 We are not performing dirty reads, 325 00:11:04,730 --> 00:11:06,190 because the first read here 326 00:11:06,590 --> 00:11:07,690 is of the committed value 327 00:11:07,970 --> 00:11:08,840 before T1 and the second 328 00:11:09,200 --> 00:11:10,330 read is the committed value after 329 00:11:10,700 --> 00:11:13,050 T1 and furthermore, any 330 00:11:13,260 --> 00:11:14,620 items that are read multiple times 331 00:11:14,840 --> 00:11:15,930 have not had their value changed 332 00:11:16,270 --> 00:11:18,430 because we are actually not reading any values multiple times. 333 00:11:19,530 --> 00:11:21,390 So the execution of the 334 00:11:21,520 --> 00:11:22,750 first statement here, before T1 335 00:11:22,870 --> 00:11:24,090 and the second one after is 336 00:11:24,410 --> 00:11:26,570 legal in the repeatable read isolation level. 337 00:11:27,270 --> 00:11:28,390 Yet we're still not serializable. 338 00:11:29,710 --> 00:11:30,790 We're not equivalent to T1 339 00:11:31,250 --> 00:11:33,060 before T2 because again 340 00:11:33,540 --> 00:11:35,640 this Statement of T2 341 00:11:35,900 --> 00:11:37,240 is going - sorry - 342 00:11:37,260 --> 00:11:38,320 the first statement of T2 343 00:11:38,570 --> 00:11:41,020 is going before T1 or 344 00:11:41,110 --> 00:11:42,400 seeing the state before T1 and 345 00:11:42,770 --> 00:11:43,760 we're not equivalent to T2 346 00:11:44,140 --> 00:11:45,790 followed by T1 because the 347 00:11:45,890 --> 00:11:47,240 second statement of T2 is 348 00:11:47,580 --> 00:11:48,740 seeing the state after T1. 349 00:11:49,800 --> 00:11:51,180 Now there is another situation with repeatable 350 00:11:51,770 --> 00:11:52,970 read that's quite important to understand. 351 00:11:54,220 --> 00:11:56,690 We said that a transaction can't perform dirty reads, and it can't. 352 00:11:57,360 --> 00:11:58,380 We also said that when an 353 00:11:58,490 --> 00:12:00,630 item that's read multiple times can't change value. 354 00:12:01,550 --> 00:12:02,590 But the fact is that Repeatable 355 00:12:03,150 --> 00:12:04,500 Read does allow a relation 356 00:12:05,050 --> 00:12:06,270 to change value if it's 357 00:12:06,420 --> 00:12:09,000 read multiple times through what's known as phantom tuples. 358 00:12:09,260 --> 00:12:11,070 Let me explain through an example. 359 00:12:12,160 --> 00:12:13,350 Let's suppose our first transaction 360 00:12:14,000 --> 00:12:15,900 inserts a hundred new students into the database. 361 00:12:16,760 --> 00:12:17,870 And that's run concurrently with our 362 00:12:17,950 --> 00:12:19,650 second transaction, which is 363 00:12:19,840 --> 00:12:20,910 right at the repeatable 364 00:12:21,470 --> 00:12:22,910 read isolation level and now 365 00:12:23,180 --> 00:12:23,950 we're just going to read the average 366 00:12:24,470 --> 00:12:25,320 GPA and we're going 367 00:12:25,450 --> 00:12:26,770 to follow that with the max GPA 368 00:12:27,580 --> 00:12:28,800 similar to one of our earlier examples. 369 00:12:30,100 --> 00:12:31,920 Now, repeatable read actually does 370 00:12:32,400 --> 00:12:33,960 allow behavior where this 371 00:12:34,330 --> 00:12:35,740 average is computed before T1 372 00:12:36,430 --> 00:12:38,600 and this max is computed at after T1. 373 00:12:39,580 --> 00:12:41,680 So the justification behind that is 374 00:12:42,000 --> 00:12:42,970 pretty much that when we 375 00:12:43,360 --> 00:12:44,540 do the second read of the 376 00:12:44,640 --> 00:12:46,480 GPA, the tuples that 377 00:12:46,640 --> 00:12:47,910 we're reading for a second time 378 00:12:48,240 --> 00:12:49,560 do still have the same value. 379 00:12:50,380 --> 00:12:51,520 So we are reading those some 380 00:12:51,690 --> 00:12:52,640 new tuples that were inserted 381 00:12:53,300 --> 00:12:54,230 and in fact if this max 382 00:12:54,650 --> 00:12:55,920 were an average instead of max 383 00:12:56,310 --> 00:12:57,340 we might get 2 different answers 384 00:12:58,190 --> 00:12:59,420 for the average, even with Repeatable 385 00:12:59,940 --> 00:13:00,950 Read at the isolation level. 386 00:13:01,670 --> 00:13:02,740 But that's what it allows and 387 00:13:02,990 --> 00:13:04,410 these hundred tuples here are 388 00:13:04,470 --> 00:13:06,070 what are know as the phantom tuples. 389 00:13:06,490 --> 00:13:09,640 They sort of emerged during execution out of nowhere. 390 00:13:10,450 --> 00:13:11,570 Now, I would have 391 00:13:11,770 --> 00:13:12,880 to say that my opinion is 392 00:13:13,090 --> 00:13:14,650 that this behavior within the 393 00:13:14,730 --> 00:13:16,700 repeatable read isolation level, although 394 00:13:16,980 --> 00:13:18,180 it's part of the standard, is really 395 00:13:18,450 --> 00:13:19,380 in effect of the way repeatable 396 00:13:19,900 --> 00:13:21,650 read is implemented using Locks. 397 00:13:22,300 --> 00:13:23,600 When a value is read once, 398 00:13:24,040 --> 00:13:25,270 it's locked and can't be modified, 399 00:13:26,280 --> 00:13:27,400 but when we insert new tuples, 400 00:13:27,820 --> 00:13:29,020 they aren't inserted with locks 401 00:13:29,430 --> 00:13:31,680 so they can read in a second read of the same relation. 402 00:13:32,600 --> 00:13:34,130 Don't worry about the implementation details, 403 00:13:34,590 --> 00:13:35,810 but do worry about phantom 404 00:13:36,210 --> 00:13:37,070 tuples because if you're using 405 00:13:37,560 --> 00:13:39,060 the repeatable read isolation level, you 406 00:13:39,230 --> 00:13:39,980 do need to know 407 00:13:40,550 --> 00:13:41,960 that insertions can be made 408 00:13:42,590 --> 00:13:44,530 by another transaction, even between 409 00:13:45,010 --> 00:13:46,620 two entire readings of a table. 410 00:13:47,510 --> 00:13:48,560 Now on the other hand, if 411 00:13:48,770 --> 00:13:49,530 what we do in our first 412 00:13:49,770 --> 00:13:51,150 transaction is delete the 413 00:13:51,430 --> 00:13:52,620 hundred tuples instead of insert 414 00:13:53,070 --> 00:13:54,380 them, in that case we 415 00:13:54,570 --> 00:13:55,590 actually can not get the 416 00:13:55,660 --> 00:13:56,680 behavior where the first statement 417 00:13:57,050 --> 00:13:58,670 is before and the second statement is after. 418 00:13:59,390 --> 00:14:00,660 Because once these, the average 419 00:14:01,140 --> 00:14:02,030 value has been read of 420 00:14:02,130 --> 00:14:03,660 this GPA, this deletion 421 00:14:04,170 --> 00:14:05,650 will not be allowed because, 422 00:14:06,030 --> 00:14:08,580 again kind of an implementation, but those values are locked. 423 00:14:09,340 --> 00:14:10,620 And so in this case, the second 424 00:14:11,190 --> 00:14:13,410 read of that same relation wouldn't be allowed. 425 00:14:14,150 --> 00:14:15,160 So in summary we may have 426 00:14:15,460 --> 00:14:16,790 phantom tuples up here between 427 00:14:17,320 --> 00:14:18,420 two reads of the 428 00:14:18,490 --> 00:14:19,940 same relation in a repeatable 429 00:14:20,260 --> 00:14:21,970 read transaction, but we 430 00:14:22,130 --> 00:14:23,580 won't have tuples disappear from 431 00:14:23,760 --> 00:14:25,630 the relation in between two reads of it. 432 00:14:26,540 --> 00:14:27,770 So, that completes our three isolations 433 00:14:28,310 --> 00:14:29,530 levels, in addition to serializable 434 00:14:30,410 --> 00:14:31,640 we had at the weakest read-uncommitted, 435 00:14:32,590 --> 00:14:34,410 then read-committed, and then repeatable read. 436 00:14:35,140 --> 00:14:38,080 I did want to mention that we can also set transactions to be read-only. 437 00:14:39,190 --> 00:14:40,630 That's sort of orthogonal to setting 438 00:14:41,020 --> 00:14:42,510 the isolation level, what it 439 00:14:42,730 --> 00:14:44,510 does is it helps the system optimize performance. 440 00:14:45,450 --> 00:14:46,600 So, for example, in our 441 00:14:46,760 --> 00:14:47,900 transaction, where we were 442 00:14:47,970 --> 00:14:49,200 just reading the average GPA and 443 00:14:49,420 --> 00:14:50,750 the max GPA, we can 444 00:14:50,930 --> 00:14:52,590 set an isolation level, and then 445 00:14:52,690 --> 00:14:53,900 we can also tell the system 446 00:14:54,260 --> 00:14:55,600 that it's going to be a read only transaction. 447 00:14:56,300 --> 00:14:57,090 That means that we are not going 448 00:14:57,250 --> 00:15:00,000 to perform any modifications to the database within the transaction. 449 00:15:01,240 --> 00:15:02,410 The system can use that as 450 00:15:02,450 --> 00:15:03,650 a hint to figure out his 451 00:15:03,780 --> 00:15:05,210 protocols to guarantee the 452 00:15:05,290 --> 00:15:06,790 right isolation level, but it 453 00:15:06,880 --> 00:15:07,830 might not have as much 454 00:15:08,000 --> 00:15:09,450 overhead as if the 455 00:15:09,530 --> 00:15:11,160 transaction had the possibility of 456 00:15:11,240 --> 00:15:13,600 performing modifications as well as performing reads. 457 00:15:14,650 --> 00:15:15,970 OK, so the behavior of transactions 458 00:15:16,540 --> 00:15:18,310 can be rather confusing and 459 00:15:18,610 --> 00:15:21,120 it's very important to get it right or surprising things might happen. 460 00:15:21,900 --> 00:15:24,830 But I think we can summarize it pretty well with this table here. 461 00:15:25,480 --> 00:15:26,830 We're going here from the weakest 462 00:15:27,060 --> 00:15:28,720 to the strongest and we 463 00:15:29,200 --> 00:15:31,010 can classify the behavior of 464 00:15:31,070 --> 00:15:34,380 transactions based on again what happens with reading. 465 00:15:34,820 --> 00:15:36,000 Can they read uncommitted values? 466 00:15:36,800 --> 00:15:38,310 Can they have non-repeatable reads, where 467 00:15:38,700 --> 00:15:39,530 we read a value and 468 00:15:39,610 --> 00:15:40,740 then read a different one later in 469 00:15:40,820 --> 00:15:42,640 the same transaction, and can there 470 00:15:42,770 --> 00:15:44,640 be phantom tuples inserted during the transaction? 471 00:15:45,540 --> 00:15:47,210 If we set our isolation level 472 00:15:47,530 --> 00:15:49,280 to serializable, then we 473 00:15:49,510 --> 00:15:50,930 cannot have dirty reads, we 474 00:15:51,070 --> 00:15:53,890 cannot have non-repeatable reads and we cannot have phantoms. 475 00:15:55,550 --> 00:15:57,000 If we go one step weaker 476 00:15:57,390 --> 00:15:59,000 for a little more performance, and use 477 00:15:59,210 --> 00:16:01,350 repeatable read, then we still won't have dirty reads, 478 00:16:01,770 --> 00:16:03,330 we still won't have non-repeatable reads, 479 00:16:03,670 --> 00:16:04,980 but we might have phantom tuples. 480 00:16:06,490 --> 00:16:07,510 Moving up with Read 481 00:16:07,880 --> 00:16:09,120 Committed, we still won't have 482 00:16:09,300 --> 00:16:11,330 dirty reads but we might have non-repeatable reads. 483 00:16:11,620 --> 00:16:12,390 So we might read a value 484 00:16:12,830 --> 00:16:14,230 that's committed both times we 485 00:16:14,490 --> 00:16:15,800 read it however a transaction 486 00:16:16,660 --> 00:16:17,760 wrote the value in between 487 00:16:18,110 --> 00:16:19,230 those two reads so it's 488 00:16:19,350 --> 00:16:21,920 different each time and we may have phantoms as well. 489 00:16:22,700 --> 00:16:24,990 Finally read uncommitted is the absolute weakest, 490 00:16:25,360 --> 00:16:26,550 not many guarantees at all. 491 00:16:26,820 --> 00:16:27,990 We might have dirty reads, we 492 00:16:28,140 --> 00:16:30,370 might have not repeatable reads, and we might have phantoms. 493 00:16:31,330 --> 00:16:32,670 So to wrap up transactions completely 494 00:16:33,360 --> 00:16:34,670 the standard default is serializable 495 00:16:35,480 --> 00:16:37,780 behavior, and we specified exactly what that means. 496 00:16:38,490 --> 00:16:39,980 Weaker isolation levels allow 497 00:16:40,330 --> 00:16:42,000 us to increase concurrency, decrease overhead 498 00:16:42,510 --> 00:16:43,790 so overhaul will get an increased performance, 499 00:16:44,810 --> 00:16:46,330 but we have weaker consistency guarantees. 500 00:16:47,280 --> 00:16:48,590 I should mention that some 501 00:16:48,840 --> 00:16:50,390 prominent database systems have 502 00:16:50,610 --> 00:16:52,800 actually chosen to have repeatable read as their default. 503 00:16:53,460 --> 00:16:55,260 Oracle and MySQL are examples of that. 504 00:16:55,900 --> 00:16:57,550 So, in those systems it's assumed 505 00:16:57,860 --> 00:16:59,580 that most applications will be 506 00:16:59,700 --> 00:17:00,970 willing to sacrifice a little 507 00:17:01,120 --> 00:17:03,190 bit inconsistency in order to get higher performance. 508 00:17:03,770 --> 00:17:05,330 And, finally, the isolation 509 00:17:05,570 --> 00:17:06,650 level is set for each transaction 510 00:17:07,720 --> 00:17:08,780 and is in the eye of 511 00:17:08,910 --> 00:17:10,550 the beholder, meaning that that 512 00:17:10,940 --> 00:17:12,350 transaction's reads must conform to 513 00:17:12,480 --> 00:17:13,880 its own isolation level but 514 00:17:13,980 --> 00:17:16,550 won't affect any concurrent transaction's isolation level. 515 00:17:17,400 --> 00:17:18,210 I hope I've gotten the point 516 00:17:18,430 --> 00:17:19,650 across that transactions are one 517 00:17:19,860 --> 00:17:21,810 of the most important concepts in database systems. 518 00:17:22,670 --> 00:17:23,970 They are what allow multiple clients, 519 00:17:24,490 --> 00:17:25,680 maybe thousands, tens of thousands 520 00:17:26,070 --> 00:17:27,270 of clients, to operate on 521 00:17:27,470 --> 00:17:28,380 a data base, all at the 522 00:17:28,440 --> 00:17:30,160 same time without concern that 523 00:17:30,340 --> 00:17:31,400 the actions they perform on the 524 00:17:31,500 --> 00:17:34,010 data will affect each other in unpredictable ways. 525 00:17:34,710 --> 00:17:36,190 And furthermore, transactions are what 526 00:17:36,380 --> 00:17:37,790 allow database systems to recover 527 00:17:38,110 --> 00:17:40,700 when there's an unexpected crash into a consistent space.