1 00:00:00,410 --> 00:00:02,800 this video covers the topic of materialized views. 2 00:00:04,400 --> 00:00:05,470 As a reminder the reason that 3 00:00:05,600 --> 00:00:06,780 we use views in database systems 4 00:00:07,180 --> 00:00:08,170 is to hide data from users 5 00:00:08,720 --> 00:00:10,890 to make some queries easier or more natural to express. 6 00:00:12,040 --> 00:00:14,250 And to modularize our access to the database. 7 00:00:14,690 --> 00:00:16,270 And real applications do tend 8 00:00:16,600 --> 00:00:18,050 to use lots, and lots, and lots of views. 9 00:00:19,760 --> 00:00:21,640 So those views are for virtual views. 10 00:00:22,230 --> 00:00:23,070 Virtual views are what we've 11 00:00:23,190 --> 00:00:24,110 been talking about in our previous 12 00:00:24,570 --> 00:00:26,130 videos, I'm not actually sure I used that terminology. 13 00:00:27,840 --> 00:00:28,850 A virtual view is the usual type of view 14 00:00:29,010 --> 00:00:31,210 where we define it as a query of the database. 15 00:00:32,030 --> 00:00:33,560 We don't actually create a table for the view. 16 00:00:34,200 --> 00:00:36,950 Queries and modifications are rewritten based on the view definition. 17 00:00:38,380 --> 00:00:39,490 Now there's also a notion 18 00:00:39,800 --> 00:00:41,390 of a materialized view obviously for 19 00:00:41,580 --> 00:00:43,320 this video and materialized views 20 00:00:43,740 --> 00:00:45,340 give us the same advantages of 21 00:00:45,420 --> 00:00:46,800 virtual views but one additional 22 00:00:47,380 --> 00:00:48,610 advantage which is perhaps the 23 00:00:48,670 --> 00:00:49,950 most important one, which is 24 00:00:50,130 --> 00:00:52,180 to improve query performance over the database. 25 00:00:53,390 --> 00:00:54,130 So again as a quick reminder 26 00:00:54,530 --> 00:00:56,110 about virtual views, we 27 00:00:56,220 --> 00:00:57,590 define a view "V" say, 28 00:00:58,250 --> 00:00:59,750 by giving a query to specify 29 00:01:00,360 --> 00:01:02,990 the view over some relations or even other views. 30 00:01:03,760 --> 00:01:06,570 The schema of the view is the schema of the result of the query. 31 00:01:07,470 --> 00:01:08,900 When we have a query 32 00:01:09,570 --> 00:01:10,400 queue, so this is a user 33 00:01:10,760 --> 00:01:12,660 query, that references the 34 00:01:12,830 --> 00:01:14,970 view V then conceptually, not 35 00:01:15,170 --> 00:01:16,610 actually, we can imagine 36 00:01:17,080 --> 00:01:18,430 that there is a table called V. 37 00:01:19,090 --> 00:01:22,060 We run the view query over the current state of the relations. 38 00:01:22,580 --> 00:01:23,560 We put the result in V. 39 00:01:24,110 --> 00:01:25,160 And then we can evaluate the user 40 00:01:25,480 --> 00:01:26,980 query queue which refers to 41 00:01:27,110 --> 00:01:28,680 V. Now in reality what 42 00:01:28,830 --> 00:01:29,810 happens, as we already mentioned, 43 00:01:30,270 --> 00:01:31,390 is that the user query queue 44 00:01:31,760 --> 00:01:33,390 is rewritten based on the 45 00:01:33,450 --> 00:01:35,160 view definition, to just use the base tables. 46 00:01:36,560 --> 00:01:38,600 Now let's talk about what happens to materialized views. 47 00:01:39,270 --> 00:01:41,110 Again, exactly the same, we define a view. 48 00:01:41,320 --> 00:01:42,190 We give it a name, say V. 49 00:01:42,610 --> 00:01:43,650 And we define it as a query 50 00:01:44,190 --> 00:01:45,350 over a set of table 51 00:01:45,800 --> 00:01:48,000 or other views, then the 52 00:01:48,270 --> 00:01:49,730 system actually creates a physical 53 00:01:50,010 --> 00:01:52,490 table V with the schema of the query result. 54 00:01:53,800 --> 00:01:55,390 Next, the view query is 55 00:01:55,640 --> 00:01:56,620 executed over the current 56 00:01:56,890 --> 00:01:57,960 state of the database and the 57 00:01:58,180 --> 00:01:59,580 results are put physically in 58 00:01:59,930 --> 00:02:01,800 that table V. Now queries 59 00:02:02,160 --> 00:02:03,210 can refer to V as 60 00:02:03,360 --> 00:02:05,860 if its a table table because it actually is a table stored in a database. 61 00:02:07,030 --> 00:02:09,260 This all sounds great; of course, there are some down sides. 62 00:02:09,950 --> 00:02:11,850 The first down side is that V could be very large. 63 00:02:12,550 --> 00:02:13,860 When we talked about virtual views. 64 00:02:14,570 --> 00:02:15,750 We showed some examples where we 65 00:02:15,910 --> 00:02:16,910 could create a view that 66 00:02:17,050 --> 00:02:18,390 was just enormous - much 67 00:02:18,630 --> 00:02:19,530 larger than could ever be 68 00:02:19,650 --> 00:02:20,870 stored in the database but because 69 00:02:21,190 --> 00:02:23,590 the view was only a logical concept, it wasn't a problem. 70 00:02:24,180 --> 00:02:25,410 When users ran queries over the 71 00:02:25,500 --> 00:02:26,900 view, they'd typically have selection conditions 72 00:02:27,360 --> 00:02:28,600 so you'd never be materializing 73 00:02:29,670 --> 00:02:30,260 that very large view. 74 00:02:31,010 --> 00:02:32,430 In materialized views, obviously you're 75 00:02:32,810 --> 00:02:34,020 creating the view and so 76 00:02:34,450 --> 00:02:36,610 it is a problem if the view is extremely large. 77 00:02:37,020 --> 00:02:37,670 So, that's one of the downsides. 78 00:02:38,840 --> 00:02:41,660 The other downside is that we need to worry if the view is stored. 79 00:02:42,370 --> 00:02:43,570 What happens when we 80 00:02:43,690 --> 00:02:45,280 have modifications to those tables, 81 00:02:46,100 --> 00:02:46,890 over which V is defined. 82 00:02:47,840 --> 00:02:48,940 We need to actually modify 83 00:02:49,560 --> 00:02:50,970 the stored table V, either makes 84 00:02:51,460 --> 00:02:52,390 changes to it based on 85 00:02:52,500 --> 00:02:53,410 the changes to the base tables 86 00:02:54,160 --> 00:02:55,360 or completely recompute the view. 87 00:02:56,810 --> 00:02:57,830 Let's move now to an example. 88 00:02:58,340 --> 00:02:59,850 And we'll use our usual sample 89 00:03:00,320 --> 00:03:02,090 database shown here at the bottom of the slide. 90 00:03:03,230 --> 00:03:04,640 Let's create a materialized view. 91 00:03:04,810 --> 00:03:05,860 We'll give it the name CACS. 92 00:03:06,440 --> 00:03:08,440 It's for CS applicants to 93 00:03:08,660 --> 00:03:10,510 California colleges, so this 94 00:03:10,800 --> 00:03:11,950 is a three way join 95 00:03:12,340 --> 00:03:13,830 over all of our relations and it's 96 00:03:14,010 --> 00:03:15,130 going to select the college name 97 00:03:15,410 --> 00:03:16,920 and student name when the 98 00:03:17,130 --> 00:03:18,260 student has applied to the 99 00:03:18,320 --> 00:03:19,810 college, the college is in 100 00:03:19,910 --> 00:03:22,480 California, and the student is applying to major in CS. 101 00:03:23,070 --> 00:03:24,560 So once this command 102 00:03:25,030 --> 00:03:26,330 is issued the system will actually 103 00:03:26,840 --> 00:03:28,160 create a table called CACS 104 00:03:29,380 --> 00:03:30,330 and now the good news 105 00:03:30,510 --> 00:03:32,080 we can CACS in any query 106 00:03:32,680 --> 00:03:34,490 we want as if it's a table, because it is. 107 00:03:35,570 --> 00:03:36,700 Now the down side is that 108 00:03:37,070 --> 00:03:38,200 the base data over which 109 00:03:38,450 --> 00:03:39,470 the view is defined is modified 110 00:03:40,140 --> 00:03:41,070 we have to worry that our 111 00:03:41,360 --> 00:03:42,730 view is invalid that it's 112 00:03:42,930 --> 00:03:44,570 become out of sync with with the base data. 113 00:03:45,840 --> 00:03:46,910 So let's think about what modifications 114 00:03:47,940 --> 00:03:48,860 could occur to the database 115 00:03:49,350 --> 00:03:51,030 that would cause the view to 116 00:03:53,070 --> 00:03:55,350 become invalid. 117 00:03:59,430 --> 00:03:59,970 Well we have to worry about the three 118 00:04:00,040 --> 00:04:01,940 relations that are referenced in 119 00:04:02,000 --> 00:04:03,740 the view, that is the college relation, the student relation, and the apply relation, and for the college relation, well inserts 120 00:04:04,500 --> 00:04:05,950 could change the results of the view. 121 00:04:06,220 --> 00:04:07,720 We could have a new college. 122 00:04:08,730 --> 00:04:09,890 It seems unlikely, we could 123 00:04:10,010 --> 00:04:10,970 have a new college that the student 124 00:04:11,270 --> 00:04:12,480 will have already applied to in 125 00:04:12,640 --> 00:04:14,480 California, for C.S. certainly 126 00:04:14,920 --> 00:04:16,000 deletes can affect the view, 127 00:04:16,750 --> 00:04:18,520 and then updates to any 128 00:04:19,210 --> 00:04:20,460 of the attributes that are mentioned in 129 00:04:20,560 --> 00:04:21,670 the view, and for the 130 00:04:21,750 --> 00:04:23,230 college, that would be the 131 00:04:23,460 --> 00:04:25,000 college name and the state. 132 00:04:26,600 --> 00:04:28,700 For the student table, again inserts 133 00:04:29,300 --> 00:04:30,780 to student could affect the 134 00:04:30,850 --> 00:04:32,010 view if we already have an 135 00:04:32,090 --> 00:04:33,930 applied tuple and a college tuple that it matches. 136 00:04:34,880 --> 00:04:36,290 Deletes would certainly affect 137 00:04:36,630 --> 00:04:38,030 the view, and again, updates, 138 00:04:38,860 --> 00:04:40,380 and in this case, the attributes 139 00:04:40,740 --> 00:04:41,540 that are referenced from the student 140 00:04:41,880 --> 00:04:42,970 table are the student 141 00:04:44,030 --> 00:04:45,570 name and the student ID. 142 00:04:46,800 --> 00:04:48,870 And finally apply, again, that 143 00:04:49,130 --> 00:04:50,240 [xx] is the most likely one 144 00:04:50,510 --> 00:04:51,690 that would have modification that would 145 00:04:51,820 --> 00:04:54,560 affect the view, inserts, deletes 146 00:04:56,710 --> 00:04:59,070 and again updates and here 147 00:04:59,340 --> 00:05:00,710 the set of attributes that are 148 00:05:01,040 --> 00:05:02,020 relevant are the the 149 00:05:02,090 --> 00:05:03,830 college name, the student 150 00:05:04,430 --> 00:05:05,730 id, and the major. 151 00:05:07,130 --> 00:05:08,640 Now, if there is certain constraints on 152 00:05:08,770 --> 00:05:11,030 the database, referential integrity constraints for example, it 153 00:05:12,220 --> 00:05:13,100 might be that some of these 154 00:05:13,470 --> 00:05:15,240 operations couldn't affect the view. 155 00:05:15,500 --> 00:05:16,650 For example, we might not 156 00:05:16,850 --> 00:05:17,970 be able to insert a college 157 00:05:18,500 --> 00:05:20,000 where there's already an application for 158 00:05:20,110 --> 00:05:21,870 that college or insert a student likewise. 159 00:05:23,030 --> 00:05:24,050 We might not be able to 160 00:05:24,420 --> 00:05:25,820 delete a college if there's 161 00:05:26,230 --> 00:05:27,620 applications referencing it. 162 00:05:27,740 --> 00:05:29,420 So if there are 163 00:05:29,740 --> 00:05:30,760 additional constraints that the system 164 00:05:31,100 --> 00:05:33,110 is aware of, it might be able to eliminate some of these modifications. 165 00:05:34,210 --> 00:05:36,640 But regardless, many modifications will 166 00:05:36,930 --> 00:05:38,440 have to be monitored to make 167 00:05:38,650 --> 00:05:39,800 sure that the view is modified 168 00:05:40,440 --> 00:05:41,910 to stay in sync with the base data. 169 00:05:43,010 --> 00:05:44,580 By the way, if this 170 00:05:44,820 --> 00:05:45,900 feels a little bit familiar 171 00:05:46,410 --> 00:05:47,670 to you, when we talked about 172 00:05:47,870 --> 00:05:50,040 general assertions over the 173 00:05:50,130 --> 00:05:51,390 database, that was one of 174 00:05:51,460 --> 00:05:53,610 the types of constraints that we could specify. 175 00:05:54,830 --> 00:05:55,910 We went through a similar 176 00:05:56,190 --> 00:05:57,580 exercise where if assertions were 177 00:05:57,700 --> 00:05:58,890 defined as clearly as over 178 00:05:59,060 --> 00:06:00,360 the database and we looked at 179 00:06:00,450 --> 00:06:01,760 what operations could occur, 180 00:06:02,150 --> 00:06:03,280 what modifications to the database 181 00:06:03,930 --> 00:06:05,420 needed to be monitored to see 182 00:06:05,940 --> 00:06:07,090 if an assertion might be invalidated. 183 00:06:07,810 --> 00:06:09,610 Really an assertion can almost 184 00:06:09,960 --> 00:06:12,070 be thought of as a materialized view over the database. 185 00:06:12,960 --> 00:06:14,010 And if you look 186 00:06:14,140 --> 00:06:15,130 back at that video, I think 187 00:06:15,620 --> 00:06:18,200 you'll see there really is a correspondence between those two concepts. 188 00:06:19,670 --> 00:06:21,380 So, just to reiterate the 189 00:06:21,550 --> 00:06:23,180 system [xx materialized views 190 00:06:23,540 --> 00:06:25,210 stored, must monitor all 191 00:06:25,520 --> 00:06:27,440 the modifications that might invalidate the view. 192 00:06:28,120 --> 00:06:29,320 When there is a modification, either 193 00:06:29,570 --> 00:06:31,440 the view can be completely recomputed or 194 00:06:31,920 --> 00:06:33,550 sometimes there's clever algorithms called 195 00:06:33,890 --> 00:06:35,800 incremental maintenance algorithms that 196 00:06:35,980 --> 00:06:37,600 can just make small modifications to 197 00:06:37,730 --> 00:06:38,780 the view based on the 198 00:06:39,130 --> 00:06:40,370 modifications that were made to the base data. 199 00:06:41,870 --> 00:06:43,820 So we've talked about queries over materialized views. 200 00:06:44,370 --> 00:06:46,230 Very simple because the views actually stored in the database. 201 00:06:47,040 --> 00:06:49,470 Now what about modifications on materialized views? 202 00:06:50,290 --> 00:06:51,380 Well there's good news and bad news. 203 00:06:51,930 --> 00:06:53,520 The good news is that since 204 00:06:53,810 --> 00:06:55,320 the table is stored if a 205 00:06:55,420 --> 00:06:56,880 user issues a modification command 206 00:06:57,250 --> 00:06:58,420 and insert, delete, or update command, 207 00:06:59,010 --> 00:07:01,600 the system can just perform that command directly on the table. 208 00:07:02,570 --> 00:07:04,050 The bad news is the 209 00:07:04,160 --> 00:07:06,660 base table still need to stay in sync with the view. 210 00:07:07,210 --> 00:07:08,560 So really the exact same 211 00:07:08,870 --> 00:07:10,150 issues that we talked about with 212 00:07:10,320 --> 00:07:12,190 virtual views about a 213 00:07:12,640 --> 00:07:13,590 modification that the user wishes 214 00:07:14,010 --> 00:07:15,070 to execute on the view 215 00:07:15,660 --> 00:07:16,620 being propagated to the base 216 00:07:16,870 --> 00:07:18,140 tables, occur here the 217 00:07:18,230 --> 00:07:19,280 only difference is that we're 218 00:07:19,450 --> 00:07:21,520 actually modifying the view 219 00:07:21,810 --> 00:07:22,820 as well as modifying the base 220 00:07:23,100 --> 00:07:24,520 tables so I'm going 221 00:07:24,570 --> 00:07:26,160 to draw that same square diagram 222 00:07:26,800 --> 00:07:28,340 that we saw for virtual views 223 00:07:28,770 --> 00:07:30,290 to explain again the 224 00:07:30,450 --> 00:07:32,190 issue with modifications on views, 225 00:07:32,810 --> 00:07:33,620 so we have our view V 226 00:07:34,460 --> 00:07:35,450 and based on our view 227 00:07:35,910 --> 00:07:37,860 queries, view query Q. 228 00:07:38,850 --> 00:07:40,700 That view is defined over 229 00:07:41,120 --> 00:07:42,440 down here our set of 230 00:07:42,640 --> 00:07:43,730 relations that could be 231 00:07:44,040 --> 00:07:45,280 based table could be other views. 232 00:07:46,190 --> 00:07:47,530 Now the only difference with virtual 233 00:07:47,820 --> 00:07:49,730 views is that based on the view query. 234 00:07:50,210 --> 00:07:51,100 In this case V is actually 235 00:07:51,540 --> 00:07:52,940 stored in the database, so 236 00:07:53,160 --> 00:07:57,090 it's there relations are also in the database of course. 237 00:07:57,980 --> 00:07:59,290 Now the user comes along 238 00:07:59,840 --> 00:08:01,480 and the user says I'd 239 00:08:01,810 --> 00:08:03,180 like to perform a modification 240 00:08:03,630 --> 00:08:06,480 command on V could be an insert, delete, or update. 241 00:08:07,500 --> 00:08:09,540 And as a result, we 242 00:08:09,670 --> 00:08:10,920 can actually run that modification, 243 00:08:11,310 --> 00:08:12,550 since V is stored, so we 244 00:08:12,690 --> 00:08:14,290 get some new version of 245 00:08:14,390 --> 00:08:16,500 V, prime. Now what 246 00:08:16,700 --> 00:08:17,820 the system has to do, 247 00:08:18,280 --> 00:08:20,150 if it can, is perform modifications 248 00:08:20,920 --> 00:08:22,530 down here on the base 249 00:08:22,910 --> 00:08:24,170 tables, and that would 250 00:08:24,340 --> 00:08:26,100 be producing then R1 prime 251 00:08:27,150 --> 00:08:28,300 through RN prime. 252 00:08:29,280 --> 00:08:30,640 And what we want is 253 00:08:30,880 --> 00:08:32,910 these modifications down here to 254 00:08:33,060 --> 00:08:34,650 be such that the view 255 00:08:35,040 --> 00:08:37,000 query when executed on 256 00:08:37,860 --> 00:08:39,200 the, our primes down here 257 00:08:39,850 --> 00:08:41,140 would produce also v-prime. 258 00:08:42,210 --> 00:08:43,530 It's probably better to make 259 00:08:43,680 --> 00:08:45,230 that arrow upwards instead of downwards. 260 00:08:45,660 --> 00:08:47,020 In any case I hope 261 00:08:47,310 --> 00:08:48,470 that you get the idea 262 00:08:48,940 --> 00:08:50,110 that we still need to stay 263 00:08:50,260 --> 00:08:52,090 in sync, and the translation of 264 00:08:52,180 --> 00:08:54,860 these modifications here, as we 265 00:08:54,990 --> 00:08:56,070 saw in virtual views, have 266 00:08:56,300 --> 00:08:59,330 various issues, sometimes there's no good, meaningful translation. 267 00:09:00,190 --> 00:09:01,150 Sometimes there are many translations, 268 00:09:01,860 --> 00:09:03,390 and it's hard to know which one is the right one. 269 00:09:04,080 --> 00:09:06,010 So, again, the exact same issues arise. 270 00:09:06,370 --> 00:09:09,250 We're not going to talk about these issues at length in this video. 271 00:09:10,100 --> 00:09:11,320 I do want to mention actually, 272 00:09:11,970 --> 00:09:13,940 that more often with materialized 273 00:09:14,700 --> 00:09:16,060 views then with virtual views 274 00:09:16,660 --> 00:09:17,670 sometime people just say "I'm 275 00:09:17,780 --> 00:09:19,480 not going to allow the view to be updated. 276 00:09:20,310 --> 00:09:21,740 Materialized views are often used 277 00:09:22,220 --> 00:09:24,180 specifically for performance on 278 00:09:24,400 --> 00:09:25,490 queries, and so users 279 00:09:25,820 --> 00:09:26,920 will be allowed to query 280 00:09:27,120 --> 00:09:28,170 the view, but will not 281 00:09:28,510 --> 00:09:29,750 be allowed to modify the view. 282 00:09:31,950 --> 00:09:33,020 Now the next topic I want to 283 00:09:33,090 --> 00:09:34,470 address is how a 284 00:09:34,900 --> 00:09:38,030 database designer picks which materialized views to create. 285 00:09:38,850 --> 00:09:40,710 So for virtual views were 286 00:09:40,800 --> 00:09:42,280 mostly used as extra layer 287 00:09:42,590 --> 00:09:45,200 of abstraction based on modular 288 00:09:45,600 --> 00:09:47,300 access to the database or authorization 289 00:09:47,830 --> 00:09:49,230 concerns but as I mentioned 290 00:09:49,560 --> 00:09:50,530 a couple of times already, materialized 291 00:09:51,110 --> 00:09:52,860 views are also used for 292 00:09:53,000 --> 00:09:55,130 increased performance and that 293 00:09:55,410 --> 00:09:58,020 makes the process of picking which ones to create fairly interesting. 294 00:09:59,760 --> 00:10:00,810 So if we think about the 295 00:10:01,010 --> 00:10:02,180 benefits of a materialized view 296 00:10:02,510 --> 00:10:04,110 from an efficiency standpoint, a 297 00:10:04,490 --> 00:10:05,890 number of factors play 298 00:10:06,200 --> 00:10:07,630 into whether a materialized view 299 00:10:07,970 --> 00:10:09,860 is going to give us increased performance, better efficiency. 300 00:10:10,750 --> 00:10:11,770 One of them is just the overall 301 00:10:12,240 --> 00:10:13,970 size of the database, one 302 00:10:14,380 --> 00:10:15,620 is the complexity of the view. 303 00:10:16,010 --> 00:10:17,100 If we have the view, 304 00:10:17,350 --> 00:10:18,570 we don't have to re-execute 305 00:10:18,850 --> 00:10:19,940 the query So, if it's 306 00:10:20,040 --> 00:10:21,390 a complex query, it might 307 00:10:21,590 --> 00:10:23,910 be helpful not to be re-executing it over and over. 308 00:10:24,320 --> 00:10:25,580 Then there's the question of 309 00:10:25,670 --> 00:10:26,780 how many queries are going 310 00:10:26,870 --> 00:10:27,950 to be issued on the database 311 00:10:29,110 --> 00:10:30,470 that use the view, if 312 00:10:30,730 --> 00:10:31,510 we're going to query the view 313 00:10:31,640 --> 00:10:32,480 only once or twice, it's probably 314 00:10:32,760 --> 00:10:34,320 not worth storing it and keeping it up to date. 315 00:10:34,620 --> 00:10:36,470 The other question, then 316 00:10:36,830 --> 00:10:38,080 is how many or 317 00:10:38,170 --> 00:10:39,290 how often there are going 318 00:10:39,420 --> 00:10:41,160 to be modifications to the 319 00:10:41,400 --> 00:10:42,980 base data that affect the 320 00:10:43,050 --> 00:10:44,940 view, because whenever we 321 00:10:45,070 --> 00:10:46,380 modify the base data but 322 00:10:46,590 --> 00:10:47,790 this affecting of the 323 00:10:47,900 --> 00:10:48,560 view means we have to do 324 00:10:48,820 --> 00:10:50,480 extra work to keep the view up-to-date. 325 00:10:51,500 --> 00:10:54,000 I also alluded to this notion of incremental maintenance. 326 00:10:54,950 --> 00:10:56,300 Incremental maintenance says that 327 00:10:56,470 --> 00:10:57,880 we can take modifications to the 328 00:10:57,950 --> 00:10:59,520 base data and propagate them 329 00:10:59,690 --> 00:11:01,620 into the view without fully recomputing the view. 330 00:11:02,320 --> 00:11:04,960 Full recomputation can be a very expensive process. 331 00:11:05,850 --> 00:11:07,650 So, if we have a workload where 332 00:11:07,940 --> 00:11:09,440 we occasionally could use the 333 00:11:09,520 --> 00:11:10,790 view for queries but we're 334 00:11:10,990 --> 00:11:12,230 constantly updating the database 335 00:11:12,660 --> 00:11:13,750 and having to do full recomputation, 336 00:11:14,880 --> 00:11:15,990 clearly it's not going to 337 00:11:16,040 --> 00:11:17,970 be worthwhile to create the materialized view. 338 00:11:18,980 --> 00:11:20,530 Overall, if we think 339 00:11:20,720 --> 00:11:21,710 about the trade offs we're looking 340 00:11:22,130 --> 00:11:23,350 at here, at a high level it's 341 00:11:23,650 --> 00:11:25,040 what's known as a query update 342 00:11:26,040 --> 00:11:27,320 trade off, this actually occurs 343 00:11:27,640 --> 00:11:30,650 in various places in database design and applications. 344 00:11:31,680 --> 00:11:33,060 So how often are we 345 00:11:33,130 --> 00:11:34,350 going to query the database where 346 00:11:34,450 --> 00:11:35,870 we get increased performance on 347 00:11:36,010 --> 00:11:37,460 our queries versus how often 348 00:11:37,730 --> 00:11:38,630 we're gonna get to update the 349 00:11:38,710 --> 00:11:41,030 database where updates are gonna cost us in performance. 350 00:11:42,270 --> 00:11:43,770 So the idea is then to 351 00:11:44,090 --> 00:11:45,520 analyze the workload over the 352 00:11:45,600 --> 00:11:46,970 database also based on 353 00:11:47,090 --> 00:11:47,910 these factors like the size 354 00:11:48,210 --> 00:11:49,330 of the data and the complexity of 355 00:11:49,380 --> 00:11:50,990 the view and decide whether we 356 00:11:51,140 --> 00:11:52,440 are going to get more advantage by 357 00:11:52,620 --> 00:11:54,700 increasing the queries and that's 358 00:11:54,990 --> 00:11:57,640 not offset by the disadvantages of the updates. 359 00:11:58,900 --> 00:12:00,010 By the way does this sound familiar 360 00:12:00,660 --> 00:12:01,780 to you at all, this query 361 00:12:02,110 --> 00:12:04,080 update trade off decision of 362 00:12:04,170 --> 00:12:05,190 whether to make this extra 363 00:12:05,620 --> 00:12:08,740 structure that speeds up queries, but slows down updates? 364 00:12:09,920 --> 00:12:11,010 Probably if you're thinking, 365 00:12:11,550 --> 00:12:13,470 you'll realize that indexes, or 366 00:12:13,720 --> 00:12:15,360 when we talked about them, have 367 00:12:15,490 --> 00:12:17,500 exactly the same trade offs to consider. 368 00:12:17,760 --> 00:12:18,950 When we build an index, are 369 00:12:19,200 --> 00:12:21,440 we going to speed up queries but we are going to slow down updates. 370 00:12:22,200 --> 00:12:23,660 And actually materialize views in 371 00:12:23,770 --> 00:12:26,350 a certain way generalize the concept of indexes. 372 00:12:27,720 --> 00:12:28,720 And in fact that brings us 373 00:12:28,900 --> 00:12:29,970 to our next and 374 00:12:30,080 --> 00:12:31,190 last topic which is the 375 00:12:31,420 --> 00:12:33,270 topic of automatically rewriting user 376 00:12:33,590 --> 00:12:35,510 queries to use materialized views. 377 00:12:36,110 --> 00:12:38,150 And this again is similar to indexes. 378 00:12:38,550 --> 00:12:39,970 So when we build an index in 379 00:12:40,100 --> 00:12:41,420 a database, for a database, 380 00:12:42,000 --> 00:12:43,010 when we run a query we don't 381 00:12:43,200 --> 00:12:45,490 actually see that the query is deciding to use the index. 382 00:12:46,240 --> 00:12:47,140 We build the index and it 383 00:12:47,220 --> 00:12:48,140 will speed up the queries because 384 00:12:48,400 --> 00:12:50,810 the system itself will make that decision to use the index. 385 00:12:51,560 --> 00:12:53,550 Sophisticated database systems, these 386 00:12:53,850 --> 00:12:55,690 days are also starting to 387 00:12:55,770 --> 00:12:56,820 be able to look at what 388 00:12:57,050 --> 00:12:58,470 materialized views are present 389 00:12:59,080 --> 00:13:01,110 in a database and automatically rewrite 390 00:13:01,570 --> 00:13:02,540 queries to use those views 391 00:13:02,980 --> 00:13:03,960 without the user being aware 392 00:13:04,270 --> 00:13:05,360 of that; the same query answer 393 00:13:05,670 --> 00:13:06,660 will be given, it will be 394 00:13:06,780 --> 00:13:08,300 given faster based on the 395 00:13:08,420 --> 00:13:09,860 use of an existing materialized view. 396 00:13:11,320 --> 00:13:12,510 So, as a simple example of 397 00:13:12,640 --> 00:13:13,700 that, let's suppose we have 398 00:13:14,080 --> 00:13:15,760 a materialized view with the 399 00:13:16,090 --> 00:13:17,640 student id, college name and 400 00:13:17,810 --> 00:13:18,930 major of students who have 401 00:13:19,010 --> 00:13:20,660 applied to a college in California. 402 00:13:21,410 --> 00:13:22,380 This is similar to but not 403 00:13:22,690 --> 00:13:24,760 identical to the view that we showed earlier. 404 00:13:25,590 --> 00:13:27,900 So this is going to be a stored table always up to date. 405 00:13:28,450 --> 00:13:29,980 And this view is available to 406 00:13:30,070 --> 00:13:32,240 be used by the system if it can speed up a user query. 407 00:13:33,600 --> 00:13:34,470 So a user may come along 408 00:13:34,800 --> 00:13:37,460 with this query down here and what's this query doing? 409 00:13:37,830 --> 00:13:38,950 It's finding students. 410 00:13:39,710 --> 00:13:40,870 This time we're looking at 411 00:13:40,990 --> 00:13:42,910 the ID and the GPA of 412 00:13:43,160 --> 00:13:44,340 students who have applied 413 00:13:44,820 --> 00:13:46,900 to a California college, and 414 00:13:47,590 --> 00:13:48,500 they want to major in C.S. 415 00:13:48,630 --> 00:13:50,620 at that college, and they have a GPA over 3.5. 416 00:13:51,460 --> 00:13:52,940 So this query has 417 00:13:53,280 --> 00:13:55,030 been issued over just the 418 00:13:55,230 --> 00:13:56,540 base tables, but we'll 419 00:13:56,680 --> 00:13:57,850 see how the system might decide 420 00:13:58,410 --> 00:13:59,510 that if it has at 421 00:13:59,870 --> 00:14:01,490 its disposal this materialized view 422 00:14:01,700 --> 00:14:02,760 up here, it could modify 423 00:14:03,350 --> 00:14:04,210 the query to use the materialized 424 00:14:04,610 --> 00:14:05,450 view and it will get better 425 00:14:05,780 --> 00:14:07,160 performance because this materialized 426 00:14:07,600 --> 00:14:08,690 view has already done some 427 00:14:08,860 --> 00:14:10,260 of the work that would be 428 00:14:10,400 --> 00:14:12,770 done if we executed the query down here from scratch. 429 00:14:14,080 --> 00:14:15,030 So here's what the system can 430 00:14:15,550 --> 00:14:17,760 do in the rewrite, it can take this college relation out altogether. 431 00:14:18,870 --> 00:14:20,010 That reference to college is 432 00:14:20,100 --> 00:14:20,890 gonna be taken care of in 433 00:14:20,960 --> 00:14:22,500 our view and let's change 434 00:14:22,870 --> 00:14:24,170 this apply here to be 435 00:14:24,440 --> 00:14:27,710 the California apply view, instead of the apply relation itself. 436 00:14:29,110 --> 00:14:30,430 With the college table gone we 437 00:14:30,530 --> 00:14:31,950 don't need that first joined 438 00:14:32,280 --> 00:14:33,820 condition anymore and we also 439 00:14:33,940 --> 00:14:34,990 don't need to check that 440 00:14:35,150 --> 00:14:37,850 the college is in California, that's taken care of in our view. 441 00:14:39,230 --> 00:14:40,240 The remainder of the query 442 00:14:40,630 --> 00:14:42,180 with apply, replaced by California 443 00:14:42,810 --> 00:14:43,910 apply, will give us 444 00:14:44,130 --> 00:14:45,680 exactly the same result, and 445 00:14:46,010 --> 00:14:46,820 presumably it will do it 446 00:14:46,860 --> 00:14:48,200 much faster, again because some 447 00:14:48,310 --> 00:14:49,420 of the work in executing 448 00:14:50,000 --> 00:14:51,460 the query and evaluating the 449 00:14:51,650 --> 00:14:52,810 conditions has already been done 450 00:14:53,520 --> 00:14:54,460 when the view was created. 451 00:14:55,460 --> 00:14:56,650 So you can imagine, actually, 452 00:14:57,010 --> 00:14:58,840 a very complicated and interesting 453 00:14:59,200 --> 00:15:00,810 problem for the database system itself. 454 00:15:01,690 --> 00:15:02,920 It has lots of materialized 455 00:15:03,410 --> 00:15:04,810 views, let say stored in 456 00:15:04,900 --> 00:15:06,520 the database V1, V2 all 457 00:15:06,800 --> 00:15:08,020 the way to V, you know, 458 00:15:08,500 --> 00:15:10,040 1,000, and along comes a 459 00:15:10,170 --> 00:15:11,630 user query Q, might be 460 00:15:11,730 --> 00:15:13,110 a complicated query, and the 461 00:15:13,340 --> 00:15:14,840 system wants to determine whether 462 00:15:15,070 --> 00:15:16,160 any of these views could 463 00:15:16,330 --> 00:15:18,450 be used to help Q have better performance. 464 00:15:20,310 --> 00:15:23,080 And sometimes that performance improvement can be really, really significant. 465 00:15:24,110 --> 00:15:26,430 Again depending on the complexity of the view, the size of the database. 466 00:15:27,370 --> 00:15:28,480 The converse problem is a 467 00:15:28,680 --> 00:15:29,850 problem of figuring out which views 468 00:15:30,150 --> 00:15:31,130 we want to design to help 469 00:15:31,430 --> 00:15:33,030 our queries and again that's 470 00:15:33,300 --> 00:15:34,650 a very interesting problem as well. 471 00:15:35,310 --> 00:15:37,110 Unfortunately, many times that 472 00:15:37,360 --> 00:15:38,830 problem is left to the 473 00:15:38,980 --> 00:15:40,690 human doing database design although 474 00:15:40,980 --> 00:15:42,070 there are some tools being developed 475 00:15:42,570 --> 00:15:43,460 right now as we speak 476 00:15:44,020 --> 00:15:45,620 to help users with that design problem. 477 00:15:46,890 --> 00:15:48,120 So in summary, materialized views 478 00:15:48,480 --> 00:15:50,320 provide the same advantages as 479 00:15:50,530 --> 00:15:51,830 virtual views in terms of 480 00:15:52,080 --> 00:15:55,160 their use for authorization, for modularity of applications. 481 00:15:56,380 --> 00:15:57,910 They have the additional feature that 482 00:15:58,070 --> 00:16:00,150 they improve query performance as 483 00:16:00,240 --> 00:16:01,610 long as the workload is appropriate 484 00:16:02,020 --> 00:16:03,210 and doesn't impose too much 485 00:16:03,320 --> 00:16:05,970 of a burden when the underlying base data is modified. 486 00:16:07,090 --> 00:16:08,330 Designing the right virtual 487 00:16:08,760 --> 00:16:10,060 views for an application is a 488 00:16:10,330 --> 00:16:11,760 challenging process, it's also challenging 489 00:16:12,350 --> 00:16:13,560 for systems to use the 490 00:16:13,620 --> 00:16:14,710 views properly, but when they 491 00:16:14,860 --> 00:16:16,070 do there can be really 492 00:16:16,440 --> 00:16:17,590 dramatic performance improvements.