this video covers the topic of materialized views. As a reminder the reason that we use views in database systems is to hide data from users to make some queries easier or more natural to express. And to modularize our access to the database. And real applications do tend to use lots, and lots, and lots of views. So those views are for virtual views. Virtual views are what we've been talking about in our previous videos, I'm not actually sure I used that terminology. A virtual view is the usual type of view where we define it as a query of the database. We don't actually create a table for the view. Queries and modifications are rewritten based on the view definition. Now there's also a notion of a materialized view obviously for this video and materialized views give us the same advantages of virtual views but one additional advantage which is perhaps the most important one, which is to improve query performance over the database. So again as a quick reminder about virtual views, we define a view "V" say, by giving a query to specify the view over some relations or even other views. The schema of the view is the schema of the result of the query. When we have a query queue, so this is a user query, that references the view V then conceptually, not actually, we can imagine that there is a table called V. We run the view query over the current state of the relations. We put the result in V. And then we can evaluate the user query queue which refers to V. Now in reality what happens, as we already mentioned, is that the user query queue is rewritten based on the view definition, to just use the base tables. Now let's talk about what happens to materialized views. Again, exactly the same, we define a view. We give it a name, say V. And we define it as a query over a set of table or other views, then the system actually creates a physical table V with the schema of the query result. Next, the view query is executed over the current state of the database and the results are put physically in that table V. Now queries can refer to V as if its a table table because it actually is a table stored in a database. This all sounds great; of course, there are some down sides. The first down side is that V could be very large. When we talked about virtual views. We showed some examples where we could create a view that was just enormous - much larger than could ever be stored in the database but because the view was only a logical concept, it wasn't a problem. When users ran queries over the view, they'd typically have selection conditions so you'd never be materializing that very large view. In materialized views, obviously you're creating the view and so it is a problem if the view is extremely large. So, that's one of the downsides. The other downside is that we need to worry if the view is stored. What happens when we have modifications to those tables, over which V is defined. We need to actually modify the stored table V, either makes changes to it based on the changes to the base tables or completely recompute the view. Let's move now to an example. And we'll use our usual sample database shown here at the bottom of the slide. Let's create a materialized view. We'll give it the name CACS. It's for CS applicants to California colleges, so this is a three way join over all of our relations and it's going to select the college name and student name when the student has applied to the college, the college is in California, and the student is applying to major in CS. So once this command is issued the system will actually create a table called CACS and now the good news we can CACS in any query we want as if it's a table, because it is. Now the down side is that the base data over which the view is defined is modified we have to worry that our view is invalid that it's become out of sync with with the base data. So let's think about what modifications could occur to the database that would cause the view to become invalid. Well we have to worry about the three relations that are referenced in the view, that is the college relation, the student relation, and the apply relation, and for the college relation, well inserts could change the results of the view. We could have a new college. It seems unlikely, we could have a new college that the student will have already applied to in California, for C.S. certainly deletes can affect the view, and then updates to any of the attributes that are mentioned in the view, and for the college, that would be the college name and the state. For the student table, again inserts to student could affect the view if we already have an applied tuple and a college tuple that it matches. Deletes would certainly affect the view, and again, updates, and in this case, the attributes that are referenced from the student table are the student name and the student ID. And finally apply, again, that [xx] is the most likely one that would have modification that would affect the view, inserts, deletes and again updates and here the set of attributes that are relevant are the the college name, the student id, and the major. Now, if there is certain constraints on the database, referential integrity constraints for example, it might be that some of these operations couldn't affect the view. For example, we might not be able to insert a college where there's already an application for that college or insert a student likewise. We might not be able to delete a college if there's applications referencing it. So if there are additional constraints that the system is aware of, it might be able to eliminate some of these modifications. But regardless, many modifications will have to be monitored to make sure that the view is modified to stay in sync with the base data. By the way, if this feels a little bit familiar to you, when we talked about general assertions over the database, that was one of the types of constraints that we could specify. We went through a similar exercise where if assertions were defined as clearly as over the database and we looked at what operations could occur, what modifications to the database needed to be monitored to see if an assertion might be invalidated. Really an assertion can almost be thought of as a materialized view over the database. And if you look back at that video, I think you'll see there really is a correspondence between those two concepts. So, just to reiterate the system [xx materialized views stored, must monitor all the modifications that might invalidate the view. When there is a modification, either the view can be completely recomputed or sometimes there's clever algorithms called incremental maintenance algorithms that can just make small modifications to the view based on the modifications that were made to the base data. So we've talked about queries over materialized views. Very simple because the views actually stored in the database. Now what about modifications on materialized views? Well there's good news and bad news. The good news is that since the table is stored if a user issues a modification command and insert, delete, or update command, the system can just perform that command directly on the table. The bad news is the base table still need to stay in sync with the view. So really the exact same issues that we talked about with virtual views about a modification that the user wishes to execute on the view being propagated to the base tables, occur here the only difference is that we're actually modifying the view as well as modifying the base tables so I'm going to draw that same square diagram that we saw for virtual views to explain again the issue with modifications on views, so we have our view V and based on our view queries, view query Q. That view is defined over down here our set of relations that could be based table could be other views. Now the only difference with virtual views is that based on the view query. In this case V is actually stored in the database, so it's there relations are also in the database of course. Now the user comes along and the user says I'd like to perform a modification command on V could be an insert, delete, or update. And as a result, we can actually run that modification, since V is stored, so we get some new version of V, prime. Now what the system has to do, if it can, is perform modifications down here on the base tables, and that would be producing then R1 prime through RN prime. And what we want is these modifications down here to be such that the view query when executed on the, our primes down here would produce also v-prime. It's probably better to make that arrow upwards instead of downwards. In any case I hope that you get the idea that we still need to stay in sync, and the translation of these modifications here, as we saw in virtual views, have various issues, sometimes there's no good, meaningful translation. Sometimes there are many translations, and it's hard to know which one is the right one. So, again, the exact same issues arise. We're not going to talk about these issues at length in this video. I do want to mention actually, that more often with materialized views then with virtual views sometime people just say "I'm not going to allow the view to be updated. Materialized views are often used specifically for performance on queries, and so users will be allowed to query the view, but will not be allowed to modify the view. Now the next topic I want to address is how a database designer picks which materialized views to create. So for virtual views were mostly used as extra layer of abstraction based on modular access to the database or authorization concerns but as I mentioned a couple of times already, materialized views are also used for increased performance and that makes the process of picking which ones to create fairly interesting. So if we think about the benefits of a materialized view from an efficiency standpoint, a number of factors play into whether a materialized view is going to give us increased performance, better efficiency. One of them is just the overall size of the database, one is the complexity of the view. If we have the view, we don't have to re-execute the query So, if it's a complex query, it might be helpful not to be re-executing it over and over. Then there's the question of how many queries are going to be issued on the database that use the view, if we're going to query the view only once or twice, it's probably not worth storing it and keeping it up to date. The other question, then is how many or how often there are going to be modifications to the base data that affect the view, because whenever we modify the base data but this affecting of the view means we have to do extra work to keep the view up-to-date. I also alluded to this notion of incremental maintenance. Incremental maintenance says that we can take modifications to the base data and propagate them into the view without fully recomputing the view. Full recomputation can be a very expensive process. So, if we have a workload where we occasionally could use the view for queries but we're constantly updating the database and having to do full recomputation, clearly it's not going to be worthwhile to create the materialized view. Overall, if we think about the trade offs we're looking at here, at a high level it's what's known as a query update trade off, this actually occurs in various places in database design and applications. So how often are we going to query the database where we get increased performance on our queries versus how often we're gonna get to update the database where updates are gonna cost us in performance. So the idea is then to analyze the workload over the database also based on these factors like the size of the data and the complexity of the view and decide whether we are going to get more advantage by increasing the queries and that's not offset by the disadvantages of the updates. By the way does this sound familiar to you at all, this query update trade off decision of whether to make this extra structure that speeds up queries, but slows down updates? Probably if you're thinking, you'll realize that indexes, or when we talked about them, have exactly the same trade offs to consider. When we build an index, are we going to speed up queries but we are going to slow down updates. And actually materialize views in a certain way generalize the concept of indexes. And in fact that brings us to our next and last topic which is the topic of automatically rewriting user queries to use materialized views. And this again is similar to indexes. So when we build an index in a database, for a database, when we run a query we don't actually see that the query is deciding to use the index. We build the index and it will speed up the queries because the system itself will make that decision to use the index. Sophisticated database systems, these days are also starting to be able to look at what materialized views are present in a database and automatically rewrite queries to use those views without the user being aware of that; the same query answer will be given, it will be given faster based on the use of an existing materialized view. So, as a simple example of that, let's suppose we have a materialized view with the student id, college name and major of students who have applied to a college in California. This is similar to but not identical to the view that we showed earlier. So this is going to be a stored table always up to date. And this view is available to be used by the system if it can speed up a user query. So a user may come along with this query down here and what's this query doing? It's finding students. This time we're looking at the ID and the GPA of students who have applied to a California college, and they want to major in C.S. at that college, and they have a GPA over 3.5. So this query has been issued over just the base tables, but we'll see how the system might decide that if it has at its disposal this materialized view up here, it could modify the query to use the materialized view and it will get better performance because this materialized view has already done some of the work that would be done if we executed the query down here from scratch. So here's what the system can do in the rewrite, it can take this college relation out altogether. That reference to college is gonna be taken care of in our view and let's change this apply here to be the California apply view, instead of the apply relation itself. With the college table gone we don't need that first joined condition anymore and we also don't need to check that the college is in California, that's taken care of in our view. The remainder of the query with apply, replaced by California apply, will give us exactly the same result, and presumably it will do it much faster, again because some of the work in executing the query and evaluating the conditions has already been done when the view was created. So you can imagine, actually, a very complicated and interesting problem for the database system itself. It has lots of materialized views, let say stored in the database V1, V2 all the way to V, you know, 1,000, and along comes a user query Q, might be a complicated query, and the system wants to determine whether any of these views could be used to help Q have better performance. And sometimes that performance improvement can be really, really significant. Again depending on the complexity of the view, the size of the database. The converse problem is a problem of figuring out which views we want to design to help our queries and again that's a very interesting problem as well. Unfortunately, many times that problem is left to the human doing database design although there are some tools being developed right now as we speak to help users with that design problem. So in summary, materialized views provide the same advantages as virtual views in terms of their use for authorization, for modularity of applications. They have the additional feature that they improve query performance as long as the workload is appropriate and doesn't impose too much of a burden when the underlying base data is modified. Designing the right virtual views for an application is a challenging process, it's also challenging for systems to use the views properly, but when they do there can be really dramatic performance improvements.