this video covers the topic of materialized views.
As a reminder the reason that
we use views in database systems
is to hide data from users
to make some queries easier or more natural to express.
And to modularize our access to the database.
And real applications do tend
to use lots, and lots, and lots of views.
So those views are for virtual views.
Virtual views are what we've
been talking about in our previous
videos, I'm not actually sure I used that terminology.
A virtual view is the usual type of view
where we define it as a query of the database.
We don't actually create a table for the view.
Queries and modifications are rewritten based on the view definition.
Now there's also a notion
of a materialized view obviously for
this video and materialized views
give us the same advantages of
virtual views but one additional
advantage which is perhaps the
most important one, which is
to improve query performance over the database.
So again as a quick reminder
about virtual views, we
define a view "V" say,
by giving a query to specify
the view over some relations or even other views.
The schema of the view is the schema of the result of the query.
When we have a query
queue, so this is a user
query, that references the
view V then conceptually, not
actually, we can imagine
that there is a table called V.
We run the view query over the current state of the relations.
We put the result in V.
And then we can evaluate the user
query queue which refers to
V. Now in reality what
happens, as we already mentioned,
is that the user query queue
is rewritten based on the
view definition, to just use the base tables.
Now let's talk about what happens to materialized views.
Again, exactly the same, we define a view.
We give it a name, say V.
And we define it as a query
over a set of table
or other views, then the
system actually creates a physical
table V with the schema of the query result.
Next, the view query is
executed over the current
state of the database and the
results are put physically in
that table V.  Now queries
can refer to V as
if its a table table because it actually is a table stored in a database.
This all sounds great; of course, there are some down sides.
The first down side is that V could be very large.
When we talked about virtual views.
We showed some examples where we
could create a view that
was just enormous - much
larger than could ever be
stored in the database but because
the view was only a logical concept, it wasn't a problem.
When users ran queries over the
view, they'd typically have selection conditions
so you'd never be materializing
that very large view.
In materialized views, obviously you're
creating the view and so
it is a problem if the view is extremely large.
So, that's one of the downsides.
The other downside is that we need to worry if the view is stored.
What happens when we
have modifications to those tables,
over which V is defined.
We need to actually modify
the stored table V, either makes
changes to it based on
the changes to the base tables
or completely recompute the view.
Let's move now to an example.
And we'll use our usual sample
database shown here at the bottom of the slide.
Let's create a materialized view.
We'll give it the name CACS.
It's for CS applicants to
California colleges, so this
is a three way join
over all of our relations and it's
going to select the college name
and student name when the
student has applied to the
college, the college is in
California, and the student is applying to major in CS.
So once this command
is issued the system will actually
create a table called CACS
and now the good news
we can CACS in any query
we want as if it's a table, because it is.
Now the down side is that
the base data over which
the view is defined is modified
we have to worry that our
view is invalid that it's
become out of sync with with the base data.
So let's think about what modifications
could occur to the database
that would cause the view to
become invalid.
Well we have to worry about the three
relations that are referenced in
the view, that is the college relation, the student relation, and the apply relation, and for the college relation, well inserts
could change the results of the view.
We could have a new college.
It seems unlikely, we could
have a new college that the student
will have already applied to in
California, for C.S. certainly
deletes can affect the view,
and then updates to any
of the attributes that are mentioned in
the view, and for the
college, that would be the
college name and the state.
For the student table, again inserts
to student could affect the
view if we already have an
applied tuple and a college tuple that it matches.
Deletes would certainly affect
the view, and again, updates,
and in this case, the attributes
that are referenced from the student
table are the student
name and the student ID.
And finally apply, again, that
[xx] is the most likely one
that would have modification that would
affect the view, inserts, deletes
and again updates and here
the set of attributes that are
relevant are the the
college name, the student
id, and the major.
Now, if there is certain constraints on
the database, referential integrity constraints for example, it
might be that some of these
operations couldn't affect the view.
For example, we might not
be able to insert a college
where there's already an application for
that college or insert a student likewise.
We might not be able to
delete a college if there's
applications referencing it.
So if there are
additional constraints that the system
is aware of, it might be able to eliminate some of these modifications.
But regardless, many modifications will
have to be monitored to make
sure that the view is modified
to stay in sync with the base data.
By the way, if this
feels a little bit familiar
to you, when we talked about
general assertions over the
database, that was one of
the types of constraints that we could specify.
We went through a similar
exercise where if assertions were
defined as clearly as over
the database and we looked at
what operations could occur,
what modifications to the database
needed to be monitored to see
if an assertion might be invalidated.
Really an assertion can almost
be thought of as a materialized view over the database.
And if you look
back at that video, I think
you'll see there really is a correspondence between those two concepts.
So, just to reiterate the
system [xx materialized views
stored, must monitor all
the modifications that might invalidate the view.
When there is a modification, either
the view can be completely recomputed or
sometimes there's clever algorithms called
incremental maintenance algorithms that
can just make small modifications to
the view based on the
modifications that were made to the base data.
So we've talked about queries over materialized views.
Very simple because the views actually stored in the database.
Now what about modifications on materialized views?
Well there's good news and bad news.
The good news is that since
the table is stored if a
user issues a modification command
and insert, delete, or update command,
the system can just perform that command directly on the table.
The bad news is the
base table still need to stay in sync with the view.
So really the exact same
issues that we talked about with
virtual views about a
modification that the user wishes
to execute on the view
being propagated to the base
tables, occur here the
only difference is that we're
actually modifying the view
as well as modifying the base
tables so I'm going
to draw that same square diagram
that we saw for virtual views
to explain again the
issue with modifications on views,
so we have our view V
and based on our view
queries, view query Q.
That view is defined over
down here our set of
relations that could be
based table could be other views.
Now the only difference with virtual
views is that based on the view query.
In this case V is actually
stored in the database, so
it's there relations are also in the database of course.
Now the user comes along
and the user says I'd
like to perform a modification
command on V could be an insert, delete, or update.
And as a result, we
can actually run that modification,
since V is stored, so we
get some new version of
V, prime. Now what
the system has to do,
if it can, is perform modifications
down here on the base
tables, and that would
be producing then R1 prime
through RN prime.
And what we want is
these modifications down here to
be such that the view
query when executed on
the, our primes down here
would produce also v-prime.
It's probably better to make
that arrow upwards instead of downwards.
In any case I hope
that you get the idea
that we still need to stay
in sync, and the translation of
these modifications here, as we
saw in virtual views, have
various issues, sometimes there's no good, meaningful translation.
Sometimes there are many translations,
and it's hard to know which one is the right one.
So, again, the exact same issues arise.
We're not going to talk about these issues at length in this video.
I do want to mention actually,
that more often with materialized
views then with virtual views
sometime people just say "I'm
not going to allow the view to be updated.
Materialized views are often used
specifically for performance on
queries, and so users
will be allowed to query
the view, but will not
be allowed to modify the view.
Now the next topic I want to
address is how a
database designer picks which materialized views to create.
So for virtual views were
mostly used as extra layer
of abstraction based on modular
access to the database or authorization
concerns but as I mentioned
a couple of times already, materialized
views are also used for
increased performance and that
makes the process of picking which ones to create fairly interesting.
So if we think about the
benefits of a materialized view
from an efficiency standpoint, a
number of factors play
into whether a materialized view
is going to give us increased performance, better efficiency.
One of them is just the overall
size of the database, one
is the complexity of the view.
If we have the view,
we don't have to re-execute
the query So, if it's
a complex query, it might
be helpful not to be re-executing it over and over.
Then there's the question of
how many queries are going
to be issued on the database
that use the view, if
we're going to query the view
only once or twice, it's probably
not worth storing it and keeping it up to date.
The other question, then
is how many or
how often there are going
to be modifications to the
base data that affect the
view, because whenever we
modify the base data but
this affecting of the
view means we have to do
extra work to keep the view up-to-date.
I also alluded to this notion of incremental maintenance.
Incremental maintenance says that
we can take modifications to the
base data and propagate them
into the view without fully recomputing the view.
Full recomputation can be a very expensive process.
So, if we have a workload where
we occasionally could use the
view for queries but we're
constantly updating the database
and having to do full recomputation,
clearly it's not going to
be worthwhile to create the materialized view.
Overall, if we think
about the trade offs we're looking
at here, at a high level it's
what's known as a query update
trade off, this actually occurs
in various places in database design and applications.
So how often are we
going to query the database where
we get increased performance on
our queries versus how often
we're gonna get to update the
database where updates are gonna cost us in performance.
So the idea is then to
analyze the workload over the
database also based on
these factors like the size
of the data and the complexity of
the view and decide whether we
are going to get more advantage by
increasing the queries and that's
not offset by the disadvantages of the updates.
By the way does this sound familiar
to you at all, this query
update trade off decision of
whether to make this extra
structure that speeds up queries, but slows down updates?
Probably if you're thinking,
you'll realize that indexes, or
when we talked about them, have
exactly the same trade offs to consider.
When we build an index, are
we going to speed up queries but we are going to slow down updates.
And actually materialize views in
a certain way generalize the concept of indexes.
And in fact that brings us
to our next and
last topic which is the
topic of automatically rewriting user
queries to use materialized views.
And this again is similar to indexes.
So when we build an index in
a database, for a database,
when we run a query we don't
actually see that the query is deciding to use the index.
We build the index and it
will speed up the queries because
the system itself will make that decision to use the index.
Sophisticated database systems, these
days are also starting to
be able to look at what
materialized views are present
in a database and automatically rewrite
queries to use those views
without the user being aware
of that; the same query answer
will be given, it will be
given faster based on the
use of an existing materialized view.
So, as a simple example of
that, let's suppose we have
a materialized view with the
student id, college name and
major of students who have
applied to a college in California.
This is similar to but not
identical to the view that we showed earlier.
So this is going to be a stored table always up to date.
And this view is available to
be used by the system if it can speed up a user query.
So a user may come along
with this query down here and what's this query doing?
It's finding students.
This time we're looking at
the ID and the GPA of
students who have applied
to a California college, and
they want to major in C.S.
at that college, and they have a GPA over 3.5.
So this query has
been issued over just the
base tables, but we'll
see how the system might decide
that if it has at
its disposal this materialized view
up here, it could modify
the query to use the materialized
view and it will get better
performance because this materialized
view has already done some
of the work that would be
done if we executed the query down here from scratch.
So here's what the system can
do in the rewrite, it can take this college relation out altogether.
That reference to college is
gonna be taken care of in
our view and let's change
this apply here to be
the California apply view, instead of the apply relation itself.
With the college table gone we
don't need that first joined
condition anymore and we also
don't need to check that
the college is in California, that's taken care of in our view.
The remainder of the query
with apply, replaced by California
apply, will give us
exactly the same result, and
presumably it will do it
much faster, again because some
of the work in executing
the query and evaluating the
conditions has already been done
when the view was created.
So you can imagine, actually,
a very complicated and interesting
problem for the database system itself.
It has lots of materialized
views, let say stored in
the database V1, V2 all
the way to V, you know,
1,000, and along comes a
user query Q, might be
a complicated query, and the
system wants to determine whether
any of these views could
be used to help Q have better performance.
And sometimes that performance improvement can be really, really significant.
Again depending on the complexity of the view, the size of the database.
The converse problem is a
problem of figuring out which views
we want to design to help
our queries and again that's
a very interesting problem as well.
Unfortunately, many times that
problem is left to the
human doing database design although
there are some tools being developed
right now as we speak
to help users with that design problem.
So in summary, materialized views
provide the same advantages as
virtual views in terms of
their use for authorization, for modularity of applications.
They have the additional feature that
they improve query performance as
long as the workload is appropriate
and doesn't impose too much
of a burden when the underlying base data is modified.
Designing the right virtual
views for an application is a
challenging process, it's also challenging
for systems to use the
views properly, but when they
do there can be really
dramatic performance improvements.