In this video we'll delve into
more detail about the properties of transactions.
As a reminder, transactions are a
concept that's been introduced as
a solution to both the concurrency
control problem and the system
failure problem in databases.
Specifically a transaction is
a sequence of operations that are treated as a unit.
Transactions appear to run
in isolation, even if many
clients are operating on a database at the same time.
And further more if there is
a system failure in unexpected software,
hardware, or power failure every
transactions changes that were
issued to the database are either
reflected entirely or not at all.
Every database connoisseur knows that
transaction support, what are
known as the ACID properties.
Although not everybody always remembers what
A stands for atomicity;
C stands for consistency; I
stands for isolation; and D stands for for durability.
And we're going to talk about each of these four properties in turn.
We're going to talk about isolation first.
We're going to talk about durability
second,then we'll next talk
about atomicity and we'll conclude talking about consistency.
So here's the deal with isolation.
We'll have a whole bunch of
clients operating on our database,
and we'd kind of like each
client to imagine that they're operating on their own.
So as we discussed in the
previous video, each client
issues to the database system a sequence of transactions.
So this first client might
be issuing first transaction T1,
then T2, T3, and so on.
Over here we have T9,
T10, T11 and as a
reminder, each transaction itself can be a sequence of statements.
So, this might be statement one,
statement two, statement three and
so on, and then those
statements will be treated as a unit.
So the isolation property is implemented
by a very specific formal
notion called serializability.
What serializability says is
that the operations within transactions
may be interleaved across clients,
but the execution of those
operations must be equivalent
to some sequential serial orderOf all the transactions.
So for our example over
here, the system itself may
execute all of the
statements within each transaction
and over here concurrently but
it has to guarantee that the
behavior against the database
is equivalent to some sequence in order again.
So perhaps the equivalent sequential
order will be as if the
system first did transaction
T1, then may T2, T9
and T10, maybe back to T3 and So on.
And again the system has to
guarantee that the state
of the database, at this point
in time, even if its
internally the statements within
any of these transactions looks as if
these transactions executed in order.
Now, you might wonder how the
database system could possibly guarantee
this level of consistency while still inter-leading operation.
It uses protocols that are
based on locking portions of the database.
Now we're not going to describe the
implementation, because implementation aspects are not the focus of this course.
What you need to know from
a user's application perspective is
really the properties that are being guaranteed.
Now with the formal
notion of a let's go
back and look at the examples
from the previous video that motivated
the problems we could get into with concurrent access.
The first one was the
example where two separate clients
were updating Standford's enrollment.
Let's just call one of them T1.
It's not a transaction.
And the other T2.
So when we run
thing is against the system and
serializability is guaranteed, then
we will have a behavior that
is at least equivalent to
either T1 followed by T2,
or T2 followed by T1.
So, in this case, when we
start with our enrollment of 15,000,
either execution will correctly
have a final enrollment of 17,500 solving our concurrency problems.
Here's our second example.
In this case the first client was
modifying the major of
student 123 in the
apply table and the second was modifying the decision.
And we saw that if
we allowed these to run in
an interleaved fashion, it would
be possible that only one of the two changes would be made.
Again, with serializability we're going
to get behavior that guarantees
it is equivalent to either
T1 and then T2, or T2 and then T1.
And in both cases, both
changes will be reflected in
the database which is what we would like.
The next example was the one
where we were looking at the
Apply and the Student table,
and we were modifying the Apply
table based on the
GPA in the Student table, and
simultaneously modifying that GPA.
So again if these are issued
as two transactions, we'll have
either T1 followed by
T2 or T2 followed by T1.
Or at least we will have behavior that is equivalent to that.
Now this case is a
bit interesting because either of
these does result in a consistent state of the database.
In the first case, we'll update
all the decision records before the
GPAs are modified for anyone,
and in the second case, will
update the apply records after the GPAs have been modified.
The interesting thing here
is that the order does
matter in this case.
Now, the database systems only guarantees serializability.
They guarantee that the behavior
will be equivalent to some
sequential order but they
don't guarantee the exact sequential
order if the transactions are being issued at the same time.
So if it's important to get,
say, T1 before T2, that
would actually have to be coded as part of the application.
And our last example was
the case where we had the
Apply table, the Archive table,
and we were moving records from
one table to another in
one of our clients, and the
other client was counting the tuples.
And again, so T1 and T2, they're issued as transactions.
The system guarantees that we'll
either move all the
tuples first and then count
them, or will count the tuples and then move them.
Now, again, here's a case
where the order makes a difference,
but if we care specifically
about the order, that would have to be coded as part as the application.
OK, so we've finished our first of the four ACID properties.
The other three will actually be quite a bit quicker to talk about.
Let's talk now about durability.
And we only need to look at one client to understand what's going on.
So let's say that we
have our client, and the client
has issuing a sequence of transactions to the database.
And each transaction again is a sequence of statements.
And finally at the end of the transaction there is a commit.
So what durability guarantees for us
is that if there is a
system crash after a transaction
commits, then all effects
of that transaction will remain in the database.
So, specifically, if at a
later point in time after
this occurs, there's a failure
for whatever reason, the client
can rest assured that the
effects of this transaction are in
the database, and when the system
comes back up, they will still be in there.
So you may be wondering
how it's possible to make this
guarantee since database systems
move data between disc and
memory and a crash could occur at anytime.
They're actually fairly complicated protocols
that are used and they're based on the concept of logging.
But once again we're not gonna talk about the implementation details.
What's important from the user
or application perspective is the properties that are guaranteed.
2 properties down.
Now let's talk about atomicity.
again we'll only look at
one client who's issuing a sequence of transactions to the database.
And let's look at transaction T2
which itself is a sequence
of statements followed by commit.
The case that atomicity deals
with is the case where there's
actually a crash or a
failure during the execution
of the transaction, before it's been committed.
What the property tells
us is that even in
the presence of system crashes, every
transaction is always executed either all or nothing.
on the database.
So in other words, if we have
each of our transactions running, it's
not possible in a system
crash to, say, have executed
on the database a couple of
statements but not the rest of the transaction.
Now once again, you might be wondering how this is implemented.
It also uses a log-in
mechanism and specifically when
the system recovers from a
crash, there is a
process by which partial effects of
transactions that were underway
at the time of the crash, are undone.
Now applications do need to be somewhat aware of this process.
So when an application submits a
transaction to the database, it's possible
that it will get back an error
because there was in fact a
crash during the execution of
the transaction, and then the system is restarted.
In that case the application
does have the guarantee that none
of the effects of the
transaction were reflected in
the database, but it will need to restart the transaction.
Now let's come back to the
fact that the system will undo
partial effects of a transaction
to guarantee the atomicity property,
that each transaction is executed
in an all or nothing fashion.
So, this concept of undoing
partial Full effects of the
transaction is known as
transaction roll back or transaction abort.
And the reason I'm mentioning it
here is that although it
is the implementation mechanisms for
Atima City, it's also an
operation that's exposed by the
database in an application would like to use it.
Specifically, a transaction rollback
can be initiated by the system,
in the case of an error, or a crash recovery.
But it also can be client initiated.
And let me give a
little example where a client might
write code that takes advantage of the operation.
So here is some toy application code.
In this code, the client begins
a transaction, it asks the
Database user for some input.
It performs some sequel commands.
Maybe some modifications to the
database based on the input from the user.
It confirms that the user likes the results of those modifications.
And if the user says
okay, then the transaction is committed,
and we get an atomic execution of this behavior.
But if the user doesn't say okay,
then the transaction is rolled back
and, automatically, these sequel commands
that were executed are undone, and
that frees the application from writing
the code that undoes those commands
explicitly So it can actually be quite
a useful feature for clients to use.
But clients do need to
be very careful, because this
rollback command only undoes
effects on the data itself in the database.
So if perhaps in this
code, the system was
also modifying some variables
or even worse, say delivering
cash out of an ATM
machine, the rollback command is not going to undo those.
It's not gonna modify variables and
it's certainly not going to
pull that cash back into the ATM.
So, there actually is another issue
with this particular client interaction
that I am going to put a "frownie" face here.
It was a nice, simple example of how rollback can be helpful.
But one thing that happens in
this example is that we
begin a transaction and then we wait for the user to do something.
And we actually wait for the user back here.
So experienced database application
developers will tell you to
never hold open a
transaction and then wait for arbitrary amounts of time.
The reason is that transactions do
use this locking mechanism I
alluded to earlier, so when
a transaction is running, it
may be blocking other portions,
blocking other clients from portions of the database.
If the user happened to
go out for a cup of coffee
or is going to come back in
a week, we certainly don't want
to leave the database locked up for an entire week.
So, again and a general rule
of thumb is that transactions should
be constructed in a fashion
that we know they are going to run to completion fairly quickly.
Finally, let's talk about consistency.
The consistency property talks about
how transactions interact with the
integrity constraints that may hold on a database.
As a reminder and integrity constraint
is a specification of which database states are legal.
Transactions are actually very helpful in the management of constraints.
Specifically, when we have
multiple clients interacting with the
database in an interleaved fashion,
we can have a setup where
each client can assume that
when it begins it operates
on a database that satisfies all integrity constraints.
Then each transaction, then sorry,
each client, must guarantee that
all constraints hold when the
transaction ends and that's
typically guaranteed by the
constraint enforcement sub system.
Now with that guarantee, since we
have serialized ability of transactions.
that guaranteesthat constraints always hold.
Specifically the behavior of
the database is some sequential order of the transactions.
We know and we can
assume at the start of the transaction the constraints hold.
And then we guarantee they hold at the end.
And since the behaviors equivalent to
a sequential order then the
next transaction can assume
the constraints hold and so on.
In conclusion transaction are a very powerful concept.
They give a solution for both
concurrency control and system
failure management and databases.
They provide formally understood properties
of atomicity, consistency, isolation and durability.
In the next video we are going to focus more on the isolation property.
We're going to see that in some
cases we may want to
relax the notion of isolation
while still providing properties that
are sufficient for applications in certain circumstances.