Transaction Processing Concepts
•
In a multiuser system, transactions are interleaved
• Some sort of concurrency control is needed to
prevent transactions from interfering with each other
• Recovery is
needed for transactions that partially execute, then fail
• The system log is used in recovery
• There are two types of schedules used in
recovery
–
Based on
recoverability
–
Based on
serializability
•
Transaction
support in SQL
Review of recovery using logs
Undo
l
The log contains a record of every write operation that
changes the value of some database item,
l
So it is
possible to undo the effect of these write operations of a transaction
l
This is
done by tracing backward through the log and resetting all items changed by a
write operation to their old values.
Redo
l
We can
also redo the effect of the write operations of a transaction
l
This is done by tracing
forward through the log and setting all items changed by a write operation
(that did not get done permanently) to their new values.
Schedules and Recoverability
• What is a schedule?
–
When tractions
are executing concurrently in an interleaved way,
– then the order of execution of operations
from the different transactions is know as a schedule
• The order of operations in each transaction must
remain the same as if they were not interleaved
•
For the purpose
of recovery and concurrency control we are mainly interested in read, write,
commit and abort
A example of a transaction schedule
• This is the order of execution of operations from
various transactions when transactions are executing concurrently.
• shorthand - r or read_item, w
for write_item, c for commit, a for
abort
• write the schedule for figures the Lost Update and the
Temporary Update problems used in the last lecture.
• Lost Update: r1(X); r2(X); w1(X); r1(Y); w2(X); c2;
w1(Y); c1;
• Temporary Update: r1(X); w1(X); r2(X); w2(X); c2;
r1(Y); a1;
Two operations conflict if:
• They belong to different transactions, and
– they access the same item X, and
– one of the operations is a write(X)
• Which operations of schedules we wrote conflict? where?
• Which operations do not conflict?
Schedules based on Recoverability
•
Recoverable – Once a transaction T is committed, it is never be
necessary to roll it back.
–
No transaction T
in schedule S commits until all transactions T' that have written an item that
T reads have committed.
• Consider this schedule
– S: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;
– Is it recoverable?
•
No
– How must the order of operations be changed to make it
recoverable?
Avoiding cascading rollbacks
•
Cascading
rollbacks
– This is when an uncommitted transaction has to
be rolled back because it read an item from a transaction that failed.
– Consider this schedule that illustrates cascading
rollbacks:
•
S: r1(x); w1(X); r2(X); r1(Y); w2(X) w1(Y); a1;
Strict Schedule
• A transaction can neither read nor write an item X
until the last transaction that wrote X has committed (or aborted)
• This simplifies recovery to restoring the before
image of X.
Summary of schedules based on recoverability
• Three types
– Recoverable
- Once a transaction T is committed, it is never be necessary to roll it back.
–
Avoidance of
cascading rollbacks
• Cascading
rollbacks occur when an uncommitted transaction has to be rolled back
because it read an item from a transaction that failed.
– Strict - A
transaction can neither read nor write an item X until the last transaction
that wrote X has committed (or aborted)
Schedules based on serializability
• Serial
schedules:
– Entire transactions are performed in serial order with
no interleaving
– Every serial schedule is assumed to be correct.
•
What is the problem with serial schedules?
• Serializability
of schedules:
– Identifying which schedules are correct when the
serial schedules are interleaved.
• Example: two airline schedules and different ways they
may be interleaved
Serializability of a schedule
• We would like to determine which of the nonserial
schedules always give a correct result
• A schedule is serializable if it is equivalent to
some serial schedule of the same n transactions
• Saying that a nonserial schedule is serializable is
equivalent to saying that it is correct.
• Serializability is hard to check.
–
Interleaving of
operations occurs in an operating system through some scheduler
–
Difficult
to determine beforehand how the operations in a schedule will be interleaved.
Which schedules are equivalent?
• There are several different ways of defining equivalence of schedules.
– Result equivalent – produce the same final state of the database
– Conflict equivalent – the order of any two conflicting operations is the same in both
schedules
– View equivalence – three conditions (to be discussed later)
Result equivalent
• Two schedules are result equivalent if they produce the same final state of the database
• This is the simplest, but least satisfactory way of
defining equivalence of schedules
• Different schedules may accidentally produce the same
final state.
Conflict equivalence
• Two schedules are conflict equivalent if the order of any two conflicting operations is the
same in both schedules
• This is the more commonly used definition of
equivalence
• A schedule is conflict serializable if it is
equivalent to some serial schedule
– We can reorder the nonconflicting
operations if necessary.
Serial and
serializable
• A serial schedule is inefficient because no
interleaving is allowed
–
This leads to low
CPU utilization, and slowing down processes in general
• A serializable schedule gives the benefits of
concurrent execution without giving up correctness
• But the interleaving of transactions is determined by
the operating system
• If the results of interleaving are tested after the OS
has interleaved, then either kept or thrown out, a lot of time may be wasted.
Uses of serializability
• Most DBMS do not test to see if a schedule is
serializable.
• Instead they design protocols (sets of rules) will
ensure serializability of all schedules.
• The most common techniques is called two-phrase
locking
– This locks data items to prevent concurrent
transactions from interfering with one another.
View equivalence
• Simplified version of view equivalence
– As long as each read operation of a transaction reads
the result of the same write transaction in both schedules, they must produce
the same results.
• It has been shown that any conflict-serializable
schedule is also view serializable, but not visa versa
• It has also been shown to be an NP-hard problem to
write an algorithm to test whether a schedule is view serializable.
Transaction support in SQL
• A single SQL statement is always considered to be atomic.
–
Either the
statement completes execution without error or it fails and leaves the database
unchanged.
• Begin and end transaction
–
SQL has no begin transaction statement.
–
A transaction is
initiated automatically with the first SQL statement in a session.
–
Transaction
initiation is done implicitly when particular SQL statements are encountered.
–
Transactions can
be terminated withn either COMMIT or ROLLBACK
–
The next SQL
statement automatically begins a new transaction
Setting
transactions
• Every transaction has certain characteristics
associated with it.
• These are specified by a SET TRANSACTION statement in
SQL
• Example:
–
SET
TRANSACTION READ ONLY
ISOLATION LEVEL SERIALIZABLE
DIAGOSTICS SIZE 6;
Characteristics
of transactions
•
Access mode
– This can be either READ ONLY or READ/WRITE
– The default is READ/WRITE
•
Isolation
level
– This can be READ UNCOMMITTED, READ COMMITTED,
REPEATABLE READ, or SERIALIZABLE
–
The default is
serializable
–
A lower level
than serializable may improve performance, but allow nonserialized
interleaving.
•
Diagnostic
area size
–
This determines
the number of exception conditions that can be described at one time in the
diagnostics area
• caused by the last executed SQL statement
• The default mode of execution is READ/WRITE, but it
can be changed to READ ONLY