Transaction Processing Concepts

• In a multiuser system, transactions are interleaved

• Some sort of concurrency control is needed to prevent transactions from interfering with each other

• Recovery is needed for transactions that partially execute, then fail

• The system log is used in recovery

• There are two types of schedules used in recovery

– Based on recoverability

– Based on serializability

• Transaction support in SQL

Review of recovery using logs

Undo

l The log contains a record of every write operation that changes the value of some database item,

l So it is possible to undo the effect of these write operations of a transaction

l This is done by tracing backward through the log and resetting all items changed by a write operation to their old values.

Redo

l We can also redo the effect of the write operations of a transaction

l This is done by tracing forward through the log and setting all items changed by a write operation (that did not get done permanently) to their new values.

Schedules and Recoverability

• What is a schedule?

– When tractions are executing concurrently in an interleaved way,

– then the order of execution of operations from the different transactions is know as a schedule

• The order of operations in each transaction must remain the same as if they were not interleaved

• For the purpose of recovery and concurrency control we are mainly interested in read, write, commit and abort

A example of a transaction schedule

• This is the order of execution of operations from various transactions when transactions are executing concurrently.

• shorthand - r or read_item, w for write_item, c for commit, a for abort

• write the schedule for figures the Lost Update and the Temporary Update problems used in the last lecture.

• Lost Update: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;

• Temporary Update: r1(X); w1(X); r2(X); w2(X); c2; r1(Y); a1;

Two operations conflict if:

• They belong to different transactions, and

– they access the same item X, and

– one of the operations is a write(X)

• Which operations of schedules we wrote conflict? where?

• Which operations do not conflict?

Schedules based on Recoverability

• Recoverable – Once a transaction T is committed, it is never be necessary to roll it back.

– No transaction T in schedule S commits until all transactions T' that have written an item that T reads have committed.

• Consider this schedule

– S: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;

– Is it recoverable?

• No

– How must the order of operations be changed to make it recoverable?

Avoiding cascading rollbacks

• Cascading rollbacks

– This is when an uncommitted transaction has to be rolled back because it read an item from a transaction that failed.

– Consider this schedule that illustrates cascading rollbacks:

• S: r1(x); w1(X); r2(X); r1(Y); w2(X) w1(Y); a1;

Strict Schedule

• A transaction can neither read nor write an item X until the last transaction that wrote X has committed (or aborted)

• This simplifies recovery to restoring the before image of X.

Summary of schedules based on recoverability

• Three types

– Recoverable - Once a transaction T is committed, it is never be necessary to roll it back.

– Avoidance of cascading rollbacks

• Cascading rollbacks occur when an uncommitted transaction has to be rolled back because it read an item from a transaction that failed.

– Strict - A transaction can neither read nor write an item X until the last transaction that wrote X has committed (or aborted)

Schedules based on serializability

• Serial schedules:

– Entire transactions are performed in serial order with no interleaving

– Every serial schedule is assumed to be correct.

• What is the problem with serial schedules?

• Serializability of schedules:

– Identifying which schedules are correct when the serial schedules are interleaved.

• Example: two airline schedules and different ways they may be interleaved

Serializability of a schedule

• We would like to determine which of the nonserial schedules always give a correct result

• A schedule is serializable if it is equivalent to some serial schedule of the same n transactions

• Saying that a nonserial schedule is serializable is equivalent to saying that it is correct.

• Serializability is hard to check.

– Interleaving of operations occurs in an operating system through some scheduler

– Difficult to determine beforehand how the operations in a schedule will be interleaved.

Which schedules are equivalent?

• There are several different ways of defining equivalence of schedules.

– Result equivalent – produce the same final state of the database

– Conflict equivalent – the order of any two conflicting operations is the same in both schedules

– View equivalence – three conditions (to be discussed later)

Result equivalent

• Two schedules are result equivalent if they produce the same final state of the database

• This is the simplest, but least satisfactory way of defining equivalence of schedules

• Different schedules may accidentally produce the same final state.

Conflict equivalence

• Two schedules are conflict equivalent if the order of any two conflicting operations is the same in both schedules

• This is the more commonly used definition of equivalence

• A schedule is conflict serializable if it is equivalent to some serial schedule

– We can reorder the nonconflicting operations if necessary.

Serial and serializable

• A serial schedule is inefficient because no interleaving is allowed

– This leads to low CPU utilization, and slowing down processes in general

• A serializable schedule gives the benefits of concurrent execution without giving up correctness

• But the interleaving of transactions is determined by the operating system

• If the results of interleaving are tested after the OS has interleaved, then either kept or thrown out, a lot of time may be wasted.

Uses of serializability

• Most DBMS do not test to see if a schedule is serializable.

• Instead they design protocols (sets of rules) will ensure serializability of all schedules.

• The most common techniques is called two-phrase locking

– This locks data items to prevent concurrent transactions from interfering with one another.

View equivalence

• Simplified version of view equivalence

– As long as each read operation of a transaction reads the result of the same write transaction in both schedules, they must produce the same results.

• It has been shown that any conflict-serializable schedule is also view serializable, but not visa versa

• It has also been shown to be an NP-hard problem to write an algorithm to test whether a schedule is view serializable.

Transaction support in SQL

• A single SQL statement is always considered to be atomic.

– Either the statement completes execution without error or it fails and leaves the database unchanged.

• Begin and end transaction

– SQL has no begin transaction statement.

– A transaction is initiated automatically with the first SQL statement in a session.

– Transaction initiation is done implicitly when particular SQL statements are encountered.

– Transactions can be terminated withn either COMMIT or ROLLBACK

– The next SQL statement automatically begins a new transaction

Setting transactions

• Every transaction has certain characteristics associated with it.

• These are specified by a SET TRANSACTION statement in SQL

• Example:

– SET TRANSACTION READ ONLY
ISOLATION LEVEL SERIALIZABLE
DIAGOSTICS SIZE 6;

Characteristics of transactions

• Access mode

– This can be either READ ONLY or READ/WRITE

– The default is READ/WRITE

• Isolation level

– This can be READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, or SERIALIZABLE

– The default is serializable

– A lower level than serializable may improve performance, but allow nonserialized interleaving.

• Diagnostic area size

– This determines the number of exception conditions that can be described at one time in the diagnostics area

• caused by the last executed SQL statement

• The default mode of execution is READ/WRITE, but it can be changed to READ ONLY