Transaction Processing Concepts

      In a multiuser system, transactions are interleaved

      Some sort of concurrency control is needed to prevent transactions from interfering with each other

      Recovery is needed for transactions that partially execute, then fail

      The system log is used in recovery

      There are two types of schedules used in recovery

    Based on recoverability

    Based on serializability

      Transaction support in SQL

 

Review of recovery using logs

Undo

l        The  log contains a record of every write operation that changes the value of some database item,

l        So it is possible to undo the effect of these write operations of a transaction

l        This is done by tracing backward through the log and resetting all items changed by a write operation to their old values.

Redo

l        We can also redo the effect of the write operations of a transaction

l        This is done  by tracing forward through the log and setting all items changed by a write operation (that did not get done permanently) to their new values.  

 

Schedules and Recoverability

      What is a schedule?

    When tractions are executing concurrently in an interleaved way,

    then the order of execution of operations from the different transactions is know as a schedule

      The order of operations in each transaction must remain the same as if they were not interleaved

      For the purpose of recovery and concurrency control we are mainly interested in read, write, commit and abort

 

A example of a transaction schedule

      This is the order of execution of operations from various transactions when transactions are executing concurrently.

      shorthand - r or read_item, w for write_item, c for commit, a for abort

      write the schedule for figures the Lost Update and the Temporary Update problems used in the last lecture.

      Lost Update: r1(X); r2(X); w1(X); r1(Y); w2(X); c2; w1(Y); c1;

      Temporary Update: r1(X); w1(X); r2(X); w2(X); c2; r1(Y); a1;

 

Two operations conflict if:

      They belong to different transactions, and

   they access the same item X, and

   one of the operations is a write(X)

      Which operations of schedules we wrote conflict? where?

      Which operations do not conflict?

 

Schedules based on Recoverability

      Recoverable – Once a transaction T is committed, it is never be necessary to roll it back.

    No transaction T in schedule S commits until all transactions T' that have written an item that T reads have committed.

      Consider this schedule

   S: r1(X); w1(X); r2(X); r1(Y); w2(X); c2; a1;

   Is it recoverable?

   No

   How must the order of operations be changed to make it recoverable?

 

Avoiding cascading rollbacks

      Cascading rollbacks

   This is when an uncommitted transaction has to be rolled back because it read an item from a transaction that failed.

   Consider this schedule that illustrates cascading rollbacks:

   S: r1(x); w1(X); r2(X); r1(Y); w2(X) w1(Y); a1;

Strict Schedule

      A transaction can neither read nor write an item X until the last transaction that wrote X has committed (or aborted)

 

      This simplifies recovery to restoring the before image of X.

 

 Summary of schedules based on recoverability

      Three types

   Recoverable - Once a transaction T is committed, it is never be necessary to roll it back.

   Avoidance of cascading rollbacks

   Cascading rollbacks occur when an uncommitted transaction has to be rolled back because it read an item from a transaction that failed.

   Strict - A transaction can neither read nor write an item X until the last transaction that wrote X has committed (or aborted)

 

 

Schedules based on serializability

      Serial schedules: 

   Entire transactions are performed in serial order with no interleaving

   Every serial schedule is assumed to be correct.

   What is the problem with serial schedules?    

      Serializability of schedules:

   Identifying which schedules are correct when the serial schedules are interleaved.

      Example: two airline schedules and different ways they may be interleaved

 

Serializability of a schedule

      We would like to determine which of the nonserial schedules always give a correct result

      A schedule is serializable if it is equivalent to some serial schedule of the same n transactions

      Saying that a nonserial schedule is serializable is equivalent to saying that it is correct.

      Serializability is hard to check.

    Interleaving of operations occurs in an operating system through some scheduler

    Difficult to determine beforehand how the operations in a schedule will be interleaved.

 

Which schedules are equivalent?

      There are several different ways  of defining equivalence of schedules.

   Result equivalent – produce the same final state of the database

   Conflict equivalent – the order of any two conflicting operations is the same in both schedules

   View equivalence – three conditions (to be discussed later)

 

Result equivalent

     Two schedules are result equivalent  if they produce the same final state of the database

      This is the simplest, but least satisfactory way of defining equivalence of schedules

      Different schedules may accidentally produce the same final state.

 

Conflict equivalence

      Two schedules are conflict equivalent if the order of any two conflicting operations is the same in both schedules

      This is the more commonly used definition of equivalence

      A schedule is conflict serializable if it is equivalent to some serial schedule

   We can reorder the nonconflicting operations if necessary.

 

 

Serial and serializable

      A serial schedule is inefficient because no interleaving is allowed

    This leads to low CPU utilization, and slowing down processes in general

      A serializable schedule gives the benefits of concurrent execution without giving up correctness

      But the interleaving of transactions is determined by the operating system

      If the results of interleaving are tested after the OS has interleaved, then either kept or thrown out, a lot of time may be wasted.

 

Uses of serializability

      Most DBMS do not test to see if a schedule is serializable.

      Instead they design protocols (sets of rules) will ensure serializability of all schedules.

      The most common techniques is called two-phrase locking

   This locks data items to prevent concurrent transactions from interfering with one another.

 

 

View equivalence

      Simplified version of view equivalence

   As long as each read operation of a transaction reads the result of the same write transaction in both schedules, they must produce the same results.

      It has been shown that any conflict-serializable schedule is also view serializable, but not visa versa

      It has also been shown to be an NP-hard problem to write an algorithm to test whether a schedule is view serializable.

 

Transaction support in SQL

      A single SQL statement is always considered to  be atomic. 

    Either the statement completes execution without error or it fails and leaves the database unchanged. 

      Begin and end transaction

    SQL has no begin transaction statement.

    A transaction is initiated automatically with the first SQL statement in a session.

    Transaction initiation is done implicitly when particular SQL statements are   encountered.

    Transactions can be terminated withn either COMMIT or ROLLBACK

    The next SQL statement automatically begins a new transaction

 

Setting transactions

      Every transaction has certain characteristics associated with it.

      These are specified by a SET TRANSACTION statement in SQL

      Example:

   SET TRANSACTION READ ONLY
ISOLATION LEVEL SERIALIZABLE
DIAGOSTICS SIZE 6;

 

Characteristics of transactions

      Access mode

    This can be either READ ONLY or READ/WRITE

    The default is READ/WRITE

      Isolation level

    This can be READ UNCOMMITTED, READ COMMITTED, REPEATABLE READ, or SERIALIZABLE

    The default is serializable

    A lower level than serializable may improve performance, but allow nonserialized interleaving.

      Diagnostic area size

    This determines the number of exception conditions that can be described at one time  in the diagnostics area

    caused by the last executed SQL statement

 

      The default mode of execution is READ/WRITE, but it can be changed to READ ONLY