Transaction Processing

Chapter 17

Introduction

      Multiuser systems for databases

      What is meant by a DB transaction.

      Why concurrency control is needed

      Why recovery is needed

      Processing transactions

      Desirable properties of transaction

 

Transaction processing systems

      These are systems with

    large databases and

    hundreds of concurrent users

    that are executing database transactions.

      Examples are:

    Systems for reservations

    Banking

    Stock markets

      They require:

    High availability

    Fast response time for hundreds of users at once

 

How concurrent users are handled

      A multiuser system lets many users access the database concurrently.

      Multiuser systems use interleaving to allow multiprogramming  (execute multiple programs to execute at the same time)

   A computer with one CPU can actually process only one program at a time

   Interleaving allows each user to have a small slice of the CPU’s time

 

What is a transaction

      A transaction is an executing program that forms a logical unit of database processing

   It includes one or more DB operations

   The operations can either be embedded in an application program or explicit SQL queries

   Often begin transaction and end transaction mark the boundaries of a transaction

 

Operations of a transaction

      The two database access operations that a transaction can include are reading and writing

    Executing a read_item(X) includes the following steps

    Find the address of the disk block  that contains X

    Copy that block into a buffer in RAM (if not already there)

    Copy item X from the buffer to the program variable named X

    Executing a write_item(X) includes the following steps:

    Find the address of the disk block that contains item X

    Copy that block into a buffer in RAM

    Copy item X from the buffer to a program variable

    Change the variable

    Copy item X from the program variable to its correct location in the buffer

    Store the updated block from the buffer back to disk

 

Why concurrency control is needed

      If there is no concurrency control, then a transaction may be interrupted before it completes all its statements

   I.e. the transactions is no longer atomic

      If the transaction to which control is given is working on the same data as the interrupted transaction, the results may become incorrect

 

The lost update problem

      Suppose T1 and T2 are submitted at about the time

      Then the operations are interleaved as show in the diagram

      The final value of item X is incorrect, because T2 reads the value of X before T1 changes it

      The updated value from T1 is lost

 

The temporary update problem

      Often call the Dirty Read problem

      This occurs when:

   one transaction, T1, updates a database item,

   The updated item is read by another transaction, T2,

   The first transaction fails for some reason, and so is rolled back

   This leaves T2 with a value of X that is no longer correct

 

The incorrect summary problem

      Suppose one transaction is calculating an aggregate sum function on one field of a relation

      Another transaction is updating the same field on some of the records in the relation

      If these two transactions are interleaved, the summing function may calculate some values before they are updated and others after they are updated.

 

Recovery

      Even with concurrency control, sometimes recovery is necessary

      Recovery means restoring the database to a state known to be correct
after some failure has rendered the current state incorrect

 

Why recovery is needed

      The DBMS is responsible for making sure that for every transaction either:

   1) all the operations in the transaction are completely successfully and permanetly recorded on disk, or

   2) the transaction has no effect whatsoever on the DB or any other transaction

      If a transaction starts execution, then fails for any reason the operations of the transaction that have been carried out must be undone

      This is called a rollback

 

Ways a transaction can fail

      System crash

    Hardware or software failure

      Transaction or system error

    Integer overflow, divide by zero, user interruption

      Local errors or exception conditions  detected by the transaction

    Data not found, other exceptions

      Concurrency control enforcement

    Deadlock, serializability

      Disk failure

      Physical problems and catastrophes

 

Transaction states

      A transaction is an atomic unit of work that is either completed in its entirety or not done at all.

      The system needs to keep track of where in its execution the transaction is for recovery purposes

      States of a transaction:

    Begin_transaction:

    Read or write:

    End_transaction:

    Commit_transaction:

    Rollback (or abort):

 

The system log

      To be able to recover from transaction failures, the system maintains a log on disk

      The log keeps track all transactions that affect the values of the db items

      The log is kept on disk

      The types of entries to the log are: (T refers to a unique transaction_id

    [start_transaction, T]

    [[write_item, T, X, old_value, new_value]

    [read_item, T, X]

    [commit, T]

    [abort, T]

 

Commit point of a transaction

      A transaction reaches its commit point when:

    all its operations have been executed successfully, and

    the effect of all the operations have been recorded in the log.

      At this point, the transaction write a commit record into the log.

      If a failure occurs, all the transactions in the log that have begun, but have not committed may have to be rolled back.

 

Writing the log to disk

      It is not practical to write to disk every time a log entry is made

      Commonly, one or more blocks of the log file are kept in main memory buffers until they are filled with log entries

    Then they are written to disk

      At the time of a system crash, only the log entries that are on disk are considered in recovery

      So, the log is force-written to disk before a transaction reaches its commit point.

    i.e., the log must be written to disk before the transaction can commit

 

Desirable properties of transactions (called the ACID properties)

      Atomic transactions should posses several properties, enforced by the DBMS

      Atomicity

    it is either performed iniits entirety or not at all

      Consistency preservation

    it transforms a consistent state of the db into another consistent state

      Isolation

    it should not make its update visible to other transactions until it is committed

      Durability or permanency

    once it commits, its updates survive, even if there is a subsequent system crash