Ways to Control Concurrency

Chapter 18

Only sections 1, 5, 6, 7

 

Brief review of transaction processing

      Because transactions are interleaved by the OS, some sort of concurrency control is needed

      A schedule is the order of operations of concurrently executing transactions

   It takes the interleaving into account

      Some schedules are serializable, others are not

 

Overview of subject

      We will look at ways to control concurrency to ensure the isolation property of concurrently executing transactions

      Most of these techniques ensure serializability of schedules

      They use protocols that guarantee serializability

    Most DBMSs use locking of data items

    Another set of protocols use timestamps

      Another factor that affects concurrency control is the granularity of the data items

     The granularity may be anything from a single attribute or tuple to an entire table

 

Summary of topics for today

      Locking data items

      Binary locks

      Shared/exclusive (read/write) locks

      Two-phase locking

      Problems with two-phase locking

   Deadlock

   Starvation          

 

Locking data items

      A lock is a variable associated with a data item

    Exactly what is meant by a data item is determined by the granularity.

      The lock describes the status of the item: what operations can currently be applied to it

      Usually there is one lock for each data item in the DB

      Locks are used to guarantee the serializability of transaction schedules

      There are problems with using locks: deadlock and starvation

 

Types of locks

      Binary locks

   These are simple two state variables associated with a data item

   But they are so restrictive that they are not used in practice

   We will discuss them to get an idea of how locks work

      Shared/exclusive locks

   These provide a more general locking capability

   They are used in practical DB locking schemes

 

Binary locks

      A binary lock can have two states: locked (1) or unlocked (0)

     If lock(x) = 1, x cannot be accessed; if lock(x) = 0, then x can be accessed

      Two operations are used:

   lock_item(x) and unlock_item(x)

      Look at the implementation of lock_item and unlock_item

      The code for each operation must not be interleaved with other operations while it executes (it is a critical section)

 

Implementing a binary lock

      In its simplest form, a lock can be a record with three fields: <data item, 0 or 1, changeLock>

     It also needs a queue associated with it, for transactions that are waiting to access the item

      The system needs to maintain a lock table

    This would contain the names of those items that are locked (usually a hash table)

    If a data item is not in the table, it is assumed unlocked.

      The DBMS must have a lock manager subsystem to keep track of and control access to locks
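
A minimal sketch, in Python, of how lock_item and unlock_item might be implemented; the class, the names, and the use of a condition variable to stand in for the waiting queue are illustrative assumptions, not the textbook's code.

    import threading

    class BinaryLock:
        # Illustrative binary lock for one data item: value 0 = unlocked, 1 = locked.
        def __init__(self):
            self.value = 0
            self.cond = threading.Condition()    # stands in for the item's waiting queue

        def lock_item(self):
            with self.cond:                      # the whole operation is a critical section
                while self.value == 1:           # item already locked: wait in the queue
                    self.cond.wait()
                self.value = 1                   # acquire the lock

        def unlock_item(self):
            with self.cond:
                self.value = 0                   # release the lock
                self.cond.notify()               # wake one waiting transaction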

 

Rules transactions T must follow

             T must lock(x) before reading or writing x

             T must unlock (x) after all reads and writes are completed

             T must not lock(x) if it already has it locked

             T must not unlock(x) unless it already holds the lock on x

             These rules can be enforced by the lock manager

 

 

Binary lock problems

      A binary lock forces mutual exclusion on the data item

   At most one transaction can access a given item at a time

   If all a transaction wants to do is read, several transactions should be allowed to access at once

   But if a transaction wants to write, it must have exclusive access.

      A multiple-mode (shared/exclusive) lock solves this problem

 

Shared/Exclusive (read/write) locks

      There are three locking options instead of just two

     read_lock(x): other transactions are allowed to read the item

     Every transaction reading x sets a read_lock on x

     write_lock(x): a single transaction exclusively holds the lock on the item

     unlock(x): releases either type of lock

      A lock, then, can have three states: read-locked, write-locked, or unlocked

      Each record in the lock table will have four fields <data item, lock state, num reads, change lock>

      Look at the implementation of read_lock, write_lock, and unlock

      Each function must be indivisible
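
As a rough Python sketch (our own illustration, not the textbook's implementation), a shared/exclusive lock for one data item could look like the following; the state and read-count attributes mirror the lock-table fields described above, and writer starvation is ignored for simplicity.

    import threading

    class SharedExclusiveLock:
        # Illustrative read/write lock for a single data item.
        def __init__(self):
            self.state = "unlocked"              # "unlocked", "read-locked", or "write-locked"
            self.num_reads = 0                   # number of transactions holding a read lock
            self.cond = threading.Condition()    # each operation runs as a critical section

        def read_lock(self):
            with self.cond:
                while self.state == "write-locked":   # readers wait only for a writer
                    self.cond.wait()
                self.state = "read-locked"
                self.num_reads += 1

        def write_lock(self):
            with self.cond:
                while self.state != "unlocked":       # a writer needs exclusive access
                    self.cond.wait()
                self.state = "write-locked"

        def unlock(self):
            with self.cond:
                if self.state == "read-locked":
                    self.num_reads -= 1
                    if self.num_reads == 0:
                        self.state = "unlocked"
                else:                                  # releasing a write lock
                    self.state = "unlocked"
                self.cond.notify_all()                 # wake waiting transactions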

 

Rules transactions T must follow for read/write locks

             T must issue either a read or write lock before reading or writing an item

             T must issue a write lock if it is going to write

             T must unlock after operations are completed

             T must not issue locks if it already holds them

             T must not issue an unlock for a lock it does not hold

 

Conversion of locks

      Sometimes a transaction may want to convert a lock it holds to a different type of lock

      Upgrading

    To convert a read lock to a write lock, no other transaction can be holding a read lock

      Downgrading

    It is easier to convert a write lock to a read lock

      If upgrading and downgrading of locks is permitted, extra information must be kept in the lock table

     The specific transaction holding the lock must be recorded, instead of just the fact that some transaction holds it
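
A minimal sketch of the extra check that conversion needs, assuming the lock table now records which transactions hold each lock; the record fields and function names here are hypothetical.

    def can_upgrade(entry, tid):
        # Upgrade read -> write is allowed only if tid is the sole reader of the item.
        return entry["state"] == "read-locked" and entry["holders"] == {tid}

    def can_downgrade(entry, tid):
        # A write lock is held exclusively, so its holder may always downgrade it.
        return entry["state"] == "write-locked" and tid in entry["holders"]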

 

Locks and serializability

      Just the fact that locks are used does not guarantee serializability

      Look at figure 18.3

      The incorrect result occurred because items were unlocked too early

      To guarantee serializability, an additional protocol must be added

    This will concern the positioning of locks and unlocks for each transaction

    The best known is two-phase locking

 

Two-phase locking

      All locking operations in a transaction precede the first unlock operation

    All locks must be made before anything is unlocked

      A transaction can be divided into two phases

     An expanding phase, where the number of locks held increases

     A shrinking phase, where the number of locks held decreases

      If lock conversion is allowed:

     Upgrading must be done during the expanding phase

    From read lock to write lock

    Downgrading must be done during the shrinking phase

    From write lock to read lock

      It can be proved that if every transaction follows the two-phase locking protocol, the schedule is always serializable.

    Now, no test for a serializable schedule is necessary
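
A small Python sketch (illustrative only) of what the two-phase rule means for a single transaction's sequence of lock and unlock operations:

    def is_two_phase(operations):
        # operations is a list like [("lock", "X"), ("lock", "Y"), ("unlock", "X"), ...]
        # Returns True if no lock operation appears after the first unlock.
        unlocking_started = False
        for op, _item in operations:
            if op == "unlock":
                unlocking_started = True
            elif op == "lock" and unlocking_started:
                return False          # a lock in the shrinking phase violates 2PL
        return True

    # Obeys 2PL:
    print(is_two_phase([("lock", "X"), ("lock", "Y"), ("unlock", "X"), ("unlock", "Y")]))  # True
    # Violates 2PL (locks Y after unlocking X):
    print(is_two_phase([("lock", "X"), ("unlock", "X"), ("lock", "Y"), ("unlock", "Y")]))  # False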

 

Problems with two-phase locking

      It may limit the amount of concurrency that can occur in a schedule

      Not all the possible serializable schedules are permitted.

      Some time slice allocations may result in deadlock

   Look at possible time slices in Figure 18.4

      Starvation may also occur

 

Variations of two-phase locking

      Basic

    This is what we have been talking about

    It guarantees serializability, but may cause deadlock

      Conservative

    All locks are made before the transaction begins execution

    This requires predeclaring its read and write sets

    If any of the items in the sets cannot be locked, none are locked; it waits until all are available for locking

    As you might guess, this is not practical for several reasons

    This is deadlock free

      Strict 2PL

    This is the most popular variation of 2PL

    A transaction does not unlock any of its write locks until after it commits or aborts

    No other transaction can read or write an item that this transaction writes, until it is done

    This is not deadlock free

      Rigorous 2PL

     This is more restrictive than strict 2PL

     A transaction does not release any of its locks until after it commits or aborts
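
To make the differences concrete, here is a hedged Python sketch (not textbook code) that checks whether a transaction's event sequence holds its locks until commit or abort; checking both lock kinds corresponds to rigorous 2PL, checking only write locks to strict 2PL.

    def holds_until_finish(events, kinds=("read", "write")):
        # events: list like [("write_lock", "X"), ("read_lock", "Y"),
        #                    ("unlock", "Y"), ("commit", None), ("unlock", "X")]
        # kinds=("read", "write") checks rigorous 2PL; kinds=("write",) checks strict 2PL.
        lock_kind = {}          # item -> "read" or "write"
        finished = False
        for op, item in events:
            if op in ("commit", "abort"):
                finished = True
            elif op == "read_lock":
                lock_kind[item] = "read"
            elif op == "write_lock":
                lock_kind[item] = "write"
            elif op == "unlock" and not finished and lock_kind.get(item) in kinds:
                return False    # a protected lock was released before the transaction finished
        return True

    # The example sequence in the comment satisfies strict 2PL (the write lock on X
    # is held past commit) but not rigorous 2PL (the read lock on Y is released early).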

 

Summary of variations of 2PL

      Conservative 2PL

   Since it must lock all items before it starts, it is always in the shrinking phase

      Rigorous 2PL

   Since it does not unlock any of its locks until done, it is always in the expanding phase

      The only variation that guarantees freedom from deadlock is conservative.

 

 

Dealing with deadlock

      Deadlock occurs when each transaction in a set of two or more is waiting for some item that is locked by another transaction

      Deadlock prevention protocols

    But they may cause some transactions to be aborted and restarted needlessly

    This is true even though those transactions may never actually cause deadlock

      Deadlock detection and timeouts

    Waiting until deadlock occurs and detecting it, then fixing it is a more practical approach.

 

Deadlock prevention

1. Require a transaction to lock all the items it needs in advance; not very practical

2. Timestamp protocols

    If T1 starts execution before T2, we say that T1 < T2

    Wait-die: 

     An older transaction waits for a younger transaction that holds an item it needs

    A younger transaction requesting an item held by an older transaction is aborted, then restarted

    Wound-wait:

     A younger transaction waits for an older transaction that holds an item it needs

    If the older T needs an item the younger has locked, the younger is aborted, then restarted.
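
A minimal sketch of the two timestamp rules, assuming a smaller timestamp means an older transaction; the function names and return values are ours, not a standard interface.

    def wait_die(requester_ts, holder_ts):
        # Wait-die: an older requester waits; a younger requester dies (aborts, restarts later).
        if requester_ts < holder_ts:      # requester is older
            return "wait"
        return "abort"                    # requester is younger

    def wound_wait(requester_ts, holder_ts):
        # Wound-wait: an older requester wounds (aborts) the younger holder;
        # a younger requester waits for the older holder.
        if requester_ts < holder_ts:      # requester is older
            return "wound holder"
        return "wait"                     # requester is younger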

 

Deadlock prevention

3. No time stamping required

   No waiting

   If a transaction is unable to obtain a lock, it aborts and restarts later

   Causes a lot of unnecessary aborts and restarts

   Cautious waiting

   Suppose you have two transactions T1 and T2.  T2 has a lock on item X that T1 needs.  T1 waits only if T2 is not also waiting for another item

   A transaction only waits for transactions that are not waiting.

   If T2 later becomes blocked, deadlock still will not occur
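
A hedged sketch of the decision each of these schemes makes when a requested lock cannot be granted immediately (illustrative, not a real DBMS interface):

    def no_waiting(lock_granted):
        # No waiting: if the lock is not immediately available, abort and restart later.
        return "proceed" if lock_granted else "abort and restart"

    def cautious_waiting(lock_granted, holder_is_blocked):
        # Cautious waiting: wait only if the transaction holding the item is not
        # itself blocked waiting for some other item; otherwise abort and restart.
        if lock_granted:
            return "proceed"
        return "wait" if not holder_is_blocked else "abort and restart"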

 

Deadlock detection

      A more practical method of dealing with deadlock is to wait until it occurs, then do something.

      The classic way of detecting deadlock is a wait-for graph

     Every executing transaction has a node in the graph; the system keeps track of the items each transaction has locked and the items it is waiting for

     There is a directed edge from every transaction waiting for an item to the transaction that holds the lock on that item.

    If the graph has any cycles, deadlock has occurred.

      The graph must be updated every time a transaction asks for a lock, gets a lock, or releases a lock
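
A compact Python sketch (our own illustration) of detecting a cycle in a wait-for graph with depth-first search:

    def has_cycle(wait_for):
        # wait_for maps each transaction to the set of transactions it is waiting for.
        visited, on_stack = set(), set()

        def dfs(tx):
            visited.add(tx)
            on_stack.add(tx)
            for other in wait_for.get(tx, set()):
                if other in on_stack:                 # back edge: a cycle means deadlock
                    return True
                if other not in visited and dfs(other):
                    return True
            on_stack.discard(tx)
            return False

        return any(tx not in visited and dfs(tx) for tx in wait_for)

    # T1 waits for T2 and T2 waits for T1: deadlock
    print(has_cycle({"T1": {"T2"}, "T2": {"T1"}}))    # True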

 

Deadlock

      Maintaining the wait-for graph and repeatedly checking it for cycles adds overhead

      A decision must be made about when the system should check for deadlock

    This could be based on:

    The number of transactions currently running, or

    The amount of time some transactions have been waiting

      Victim selection

     Usually, the victim should NOT be a transaction that has been running for a long time and has made a lot of updates

      Timeouts instead of deadlock detection

     If a transaction waits longer than some specified period, deadlock is assumed, and it is aborted

    This method is at least simple, with low overhead
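
The timeout scheme can be sketched in a few lines; the 5-second threshold below is an arbitrary illustrative value.

    import time

    def timed_out(wait_started_at, timeout_seconds=5.0):
        # If a transaction has waited longer than the threshold, deadlock is
        # assumed and the transaction is aborted.
        return time.monotonic() - wait_started_at > timeout_seconds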

 

Starvation

      Starvation is when a transaction continually gets aborted or left waiting, while others are executing normally.

   This may occur if the waiting or aborting scheme is unfair

      Ways to assure fairness

   Use a first-come first-served queue for waits

   Give higher priority to transactions that have been waiting the longest or have been aborted

 

Granularity of data items

    Fine granularity refers to small data item size like an attribute

    Coarse granularity refers to large data item size like a disk block or a whole file

      Tradeoffs when deciding on granularity

    The larger the data item size, the lower the degree of concurrency

    The smaller the data item size, the more items there are in the DB

   Every item has a lock associated with it;

   there will be more lock and unlock operations;

   the lock table will be larger;

   if timestamping is used, there will be more timestamps

 

What is the best granularity?

     It depends on the types of transactions involved.

  If most transactions access a small number of records, granularity should be one record

  If most transactions access many records in the same file, use block or file granularity