Ways to Control Concurrency
Chapter 18
Only sections 1, 5, 6, 7
Brief review of transaction processing
• Because the operations of transactions are interleaved by the
system scheduler, some sort of concurrency control is needed
• A schedule is the order of operations of concurrently
executing transactions
– it takes the interleaving into account
• Some schedules are serializable, others are not
Overview of subject
• We will look at ways to control concurrency to ensure
the isolation property of concurrently executing transactions
• Most of these techniques ensure serializability
of schedules
• They use protocols that guarantee serializability
– Most DBMSs use locking of data items
– Another set of protocols uses timestamps
• Another factor that affects concurrency control is the
granularity of the data items
– The granularity may be anything from a single attribute, to a tuple, to an entire table
Summary of topics for today
• Locking data items
• Binary locks
• Shared/exclusive (read/write) locks
• Two phase locking
• Problems with two phase locking
– Deadlock
– Starvation
Locking data items
• A lock is a variable associated with a data item
– Exactly what is meant by a data item is determined by the granularity
• The lock describes the status of the item: what
operations can currently be applied to it
• Usually there is one lock for each data item in the DB
• Locks are used to guarantee the serializability
of transaction schedules
• There are problems with using locks: deadlock and
starvation
Types of locks
• Binary locks
– These are simple two-state variables associated with a
data item
– But they are so restrictive that they are not used in practice
– We will discuss them to get an idea of how locks work
• Shared/exclusive (read/write) locks
– These provide a more general locking capability
– They are used in practical DB locking schemes
Binary locks
• A binary lock can have two states: locked or unlocked
(one or zero)
– If lock(x) = 1, x cannot be accessed;
if lock(x) = 0, then x can be accessed
• Two operations are used:
– lock_item(x) and unlock_item(x)
• Look at the implementation of lock_item and unlock_item
• The code for each operation must not be
interleaved while it is executing (it is a critical section)
Implementing a binary lock
• In its simplest form, a lock can be a record with three
fields: <data item, 0 or 1, changeLock>
– It also needs an associated queue for transactions that are waiting to access the item
• The system needs to maintain a lock table
– This would contain the names of those items that are locked (usually a hash table)
– If a data item is not in the table, it is assumed unlocked
• The DBMS must have a lock manager subsystem to
keep track of and control access to locks
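The record-plus-queue scheme above can be sketched in Python. This is a minimal sketch under stated assumptions: the class and field names are illustrative, and a Condition variable stands in for the per-item waiting queue that the lock manager would maintain.

```python
# Sketch of a binary lock manager. An item absent from the lock
# table is assumed unlocked, as described in the slide.
import threading

class BinaryLockManager:
    def __init__(self):
        self._mutex = threading.Lock()          # protects the lock table
        self._table = {}                        # item -> Condition (waiting queue)
        self._locked = set()                    # items with lock(x) = 1

    def lock_item(self, x):
        """Block until x is unlocked, then lock it (a critical section)."""
        with self._mutex:
            cond = self._table.setdefault(x, threading.Condition(self._mutex))
            while x in self._locked:            # lock(x) = 1: wait on x's queue
                cond.wait()
            self._locked.add(x)                 # set lock(x) = 1

    def unlock_item(self, x):
        with self._mutex:
            self._locked.discard(x)             # set lock(x) = 0
            if x in self._table:
                self._table[x].notify()         # wake one waiting transaction
```

Because both operations run under `self._mutex`, their bodies cannot be interleaved, which models the critical-section requirement.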
Rules transactions T must follow
• T must lock(x) before reading or writing x
• T must unlock(x) after all reads and writes are completed
• T must not lock(x) if it already has x locked
• T must not unlock(x) unless it already holds the lock on x
• These rules can be enforced by the lock manager
Binary lock problems
• A binary lock forces mutual exclusion on the
data item
– At most one transaction can access a given item at a
time
– If all a transaction wants to do is read, several
transactions should be allowed to access at once
– But if a transaction wants to write, it must have
exclusive access.
• A multiple-mode lock
solves this problem
Shared/Exclusive (read/write) locks
• There are three locking operations instead of just two
– read_lock(x): other transactions are still allowed to read the item
– Every transaction reading x sets a read_lock on x
– write_lock(x): a single transaction exclusively holds the lock on the item
– unlock(x): unlocks either type of lock
• A lock, then, can have three states
• Each record in the lock table will have four fields:
<data item, lock state, num reads, change lock>
• Look at the implementation of the lock and unlock operations
• Each function must be indivisible
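The four-field lock-table record above can be sketched as follows. This is an illustrative sketch, not a production implementation: the table maps each item to a `[lock state, num reads]` pair, an absent item is unlocked, and one Condition variable models the waiting queue.

```python
# Sketch of a shared/exclusive (read/write) lock manager.
import threading

class RWLockManager:
    def __init__(self):
        self._mutex = threading.Lock()
        self._cond = threading.Condition(self._mutex)
        # item -> [state, num_reads]; state is "read" or "write".
        # An item absent from the table is unlocked.
        self._table = {}

    def read_lock(self, x):
        with self._cond:
            while x in self._table and self._table[x][0] == "write":
                self._cond.wait()               # a writer holds x
            state = self._table.setdefault(x, ["read", 0])
            state[1] += 1                       # one more reader holds x

    def write_lock(self, x):
        with self._cond:
            while x in self._table:             # any holder blocks a writer
                self._cond.wait()
            self._table[x] = ["write", 0]       # exclusive access

    def unlock(self, x):
        with self._cond:
            state = self._table.get(x)
            if state and state[0] == "read" and state[1] > 1:
                state[1] -= 1                   # other readers remain
            else:
                self._table.pop(x, None)        # last holder: item unlocked
            self._cond.notify_all()             # let waiters re-check
```

Each method holds the table mutex for its whole body, so the operations are indivisible as the slide requires; multiple readers can hold `x` at once, while a writer must be alone.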
Rules transactions T must follow for read/write locks
• T must issue either a read or a write lock before reading or writing an item
• T must issue a write lock if it is going to write
• T must unlock after its operations are completed
• T must not issue a lock it already holds
• T must not issue an unlock for a lock it does not hold
Conversion of locks
• Sometimes a transaction may want to convert a lock it
holds to a different type of lock
• Upgrading
– To convert a read lock to a write lock, no other transaction can be holding a read lock on the item
• Downgrading
– It is easier to convert a write lock to a read lock
• If upgrading and downgrading of locks is permitted,
extra information must be kept in the lock table
– The specific transaction holding the lock must be recorded, instead of just the fact that some transaction holds the lock
Locks and serializability
• Just the fact that locks are used does not guarantee
serializability
• Look at Figure 18.3
• The incorrect result occurred because items
were unlocked too early
• To guarantee serializability, an additional protocol
must be followed
– It concerns the positioning of the lock and unlock operations in each transaction
– The best known is two-phase locking
Two-phase locking
• All locking operations in a transaction precede the
first unlock operation
– All locks must be acquired before anything is unlocked
• A transaction can be divided into two phases
– An expanding phase, where the number of locks
increases
– A shrinking phase, where the number of locks
decreases
• If lock conversion is allowed:
– Upgrading must be done during the expanding phase
• From read lock to write lock
– Downgrading must be done during the shrinking phase
• From write lock to read lock
• It can be proved that if every transaction follows
two-phase locking, the schedule is always serializable
– No test for a serializable schedule is then necessary
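The two-phase rule above is easy to check mechanically. A small sketch (the function name and operation encoding are illustrative): given a transaction's sequence of lock/unlock operations, verify that no lock is issued after the first unlock.

```python
# Sketch: check that a transaction's operation sequence obeys 2PL,
# i.e. every lock operation precedes the first unlock operation.
def follows_2pl(ops):
    """ops is a list of ("lock", item) / ("unlock", item) pairs."""
    shrinking = False
    for action, _item in ops:
        if action == "unlock":
            shrinking = True        # shrinking phase has begun
        elif shrinking:             # a lock issued after an unlock
            return False
    return True
```

For example, `lock(X), lock(Y), unlock(X), unlock(Y)` follows 2PL, while `lock(X), unlock(X), lock(Y), unlock(Y)` does not, because the lock on Y comes after the shrinking phase has started.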
Problems with two-phase locking
• It may limit the amount of concurrency that can occur
in a schedule
• Not all the possible serializable schedules are
permitted.
• Some time slice allocations may result in deadlock
– Look at possible time slices in Figure 18.4
• Starvation may also occur
Variations of two-phase locking
• Basic
– This is what we have been talking about
– It guarantees serializability, but may cause deadlock
• Conservative
– All locks are acquired before the transaction begins
execution
– This requires predeclaring its read and write sets
– If any of the items in the sets cannot be locked, none
are locked; the transaction waits until all are available for locking
– As you might guess, this is not practical for several reasons
– This variation is deadlock free
• Strict 2PL
– This is the most popular variation of 2PL
– A transaction does not release any of its write locks
until after it commits or aborts
– No other transaction can read or write an item that this
transaction writes, until it is done
– This is not deadlock free
• Rigorous 2PL
– This is more restrictive than strict 2PL
• A transaction does not release any of its locks
until after it commits or aborts
Summary of variations of 2PL
• Conservative 2PL
– Since it must lock all items before it starts, it is
always in the shrinking phase
• Rigorous 2PL
– Since it does not unlock any of its locks until done,
it is always in the expanding phase
• The only variation that guarantees freedom from deadlock
is conservative 2PL
Dealing with deadlock
• Deadlock occurs when each transaction in a set of two
or more is waiting for some item that is locked by another transaction in the set
• Deadlock prevention protocols
– But they may cause some transactions to be aborted and
restarted needlessly
• This is true even though those transactions may never actually cause a deadlock
• Deadlock detection and timeouts
– Waiting until deadlock occurs, detecting it, and then
fixing it is a more practical approach
Deadlock prevention
1. Require a transaction to lock all the
items it needs in advance; not very practical
2. Timestamp protocols
– If T1 starts execution before T2, we say that T1 < T2
(T1 is older)
– Wait-die:
• An older transaction waits on a younger transaction
that holds an item it needs
• A younger transaction requesting an item held by an
older transaction is aborted, then restarted
– Wound-wait:
• A younger transaction waits on an older transaction
• If the older transaction needs an item the younger
has locked, the younger is aborted, then restarted
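The two timestamp protocols above can be sketched as decision functions. A smaller timestamp means an older transaction (T1 < T2 means T1 started first); the function names and return strings are illustrative.

```python
# Sketch of the wait-die and wound-wait decisions. Each function
# returns the action taken when a transaction with timestamp
# ts_requester requests an item held by one with ts_holder.

def wait_die(ts_requester, ts_holder):
    # Older waits on younger; a younger requester dies (aborts,
    # to be restarted later with the same timestamp).
    return "wait" if ts_requester < ts_holder else "abort"

def wound_wait(ts_requester, ts_holder):
    # Older wounds (aborts) the younger holder; a younger
    # requester waits on the older holder.
    return "wound" if ts_requester < ts_holder else "wait"
```

Note the symmetry: in both protocols it is always the younger transaction that gets aborted, which is what makes each scheme deadlock free.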
Deadlock prevention
3. Protocols requiring no timestamps
– No waiting
• If a transaction is unable to obtain a lock, it aborts
and restarts later
• This causes a lot of unnecessary aborts and restarts
– Cautious waiting
• Suppose you have two transactions T1 and T2, and T2 has a lock on item X that T1 needs. T1 waits only if T2 is not itself waiting for
another item
– A transaction only waits for transactions that are not
waiting
• If T2 later becomes blocked, deadlock still will not
occur
Deadlock detection
• A more practical method of dealing with deadlock is to
wait until it occurs, then do something.
• The classic way of detecting deadlock is a wait-for graph
– Every executing transaction has a node in the graph,
along with all the items it has locked and all the items it is waiting for
– There is a directed edge from every transaction waiting
for an item to the transaction holding the lock on that item
– If the graph has any cycles, deadlock has occurred
• The graph must be updated every time a transaction
asks for a lock, gets a lock, or releases a lock
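Cycle detection on the wait-for graph can be sketched with a depth-first search. The representation is an assumption for illustration: a dict mapping each transaction name to the list of transactions it waits for.

```python
# Sketch: detect deadlock in a wait-for graph via DFS three-coloring.
# wait_for maps a transaction to the transactions it is waiting for.
def has_deadlock(wait_for):
    color = {}                      # unvisited (absent), "gray", "black"

    def visit(t):
        color[t] = "gray"           # on the current DFS path
        for u in wait_for.get(t, ()):
            c = color.get(u)
            if c == "gray":         # back edge: a cycle, hence deadlock
                return True
            if c is None and visit(u):
                return True
        color[t] = "black"          # fully explored, no cycle through t
        return False

    return any(color.get(t) is None and visit(t) for t in wait_for)
```

For example, if T1 waits for T2 and T2 waits for T1, the graph has a cycle and the transactions are deadlocked; if T2 waits for nothing, there is no deadlock.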
Deadlock
• Checking for cycles in a directed graph has a cost,
since the graph must be kept current
• The system must decide when it should check for
deadlock
– This could be based on:
• The number of transactions currently running, or
• The amount of time some transactions have been waiting
• Victim selection
– Usually, the victim should NOT be a transaction that
has been running a long time and has made a lot of updates
• Timeouts instead of deadlock detection
– If a transaction waits longer than some specified
period, deadlock is assumed, and it is aborted
– This method is at least simple, with low overhead
Starvation
• Starvation occurs when a transaction is continually
aborted or left waiting, while others execute normally
– This may occur if the waiting or aborting scheme is
unfair
• Ways to ensure fairness
– Use a first-come, first-served queue for waiting
transactions
– Raise the priority of transactions that have waited the
longest or have been aborted
Granularity of data items
– Fine granularity refers to a small data item size,
such as an attribute
– Coarse granularity refers to a large data item size,
such as a disk block or a whole file
• Tradeoffs when deciding on granularity
– The larger the data item size, the lower the degree of
concurrency
– The smaller the data item size, the more items there
are in the DB
• Every item has a lock associated with it
• There will be more lock and unlock operations
• The lock table will be larger
• If timestamping is used, more timestamps are needed
What is the best granularity?
• Depends on the type of transactions involved.
– If most transactions access a small number of records,
granularity should be one record
– If most transactions access many records in the same
file, use block or file granularity