Transaction
Processing
Chapter
17
Introduction
• Multiuser systems for databases
• What is meant by a DB transaction.
• Why concurrency control is needed
• Why recovery is needed
• Processing transactions
• Desirable properties of transaction
Transaction processing systems
• These are systems with
–
large databases and
–
hundreds of concurrent users
–
that are executing database transactions.
• Examples are:
–
Systems for
reservations
–
Banking
–
Stock markets
• They require:
–
High availability
–
Fast response
time for hundreds of users at once
How concurrent users are handled
• A multiuser system lets many
users access the database concurrently.
•
Multiuser systems use interleaving to allow multiprogramming (execute
multiple programs to execute at the same time)
– A computer with one CPU can actually process only one
program at a time
– Interleaving allows each user to have a small slice of
the CPU’s time
What is a transaction
• A transaction is an executing program that forms a
logical unit of database processing
–
It includes one
or more DB operations
– The operations can either be embedded in an
application program or explicit SQL queries
– Often begin transaction and end transaction
mark the boundaries of a transaction
Operations of a transaction
• The two database access operations that a transaction
can include are reading and writing
–
Executing a read_item(X) includes the following steps
• Find
the address of the disk block
that contains X
• Copy
that block into a buffer in RAM (if not already there)
• Copy
item X from the buffer to the program variable named X
–
Executing a write_item(X) includes the following steps:
• Find
the address of the disk block that contains item X
• Copy
that block into a buffer in RAM
• Copy
item X from the buffer to a program variable
• Change
the variable
• Copy
item X from the program variable to its correct location in the buffer
• Store
the updated block from the buffer back to disk
Why concurrency control is needed
• If there is no concurrency control, then a transaction
may be interrupted before it completes all its statements
– I.e. the transactions is no longer atomic
• If the transaction to which control is given is
working on the same data as the interrupted transaction, the results may become
incorrect
The lost update problem
• Suppose T1 and T2 are submitted at about the time
• Then the operations are interleaved as show in the
diagram
• The final value of item X is incorrect, because T2
reads the value of X before T1 changes it
• The updated value from T1 is lost
The temporary update problem
• Often call the Dirty Read problem
• This occurs when:
– one transaction, T1, updates a database item,
– The updated item is read by another transaction, T2,
– The first transaction fails for some reason, and so is
rolled back
– This leaves T2 with a value of X that is no longer
correct
The incorrect summary problem
• Suppose one transaction is calculating an aggregate
sum function on one field of a relation
• Another transaction is updating the same field on some
of the records in the relation
• If these two transactions are interleaved, the summing
function may calculate some values before they are updated and others after
they are updated.
Recovery
• Even with concurrency control, sometimes recovery is
necessary
• Recovery means restoring the database to a state known
to be correct
after some failure has rendered the current state incorrect
Why recovery is needed
• The DBMS is responsible for making sure that for every
transaction either:
– 1) all the operations in the
transaction are completely successfully and permanetly
recorded on disk, or
– 2) the transaction has no
effect whatsoever on the DB or any other transaction
• If a transaction starts execution, then fails
for any reason the operations of the transaction that have been carried out
must be undone
• This is called a rollback
Ways a transaction can fail
• System crash
–
Hardware or
software failure
• Transaction or system error
–
Integer overflow,
divide by zero, user interruption
• Local errors or exception conditions detected by the transaction
–
Data not found,
other exceptions
• Concurrency control enforcement
–
Deadlock, serializability
• Disk failure
• Physical problems and catastrophes
Transaction states
• A transaction is an atomic unit of work that is either
completed in its entirety or not done at all.
• The system needs to keep track of where in its
execution the transaction is for recovery purposes
• States of a transaction:
–
Begin_transaction:
–
Read or write:
–
End_transaction:
–
Commit_transaction:
–
Rollback (or
abort):
The system log
• To be able to recover from transaction failures, the
system maintains a log on disk
• The log keeps track all transactions that affect the
values of the db items
• The log is kept on disk
• The types of entries to the log are: (T refers
to a unique transaction_id
–
[start_transaction, T]
–
[[write_item, T, X, old_value, new_value]
–
[read_item, T, X]
–
[commit, T]
–
[abort, T]
Commit point of a transaction
• A transaction reaches its commit point when:
–
all its operations have been executed successfully, and
–
the effect of all the operations have been recorded in
the log.
• At this point, the transaction write
a commit record into the log.
• If a failure occurs, all the transactions in the log
that have begun, but have not committed may have to be rolled back.
Writing the log to disk
• It is not practical to write to disk every time a log
entry is made
• Commonly, one or more blocks of the log file are kept
in main memory buffers until they are filled with log entries
–
Then they are
written to disk
• At the time of a system crash, only the log entries
that are on disk are considered in recovery
• So, the log is force-written to disk before a
transaction reaches its commit point.
–
i.e., the log
must be written to disk before the transaction can commit
Desirable properties of transactions (called
the ACID properties)
• Atomic transactions should posses several properties,
enforced by the DBMS
• Atomicity
–
it is either performed iniits
entirety or not at all
• Consistency preservation
–
it transforms a consistent state of the db into another
consistent state
• Isolation
–
it should not make its update visible to other
transactions until it is committed
• Durability or permanency
–
once it commits, its updates survive, even if there is a
subsequent system crash