Overview of the database
course
I. Three schema architecture
A. Purpose of using three schema
1. Data independence
2. Multiple user views
3. A catalog to store metadata, available to
both the user and DBMS
B. External
schema or view
1. This is what you have been working on in
your projects
C. Logical
schema (conceptual level)
1. Models we looked at
a. Entity-relational model
b. Relational
model
(1) Structure
(a) Normal forms
(b) SQL-DDL
(2) Manipulation
(a) Relational algebra operations
i) Unary
operations
a) select
b) project
ii) Set operations
a) union
b) intersection
c) set difference
d) Cartesian product
iii) Binary operations
a) join
b) division
iv) Added operations
(not part of RA)
a) rename
b) aggregate functions
c) grouping
(b) SQL-DML
(3) Integrity
(enforced by constraints)
(a) General constraints for all DB
i) Key constraints
ii) Entity integrity constraints
iii) Referential integrity constraints
(b) Application based constraints
i) Some can be
specified in SQL
a) data types
b) check constraints
c) not null
ii) Others are specified by an application
c. Object-Relational model
(1) First Normal form is not enforced
2. Database design (two tools to help)
a. E-R diagrams (top down design)
b. Normal forms and functional dependencies
(bottom up design)
(1) 1NF requires are attributes to be atomic
(2) 2NF does not allow partial dependencies
(3) 3NF does not allow transitive dependencies
(4) BCNF - every determinate is a candidate key
D. Internal
schema (physical level)
1. Physical
file organization on disk (primary organization)
a. Know advantages and disadvantages of :
(1) unordered records
(heap file)
(2) sorted files
(3) Hashed files
2. Index structures
a. Types of indexes
(1) Primary index (nondense)
(a) on the ordered key
field
(b) cluster index on the
ordered nonkey field
(2) Secondary index on an unordered field (dense)
(a) unordered key field
(b) unordered nonkey field
(3) Indexes with a tree data structure
(a) multilevel indexes
(b) B+ trees
i) the number of pointers is determined by the block size and
the size of a key
b. How much do indexes help?
(1) if you know the
block size, the number and size of records to be stored, you can calculate the
(a) blocking factor
(records/block)
(b) the number of blocks
needed to store the file
(c) the number of disk
accesses with out an index needed to find a record
(2) if an index is used,
and you know the size of an index record, you can find
(a) the index blocking
factor
(b) the number of blocks
to store the index
(c) the number of disk
accesses to find a record
3. Processing
and optimizing queries
a. First an initial query tree is built
(1) Cartesian cross of the relations in the from clause
(2) add the join and
selection conditions
(3) last add the
projections
b. To "optimize" the tree
(1) move the conditions
and projections as far down the tree as possible
c. Optimize this query: Retrieve the last name
of employees who work more than 10 hours/week on ‘ProductX'
4. Transaction
processing
a. Concurrency
control
(1) Lack of concurrency control can result in:
(a) the lost update
problem
(b) the temporary update
problem
(c) the incorrect
summary problem
(2) Schedules based on Recoverability
(a) Recoverable-- once a transaction is
committed, it never has to be rolled back
(b) Avoid cascading rollback– No uncommitted
transaction has to be rolled back because it read from a transaction that
failed
(c) Strict schedule– a transaction can neither
read nor write an item X until the last transaction the wrote X has committed
(3) Schedules based on serializability
(a) Most commonly used is conflict equivalence
(b) A schedule is conflict serializable if it is
equivalent to some serial schedule
(4) Two phase locking insures serializable
schedules
(a) A read/write lock is associated with every
data item in the DB
(b) Granularity of data items determines the size
of the data item
(c) All locking operations must come before any
unlocking operations
(5) Two phase locking can lead to deadlock and
starvation
(a) deadlock can either
be prevented or detected
(b) starvation can be
prevented by using a fair system for aborts or waiting
b. Recovery
techniques
(1) Recovery concepts
(a) A write ahead log is
used to record all changes to the DB before those changes are actually made
(b) The DBMS cache is the buffers used to hold
the blocks transferred from disk
i) each buffer has flags associated with it
a) the dirty bit
indicates if it has been modified
b) the pin-unpin bit
indicates if it can be written to disk
(c) Writing buffers to disk terminology
i) steal/no-steal
a) buffers that are
no-steal have the pin-unpin bit set
ii) force/no-force
a) a force is when a
transaction is written to disk as soon as it commits
iii) The most common protocol used by DBMS is
steal/no-force
(d) Checkpoints involve
i) "force writing the contents of the buffers to disk
ii) writing a
checkpoint on the log
(2) Main techniques used for recovery
(a) Deferred update
i) the DB is not updated until after a transaction reaches a
commit
ii) transactions never
need to be undone after failure
iii) impractical for
large DBs with many transactions
(b) Immediate update
i) the DB may be updated before the transaction reaches a
commit point
ii) two main
categories
a) uses
steal/force–never a need to redo
b) used steal/no-force
II. Readings in text
A. Part
1 Introduction and Conceptual Modeling
Chapter
1 Databases and database Users
Chapter
2 System concepts and Architecture
Chapter 3 Modeling using the E-R diagram
B. Part
2 The Relational Model
Chapter
5 Concepts and Constraints
Chapter
6 Relational Algebra (omit 6.7-8)
Section
7.1 Mapping ER diagrams to tables
Chapter 8 SQL schema definition, constraints, queries
C. Part
3 Design theory
Chapter 10 Functional dependencies and normalization
D. Part
4 Data storage, indexing, query processing
Chapter 13 Disk storage, file structures, & hashing
Chapter
14 Indexing structures for files
Section
15.1, 15.3.2, 15.7 Query Optimization & algorithms for join
Chapter 16 Practical design & tuning
E. Part
5 Transaction Processing concepts
Chapter
18 Concurrency control techniques
Chapter
17 Intro to transaction processing
Chapter
18 Concurrency control techniques
Chapter
19 Database recovery techniques