Overview of the database course

Overview of the database course

I. Three schema architecture

A. Purpose of using three schema

1. Data independence

2. Multiple user views

3. A catalog to store metadata, available to both the user and DBMS

B. External schema or view

1. This is what you have been working on in your projects

C. Logical schema (conceptual level)

1. Models we looked at

a. Entity-relational model

b. Relational model

(1) Structure

(a) Normal forms

(b) SQL-DDL

(2) Manipulation

(a) Relational algebra operations

i) Unary operations

a) select

b) project

ii) Set operations

a) union

b) intersection

c) set difference

d) Cartesian product

iii) Binary operations

a) join

b) division

iv) Added operations (not part of RA)

a) rename

b) aggregate functions

c) grouping

(b) SQL-DML

(3) Integrity (enforced by constraints)

(a) General constraints for all DB

i) Key constraints

ii) Entity integrity constraints

iii) Referential integrity constraints

(b) Application based constraints

i) Some can be specified in SQL

a) data types

b) check constraints

c) not null

ii) Others are specified by an application

c. Object-Relational model

(1) First Normal form is not enforced

2. Database design (two tools to help)

a. E-R diagrams (top down design)

b. Normal forms and functional dependencies (bottom up design)

(1) 1NF requires are attributes to be atomic

(2) 2NF does not allow partial dependencies

(3) 3NF does not allow transitive dependencies

(4) BCNF - every determinate is a candidate key

D. Internal schema (physical level)

1. Physical file organization on disk (primary organization)

a. Know advantages and disadvantages of :

(1) unordered records (heap file)

(2) sorted files

(3) Hashed files

2. Index structures

a. Types of indexes

(1) Primary index (nondense)

(a) on the ordered key field

(b) cluster index on the ordered nonkey field

(2) Secondary index on an unordered field (dense)

(a) unordered key field

(b) unordered nonkey field

(3) Indexes with a tree data structure

(a) multilevel indexes

(b) B+ trees

i) the number of pointers is determined by the block size and the size of a key

b. How much do indexes help?

(1) if you know the block size, the number and size of records to be stored, you can calculate the

(a) blocking factor (records/block)

(b) the number of blocks needed to store the file

(2) if an index is used, and you know the size of an index record, you can find

(a) the index blocking factor

(b) the number of blocks to store the index

3. Processing and optimizing queries

a. First an initial query tree is built

(1) Cartesian cross of the relations in the from clause

(2) add the join and selection conditions

(3) last add the projections

b. To "optimize" the tree

(1) move the conditions and projections as far down the tree as possible

c. Optimize this query: Retrieve the last name of employees who work more than 10 hours/week on ‘ProductX'

4. Transaction processing

a. Concurrency control

(1) Lack of concurrency control can result in:

(a) the lost update problem

(b) the temporary update problem

(2) Schedules based on Recoverability

(a) Recoverable-- once a transaction is committed, it never has to be rolled back

(b) Avoid cascading rollback– No uncommitted transaction has to be rolled back because it read from a transaction that failed

(c) Strict schedule– a transaction can neither read nor write an item X until the last transaction the wrote X has committed

(3) Schedules based on serializability

(a) Most commonly used is conflict equivalence

(b) A schedule is conflict serializable if it is equivalent to some serial schedule

(4) Two phase locking insures serializable schedules

(a) A read/write lock is associated with every data item in the DB

(b) Granularity of data items determines the size of the data item

(5) Two phase locking can lead to deadlock and starvation

(a) deadlock can either be prevented or detected

(b) starvation can be prevented by using a fair system for aborts or waiting

b. Recovery techniques

(1) Recovery concepts

(a) A write ahead log is used to record all changes to the DB before those changes are actually made

(b) The DBMS cache is the buffers used to hold the blocks transferred from disk

i) each buffer has flags associated with it

a) the dirty bit indicates if it has been modified

b) the pin-unpin bit indicates if it can be written to disk

i) steal/no-steal

a) buffers that are no-steal have the pin-unpin bit set

ii) force/no-force

a) a force is when a transaction is written to disk as soon as it commits

iii) The most common protocol used by DBMS is steal/no-force

(d) Checkpoints involve

i) "force writing the contents of the buffers to disk

ii) writing a checkpoint on the log

(2) Main techniques used for recovery

(a) Deferred update

i) the DB is not updated until after a transaction reaches a commit

ii) transactions never need to be undone after failure

iii) impractical for large DBs with many transactions

(b) Immediate update

i) the DB may be updated before the transaction reaches a commit point

ii) two main categories

a) uses steal/force–never a need to redo

b) used steal/no-force

II. Readings in text

A. Part 1 Introduction and Conceptual Modeling

Chapter 1 Databases and database Users

Chapter 2 System concepts and Architecture

Chapter 3 Modeling using the E-R diagram

B. Part 2 The Relational Model

Chapter 5 Concepts and Constraints

Chapter 6 Relational Algebra (omit 6.7-8)

Section 7.1 Mapping ER diagrams to tables

Chapter 8 SQL schema definition, constraints, queries

C. Part 3 Design theory

Chapter 10 Functional dependencies and normalization

D. Part 4 Data storage, indexing, query processing

Chapter 13 Disk storage, file structures, & hashing

Chapter 14 Indexing structures for files

Section 15.1, 15.3.2, 15.7 Query Optimization & algorithms for join

Chapter 16 Practical design & tuning

E. Part 5 Transaction Processing concepts

Chapter 18 Concurrency control techniques

Chapter 17 Intro to transaction processing

Chapter 18 Concurrency control techniques

Chapter 19 Database recovery techniques