Overview of the database course

I.     Three schema architecture

       A.    Purpose of using three schema

               1.    Data independence

               2.    Multiple user views

               3.    A catalog to store metadata, available to both the user and DBMS

       B.  External schema or view

               1.    This is what you have been working on in your projects

       C.  Logical schema (conceptual level)

               1.    Models we looked at

                      a.     Entity-relational model

                      b.    Relational model

                             (1)   Structure

                                    (a)   Normal forms

                                    (b)   SQL-DDL

                             (2)   Manipulation

                                     (a)   Relational algebra operations

                                            i)     Unary operations

                                                   a)    select

                                                   b)    project

                                            ii)     Set operations

                                                   a)    union

                                                   b)    intersection

                                                   c)    set difference

                                                   d)    Cartesian product

                                            iii)    Binary operations

                                                   a)    join

                                                   b)    division

                                            iv)    Added operations (not part of RA)

                                                   a)    rename

                                                   b)    aggregate functions

                                                   c)    grouping

                                    (b)   SQL-DML

                             (3)   Integrity (enforced by constraints)

                                    (a)   General constraints for all DB

                                            i)     Key constraints

                                            ii)     Entity integrity constraints

                                            iii)    Referential integrity constraints

                                    (b)   Application based constraints

                                            i)     Some can be specified in SQL

                                                   a)    data types

                                                   b)    check constraints

                                                   c)    not null

                                           ii)     Others are specified by an application

                      c.     Object-Relational model

                             (1)   First Normal form is not enforced

               2.    Database design (two tools to help)

                      a.     E-R diagrams (top down design)

                      b.    Normal forms and functional dependencies (bottom up design)

                             (1)   1NF requires are attributes to be atomic

                             (2)   2NF does not allow partial  dependencies

                             (3)   3NF does not allow transitive dependencies

                             (4)   BCNF - every determinate is a candidate key

 

       D.  Internal schema (physical level)

               1.    Physical file organization on disk (primary organization)

                      a.     Know advantages and disadvantages of :

                             (1)   unordered records (heap file)

                             (2)   sorted files

                             (3)   Hashed files

              2.    Index structures

                      a.     Types of indexes

                             (1)   Primary index (nondense)

                                    (a)   on the ordered key field

                                    (b)   cluster index on the ordered nonkey field

                             (2)   Secondary index on an unordered field (dense)

                                    (a)   unordered key field

                                    (b)   unordered nonkey field

                             (3)   Indexes with a tree data structure

                                    (a)   multilevel indexes

                                    (b)   B+ trees

                                            i)     the number of pointers is determined by the block size and the size of a key

                      b.    How much do indexes help?

                             (1)   if you know the block size, the number and size of records to be stored, you can calculate the

                                    (a)   blocking factor (records/block)

                                    (b)   the number of blocks needed to store the file

                                    (c)   the number of disk accesses with out an index needed to find a record

                             (2)   if an index is used, and you know the size of an index record, you can find

                                    (a)   the index blocking factor

                                    (b)   the number of blocks to store the index

                                    (c)   the number of disk accesses to find a record

               3.    Processing and optimizing queries

                      a.     First an initial query tree is built

                              (1)   Cartesian cross of the relations in the from clause

                             (2)   add the join and selection conditions

                             (3)   last add the projections

                      b.    To "optimize" the tree

                             (1)   move the conditions and projections as far down the tree as possible

                      c.     Optimize this query: Retrieve the last name of employees who work more than 10 hours/week on ‘ProductX'

               4.    Transaction processing

                      a.     Concurrency control

                             (1)   Lack of concurrency control can result in:

                                    (a)   the lost update problem

                                    (b)   the temporary update problem

                                    (c)   the incorrect summary problem

                             (2)   Schedules based on Recoverability

                                    (a)   Recoverable-- once a transaction is committed, it never has to be rolled back

                                    (b)   Avoid cascading rollback– No uncommitted transaction has to be rolled back because it read from a transaction that failed

                                    (c)   Strict schedule– a transaction can neither read nor write an item X until the last transaction the wrote X has committed

                             (3)   Schedules based on serializability

                                    (a)   Most commonly used is conflict equivalence

                                    (b)   A schedule is conflict serializable if it is equivalent to some serial schedule

                             (4)   Two phase locking insures serializable schedules

                                    (a)   A read/write lock is associated with every data item in the DB

                                    (b)   Granularity of data items determines the size of the data item

                                    (c)   All locking operations must come before any unlocking operations

                             (5)   Two phase locking can lead to deadlock and starvation

                                    (a)   deadlock can either be prevented or detected

                                    (b)   starvation can be prevented by using a fair system for aborts or waiting

                      b.    Recovery techniques

                             (1)   Recovery concepts

                                    (a)   A write ahead log is used to record all changes to the DB before those changes are actually made

                                    (b)   The DBMS cache is the buffers used to hold the blocks transferred from disk

                                            i)     each buffer has flags associated with it

                                                   a)    the dirty bit indicates if it has been modified

                                                   b)    the pin-unpin bit indicates if it can be written to disk

                                    (c)   Writing buffers to disk terminology

                                            i)     steal/no-steal

                                                   a)    buffers that are no-steal have the pin-unpin bit set

                                            ii)     force/no-force

                                                   a)    a force is when a transaction is written to disk as soon as it commits

                                            iii)    The most common protocol used by DBMS is steal/no-force

                                    (d)   Checkpoints involve

                                            i)     "force writing the contents of the buffers to disk

                                            ii)     writing a checkpoint on the log

                             (2)   Main techniques used for recovery

                                    (a)   Deferred update

                                            i)     the DB is not updated until after a transaction reaches a commit

                                            ii)     transactions never need to be undone after failure

                                            iii)    impractical for large DBs with many transactions

                                    (b)   Immediate update

                                            i)     the DB may be updated before the transaction reaches a commit point

                                            ii)     two main categories

                                                   a)    uses steal/force–never a need to redo

                                                   b)    used steal/no-force

 


II.  Readings in text

       A.    Part 1 Introduction and Conceptual Modeling

               Chapter 1 Databases and database Users   

               Chapter 2 System concepts and Architecture

               Chapter 3 Modeling using the E-R diagram

 

       B.    Part 2 The Relational Model

               Chapter 5 Concepts and Constraints

               Chapter 6 Relational Algebra (omit 6.7-8)

               Section 7.1 Mapping ER diagrams to tables

               Chapter 8 SQL schema definition, constraints, queries

 

       C.    Part 3 Design theory

               Chapter 10 Functional dependencies and normalization

 

       D.   Part 4 Data storage, indexing, query processing

               Chapter 13 Disk storage, file structures, & hashing

               Chapter 14 Indexing structures for files

               Section 15.1, 15.3.2, 15.7 Query Optimization & algorithms for join

               Chapter 16  Practical design & tuning

 

       E.    Part 5 Transaction Processing concepts

               Chapter 18 Concurrency control techniques

               Chapter 17 Intro to transaction processing

               Chapter 18 Concurrency control techniques

               Chapter 19 Database recovery techniques