Data Integrity

Integrity constraints over relations

•      A database is only as good as the information stored in it

•      A DBMS must help prevent the entry of incorrect information

•      An integrity constraint is a condition specified on the schema that restricts the data that can be stored

•      Integrity constraints are specified when you create tables in using SQL’s Data Definition Language (DDL)

 

Types of Integrity constrains

•      Application based constraints
        (semantic integrity constraints)

•      Schema-based Constraints
        (can be specified in the DDL)

–   Types of general constraints

•   Domain Constraint—must be atomic values

•   Key Constraint

•   Entity Integrity Constraint

•   Referential Integrity Constraint

 

Key Constraints

•      First we need to look at types of keys

–   Superkey

•   Any set of attributes in a relation that make a tuple unique

–   Candidate key

–   Primary Key

–   Foreign Key

 

•      Formal definition of Candidate Key

–   Let R be a relation with attributes A1, A2,...,An.

–   Let K (the candidate key)  be a set of attributes of R;
                          K = {Ai, Aj,...,Ak}

•      K is a candidate key of R if and only if the following two properties are satisfied:

–    Uniqueness - At any given time, no two tuples of R have the same value for Ai,  Aj,..., and Ak.

–    Minimality - None of Ai, Aj,...Ak can be discarded from K without destroying the uniqueness property.

•      These two properties must be time independent.

 

Primary key

•      Choosing a primary key from the candidate keys is arbitrary

•      Primary keys are the only tuple-level addressing mechanism within the relation model.

•      Key integrity constraint states that
       
a tuple must have a Primary Key that
        is unique and minimal

•      Entity integrity constraint states that
         the Primary Key must not be NULL

•      All DBMS support this constraint

•      These constraints are specified on individual relations.

 

Foreign key

•      Specifies a constraint between two relations

Formal definition of a Foreign Key

•      Foreign Key is a set of attributes
        FK = {A1, A2,...An}

•      FK is a foreign key of R1 if it satisfies two rules:

–    1.  FK has the same domain as the PRIMARY KEY of R2

•    We say FK refers to the relation R2

–    2.  A value of FK must occur as a value of the PRIMARY KEY

•    or the FK may be NULL.

 

Note about foreign keys:

•      The foreign key must refer to a primary key; they must have the same domain.

•      The foreign key does not have to be a primary key.

•      The foreign key can be NULL.

•      A foreign key can refer to its own relation.   

–   R1 and R2 need not be distinct.

•      You must specify two things for a foreign key:
        1) the attribute name(s)
        2) the relation to which it refers.

•       Foreign to primary key matches represent the glue that holds the database together.

•      Look at the foreign keys in company DB.

 

The Company Database foreign keys

•      dept_location.dnumber references department

•      department.mgrssn  references employee

•      works_on.essn references employee

•      works_on.pno references project

•      project.dnum references department

•      dependent.essn references employee

•      employee.dno  references department

•      employee.superssn  references employee

 

Summary of constraints on a Relational Database

•      Domain Constraint

–   The value of each attribute A must be an atomic value from the domain for that attribute.

•      Key Constraint

–   A primary key must be unique and minimal.

•      Entity Integrity Constraint

–   No primary key value can be NULL

•      Referential Integrity Constraint

–   The value of a foreign key must match a value of the primary key in the relation to which it refers.

 

Constraints violations caused by data manipulation

•      Manipulation is divided into retrievals or updates

–    Since retrievals don’t change anything, they do not violate constraints

•      There are three basic update operations:

–    Insert a tuple or tuples

•    Can violate any of the four types of constraints

–    Delete a tuple or tuples

•    Can modify only the referential integrity constraint

–    Modify a tuple or tuples

•    Usually the only problems occur when a primary or foreign key is modified.

 

Violations?

•      Tell which constraint they violate, if any. Choices:

–     Key ConstraintA primary key must be unique and minimal.

–     Entity Integrity Constraint No primary key value can be NULL

–     Referential Integrity Constraint  Foreign key violation

 

•      Insert <'Product A', 4, 'Belaire', 2> into project

•      Update the DNO of the employee tuple with ssn = 999887777 to 7.

•      Update the ssn of the employee tuple with ssn = 999887777 to 987654321

•      Delete the works_on tuple with essn=999887777 and pno = 10.

•      Insert <’Production', 4, '943775543'', '01-oct-88'> into department

•      Delete the employee tuple with  ssn = '987654321'.

•      Insert <'453453453', 'John', 'M', '12-dec-60', 'spouse'> into dependent.

•      Delete the project tuple with pno = 10

 

What if updates violate the integrity constraints?

•      Two choices:

•      1) Reject the operation that would result in an illegal state.

•      2) Accept the operation, cascade the results to keep the DB legal.