Database Concepts and Architecture
Data models
The purpose of a model is to provide a level of
abstraction (i.e. hide details)
The model is a set of concepts that can be used
to describe the structure of a database.
The structure includes:
data types and length
relationships
Constraints
A data model often includes:
A set of
operations for retrievals & updates
A set of valid
user-defined operations that are allowed
Different models have different objectives.
Data model categories
A High level
or conceptual model is close to the way users perceive the data.
This type of model is often used to transmit ideas from
end user to the DB designer.
The E-R diagram is of this type
A representational or implementation model hide
some details of how the data is stored, but can be implemented on a computer
The relational model is of this type
A low level or physical model is the model of
how the data is physically stored on the computer.
This would include stored in a hash table or B+ tree
The main data model we will look at is the relational
model
the data is perceived by the user as tables
the operations on the tables generates new tables from
base tables
But we will use the E-R (Entity-Relationship)
model to help with design
We will also look at indexes and B+ trees, low level
models
The Object-oriented model is becoming more and
more important, as is the object oriented/relational model
The Hierarchical and Network model are
older models (covered in the appendix of your text)
Database Schemas and Instance
Understanding database systems requires making a clear
distinction between the databases structure (the schema) and its contents
(the state or instances.)
Schema - the
description of the structure database
the meta data (specified during db design)
does not change often
Instance -
the actual data
changes all the time
the db at an instant in time
The DBMS architecture
The architecture helps achieve the important
characteristics of a DBMS
Three of these important characteristics are:
Data independence
Multiple user views
A catalog to store the metadata that is available to
both the user and the DBMS
The goal of the architecture used is to separate the
user applications and the physical database
Three-schema
Architecture
Reminder: A schema is a description of the structure
of the database
Internal level has an internal schema
Conceptual level has a conceptual schema
External level usually has several external schemas
Note that each of these schemas are only descriptions
of data; the
data really only exists at the physical level
External Level
The end user is presented with special views that are
tailored to their specific needs
Some of these
views may be forms to fill out, others for interactive retrieval of
information, etc.
Many different views may be available that are completely different that the
way the base tables are arranged
Conceptual or
logical level
Description of the structure of the entire DB
Hides the details of physical storage
Concentrates on describing entities,
data types, relationships, user operations, and constraints.
In the relational model, this is the base tables
Changes at this level can be made without having any
effect on the external level
Internal Level
This is a description of the physical storage
structure of the database
Operations performed here are translated into
modifications of the contents and structure of the files
Advantages of the three tiered architecture
This division into levels allows both developers and
users to work on their own levels
They do not need
to know the details of the other levels
AND they do not
have to know anything about changes in the other levels
Most DBMS do not separate the three levels completely.
Mappings
Note this:
The only data that is saved in the database is at the
internal level
Both the data and metadata are saved by the DBMS at the
physical level
Each user group refers only to its own external schema
so the DBMS must transform a request on an external
schema
into a request against the conceptual schema,
then into a request on the internal schema for processing.
What if the request is for a
retrieval?
A lot of overhead!
The process of transforming requests and results
between levels are called mappings.
These mappings may be time consuming, so some DBMS do
not separate the three levels completely.
Data independence
One reason for the three level
architecture is to give data independence.
One way to define data independence: the capacity to
change the schema at one level of a DB system without having to change the
schema at the next higher level
If the schema at one level is changed,
and the mapping to the next higher level are changed,
the schema at the next level can remain unchanged.
Two types of data independence
Logical data independence
Possible to
change the conceptual schema without changing external schema or application
programs
Columns could be
added to tables without changing the existing external schema or application
programs
Changes here
necessitate changes to the physical level
Physical data independence
Possible to
change internal schema without changing the conceptual schema
Example of physical data independence:
DBA decides it is more efficient to store the data in
a B+ tree instead of a heap, so changes the internal schema
then he changes the mapping to the conceptual schema
conceptual schema remains unchanged.
Database languages
Data Definition
Language (DDL) is used to specify the
internal & conceptual schemas.
The DBMS will identify descriptions of the schema and
store the schema descriptions in the DBMS catalog.
After the structure is defined and made machine
readable, data can be added to the database.
Data Manipulation Language (DML) is used to manipulate the DB
Manipulation includes:
retrieval (querying the DB)
inserting records
deleting records
changing data already entered
SQL is both DDL and DML for the different levels
It is based on Sequel ( and often called Sequel)
SQL is a comprehensive integrated language that
includes constructs for:
conceptual schema definition,
view definition
data manipulation
SQL can either be embedded in a host language
or stand alone, called a query language.