Database Systems

Class information on the web

•      Syllabus includes:

–   Text

–   Grading criteria

–   Final time

–   Office hours

•      Lecture schedule includes

–   Reading assignments

–   Schedule of lecture topics and exams

•   Most of the lectures will be posted before class

–   Link to assignments

 

Topics we will cover

•      Relational DB model

–    relational algebra

–    SQL

•    Data definition language DDL

•    Data manipulation language DML

•      DB design

–    E-R diagrams

–    Functional dependencies

–    Normalization

•      Physical storage

–    Index files and B+ trees

•      Lab work

–    Access—learn on your own

–    PostGRES-- this will be the focus of our lab work

•      System Implementation Concerns

–    query processing & optimization

–    transaction processing

–    concurrency control

–    recovery techniques

–    security and authorization

 

Managing Data

•      The goal of this course is to present an in-depth introduction to database management systems.

•      The emphasis will be on

–   How to design a database, and

–   How to use a DBMS effectively

•      Many decisions about how to use a DBMS for a given application depend on the capabilities of the DBMS

–   So, it is also necessary to understand how  DBMS works.

 

We will address these questions

•      DB design & application development

–    What factors must be considered in storing the data?

–    How can an application be built to access the data?

•      Data analysis

–    How can we write queries to access the data?

•      Concurrency & Robustness

–    How can many users use the DB at the same time?

–    How is the data protected in the event of system failures?

•      Efficiency

–    What can be done to make the system fast?

 

A historical perspective

•      The earliest databases were sheets of paper kept in metal file cabinets

•      In the early 60’s, Charles Bachman at General Electric developed the network data model

•      In late 60’s IBM developed the hierarchical data model

–    Both the network and hierarchical models required pointers to access the data

•      In 1970 Edgar Codd at IBM developed the relational data model.

–    This model works with sets, and is based on set theory

–    When first proposed, it ran so slow that some thought it would never become viable

 

Importance of Databases

•      We are in the Age of Information

•      Computers are information-processing machines

•      Databases are at the center of computing

•      Today’s DBMS are the result of decades of research and development

–   Microsoft sells more than a million licenses for Access each month

–   Oracle has become the second largest software company in the world by specializing in database systems

 

What is a Database?

•      It is a

–   large,

–    persistent,

–   integrated collection of

–   dynamic data that

–   provides some operations to

–   manipulate and access this data.

•      A DB can be of any size & varying complexity

•      A database may be generated and maintained manually, or it may be computerized.

What is a Database Management System?

•      It is a collection of programs that enable users to:

–    Create or define the structure for a database

–    Maintain the database by entering and editing data,

–    Retrieve  information from a DB.

•      It is a combination of software and data:

–    A physical DB: a collection of files;

–    The schema or meta-data: the information on how the data is organized;

–    The DB engine: software that supports access to and modification of the data; and

–    The data definition (DDL) and manipulation languages (DML) : programming languages that support schema definition and database access

 

Example of a Relational Database

•      There are five files on the handout

–   each table is kept in a separate file

•      Each file has a definite structure

–   the structure of each file is defined when the db is created

–   Every record (row) in a file has the same structure

–   Each column is the same data type

–   This structure is defined when the DB is created

•      Records in various files are related

 

An example  relational database

•      Entity - an object about which we want to record data

–    what are the entities for this DB?

•      Can you find the names of the courses Smith is taking?

•      What is the relationship between the tables?

•      Each file has a primary key and zero or more foreign keys

–    A primary key is a unique identifier for a table

–    Foreign keys establishes the relationship between tables

•    It is an attribute that refers to a primary key in another table

 

Queries and updates

•      Queries are questions asked of the database

–   A query needs only to read the data in the DB

–   It does not need to write anything

•      Updates change the data in the database

–   A user must have write privileges to update

 

Database vs. traditional file processing

•      Traditional file processing

–   Each user defines and implements the files needed for a specific application

•   The data file is intimately bound to the program that uses the data

•   The programs you have written that do file I/O – the data was arranged in a certain way in the file, and the program has to know exactly how it was arranged

•   If you wanted to change the data arrangement or add a field, you would have to change the program than manipulated the data

–   The same data may be stored in more than one place by different users

 

The database approach

•      Stores both data and meta-data

–    The meta-data describes the structure of the tables that hold the data; how many columns, what type of data should be in each column, the constraints on the data, etc.

–    The meta-data is stored in a system catalog, which is part of the DBMS

•      Data independence

–    The structure of the data file is independent from the software that accesses them

–    This means the structure of the data file can be changed without changing the programs that access the data

•      Data Abstraction

–    A data model hides storage details that may not be of interest to the end user

–    The most efficient way to store the data usually is not the same as the way the user wants to look at it

•      Multiple views—a view may put tables together, show only parts of tables or composite tables, or shows virtual data

–    Example: a user may want a transcript view

 

Intended uses of a DBMS

•      Controlling redundancy

–    Redundancy leads to several problems: Duplication of effort, wasted storage space, data may become inconsistent

•      Sharing of data among multiple users

–    Each user may want a different view of the data

–    There must be concurrency control

•      Restricting Access

–    Different users should have different privileges

•      Enforcing Integrity constraints

•      Providing backup and recovery

 

People in a Database system

•      End users—these may be casual or sophisticated

–    A bank teller, airline agent, or CEO

•      Database designers—these are software professions who design the structure and create the DB

•      Applications developers—these design and develop applications to work with the DB

–    Include user interfaces, canned transactions and data analysis

•      Database administrators—they are responsible for:

–     controlling access to the database system,

–    maintaining data accuracy and integrity and

–    monitoring and improving performance

–    backup and recovery