Project possibility

•      This project requires some knowledge of the stock market (or a eagerness to learn)

•       

•      The Relational Model: Operations

The Relational Algebra

Sections 6.1-6.5

 

Importance of relational algebra

•      We study the relational algebra for several reasons

–   It provides a formal foundation for relational model operations, mainly queries

–   It is used as a basis for implementing and optimizing queries in DBMS

–   Some of its concepts are incorporated into SQL

•      Basically, in relational algebra we will specify HOW to find the result of a query; in SQL, we just specify WHAT we want, not how to do it

 

Operations to manipulate the data

•    Queries in algebra are composed using a collection of operators.

•    A fundamental property is that

–  every operator accepts relation(s) as input, and

–  returns a relation as the result.

 

 

Relational algebra operations

•      Unary operations

–   Select – limits the number of tuples

–   Project –limits the number of attributes

•      Set Operations

–   Union

–   Intersection

–   Set difference – subtracts one relation from another

–   Cartesian product-- cross product of two relations

•      Complex binary operations

–    join

–    Division

•      Additional operations (not part of the original algebra)

–   Rename – renames a relation

–   Aggregate functions

–   Grouping

 

Select

•      Selects tuples that satisfy a given predicate.

•      It operates on one relation.

•      For select we use the Greek letter sigma F

•      Form: F condition (RELATION NAME)

•      The condition is a boolean expression

 

Examples of Select:

F branch_name="Perryridge" (LOAN)

F amount > 1200 (LOAN)

•      The Booleans AND/OR can be used to connect conditions

•      The selection condition is applied independently to each tuple in the relation

–    If the condition evaluated to TRUE, then the tuple is selected

•       The degree of the resulting relation is the same as the original one (same # of attributes)

 

Project

•      Limits   the attributes to those specified

•      A unary operator

•      The Greek letter pi B is the selection operator

•      Form: B attribute list (RELATION NAME)

 

Example of Project:

B loan_number, amount (LOAN)

•      The degree is equal to the number of attributes named in the attribute list

–    They will be listed in the same order as in the list

•      If the attribute list contains only nonkey attributes, there may be duplicates

–    project removes any duplicates, so the result will still be a relation

•      The number of tuples is always <= the number in the original relation

 

Select and Project

•      If you think of a relation as a table,

–   the select operations selects some of the rows and discards other;

–   the project operations keeps some of the columns and throws the others away

–   This lets the user look at only the information they are interested in. 

•      Select and project work on only  one table

 

Combining Relational Operators

•      Query: Find those customers who live in Harrison.

B c_name (F c_city = "Harrison" (CUSTOMER))

•      You need to visualize the intermediate table. 

•      If you project before you select, you have lost all the attributes but the one named.

•      You could rename the intermediate table
City_h
฿ (F c_city = "Harrison" (CUSTOMER)

B c_name(City_h)

 

Set Theoretic Operations

•      These are the standard set operations

–   Union, intersection and difference

•      A relation should be thought of as a set of tuples

•      These operate on two sets, i.e. two relations, so are used to merge the relations in different ways

•      All these operations must be union-compatible

 

Union-compatible

•      Two requirements to union compatibility

–   The relations must be of the same degree (same # of attributes)

–   The domains of corresponding attributes must be the same

•      Note that the relations in a union-compatible operation usually are intermediate, the result of other operations

•      Duplicate tuples are eliminated

 

Union   c

•      Puts two compatible sets together

•      A binary operator

•      Form: (relation1) c (relation2)

•      Your text uses the convention that the resulting relation has the same name as the first relation

 

Query: Find the names of all bank customers who have either an account or loan or both.

•      We need information from two relations and we need to make them union compatible

–  Names of all customers with a loan:   Bc_name (BORROWER)

–  Names of all customers with an account: B c_name (DEPOSITOR)

The c of these two sets

  B c_name (BORROWER) c Bc_name(DEPOSITOR)

•      Or, you could rename the relations then do the union

r1 := B c_name (BORROWER)

r2 := B c_name (DEPOSITOR)

r1 c r2

 

Set Difference

•      Finds tuples that are in one relation but not in another

•      Binary operator

•      Form: relation1 difference relation2

•      The relations must be union compatible

–   (attributes same number & type)

 

Query: Find all customers of the bank who have an account but not a loan.

B   c_name (DEPOSITOR) difference  B c_name (BORROWER)

Or name the intermediate result, the apply the operation

    r1 := B c_name (BORROWER)

r2 := B c_name (DEPOSITOR)

r1 difference r2

 

Queries

•      1) Find the loans from Perryridge where the amount of the loan > 1200

       F branch_name="Perryridge“ AND amount > 1200 (LOAN)

•      2) Find those customers who live in Harrison.

       B c_name (F c_city = "Harrison" (CUSTOMER))

•      3) Are there any branches that have accounts but not loans?  Which one(s)?  Loans but not account?

•       4) Find all customers of the bank who have an either  an account or a loan or both.

 

Answers to queries

3)  B branch_name (ACCOUNT) difference
                               
B branch_name (LOAN)

4) r1 ฿ B c_name (BORROWER)

r2 ฿ B c_name (DEPOSITOR)

r1 c r2

 

Renaming

•      You can rename attributes as well as relations

•      Renaming relations

–   New_name ฿ (relations algebra expression)

•      Renaming attributes

•      temp ฿ Fbalance > 500(Loan)

•      High (account, bal)฿ Baccount-number, balance(temp)

 

Cartesian-Product  X

•      Cross product of two relations (binary operator)

•      The relations do not have to be union-compatible

•      In the new relation

–    the number of attributes is the sum of # of attributes in the two relations.

–    the number of tuples is the product of the number of tuples in the two relations.

•      Each tuple in the first is combined with every tuple in the second

 

Example:  borrower X loan

•      Attributes of resultant relation

–   5 attributes

–   since the same attribute name appears in both they must be qualified

–   borrower.loan_number and loan.loan_number

•      Tuples of resultant relation

–   56 tuples

–   a tuple is constructed out of each possible pairs of tuples; one from each relation

 

Cartesian Product uses

•      The X operation gives a lot of useless tuples, but it can be used to put together relations that are not union compatible

•      Almost always select is then used to eliminate the unwanted tuples

•      Since select is used almost universally with X, a special operation was created to specify this sequence as a single operation

•      This is the join operation

 

Non Fundamental Operators

•      The previous fundamental operators of the relational algebra are sufficient to express any relations-algebra query. 

•      But if we define a few more operations, some of the queries become simpler. 

•      So the following operators do not give any more power to the algebra, just make life easier.

 

 Join 

•      Forms a Cartesian product of two relations,

–  then performs a selection forcing =, < or > on those attributes that appear in both relations, and then removes all duplicates

•      Form:  R1 condition  R2

•       Example:
     BORROWER
B.loan-num=L.loan-num LOAN

 

Resultant relation of

•      5 attributes, two from Borrower, three from Loan

–   Note that the number of attributes is the same as X

•      Only those tuples where the condition was true are included

 

Equijoin

•      If the join condition is equality (the most common), the join is called an equijoin

•      In an equijoin, one or more pairs of attributes have identical values

•      A natural join is an equijoin where one of the identical columns is eliminated.

 

Natural Join  *

•      The standard definition of a natural join requires the two join attributes have the same name   Borrower * Loan

–   If they don’t have the same names one can be renamed

•      A nonstandard definition allows you to specify the join attributes from each relation

–   The names do not have to be the same
Employee * (DNO, DNUMBER) Department

 

 

Query: Find the names and amount of all customers who have loans at the bank.

B c_name, amount (BORROWER * LOAN)

 

Query: Find the names of all branches with customers living in Harrison who have an account.

•      Find all customers living in Harrison
R1
฿ BC_name( F c_city="Harrison" (CUSTOMER))

•      Find customers in Harrison with an account

•      R2 ฿ ( R1 * DEPOSITOR)

•      Now find which branch those accounts are in

•      .B branch_name (R2 * ACCOUNT ))

•      You can do all this in one statement, just make sure you get the order correct

 

Set Intersection   1

•      Finds tuples that are in both relation1 and relation2

•      Form: relation11 relation2

•      The relations must be union compatible

 

Query: Find all customers who have both a loan and an account.

•      .B c_name  (BORROWER) 1 Bc_name(DEPOSITOR)

 

•      Query:  Are there any customers who have neither loans nor accounts?  Give their names.

R1 ฿ B c_name (BORROWER) c B c_name (DEPOSITOR)

B c_name (CUSTOMER) – R1

 

Division

•      Suited to queries that include the phrase "for all"

•      Query:Find all customers who have an account at all the branches in Brooklyn

•      First, get two intermediate relations:

1)Find all branches in Brooklyn

    BinB฿B branch_name (Fbranch_city = "Brooklyn" (BRANCH))

2) Find all customer's branches

•      .CB ฿ B c_name, branch_name (DEPOSITOR*ACCOUNT)

•      3) divide     CB  ๗ BinB

 

The division operation

•      It is applied to two relations ( R and Sub) where one is a subset of the other (Sub a subset of R)

•      R(X, Y) is divided by Sub(X);

•      The result relation will have attribute(s) Y

•      For a tuple to appear in the result of the division the values must appear in R in combination with every tuple in Sub

 

Retrieve the names of employees who work on all the projects that ‘John Smith’ works on

•      We have to have two relations here

–    Employees and the projects they work on

–    The projects that John Smith works on

•      Employees and the projects they work on

–    From the WORKS_ON table, project out ESSN and PNO

•      The projects that John Smith works on

–    Join the EMPLOYEE and WORKS_ON tables, select name= JohnSmith, then project out PNO

•      Divide the first by the second, then join the results back with the EMPLOYEE table to get the names

 

 

Additional Relational Operations

•      Relational algebra cannot perform some operations that are needed by a DBMS.

•      Extra operations have been defined that extend the power of the relational algebra

–   Aggregate functions

•Sum

•Average

•Maximum

•Minimum

•Count

–   Grouping

 

 

Aggregate Functions

•      A mathematical functions that takes a collection of values and returns a single value as a result

•      Form function attribute (RELATION)

•      Query:  Find the sum of salaries of all the part time employees;

            sum salary (PT-WORKS)

•      Query:   find the average salary.

            avg salary (PT-WORKS)

•      the result of both these queries will be a relation with one attribute and one tuple

 

 

Aggregate Functions (cont.)

•      Naming the attributes in the resultant relation

–   either rename, or

–   concatenation of the function name with the attribute name

•      if we want to eliminate duplicates the word distinct can be added

 

•      Query: Find the number of branches appearing in the pt-works relation
 
count distinct branch_name (PT-WORKS)

 

Grouping Functions

•      Another type of request involves.

–   1) grouping the tuples by an attribute and then

–   2) applying an aggregate function independently to each group of tuples.

•      Form:
<grouping attribute> group  <function-attribute-pair list>  Relation

•      The resulting relation has the grouping attribute plus one attribute for each element in the function list

 

•      Query: Find the total salary of all part-time employees at each branch individually.

        b_name group sum salary (pt-works)

•      this means group by b_name and sum salary by those groups

•      Query: For each branch find the sum of the salaries and the maximum salary for part-time employees.

•      Result (b_name, sum, max) ฿
   
( b_name group sum salary, max salary (PT-WORKS)

Queries

1.What is the total amount of outstanding loans at each branch?

2. Give the branch names of those branches that have assets over 1,000,000

3. List the customer names who have accounts at Brighton

4. In what branches does Johnson have an account?     

 

Working with the Company database

•      Query:  Retrieve the social security numbers of all employees who either work in department 5 or directly supervise an employee who works in department 5

•      .dept5_emps <-- F dno=5 (employee)

•      result1   <-- B ssn (dept5_emps)

•      result2 (ssn) <-- B superssn(dep5_emps)

•      Result   <-- result1 union result2

More queries

Q7. List the names of managers who have at least one dependent

–    strategy: intersect manages SSN with the SSN of employees with dependents

Q6. Retrieve the names of employees who have no dependents

     strategy:  subtract the SSN of employees with dependents from all employee’s SSN

Q5. List the names of all employees with two or more dependents

     strategy: use an aggregate function to count the dependent of each employee

Q4. Make a list of project numbers for projects that involve an employee whose last name is ‘Smith’, either as a worker or as a manager of the department that controls the project

strategy:  union the projects that Smith works on with those projects for which he is a manager of the controlling department

Q3. Find the names of employees who work on all the projects controlled by department number 5

      strategy:  divide the ESSN and PNO by all the projects controlled by department 5