Project possibility
This
project requires some knowledge of the stock market (or a eagerness to learn)
The
Relational Model: Operations
The Relational
Algebra
Sections 6.1-6.5
Importance of relational algebra
We study the relational algebra for several reasons
It provides a
formal foundation for relational model operations, mainly queries
It is used as a basis for implementing and optimizing
queries in DBMS
Some of its concepts are incorporated into SQL
Basically, in relational algebra we will specify HOW to find the result of a query; in SQL, we just specify
WHAT we want, not how to do it
Operations to manipulate the data
Queries in algebra are composed using a collection of operators.
A fundamental property is that
every operator accepts relation(s) as input,
and
returns a relation as the result.
Relational algebra operations
Unary
operations
Select limits
the number of tuples
Project limits
the number of attributes
Set Operations
Union
Intersection
Set difference subtracts one relation from another
Cartesian product-- cross product of two relations
Complex binary
operations
join
Division
Additional operations (not part of the original algebra)
Rename renames
a relation
Aggregate
functions
Grouping
Select
Selects tuples that satisfy a given predicate.
It operates on one relation.
For select we use
the Greek letter sigma F
Form: F
condition (RELATION NAME)
The condition is a boolean expression
Examples of Select:
F branch_name="Perryridge"
(LOAN)
F amount
> 1200 (LOAN)
The Booleans AND/OR can be used to connect conditions
The selection condition is applied independently to
each tuple in the relation
If the condition
evaluated to TRUE, then the tuple is selected
The degree
of the resulting relation is the same as the original one (same # of
attributes)
Project
Limits the
attributes to those specified
A unary operator
The Greek letter pi B is the
selection operator
Form: B attribute
list (RELATION NAME)
Example of Project:
B loan_number,
amount (LOAN)
The degree is equal to the number of attributes
named in the attribute list
They will be
listed in the same order as in the list
If the attribute list contains only nonkey attributes,
there may be duplicates
project removes
any duplicates, so the result will still be a relation
The number of tuples is always <= the number in the
original relation
Select and Project
If you think of a relation as a table,
the select operations selects some of the rows
and discards other;
the project operations keeps some of the
columns and throws the others away
This lets the user look at only the information they
are interested in.
Select and project work on only one table
Combining Relational Operators
Query: Find
those customers who live in Harrison.
B c_name
(F c_city = "Harrison" (CUSTOMER))
You need to visualize the intermediate table.
If you project before you select, you have lost all
the attributes but the one named.
You could rename
the intermediate table
City_h ฿ (F c_city = "Harrison" (CUSTOMER)
B c_name(City_h)
Set Theoretic Operations
These are the standard set operations
Union,
intersection and difference
A relation should be thought of as a set of
tuples
These operate on two sets, i.e. two relations, so are
used to merge the relations in different ways
All these
operations must be union-compatible
Union-compatible
Two requirements to union compatibility
The relations must be of the same degree (same # of
attributes)
The domains of corresponding attributes must be the
same
Note that the relations in a union-compatible
operation usually are intermediate, the result of other operations
Duplicate tuples are eliminated
Union
c
Puts two compatible sets together
A binary operator
Form: (relation1)
c (relation2)
Your text uses the convention that the resulting
relation has the same name as the first relation
Query: Find the names of all bank customers
who have either an account or loan or both.
We need information from two relations and we need to
make them union compatible
Names of all customers with a loan: Bc_name
(BORROWER)
Names of all customers with an account: B c_name
(DEPOSITOR)
The
c of these
two sets
B c_name
(BORROWER) c Bc_name(DEPOSITOR)
Or, you could rename the relations then do the union
r1 := B c_name (BORROWER)
r2 := B c_name (DEPOSITOR)
r1 c r2
Set Difference
Finds tuples that are in one relation but not in
another
Binary operator
Form: relation1 difference relation2
The relations must be union compatible
(attributes same number & type)
Query: Find all customers of the bank who
have an account but not a loan.
B
c_name (DEPOSITOR) difference B c_name (BORROWER)
Or name the intermediate result, the apply
the operation
r1
:= B
c_name (BORROWER)
r2 := B c_name (DEPOSITOR)
r1 difference r2
Queries
1) Find the loans from Perryridge where the amount of
the loan > 1200
F branch_name="Perryridge AND amount >
1200 (LOAN)
2) Find those customers who live in Harrison.
B c_name (F c_city = "Harrison" (CUSTOMER))
3) Are there any branches that have accounts but not
loans? Which one(s)? Loans but not account?
4) Find all
customers of the bank who have an either
an account or a loan or both.
Answers to queries
3) B branch_name
(ACCOUNT) difference
B branch_name
(LOAN)
4) r1 ฿ B c_name
(BORROWER)
r2 ฿ B c_name
(DEPOSITOR)
r1 c r2
Renaming
You can rename attributes as well as relations
Renaming relations
New_name ฿ (relations algebra expression)
Renaming attributes
temp ฿ Fbalance > 500(Loan)
High (account, bal)฿ Baccount-number, balance(temp)
Cartesian-Product X
Cross product of two relations (binary operator)
The relations do not have to be union-compatible
In the new relation
the number of
attributes is the sum of # of attributes in the two relations.
the number of
tuples is the product of the number of tuples in the two relations.
Each tuple
in the first is combined with every tuple in the second
Example:
borrower X loan
Attributes
of resultant relation
5 attributes
since the same attribute name appears in both they
must be qualified
borrower.loan_number and loan.loan_number
Tuples of
resultant relation
56 tuples
a tuple is constructed out of each possible pairs of
tuples; one from each relation
Cartesian Product uses
The X operation gives a lot of useless tuples, but it can
be used to put together relations that are not union compatible
Almost always select is then used to
eliminate the unwanted tuples
Since select is used almost universally with X, a special operation was created to specify this
sequence as a single operation
This is the join operation
Non Fundamental Operators
The previous fundamental operators of the relational
algebra are sufficient to express any relations-algebra query.
But if we define a few more operations, some of the
queries become simpler.
So the following operators do not give any more power
to the algebra, just make life easier.
Join ฎ
Forms a Cartesian product of two relations,
then performs a selection forcing =, < or >
on those attributes that appear in both relations, and then removes all
duplicates
Form: R1 ฎ condition R2
Example:
BORROWER ฎ B.loan-num=L.loan-num LOAN
Resultant relation of ฎ
5 attributes, two from Borrower, three from Loan
Note that the number of attributes is the same as X
Only those tuples where the condition was true are
included
Equijoin
If the join
condition is equality (the most common), the join is called an equijoin
In an equijoin, one or more pairs of attributes
have identical values
A natural join is an equijoin where one
of the identical columns is eliminated.
Natural Join
*
The standard
definition of a natural join requires the two join attributes have the same
name Borrower * Loan
If they dont have the same names one can be renamed
A nonstandard definition allows you to specify the
join attributes from each relation
The names do not
have to be the same
Employee * (DNO,
DNUMBER) Department
Query: Find the names and amount of all customers
who have loans at the bank.
B c_name, amount (BORROWER * LOAN)
Query: Find the names of all branches with
customers living in Harrison who have an account.
Find all customers living in Harrison
R1฿ BC_name( F c_city="Harrison" (CUSTOMER))
Find customers in Harrison with an account
R2 ฿ ( R1 * DEPOSITOR)
Now find which branch those accounts are in
.B branch_name
(R2 * ACCOUNT ))
You can do all this in one statement, just make sure
you get the order correct
Set Intersection 1
Finds tuples that are in both relation1 and relation2
Form: relation11 relation2
The relations must be union compatible
Query: Find all customers who have both a
loan and an account.
.B c_name
(BORROWER) 1 Bc_name(DEPOSITOR)
Query: Are there any customers who have neither
loans nor accounts? Give their names.
R1 ฿ B c_name
(BORROWER) c B c_name (DEPOSITOR)
B c_name (CUSTOMER)
R1
Division
Suited to queries that include the phrase "for
all"
Query:Find all customers who have an account at all
the branches in Brooklyn
First, get two intermediate relations:
1)Find all branches in Brooklyn
BinB฿B branch_name
(Fbranch_city = "Brooklyn"
(BRANCH))
2) Find all customer's
branches
.CB ฿ B
c_name, branch_name (DEPOSITOR*ACCOUNT)
3)
divide CB ๗ BinB
The division operation
It is applied to two relations ( R and Sub) where one
is a subset of the other (Sub a subset of R)
R(X, Y) is divided by Sub(X);
The result relation will have attribute(s) Y
For a tuple to appear in the result of the division
the values must appear in R in combination with every tuple in Sub
Retrieve the names of employees who work on
all the projects that John Smith works on
We have to have two relations here
Employees and the
projects they work on
The projects that
John Smith works on
Employees and the projects they work on
From the WORKS_ON
table, project out ESSN and PNO
The projects that John Smith works on
Join the EMPLOYEE
and WORKS_ON tables, select name= JohnSmith, then project out PNO
Divide the first by the second, then join the results
back with the EMPLOYEE table to get the names
Additional Relational Operations
Relational algebra cannot perform some operations that
are needed by a DBMS.
Extra operations have been defined that extend the
power of the relational algebra
Aggregate
functions
Sum
Average
Maximum
Minimum
Count
Grouping
Aggregate Functions
A mathematical functions that takes a collection of
values and returns a single value as a result
Form function
attribute (RELATION)
Query: Find the sum of salaries of all the part
time employees;
sum
salary (PT-WORKS)
Query: find
the average salary.
avg
salary (PT-WORKS)
the result of both these queries will be a relation
with one attribute and one tuple
Aggregate Functions (cont.)
Naming the attributes in the resultant relation
either rename, or
concatenation of the function name with the attribute
name
if we want to eliminate duplicates the word distinct
can be added
Query: Find the number of branches
appearing in the pt-works relation
count distinct branch_name (PT-WORKS)
Grouping Functions
Another type of request involves.
1) grouping the tuples by an attribute and then
2) applying an aggregate function independently to
each group of tuples.
Form:
<grouping attribute> group <function-attribute-pair
list> Relation
The resulting relation has the grouping attribute plus
one attribute for each element in the function list
Query:
Find the total salary of all part-time employees at each branch
individually.
b_name group sum salary (pt-works)
this means group by b_name and sum salary by those
groups
Query:
For each branch find the sum of the salaries and the maximum salary for
part-time employees.
Result (b_name,
sum, max) ฿
( b_name group sum salary, max salary (PT-WORKS)
Queries
1.What is the total amount of outstanding
loans at each branch?
2. Give the branch names of those branches
that have assets over 1,000,000
3. List the customer names who have accounts
at Brighton
4. In what branches does Johnson have an
account?
Working with the Company database
Query: Retrieve the social security numbers of all
employees who either work in department 5 or directly supervise an employee who
works in department 5
.dept5_emps <-- F dno=5 (employee)
result1 <--
B ssn (dept5_emps)
result2 (ssn) <-- B superssn(dep5_emps)
Result <--
result1 union result2
More queries
Q7.
List the names of managers who have at least one dependent
strategy:
intersect manages SSN with the SSN of employees with dependents
Q6.
Retrieve the names of employees who have no dependents
strategy: subtract the SSN of employees with dependents
from all employees SSN
Q5.
List the names of all employees with two or more dependents
strategy:
use an aggregate function to count the dependent of each employee
Q4.
Make a list of project numbers for projects that involve an employee whose last
name is Smith, either as a worker or as a manager of the department that
controls the project
strategy:
union the projects that Smith works on with those projects for which he
is a manager of the controlling department
Q3.
Find the names of employees who work on all
the projects controlled by department number 5
strategy:
divide the ESSN and PNO by all the projects controlled by department 5