Query
Optimization
Chapter
15
From
SQL to machine language
In a high level language like SQL, a query must be
scanned
identifies the tokens
parsed checks the query syntax
validated checks the table and attribute names are valid
Then an internal representation of the query is
created
The query must be translated into relational
algebra
A query tree
is built with the relational operators
The DBMS then must devise an execution strategy for
retrieving the result of the query
To do this it
optimizes the query tree
Steps
in processing a SQL query
Scanning,
parsing and validating
Query
optimizer module
Translate the SQL
into relational algebra
Build a query
tree, then optimize it
Produce an
execution plan from a query tree
Code generator
Generate the code
to execute the plan
Use algorithms
available in the DBMS for each relational algebra operator
Runtime
database processor
Run the query
code to produce the result
May
be either compiled or interpreted
The query optimizer
Disclaimer:
it cannot really produce the optimal strategy, just a reasonably
efficient strategy
First, translate SQL into relational algebra
Nested queries
are decomposed into query blocks
Then
the select, from ,
where, group by and having
clauses are translated into their relational algebra equivalent
A query tree is then built
Then heuristic rules are applied to optimize the
tree
After you have an efficient tree, algorithms to
implement the relational algebra operations are applied
Example
of an algorithm to implement a relational operator
There must be algorithms to implement every relational
algebra operation
Methods for implementing the JOIN operation
Nested-loop join (brute force)
Retrieve every record from both relations
Test whether the two records satisfy the join condition
Single-loop join (one relation has an index on the
attribute in the join condition)
Retrieve each record in the unindexed
relation
Use the index to retrieve only those records in the
second relation that match
Sort-merge join (both relations can be searched for
the join attribute)
If
the files are not sorted on the join attribute, they are sorted
Both
relations are scanned concurrently in order of the join attribute
If
the join attributes match, the records are joined
Hash-join ( the records of
the two relations are both hashed to the same hash file using the same hashing
function on the join attributes)
Heuristic
rules
One of the main rules is to apply SELECT and PROJECT
operations before applying JOIN.
SELECT and PROJECT operations reduce the size of a
file
JOIN usually multiplies the size of the file
The further down the tree an operations appears, the
earlier it is applied
Query trees
A query tree is
a tree that corresponds to a relational algebra expression
The input relations are the leaf nodes
of the tree
The relational
algebra operations are the internal nodes
The query parser will generate a standard initial
query tree (without optimization)
How the initial tree is built:
Apply Cartesian
cross for relations in the FROM clause
Then the
conditions (join & selections) from WHERE clause are added
Then projections
from the SELECT clause
Example
query
Find the last names of employees born after 1957 who
work on a project names Aquarius.
SQL query
SELECT lname
FROM employee, works_on, project
WHERE pname = Aquarius
AND pnumber
= pno
AND essn
= ssn
AND bdate
> 1957-12-31;
Optimization
of Query trees
Making the
initial tree more efficient
Many different relational algebra expressions
correspond to the same query
The trick is to take the initial tree and convert it
into an equivalent, more efficient tree
There are rules to help do this
Basically, we want to move select and project as
far down the tree as possible and still maintain equivalence
This is because they limit the size of the relation
The
evolving query tree
1) First the initial query tree with Cartesian crosses
2) Then move the select operations down the tree
3) Then apply the more restrictive operations first
pname
is a key attribute, so will retrieve only one tuple
4) Replace Cartesian product and select with join
operations
5) Move project operations down the query tree
Transformation
rules for relational algebra operations
See overhead or page 519 of text
Practice
optimizing a query
Draw Draw the initial and
optimized query trees for the following queries
Select fname, lname address
from employee, department
where dname = research and dnumber
= dno;
Select E.lname S.lname
from employee as E, employee as S
Where E.superssn = S.ssn
and S.sex =
F;
Converting
the query tree into a query execution plan
The execution plan must include information about the access
methods that exist for each relation
Each relation is stored
ordered on one field; there may be other indexes into the relation
It must also know the algorithms that exist for the relational
operators represented in the tree
There must be a
different algorithm for each access into the relation
With this information, it can choose the algorithm to
implement the operation with the given data
Query
code generator
The query tree gives the order of execution of
each relational algebra operation; the execution plan involves choosing the
correct algorithm to implement the operation
Temporary results are computed, then used as input for
the next operation in the query tree
The RDBMS must include algorithms to implement
the different types of relational operations
There are often several algorithms to implement the
same operation, since one may apply only to a particular storage structure and
access path
Typical
algorithms
Any query that specifies order by must use an external sort algorithm
Implementing the select operation
These can include
linear search, binary search, using a primary key or hash table, using a
clustering index or a secondary key (usually a B+ tree)
Implementing a join operation
These can include
nested-loop join, single-loop join, sort-merge join, hash-join
Implementing project operations
This just involves eliminating some of the columns in
a table