Practical
Database Design
and
Tuning
Chapter 16
Brief chapter contents
Factors
influencing physical design
Analyze the
attributes used by the queries
Analyze the
expected frequency of queries
Analyze the time
constraints and update expectations
Physical
design decisions
Decisions about
indexing
Using lower
normal forms to speed up queries
Tuning the
system after it is operational
Tuning the
indexes
Tuning the DB
design
Tuning queries
Factors influencing physical design
The goal here is to guarantee good performance
For a given conceptual schema, most DBMS offer many
physical design alternatives
It is not possible to make good physical design
decisions until we know
the queries,
the transactions,
the applications,
that are expected to run on the DB
Analyzing the Queries
For each expected query, we should analyze the
following
The files that will be accessed by the query
Here, we assume that each relations is kept in one file
The attributes on which selection
conditions are specified.
The attributes on which join conditions are specified
The attributes whose values will be retrieved by the
query
Analyzing the frequency of queries
The frequency of the attributes used above will
help determine which attributes to index.
Generally, the 20-80 rule applies here
That is, 80% of the processing is accounted for by 20%
of the queries
This means that usually data on all the queries is not
really necessary; only the 20% or so of the most important one.
Analyze time constrains
Some queries must produce results in a specified
amount of time
For example, an airline query may have to produce
results with five seconds 95% of the time, and never take more than 20 seconds.
Knowing these constraints help decide on whether to
build access paths.
Analyze expected frequency of updates
Updates slow down the update operation
This means that a minimum number of access paths
should be specified on files that are updated frequently.
Analyze the uniqueness of attributes
Access paths should be specified on all candidate key
attributes
If there is an index, only the index needs to be
searched when looking for a specific record
Physical design decisions
Once all the above has been analyzed, decisions about
access paths can
be made.
The access path options include
Deciding on the primary file organization for each
file (relation)
Deciding which attributes should be indexed
There can be at most one primary index (or clustering
index)
There can be any number of secondary indexes
Design decisions about indexes
Some sort of access path is required for those
attributes:
whose values are required in selection operations
(conditions)
that are keys
that participate in join conditions (i.e
foreign keys)
The performance of queries largely depends on what
indexes exist for the attributes involved.
Categories for index decisions
Should an attribute be indexed?
If it is a key,
or used in a selection condition or a join of a frequent query, it should be
indexed
Other attributes
may be indexed because sometimes queries can be processed just by scanning the
index without retrieving data.
Should your index on multiple attributes?
If multiple
attributes are involved together in several queries, this may save time.
Should a clustering index be set up?
These are good if
many queries contain ranges
Should a hash table be used instead of a tree index
Most DBMS use
B+-trees
Hash index work
well with equality conditions
Denormalization as a design decision
When analyzing the queries, you may find that it
doesnt make sense to keep the DB in BCNF
When you make this decision, you must be aware that
data redundancy will exist
Data redundancy leads to update anomalies.
Usually, the application program will then have to
check for inconsistence states in the DB
Tuning the database
This is done after the DB has been running for a while
All the design decisions we talked about earlier still
apply here
In addition, the DBMS can internally collect
statistics to help with the decisions.
These include:
Sizes of
individual tables
Number of
distinct values in a column
The number of
time a particular query is executed in an interval of time
The time required
for different phases of query processing
Tuning indexes
Using the statistics above, you may want to rethink
the indexes
Certain queries may take too long for lack of an index
Some indexes may not get used much
Some indexes may be causing too much overhead because
it is on a attribute that is updated frequently