Practical Database Design
and
Tuning

Chapter 16

Brief chapter contents

•      Factors influencing physical design

–    Analyze the attributes used by the queries

–    Analyze the expected frequency of queries

–    Analyze the time constraints and update expectations

•      Physical design decisions

–    Decisions about indexing

–    Using lower normal forms to speed up queries

•      Tuning the system after it is operational

–    Tuning the indexes

–    Tuning the DB design

–    Tuning queries

 

Factors influencing physical design

•      The goal here is to guarantee good performance

•      For a given conceptual schema, most DBMS offer many physical design alternatives

•      It is not possible to make good physical design decisions until we know
        the queries,
     the transactions,
     the applications,
that are expected to run on the DB

 

Analyzing the Queries

•      For each expected query, we should analyze the following

–   The files that will be accessed by the query

•   Here, we assume that each relations is kept in one file

–   The attributes on which selection conditions are specified.

–   The attributes on which join conditions are specified

–   The attributes whose values will be retrieved by the query

 

Analyzing the frequency of queries

•      The frequency of the attributes used above will help determine which attributes to index.

•      Generally, the “20-80” rule applies here

–   That is, 80% of the processing is accounted for by 20% of the queries

•      This means that usually data on all the queries is not really necessary; only the 20% or so of the most important one.

 

Analyze time constrains

•      Some queries must produce results in a specified amount of time

•      For example, an airline query may have to produce results with five seconds 95% of the time, and never take more than 20 seconds.

•      Knowing these constraints help decide on whether to build access paths.

 

Analyze expected frequency of updates

•      Updates slow down the update operation

•      This means that a minimum number of access paths should be specified on files that are updated frequently.

 

Analyze the uniqueness of attributes

•      Access paths should be specified on all candidate key attributes

•      If there is an index, only the index needs to be searched when looking for a specific record

 

Physical design decisions

•      Once all the above has been analyzed, decisions about access paths  can be made.

•      The access path options include

–   Deciding on the primary file organization for each file (relation)

–   Deciding which attributes should be indexed

•   There can be at most one primary index (or clustering index)

•   There can be any number of secondary indexes

 

Design decisions about indexes

•      Some sort of access path is required for those attributes:

–   whose values are required in selection operations (conditions)

–   that are keys

–   that participate in join conditions (i.e foreign keys)

•      The performance of queries largely depends on what indexes exist for the attributes involved.

 

Categories for index decisions

•      Should an attribute be indexed?

–    If it is a key, or used in a selection condition or a join of a frequent query, it should be indexed

–    Other attributes may be indexed because sometimes queries can be processed just by scanning the index without retrieving data.

•      Should your index on multiple attributes?

–    If multiple attributes are involved together in several queries, this may save time.

•      Should a clustering index be set up?

–    These are good if many queries contain ranges

•      Should a hash table be used instead of a tree index

–    Most DBMS use B+-trees

–    Hash index work well with equality conditions

 

Denormalization as a design decision

•      When analyzing the queries, you may find that it doesn’t make sense to keep the DB in BCNF

•      When you make this decision, you must be aware that data redundancy will exist

•      Data redundancy leads to update anomalies.

•      Usually, the application program will then have to check for inconsistence states in the DB

 

Tuning the database

•      This is done after the DB has been running for a while

•      All the design decisions we talked about earlier still apply here

•      In addition, the DBMS can internally collect statistics to help with the decisions.  These include:

–    Sizes of individual tables

–    Number of distinct values in a column

–    The number of time a particular query is executed in an interval of time

–    The time required for different phases of query processing

 

Tuning indexes

•      Using the statistics above, you may want to rethink the indexes

•      Certain queries may take too long for lack of an index

•      Some indexes may not get used much

•      Some indexes may be causing too much overhead because it is on a attribute that is updated frequently