Data Storage

Chapter 13

 

Storing the Data for Databases

•         Databases are stored physically as files of records

•         Today we will look at some of the aspects of physical storage.  This includes

–        The fact that it is much slower than electronic storage

–        A little about how it works

–        Some ways to speed it up

 

Memory hierarchy

•      Primary storage is the fastest and most expensive

–    It includes cache memory (static RAM) and main memory (DRAM)

•      Flash memory is EEPROM (electronically erasable programmable ROM)

–    Advantage is the fast access speed

–    Disadvantage is that an entire block must be erased and written over at a time

•      Secondary storage is MUCH slower and less expensive.  

 

Types of secondary storage

•      The main types of secondary storage include (in order of decreasing speed)

–    magnetic disks,

–    optical storage and

–    magnetic tapes

•      Your book gives a great description of disks, disk drives, and tapes

–    Read it if you have not covered this material earlier

•      One of the relative newcomers to the field is RAID technology

 

Disks

•      Disks are organized into tracks, then sectors, then blocks (or pages)

•    a block has a specific hardware address

•    this is its surface number, track number, and block number

•      Transfer from disk  to RAM is in BLOCK units

•      An entire block is transferred into a buffer in RAM with a READ command

•      A WRITE command copies the contents of the buffer to a disk block

•      Sometimes several contiguous blocks, called a cluster, are transferred as a unit

 

Time to transfer a disk block

•      Seek time -- find the correct track

•      Rotational delay-- find the correct sector

–    together these two take about 15-60 milliseconds

•      Block transfer time --

–    Takes about 1-2 milliseconds for each block

–    Note that this is much less than the seek & rotational delay

•      So, it takes roughly 15-60 milliseconds (.015 to .06 seconds) to read or write a block on disk

–    In that time, a typical machine can execute several million instructions

•      The time to retrieve data from secondary storage is very high compared to CPU speeds

•      Result:  I/O is a major bottleneck
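The slide's cost comparison can be sketched with a quick calculation (the values below are illustrative figures within the ranges above, not measurements):

```python
# Rough cost of one random block read, using the figures above
# (assumed values for illustration).
seek_plus_rotation_ms = 15       # low end of the 15-60 ms range
transfer_ms = 1.5                # per-block transfer time
total_ms = seek_plus_rotation_ms + transfer_ms

# A CPU executing ~1e9 instructions/second could run this many
# instructions in the time of a single block access:
instructions = int(1e9 * total_ms / 1000)
print(total_ms, "ms per block ->", instructions, "instructions")
```

Even at the cheap end of the range, one disk access costs millions of instructions, which is why I/O dominates.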

 

Secondary storage speed

•      There is a great need  to improve the performance of disk access

•      The performance of RAM has increased MUCH faster than disk accesses

–    The capacities and speed of RAM double or quadruple every few years

–    Speed of disk storage improves at less than 10% per year

–    Transfer rates improve at ~ 20% per year

–    Disk capacity improves at about 50% per year

 

Data striping

•      One way to improve the speed of disk access is data striping

•      Data striping distributes data transparently over multiple disks to make them appear as a single large, fast disk.

•      It improves overall I/O because it allows multiple I/Os in parallel

–    This improves overall transfer rates since the data is being transferred from many disks

•      Data striping also helps balance the load among disks.

•      In addition,  the data is often stored redundantly, with error checking, improving reliability.

•      This architecture (called RAID) can improve both speed and reliability.
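A minimal sketch of block-level striping: consecutive logical blocks are assigned round-robin across the disks, so neighboring blocks can be fetched in parallel (toy mapping, not any particular RAID implementation):

```python
def stripe(logical_block, num_disks):
    """Map a logical block number to (disk, block-on-disk)
    under simple round-robin block-level striping."""
    return logical_block % num_disks, logical_block // num_disks

# Four consecutive logical blocks land on four different disks,
# so a large sequential read touches all disks at once.
print([stripe(b, 4) for b in range(4)])
```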

 

Redundant Arrays of Independent Disks (RAID) technology

•      The main goal of RAID is to bridge the wide gap between the speeds of electronic storage and physical (disk) storage

•      It does this by using smaller parallel disks acting as a single logical disk

•      The data is striped on the smaller disks, so it can be read in parallel, thus speeding it up

 

Improving reliability with RAID

•      For an array of n disks, the likelihood of some disk failing is n times that of a single disk

–    If the Mean Time To Failure (MTTF) of a disk drive is 200,000 hours (about 22.8 years) then that of a bank of 100 disks becomes only 2000 hours (83.3 days)

–    So the system must be prepared for failure.

•      An obvious solution is to keep the data in more than one place

–    There are many disadvantages to this which must be overcome

•      One technique for introducing redundancy is called mirroring.
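The MTTF figure on this slide can be checked with a short calculation:

```python
# Check the MTTF figure from the slide: 100 independent disks,
# each with a Mean Time To Failure of 200,000 hours.
mttf_one = 200_000            # hours, per disk
n = 100
mttf_array = mttf_one / n     # expected hours until *some* disk fails
print(mttf_array, "hours =", round(mttf_array / 24, 1), "days")
```

2,000 hours is about 83.3 days, matching the slide, which is why redundancy is not optional at this scale.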

 

Data mirroring

•       Data is written redundantly to two disks that are treated as one logical disk

•      If a disk fails, the other disk is used until the first is repaired

•      Disk mirroring doubles the rate at which read requests are handled, since a read can go to either disk

•      Using mirror disks does not speed up writing

–     but neither does it slow it down, when compared with a single disk, if the writing takes place in parallel.

 

Improving Performance with RAID

•      Data striping gives a higher transfer rate, since data is transferred in parallel

•      Data striping may be applied at different levels of granularity

•      Fine grained disk arrays use small data blocks so that all requests are serviced using all the disks at the same time

–    The idea is to maximize the data transfer rate

•      Coarse-grained disk arrays use large data blocks so that small requests can be serviced in parallel, since each request accesses only a few disks

–    Large requests can still benefit from high transfer rates by using many disks

 

RAID architecture – level 0 to level 6

•      Level 0 has no redundancy (which gives the highest write performance), while level 6 keeps two redundant disks

•      The granularity of the striping varies from bit-level striping to block-level striping

•      Designers of RAID setup have to confront many design decisions

–    The level of RAID

–    The number of disks

–    The choice of parity schemes

–    The grouping of disks for striping


 

Buffering of Blocks

•      When several blocks are transferred from disk to RAM and all the block addresses are known, a buffer is needed for each block

•      The CPU can be reading one buffer while another is being transferred

–   This is possible because usually there is a separate I/O processor

•      Double buffering eliminates the seek time and rotational delay for all but the first block

 

Putting file records on disk

•      Data is usually stored in the form of records (we called them tuples)

–    The record is made of fields of related data

–    Each field has a specified data type, which may be of fixed size, or variable

–    Most DBMS systems also allow images, digitized video or audio streams, or free text

–    These are referred to as BLOBs (Binary Large Objects)

•      A file is a sequence of records

•      The records may be either fixed length or variable length

–    Fixed length records make locating a record relatively easy

•    The system must know the address of the block of the first record, and since it knows the length of each record, it can find the start of all the records

 

Variable length records

•      Why records may be of variable length

–    a field itself can be of variable length

–    a field may be optional

–    there may be some repeating groups

–    a block may contain records from different files

•      How the file system finds each variable length record

–    The system must know the address (track, sector, block) of the first record

–    Separator characters must be used in the record in several ways

•    Can be used to terminate variable length fields, or

•    If some fields are optional, can be used to separate <attribute, value> pairs, then

•    Can be used to indicate the end of the record
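The separator scheme above can be sketched in a few lines. The layout here (';' between fields, '=' within a pair, '#' ending the record) is a hypothetical choice for illustration; real systems pick their own separator characters:

```python
# Hypothetical layout: optional fields stored as attribute=value
# pairs separated by ';', record terminated by '#'.
def parse_record(raw):
    body = raw.split('#', 1)[0]           # drop the record terminator
    pairs = [p for p in body.split(';') if p]
    return dict(p.split('=', 1) for p in pairs)

rec = parse_record("NAME=Smith;DEPT=5;SALARY=30000#")
print(rec)
```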

 

Record blocking

•      Since the block is the unit of data transfer, record addresses are block addresses

•      If the block is larger than a record

–    multiple records per block

–    blocking factor is the # of records/block

–    if records do not completely fill the block, space is wasted

–    This space can be used by a partial record, which spans two blocks
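The blocking factor and wasted space for unspanned fixed-length records fall out of simple integer arithmetic (block and record sizes below are assumed for illustration):

```python
# Blocking factor for fixed-length, unspanned records
# (assumed sizes for illustration).
B = 512    # block size in bytes
R = 100    # record size in bytes

bfr = B // R            # records that fit whole in one block
wasted = B - bfr * R    # unused bytes per block (unspanned)
print(bfr, "records/block,", wasted, "bytes wasted per block")
```

Those wasted bytes per block are exactly the space a spanned organization reclaims by letting a record straddle two blocks.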

Spanned vs. Unspanned Records

•      Records that span blocks

–    fractions of a record can be stored on a block to use unfilled space

–    then, a pointer at the end of the block points to the next block

•      If a record is larger than a block, it must be spanned

 

Allocating files blocks on disk

•      If a file is larger than one disk block, several blocks must be allocated to store the file.  This can be done several ways:

–    Contiguous allocation

•    reading the whole file is very fast, using double buffering

•    expanding the file is difficult

–    Linked allocation

•    easy to expand the file

•    slower  to read the whole file

•      Allocating clusters of consecutive blocks

–    the clusters are linked together

–    clusters are sometimes called segments or extents.

•      Indexed allocation

–    index blocks contain pointers to the actual file blocks
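Linked allocation can be sketched as a chain of (data, next-block) pairs. The block numbers below are made up; the point is the trade-off stated above: easy to extend, but reading the whole file follows one pointer (and pays one disk access) per block:

```python
# Minimal sketch of linked allocation: each block stores its data
# plus the number of the next block, or None at end of file.
blocks = {7: ("data-A", 2), 2: ("data-B", 9), 9: ("data-C", None)}

def read_file(first_block):
    """Follow the chain of block pointers from the first block."""
    out, cur = [], first_block
    while cur is not None:
        data, nxt = blocks[cur]      # one disk access per block
        out.append(data)
        cur = nxt
    return out

print(read_file(7))
```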

 

File Headers(descriptors)

•      The file header contains information needed by the programs that access the file records.

•      It includes:

–    Information to determine the disk address of the file blocks

•    This includes the track and sector

–    Record format descriptions

•    field length & order of fields for unspanned fixed-length records

•    May include other information, such as a table for external hashing

 

To find a record on disk

•      One or more blocks are copied into RAM buffers

•      Using the information in the file header a search is done for the desired record within the buffer

•      If the address of the block that contains the desired record is not known, a linear search through the file blocks must be done

–    This can be very time consuming!

•      So, a good file organization makes it easier to find the address of the block desired

•      The goal is to have as few block transfers as possible

 

‘Good’ file organizations

•      What is meant by ‘a good file organization’ depends on how the file will be used.

•      Usually we expect that some search conditions will be used more often than others.

•      A good file organization should perform as efficiently as possible the operations we expect to apply frequently to a file

•      Example

–    The EMPLOYEE file may be searched frequently by SSN to modify records, so an organization that makes finding records by SSN fast is important

–    But it may also be used frequently by payroll who must access the employees by department

 

Some possible file organizations

•      Unordered records (heap files) 

•      Sorted files

•      Hashed files

•      When deciding on a file organization, need to consider:

–    How fast will retrievals be (read only)

–    How fast will updates be (read/write)

 

Unordered or heap files

•      Sometimes called a sequential or pile file

–    It is not the heap that is used for priority queues

–    Indexes can be used with this organization

–    Inserting records is very efficient

•    Just retrieve the last block in the file, add the record, then write it back to disk

–    Searching requires a linear search

•    Must search through the file, transferring block after block into RAM

–    Deleting is also expensive since it must find the record

•    First the block where the record is kept must be found and copied into RAM, then the record deleted from the buffer, then rewrite the block back to disk
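The insert-fast / search-slow trade-off of a heap file can be sketched with blocks modeled as lists (toy in-memory model; `bfr` is an assumed blocking factor):

```python
# Heap (unordered) file sketch: each block holds up to bfr records.
bfr = 3
file_blocks = [["r1", "r2", "r3"], ["r4"]]

def insert(record):
    """Efficient: append to the last block, starting a new one if full."""
    if len(file_blocks[-1]) == bfr:
        file_blocks.append([])
    file_blocks[-1].append(record)

def search(record):
    """Linear search: in the worst case every block is scanned."""
    for i, block in enumerate(file_blocks):
        if record in block:
            return i
    return None

insert("r5")
print(search("r5"))
```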

 

Ordered files (sorted)

•      Must be ordered on one field in the record (often a key field)

•      Retrievals are often faster than unordered

–    Especially if need to read the records in order

–    It may be possible to use a binary search on the blocks (though this doesn’t work nearly as well as binary search on an ordered array in memory)

•      Access to any but the ordering field requires a sequential search of all the blocks until found

•      Inserting records is expensive, since the file ordering must be maintained

–    Sometimes an overflow file is used; periodically it is sorted and merged with the original file

•      Ordered files are rarely used without a primary index.
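Binary search on a sorted file probes blocks, not individual records: compare the key against each probed block's first and last record (toy data below; real systems compare against block key ranges read from disk):

```python
# Sorted file sketch: records ordered on the key, grouped into blocks.
blocks = [[2, 5, 8], [11, 14, 17], [20, 23, 26]]

def find_block(key):
    """Binary search over blocks; return the index of the only
    block that could contain the key, or None."""
    lo, hi = 0, len(blocks) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if key < blocks[mid][0]:
            hi = mid - 1
        elif key > blocks[mid][-1]:
            lo = mid + 1
        else:
            return mid        # key, if present, must be in this block
    return None

print(find_block(14))
```

This is why ordered files beat heap files for lookups on the ordering field: log2(number of blocks) transfers instead of a linear scan.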

 

Hashing

•      Can result in retrievals in constant time

•      The idea is to have a function (the hash function) that takes a key field and turns it into the address where the record is stored

•      Then a search for the record within the block is done

•      For most records, only one block access is needed

•      Types of hashing

–    Internal hashing

–    External hashing

–    Hashing with dynamic file expansion

 

Hash functions

•      The function should give us: key in → index out

•      We need to be sure that the index is within the range of our array

•      The mod function will do this for us

•      Suppose the key field is SS#

–    If our array has 99 slots, we could divide by 99 and use the remainder

•      Suppose the key field is a name, or other string

–    We could add together the ASCII codes of the letters then divide by 99 and use the remainder
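Both hash functions from this slide take only a couple of lines (the table size of 99 and the sample inputs are just the slide's running example):

```python
def hash_ssn(ssn, table_size=99):
    """Numeric key: divide by the table size and use the remainder."""
    return ssn % table_size

def hash_name(name, table_size=99):
    """String key: sum the character codes, then take the remainder."""
    return sum(ord(c) for c in name) % table_size

print(hash_ssn(123456789), hash_name("SMITH"))
```

Both always land in 0..98, so the result is a valid array index.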

 

Internal hashing

•      Used when the database can be fit into memory

•      The information is kept in an array of records

•      Each record has a unique key field (name, number, whatever)

•      A hash function relates the key to the index

–    the hash function maps the key to an index

•      A collision occurs when two keys hash to the same index

–    Open addressing —check the array for an open position, and put the record there

–    Chaining — each slot has a pointer field; when a collision occurs, the colliding record is stored elsewhere and linked into a chain from that slot

–    Double hashing – apply a second hash function if the first results in a collision
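Of the three collision strategies, chaining is the easiest to sketch; here each slot holds a list standing in for the chain of colliding records (toy table size of 7):

```python
# Internal hashing with chaining: each slot holds the chain of
# records whose keys collided at that index.
TABLE_SIZE = 7
table = [[] for _ in range(TABLE_SIZE)]

def put(key, value):
    table[key % TABLE_SIZE].append((key, value))

def get(key):
    for k, v in table[key % TABLE_SIZE]:   # search only one chain
        if k == key:
            return v
    return None

put(10, "Alice")
put(17, "Bob")        # 17 % 7 == 10 % 7 == 3: a collision, chained
print(get(17))
```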

 

External hashing

•      External hashing is used when the database is kept on disk

•      The address space is made of buckets, each of which holds multiple records

•      A bucket may be one disk block, or a cluster of contiguous blocks

•      The hash function maps a key into a bucket number

•      Then a table in the file header converts the bucket number into the corresponding disk block address.

•      Since each bucket can hold multiple records, the collision problem is less severe than with internal hashing

–    When a bucket becomes full it can point to another bucket where records that hash to the bucket are kept

•      Searching for a value in a field that is not the hashed field requires a sequential search

–    This means all the blocks have to be transferred to RAM, then sequentially searched
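The two-step lookup described above (hash to a bucket number, then use the header table to find the disk block) can be sketched as follows; the bucket count and block names are hypothetical:

```python
# External hashing sketch: keys hash to bucket numbers; a table in
# the file header maps bucket numbers to (made-up) disk block addresses.
NUM_BUCKETS = 4
bucket_directory = {0: "block-17", 1: "block-42", 2: "block-08", 3: "block-99"}

def bucket_for(key):
    return key % NUM_BUCKETS

def block_address(key):
    """Hash to a bucket, then look up its disk block in the header."""
    return bucket_directory[bucket_for(key)]

print(block_address(123456789))
```

One hash plus one table lookup yields the block to fetch, which is how most lookups on the hash field cost a single block access.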