Multi-Way Search Trees

Another way to balance search trees

 

What are Multi-way search trees?

      Multi-way search trees generalize binary search trees into m-ary search trees

      They allow more than one key to be stored in a node

      There will always be one more child pointer than key

    So if a node has two keys, it has three pointers and possibly three children

      Increasing the number of children decreases the height of a tree, given the same number of nodes

      The keys in a node are maintained in order

      Look at a 2-3 tree on the overhead

    Each node holds at most 2 keys and 3 child pointers
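
To put numbers on the height claim, here is a minimal sketch that counts the fewest levels an m-ary search tree needs to hold a given number of keys. It assumes every node is completely full (m-1 keys, m children); the name min_levels is made up for this illustration.

```python
def min_levels(n_keys, m):
    """Fewest levels an m-ary search tree needs to hold n_keys keys,
    assuming every node is full (m - 1 keys and m children)."""
    levels = 0
    capacity = 0          # keys that fit in the levels counted so far
    nodes_on_level = 1    # the root level has one node
    while capacity < n_keys:
        capacity += nodes_on_level * (m - 1)
        nodes_on_level *= m
        levels += 1
    return levels


for m in (2, 3, 4):
    print(m, min_levels(1_000_000, m))
# m=2 needs 20 levels, m=3 needs 13, m=4 needs 10
```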

 

The ordering principle in multiway search trees

      The ordering principle of binary search trees still holds

   Anything in the subtree to the left of a key is smaller than that key; anything in the subtree to its right is larger

   If a node has more than one key, the pointer between two adjacent keys leads to values that lie between those two keys

   If a node has more than one key, the keys are kept in ascending order

 

Definitions of multiway search trees

      There are several variations of the way these trees are implemented

      We will define them this way:

   Every node has at most m children (m-ary tree)

   It has at most m-1 keys

   Its keys are ordered: k_1 < k_2 < … < k_{m-1}

   All the external nodes have the same depth

 

Searching a multiway search tree

   Searching is much like a binary tree search, except you may have more than one key in a node

   If the item you are searching for equals a key in the node, the search succeeds

   If it is less than the leftmost key, follow the leftmost pointer

   If it is greater than the leftmost key, consider the next key; if it lies between the two keys, follow the pointer between them

   Otherwise keep moving right: follow the pointer just to the left of the first key the item is less than, or the rightmost pointer if it is greater than every key
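
The search steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the Node class and the search function are hypothetical names, and a leaf is represented as a node with an empty child list.

```python
class Node:
    """A multiway search tree node: sorted keys plus one more child than keys."""
    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []   # empty list means this node is a leaf


def search(node, target):
    """Return True if target is stored somewhere in the tree rooted at node."""
    while True:
        # Scan the keys left to right until one is not smaller than the target.
        i = 0
        while i < len(node.keys) and target > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == target:
            return True                  # found the key in this node
        if not node.children:
            return False                 # reached a leaf without finding it
        # children[i] holds the values just to the left of keys[i]
        # (or to the right of the last key when i == len(keys)).
        node = node.children[i]


# A small 2-3 tree: root [20, 60] with leaves [10], [40], [80, 100]
root = Node([20, 60], [Node([10]), Node([40]), Node([80, 100])])
print(search(root, 40), search(root, 50))   # True False
```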

 

Inserting items into a multiway search tree

      First, the position at which to insert the new item must be found

      Items are always inserted into leaves

      Insert values into a node, keeping them sorted, until it holds more than m-1 keys;

      when the number of values exceeds m-1, split the node

   The middle value goes to a node above

   This may cause the node above to split, or it may make creation of a new level necessary

      Build this 2-3 tree
      20  40  60  10  80  100  5
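
One way to check a hand-built tree is to let a short program do the insertions. The sketch below is one possible implementation under stated assumptions: Node, insert and levels are hypothetical names, duplicates are ignored, and on a split the key at index len(keys) // 2 of the overfull node moves up. For a 2-3 tree (order 3) that is the true middle key; for even orders the choice of "middle" is a convention and may differ from the one used in class.

```python
import bisect


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # empty list means leaf


def _insert(node, key, order):
    """Insert key below node. Return (promoted_key, new_right_node)
    if node had to split, otherwise None."""
    i = bisect.bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return None                       # assumption: duplicates are ignored
    if not node.children:
        node.keys.insert(i, key)          # items are always inserted into leaves
    else:
        result = _insert(node.children[i], key, order)
        if result is None:
            return None
        up_key, right = result            # child split: absorb its middle key
        node.keys.insert(i, up_key)
        node.children.insert(i + 1, right)
    if len(node.keys) <= order - 1:
        return None                       # still fits, no split needed
    mid = len(node.keys) // 2             # middle key goes to the node above
    up_key = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return up_key, right


def insert(root, key, order=3):
    """Insert key and return the (possibly new) root."""
    result = _insert(root, key, order)
    if result is None:
        return root
    up_key, right = result                # root split: the tree grows one level
    return Node([up_key], [root, right])


def levels(root):
    """Keys of every node, grouped level by level from the root down."""
    out, frontier = [], [root]
    while frontier:
        out.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out


root = Node()
for k in [20, 40, 60, 10, 80, 100, 5]:
    root = insert(root, k, order=3)
print(levels(root))
# [[[40]], [[10], [80]], [[5], [20], [60], [100]]]
```

Inserting 5 is the interesting step: the leaf [5, 10, 20] splits, which overfills the root [10, 40, 80], so the root splits too and a new level is created.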

 

 

Another concrete example

      Assume we are building a 4-ary search tree

    each node can have a maximum of four children

    This means the maximum number of keys in a node is three.

      Our nodes will have a maximum of four pointers and three keys with these properties k1 < k2 < k3

      Insert these keys into the tree:
4, 6, 14, 15, 3, 5, 9, 8, 11, 21, 34, 30, 10

      If a node must split, the middle item before the insertion is the one to go up

 

Build this 4-ary tree

      Inserting values into a 4-ary tree
Insert these values
 53, 27, 75, 25, 70, 41, 38, 16, 59, 36, 73, 65, 60, 46, 55

Deletions in a multi-way search tree

      The resulting tree must observe these rules

    All the leaves are on the same level

    Each node must have between 1 and m-1 keys

    It must remain a search tree

      If a deletion removes all the keys from a node, sibling nodes must be merged

    This means a key must be moved from a parent

      There are several different ways of doing deletions.

    In some implementations, adoption from a sibling is allowed

      A deletion may even force a height reduction

    This is avoided if possible, since an insertion may again force a height increase
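
Because deletion strategies vary, a useful companion is a checker that tests whether a tree still obeys the three rules after a deletion. The sketch below is only an illustration: obeys_rules is a hypothetical name, and each node is written as a (keys, children) tuple with an empty child list for a leaf.

```python
def obeys_rules(root, m):
    """Check the three rules for an m-ary search tree given as nested
    (keys, children) tuples; a leaf has an empty children list."""
    leaf_depths = set()

    def visit(node, lo, hi, depth):
        keys, children = node
        if not 1 <= len(keys) <= m - 1:
            return False                      # rule: between 1 and m-1 keys
        if any(a >= b for a, b in zip(keys, keys[1:])):
            return False                      # keys must be in ascending order
        if lo is not None and keys[0] <= lo:
            return False                      # rule: must remain a search tree
        if hi is not None and keys[-1] >= hi:
            return False
        if not children:
            leaf_depths.add(depth)
            return True
        if len(children) != len(keys) + 1:
            return False                      # one more child pointer than keys
        bounds = [lo] + list(keys) + [hi]
        return all(visit(c, bounds[i], bounds[i + 1], depth + 1)
                   for i, c in enumerate(children))

    # rule: all the leaves are on the same level
    return visit(root, None, None, 0) and len(leaf_depths) == 1


good = ([40], [([10], [([5], []), ([20], [])]),
               ([80], [([60], []), ([100], [])])])
print(obeys_rules(good, 3))   # True
```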

 

Analysis of multi-way search trees

      The advantage is their easy-to-maintain balance, not their shorter height

      The reduction in height is offset by the increased number of comparisons that the search may require at each node

 

B-Trees and databases

      A database is used because a tremendous amount of data must be stored and manipulated.

    Much more than can be kept in RAM at one time

      To effectively manipulate a database, we must have a system that minimizes disk accesses.

      A B-Tree is a multi-way search tree where a node represents a disk block.

    If done this way, every disk access takes us one level deeper into the tree

    The node itself may contain hundreds of keys, but the search time is negligible compared to disk access time

    The keys in the node are kept sorted, so a binary search can be used

 

Time complexity and disk access

      When we talk about time complexity, we are talking about how fast an algorithm executes in RAM.

      If all the data does not fit in RAM at one time, the disk must be accessed.

      A fast disk allows about 120 disk accesses per second.

      If a machine executes 25 million instructions per second, one disk access is worth about 200,000 instructions.

      So we are willing to do lots of calculation just to save a disk access

    In most cases the number of disk accesses will dominate the running time
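
The "200,000 instructions" figure is just the ratio of the two rates quoted above. A tiny sketch of the arithmetic, using only those two numbers (the constant names are made up here):

```python
DISK_ACCESSES_PER_SECOND = 120          # figure quoted above
INSTRUCTIONS_PER_SECOND = 25_000_000    # figure quoted above

# Instructions the CPU could execute in the time one disk access takes.
instructions_per_access = INSTRUCTIONS_PER_SECOND // DISK_ACCESSES_PER_SECOND
print(instructions_per_access)   # 208333, i.e. roughly 200,000
```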

 

B-Trees

      There are a lot of variations on standard B-Trees.

      A B-Tree of order m is an m-ary search tree with these properties:

1) The data items are stored at the leaves

2) The nonleaf nodes store up to m-1 keys

3) The root is either a leaf or has between two and m children

4) All nonleaf nodes (except the root) have between ⌈m/2⌉ and m children (the nodes must be at least half full)

5) All leaves are at the same depth and hold between ⌈L/2⌉ and L data items for some L (the leaves must be at least half full)

      In practice, a node may contain a thousand or more keys

      Once the correct node is found, a binary search may be used within it instead of a sequential search

 

Example of B-Tree node calculation

    Find the number of pointers & keys in an interior node

      m (the number of pointers) is determined by the size of a disk access block and the size of a key into the database.

      Suppose our record size is 256 bytes, with the key itself 32 bytes. A pointer is 4 bytes.

      Suppose also the size of a disk access block is 8,192 bytes

      Since each key-pointer pair takes 36 bytes, we can fit 8192/36 ≈ 227 key-pointer pairs into one disk block, plus the one extra pointer

      So we choose m = 228; 227 keys in each node.

      For leaves, 8192/256 = 32, so each leaf contains 32 records.
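
The same calculation can be written as a short sketch. The function names are hypothetical; the only rule used is that an interior node with m pointers and m-1 keys, and a leaf with whole records, must each fit in one disk block.

```python
def interior_order(block_size, key_size, pointer_size):
    """Largest m such that m pointers and m - 1 keys fit in one block:
    m * pointer_size + (m - 1) * key_size <= block_size."""
    return (block_size + key_size) // (key_size + pointer_size)


def leaf_capacity(block_size, record_size):
    """Number of whole records that fit in one leaf block."""
    return block_size // record_size


m = interior_order(block_size=8192, key_size=32, pointer_size=4)
print(m, m - 1)                    # 228 227
print(leaf_capacity(8192, 256))    # 32
```

With m = 228 the node uses 228 * 4 + 227 * 32 = 8,176 bytes, which leaves 16 bytes of the block unused.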

 

How many disk accesses

      If our database has 10,000,000 records, how deep can our tree get in the worst case?

    First, how many leaves?

    If every leaf contained 32 records, it would be 312,500

    But each leaf may be only half full, so it is 625,000 possible leaves

    Each internal node (except the root) branches at least 114 ways (228/2)

    114^2 = 12,996;  114^3 = 1,481,544

      So, in the worst case, leaves would be on level 4 (counting the root as level 1)

    Usually, the root and the level below it can be cached in RAM, so it would take two disk accesses to find an item in a DB of 10,000,000 records
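
The worst-case depth can also be computed directly. The sketch below rests on several assumptions that are spelled out in the comments: the root counts as level 1, the root has only two children, every other interior node has the minimum ⌈m/2⌉ children, and every leaf is only half full. The function name is made up for this illustration.

```python
import math


def worst_case_leaf_level(records, leaf_capacity, m):
    """Deepest level the leaves can be on, counting the root as level 1.

    Assumes the sparsest legal tree: half-full leaves, a root with two
    children, and ceil(m / 2) children in every other interior node."""
    half_full = math.ceil(leaf_capacity / 2)
    max_leaves = max(1, records // half_full)
    if max_leaves < 2:
        return 1                       # the root itself is the only leaf
    min_branch = math.ceil(m / 2)
    level, nodes = 2, 2                # level 2 holds at least two nodes
    # Go one level deeper while the sparsest tree still has no more
    # nodes on that level than there are leaves available.
    while nodes * min_branch <= max_leaves:
        nodes *= min_branch
        level += 1
    return level


depth = worst_case_leaf_level(records=10_000_000, leaf_capacity=32, m=228)
print(depth)          # 4
print(depth - 2)      # 2 disk accesses if the top two levels stay in RAM
```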