Multi-Way Search Trees

Another way to balance search trees

What are Multi-way search trees?

• Multi-way search trees generalize binary search trees into m-ary search trees

• They allow more than one key to be stored in a node

• There will always be one more child pointer than key

– So if a node has two keys, it has three pointers, and possibly three children

• Increasing the number of children decreases the height of a tree, given the same number of nodes

• The keys in a node are maintained in order

• Look at a 2-3 tree on the overhead

– 2 keys, 3 child pointers max for each node

The ordering principle in multiway search trees

• The ordering principle of binary search trees still holds

– Anything to the left of a key is smaller than that key; anything to the right is larger

– If there is more than one key in the node, the pointer between two keys points to values that lie between those two keys

– If there is more than one key in a node, the keys are kept in ascending order

Definitions of multiway search trees

• There are several variations of the way these trees are implemented

• We will define them this way:

– Every node has at most m children (m-ary tree)

– It has at most m-1 keys

• k₁ < k₂ < … < kₘ₋₁

– All the external nodes have the same depth
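The three rules above can be checked mechanically. The following is a minimal sketch, not a reference implementation: the `Node` class and the helper names `is_ordered`, `leaf_depths`, and `is_multiway_search_tree` are hypothetical, and "external nodes" is taken to mean the leaves.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical node layout: sorted keys plus child pointers (empty for a leaf)."""
    keys: list
    children: list = field(default_factory=list)


def is_ordered(node, m, lo=None, hi=None):
    """Check key count, key order, and the search-tree ordering below `node`."""
    keys = node.keys
    if not 1 <= len(keys) <= m - 1:              # at most m-1 keys per node
        return False
    if any(a >= b for a, b in zip(keys, keys[1:])):   # k1 < k2 < ... within the node
        return False
    if lo is not None and keys[0] <= lo:         # every key must exceed the lower bound
        return False
    if hi is not None and keys[-1] >= hi:        # and stay under the upper bound
        return False
    if not node.children:
        return True
    if len(node.children) != len(keys) + 1:      # one more child pointer than keys
        return False
    bounds = [lo, *keys, hi]
    return all(is_ordered(child, m, bounds[i], bounds[i + 1])
               for i, child in enumerate(node.children))


def leaf_depths(node, depth=0):
    """Return the set of depths at which leaves occur."""
    if not node.children:
        return {depth}
    depths = set()
    for child in node.children:
        depths |= leaf_depths(child, depth + 1)
    return depths


def is_multiway_search_tree(root, m):
    return is_ordered(root, m) and len(leaf_depths(root)) == 1


# A small 2-3 tree (m = 3) used as an illustration.
tree = Node([40], [Node([10], [Node([5]), Node([20])]),
                   Node([80], [Node([60]), Node([100])])])
print(is_multiway_search_tree(tree, 3))  # True
```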

Searching a multiway search tree

– Searching is much like a binary tree search, except you may have more than one key in a node

– If the item you are searching for equals a key in the node, the search is done

– If it is less than the leftmost key, go left

– If it is greater than the leftmost key, consider the next key; if it is between the two keys, follow the pointer between them

– Continue across the node this way, following the pointer to the left of the first key the item is less than

– If it is greater than every key in the node, follow the rightmost pointer
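The search steps can be sketched as a left-to-right scan of each node. This is an illustrative sketch only: the `Node` class and the `search` function are hypothetical names, and a leaf is assumed to be a node with no children.

```python
class Node:
    """Hypothetical node: sorted keys and child pointers (empty list for a leaf)."""

    def __init__(self, keys, children=None):
        self.keys = keys
        self.children = children or []


def search(node, target):
    """Return True if `target` is stored in the tree rooted at `node`."""
    while True:
        # Scan the keys left to right until one is >= target.
        i = 0
        while i < len(node.keys) and target > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == target:
            return True                 # found in this node
        if not node.children:
            return False                # reached a leaf without finding it
        # Child i lies to the left of keys[i]; if i == len(keys),
        # it is the rightmost pointer.
        node = node.children[i]


tree = Node([40], [Node([10], [Node([5]), Node([20])]),
                   Node([80], [Node([60]), Node([100])])])
print(search(tree, 60))   # True
print(search(tree, 70))   # False
```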

Inserting items into a multiway search tree

• First, the position to insert the new item must be found

• Items are always inserted into leaves

• Insert values into a node until there are more than m-1 keys, keeping the values sorted

• When the number of values exceeds m-1, split the node

– The middle value goes to a node above

– This may cause the node above to split, or it may make creation of a new level necessary

• Build this 2-3 tree
20 40 60 10 80 100 5
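One way to sketch insert-and-split in code, applied to the 2-3 exercise above. The names `Node`, `insert`, and `levels` are hypothetical, keys are assumed to be distinct, and the key that moves up is taken to be the middle of the overfull node. For a 2-3 tree that middle is unambiguous; for an even number of keys the choice of middle is an assumption of this sketch.

```python
from bisect import bisect_left, insort


class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []      # empty for a leaf


def _insert(node, key, m):
    """Insert below `node`; return (middle_key, new_right_node) if it split."""
    if not node.children:
        insort(node.keys, key)              # items always go into a leaf
    else:
        i = bisect_left(node.keys, key)     # which child pointer to follow
        split = _insert(node.children[i], key, m)
        if split is not None:               # child split: absorb its middle key
            mid_key, right = split
            node.keys.insert(i, mid_key)
            node.children.insert(i + 1, right)
    if len(node.keys) <= m - 1:
        return None
    mid = len(node.keys) // 2               # assumption: middle of the overfull node
    mid_key = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return mid_key, right


def insert(root, key, m=3):
    """Insert `key` and return the (possibly new) root."""
    split = _insert(root, key, m)
    if split is None:
        return root
    mid_key, right = split                  # root split: the tree grows one level
    return Node([mid_key], [root, right])


def levels(root):
    """List the keys of every node, level by level."""
    out, frontier = [], [root]
    while frontier:
        out.append([n.keys for n in frontier])
        frontier = [c for n in frontier for c in n.children]
    return out


root = Node()
for k in [20, 40, 60, 10, 80, 100, 5]:
    root = insert(root, k, m=3)
print(levels(root))
# [[[40]], [[10], [80]], [[5], [20], [60], [100]]]
```

Under these assumptions the last insertion (5) splits a leaf, which overfills the root and forces a new level.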

Another concrete example

• Assume we are building a 4-ary search tree

– each node can have a maximum of four children

– This means the maximum number of keys in a node is three.

• Our nodes will have a maximum of four pointers and three keys, with the keys ordered k1 < k2 < k3

• Insert these keys into the tree:
4, 6, 14, 15, 3, 5, 9, 8, 11, 21, 34, 30, 10

• If a node must split, the middle item before the insertion is the one to go up

Build this 4-ary tree

• Inserting values into a 4-ary tree
Insert these values
53, 27, 75, 25, 70, 41, 38, 16, 59, 36, 73, 65, 60, 46, 55

Deletions in a multi-way search tree

• The resultant tree must observe the rules

– All the leaves are on the same level

– Each node must have between 1 and m-1 keys

– It must remain a search tree

• If a deletion removes all the keys from a node, sibling nodes must be merged

– This means a key must be moved from a parent

• There are several different ways of doing deletions.

– In some implementations, adoption from a sibling is allowed

• A deletion may even force a height reduction

– This is avoided if possible, since an insertion may again force a height increase

Analysis of multi-way search trees

• The advantage is their easy-to-maintain balance, not their shorter height

• The reduction in height is offset by the increased number of comparisons that the search may require at each node
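A rough calculation illustrates the trade-off. This sketch assumes a completely full m-ary tree and a binary search inside each node; the function name `search_cost` is hypothetical.

```python
def search_cost(n, m):
    """Return (levels, comparisons per node, total comparisons) for n keys.

    Assumes a full m-ary tree: with h levels it holds m**h - 1 keys.
    """
    height, capacity = 0, 1
    while capacity - 1 < n:
        capacity *= m
        height += 1
    # Binary search over m-1 sorted keys needs at most this many probes.
    per_node = (m - 1).bit_length()
    return height, per_node, height * per_node


for m in (2, 4, 16, 256):
    print(m, search_cost(1_000_000, m))
# 2 (20, 1, 20)
# 4 (10, 2, 20)
# 16 (5, 4, 20)
# 256 (3, 8, 24)
```

The height drops from 20 levels to 3, but the total number of key comparisons stays at about 20.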

B-Trees and databases

• A database is used because a tremendous amount of data must be stored and manipulated.

– Much more than can be kept in RAM at one time

• To effectively manipulate a database, we must have a system that minimizes disk accesses.

• A B-Tree is a multi-way search tree where a node represents a disk block.

– If done this way, every disk access takes us one level deeper into the tree

– The node itself may contain hundreds of keys, but the search time is negligible compared to disk access time

– The keys in the node are kept sorted, so a binary search can be used
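Within a single node, the binary search can be sketched with Python's standard `bisect` module. The function name `locate` is hypothetical; it reports whether the key is in the node and, if not, which child pointer to follow.

```python
from bisect import bisect_left


def locate(keys, target):
    """Binary-search one node's sorted keys.

    Returns (found, i): if found, keys[i] == target; otherwise i is the
    index of the child pointer to follow (0 = leftmost, len(keys) = rightmost).
    """
    i = bisect_left(keys, target)
    found = i < len(keys) and keys[i] == target
    return found, i


keys = [10, 20, 30, 40]
print(locate(keys, 30))   # (True, 2)
print(locate(keys, 25))   # (False, 2)
print(locate(keys, 99))   # (False, 4)
```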

Time complexity and disk access

• When we talk about time complexity, we are talking about how fast an algorithm executes in RAM.

• If all the data does not fit in RAM at one time, the disk must be accessed.

• A fast disk allows about 120 disk accesses per second.

• If a machine executes 25 million instructions per second, one disk access is worth about 200,000 instructions.

• So we are willing to do lots of calculation just to save a disk access

– In most cases the number of disk accesses will dominate the running time
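The "200,000 instructions" figure is simply the ratio of the two rates quoted above, as this short calculation shows (variable names are illustrative):

```python
instructions_per_second = 25_000_000
disk_accesses_per_second = 120

# Instructions the CPU could execute in the time of one disk access.
instructions_per_access = instructions_per_second // disk_accesses_per_second
print(instructions_per_access)   # 208333, i.e. roughly 200,000
```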

B-Trees

• There are a lot of variations on standard B-Trees.

• A B-Tree of order m is an m-ary search tree with these properties:

1) The data items are stored at the leaves

2) The nonleaf nodes store up to m-1 keys

3) The root is either a leaf or has between two and m children

4) All nonleaf nodes (except the root) have between ⌈m/2⌉ and m children (the pointers must be at least half full)

5) All leaves are at the same depth and hold between ⌈L/2⌉ and L data items for some L (the leaves must be at least half full)

• In practice, a node may contain a thousand or more keys

• Once the correct node is found, a binary search may be used within it instead of a sequential search

Example of B-Tree node calculation

– Find the number of pointers & keys in an interior node

• m (the number of pointers) is determined by the size of a disk access block and the size of a key into the database.

• Suppose our record size is 256 bytes, with the key itself 32 bytes. A pointer is 4 bytes.

• Suppose also the size of a disk access block is 8,192 bytes

• Each key-pointer pair is 36 bytes, so we can fit ⌊8192/36⌋ = 227 key-pointer pairs into one disk block, with room left for the one extra pointer we need

• So we choose m = 228, giving 227 keys in each node.

• For leaves, 8192/256 = 32, so each leaf contains 32 records.
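The same sizing arithmetic as a small sketch. The function names `interior_order` and `leaf_capacity` are hypothetical; the sizes are the ones given in this example.

```python
BLOCK_SIZE = 8192     # bytes per disk block
RECORD_SIZE = 256     # bytes per data record
KEY_SIZE = 32         # bytes per key
POINTER_SIZE = 4      # bytes per child pointer


def interior_order(block, key, pointer):
    """Largest m such that m pointers and m-1 keys fit in one block.

    Solves key*(m-1) + pointer*m <= block for m.
    """
    return (block + key) // (key + pointer)


def leaf_capacity(block, record):
    """Number of whole records that fit in one block."""
    return block // record


m = interior_order(BLOCK_SIZE, KEY_SIZE, POINTER_SIZE)
L = leaf_capacity(BLOCK_SIZE, RECORD_SIZE)
print(m, m - 1, L)    # 228 227 32
```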

How many disk accesses

• If our database has 10,000,000 records, how deep can our tree be?

– First, how many leaves?

• If every leaf contained 32 records, there would be 312,500 leaves

• But each leaf may be only half full, so there may be as many as 625,000 leaves

– Each internal node (except the root) branches at least 114 ways

• (114)² = 12,996; (114)³ = 1,481,544

• So, in the worst case, the leaves would be on level 4 (counting the root as level 1)

– Usually, the root and the level below it can be cached in RAM, so it would take two disk accesses to find an item in a DB of 10,000,000 records
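The worst-case depth can be reproduced with a short calculation. This sketch assumes the root is counted as level 1, that the root is not itself a leaf (so it has at least two children), and that every other node is only half full. The function name `worst_case_leaf_level` is hypothetical.

```python
def worst_case_leaf_level(records, m, leaf_capacity):
    """Return (maximum number of leaves, deepest possible leaf level)."""
    min_per_leaf = -(-leaf_capacity // 2)       # ceil(L/2) records per leaf
    max_leaves = -(-records // min_per_leaf)    # half-full leaves give the most leaves
    min_branch = -(-m // 2)                     # ceil(m/2) children per interior node

    level = 2      # root is level 1, its children are level 2
    nodes = 2      # the root has at least two children
    # Go one level deeper while even a minimally filled tree of that
    # depth would not need more leaves than we can have.
    while nodes * min_branch <= max_leaves:
        nodes *= min_branch
        level += 1
    return max_leaves, level


print(worst_case_leaf_level(10_000_000, 228, 32))   # (625000, 4)
```

With the top two levels cached in RAM, only levels 3 and 4 require a disk access.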