Multi-Way Search Trees
Another way to balance search trees
What are Multi-way search trees?
• Multi-way search trees generalize binary search
trees into m-ary search trees
• They allow more than one key to be stored in a node
• There will always be one more child pointer than key
– So if a node has two keys, it would have three pointers, with possibly three children
• Increasing the number of children decreases the
height of a tree, given the same number of nodes
• The keys in a node are maintained in order
• Look at a 2-3 tree on the overhead
– 2 keys, 3 child pointers max for each node
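To make the height claim concrete, here is a small sketch that computes the smallest possible height of an m-ary tree holding a given number of nodes. The function name `min_height` and the convention that the root sits at height 0 are assumptions made for this illustration.

```python
def min_height(n_nodes: int, m: int) -> int:
    """Smallest height (root at height 0) of an m-ary tree with n_nodes nodes."""
    height = 0
    capacity = 1       # nodes that fit in a completely full tree of this height
    level_width = 1    # nodes on the deepest level of that full tree
    while capacity < n_nodes:
        level_width *= m
        capacity += level_width
        height += 1
    return height


# Same number of nodes, more children per node, shorter tree.
for m in (2, 3, 4, 16):
    print(m, min_height(1_000_000, m))
```

With one million nodes, the binary case needs a height of at least 19, while the 4-ary case needs only 10.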
The ordering principle in multiway search trees
• The ordering principle of binary search trees still holds
– Anything to the left is smaller than its parent key,
to the right is larger than its parent key
– If more than one key in the node, the pointer between
two keys will point to values between the two keys
– If more than one key in a node, the keys will be in
sequential order
Definitions of multiway search trees
• There are several variations of the way these trees
are implemented
• We will define them this way:
– Every node has at most m children (m-ary tree)
– It has at most m-1 keys
• k_1 < k_2 < … < k_{m-1}
– All the external nodes have the same depth
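The per-node part of this definition can be checked mechanically. The sketch below is only an illustration: the helper name `is_valid_node` is hypothetical, and it assumes a leaf is represented with zero children while an interior node has exactly one more child than it has keys.

```python
def is_valid_node(keys, num_children, m):
    """Check one node against the m-ary search tree rules (illustrative only)."""
    # Keys must be in strictly increasing order: k_1 < k_2 < ... < k_{m-1}.
    strictly_sorted = all(a < b for a, b in zip(keys, keys[1:]))
    # At most m-1 keys (and at least one, assuming empty nodes are not allowed).
    within_limit = 1 <= len(keys) <= m - 1
    # Assumption: leaves have no children; interior nodes have len(keys) + 1.
    pointers_ok = num_children in (0, len(keys) + 1)
    return strictly_sorted and within_limit and pointers_ok


print(is_valid_node([10, 20], 3, m=3))      # a full 2-3 tree node
print(is_valid_node([10, 20, 30], 4, m=3))  # too many keys for m = 3
```

The rule that all external nodes share the same depth is a property of the whole tree, so it is not covered by this per-node check.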
Searching a multiway search tree
– Searching is much like a binary tree search, except
you may have more than one key in a node
– If the item you are searching for is less than the
leftmost key, go left
– If it is greater than the leftmost key, consider the next key; if it is between the two keys, follow the pointer between them
– Otherwise keep moving right, following the pointer to the left of the first key the item is less than
– If it is greater than every key in the node, follow the rightmost pointer
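The search steps above can be sketched as follows. This is a minimal illustration, not a reference implementation: the `Node` class, the `search` function, and the choice of an empty `children` list for a leaf are all assumptions made for the example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    keys: List[int]                                       # kept in sorted order
    children: List["Node"] = field(default_factory=list)  # empty for a leaf


def search(node: Optional[Node], target: int) -> bool:
    while node is not None:
        i = 0
        # Move right past every key that is smaller than the target.
        while i < len(node.keys) and target > node.keys[i]:
            i += 1
        if i < len(node.keys) and node.keys[i] == target:
            return True
        # children[i] is the pointer to the left of keys[i],
        # or the rightmost pointer when the target exceeds every key.
        node = node.children[i] if node.children else None
    return False


# A small 2-3 tree: root holds 20 and 60, with three leaves below it.
root = Node([20, 60], [Node([5, 10]), Node([40]), Node([80, 100])])
print(search(root, 40))  # True
print(search(root, 50))  # False
```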
Inserting items into a multiway search tree
• First, the position to insert the new item must be found
• Items are always inserted into leaves
• Insert values into a node, keeping values sorted, until there are more than m-1 keys
• When the number of values > m-1, split the node
– The middle value goes to a node above
– This may cause the node above to split, or it may make
creation of a new level necessary
• Build this 2-3 tree
20
40 60 10
80 100 5
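The insertion rules can be sketched in code as below. Several things here are assumptions for the sake of illustration: the `Node` class and the `insert` helper are hypothetical names, keys are assumed distinct, and the numbers above are read as an insertion order (20, 40, 60, 10, 80, 100, 5). When a node overflows, the sketch promotes the key at index `len(keys) // 2`, which is the exact middle for a 2-3 tree; other conventions for picking the middle are possible when the overflowing node holds an even number of keys.

```python
class Node:
    def __init__(self, keys=None, children=None):
        self.keys = keys or []
        self.children = children or []   # an empty list means this is a leaf


def _insert(node, key, m):
    """Insert key below node. Return (middle_key, new_right_node) if node split."""
    i = 0
    while i < len(node.keys) and key > node.keys[i]:
        i += 1
    if node.children:
        promoted = _insert(node.children[i], key, m)
        if promoted is None:
            return None
        mid_key, right = promoted
        node.keys.insert(i, mid_key)          # middle value arrives from below
        node.children.insert(i + 1, right)
    else:
        node.keys.insert(i, key)              # items always go into a leaf
    if len(node.keys) <= m - 1:
        return None
    # More than m-1 keys: split, sending the middle value to the node above.
    mid = len(node.keys) // 2
    mid_key = node.keys[mid]
    right = Node(node.keys[mid + 1:], node.children[mid + 1:])
    node.keys = node.keys[:mid]
    node.children = node.children[:mid + 1]
    return mid_key, right


def insert(root, key, m):
    """Insert key and return the (possibly new) root."""
    promoted = _insert(root, key, m)
    if promoted is None:
        return root
    mid_key, right = promoted
    return Node([mid_key], [root, right])     # the split reached the root: new level


root = Node()
for k in [20, 40, 60, 10, 80, 100, 5]:
    root = insert(root, k, m=3)
print(root.keys, [c.keys for c in root.children])
```

Under these assumptions the final tree has 40 at the root, 10 and 80 on the next level, and the leaves 5, 20, 60, 100, all at the same depth.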
Another concrete example
• Assume we are building a 4-ary search tree
– Each node can have a maximum of four children
– This means the maximum number of keys in a node is three.
• Our nodes will have a maximum of four pointers and
three keys with these properties k1 < k2 < k3
• Insert these keys into the tree:
4, 6, 14, 15, 3, 5, 9, 8, 11, 21, 34, 30, 10
• If a node must split, the middle item before the
insertion is the one to go up
Build this 4-ary tree
• Inserting values into a 4-ary tree
Insert these values
53, 27, 75, 25, 70, 41, 38, 16, 59, 36,
73, 65, 60, 46, 55
Deletions in a multi-way search tree
• The resultant tree must observe the rules
– All the leaves are on the same level
– Each node must have between 1 and m-1 keys
– It must remain a search tree
• If a deletion removes all the keys from a node, sibling nodes must be merged
– This means a key must be moved from a parent
• There are several different ways of doing deletions.
– In some implementations, adoption from a sibling is allowed
• A deletion may even force a height reduction
– This is avoided if possible, since an insertion may again force a height increase
Analysis of multi-way search trees
• The advantage is their easy-to-maintain balance, not their shorter height
• The reduction in height is offset by the increased
number of comparisons that the search may require at each node
B-Trees and databases
• A database is used because a tremendous amount of data must be stored and manipulated.
– Much more than can be kept in RAM at one time
• To effectively manipulate a database, we must have a system that minimizes disk accesses.
• A B-Tree is a multi-way search tree where a node represents a disk block.
– If done this way, every disk access takes us one level deeper into the tree
– The node itself may contain hundreds of keys, but the search time is negligible compared to disk access time
– The keys in the node are kept sorted, so a binary search can be used
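Because the keys inside a node are sorted, the in-node lookup can use a binary search. Here is a minimal sketch using Python's standard `bisect` module; the helper name `find_in_node` and its return convention are assumptions for this example.

```python
from bisect import bisect_left


def find_in_node(keys, target):
    """Binary-search one node's sorted keys.

    Returns (found, index). If found, index is the key's position;
    otherwise it is the index of the child pointer to follow next.
    """
    i = bisect_left(keys, target)
    found = i < len(keys) and keys[i] == target
    return found, i


keys = [10, 20, 30, 40]
print(find_in_node(keys, 30))  # (True, 2)
print(find_in_node(keys, 25))  # (False, 2): follow the pointer between 20 and 30
```

For a node with hundreds of keys this takes on the order of log2(number of keys) comparisons rather than a scan of every key.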
Time complexity and disk access
• When we talk about time complexity, we are talking
about how fast an algorithm executes in RAM.
• If all the data does not fit in RAM at one time, the
disk must be accessed.
• A fast disk allows about 120 disk accesses per second.
• If a machine executes 25 million instructions per second, one disk access is worth about 200,000 instructions.
• So we are willing to do lots of calculation just to save a disk access
– In most cases the number of disk accesses will dominate the running time
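The "about 200,000 instructions" figure follows directly from the two rates quoted above. A quick check of the arithmetic, using only the slide's own numbers (the variable names are chosen for this example):

```python
instructions_per_second = 25_000_000   # machine speed quoted on the slide
disk_accesses_per_second = 120         # disk speed quoted on the slide

# Instructions the CPU could execute in the time one disk access takes.
instructions_per_access = instructions_per_second / disk_accesses_per_second
print(round(instructions_per_access))  # a little over 208,000
```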
B-Trees
• There are a lot of variations on standard B-Trees.
• A B-Tree of order m is an m-ary search tree with these properties:
1) The data items are stored at the leaves
2) The nonleaf nodes store up to m-1 keys
3) The root is either a leaf or has between two and m children
4) All nonleaf nodes (except the root) have between ⌈m/2⌉ and m children (the pointers must be at least half full)
5) All leaves are at the same depth and have between ⌈L/2⌉ and L data items, for some L (the leaves must be at least half full)
• In practice, a node may contain a thousand or more
keys
• A binary search may be used instead of a sequential
search when the correct node is found
Example of B-Tree node calculation
– Find the number of pointers & keys in an interior node
• m (the number of pointers) is determined by the size of a disk access block and the size of a key into the database.
• Suppose our record size is 256 bytes, with the key itself 32 bytes. A pointer is 4 bytes.
• Suppose also the size of a disk access block is 8,192 bytes
• Assuming that each key-pointer pair is 36 bytes, we can fit 8192/36 ≈ 227 key-pointer pairs into one disk block (plus we need one extra pointer)
• So we choose m = 228; 227 keys in each node.
• For leaves, 8192/256 = 32, so each leaf contains 32 records.
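The same calculation can be written out as a short script. It uses only the sizes given above; the constant names are chosen for this example. An interior node needs m pointers and m-1 keys to fit in one block, so m is the largest value with m × 4 + (m-1) × 32 ≤ 8192.

```python
BLOCK_SIZE = 8192     # bytes per disk access block
RECORD_SIZE = 256     # bytes per record
KEY_SIZE = 32         # bytes per key
POINTER_SIZE = 4      # bytes per pointer

# m pointers + (m - 1) keys must fit:  m*P + (m - 1)*K <= B
# Rearranged:                          m <= (B + K) / (K + P)
m = (BLOCK_SIZE + KEY_SIZE) // (KEY_SIZE + POINTER_SIZE)
keys_per_node = m - 1

# Leaves hold whole records.
records_per_leaf = BLOCK_SIZE // RECORD_SIZE

print(m, keys_per_node, records_per_leaf)  # 228 227 32
```

With m = 228, the node uses 228 × 4 + 227 × 32 = 8,176 bytes, which leaves 16 bytes of the block unused.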
How many disk accesses
• If our database has 10,000,000 records, how deep is our tree at max?
– First, how many leaves?
• If every leaf contained 32 records, there would be 312,500 leaves
• But each leaf may be only half full, so there could be as many as 625,000 leaves
– Each internal node (except the root) branches at least 114 ways
• 114^2 = 12,996; 114^3 = 1,481,544
• So, in the worst case, leaves would be on level 4
– Usually, the root and the next level can be cached in RAM, so it would take two disk accesses to find an item in a DB of 10,000,000