B-Trees
• There are a lot of variations on standard B-Trees.
• A B-Tree of order m is an m-ary search tree with these properties:
1) The root is either a leaf or has between two and m children
2) The nonleaf nodes store up to m-1 keys
3) All nonleaf nodes (except the root) have between ⌈m/2⌉ and m children (the pointers must be at least half full)
4) The leaves contain the key and a pointer to the data item
5) All leaves are at the same depth and hold between ⌈L/2⌉ and L data items, for some L (the leaves must be at least half full)
6) The data records are stored at the leaves
• Within a node, a binary search may be used instead of a sequential search to find the correct key or child pointer
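The search procedure implied by this definition can be sketched in Python. This is a minimal sketch, not a full B-Tree: the class names `Leaf` and `Internal` and the function `search` are hypothetical, insertion and splitting are omitted, and it assumes the convention that separator `keys[i]` in an internal node is the smallest key in subtree `i + 1`. The standard `bisect` module supplies the binary search inside each node.

```python
from bisect import bisect_left, bisect_right
from dataclasses import dataclass
from typing import Any, List, Optional, Union


@dataclass
class Leaf:
    keys: List[int]       # sorted keys stored in this leaf
    records: List[Any]    # records[i] is the data record for keys[i]


@dataclass
class Internal:
    keys: List[int]          # up to m-1 separator keys
    children: List["Node"]   # len(children) == len(keys) + 1


Node = Union[Leaf, Internal]


def search(node: Node, key: int) -> Optional[Any]:
    """Follow one child pointer per level until a leaf is reached."""
    while isinstance(node, Internal):
        # Binary search over the separators picks the child pointer;
        # assumes keys[i] is the smallest key in children[i + 1].
        node = node.children[bisect_right(node.keys, key)]
    # Binary search inside the leaf instead of a sequential scan.
    i = bisect_left(node.keys, key)
    if i < len(node.keys) and node.keys[i] == key:
        return node.records[i]
    return None


# Tiny example tree: one internal root over three leaves.
root = Internal(keys=[10, 20], children=[
    Leaf([1, 5], ["a", "b"]),
    Leaf([10, 15], ["c", "d"]),
    Leaf([20, 25], ["e", "f"]),
])
print(search(root, 15))  # d
print(search(root, 7))   # None
```

Each loop iteration corresponds to reading one node, i.e. one disk block, which is why the fan-out m matters so much.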
Example of B-Tree node calculation
• How to find the number of pointers & keys in an interior node
• m (the number of pointers per node) is determined by the size of a disk access block and the sizes of a key and a pointer
– With every disk access, we want to go one level deeper into the tree, so one node should fill one disk block
• Suppose our record size is 256 bytes, with the key itself 32 bytes. A pointer is 4 bytes.
• Suppose also the size of a disk access block is 8,192 bytes
• Each key-pointer pair is 32 + 4 = 36 bytes, so we can fit ⌊8192/36⌋ = 227 key-pointer pairs into one disk block, plus the one extra pointer every node needs (227 × 36 + 4 = 8,176 ≤ 8,192 bytes)
• So we choose m = 228: 228 pointers and 227 keys in each node
• For leaves, 8192/256 = 32, so each leaf holds up to 32 records (L = 32)
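The sizing arithmetic can be checked with a short Python sketch. The helper names `interior_fanout` and `leaf_capacity` are hypothetical; the formula simply requires that m pointers and m - 1 keys fit in one block, i.e. (m - 1)·32 + 4m ≤ 8,192.

```python
def interior_fanout(block: int, key: int, ptr: int) -> int:
    """Largest m such that m-1 keys and m pointers fit in one block."""
    # (m - 1) * key + m * ptr <= block  <=>  m <= (block + key) / (key + ptr)
    return (block + key) // (key + ptr)


def leaf_capacity(block: int, record: int) -> int:
    """Largest L such that L whole records fit in one block."""
    return block // record


BLOCK, KEY, PTR, RECORD = 8192, 32, 4, 256   # bytes, from the example above
m = interior_fanout(BLOCK, KEY, PTR)
L = leaf_capacity(BLOCK, RECORD)
print(m, m - 1, L)  # 228 227 32
```

One more pointer and key would need 228 × 32 + 229 × 4 = 8,212 bytes, which no longer fits, so 228 is the largest usable m.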
How many disk accesses
• If our database has 10,000,000 records, how deep is our tree at most?
– First, how many leaves?
• If every leaf contained 32 records, there would be 10,000,000/32 = 312,500 leaves
• But each leaf may be only half full (16 records), so there could be as many as 625,000 leaves
– Each internal node (except the root) branches at least 114 ways (half of the 228 pointers in each node)
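• 114² = 12,996; 114³ = 1,481,544
• Number the root as level 1: level 2 has at least 2 nodes, and each deeper level has at least 114 times as many nodes as the one above it. Level 5 would therefore need at least 2 × 114³ = 2,963,088 leaves, far more than 625,000
• So, in the worst case, leaves would be on level 4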
– Usually, the root and the level below it (levels 1 and 2) can be cached in RAM, so only levels 3 and 4 need disk accesses: two disk accesses to find an item in a database of 10,000,000 records
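The worst-case depth argument can be reproduced in a few lines of Python. This sketch assumes the root is numbered level 1, that every leaf is only half full, that every interior node except the root uses only ⌈m/2⌉ pointers, and that two levels are cached; `worst_case_leaf_level` is a hypothetical helper that gives only the rough bound used in this example, not an exact analysis.

```python
import math


def worst_case_leaf_level(n_records: int, m: int, L: int) -> int:
    """Deepest level the leaves can be on, numbering the root as level 1."""
    if n_records <= L:
        return 1                                   # everything fits in a root leaf
    max_leaves = math.ceil(n_records / math.ceil(L / 2))   # every leaf half full
    min_branch = math.ceil(m / 2)                  # interior nodes other than the root
    level, min_nodes = 2, 2                        # the root has at least 2 children
    # Go one level deeper while even the minimum node count there still fits.
    while min_nodes * min_branch <= max_leaves:
        min_nodes *= min_branch
        level += 1
    return level


depth = worst_case_leaf_level(10_000_000, m=228, L=32)
cached = 2                                         # root and the level below it in RAM
print(depth, depth - cached)                       # 4 2
```

The loop stops at level 4 because the minimum node count for a fifth level (2 × 114³) exceeds the 625,000 possible leaves.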
Variations of B-Trees
• In a B-Tree every key value appears exactly once, at some level of the tree, together with its data pointer (alongside the tree pointers)
• In a B+Tree every key value, together with its block pointer, is stored at the leaf level
– Every value appears in some leaf node
– So a value may appear both in a leaf node and in an internal node, where it serves only to guide the search
– Block pointers are kept only at leaf nodes
• The leaf nodes are usually linked together to make sequential access faster
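A range query shows why the leaf links help: after one root-to-leaf search for the low end of the range, the remaining keys are read by following sibling links rather than re-descending the tree. The sketch assumes a hypothetical `LeafNode` class with a `next` field and a hypothetical `range_scan` function; the internal nodes and the initial descent are left out.

```python
from bisect import bisect_left
from dataclasses import dataclass
from typing import Any, Iterator, List, Optional, Tuple


@dataclass
class LeafNode:
    keys: List[int]                     # sorted keys in this leaf
    pointers: List[Any]                 # block pointers live only in the leaves
    next: Optional["LeafNode"] = None   # link to the right sibling leaf


def range_scan(first: LeafNode, lo: int, hi: int) -> Iterator[Tuple[int, Any]]:
    """Yield (key, pointer) for lo <= key <= hi by walking the leaf chain.

    `first` should be the leaf a root-to-leaf search for `lo` reaches."""
    leaf: Optional[LeafNode] = first
    while leaf is not None:
        # bisect_left skips keys below lo; after the first leaf it is normally 0.
        for i in range(bisect_left(leaf.keys, lo), len(leaf.keys)):
            if leaf.keys[i] > hi:
                return
            yield leaf.keys[i], leaf.pointers[i]
        leaf = leaf.next   # follow the sibling link instead of re-descending


c = LeafNode([30, 40], ["p30", "p40"])
b = LeafNode([20, 25], ["p20", "p25"], next=c)
a = LeafNode([5, 10], ["p5", "p10"], next=b)
print(list(range_scan(a, 8, 30)))
# [(10, 'p10'), (20, 'p20'), (25, 'p25'), (30, 'p30')]
```

Without the `next` links, each new leaf in the range would cost another search from the root.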
Stability in B-Trees and B+Trees
• Analysis and simulation have shown that, after numerous insertions and deletions, nodes are about 69% full once the number of values in the tree stabilizes
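The fill figure can be explored with a small simulation. This is a rough sketch under simplifying assumptions: it models only the leaf level, uses insertions of uniformly random keys (no deletions), and splits an overflowing leaf into two halves; `simulate_leaf_fill` is a hypothetical name. Under those assumptions the average fill typically comes out near ln 2 ≈ 0.69, the value usually quoted for random insertions.

```python
import random
from bisect import bisect_right, insort


def simulate_leaf_fill(n_inserts: int, capacity: int, seed: int = 1) -> float:
    """Average leaf fill after inserting n_inserts random keys (no deletions)."""
    rng = random.Random(seed)
    leaves = [[]]               # each leaf is a sorted list of keys
    lows = [float("-inf")]      # lows[i] = smallest key routed to leaves[i]
    for _ in range(n_inserts):
        k = rng.random()
        i = bisect_right(lows, k) - 1          # leaf whose key range contains k
        insort(leaves[i], k)
        if len(leaves[i]) > capacity:          # overflow: split into two halves
            leaf = leaves[i]
            mid = len(leaf) // 2
            leaves[i:i + 1] = [leaf[:mid], leaf[mid:]]
            lows.insert(i + 1, leaf[mid])      # new leaf starts at its first key
    # Fill = keys stored / key slots allocated across all leaves.
    return n_inserts / (len(leaves) * capacity)


print(f"average leaf fill: {simulate_leaf_fill(100_000, capacity=32):.2f}")
```

A freshly split leaf is about half full and a leaf about to split is completely full, so a long-run average between the two, rather than the 50% minimum, is what one should expect.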