B-Trees

• There are a lot of variations on standard B-Trees.

• A B-Tree of order m is an m-ary search tree with these properties:

1) The root is either a leaf or has between two and m children

2) The nonleaf nodes store up to m-1 keys

3) All nonleaf nodes (except the root) have between ⌈m/2⌉ and m children (each node must be at least half full)

4) The leaves contain the key and a pointer to the data item

5) All leaves are at the same depth and hold between ⌈L/2⌉ and L data items for some L (the leaves must be at least half full)

6) The data records are stored at the leaves

• Once the correct node is found, a binary search over its sorted keys may be used instead of a sequential search
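The choice between a sequential and a binary search inside one node can be sketched as follows. This is a minimal illustration, not a full B-Tree: the function names are hypothetical, and it assumes the keys in a node are sorted and that a search key equal to a stored key belongs in the subtree to that key's right.

```python
from bisect import bisect_right

def child_index_sequential(keys, target):
    """Scan the node's keys left to right; return the child slot to follow."""
    i = 0
    while i < len(keys) and target >= keys[i]:
        i += 1
    return i

def child_index_binary(keys, target):
    """Same result as the scan above, found by binary search over sorted keys."""
    return bisect_right(keys, target)

# Hypothetical node with three keys and therefore four child slots (0..3).
keys = [10, 20, 30]
for t in (5, 10, 25, 99):
    assert child_index_sequential(keys, t) == child_index_binary(keys, t)
```

For a node holding a couple of hundred keys, the binary search needs about 8 comparisons where the scan may need one per key.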

Example of B-Tree node calculation

• How to find the number of pointers & keys in an interior node

• m (the number of pointers) is determined by the size of a disk access block and the size of a key into the database.

– With every disk access, we want to go one level deeper into the tree

• Suppose our record size is 256 bytes, with the key itself 32 bytes. A pointer is 4 bytes.

• Suppose also the size of a disk access block is 8,192 bytes

• Each key-pointer pair takes 32 + 4 = 36 bytes, so 8192/36 ≈ 227 key-pointer pairs fit into one disk block, with room left for the one extra pointer (227 × 36 + 4 = 8,176 bytes)

• So we choose m = 228: each node holds 228 pointers and 227 keys.

• For leaves, 8192/256 = 32, so each leaf contains 32 records.
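The arithmetic above can be checked with a short sketch. The function names are hypothetical; the only assumption is the one the slide makes, namely that an interior node with m pointers and m-1 keys must fit in one disk block, and that a leaf holds whole records only.

```python
def interior_order(block_size, key_size, pointer_size):
    """Largest m such that m pointers and m-1 keys fit in one block.

    Solves m*pointer_size + (m-1)*key_size <= block_size for m.
    """
    return (block_size + key_size) // (key_size + pointer_size)

def leaf_capacity(block_size, record_size):
    """Number of whole records that fit in one leaf block."""
    return block_size // record_size

m = interior_order(block_size=8192, key_size=32, pointer_size=4)
L = leaf_capacity(block_size=8192, record_size=256)
print(m, m - 1, L)  # 228 227 32
```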

How many disk accesses

• If our database has 10,000,000 records, how deep can our tree be in the worst case?

– First, how many leaves?

• If every leaf contained 32 records, there would be 312,500 leaves

• But each leaf may be only half full (16 records), so there could be as many as 625,000 leaves

– Each internal node (except the root) branches at least 114 ways (half of the number of pointers in each node)

• (114)² = 12,996; (114)³ = 1,481,544

• So, in the worst case, leaves would be on level 4 (counting the root as level 1)

– Usually, the root and the level below it can be cached in RAM, so it would take two disk accesses to find an item in a database of 10,000,000 records
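The worst-case depth can also be computed directly rather than read off the powers of 114. The sketch below is one way to do it under stated assumptions: the function name is hypothetical, the root is counted as level 1 and has at least two children, every other interior node has at least ⌈m/2⌉ children, and every leaf holds at least ⌈L/2⌉ records.

```python
from math import ceil

def worst_case_levels(num_records, m, L):
    """Maximum number of levels (root = level 1, leaves on the last level)."""
    min_branch = ceil(m / 2)       # 114 for m = 228
    min_leaf_fill = ceil(L / 2)    # 16 for L = 32
    max_leaves = num_records // min_leaf_fill
    if max_leaves < 2:
        return 1                   # the root itself is the only leaf
    levels = 2                     # root plus one level below it
    nodes_on_bottom = 2            # fewest nodes the bottom level can have
    # Add a level while the minimum fan-out still fits under max_leaves.
    while nodes_on_bottom * min_branch <= max_leaves:
        nodes_on_bottom *= min_branch
        levels += 1
    return levels

levels = worst_case_levels(10_000_000, m=228, L=32)
cached_levels = 2                  # assumption: top two levels held in RAM
print(levels, levels - cached_levels)  # 4 2
```

With the top two levels cached, the remaining two levels cost one disk access each.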

Variations of B-Trees

• In a B-Tree every key appears once (along with the tree pointers)

• In a B+Tree every key, together with its block pointer, is stored at a leaf node; internal nodes hold copies of keys only to guide the search

– Every value appears in a leaf node

– So a value may appear both in a leaf node and in an internal node

– Block pointers are kept only at leaf nodes

• The leaf nodes are usually linked together to make sequential access faster
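Why linked leaves help sequential access can be seen in a small sketch. The `Leaf` class and `range_scan` function are hypothetical names for illustration; the sketch assumes each leaf keeps its keys sorted and holds a reference to the next leaf to its right, and it leaves out the descent from the root that would locate the starting leaf.

```python
from bisect import bisect_left

class Leaf:
    """A B+Tree leaf: sorted keys plus a link to the next leaf."""
    def __init__(self, keys, next_leaf=None):
        self.keys = keys
        self.next = next_leaf

def range_scan(start_leaf, low, high):
    """Collect all keys in [low, high] by walking the leaf chain."""
    results = []
    leaf = start_leaf
    while leaf is not None:
        # Skip keys below `low` inside this leaf.
        i = bisect_left(leaf.keys, low)
        for key in leaf.keys[i:]:
            if key > high:
                return results     # past the range: stop without going back up
            results.append(key)
        leaf = leaf.next           # follow the sibling link, not the tree
    return results

# Three linked leaves built right to left.
c = Leaf([9, 11])
b = Leaf([5, 7], c)
a = Leaf([1, 3], b)
print(range_scan(a, 3, 9))  # [3, 5, 7, 9]
```

Once the first leaf is found, the scan never revisits an internal node.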

Stability in B-Trees and B+ Trees

• Analysis and simulation have shown that, after numerous insertions and deletions, the nodes are about 69% full once the number of values in the tree stabilizes