B-Trees

      There are a lot of variations on standard B-Trees.

      A B-Tree of order m is an m-ary search tree with these properties:

1) The root is either a leaf or has between two and m children        

2) The nonleaf nodes store up to m-1 keys

3) All nonleaf nodes (except the root) have between m/2 (rounded up) and m children (the pointers must be at least half full)

4) The leaves contain the key and a pointer to the data item

5) All leaves are at the same depth and hold between L/2 (rounded up) and L data items for some L (the leaves must be at least half full)

6) The data records are stored at the leaves
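
      The occupancy limits in the properties above can be tabulated with a short sketch. The function name and the dictionary keys are made up for illustration, and it assumes that "m/2" and "L/2" are rounded up.

```python
import math

def occupancy_bounds(m, L):
    """Return (min, max) counts implied by the properties for order m and leaf capacity L."""
    half_m = math.ceil(m / 2)  # assumption: "m/2" is rounded up
    half_L = math.ceil(L / 2)  # assumption: "L/2" is rounded up
    return {
        "root_children": (2, m),               # applies when the root is not a leaf
        "internal_children": (half_m, m),      # property 3
        "internal_keys": (half_m - 1, m - 1),  # one key fewer than children
        "leaf_items": (half_L, L),             # property 5
    }

bounds = occupancy_bounds(228, 32)
print(bounds["internal_children"])  # (114, 228)
print(bounds["leaf_items"])         # (16, 32)
```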

      The keys inside a node are kept in sorted order, so once the correct node is found a binary search may be used within it instead of a sequential search
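
      A minimal sketch of the binary search inside one interior node, using Python's standard bisect module. The function name is hypothetical, and the convention that child i holds the keys below keys[i] (with equal keys going to the right) is an assumption; the notes do not fix one.

```python
from bisect import bisect_right

def child_index(keys, target):
    """Pick which child pointer to follow for `target` in an interior node.

    Assumes `keys` is sorted. Child i covers keys[i-1] <= k < keys[i],
    so the first child covers everything below keys[0] and the last
    child covers everything from keys[-1] upward.
    """
    return bisect_right(keys, target)

node_keys = [10, 20, 30]            # a node with 3 keys has 4 children
print(child_index(node_keys, 5))    # 0
print(child_index(node_keys, 20))   # 2
print(child_index(node_keys, 35))   # 3
```

      With 227 keys in a node, the binary search needs about 8 comparisons where a sequential search may need up to 227.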

 

Example of B-Tree node calculation

      How to find the number of pointers & keys in an interior node

      m (the number of pointers) is determined by the size of a disk access block and the size of a key into the database.

    With every disk access, we want to go one level deeper into the tree

      Suppose our record size is  256 bytes, with the key itself 32 bytes. A pointer is 4 bytes.

      Suppose also the size of a disk access block is 8,192 bytes

      Each key-pointer pair takes 32 + 4 = 36 bytes, so 8192/36 = 227 pairs (rounded down) fit into one disk block, with room left over for the one extra pointer a node needs (227 x 36 + 4 = 8,176 bytes)

      So we choose m = 228, which gives up to 227 keys in each interior node.

      For leaves, 8192/256 = 32, so each leaf contains 32 records.
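
      The same arithmetic as a small sketch. The function names are made up for illustration, and it assumes an interior node holds exactly m pointers and m - 1 keys with no other overhead in the block.

```python
def interior_order(block_size, key_size, pointer_size):
    """Largest m such that m pointers and m - 1 keys fit in one block."""
    # pointer_size * m + key_size * (m - 1) <= block_size
    # => m <= (block_size + key_size) / (key_size + pointer_size)
    return (block_size + key_size) // (key_size + pointer_size)

def leaf_capacity(block_size, record_size):
    """Number of whole records that fit in one leaf block."""
    return block_size // record_size

m = interior_order(8192, 32, 4)
L = leaf_capacity(8192, 256)
print(m, m - 1, L)  # 228 227 32
```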

 

How many disk accesses

      If our database has 10,000,000 records, how deep can our tree be at most?

    First, how many leaves?

    If every leaf contained 32 records, there would be 312,500 leaves

    But each leaf may be only half full (16 records), so there may be as many as 625,000 leaves

    Each internal node (except the root) branches at least 114 ways (half of the number of pointers in each node)

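    114^2 = 12,996;  114^3 = 1,481,544

      Since 12,996 < 625,000 <= 1,481,544, three levels of branching are enough to reach every leaf, so in the worst case the leaves would be on level 4 (counting the root as level 1)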

    Usually, the root and its children (the top two levels) can be cached in RAM, so it would take two disk accesses to find an item in a DB of 10,000,000 records
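
      The worst-case estimate can be checked with a short sketch. The function name is hypothetical. It counts the root as level 1, and unlike the rough estimate above it also uses the fact that a non-leaf root has at least two children; both ways give the same level here.

```python
import math

def worst_case_leaf_level(records, m, L):
    """Deepest level (root = level 1) at which the leaves can sit."""
    min_fanout = math.ceil(m / 2)                # 114 for m = 228
    max_leaves = records // math.ceil(L / 2)     # every leaf only half full
    if max_leaves < 2:
        return 1                                 # the root itself is the only leaf
    level = 2        # leaves directly below the root
    min_leaves = 2   # a non-leaf root has at least two children
    # Push the leaves one level down while the sparsest possible tree
    # of that depth still needs no more leaves than we can have.
    while min_leaves * min_fanout <= max_leaves:
        min_leaves *= min_fanout
        level += 1
    return level

level = worst_case_leaf_level(10_000_000, 228, 32)
cached_levels = 2                  # assumption: top two levels held in RAM
print(level)                       # 4
print(level - cached_levels)       # 2 disk accesses
```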

 

Variations of B-Trees

      In a B-Tree every key appears exactly once, in either an internal node or a leaf (along with the tree pointers)

      In a B+Tree every key, together with its block pointer, is stored at a leaf node; the internal nodes hold only copies of keys that guide the search

   Every value appears in a leaf node

   So a value may appear both in a leaf node and in an internal node

   Block pointers are kept only at leaf nodes

      The leaf nodes are usually linked together to make sequential access faster
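
      A minimal sketch of why linked leaves help sequential access: a range scan walks from leaf to leaf without going back up through the internal nodes. The Leaf class and range_scan function are hypothetical, and the sketch assumes the search for the starting leaf has already been done.

```python
from dataclasses import dataclass
from typing import Iterator, List, Optional, Tuple

@dataclass
class Leaf:
    keys: List[int]                 # sorted keys in this leaf
    values: List[str]               # stand-ins for the block pointers
    next: Optional["Leaf"] = None   # link to the next leaf in key order

def range_scan(start: Leaf, lo: int, hi: int) -> Iterator[Tuple[int, str]]:
    """Yield (key, value) pairs with lo <= key <= hi, following leaf links."""
    leaf = start
    while leaf is not None:
        for k, v in zip(leaf.keys, leaf.values):
            if k > hi:
                return              # keys are sorted, so nothing further matches
            if k >= lo:
                yield k, v
        leaf = leaf.next

first = Leaf([1, 3], ["a", "b"])
first.next = Leaf([5, 7], ["c", "d"])
print(list(range_scan(first, 3, 5)))  # [(3, 'b'), (5, 'c')]
```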

 

Stability in B-Trees and B+ Trees

      It has been shown by analysis and simulation that, after numerous insertions and deletions, the nodes are about 69% full once the number of values in the tree stabilizes