B-Trees and B+ Trees

Disk-friendly balanced trees for indexing and storage.

1. Node structure and order

Understanding the fundamental organization of B-Trees:

Order (m): Maximum number of children a node can have
- Also called branching factor or degree
- Common values: 100-1000 for disk-based systems
Node properties:
- Each node contains up to m-1 keys
- Each node has up to m children
- Keys are stored in sorted order
- Internal nodes: keys + child pointers
- Leaf nodes: keys + data (or pointers to data)

B-Tree node structure:

class BTreeNode:
    def __init__(self, order, is_leaf=False):
        self.order = order              # Maximum children
        self.keys = []                  # Sorted keys
        self.children = []              # Child pointers
        self.is_leaf = is_leaf          # Leaf flag
        self.n = 0                      # Current number of keys

Node constraints:
- Root: 1 to m-1 keys, 2 to m children (if not leaf)
- Internal nodes: ⌈m/2⌉-1 to m-1 keys, ⌈m/2⌉ to m children
- Leaf nodes: ⌈m/2⌉-1 to m-1 keys
- All leaves at same depth (balanced)
Example: B-Tree of order 5 (m=5): ``` Node structure:
- Min keys (internal): ⌈5/2⌉-1 = 2
- Max keys: 4
- Min children (internal): ⌈5/2⌉ = 3
- Max children: 5
Example node: [10 | 20 | 30 | 40] / | | |
<10 10-20 20-30 30-40 >40 ```
Memory layout considerations:
- Node size typically matches disk block size (4KB, 8KB)
- Maximizes keys per node for given block size
- Example: 4KB block, 8-byte keys, 8-byte pointers
  - Available space: ~4000 bytes
  - Order m ≈ 250 (250 pointers + 249 keys)

2. Insertion, split, and deletion

Core operations for maintaining B-Tree properties:

Insertion:

Search for leaf: Traverse tree to find appropriate leaf node
Insert in leaf: Add key in sorted position
Check overflow: If node has m keys (too many), split
Split operation:
- Create new node
- Move upper half of keys to new node
- Promote middle key to parent
- If parent overflows, split recursively up to root

Insertion algorithm:

def insert(root, key):
    # If root is full, split it
    if root.n == root.order - 1:
        new_root = BTreeNode(root.order)
        new_root.children.append(root)
        split_child(new_root, 0)
        root = new_root
      
    insert_non_full(root, key)
    return root
  
def insert_non_full(node, key):
    if node.is_leaf:
        # Insert key in sorted position
        node.keys.insert(bisect_left(node.keys, key), key)
        node.n += 1
    else:
        # Find child to insert into
        i = bisect_left(node.keys, key)
        if node.children[i].n == node.order - 1:
            split_child(node, i)
            if key > node.keys[i]:
                i += 1
        insert_non_full(node.children[i], key)
  
def split_child(parent, index):
    full_child = parent.children[index]
    new_child = BTreeNode(full_child.order, full_child.is_leaf)
      
    mid = full_child.order // 2
    # Promote middle key to parent
    parent.keys.insert(index, full_child.keys[mid])
    parent.children.insert(index + 1, new_child)
      
    # Split keys and children
    new_child.keys = full_child.keys[mid+1:]
    full_child.keys = full_child.keys[:mid]
      
    if not full_child.is_leaf:
        new_child.children = full_child.children[mid+1:]
        full_child.children = full_child.children[:mid+1]

Deletion:

More complex than insertion, three cases:

Key in leaf: Simply remove
Key in internal node: Replace with predecessor/successor, then delete from leaf
Underflow handling: If node has fewer than ⌈m/2⌉-1 keys:
- Borrow from sibling: If sibling has extra keys
- Merge with sibling: If sibling is at minimum

Deletion cases:

Case 1: Delete from leaf (simple)
Before: [10 | 20 | 30]
Delete 20
After:  [10 | 30]
  
Case 2: Delete from internal (replace with predecessor)
Before:      [20]
            /    \
        [10]      [30]
Delete 20 → Replace with 10
After:       [10]
            /    \
        []        [30]
Then handle underflow in left child
  
Case 3: Merge siblings (underflow)
Before:      [30]
            /    \
        [10]      [40 | 50]
Delete 10 → Underflow in left
After:   [40 | 50]  (merged, 30 pulled down)

3. Height and complexity

Analyzing performance characteristics:

Height formula:
- For n keys and order m:
- Minimum height: log_m(n+1) (fully packed)
- Maximum height: log_⌈m/2⌉((n+1)/2) (minimum occupancy)
Height example:
- 1 million keys, order m=100
- Height ≤ log₅₀(1,000,000) ≈ 3.5 → 4 levels
- Only 4 disk accesses to find any key!
Time complexity: | Operation | Time Complexity | Disk I/Os | |———–|—————-|———–| | Search | O(log_m n) | O(log_m n) | | Insert | O(log_m n) | O(log_m n) | | Delete | O(log_m n) | O(log_m n) | | Range scan | O(log_m n + k) | O(log_m n + k/b) |

Where:
- n = number of keys
- m = order (branching factor)
- k = number of results
- b = keys per block
Space complexity: O(n)
- Minimum 50% space utilization (except root)
- Average 67-75% utilization in practice

Comparison with binary search trees:

Tree Type	Height for 1M keys	Disk I/Os
Binary tree (balanced)	log₂(1M) ≈ 20	20
B-Tree (m=100)	log₁₀₀(1M) ≈ 3	3
B-Tree (m=1000)	log₁₀₀₀(1M) ≈ 2	2

Why B-Trees are efficient:
- High branching factor → shallow tree
- Fewer disk accesses (dominant cost)
- Each node fits in one disk block
- Sequential access within nodes (cache-friendly)

4. Differences: B-Tree vs B+ Tree

Understanding the two main variants:

B-Tree:

Structure:
- Keys and data in all nodes (internal + leaf)
- Each key appears exactly once
- No linked list between leaves
Advantages:
- Better for exact-match queries (may find in internal node)
- Slightly less space (no duplicate keys)
Disadvantages:
- Range queries less efficient
- Variable-size records complicate node management

B+ Tree:

Structure:
- All data in leaf nodes only
- Internal nodes contain only keys (for routing)
- Keys may be duplicated (in internal + leaf)
- Leaves linked in sorted order (doubly linked list)
Advantages:
- Excellent for range queries (scan linked leaves)
- More keys per internal node (no data)
- Sequential access via leaf chain
- Consistent performance (always reach leaf)
Disadvantages:
- Duplicate keys use extra space
- Always traverse to leaf (even if key in internal node)

Visual comparison:

B-Tree (order 3):
         [20]
        /    \
    [10]      [30, 40]
    
All nodes contain data.

B+ Tree (order 3):
         [20, 30]
        /    |    \
    [10]  [20,25] [30,40]
     ↔      ↔       ↔
    
Only leaves contain data.
Leaves linked for range scans.

Detailed comparison table:

Feature	B-Tree	B+ Tree
Data location	All nodes	Leaf nodes only
Internal nodes	Keys + data	Keys only (routing)
Key duplication	No	Yes (internal + leaf)
Leaf linkage	No	Yes (linked list)
Keys per internal node	Fewer	More
Range queries	O(log n + k)	O(log n + k/b) better
Point queries	Faster (may stop early)	Always to leaf
Sequential access	Poor	Excellent
Use case	General purpose	Databases, filesystems

Why databases prefer B+ Trees:

Range queries: Common in SQL (SELECT * WHERE age BETWEEN 20 AND 30)
Sequential scans: Full table scans via leaf chain
Higher fanout: More keys per internal node → shorter tree
Predictable performance: Always same depth to leaf
Easier concurrency: Lock leaves independently

5. Range scans and storage locality

Optimizing for sequential access patterns:

Range query in B+ Tree:

Find start key: O(log_m n) - traverse to leaf
Scan leaves: Follow linked list until end key
Total cost: O(log_m n + k) where k = results

Example query: SELECT * FROM users WHERE age BETWEEN 25 AND 35

Step 1: Search for age=25 → reach leaf L1
Step 2: Scan L1 → L2 → L3 until age > 35
Step 3: Return all records found
  
Disk I/Os: 1 (root) + 1 (internal) + 1 (leaf) + k/b (scan)

Range query in B-Tree:

Must perform in-order traversal
Jump between internal and leaf nodes
Less efficient: O(log_m n + k log_m n)

Storage locality benefits:

Sequential disk access:
- Leaves stored contiguously on disk
- Prefetching: OS loads adjacent blocks
- Minimizes seek time (mechanical disks)
Cache-friendly:
- Scanning leaves keeps data in cache
- No random jumps between levels
Bulk loading:
- Build B+ Tree bottom-up
- Sort all keys, create leaves left-to-right
- Then build internal levels
- Result: 100% space utilization, optimal locality

Optimization techniques:

Leaf node chaining:

class BPlusTreeLeaf:
    def __init__(self):
        self.keys = []
        self.values = []
        self.next = None      # Next leaf
        self.prev = None      # Previous leaf

Sibling pointers in internal nodes:
- Faster splits and merges
- Better concurrency control

Bulk loading algorithm:

def bulk_load(sorted_keys):
    # Create leaf level
    leaves = []
    for i in range(0, len(sorted_keys), LEAF_SIZE):
        leaf = create_leaf(sorted_keys[i:i+LEAF_SIZE])
        leaves.append(leaf)
       
    # Link leaves
    for i in range(len(leaves)-1):
        leaves[i].next = leaves[i+1]
       
    # Build internal levels bottom-up
    return build_internal_levels(leaves)

Prefix compression:
- Store only distinguishing prefix in internal nodes
- Example: [“apple”, “application”] → [“app”, “appl”]
- Saves space, increases fanout

6. Database and filesystem applications

Real-world usage of B-Trees and B+ Trees:

Database indexes:

Primary index: B+ Tree on primary key
- Leaf nodes contain actual data rows (clustered index)
- Example: MySQL InnoDB primary key index
Secondary index: B+ Tree on non-primary key
- Leaf nodes contain pointers to primary key
- Example: Index on email column
Composite index: Multi-column B+ Tree
- Keys are tuples: (last_name, first_name)
- Supports queries on prefix: WHERE last_name = 'Smith'

Database systems using B+ Trees:

Database	Index Type	Details
MySQL InnoDB	B+ Tree	Clustered primary key, secondary indexes
PostgreSQL	B-Tree (actually B+ Tree)	Default index type
SQLite	B+ Tree	Both table and index storage
Oracle	B+ Tree	Index-organized tables
SQL Server	B+ Tree	Clustered and non-clustered indexes

Filesystem applications:

File allocation:
- B+ Tree maps file blocks to disk blocks
- Fast random access within files
Directory structure:
- B+ Tree for large directories
- Efficient filename lookups

Filesystem examples:

Filesystem	Usage
ext4	HTree (B-Tree variant) for directories
XFS	B+ Trees for free space, inodes
Btrfs	B-Trees for all metadata
NTFS	B+ Trees for file records (MFT)
HFS+	B-Trees for catalog file

Specific use cases:

MySQL InnoDB:

-- Clustered index (B+ Tree on primary key)
CREATE TABLE users (
    id INT PRIMARY KEY,      -- B+ Tree root
    name VARCHAR(100),
    email VARCHAR(100),
    INDEX idx_email (email)  -- Secondary B+ Tree
);
   
-- Range query (efficient)
SELECT * FROM users WHERE id BETWEEN 1000 AND 2000;
-- Uses B+ Tree leaf chain

PostgreSQL B-Tree index:

-- Create index
CREATE INDEX idx_users_age ON users(age);
   
-- Index scan
EXPLAIN SELECT * FROM users WHERE age > 25;
-- Index Scan using idx_users_age
-- Filter: (age > 25)

Filesystem directory lookup:

Directory: /usr/bin (10,000 files)
Lookup: find "python3"
   
Without B-Tree: O(n) linear scan
With B-Tree: O(log n) ≈ 4 disk accesses

Performance characteristics:

Insert performance:
- Random inserts: ~1000-10000 ops/sec
- Bulk inserts: ~100000+ ops/sec (bulk loading)
Query performance:
- Point query: 1-3 disk I/Os (typical)
- Range query: 1-3 + k/b I/Os (k results, b per block)
Space overhead:
- 50-75% space utilization
- Internal nodes: ~1-2% of total space
- Leaf nodes: ~98-99% of total space

Advanced features:

Write-ahead logging (WAL): Crash recovery
MVCC (Multi-Version Concurrency Control): Concurrent access
Compression: Prefix compression, page compression
Partitioning: Distribute B+ Tree across multiple disks
Caching: Buffer pool for hot nodes