B-Trees and B+ Trees
Disk-friendly balanced trees for indexing and storage.
1. Node structure and order
Understanding the fundamental organization of B-Trees:
- Order (m): Maximum number of children a node can have
- Also called branching factor or degree
- Common values: 100-1000 for disk-based systems
- Node properties:
- Each node contains up to
m-1keys - Each node has up to
mchildren - Keys are stored in sorted order
- Internal nodes: keys + child pointers
- Leaf nodes: keys + data (or pointers to data)
- Each node contains up to
- B-Tree node structure:
class BTreeNode: def __init__(self, order, is_leaf=False): self.order = order # Maximum children self.keys = [] # Sorted keys self.children = [] # Child pointers self.is_leaf = is_leaf # Leaf flag self.n = 0 # Current number of keys - Node constraints:
- Root: 1 to m-1 keys, 2 to m children (if not leaf)
- Internal nodes: ⌈m/2⌉-1 to m-1 keys, ⌈m/2⌉ to m children
- Leaf nodes: ⌈m/2⌉-1 to m-1 keys
- All leaves at same depth (balanced)
- Example: B-Tree of order 5 (m=5):
```
Node structure:
- Min keys (internal): ⌈5/2⌉-1 = 2
- Max keys: 4
- Min children (internal): ⌈5/2⌉ = 3
- Max children: 5
Example node: [10 | 20 | 30 | 40] / | | |
<10 10-20 20-30 30-40 >40 ``` - Memory layout considerations:
- Node size typically matches disk block size (4KB, 8KB)
- Maximizes keys per node for given block size
- Example: 4KB block, 8-byte keys, 8-byte pointers
- Available space: ~4000 bytes
- Order m ≈ 250 (250 pointers + 249 keys)
2. Insertion, split, and deletion
Core operations for maintaining B-Tree properties:
Insertion:
- Search for leaf: Traverse tree to find appropriate leaf node
- Insert in leaf: Add key in sorted position
- Check overflow: If node has m keys (too many), split
- Split operation:
- Create new node
- Move upper half of keys to new node
- Promote middle key to parent
- If parent overflows, split recursively up to root
- Insertion algorithm:
def insert(root, key): # If root is full, split it if root.n == root.order - 1: new_root = BTreeNode(root.order) new_root.children.append(root) split_child(new_root, 0) root = new_root insert_non_full(root, key) return root def insert_non_full(node, key): if node.is_leaf: # Insert key in sorted position node.keys.insert(bisect_left(node.keys, key), key) node.n += 1 else: # Find child to insert into i = bisect_left(node.keys, key) if node.children[i].n == node.order - 1: split_child(node, i) if key > node.keys[i]: i += 1 insert_non_full(node.children[i], key) def split_child(parent, index): full_child = parent.children[index] new_child = BTreeNode(full_child.order, full_child.is_leaf) mid = full_child.order // 2 # Promote middle key to parent parent.keys.insert(index, full_child.keys[mid]) parent.children.insert(index + 1, new_child) # Split keys and children new_child.keys = full_child.keys[mid+1:] full_child.keys = full_child.keys[:mid] if not full_child.is_leaf: new_child.children = full_child.children[mid+1:] full_child.children = full_child.children[:mid+1]
Deletion:
More complex than insertion, three cases:
- Key in leaf: Simply remove
- Key in internal node: Replace with predecessor/successor, then delete from leaf
- Underflow handling: If node has fewer than ⌈m/2⌉-1 keys:
- Borrow from sibling: If sibling has extra keys
- Merge with sibling: If sibling is at minimum
- Deletion cases:
Case 1: Delete from leaf (simple) Before: [10 | 20 | 30] Delete 20 After: [10 | 30] Case 2: Delete from internal (replace with predecessor) Before: [20] / \ [10] [30] Delete 20 → Replace with 10 After: [10] / \ [] [30] Then handle underflow in left child Case 3: Merge siblings (underflow) Before: [30] / \ [10] [40 | 50] Delete 10 → Underflow in left After: [40 | 50] (merged, 30 pulled down)
3. Height and complexity
Analyzing performance characteristics:
- Height formula:
- For n keys and order m:
- Minimum height:
log_m(n+1)(fully packed) - Maximum height:
log_⌈m/2⌉((n+1)/2)(minimum occupancy)
- Height example:
- 1 million keys, order m=100
- Height ≤ log₅₀(1,000,000) ≈ 3.5 → 4 levels
- Only 4 disk accesses to find any key!
-
Time complexity: | Operation | Time Complexity | Disk I/Os | |———–|—————-|———–| | Search | O(log_m n) | O(log_m n) | | Insert | O(log_m n) | O(log_m n) | | Delete | O(log_m n) | O(log_m n) | | Range scan | O(log_m n + k) | O(log_m n + k/b) |
Where:
- n = number of keys
- m = order (branching factor)
- k = number of results
- b = keys per block
- Space complexity: O(n)
- Minimum 50% space utilization (except root)
- Average 67-75% utilization in practice
-
Comparison with binary search trees:
Tree Type Height for 1M keys Disk I/Os Binary tree (balanced) log₂(1M) ≈ 20 20 B-Tree (m=100) log₁₀₀(1M) ≈ 3 3 B-Tree (m=1000) log₁₀₀₀(1M) ≈ 2 2 - Why B-Trees are efficient:
- High branching factor → shallow tree
- Fewer disk accesses (dominant cost)
- Each node fits in one disk block
- Sequential access within nodes (cache-friendly)
4. Differences: B-Tree vs B+ Tree
Understanding the two main variants:
B-Tree:
- Structure:
- Keys and data in all nodes (internal + leaf)
- Each key appears exactly once
- No linked list between leaves
- Advantages:
- Better for exact-match queries (may find in internal node)
- Slightly less space (no duplicate keys)
- Disadvantages:
- Range queries less efficient
- Variable-size records complicate node management
B+ Tree:
- Structure:
- All data in leaf nodes only
- Internal nodes contain only keys (for routing)
- Keys may be duplicated (in internal + leaf)
- Leaves linked in sorted order (doubly linked list)
- Advantages:
- Excellent for range queries (scan linked leaves)
- More keys per internal node (no data)
- Sequential access via leaf chain
- Consistent performance (always reach leaf)
- Disadvantages:
- Duplicate keys use extra space
- Always traverse to leaf (even if key in internal node)
Visual comparison:
B-Tree (order 3):
[20]
/ \
[10] [30, 40]
All nodes contain data.
B+ Tree (order 3):
[20, 30]
/ | \
[10] [20,25] [30,40]
↔ ↔ ↔
Only leaves contain data.
Leaves linked for range scans.
Detailed comparison table:
| Feature | B-Tree | B+ Tree |
|---|---|---|
| Data location | All nodes | Leaf nodes only |
| Internal nodes | Keys + data | Keys only (routing) |
| Key duplication | No | Yes (internal + leaf) |
| Leaf linkage | No | Yes (linked list) |
| Keys per internal node | Fewer | More |
| Range queries | O(log n + k) | O(log n + k/b) better |
| Point queries | Faster (may stop early) | Always to leaf |
| Sequential access | Poor | Excellent |
| Use case | General purpose | Databases, filesystems |
Why databases prefer B+ Trees:
- Range queries: Common in SQL (
SELECT * WHERE age BETWEEN 20 AND 30) - Sequential scans: Full table scans via leaf chain
- Higher fanout: More keys per internal node → shorter tree
- Predictable performance: Always same depth to leaf
- Easier concurrency: Lock leaves independently
5. Range scans and storage locality
Optimizing for sequential access patterns:
Range query in B+ Tree:
- Find start key: O(log_m n) - traverse to leaf
- Scan leaves: Follow linked list until end key
- Total cost: O(log_m n + k) where k = results
- Example query:
SELECT * FROM users WHERE age BETWEEN 25 AND 35Step 1: Search for age=25 → reach leaf L1 Step 2: Scan L1 → L2 → L3 until age > 35 Step 3: Return all records found Disk I/Os: 1 (root) + 1 (internal) + 1 (leaf) + k/b (scan)
Range query in B-Tree:
- Must perform in-order traversal
- Jump between internal and leaf nodes
- Less efficient: O(log_m n + k log_m n)
Storage locality benefits:
- Sequential disk access:
- Leaves stored contiguously on disk
- Prefetching: OS loads adjacent blocks
- Minimizes seek time (mechanical disks)
- Cache-friendly:
- Scanning leaves keeps data in cache
- No random jumps between levels
- Bulk loading:
- Build B+ Tree bottom-up
- Sort all keys, create leaves left-to-right
- Then build internal levels
- Result: 100% space utilization, optimal locality
Optimization techniques:
- Leaf node chaining:
class BPlusTreeLeaf: def __init__(self): self.keys = [] self.values = [] self.next = None # Next leaf self.prev = None # Previous leaf - Sibling pointers in internal nodes:
- Faster splits and merges
- Better concurrency control
- Bulk loading algorithm:
def bulk_load(sorted_keys): # Create leaf level leaves = [] for i in range(0, len(sorted_keys), LEAF_SIZE): leaf = create_leaf(sorted_keys[i:i+LEAF_SIZE]) leaves.append(leaf) # Link leaves for i in range(len(leaves)-1): leaves[i].next = leaves[i+1] # Build internal levels bottom-up return build_internal_levels(leaves) - Prefix compression:
- Store only distinguishing prefix in internal nodes
- Example: [“apple”, “application”] → [“app”, “appl”]
- Saves space, increases fanout
6. Database and filesystem applications
Real-world usage of B-Trees and B+ Trees:
Database indexes:
- Primary index: B+ Tree on primary key
- Leaf nodes contain actual data rows (clustered index)
- Example: MySQL InnoDB primary key index
- Secondary index: B+ Tree on non-primary key
- Leaf nodes contain pointers to primary key
- Example: Index on email column
- Composite index: Multi-column B+ Tree
- Keys are tuples: (last_name, first_name)
- Supports queries on prefix:
WHERE last_name = 'Smith'
Database systems using B+ Trees:
| Database | Index Type | Details |
|---|---|---|
| MySQL InnoDB | B+ Tree | Clustered primary key, secondary indexes |
| PostgreSQL | B-Tree (actually B+ Tree) | Default index type |
| SQLite | B+ Tree | Both table and index storage |
| Oracle | B+ Tree | Index-organized tables |
| SQL Server | B+ Tree | Clustered and non-clustered indexes |
Filesystem applications:
- File allocation:
- B+ Tree maps file blocks to disk blocks
- Fast random access within files
- Directory structure:
- B+ Tree for large directories
- Efficient filename lookups
-
Filesystem examples:
Filesystem Usage ext4 HTree (B-Tree variant) for directories XFS B+ Trees for free space, inodes Btrfs B-Trees for all metadata NTFS B+ Trees for file records (MFT) HFS+ B-Trees for catalog file
Specific use cases:
- MySQL InnoDB:
-- Clustered index (B+ Tree on primary key) CREATE TABLE users ( id INT PRIMARY KEY, -- B+ Tree root name VARCHAR(100), email VARCHAR(100), INDEX idx_email (email) -- Secondary B+ Tree ); -- Range query (efficient) SELECT * FROM users WHERE id BETWEEN 1000 AND 2000; -- Uses B+ Tree leaf chain - PostgreSQL B-Tree index:
-- Create index CREATE INDEX idx_users_age ON users(age); -- Index scan EXPLAIN SELECT * FROM users WHERE age > 25; -- Index Scan using idx_users_age -- Filter: (age > 25) - Filesystem directory lookup:
Directory: /usr/bin (10,000 files) Lookup: find "python3" Without B-Tree: O(n) linear scan With B-Tree: O(log n) ≈ 4 disk accesses
Performance characteristics:
- Insert performance:
- Random inserts: ~1000-10000 ops/sec
- Bulk inserts: ~100000+ ops/sec (bulk loading)
- Query performance:
- Point query: 1-3 disk I/Os (typical)
- Range query: 1-3 + k/b I/Os (k results, b per block)
- Space overhead:
- 50-75% space utilization
- Internal nodes: ~1-2% of total space
- Leaf nodes: ~98-99% of total space
Advanced features:
- Write-ahead logging (WAL): Crash recovery
- MVCC (Multi-Version Concurrency Control): Concurrent access
- Compression: Prefix compression, page compression
- Partitioning: Distribute B+ Tree across multiple disks
- Caching: Buffer pool for hot nodes