Trees
Hierarchical structures with parent-child relationships.
1. Binary trees and properties
Understanding the fundamental tree structure:
Definition:
- Tree: Hierarchical data structure with nodes connected by edges
- Binary Tree: Each node has at most two children (left and right)
- Root: Top node with no parent
- Leaf: Node with no children
- Height: Longest path from root to leaf
- Depth: Distance from root to node
Node structure:
class TreeNode:
def __init__(self, val=0):
self.val = val
self.left = None
self.right = None
Types of binary trees:
1. Full Binary Tree:
- Every node has 0 or 2 children
1 / \ 2 3 / \ 4 5
2. Complete Binary Tree:
- All levels filled except possibly last, which is filled left to right
1 / \ 2 3 / \ / 4 5 6
3. Perfect Binary Tree:
- All internal nodes have 2 children, all leaves at same level
1 / \ 2 3 / \ / \ 4 5 6 7
4. Balanced Binary Tree:
- Height difference between left and right subtrees ≤ 1 for all nodes
1 / \ 2 3 / 4 Height difference: |2 - 1| = 1 ✓
5. Degenerate (Skewed) Tree:
- Each node has only one child (essentially a linked list)
1 \ 2 \ 3 \ 4
Properties:
# Maximum nodes at level i: 2^i
# Maximum nodes in tree of height h: 2^(h+1) - 1
# Minimum height for n nodes: log₂(n+1) - 1
def max_nodes_at_level(level):
return 2 ** level
def max_nodes_in_tree(height):
return 2 ** (height + 1) - 1
def min_height(n):
import math
return math.ceil(math.log2(n + 1)) - 1
# Example: n=7 nodes
# Min height = 2 (perfect tree)
# Max height = 6 (skewed tree)
Basic operations:
def height(node):
"""Calculate height of tree"""
if not node:
return -1
return 1 + max(height(node.left), height(node.right))
def size(node):
"""Count total nodes"""
if not node:
return 0
return 1 + size(node.left) + size(node.right)
def is_balanced(node):
"""Check if tree is balanced"""
if not node:
return True
left_height = height(node.left)
right_height = height(node.right)
if abs(left_height - right_height) > 1:
return False
return is_balanced(node.left) and is_balanced(node.right)
2. Binary Search Trees (BSTs)
Ordered binary trees for efficient searching:
BST Property:
- Left subtree: All values < node value
- Right subtree: All values > node value
- Recursive: Property holds for all subtrees
Example:
8
/ \
3 10
/ \ \
1 6 14
/ \ /
4 7 13
Operations:
Search:
def search(root, target):
"""Search for value in BST - O(h)"""
if not root or root.val == target:
return root
if target < root.val:
return search(root.left, target)
else:
return search(root.right, target)
# Iterative version
def search_iterative(root, target):
while root and root.val != target:
if target < root.val:
root = root.left
else:
root = root.right
return root
Insert:
def insert(root, val):
"""Insert value into BST - O(h)"""
if not root:
return TreeNode(val)
if val < root.val:
root.left = insert(root.left, val)
else:
root.right = insert(root.right, val)
return root
Delete:
def delete(root, val):
"""Delete value from BST - O(h)"""
if not root:
return None
if val < root.val:
root.left = delete(root.left, val)
elif val > root.val:
root.right = delete(root.right, val)
else:
# Node to delete found
# Case 1: No children (leaf)
if not root.left and not root.right:
return None
# Case 2: One child
if not root.left:
return root.right
if not root.right:
return root.left
# Case 3: Two children
# Find inorder successor (smallest in right subtree)
successor = find_min(root.right)
root.val = successor.val
root.right = delete(root.right, successor.val)
return root
def find_min(node):
"""Find minimum value node"""
while node.left:
node = node.left
return node
Find min/max:
def find_minimum(root):
"""Find minimum value - O(h)"""
if not root:
return None
while root.left:
root = root.left
return root.val
def find_maximum(root):
"""Find maximum value - O(h)"""
if not root:
return None
while root.right:
root = root.right
return root.val
Validate BST:
def is_valid_bst(root, min_val=float('-inf'), max_val=float('inf')):
"""Check if tree is valid BST"""
if not root:
return True
if root.val <= min_val or root.val >= max_val:
return False
return (is_valid_bst(root.left, min_val, root.val) and
is_valid_bst(root.right, root.val, max_val))
3. Traversals: inorder, preorder, postorder
Different ways to visit all nodes:
Inorder (Left-Root-Right):
- Result: Sorted order for BST
- Use: Get sorted elements, validate BST
def inorder(root):
"""Inorder traversal - recursive"""
if not root:
return []
result = []
result.extend(inorder(root.left))
result.append(root.val)
result.extend(inorder(root.right))
return result
# Iterative with stack
def inorder_iterative(root):
result = []
stack = []
current = root
while current or stack:
# Go to leftmost node
while current:
stack.append(current)
current = current.left
# Process node
current = stack.pop()
result.append(current.val)
# Go to right subtree
current = current.right
return result
Example:
Tree: 4
/ \
2 6
/ \ / \
1 3 5 7
Inorder: [1, 2, 3, 4, 5, 6, 7]
Preorder (Root-Left-Right):
- Result: Root before children
- Use: Copy tree, prefix expressions
def preorder(root):
"""Preorder traversal - recursive"""
if not root:
return []
result = [root.val]
result.extend(preorder(root.left))
result.extend(preorder(root.right))
return result
# Iterative
def preorder_iterative(root):
if not root:
return []
result = []
stack = [root]
while stack:
node = stack.pop()
result.append(node.val)
# Push right first (so left is processed first)
if node.right:
stack.append(node.right)
if node.left:
stack.append(node.left)
return result
Example:
Tree: 4
/ \
2 6
/ \ / \
1 3 5 7
Preorder: [4, 2, 1, 3, 6, 5, 7]
Postorder (Left-Right-Root):
- Result: Children before root
- Use: Delete tree, postfix expressions
def postorder(root):
"""Postorder traversal - recursive"""
if not root:
return []
result = []
result.extend(postorder(root.left))
result.extend(postorder(root.right))
result.append(root.val)
return result
# Iterative (two stacks)
def postorder_iterative(root):
if not root:
return []
stack1 = [root]
stack2 = []
while stack1:
node = stack1.pop()
stack2.append(node)
if node.left:
stack1.append(node.left)
if node.right:
stack1.append(node.right)
result = []
while stack2:
result.append(stack2.pop().val)
return result
Example:
Tree: 4
/ \
2 6
/ \ / \
1 3 5 7
Postorder: [1, 3, 2, 5, 7, 6, 4]
Level-order (BFS):
- Result: Level by level
- Use: Find level, shortest path
from collections import deque
def level_order(root):
"""Level-order traversal (BFS)"""
if not root:
return []
result = []
queue = deque([root])
while queue:
node = queue.popleft()
result.append(node.val)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
return result
# By levels
def level_order_by_levels(root):
"""Return list of lists, one per level"""
if not root:
return []
result = []
queue = deque([root])
while queue:
level_size = len(queue)
level = []
for _ in range(level_size):
node = queue.popleft()
level.append(node.val)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
result.append(level)
return result
Example:
Tree: 4
/ \
2 6
/ \ / \
1 3 5 7
Level-order: [4, 2, 6, 1, 3, 5, 7]
By levels: [[4], [2, 6], [1, 3, 5, 7]]
4. Heaps: structure and heap operations
Binary trees with heap property:
Heap Property:
- Max Heap: Parent ≥ children
- Min Heap: Parent ≤ children
- Complete Binary Tree: All levels filled except last
Array representation:
# For node at index i:
# Left child: 2*i + 1
# Right child: 2*i + 2
# Parent: (i - 1) // 2
class MinHeap:
def __init__(self):
self.heap = []
def parent(self, i):
return (i - 1) // 2
def left_child(self, i):
return 2 * i + 1
def right_child(self, i):
return 2 * i + 2
Heapify operations:
def heapify_up(self, i):
"""Bubble up element at index i"""
while i > 0:
parent_idx = self.parent(i)
if self.heap[i] < self.heap[parent_idx]:
self.heap[i], self.heap[parent_idx] = \
self.heap[parent_idx], self.heap[i]
i = parent_idx
else:
break
def heapify_down(self, i):
"""Bubble down element at index i"""
n = len(self.heap)
while True:
smallest = i
left = self.left_child(i)
right = self.right_child(i)
if left < n and self.heap[left] < self.heap[smallest]:
smallest = left
if right < n and self.heap[right] < self.heap[smallest]:
smallest = right
if smallest != i:
self.heap[i], self.heap[smallest] = \
self.heap[smallest], self.heap[i]
i = smallest
else:
break
Heap operations:
def insert(self, val):
"""Insert value - O(log n)"""
self.heap.append(val)
self.heapify_up(len(self.heap) - 1)
def extract_min(self):
"""Remove and return minimum - O(log n)"""
if not self.heap:
return None
if len(self.heap) == 1:
return self.heap.pop()
min_val = self.heap[0]
self.heap[0] = self.heap.pop()
self.heapify_down(0)
return min_val
def peek(self):
"""Get minimum without removing - O(1)"""
return self.heap[0] if self.heap else None
def build_heap(self, arr):
"""Build heap from array - O(n)"""
self.heap = arr[:]
# Start from last non-leaf node
for i in range(len(self.heap) // 2 - 1, -1, -1):
self.heapify_down(i)
Using Python’s heapq:
import heapq
# Min heap
heap = []
heapq.heappush(heap, 3)
heapq.heappush(heap, 1)
heapq.heappush(heap, 4)
min_val = heapq.heappop(heap) # Returns 1
# Build heap from list
arr = [3, 1, 4, 1, 5, 9, 2, 6]
heapq.heapify(arr) # O(n)
# Max heap (negate values)
max_heap = []
heapq.heappush(max_heap, -3)
heapq.heappush(max_heap, -1)
max_val = -heapq.heappop(max_heap) # Returns 3
5. Balanced trees: AVL, Red-Black
Self-balancing BSTs for guaranteed O(log n) operations:
AVL Trees:
- Balance Factor: height(left) - height(right) ∈ {-1, 0, 1}
- Rebalancing: Rotations after insert/delete
- Stricter: More balanced than Red-Black
Rotations:
class AVLNode:
def __init__(self, val):
self.val = val
self.left = None
self.right = None
self.height = 1
def get_height(node):
return node.height if node else 0
def get_balance(node):
return get_height(node.left) - get_height(node.right) if node else 0
def update_height(node):
node.height = 1 + max(get_height(node.left), get_height(node.right))
# Right rotation
def rotate_right(y):
"""
y x
/ \ / \
x C --> A y
/ \ / \
A B B C
"""
x = y.left
B = x.right
x.right = y
y.left = B
update_height(y)
update_height(x)
return x
# Left rotation
def rotate_left(x):
"""
x y
/ \ / \
A y --> x C
/ \ / \
B C A B
"""
y = x.right
B = y.left
y.left = x
x.right = B
update_height(x)
update_height(y)
return y
AVL Insert:
def avl_insert(root, val):
"""Insert with rebalancing"""
# Standard BST insert
if not root:
return AVLNode(val)
if val < root.val:
root.left = avl_insert(root.left, val)
else:
root.right = avl_insert(root.right, val)
# Update height
update_height(root)
# Get balance factor
balance = get_balance(root)
# Left-Left case
if balance > 1 and val < root.left.val:
return rotate_right(root)
# Right-Right case
if balance < -1 and val > root.right.val:
return rotate_left(root)
# Left-Right case
if balance > 1 and val > root.left.val:
root.left = rotate_left(root.left)
return rotate_right(root)
# Right-Left case
if balance < -1 and val < root.right.val:
root.right = rotate_right(root.right)
return rotate_left(root)
return root
Red-Black Trees:
- Properties:
- Every node is red or black
- Root is black
- Leaves (NULL) are black
- Red node has black children
- All paths from node to leaves have same number of black nodes
Advantages over AVL:
- Fewer rotations on insert/delete
- Faster for frequent modifications
- Used in many standard libraries (C++ std::map, Java TreeMap)
Comparison:
| Feature | AVL | Red-Black |
|---|---|---|
| Balance | Stricter | Looser |
| Height | ≤ 1.44 log n | ≤ 2 log n |
| Rotations (insert) | ≤ 2 | ≤ 2 |
| Rotations (delete) | O(log n) | ≤ 3 |
| Search | Faster | Slightly slower |
| Insert/Delete | Slower | Faster |
| Use case | Read-heavy | Write-heavy |
6. Use cases: indexes, parsers, hierarchies
Real-world applications of trees:
Database Indexes:
# B-Tree index for database
class BTreeIndex:
"""
Used in: MySQL, PostgreSQL, SQLite
Why: Optimized for disk I/O, high fanout
"""
def search(self, key):
# O(log n) disk reads
pass
def range_query(self, start, end):
# Efficient range scans
pass
Expression Parsing:
# Parse arithmetic expression into tree
def parse_expression(expr):
"""
Expression: 3 + 4 * 2
Tree:
+
/ \
3 *
/ \
4 2
"""
pass
def evaluate(root):
"""Evaluate expression tree"""
if not root:
return 0
# Leaf node (operand)
if not root.left and not root.right:
return root.val
# Evaluate subtrees
left_val = evaluate(root.left)
right_val = evaluate(root.right)
# Apply operator
if root.val == '+':
return left_val + right_val
elif root.val == '-':
return left_val - right_val
elif root.val == '*':
return left_val * right_val
elif root.val == '/':
return left_val / right_val
File System Hierarchy:
class FileNode:
def __init__(self, name, is_file=True):
self.name = name
self.is_file = is_file
self.children = [] if not is_file else None
def add_child(self, child):
if not self.is_file:
self.children.append(child)
# Directory structure
root = FileNode("/", is_file=False)
home = FileNode("home", is_file=False)
user = FileNode("user", is_file=False)
doc = FileNode("document.txt", is_file=True)
root.add_child(home)
home.add_child(user)
user.add_child(doc)
Decision Trees (ML):
class DecisionNode:
def __init__(self, feature=None, threshold=None,
left=None, right=None, value=None):
self.feature = feature
self.threshold = threshold
self.left = left
self.right = right
self.value = value # For leaf nodes
def predict(node, sample):
"""Traverse tree to make prediction"""
if node.value is not None:
return node.value
if sample[node.feature] <= node.threshold:
return predict(node.left, sample)
else:
return predict(node.right, sample)
Huffman Coding (Compression):
# Build Huffman tree for data compression
import heapq
def build_huffman_tree(freq):
"""Build tree from character frequencies"""
heap = [[weight, [char, ""]] for char, weight in freq.items()]
heapq.heapify(heap)
while len(heap) > 1:
lo = heapq.heappop(heap)
hi = heapq.heappop(heap)
for pair in lo[1:]:
pair[1] = '0' + pair[1]
for pair in hi[1:]:
pair[1] = '1' + pair[1]
heapq.heappush(heap, [lo[0] + hi[0]] + lo[1:] + hi[1:])
return sorted(heapq.heappop(heap)[1:], key=lambda p: (len(p[-1]), p))
7. Complexity of operations
Performance characteristics of tree operations:
Binary Search Tree (unbalanced):
| Operation | Average | Worst Case |
|---|---|---|
| Search | O(log n) | O(n) |
| Insert | O(log n) | O(n) |
| Delete | O(log n) | O(n) |
| Min/Max | O(log n) | O(n) |
| Traversal | O(n) | O(n) |
Balanced BST (AVL, Red-Black):
| Operation | Time | Notes |
|---|---|---|
| Search | O(log n) | Guaranteed |
| Insert | O(log n) | With rebalancing |
| Delete | O(log n) | With rebalancing |
| Min/Max | O(log n) | Guaranteed |
Heap:
| Operation | Time | Notes |
|---|---|---|
| Insert | O(log n) | Heapify up |
| Extract Min/Max | O(log n) | Heapify down |
| Peek Min/Max | O(1) | Root element |
| Build Heap | O(n) | From array |
| Heapify | O(log n) | Single element |
Space complexity:
- Tree storage: O(n) for n nodes
- Recursive traversal: O(h) stack space where h = height
- Iterative traversal: O(h) for stack/queue
Height analysis:
# Best case (balanced): h = log₂ n
# Worst case (skewed): h = n
# Example: n = 1000 nodes
# Balanced: h ≈ 10 (very fast)
# Skewed: h = 1000 (slow as linked list)
def operations_comparison(n=1000):
"""Compare operations for different tree types"""
import math
balanced_height = math.ceil(math.log2(n))
skewed_height = n
print(f"Nodes: {n}")
print(f"Balanced tree height: {balanced_height}")
print(f"Skewed tree height: {skewed_height}")
print(f"Search operations:")
print(f" Balanced: ~{balanced_height} comparisons")
print(f" Skewed: ~{skewed_height//2} comparisons (average)")
# Output:
# Nodes: 1000
# Balanced tree height: 10
# Skewed tree height: 1000
# Search operations:
# Balanced: ~10 comparisons
# Skewed: ~500 comparisons (average)
Practical considerations:
- Use balanced trees when: Frequent insertions/deletions, need guaranteed O(log n)
- Use unbalanced BST when: Data arrives sorted, simple implementation sufficient
- Use heap when: Need min/max quickly, priority queue operations
- Use hash table when: Only need search/insert/delete, no ordering required