Algorithms with Data Structures
Combine structures and algorithms to solve problems efficiently.
Sorting algorithms overview
Sorting algorithms arrange elements in a specific order (ascending or descending). Different algorithms have different time/space complexities and stability properties.
Comparison-based sorting algorithms:
- Bubble Sort: Repeatedly swap adjacent elements if they’re in the wrong order
- Time: O(n²) worst/average, O(n) best (already sorted)
- Space: O(1)
- Stable: Yes
- Use case: Educational purposes, very small datasets
- Selection Sort: Find minimum element and swap with first unsorted position
- Time: O(n²) in all cases
- Space: O(1)
- Stable: No (standard version)
- Use case: Small datasets, memory-constrained environments
- Insertion Sort: Build sorted array one element at a time by inserting each element into proper position
- Time: O(n²) worst/average, O(n) best
- Space: O(1)
- Stable: Yes
- Use case: Small datasets, nearly sorted data, online sorting
- Merge Sort: Divide-and-conquer algorithm that splits array, sorts halves, and merges them
- Time: O(n log n) in all cases
- Space: O(n)
- Stable: Yes
- Use case: External sorting, linked lists, guaranteed O(n log n)
- Quick Sort: Choose pivot, partition array around pivot, recursively sort partitions
- Time: O(n log n) average, O(n²) worst
- Space: O(log n) average (recursion stack)
- Stable: No (standard version)
- Use case: General-purpose sorting, in-place sorting needed
- Heap Sort: Build max-heap, repeatedly extract maximum
- Time: O(n log n) in all cases
- Space: O(1)
- Stable: No
- Use case: When O(n log n) guarantee needed with O(1) space
Non-comparison-based sorting:
- Counting Sort: Count occurrences of each value, reconstruct sorted array
- Time: O(n + k) where k is range of input
- Space: O(k)
- Stable: Yes
- Use case: Small integer range, duplicate values
- Radix Sort: Sort digit by digit using a stable sort as a subroutine
- Time: O(d * (n + k)) where d is number of digits
- Space: O(n + k)
- Stable: Yes
- Use case: Integers, strings with fixed length
- Bucket Sort: Distribute elements into buckets, sort each bucket
- Time: O(n + k) average, O(n²) worst
- Space: O(n + k)
- Stable: Can be
- Use case: Uniformly distributed data
Sorting algorithm comparison table:
| Algorithm | Best Time | Avg Time | Worst Time | Space | Stable | In-place |
|---|---|---|---|---|---|---|
| Bubble Sort | O(n) | O(n²) | O(n²) | O(1) | Yes | Yes |
| Selection Sort | O(n²) | O(n²) | O(n²) | O(1) | No | Yes |
| Insertion Sort | O(n) | O(n²) | O(n²) | O(1) | Yes | Yes |
| Merge Sort | O(n log n) | O(n log n) | O(n log n) | O(n) | Yes | No |
| Quick Sort | O(n log n) | O(n log n) | O(n²) | O(log n) | No | Yes |
| Heap Sort | O(n log n) | O(n log n) | O(n log n) | O(1) | No | Yes |
| Counting Sort | O(n + k) | O(n + k) | O(n + k) | O(k) | Yes | No |
| Radix Sort | O(d(n+k)) | O(d(n+k)) | O(d(n+k)) | O(n+k) | Yes | No |
| Bucket Sort | O(n + k) | O(n + k) | O(n²) | O(n + k) | Can be | No |
Example implementations:
# Quick Sort - Most commonly used general-purpose algorithm
def quick_sort(arr):
if len(arr) <= 1:
return arr
pivot = arr[len(arr) // 2]
left = [x for x in arr if x < pivot]
middle = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
return quick_sort(left) + middle + quick_sort(right)
# In-place version (more efficient)
def quick_sort_inplace(arr, low=0, high=None):
if high is None:
high = len(arr) - 1
if low < high:
pi = partition(arr, low, high)
quick_sort_inplace(arr, low, pi - 1)
quick_sort_inplace(arr, pi + 1, high)
return arr
def partition(arr, low, high):
pivot = arr[high]
i = low - 1
for j in range(low, high):
if arr[j] <= pivot:
i += 1
arr[i], arr[j] = arr[j], arr[i]
arr[i + 1], arr[high] = arr[high], arr[i + 1]
return i + 1
# Merge Sort - Guaranteed O(n log n) and stable
def merge_sort(arr):
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid])
right = merge_sort(arr[mid:])
return merge(left, right)
def merge(left, right):
result = []
i = j = 0
while i < len(left) and j < len(right):
if left[i] <= right[j]:
result.append(left[i])
i += 1
else:
result.append(right[j])
j += 1
result.extend(left[i:])
result.extend(right[j:])
return result
# Counting Sort - For small integer ranges
def counting_sort(arr):
if not arr:
return arr
min_val = min(arr)
max_val = max(arr)
range_size = max_val - min_val + 1
count = [0] * range_size
output = [0] * len(arr)
# Count occurrences
for num in arr:
count[num - min_val] += 1
# Cumulative count
for i in range(1, len(count)):
count[i] += count[i - 1]
# Build output (backward for stability)
for i in range(len(arr) - 1, -1, -1):
num = arr[i]
output[count[num - min_val] - 1] = num
count[num - min_val] -= 1
return output
# Heap Sort - O(n log n) with O(1) space
def heap_sort(arr):
n = len(arr)
# Build max heap
for i in range(n // 2 - 1, -1, -1):
heapify(arr, n, i)
# Extract elements from heap
for i in range(n - 1, 0, -1):
arr[0], arr[i] = arr[i], arr[0]
heapify(arr, i, 0)
return arr
def heapify(arr, n, i):
largest = i
left = 2 * i + 1
right = 2 * i + 2
if left < n and arr[left] > arr[largest]:
largest = left
if right < n and arr[right] > arr[largest]:
largest = right
if largest != i:
arr[i], arr[largest] = arr[largest], arr[i]
heapify(arr, n, largest)
# Example usage:
arr = [64, 34, 25, 12, 22, 11, 90]
print("Original:", arr)
print("Quick Sort:", quick_sort(arr.copy()))
print("Merge Sort:", merge_sort(arr.copy()))
print("Heap Sort:", heap_sort(arr.copy()))
print("Counting Sort:", counting_sort(arr.copy()))
Choosing the right sorting algorithm:
- Use Quick Sort when:
- General-purpose sorting
- Average O(n log n) is acceptable
- In-place sorting is needed
- Use Merge Sort when:
- Stability is required
- Guaranteed O(n log n) is needed
- Sorting linked lists (merging needs no auxiliary array)
- External sorting (large datasets on disk)
- Use Heap Sort when:
- O(n log n) guarantee with O(1) space
- Priority queue operations needed
- Use Insertion Sort when:
- Array is small (< 10-20 elements)
- Array is nearly sorted
- Online sorting (elements arrive one at a time)
- Use Counting/Radix Sort when:
- Sorting integers with small range
- O(n) time is achievable
- Stability is needed
Hybrid approaches:
Many production sorting implementations use hybrid approaches:
# Timsort (Python's built-in sort) - hybrid of merge sort and insertion sort
# Used in Python's sorted() and list.sort()
def timsort_concept(arr):
"""
Simplified concept of Timsort:
1. Divide array into small runs
2. Sort each run with insertion sort (efficient for small arrays)
3. Merge runs using merge sort
"""
MIN_RUN = 32
# Sort individual runs with insertion sort
for start in range(0, len(arr), MIN_RUN):
end = min(start + MIN_RUN, len(arr))
insertion_sort_range(arr, start, end)
# Merge runs
size = MIN_RUN
while size < len(arr):
for start in range(0, len(arr), size * 2):
mid = min(start + size, len(arr))
end = min(start + size * 2, len(arr))
if mid < end:
merge_inplace(arr, start, mid, end)
size *= 2
return arr
def insertion_sort_range(arr, left, right):
for i in range(left + 1, right):
key = arr[i]
j = i - 1
while j >= left and arr[j] > key:
arr[j + 1] = arr[j]
j -= 1
arr[j + 1] = key
def merge_inplace(arr, left, mid, right):
left_arr = arr[left:mid]
right_arr = arr[mid:right]
i = j = 0
k = left
while i < len(left_arr) and j < len(right_arr):
if left_arr[i] <= right_arr[j]:
arr[k] = left_arr[i]
i += 1
else:
arr[k] = right_arr[j]
j += 1
k += 1
while i < len(left_arr):
arr[k] = left_arr[i]
i += 1
k += 1
while j < len(right_arr):
arr[k] = right_arr[j]
j += 1
k += 1
# Introsort (C++'s std::sort) - hybrid of quick sort, heap sort, insertion sort
def introsort_concept(arr):
"""
Simplified concept of Introsort:
1. Start with quick sort
2. Switch to heap sort if recursion depth exceeds threshold (avoid O(n²))
3. Use insertion sort for small subarrays
"""
    import math
    if len(arr) <= 1:
        return arr  # Guard: log2 is undefined for an empty array
    max_depth = 2 * math.floor(math.log2(len(arr)))
    return introsort_helper(arr, 0, len(arr) - 1, max_depth)
def introsort_helper(arr, low, high, max_depth):
size = high - low + 1
# Use insertion sort for small arrays
if size < 16:
insertion_sort_range(arr, low, high + 1)
return arr
# Switch to heap sort if max depth exceeded
if max_depth == 0:
heap_sort_range(arr, low, high)
return arr
# Use quick sort
if low < high:
pi = partition(arr, low, high)
introsort_helper(arr, low, pi - 1, max_depth - 1)
introsort_helper(arr, pi + 1, high, max_depth - 1)
return arr
def heap_sort_range(arr, low, high):
# Heap sort on arr[low:high+1]
temp = arr[low:high+1]
heap_sort(temp)
arr[low:high+1] = temp
Searching algorithms overview
Searching algorithms find the position of a target element in a data structure. The choice depends on whether data is sorted and the data structure used.
Linear Search:
- Sequentially check each element
- Works on unsorted and sorted data
- Time: O(n)
- Space: O(1)
def linear_search(arr, target):
"""
Search for target in array.
Returns index if found, -1 otherwise.
"""
for i, val in enumerate(arr):
if val == target:
return i
return -1
# Sentinel linear search (one less comparison per iteration)
def sentinel_linear_search(arr, target):
n = len(arr)
if n == 0:
return -1
# Save last element and put target there
last = arr[n - 1]
arr[n - 1] = target
i = 0
while arr[i] != target:
i += 1
# Restore last element
arr[n - 1] = last
# Check if target was found before sentinel
if i < n - 1 or arr[n - 1] == target:
return i
return -1
Binary Search:
- Repeatedly divide sorted array in half
- Only works on sorted data
- Time: O(log n)
- Space: O(1) iterative, O(log n) recursive
# Iterative binary search (preferred)
def binary_search(arr, target):
"""
Search for target in sorted array.
Returns index if found, -1 otherwise.
"""
left, right = 0, len(arr) - 1
while left <= right:
mid = left + (right - left) // 2 # Avoid overflow
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1
# Recursive binary search
def binary_search_recursive(arr, target, left=0, right=None):
if right is None:
right = len(arr) - 1
if left > right:
return -1
mid = left + (right - left) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
return binary_search_recursive(arr, target, mid + 1, right)
else:
return binary_search_recursive(arr, target, left, mid - 1)
# Binary search variants
def binary_search_leftmost(arr, target):
"""Find leftmost (first) occurrence of target."""
left, right = 0, len(arr)
while left < right:
mid = left + (right - left) // 2
if arr[mid] < target:
left = mid + 1
else:
right = mid
if left < len(arr) and arr[left] == target:
return left
return -1
def binary_search_rightmost(arr, target):
"""Find rightmost (last) occurrence of target."""
left, right = 0, len(arr)
while left < right:
mid = left + (right - left) // 2
if arr[mid] <= target:
left = mid + 1
else:
right = mid
if left > 0 and arr[left - 1] == target:
return left - 1
return -1
def binary_search_insert_position(arr, target):
"""Find position where target should be inserted to maintain sorted order."""
left, right = 0, len(arr)
while left < right:
mid = left + (right - left) // 2
if arr[mid] < target:
left = mid + 1
else:
right = mid
return left
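For reference, Python's standard bisect module provides the same bounds as the handwritten variants above:
# Standard-library equivalents of the bounds above
import bisect
arr = [1, 2, 2, 2, 3]
print(bisect.bisect_left(arr, 2))             # 1 (first position of 2)
print(bisect.bisect_right(arr, 2))            # 4 (one past the last 2)
print(binary_search_insert_position(arr, 2))  # 1 (same as bisect_left)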
Interpolation Search:
- Improved binary search for uniformly distributed sorted data
- Estimates position based on value
- Time: O(log log n) average, O(n) worst
- Space: O(1)
def interpolation_search(arr, target):
"""
Search in uniformly distributed sorted array.
More efficient than binary search when data is uniformly distributed.
"""
left, right = 0, len(arr) - 1
while left <= right and target >= arr[left] and target <= arr[right]:
# Avoid division by zero
if arr[left] == arr[right]:
if arr[left] == target:
return left
return -1
# Estimate position
pos = left + ((target - arr[left]) * (right - left) //
(arr[right] - arr[left]))
if arr[pos] == target:
return pos
elif arr[pos] < target:
left = pos + 1
else:
right = pos - 1
return -1
Exponential Search:
- Find range where target exists, then binary search
- Good for unbounded/infinite arrays
- Time: O(log n)
- Space: O(1)
def exponential_search(arr, target):
"""
Useful for unbounded arrays or when target is near beginning.
"""
if not arr:
return -1
if arr[0] == target:
return 0
# Find range for binary search
i = 1
while i < len(arr) and arr[i] <= target:
i *= 2
# Binary search in found range
return binary_search_range(arr, target, i // 2, min(i, len(arr) - 1))
def binary_search_range(arr, target, left, right):
while left <= right:
mid = left + (right - left) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1
Jump Search:
- Jump ahead by fixed steps, then linear search
- Time: O(√n)
- Space: O(1)
import math
def jump_search(arr, target):
"""
Jump ahead by √n steps, then linear search in block.
Good balance between linear and binary search.
"""
n = len(arr)
step = int(math.sqrt(n))
prev = 0
# Find block where target may exist
while prev < n and arr[min(step, n) - 1] < target:
prev = step
step += int(math.sqrt(n))
if prev >= n:
return -1
# Linear search in block
while prev < n and arr[prev] < target:
prev += 1
if prev == min(step, n):
return -1
# Check if target found
if prev < n and arr[prev] == target:
return prev
return -1
Ternary Search:
- Divide array into three parts
- Time: O(log₃ n) ≈ O(log n) (but slower than binary search in practice)
- Space: O(1)
- Useful for finding maximum/minimum of unimodal functions
def ternary_search(arr, target):
"""
Divides array into three parts.
Generally slower than binary search for discrete arrays.
"""
left, right = 0, len(arr) - 1
while left <= right:
mid1 = left + (right - left) // 3
mid2 = right - (right - left) // 3
if arr[mid1] == target:
return mid1
if arr[mid2] == target:
return mid2
if target < arr[mid1]:
right = mid1 - 1
elif target > arr[mid2]:
left = mid2 + 1
else:
left = mid1 + 1
right = mid2 - 1
return -1
# Ternary search for finding maximum of unimodal function
def ternary_search_max(f, left, right, epsilon=1e-9):
"""
Find maximum of unimodal function f in range [left, right].
"""
while right - left > epsilon:
mid1 = left + (right - left) / 3
mid2 = right - (right - left) / 3
if f(mid1) < f(mid2):
left = mid1
else:
right = mid2
return (left + right) / 2
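A quick sanity check that the search functions above agree on the same sorted array (the target 15 sits at index 4):
# Example usage of the search functions above
arr = [2, 4, 7, 10, 15, 20, 25]
for search in (linear_search, binary_search, interpolation_search,
               jump_search, exponential_search, ternary_search):
    print(search.__name__, search(arr, 15))  # each prints 4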
Hash-based Search:
- Use hash table for O(1) average search
- Time: O(1) average, O(n) worst
- Space: O(n)
def hash_search_setup(arr):
"""Build hash table for O(1) searches."""
hash_table = {}
for i, val in enumerate(arr):
if val not in hash_table:
hash_table[val] = []
hash_table[val].append(i)
return hash_table
def hash_search(hash_table, target):
"""Search using prebuilt hash table."""
return hash_table.get(target, [])
# Example:
arr = [4, 2, 7, 2, 9, 2]
ht = hash_search_setup(arr)
print(hash_search(ht, 2)) # [1, 3, 5]
Search algorithm comparison:
| Algorithm | Data Structure | Sorted? | Time (Avg) | Time (Worst) | Space | Best Use Case |
|---|---|---|---|---|---|---|
| Linear | Array/List | No | O(n) | O(n) | O(1) | Small/unsorted data |
| Binary | Array | Yes | O(log n) | O(log n) | O(1) | General sorted data |
| Interpolation | Array | Yes | O(log log n) | O(n) | O(1) | Uniformly distributed data |
| Jump | Array | Yes | O(√n) | O(√n) | O(1) | Jump cost < comparison cost |
| Exponential | Array | Yes | O(log n) | O(log n) | O(1) | Unbounded/target near start |
| Hash | Hash Table | No | O(1) | O(n) | O(n) | Multiple searches |
Practical search scenarios:
# Finding all occurrences in sorted array
def find_all_occurrences(arr, target):
"""Find all indices where target appears in sorted array."""
left = binary_search_leftmost(arr, target)
if left == -1:
return []
right = binary_search_rightmost(arr, target)
return list(range(left, right + 1))
# Finding closest element in sorted array
def find_closest(arr, target):
"""Find element closest to target in sorted array."""
if not arr:
return -1
pos = binary_search_insert_position(arr, target)
if pos == 0:
return 0
if pos == len(arr):
return len(arr) - 1
# Compare neighbors
if abs(arr[pos - 1] - target) <= abs(arr[pos] - target):
return pos - 1
return pos
# Finding peak element (element greater than neighbors)
def find_peak(arr):
"""Find any peak element in array using binary search."""
left, right = 0, len(arr) - 1
while left < right:
mid = left + (right - left) // 2
if arr[mid] < arr[mid + 1]:
left = mid + 1
else:
right = mid
return left
# Searching in rotated sorted array
def search_rotated(arr, target):
"""Search in rotated sorted array (e.g., [4,5,6,7,0,1,2])."""
left, right = 0, len(arr) - 1
while left <= right:
mid = left + (right - left) // 2
if arr[mid] == target:
return mid
# Determine which half is sorted
if arr[left] <= arr[mid]: # Left half is sorted
if arr[left] <= target < arr[mid]:
right = mid - 1
else:
left = mid + 1
else: # Right half is sorted
if arr[mid] < target <= arr[right]:
left = mid + 1
else:
right = mid - 1
return -1
# Example usage:
arr = [1, 2, 2, 2, 3, 4, 5]
print("All occurrences of 2:", find_all_occurrences(arr, 2)) # [1, 2, 3]
print("Closest to 2.7:", arr[find_closest(arr, 2.7)]) # 3
peaks = [1, 3, 20, 4, 1, 0]
print("Peak element index:", find_peak(peaks)) # 2 (value 20)
rotated = [4, 5, 6, 7, 0, 1, 2]
print("5 in rotated array at:", search_rotated(rotated, 5)) # 1
Graph algorithms: shortest path/MST
Graph algorithms solve problems on networks of connected nodes. Two fundamental problem categories are finding shortest paths and constructing minimum spanning trees.
Shortest Path Algorithms:
1. Dijkstra’s Algorithm (Single-source shortest paths, non-negative weights)
- Time: O((V + E) log V) with min-heap
- Space: O(V)
- Greedy approach: always expand nearest unvisited node
import heapq
from collections import defaultdict
def dijkstra(graph, start):
"""
Find shortest paths from start to all vertices.
Graph: adjacency list {node: [(neighbor, weight), ...]}
Returns: {node: (distance, parent)}
"""
distances = {start: 0}
parents = {start: None}
pq = [(0, start)] # (distance, node)
visited = set()
while pq:
dist, node = heapq.heappop(pq)
if node in visited:
continue
visited.add(node)
for neighbor, weight in graph[node]:
if neighbor in visited:
continue
new_dist = dist + weight
if neighbor not in distances or new_dist < distances[neighbor]:
distances[neighbor] = new_dist
parents[neighbor] = node
heapq.heappush(pq, (new_dist, neighbor))
return distances, parents
def reconstruct_path(parents, start, end):
"""Reconstruct shortest path from start to end."""
if end not in parents:
return None
path = []
current = end
while current is not None:
path.append(current)
current = parents[current]
return path[::-1]
# Example usage:
graph = {
'A': [('B', 4), ('C', 2)],
'B': [('C', 1), ('D', 5)],
'C': [('D', 8), ('E', 10)],
'D': [('E', 2)],
'E': []
}
distances, parents = dijkstra(graph, 'A')
print("Distances from A:", distances)
# {'A': 0, 'C': 2, 'B': 4, 'D': 8, 'E': 10}
print("Path A to E:", reconstruct_path(parents, 'A', 'E'))
# ['A', 'C', 'B', 'D', 'E'] or similar shortest path
2. Bellman-Ford Algorithm (Single-source, handles negative weights)
- Time: O(VE)
- Space: O(V)
- Can detect negative cycles
def bellman_ford(graph, start, num_vertices):
"""
Find shortest paths, can handle negative weights.
Graph: list of edges [(u, v, weight), ...]
Returns: (distances, parents) or None if negative cycle
"""
distances = {i: float('inf') for i in range(num_vertices)}
distances[start] = 0
parents = {i: None for i in range(num_vertices)}
# Relax edges V-1 times
for _ in range(num_vertices - 1):
for u, v, weight in graph:
if distances[u] != float('inf') and distances[u] + weight < distances[v]:
distances[v] = distances[u] + weight
parents[v] = u
# Check for negative cycles
for u, v, weight in graph:
if distances[u] != float('inf') and distances[u] + weight < distances[v]:
return None # Negative cycle detected
return distances, parents
# Example:
edges = [
(0, 1, 4), (0, 2, 2),
(1, 2, 1), (1, 3, 5),
(2, 3, 8), (2, 4, 10),
(3, 4, 2)
]
result = bellman_ford(edges, 0, 5)
if result:
distances, parents = result
print("Distances:", distances)
else:
print("Negative cycle detected")
3. Floyd-Warshall Algorithm (All-pairs shortest paths)
- Time: O(V³)
- Space: O(V²)
- Finds shortest paths between all pairs
def floyd_warshall(graph, num_vertices):
"""
Find shortest paths between all pairs of vertices.
Graph: adjacency matrix (graph[i][j] = weight or inf)
Returns: distance matrix and next matrix for path reconstruction
"""
dist = [[float('inf')] * num_vertices for _ in range(num_vertices)]
next_node = [[None] * num_vertices for _ in range(num_vertices)]
# Initialize
for i in range(num_vertices):
dist[i][i] = 0
for i in range(num_vertices):
for j in range(num_vertices):
if graph[i][j] != float('inf'):
dist[i][j] = graph[i][j]
next_node[i][j] = j
# Floyd-Warshall main loop
for k in range(num_vertices):
for i in range(num_vertices):
for j in range(num_vertices):
if dist[i][k] + dist[k][j] < dist[i][j]:
dist[i][j] = dist[i][k] + dist[k][j]
next_node[i][j] = next_node[i][k]
return dist, next_node
def reconstruct_path_floyd(next_node, start, end):
"""Reconstruct path from Floyd-Warshall next matrix."""
if next_node[start][end] is None:
return None
path = [start]
while start != end:
start = next_node[start][end]
path.append(start)
return path
# Example:
INF = float('inf')
graph = [
[0, 4, 2, INF],
[INF, 0, 1, 5],
[INF, INF, 0, 8],
[INF, INF, INF, 0]
]
dist, next_node = floyd_warshall(graph, 4)
print("All-pairs shortest distances:")
for row in dist:
print(row)
print("Path 0 to 3:", reconstruct_path_floyd(next_node, 0, 3))
4. A* Algorithm (Heuristic-guided shortest path)
- Time: O(E) best case, O(b^d) worst (b=branching, d=depth)
- Space: O(V)
- Uses heuristic to guide search
def a_star(graph, start, goal, heuristic):
"""
Find shortest path using A* with heuristic.
heuristic: function(node, goal) -> estimated distance
"""
from heapq import heappush, heappop
open_set = [(0, start)] # (f_score, node)
came_from = {}
g_score = {start: 0}
f_score = {start: heuristic(start, goal)}
while open_set:
_, current = heappop(open_set)
if current == goal:
# Reconstruct path
path = []
while current in came_from:
path.append(current)
current = came_from[current]
path.append(start)
return path[::-1]
for neighbor, weight in graph[current]:
tentative_g = g_score[current] + weight
if neighbor not in g_score or tentative_g < g_score[neighbor]:
came_from[neighbor] = current
g_score[neighbor] = tentative_g
f_score[neighbor] = tentative_g + heuristic(neighbor, goal)
heappush(open_set, (f_score[neighbor], neighbor))
return None # No path found
# Example: Grid pathfinding with Manhattan distance
def manhattan_distance(pos1, pos2):
return abs(pos1[0] - pos2[0]) + abs(pos1[1] - pos2[1])
# Grid graph
grid_graph = {
(0, 0): [((0, 1), 1), ((1, 0), 1)],
(0, 1): [((0, 0), 1), ((1, 1), 1), ((0, 2), 1)],
(0, 2): [((0, 1), 1), ((1, 2), 1)],
(1, 0): [((0, 0), 1), ((2, 0), 1), ((1, 1), 1)],
(1, 1): [((0, 1), 1), ((1, 0), 1), ((1, 2), 1), ((2, 1), 1)],
(1, 2): [((0, 2), 1), ((1, 1), 1), ((2, 2), 1)],
(2, 0): [((1, 0), 1), ((2, 1), 1)],
(2, 1): [((2, 0), 1), ((1, 1), 1), ((2, 2), 1)],
(2, 2): [((2, 1), 1), ((1, 2), 1)]
}
path = a_star(grid_graph, (0, 0), (2, 2), manhattan_distance)
print("A* path:", path) # [(0, 0), (1, 1), (2, 2)] or similar
Minimum Spanning Tree (MST) Algorithms:
An MST connects all vertices with minimum total edge weight, forming a tree (no cycles).
1. Kruskal’s Algorithm (Edge-based, uses Union-Find)
- Time: O(E log E) or O(E log V)
- Space: O(V)
- Sort edges; add each edge if it doesn’t create a cycle
class UnionFind:
def __init__(self, n):
self.parent = list(range(n))
self.rank = [0] * n
def find(self, x):
if self.parent[x] != x:
self.parent[x] = self.find(self.parent[x])
return self.parent[x]
def union(self, x, y):
root_x = self.find(x)
root_y = self.find(y)
if root_x == root_y:
return False
if self.rank[root_x] < self.rank[root_y]:
self.parent[root_x] = root_y
elif self.rank[root_x] > self.rank[root_y]:
self.parent[root_y] = root_x
else:
self.parent[root_y] = root_x
self.rank[root_x] += 1
return True
def kruskal(num_vertices, edges):
"""
Find MST using Kruskal's algorithm.
edges: list of (weight, u, v)
Returns: list of edges in MST and total weight
"""
edges = sorted(edges) # Sort by weight
uf = UnionFind(num_vertices)
mst = []
total_weight = 0
for weight, u, v in edges:
if uf.union(u, v):
mst.append((u, v, weight))
total_weight += weight
if len(mst) == num_vertices - 1:
break
return mst, total_weight
# Example:
edges = [
(1, 0, 1), (2, 0, 2), (3, 1, 2),
(4, 1, 3), (5, 2, 3), (6, 2, 4),
(7, 3, 4)
]
mst, weight = kruskal(5, edges)
print("MST edges:", mst)
print("Total weight:", weight)
2. Prim’s Algorithm (Vertex-based, uses priority queue)
- Time: O((V + E) log V) with min-heap
- Space: O(V)
- Grow MST one vertex at a time
def prim(graph, start=0):
"""
Find MST using Prim's algorithm.
graph: adjacency list {node: [(neighbor, weight), ...]}
Returns: list of edges in MST and total weight
"""
import heapq
visited = set()
mst = []
total_weight = 0
# Start with arbitrary vertex
visited.add(start)
edges = [(weight, start, neighbor) for neighbor, weight in graph[start]]
heapq.heapify(edges)
while edges and len(visited) < len(graph):
weight, u, v = heapq.heappop(edges)
if v in visited:
continue
# Add edge to MST
mst.append((u, v, weight))
total_weight += weight
visited.add(v)
# Add new edges from v
for neighbor, edge_weight in graph[v]:
if neighbor not in visited:
heapq.heappush(edges, (edge_weight, v, neighbor))
return mst, total_weight
# Example:
graph = {
0: [(1, 1), (2, 2)],
1: [(0, 1), (2, 3), (3, 4)],
2: [(0, 2), (1, 3), (3, 5), (4, 6)],
3: [(1, 4), (2, 5), (4, 7)],
4: [(2, 6), (3, 7)]
}
mst, weight = prim(graph)
print("MST edges:", mst)
print("Total weight:", weight)
Shortest Path vs MST Comparison:
| Algorithm | Problem Type | Graph Type | Time Complexity | Space | Notes |
|---|---|---|---|---|---|
| Dijkstra | Single-source | Weighted, ≥0 | O((V+E) log V) | O(V) | Greedy, min-heap |
| Bellman-Ford | Single-source | Weighted, any | O(VE) | O(V) | Detects negative cycles |
| Floyd-Warshall | All-pairs | Weighted, any | O(V³) | O(V²) | Dense graphs |
| A* | Single-pair | Weighted, ≥0 | Varies | O(V) | Heuristic-guided |
| Kruskal | MST | Weighted | O(E log E) | O(V) | Edge-based, Union-Find |
| Prim | MST | Weighted | O((V+E) log V) | O(V) | Vertex-based, min-heap |
Practical applications:
# Network routing: Find shortest path with bandwidth constraints
def dijkstra_with_bandwidth(graph, start, min_bandwidth):
"""Modified Dijkstra that only uses edges with sufficient bandwidth."""
distances = {start: 0}
parents = {start: None}
pq = [(0, start)]
visited = set()
while pq:
dist, node = heapq.heappop(pq)
if node in visited:
continue
visited.add(node)
for neighbor, weight, bandwidth in graph[node]:
if bandwidth < min_bandwidth or neighbor in visited:
continue
new_dist = dist + weight
if neighbor not in distances or new_dist < distances[neighbor]:
distances[neighbor] = new_dist
parents[neighbor] = node
heapq.heappush(pq, (new_dist, neighbor))
return distances, parents
# Road network: Find alternative routes (k shortest paths)
def k_shortest_paths(graph, start, end, k):
    """Find k shortest loop-free paths from start to end."""
    import heapq
    # Simplified best-first enumeration; Yen's algorithm is the
    # standard production approach.
    paths = []
    potential_paths = [(0, [start])]
    while len(paths) < k and potential_paths:
        dist, path = heapq.heappop(potential_paths)
        if path[-1] == end:
            paths.append((dist, path))
            continue
        current = path[-1]
        for neighbor, weight in graph[current]:
            if neighbor not in path:  # Avoid revisiting nodes (no loops)
                heapq.heappush(potential_paths,
                               (dist + weight, path + [neighbor]))
    return paths
# MST with degree constraints
def mst_degree_constrained(graph, max_degree):
"""
Find MST where no vertex has degree > max_degree.
This is NP-hard in general; this is a greedy approximation.
"""
# Track degree of each vertex
degree = defaultdict(int)
mst = []
total_weight = 0
uf = UnionFind(len(graph))
# Sort edges by weight
all_edges = []
for u in graph:
for v, weight in graph[u]:
if u < v: # Avoid duplicates
all_edges.append((weight, u, v))
all_edges.sort()
for weight, u, v in all_edges:
# Check degree constraint and cycle
if (degree[u] < max_degree and degree[v] < max_degree and
uf.union(u, v)):
mst.append((u, v, weight))
total_weight += weight
degree[u] += 1
degree[v] += 1
if len(mst) == len(graph) - 1:
break
return mst, total_weight
Dynamic programming with structures
Dynamic Programming (DP) optimizes by storing solutions to subproblems, avoiding redundant computation. The choice of data structure significantly impacts DP efficiency.
Common DP data structures:
- 1D Array/List: For problems with single-dimensional state
- 2D Array/Matrix: For problems with two-dimensional state
- Hash Table/Dictionary: For sparse state space or complex state representation
- Custom structures: Trees, graphs for hierarchical/networked states
Classic DP problems with data structure usage:
1. Fibonacci (1D array)
# Top-down with memoization (dict)
def fib_memo(n, memo=None):
if memo is None:
memo = {}
if n in memo:
return memo[n]
if n <= 1:
return n
memo[n] = fib_memo(n-1, memo) + fib_memo(n-2, memo)
return memo[n]
# Bottom-up with array
def fib_dp(n):
if n <= 1:
return n
dp = [0] * (n + 1)
dp[1] = 1
for i in range(2, n + 1):
dp[i] = dp[i-1] + dp[i-2]
return dp[n]
# Space-optimized (constant space)
def fib_optimized(n):
if n <= 1:
return n
prev2, prev1 = 0, 1
for _ in range(2, n + 1):
current = prev1 + prev2
prev2, prev1 = prev1, current
return prev1
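For completeness, Python's functools.lru_cache gives the memoized top-down version without an explicit dictionary:
# Memoization via the standard library
from functools import lru_cache
@lru_cache(maxsize=None)
def fib_cached(n):
    if n <= 1:
        return n
    return fib_cached(n - 1) + fib_cached(n - 2)
print(fib_cached(30))  # 832040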
2. 0/1 Knapsack (2D array)
def knapsack_01(weights, values, capacity):
"""
0/1 Knapsack: maximize value with weight constraint.
dp[i][w] = max value using first i items with capacity w
"""
n = len(weights)
dp = [[0] * (capacity + 1) for _ in range(n + 1)]
for i in range(1, n + 1):
for w in range(capacity + 1):
# Don't take item i-1
dp[i][w] = dp[i-1][w]
# Take item i-1 if it fits
if weights[i-1] <= w:
dp[i][w] = max(dp[i][w],
dp[i-1][w - weights[i-1]] + values[i-1])
return dp[n][capacity]
# Space-optimized version (1D array)
def knapsack_01_optimized(weights, values, capacity):
dp = [0] * (capacity + 1)
for i in range(len(weights)):
# Traverse backwards to avoid using updated values
for w in range(capacity, weights[i] - 1, -1):
dp[w] = max(dp[w], dp[w - weights[i]] + values[i])
return dp[capacity]
# With item tracking
def knapsack_with_items(weights, values, capacity):
n = len(weights)
dp = [[0] * (capacity + 1) for _ in range(n + 1)]
# Fill DP table
for i in range(1, n + 1):
for w in range(capacity + 1):
dp[i][w] = dp[i-1][w]
if weights[i-1] <= w:
dp[i][w] = max(dp[i][w],
dp[i-1][w - weights[i-1]] + values[i-1])
# Reconstruct solution
w = capacity
items = []
for i in range(n, 0, -1):
if dp[i][w] != dp[i-1][w]:
items.append(i-1)
w -= weights[i-1]
return dp[n][capacity], items[::-1]
# Example:
weights = [2, 3, 4, 5]
values = [3, 4, 5, 6]
capacity = 8
max_value, items = knapsack_with_items(weights, values, capacity)
print(f"Max value: {max_value}, Items: {items}")
3. Longest Common Subsequence (2D array)
def lcs(s1, s2):
"""
Find length of longest common subsequence.
dp[i][j] = LCS length of s1[:i] and s2[:j]
"""
m, n = len(s1), len(s2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1[i-1] == s2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
return dp[m][n]
# With subsequence reconstruction
def lcs_with_string(s1, s2):
m, n = len(s1), len(s2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
# Fill DP table
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1[i-1] == s2[j-1]:
dp[i][j] = dp[i-1][j-1] + 1
else:
dp[i][j] = max(dp[i-1][j], dp[i][j-1])
# Reconstruct LCS
lcs_str = []
i, j = m, n
while i > 0 and j > 0:
if s1[i-1] == s2[j-1]:
lcs_str.append(s1[i-1])
i -= 1
j -= 1
elif dp[i-1][j] > dp[i][j-1]:
i -= 1
else:
j -= 1
return dp[m][n], ''.join(reversed(lcs_str))
# Example:
s1, s2 = "ABCDGH", "AEDFHR"
length, lcs_str = lcs_with_string(s1, s2)
print(f"LCS length: {length}, LCS: {lcs_str}") # 3, "ADH"
4. Edit Distance (2D array)
def edit_distance(s1, s2):
"""
Minimum edits (insert, delete, replace) to transform s1 to s2.
dp[i][j] = min edits to transform s1[:i] to s2[:j]
"""
m, n = len(s1), len(s2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
# Base cases
for i in range(m + 1):
dp[i][0] = i # Delete all characters
for j in range(n + 1):
dp[0][j] = j # Insert all characters
# Fill DP table
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1[i-1] == s2[j-1]:
dp[i][j] = dp[i-1][j-1] # No operation needed
else:
dp[i][j] = 1 + min(
dp[i-1][j], # Delete from s1
dp[i][j-1], # Insert to s1
dp[i-1][j-1] # Replace in s1
)
return dp[m][n]
# With operation tracking
def edit_distance_with_ops(s1, s2):
m, n = len(s1), len(s2)
dp = [[0] * (n + 1) for _ in range(m + 1)]
ops = [[None] * (n + 1) for _ in range(m + 1)]
# Base cases
for i in range(m + 1):
dp[i][0] = i
if i > 0:
ops[i][0] = 'delete'
for j in range(n + 1):
dp[0][j] = j
if j > 0:
ops[0][j] = 'insert'
# Fill DP table
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1[i-1] == s2[j-1]:
dp[i][j] = dp[i-1][j-1]
ops[i][j] = 'match'
else:
costs = [
(dp[i-1][j] + 1, 'delete'),
(dp[i][j-1] + 1, 'insert'),
(dp[i-1][j-1] + 1, 'replace')
]
min_cost, op = min(costs)
dp[i][j] = min_cost
ops[i][j] = op
# Reconstruct operations
operations = []
i, j = m, n
while i > 0 or j > 0:
op = ops[i][j]
if op == 'match':
i -= 1
j -= 1
elif op == 'delete':
operations.append(f"Delete '{s1[i-1]}' at {i-1}")
i -= 1
elif op == 'insert':
operations.append(f"Insert '{s2[j-1]}' at {i}")
j -= 1
elif op == 'replace':
operations.append(f"Replace '{s1[i-1]}' with '{s2[j-1]}' at {i-1}")
i -= 1
j -= 1
return dp[m][n], operations[::-1]
# Example:
s1, s2 = "kitten", "sitting"
dist, ops = edit_distance_with_ops(s1, s2)
print(f"Edit distance: {dist}")
print("Operations:", ops)
5. Coin Change (1D array or dict)
# Minimum coins needed
def coin_change_min(coins, amount):
"""
Minimum number of coins to make amount.
dp[i] = min coins to make amount i
"""
dp = [float('inf')] * (amount + 1)
dp[0] = 0
for i in range(1, amount + 1):
for coin in coins:
if coin <= i:
dp[i] = min(dp[i], dp[i - coin] + 1)
return dp[amount] if dp[amount] != float('inf') else -1
# Number of ways to make amount
def coin_change_ways(coins, amount):
"""
Number of ways to make amount with given coins.
dp[i] = number of ways to make amount i
"""
dp = [0] * (amount + 1)
dp[0] = 1
for coin in coins:
for i in range(coin, amount + 1):
dp[i] += dp[i - coin]
return dp[amount]
# With coin tracking
def coin_change_with_coins(coins, amount):
dp = [float('inf')] * (amount + 1)
dp[0] = 0
parent = [-1] * (amount + 1)
for i in range(1, amount + 1):
for coin in coins:
if coin <= i and dp[i - coin] + 1 < dp[i]:
dp[i] = dp[i - coin] + 1
parent[i] = coin
if dp[amount] == float('inf'):
return -1, []
# Reconstruct coins used
result = []
curr = amount
while curr > 0:
coin = parent[curr]
result.append(coin)
curr -= coin
return dp[amount], result
# Example:
coins = [1, 5, 10, 25]
amount = 63
min_coins, coin_list = coin_change_with_coins(coins, amount)
print(f"Min coins: {min_coins}, Coins used: {coin_list}")
ways = coin_change_ways(coins, amount)
print(f"Number of ways: {ways}")
6. DP with Hash Table (Climbing Stairs with variable steps)
def climbing_stairs_memo(n, allowed_steps, memo=None):
"""
Count ways to climb n stairs with allowed step sizes.
Uses hash table for memoization with arbitrary allowed steps.
"""
if memo is None:
memo = {}
if n in memo:
return memo[n]
if n == 0:
return 1
if n < 0:
return 0
ways = 0
for step in allowed_steps:
ways += climbing_stairs_memo(n - step, allowed_steps, memo)
memo[n] = ways
return ways
# Bottom-up with array
def climbing_stairs_dp(n, allowed_steps):
dp = [0] * (n + 1)
dp[0] = 1
for i in range(1, n + 1):
for step in allowed_steps:
if i >= step:
dp[i] += dp[i - step]
return dp[n]
# Example:
n = 10
allowed_steps = [1, 2, 3]
print(f"Ways to climb {n} stairs: {climbing_stairs_dp(n, allowed_steps)}")
7. DP on Trees (Tree diameter)
class TreeNode:
def __init__(self, val=0):
self.val = val
self.children = []
def tree_diameter(root):
"""
Find diameter (longest path between any two nodes) of tree.
Uses DP on tree structure.
"""
diameter = [0]
def dfs(node):
if not node.children:
return 0
# Get two longest paths from children
max_depths = []
for child in node.children:
max_depths.append(dfs(child) + 1)
max_depths.sort(reverse=True)
# Update diameter with path through this node
if len(max_depths) >= 2:
diameter[0] = max(diameter[0], max_depths[0] + max_depths[1])
elif len(max_depths) == 1:
diameter[0] = max(diameter[0], max_depths[0])
return max_depths[0] if max_depths else 0
dfs(root)
return diameter[0]
# Example:
# 1
# /|\
# 2 3 4
# /| |
# 5 6 7
root = TreeNode(1)
root.children = [TreeNode(2), TreeNode(3), TreeNode(4)]
root.children[0].children = [TreeNode(5), TreeNode(6)]
root.children[2].children = [TreeNode(7)]
print(f"Tree diameter: {tree_diameter(root)}") # 4 (5->2->1->4->7)
DP Structure Selection Guide:
| Problem Characteristics | Recommended Structure | Example |
|---|---|---|
| Single parameter state | 1D array | Fibonacci, Climbing Stairs |
| Two parameter state | 2D array/matrix | LCS, Edit Distance, Knapsack |
| Sparse state space | Hash table/dict | Large range with few states |
| Negative/non-sequential indices | Hash table/dict | Subarray sum problems |
| Tree/graph structure | Custom (with dict/array) | Tree DP, Graph DP |
| State is complex object | Hash table with tuple keys | State machines, game states |
| Fixed small state count | Array with state encoding | Bitmask DP |
Space optimization techniques:
# Rolling array: Use only last k rows of 2D DP
def lcs_space_optimized(s1, s2):
"""LCS using only two rows instead of full matrix."""
m, n = len(s1), len(s2)
prev = [0] * (n + 1)
curr = [0] * (n + 1)
for i in range(1, m + 1):
for j in range(1, n + 1):
if s1[i-1] == s2[j-1]:
curr[j] = prev[j-1] + 1
else:
curr[j] = max(prev[j], curr[j-1])
prev, curr = curr, prev
return prev[n]
# State compression: Use bit manipulation for states
def tsp_bitmask_dp(dist):
"""
Traveling Salesman Problem using bitmask DP.
State: (current_city, visited_set) encoded as (city, mask)
"""
n = len(dist)
ALL_VISITED = (1 << n) - 1
# dp[mask][city] = min cost to visit cities in mask ending at city
dp = [[float('inf')] * n for _ in range(1 << n)]
dp[1][0] = 0 # Start at city 0
for mask in range(1 << n):
for u in range(n):
if dp[mask][u] == float('inf'):
continue
for v in range(n):
if mask & (1 << v): # Already visited
continue
new_mask = mask | (1 << v)
dp[new_mask][v] = min(dp[new_mask][v],
dp[mask][u] + dist[u][v])
# Find minimum cost to visit all cities and return to start
min_cost = float('inf')
for city in range(1, n):
min_cost = min(min_cost, dp[ALL_VISITED][city] + dist[city][0])
return min_cost
# Example:
dist = [
[0, 10, 15, 20],
[10, 0, 35, 25],
[15, 35, 0, 30],
[20, 25, 30, 0]
]
print(f"TSP min cost: {tsp_bitmask_dp(dist)}")
Complexity-driven design
Complexity-driven design means choosing data structures and algorithms based on time/space constraints and input characteristics.
Decision framework:
1. Analyze constraints:
- Input size (n)
- Time limit
- Space limit
- Query frequency
2. Calculate required complexity:
- 10^8 operations per second (rough estimate)
- If n ≤ 10: O(n!) might work
- If n ≤ 20: O(2^n) might work
- If n ≤ 500: O(n^3) might work
- If n ≤ 5000: O(n^2) might work
- If n ≤ 10^6: O(n log n) or better
- If n ≤ 10^8: O(n) or better (these cutoffs are encoded in the sketch after this list)
3. Choose structure/algorithm:
- Match complexity to constraints
- Consider trade-offs
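These thresholds are heuristics, not hard limits. As a sketch, they can be encoded directly (the cutoffs below simply restate the list above):
# Rough feasibility check for a ~1-second budget at ~10^8 ops/sec (heuristic)
def feasible_complexities(n):
    thresholds = [
        (10, "O(n!)"),
        (20, "O(2^n)"),
        (500, "O(n^3)"),
        (5000, "O(n^2)"),
        (10**6, "O(n log n)"),
        (10**8, "O(n)"),
    ]
    return [name for limit, name in thresholds if n <= limit]
print(feasible_complexities(5000))  # ['O(n^2)', 'O(n log n)', 'O(n)']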
Example scenarios:
Scenario 1: Frequent membership queries
# Bad: O(n) per query
def membership_list(items, queries):
"""Linear search for each query - O(m*n) total."""
results = []
for q in queries:
results.append(q in items) # O(n) for list
return results
# Good: O(1) average per query
def membership_set(items, queries):
"""Hash set for each query - O(n + m) total."""
item_set = set(items) # O(n) setup
results = []
for q in queries:
results.append(q in item_set) # O(1) average
return results
# Analysis:
# If n = 10^6, m = 10^6:
# List approach: 10^12 operations (too slow!)
# Set approach: 2*10^6 operations (fast!)
Scenario 2: Range queries
# Bad: O(n) per query
def range_sum_naive(arr, queries):
"""Calculate sum for each query - O(m*n) total."""
results = []
for left, right in queries:
results.append(sum(arr[left:right+1]))
return results
# Good: O(1) per query with O(n) preprocessing
def range_sum_prefix(arr, queries):
"""Prefix sum array - O(n + m) total."""
n = len(arr)
prefix = [0] * (n + 1)
# Build prefix sum: O(n)
for i in range(n):
prefix[i + 1] = prefix[i] + arr[i]
# Answer queries: O(1) each
results = []
for left, right in queries:
results.append(prefix[right + 1] - prefix[left])
return results
# For dynamic updates: use Segment Tree or Fenwick Tree
class FenwickTree:
def __init__(self, n):
self.n = n
self.tree = [0] * (n + 1)
def update(self, i, delta):
"""Add delta to element at index i. O(log n)"""
i += 1 # 1-indexed
while i <= self.n:
self.tree[i] += delta
i += i & (-i)
def query(self, i):
"""Sum of elements [0, i]. O(log n)"""
i += 1
s = 0
while i > 0:
s += self.tree[i]
i -= i & (-i)
return s
def range_query(self, left, right):
"""Sum of elements [left, right]. O(log n)"""
if left == 0:
return self.query(right)
return self.query(right) - self.query(left - 1)
# Complexity comparison:
# Static array, m queries:
# Naive: O(m*n)
# Prefix sum: O(n + m)
#
# Dynamic array (u updates, m queries):
# Naive: O(u + m*n)
# Fenwick/Segment Tree: O((u + m) log n)
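The comments above mention Segment Trees alongside Fenwick trees; here is a minimal iterative range-sum segment tree as a sketch (the same skeleton generalizes to min, max, or gcd by swapping the combine operation):
# Segment Tree - sketch (iterative, range sum)
class SegmentTree:
    def __init__(self, arr):
        self.n = len(arr)
        self.tree = [0] * (2 * self.n)
        for i, val in enumerate(arr):
            self.tree[self.n + i] = val  # Leaves hold the array
        for i in range(self.n - 1, 0, -1):
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
    def update(self, i, value):
        """Set arr[i] = value. O(log n)"""
        i += self.n
        self.tree[i] = value
        while i > 1:
            i //= 2
            self.tree[i] = self.tree[2 * i] + self.tree[2 * i + 1]
    def range_query(self, left, right):
        """Sum of arr[left:right+1]. O(log n)"""
        result = 0
        left += self.n
        right += self.n + 1  # Half-open interval in tree indices
        while left < right:
            if left % 2 == 1:
                result += self.tree[left]
                left += 1
            if right % 2 == 1:
                right -= 1
                result += self.tree[right]
            left //= 2
            right //= 2
        return result
st = SegmentTree([1, 2, 3, 4])
print(st.range_query(1, 2))  # 5
st.update(1, 10)
print(st.range_query(0, 3))  # 18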
Scenario 3: Sorted data operations
# Design choice based on operations
class SortedStructure:
"""
Choose implementation based on operation frequency:
- Mostly searches: Sorted array + binary search
- Mostly insertions: Balanced BST or sorted list
- Mix: Balanced BST
"""
def __init__(self, use_bst=True):
if use_bst:
            # Use an ordered container (sortedcontainers.SortedList performs like a balanced BST)
from sortedcontainers import SortedList
self.data = SortedList()
else:
# Use sorted array
self.data = []
def insert(self, val):
if isinstance(self.data, list):
# Sorted array: O(n) insertion
import bisect
bisect.insort(self.data, val)
else:
# Balanced BST: O(log n) insertion
self.data.add(val)
def search(self, val):
if isinstance(self.data, list):
# Sorted array: O(log n) search
import bisect
idx = bisect.bisect_left(self.data, val)
return idx < len(self.data) and self.data[idx] == val
else:
# Balanced BST: O(log n) search
return val in self.data
# Decision matrix:
# Operations (n total):
# - 90% search, 10% insert: Use sorted array
# - 50% search, 50% insert: Use balanced BST
# - Bulk inserts then only searches: Sort once, then binary search
Scenario 4: Graph algorithm selection
def choose_shortest_path_algorithm(graph_info):
"""
Select algorithm based on graph characteristics.
"""
num_vertices = graph_info['vertices']
num_edges = graph_info['edges']
has_negative_weights = graph_info['negative_weights']
is_sparse = num_edges < num_vertices * num_vertices / 2
query_type = graph_info['query_type'] # 'single', 'all-pairs'
if query_type == 'single':
if has_negative_weights:
return "Bellman-Ford", "O(VE)"
else:
return "Dijkstra", "O((V+E) log V)"
else: # all-pairs
if is_sparse and not has_negative_weights:
return "Dijkstra V times", "O(V(V+E) log V)"
else:
return "Floyd-Warshall", "O(V³)"
# Example decision:
graphs = [
{'vertices': 1000, 'edges': 5000, 'negative_weights': False, 'query_type': 'single'},
{'vertices': 100, 'edges': 4000, 'negative_weights': False, 'query_type': 'all-pairs'},
{'vertices': 1000, 'edges': 2000, 'negative_weights': True, 'query_type': 'single'}
]
for g in graphs:
algo, complexity = choose_shortest_path_algorithm(g)
print(f"Graph: V={g['vertices']}, E={g['edges']}")
print(f" Choose: {algo} ({complexity})\n")
Scenario 5: String matching
def choose_string_matching(pattern_info):
"""
Select string matching algorithm based on use case.
"""
pattern_count = pattern_info['patterns']
pattern_length = pattern_info['pattern_length']
text_length = pattern_info['text_length']
query_frequency = pattern_info['queries']
if pattern_count == 1:
if query_frequency == 'once':
return "KMP or Boyer-Moore", "O(n + m)"
else:
return "Build suffix array/tree", "O(n) build, O(m) per query"
else:
if query_frequency == 'once':
return "Aho-Corasick", "O(n + m + matches)"
else:
return "Trie or suffix tree", "O(total pattern length) build"
# Practical implementations
class StringMatcher:
# KMP for single pattern, multiple searches
def kmp_build_table(self, pattern):
"""Build KMP failure function. O(m)"""
m = len(pattern)
table = [0] * m
j = 0
for i in range(1, m):
while j > 0 and pattern[i] != pattern[j]:
j = table[j - 1]
if pattern[i] == pattern[j]:
j += 1
table[i] = j
return table
def kmp_search(self, text, pattern, table):
"""Search using KMP. O(n)"""
n, m = len(text), len(pattern)
matches = []
j = 0
for i in range(n):
while j > 0 and text[i] != pattern[j]:
j = table[j - 1]
if text[i] == pattern[j]:
j += 1
if j == m:
matches.append(i - m + 1)
j = table[j - 1]
return matches
# Aho-Corasick for multiple patterns
def build_aho_corasick(self, patterns):
"""Build Aho-Corasick automaton."""
from collections import deque, defaultdict
# Build trie
trie = {}
output = defaultdict(list)
node_id = [0]
def add_pattern(pattern, pattern_id):
node = 0
for char in pattern:
if (node, char) not in trie:
node_id[0] += 1
trie[(node, char)] = node_id[0]
node = trie[(node, char)]
output[node].append(pattern_id)
for i, pattern in enumerate(patterns):
add_pattern(pattern, i)
# Build failure links (simplified)
        # In production, use a complete Aho-Corasick implementation
return trie, output
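A quick usage example for the KMP methods above:
# Example usage of the KMP matcher
matcher = StringMatcher()
table = matcher.kmp_build_table("aba")
print(matcher.kmp_search("ababa", "aba", table))  # [0, 2]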
Complexity-driven design principles:
- Analyze first, optimize later
- Measure actual performance
- Identify bottlenecks
- Don’t prematurely optimize
- Trade-offs
- Time vs Space: Hash tables use more space for faster lookups
- Preprocessing vs Query: Precompute when many queries expected
- Average vs Worst case: Quick sort (fast average) vs Merge sort (guaranteed)
- Amortized analysis
- Dynamic array append: O(1) amortized
- Union-Find with path compression: O(α(n)) amortized
- Data structure augmentation
- Add metadata to improve specific operations
- Example: Size field in tree nodes for rank queries
class AugmentedBST:
"""BST with subtree size for order statistics."""
class Node:
def __init__(self, val):
self.val = val
self.left = None
self.right = None
self.size = 1 # Augmentation: size of subtree
def __init__(self):
self.root = None
def size(self, node):
return node.size if node else 0
def update_size(self, node):
if node:
node.size = 1 + self.size(node.left) + self.size(node.right)
def kth_smallest(self, k):
"""Find kth smallest element. O(log n) with augmentation."""
def helper(node, k):
if not node:
return None
left_size = self.size(node.left)
if k == left_size + 1:
return node.val
elif k <= left_size:
return helper(node.left, k)
else:
return helper(node.right, k - left_size - 1)
return helper(self.root, k)
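The class above omits insertion. Below is a minimal unbalanced insert that keeps the size field consistent, attached here purely for demonstration (a production version would rebalance, e.g., as a red-black or AVL tree):
# Sketch: unbalanced insert that maintains the size augmentation
def _augmented_insert(self, val):
    def helper(node, val):
        if not node:
            return self.Node(val)
        if val < node.val:
            node.left = helper(node.left, val)
        else:
            node.right = helper(node.right, val)
        self.update_size(node)  # Keep subtree sizes consistent
        return node
    self.root = helper(self.root, val)
AugmentedBST.insert = _augmented_insert  # Attached for demonstration only
bst = AugmentedBST()
for v in [5, 2, 8, 1, 3]:
    bst.insert(v)
print(bst.kth_smallest(2))  # 2 (sorted order: 1, 2, 3, 5, 8)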
Practical optimization patterns
Common algorithmic patterns that frequently appear in problem-solving.
1. Two Pointers
Used for problems involving arrays or linked lists where you need to track two positions.
# Pattern: Opposite direction pointers
def two_sum_sorted(arr, target):
"""Find pair that sums to target in sorted array."""
left, right = 0, len(arr) - 1
while left < right:
current_sum = arr[left] + arr[right]
if current_sum == target:
return (left, right)
elif current_sum < target:
left += 1
else:
right -= 1
return None
# Pattern: Same direction pointers (fast/slow)
def remove_duplicates(arr):
"""Remove duplicates from sorted array in-place."""
if not arr:
return 0
slow = 0
for fast in range(1, len(arr)):
if arr[fast] != arr[slow]:
slow += 1
arr[slow] = arr[fast]
return slow + 1
# Pattern: Floyd's cycle detection (slow and fast pointers)
def has_cycle(head):
"""Detect cycle in linked list."""
slow = fast = head
while fast and fast.next:
slow = slow.next
fast = fast.next.next
if slow == fast:
return True
return False
# Pattern: Partition (Dutch National Flag)
def sort_colors(arr):
"""Sort array of 0s, 1s, 2s in-place. O(n) one pass."""
low, mid, high = 0, 0, len(arr) - 1
while mid <= high:
if arr[mid] == 0:
arr[low], arr[mid] = arr[mid], arr[low]
low += 1
mid += 1
elif arr[mid] == 1:
mid += 1
else: # arr[mid] == 2
arr[mid], arr[high] = arr[high], arr[mid]
high -= 1
return arr
# Example:
arr = [2, 0, 2, 1, 1, 0]
print(sort_colors(arr)) # [0, 0, 1, 1, 2, 2]
2. Sliding Window
Maintain a window over contiguous elements, useful for subarray/substring problems.
# Pattern: Fixed-size window
def max_sum_subarray(arr, k):
"""Maximum sum of subarray of size k."""
if len(arr) < k:
return None
# Initial window
window_sum = sum(arr[:k])
max_sum = window_sum
# Slide window
for i in range(k, len(arr)):
window_sum = window_sum - arr[i - k] + arr[i]
max_sum = max(max_sum, window_sum)
return max_sum
# Pattern: Variable-size window
def longest_substring_k_distinct(s, k):
"""Longest substring with at most k distinct characters."""
from collections import defaultdict
char_count = defaultdict(int)
left = 0
max_length = 0
for right in range(len(s)):
char_count[s[right]] += 1
# Shrink window if too many distinct chars
while len(char_count) > k:
char_count[s[left]] -= 1
if char_count[s[left]] == 0:
del char_count[s[left]]
left += 1
max_length = max(max_length, right - left + 1)
return max_length
# Pattern: Expand window with condition
def min_window_substring(s, t):
"""Minimum window in s that contains all characters of t."""
from collections import Counter
if not s or not t:
return ""
required = Counter(t)
need = len(required)
formed = 0
window_counts = {}
left = 0
min_len = float('inf')
min_window = (0, 0)
for right in range(len(s)):
char = s[right]
window_counts[char] = window_counts.get(char, 0) + 1
if char in required and window_counts[char] == required[char]:
formed += 1
# Shrink window while valid
while formed == need and left <= right:
# Update result
if right - left + 1 < min_len:
min_len = right - left + 1
min_window = (left, right + 1)
# Remove leftmost character
char = s[left]
window_counts[char] -= 1
if char in required and window_counts[char] < required[char]:
formed -= 1
left += 1
return s[min_window[0]:min_window[1]] if min_len != float('inf') else ""
# Example:
s = "ADOBECODEBANC"
t = "ABC"
print(min_window_substring(s, t)) # "BANC"
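The other two window functions in action:
print(max_sum_subarray([2, 1, 5, 1, 3, 2], 3))    # 9 (window [5, 1, 3])
print(longest_substring_k_distinct("araaci", 2))  # 4 ("araa")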
3. Prefix Sum / Cumulative Sum
Precompute cumulative sums for efficient range queries.
# Pattern: Basic prefix sum
class PrefixSum:
def __init__(self, arr):
self.prefix = [0]
for num in arr:
self.prefix.append(self.prefix[-1] + num)
def range_sum(self, left, right):
"""Sum of arr[left:right+1]. O(1)"""
return self.prefix[right + 1] - self.prefix[left]
# Pattern: 2D prefix sum
class Matrix2DPrefixSum:
def __init__(self, matrix):
if not matrix or not matrix[0]:
self.prefix = []
return
m, n = len(matrix), len(matrix[0])
self.prefix = [[0] * (n + 1) for _ in range(m + 1)]
for i in range(1, m + 1):
for j in range(1, n + 1):
self.prefix[i][j] = (matrix[i-1][j-1] +
self.prefix[i-1][j] +
self.prefix[i][j-1] -
self.prefix[i-1][j-1])
def region_sum(self, r1, c1, r2, c2):
"""Sum of submatrix from (r1,c1) to (r2,c2). O(1)"""
return (self.prefix[r2+1][c2+1] -
self.prefix[r1][c2+1] -
self.prefix[r2+1][c1] +
self.prefix[r1][c1])
# Pattern: Prefix sum with hash map
def subarray_sum_equals_k(arr, k):
"""Count subarrays with sum equal to k."""
from collections import defaultdict
prefix_sum = 0
count = 0
sum_count = defaultdict(int)
sum_count[0] = 1 # Empty prefix
for num in arr:
prefix_sum += num
# If (prefix_sum - k) exists, we found a subarray
if prefix_sum - k in sum_count:
count += sum_count[prefix_sum - k]
sum_count[prefix_sum] += 1
return count
# Example:
arr = [1, 2, 3, 4, 5]
ps = PrefixSum(arr)
print(ps.range_sum(1, 3)) # 2+3+4 = 9
arr2 = [1, 1, 1, 1]
print(subarray_sum_equals_k(arr2, 2)) # 3 subarrays: [1,1], [1,1], [1,1]
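The 2D variant works the same way:
matrix = [[1, 2], [3, 4]]
mps = Matrix2DPrefixSum(matrix)
print(mps.region_sum(0, 0, 1, 1))  # 10 (whole matrix)
print(mps.region_sum(1, 0, 1, 1))  # 7 (bottom row: 3 + 4)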
4. Monotonic Stack/Queue
Maintain elements in monotonic order for efficient min/max queries.
# Pattern: Next greater element
def next_greater_element(arr):
"""For each element, find next greater element to the right."""
n = len(arr)
result = [-1] * n
stack = [] # Monotonic decreasing stack (indices)
for i in range(n):
while stack and arr[stack[-1]] < arr[i]:
idx = stack.pop()
result[idx] = arr[i]
stack.append(i)
return result
# Pattern: Largest rectangle in histogram
def largest_rectangle_histogram(heights):
"""Find largest rectangle area in histogram."""
stack = [] # Monotonic increasing stack
max_area = 0
heights = heights + [0] # Sentinel
for i, h in enumerate(heights):
while stack and heights[stack[-1]] > h:
height_idx = stack.pop()
height = heights[height_idx]
width = i if not stack else i - stack[-1] - 1
max_area = max(max_area, height * width)
stack.append(i)
return max_area
# Pattern: Sliding window maximum
def sliding_window_maximum(arr, k):
"""Maximum in each window of size k."""
from collections import deque
dq = deque() # Monotonic decreasing deque (indices)
result = []
for i in range(len(arr)):
# Remove elements outside window
while dq and dq[0] <= i - k:
dq.popleft()
# Maintain decreasing order
while dq and arr[dq[-1]] < arr[i]:
dq.pop()
dq.append(i)
# Add to result when window is full
if i >= k - 1:
result.append(arr[dq[0]])
return result
# Example:
arr = [4, 5, 2, 10, 8]
print("Next greater:", next_greater_element(arr))
# [5, 10, 10, -1, -1]
heights = [2, 1, 5, 6, 2, 3]
print("Largest rectangle:", largest_rectangle_histogram(heights))
# 10 (5*2)
arr = [1, 3, -1, -3, 5, 3, 6, 7]
print("Sliding max:", sliding_window_maximum(arr, 3))
# [3, 3, 5, 5, 6, 7]
5. Binary Search Patterns
Binary search on answer space, not just for finding elements.
# Pattern: Binary search on answer
def min_eating_speed(piles, h):
"""
Minimum speed k to eat all bananas in h hours.
Binary search on the answer (speed).
"""
def can_finish(speed):
hours = 0
for pile in piles:
hours += (pile + speed - 1) // speed # Ceiling division
return hours <= h
left, right = 1, max(piles)
while left < right:
mid = left + (right - left) // 2
if can_finish(mid):
right = mid # Try smaller speed
else:
left = mid + 1 # Need faster speed
return left
# Pattern: Binary search for range
def search_range(arr, target):
"""Find first and last position of target in sorted array."""
def binary_search_bound(find_first):
left, right = 0, len(arr)
while left < right:
mid = left + (right - left) // 2
if arr[mid] > target or (find_first and arr[mid] == target):
right = mid
else:
left = mid + 1
return left
first = binary_search_bound(True)
if first == len(arr) or arr[first] != target:
return [-1, -1]
last = binary_search_bound(False) - 1
return [first, last]
# Pattern: Staircase search on a 2D matrix
def search_matrix(matrix, target):
    """Search a row- and column-sorted matrix in O(m + n)."""
if not matrix or not matrix[0]:
return False
# Start from top-right corner
row, col = 0, len(matrix[0]) - 1
while row < len(matrix) and col >= 0:
if matrix[row][col] == target:
return True
elif matrix[row][col] > target:
col -= 1 # Move left
else:
row += 1 # Move down
return False
# Example:
piles = [3, 6, 7, 11]
h = 8
print("Min eating speed:", min_eating_speed(piles, h)) # 4
arr = [5, 7, 7, 8, 8, 10]
print("Range of 8:", search_range(arr, 8)) # [3, 4]
6. Greedy Patterns
Make locally optimal choices that lead to a globally optimal solution.
# Pattern: Interval scheduling
def max_non_overlapping_intervals(intervals):
"""Maximum number of non-overlapping intervals."""
if not intervals:
return 0
# Sort by end time
intervals.sort(key=lambda x: x[1])
count = 1
end = intervals[0][1]
for i in range(1, len(intervals)):
if intervals[i][0] >= end:
count += 1
end = intervals[i][1]
return count
# Pattern: Greedy with sorting
def min_arrows_burst_balloons(points):
"""Minimum arrows needed to burst all balloons."""
if not points:
return 0
points.sort(key=lambda x: x[1])
arrows = 1
end = points[0][1]
for start, finish in points[1:]:
if start > end:
arrows += 1
end = finish
return arrows
# Pattern: Two-pass greedy
def candy_distribution(ratings):
"""Minimum candies to distribute based on ratings."""
n = len(ratings)
candies = [1] * n
# Left to right: ensure right neighbor gets more if higher rating
for i in range(1, n):
if ratings[i] > ratings[i-1]:
candies[i] = candies[i-1] + 1
# Right to left: ensure left neighbor gets more if higher rating
for i in range(n-2, -1, -1):
if ratings[i] > ratings[i+1]:
candies[i] = max(candies[i], candies[i+1] + 1)
return sum(candies)
# Example:
intervals = [[1,2], [2,3], [3,4], [1,3]]
print("Max non-overlapping:", max_non_overlapping_intervals(intervals)) # 3
balloons = [[10,16], [2,8], [1,6], [7,12]]
print("Min arrows:", min_arrows_burst_balloons(balloons)) # 2
ratings = [1, 0, 2]
print("Min candies:", candy_distribution(ratings)) # 5 (2+1+2)
Pattern selection guide:
| Pattern | When to Use | Complexity |
|---|---|---|
| Two Pointers | Array/list with sorted property or need to track two positions | O(n) |
| Sliding Window | Contiguous subarray/substring problems | O(n) |
| Prefix Sum | Multiple range sum queries on static array | O(1) query |
| Monotonic Stack | Next/previous greater/smaller element | O(n) |
| Binary Search | Sorted data or search answer space | O(log n) |
| Greedy | Locally optimal leads to global optimal | Varies |
| DP | Overlapping subproblems, optimal substructure | Varies |
| BFS/DFS | Graph/tree traversal, shortest path | O(V+E) |
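The last row references BFS/DFS, which are not demonstrated elsewhere in this section. For unweighted graphs, plain BFS already yields shortest paths with no priority queue; a minimal sketch:
# BFS - shortest path in an unweighted graph, O(V + E)
from collections import deque
def bfs_shortest_path(graph, start, goal):
    """graph: adjacency list {node: [neighbor, ...]}. Returns path or None."""
    parents = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:  # Walk parents back to start
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in graph[node]:
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None
graph = {'A': ['B', 'C'], 'B': ['D'], 'C': ['D'], 'D': ['E'], 'E': []}
print(bfs_shortest_path(graph, 'A', 'E'))  # ['A', 'B', 'D', 'E']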