Arrays

Contiguous memory blocks supporting O(1) indexing.

1. Static vs dynamic arrays

Understanding fixed-size and resizable array implementations:

  • Static arrays:
    • Definition: Fixed size determined at compile time or initialization
    • Memory: Allocated on stack (small) or heap (large)
    • Examples: C arrays int arr[100], Java arrays int[] arr = new int[100]
    • Advantages:
      • No overhead for resizing
      • Predictable memory usage
      • Slightly faster access (no indirection)
    • Disadvantages:
      • Cannot grow or shrink
      • Must know size in advance
      • Wasted space if overallocated
  • Dynamic arrays:
    • Definition: Resizable arrays that grow as needed
    • Examples: C++ std::vector, Java ArrayList, Python list
    • Implementation: Array with capacity tracking
      class DynamicArray:
          def __init__(self):
              self.capacity = 1                     # allocated slots
              self.size = 0                         # slots currently in use
              self.array = [None] * self.capacity  # backing storage
      
    • Growth strategy: Double capacity when full (typical); a runnable sketch follows the comparison table below
    • Advantages:
      • Flexible size
      • Easy to use
      • Amortized O(1) append
    • Disadvantages:
      • Occasional expensive resize operation
      • Extra memory overhead (capacity > size)
      • Pointer indirection
  • Comparison table:

    Operation   Static Array   Dynamic Array
    Access      O(1)           O(1)
    Append      N/A            O(1) amortized
    Insert      O(n)           O(n)
    Delete      O(n)           O(n)
    Memory      Exact          1.5-2x actual size
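
A minimal Python sketch of the doubling strategy referenced above, filling in the skeleton shown earlier (the class and method names are illustrative, not from any particular library):

    class DynamicArray:
        def __init__(self):
            self.capacity = 1                     # allocated slots
            self.size = 0                         # slots currently in use
            self.array = [None] * self.capacity  # backing storage

        def append(self, value):
            if self.size == self.capacity:        # full: grow before writing
                self._resize(2 * self.capacity)   # doubling keeps appends O(1) amortized
            self.array[self.size] = value
            self.size += 1

        def _resize(self, new_capacity):
            new_array = [None] * new_capacity
            for i in range(self.size):            # O(size) copy, but infrequent
                new_array[i] = self.array[i]
            self.array = new_array
            self.capacity = new_capacity

        def __getitem__(self, index):
            if not 0 <= index < self.size:
                raise IndexError(index)
            return self.array[index]

    arr = DynamicArray()
    for x in range(10):
        arr.append(x)
    print(arr.size, arr.capacity)   # 10 16 (capacity grew 1 -> 2 -> 4 -> 8 -> 16)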

2. Amortized analysis of dynamic resizing

Understanding the cost of growing dynamic arrays:

  • Resizing strategy:
    • When array is full, allocate new array with 2x capacity
    • Copy all elements to new array
    • Free old array
  • Cost analysis:
    • Single resize operation: O(n) where n = current size
    • But resizes happen infrequently
  • Amortized analysis (aggregate method):
    • Insert n elements starting from empty array
    • Resizes occur at sizes: 1, 2, 4, 8, 16, …, n
    • Total copy cost: 1 + 2 + 4 + 8 + … + n = 2n - 1 = O(n)
    • Average cost per insertion: O(n)/n = O(1)
    • Result: O(1) amortized time per append (a copy-counting sketch appears at the end of this section)
  • Accounting method:
    • Charge $3 per insertion
    • $1 for actual insertion
    • $2 saved as “credit” for future resize
    • When resize happens, use accumulated credits
  • Growth factors:

    Factor   Memory Overhead   Resize Frequency
    1.5x     Lower (~33%)      More frequent
    2x       Higher (~50%)     Less frequent
    Trade-off: space vs. time
  • Shrinking strategy:
    • Shrink when size < capacity/4 (not capacity/2 to avoid thrashing)
    • Reduce capacity to capacity/2
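
The aggregate argument can be checked directly by counting element copies while simulating n appends under doubling; the helper below is only an illustration:

    def total_copy_cost(n):
        """Count element copies performed by n appends into a doubling array."""
        capacity, size, copies = 1, 0, 0
        for _ in range(n):
            if size == capacity:      # array is full: copy all `size` elements, double capacity
                copies += size
                capacity *= 2
            size += 1
        return copies

    for n in (10, 1_000, 1_000_000):
        print(f"n={n}: {total_copy_cost(n)} copies ({total_copy_cost(n) / n:.2f} per append)")

The per-append ratio stays below 2, matching the 1 + 2 + 4 + ... + n <= 2n bound above.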

3. Common operations: insert/delete/search

Analyzing fundamental array operations:

  • Access by index:
    element = arr[i]  # O(1)
    
    • Direct memory address calculation: base_address + i * element_size
    • Constant time regardless of array size
  • Search:
    • Linear search: O(n) - check each element
      def linear_search(arr, target):
          for i in range(len(arr)):
              if arr[i] == target:
                  return i
          return -1
      
    • Binary search: O(log n) - only for sorted arrays
      def binary_search(arr, target):
          left, right = 0, len(arr) - 1
          while left <= right:
              mid = left + (right - left) // 2
              if arr[mid] == target:
                  return mid
              elif arr[mid] < target:
                  left = mid + 1
              else:
                  right = mid - 1
          return -1
      
  • Insert:
    • At end: O(1) amortized (if space available)
    • At position i: O(n) - shift elements right
      def insert(arr, index, value):
          arr.append(None)  # Ensure space
          for i in range(len(arr)-1, index, -1):
              arr[i] = arr[i-1]
          arr[index] = value
      
    • At beginning: O(n) - worst case, shift all elements
  • Delete:
    • At end: O(1) - just decrement size
    • At position i: O(n) - shift elements left
      def delete(arr, index):
          for i in range(index, len(arr)-1):
              arr[i] = arr[i+1]
          arr.pop()  # Remove last element
      
    • By value: O(n) - search for the value, then shift-delete (a small sketch follows below)
  • Update:
    arr[i] = new_value  # O(1)
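
Deleting by value simply chains a linear search with the shifting delete above; a small sketch (the function name is illustrative):

    def delete_by_value(arr, target):
        """Remove the first occurrence of target; return True if found. O(n) total."""
        for i in range(len(arr)):                  # O(n) linear search
            if arr[i] == target:
                for j in range(i, len(arr) - 1):   # O(n) shift left
                    arr[j] = arr[j + 1]
                arr.pop()                          # drop the duplicated last slot
                return True
        return False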
    

4. Multidimensional arrays and matrices

Working with arrays of arrays:

  • 2D arrays (matrices):
    • Declaration:
      # Python
      matrix = [[0] * cols for _ in range(rows)]

      // Java
      int[][] matrix = new int[rows][cols];

      // C++
      vector<vector<int>> matrix(rows, vector<int>(cols));
      
  • Memory layouts:
    • Row-major order (C, C++, NumPy default):
      • Store rows contiguously: [row0][row1][row2]...
      • Address: base + (i * cols + j) * element_size
      • Better cache locality when iterating by rows (a flat-index sketch follows this list)
    • Column-major order (Fortran, MATLAB):
      • Store columns contiguously: [col0][col1][col2]...
      • Address: base + (j * rows + i) * element_size
      • Better for column operations
  • Common operations:
    • Transpose: O(rows × cols)
      def transpose(matrix):
          rows, cols = len(matrix), len(matrix[0])
          result = [[0] * rows for _ in range(cols)]
          for i in range(rows):
              for j in range(cols):
                  result[j][i] = matrix[i][j]
          return result
      
    • Matrix multiplication: O(n³) naive, O(n^2.81) with Strassen (theoretical methods reach ~O(n^2.37) but are impractical)
    • Rotation: 90° clockwise, in place (square matrix)
      def rotate_90(matrix):
          n = len(matrix)
          # Transpose
          for i in range(n):
              for j in range(i, n):
                  matrix[i][j], matrix[j][i] = matrix[j][i], matrix[i][j]
          # Reverse each row
          for i in range(n):
              matrix[i].reverse()
      
  • Jagged arrays: Arrays with varying row lengths
    jagged = [[1, 2], [3, 4, 5], [6]]
    
  • Higher dimensions:
    • 3D: arr[depth][row][col] - e.g., RGB images
    • 4D: arr[batch][depth][row][col] - e.g., video batches
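
The row-major address formula can be mimicked in plain Python by keeping a matrix in one flat list and computing offsets by hand; a sketch (truly contiguous layouts come from NumPy or C arrays, not Python lists):

    class FlatMatrix:
        """Row-major 2D matrix stored in a single flat list."""

        def __init__(self, rows, cols, fill=0):
            self.rows, self.cols = rows, cols
            self.data = [fill] * (rows * cols)   # one block of rows*cols slots

        def _offset(self, i, j):
            return i * self.cols + j             # the (i * cols + j) term from the address formula

        def get(self, i, j):
            return self.data[self._offset(i, j)]

        def set(self, i, j, value):
            self.data[self._offset(i, j)] = value

    m = FlatMatrix(3, 4)
    m.set(1, 2, 99)
    print(m.data.index(99))   # 6 == 1*4 + 2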

5. Memory layout and cache locality

Understanding performance implications of memory access patterns:

  • Contiguous memory:
    • Arrays stored in consecutive memory locations
    • Address of arr[i] = base_address + i × sizeof(element)
    • Enables efficient sequential access
  • Cache hierarchy:
    • L1 cache: ~32-64 KB, ~1-4 cycles
    • L2 cache: ~256 KB - 1 MB, ~10-20 cycles
    • L3 cache: ~8-32 MB, ~40-75 cycles
    • RAM: GBs, ~200+ cycles
  • Cache lines:
    • Typical size: 64 bytes
    • Fetching one element loads entire cache line
    • Sequential access benefits from prefetching
  • Spatial locality:
    • Accessing nearby memory locations
    • Good: Iterating array sequentially
      # Good cache locality: sequential access
      for i in range(n):
          total += arr[i]
      
    • Bad: Random access patterns
      # Poor cache locality: scattered access
      for i in random_indices:
          total += arr[i]
      
  • Matrix traversal:
    • Row-major order (C/C++):
      // Good: iterate by rows
      for (int i = 0; i < rows; i++)
          for (int j = 0; j < cols; j++)
              sum += matrix[i][j];
          
      // Bad: iterate by columns (cache misses)
      for (int j = 0; j < cols; j++)
          for (int i = 0; i < rows; i++)
              sum += matrix[i][j];
      
  • Performance impact:
    • Good locality: 10-100x faster than poor locality
    • Critical for large datasets
  • Optimization techniques:
    • Loop blocking/tiling: Process data in cache-sized chunks (a sketch follows this list)
    • Structure of Arrays (SoA) vs Array of Structures (AoS):
      // AoS - may have poor locality for specific fields
      struct Point { int x, y, z; };
      Point points[1000];
          
      // SoA - better locality when accessing single field
      struct Points {
          int x[1000];
          int y[1000];
          int z[1000];
      };
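
A sketch of loop blocking/tiling expressed in Python, shown only to illustrate the tile-by-tile access pattern; the cache benefit appears with contiguous storage (C/C++ arrays, NumPy), not with Python lists of lists:

    def blocked_transpose(matrix, block=64):
        """Transpose a square matrix tile by tile so each tile stays cache-resident."""
        n = len(matrix)
        result = [[0] * n for _ in range(n)]
        for ii in range(0, n, block):                    # tile origins
            for jj in range(0, n, block):
                for i in range(ii, min(ii + block, n)):  # sweep one block x block tile
                    for j in range(jj, min(jj + block, n)):
                        result[j][i] = matrix[i][j]
        return result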
      

6. Applications and limitations

When to use arrays and when to choose alternatives:

  • Applications:
    • Lookup tables: Fast O(1) access by index
      • ASCII character mappings
      • Precomputed values (factorials, primes)
    • Buffers: Fixed-size data storage
      • Circular buffers for streaming
      • I/O buffers
    • Stacks and queues: Array-based implementations
    • Dynamic programming: Memoization tables
    • Sorting algorithms: In-place sorting
    • Image processing: Pixel arrays (2D/3D)
    • Scientific computing: Vectors, matrices, tensors
  • Advantages:
    • ✓ O(1) random access by index
    • ✓ Cache-friendly (contiguous memory)
    • ✓ Simple and intuitive
    • ✓ Low memory overhead
    • ✓ Good for iteration
  • Limitations:
    • ✗ Fixed size (static) or resize overhead (dynamic)
    • ✗ O(n) insertion/deletion in middle
    • ✗ Wasted space if sparse
    • ✗ Cannot efficiently grow in middle
    • ✗ No O(1) insertion at arbitrary positions
  • When to use alternatives:

    Use Case                        Better Alternative            Reason
    Frequent insertions/deletions   Linked List                   O(1) insert/delete (given a node reference)
    Unknown size, many appends      Dynamic Array                 Amortized O(1) append
    Key-value lookups               Hash Table                    O(1) average lookup
    Sorted data with insertions     Balanced BST                  O(log n) operations
    Priority queue                  Heap                          O(log n) insert/extract-min
    Sparse data                     Hash Table / Sparse Matrix    Space efficient
  • Real-world examples:
    • ArrayList (Java): Dynamic array for general use
    • NumPy arrays (Python): Scientific computing
    • std::vector (C++): Default container choice
    • Circular buffer: Ring buffer for producer/consumer handoff (a minimal sketch follows below)
    • Bitmap: Compact boolean arrays
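
As a concrete example of the circular-buffer use case above, a minimal fixed-capacity ring buffer sketch (the names and the "raise when full" policy are illustrative choices):

    class RingBuffer:
        """Fixed-size FIFO circular buffer backed by a plain array (Python list)."""

        def __init__(self, capacity):
            self.buf = [None] * capacity
            self.capacity = capacity
            self.head = 0          # index of the oldest element
            self.size = 0

        def push(self, value):
            if self.size == self.capacity:
                raise OverflowError("buffer full")
            self.buf[(self.head + self.size) % self.capacity] = value
            self.size += 1

        def pop(self):
            if self.size == 0:
                raise IndexError("buffer empty")
            value = self.buf[self.head]
            self.head = (self.head + 1) % self.capacity
            self.size -= 1
            return value

    rb = RingBuffer(3)
    for item in (1, 2, 3):
        rb.push(item)
    print(rb.pop(), rb.pop())   # 1 2 (FIFO order; indices wrap around with modulo)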