The key is that if we have a prefix == suffix match in the substring up to pattern[i], where pattern[i] stopped matching with text[j], then the suffix suf must also be the last characters we saw in the text, text[ j-len(suf) : j ]. And since this substring suf is equal to some prefix pre, we can slide the pattern over, line pre up with text[ j-len(suf) : j ], and resume matching at pattern[ len(pre) ] and text[j].
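A minimal sketch of that idea, i.e., the standard KMP failure-function construction plus search (function names are mine):
import random  # not needed here; see below for the actual imports

def build_failure(pattern):
    # fail[i] = length of the longest proper prefix of pattern[:i+1]
    # that is also a suffix of it
    fail = [0] * len(pattern)
    k = 0
    for i in range(1, len(pattern)):
        while k and pattern[i] != pattern[k]:
            k = fail[k - 1]
        if pattern[i] == pattern[k]:
            k += 1
        fail[i] = k
    return fail

def kmp_search(text, pattern):
    if not pattern:
        return 0
    fail = build_failure(pattern)
    k = 0  # number of pattern chars currently matched
    for j, ch in enumerate(text):
        while k and ch != pattern[k]:
            # mismatch: slide the pattern so the matched suffix lines up
            # with an equal prefix; the text index j never moves backward
            k = fail[k - 1]
        if ch == pattern[k]:
            k += 1
        if k == len(pattern):
            return j - k + 1  # start index of the match
    return -1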
Bloom filters:
Tells you either that a value is possibly in the set or that it's definitely not in the set.
I.e., some false positives, but guaranteed no false negatives. Probabilistic data structure.
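A toy sketch (the size, hash scheme, and names are mine; real implementations size the bit array and hash count from the target false-positive rate):
import hashlib

class BloomFilter:
    def __init__(self, size=1024, num_hashes=3):
        self.size = size
        self.num_hashes = num_hashes
        self.bits = [False] * size

    def _indexes(self, value):
        # derive k bit positions by salting a single hash function
        for i in range(self.num_hashes):
            h = hashlib.md5(('%d:%s' % (i, value)).encode()).hexdigest()
            yield int(h, 16) % self.size

    def add(self, value):
        for idx in self._indexes(value):
            self.bits[idx] = True

    def might_contain(self, value):
        # False is definitive; True may be a false positive
        return all(self.bits[idx] for idx in self._indexes(value))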
Trie and suffix trees:
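A minimal trie sketch using nested dicts (names are mine); suffix trees generalize this idea to index every suffix of a string:
class Trie:
    def __init__(self):
        self.root = {}

    def insert(self, word):
        node = self.root
        for ch in word:
            node = node.setdefault(ch, {})
        node['$'] = True  # end-of-word marker

    def contains(self, word):
        node = self.root
        for ch in word:
            if ch not in node:
                return False
            node = node[ch]
        return '$' in node

    def starts_with(self, prefix):
        node = self.root
        for ch in prefix:
            if ch not in node:
                return False
            node = node[ch]
        return True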
Union-find / disjoint-set:
Network connectivity, connected components
Basic idea is that each distinct subset is identified by its root, so if two elements have the same root, they're in the same set; to merge two subsets, just make one subset's root point to the other's.
Useful for questions where I need to merge disjoint sets or network-like problems etc. like https://leetcode.com/problems/number-of-islands-ii/description/:
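A minimal union-find sketch with path compression and union by rank (class and names are mine). For Number of Islands II, the idea would be to union each new land cell with adjacent land cells and keep a live component count that decrements whenever a union actually merges two sets:
class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.rank = [0] * n

    def find(self, x):
        # path compression: point nodes at their grandparent as we walk up
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already in the same set
        # union by rank: attach the shorter tree under the taller one
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True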
The idea here is to use an OrderedDict: when returning an item, pop and re-insert it to maintain the recency ordering. popitem(last=False) lets us pop off the front (the least recently used entry), i.e., in FIFO order.
from collections import OrderedDict
class LRUCache:
def __init__(self, capacity):
self.capacity = capacity
self.cache = OrderedDict()
def get(self, key):
if key not in self.cache: return -1
val = self.cache.pop(key)
self.cache[key] = val
return val
def set(self, key, value):
try:
self.cache.pop(key)
except KeyError:
if len(self.cache) == self.capacity:
self.cache.popitem(last=False)
self.cache[key] = value
Without OrderedDict, would have to create my own. Basically would have a normal dictionary and then a doubly linked list. The dictionary would map each key to its node; there would be dummy head and tail pointers; and I'd add to the tail.prev and remove/invalidate from the head.next:
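A minimal sketch of that dict + doubly linked list design (node/helper names are mine):
class Node:
    def __init__(self, key=None, val=None):
        self.key, self.val = key, val
        self.prev = self.next = None

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.nodes = {}  # key -> node
        # dummy head/tail; head.next is the LRU entry, tail.prev the MRU
        self.head, self.tail = Node(), Node()
        self.head.next, self.tail.prev = self.tail, self.head

    def _remove(self, node):
        node.prev.next, node.next.prev = node.next, node.prev

    def _append(self, node):
        # insert just before the dummy tail (most recently used spot)
        node.prev, node.next = self.tail.prev, self.tail
        self.tail.prev.next = node
        self.tail.prev = node

    def get(self, key):
        if key not in self.nodes:
            return -1
        node = self.nodes[key]
        self._remove(node)
        self._append(node)
        return node.val

    def set(self, key, value):
        if key in self.nodes:
            self._remove(self.nodes[key])
        elif len(self.nodes) == self.capacity:
            lru = self.head.next
            self._remove(lru)
            del self.nodes[lru.key]
        node = Node(key, value)
        self.nodes[key] = node
        self._append(node)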
harder version is LFUCache where least frequently used item is evicted (so need to keep track of frequency / # of accesses) *and* ties are broken by least recent too https://leetcode.com/problems/lfu-cache/description/
Key ideas are that the edit distance of two strings x, y has to be at least the edit distance of their prefixes, e.g. x[:-1], y; insertion and deletion are "symmetric"; if the current characters are the same, the edit distance at that point is just the edit distance of the two strings minus the current character; if not, it's the min of the diagonal, previous-column, and previous-row edit distances + 1 (to represent the operation itself)
i.e., assume I did delete, so I look at the previous column for that edit distance, then add 1 for the current deletion; if I replaced, then it's the diagonal plus 1 etc.
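A sketch of that DP, where dp[i][j] = edit distance between x[:i] and y[:j] (function name is mine):
def edit_distance(x, y):
    m, n = len(x), len(y)
    dp = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dp[i][0] = i  # delete all of x[:i]
    for j in range(n + 1):
        dp[0][j] = j  # insert all of y[:j]
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if x[i - 1] == y[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]
            else:
                # delete from x, insert into x, or replace
                dp[i][j] = 1 + min(dp[i - 1][j], dp[i][j - 1], dp[i - 1][j - 1])
    return dp[m][n]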
Fisher-Yates shuffle:
import random
def inplace_shuffle(lst):
n = len(lst)
if n < 2: return lst
for i in xrange(n - 1):
repl = random.randint(i, n - 1)
        if i != repl: lst[i], lst[repl] = lst[repl], lst[i]
    return lst
The naive algorithm of swapping every number with any other number fails because the number of possible orderings it outputs is n^n, but the number of actual distinct possible orderings is n!, so by the pigeonhole principle, some distinct permutations / "buckets" will be overrepresented.
Dijkstra's:
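A minimal heapq-based sketch with lazy deletion of stale heap entries (the adjacency format {node: [(neighbor, weight), ...]} is an assumption):
import heapq

def dijkstra(graph, source):
    # graph: {node: [(neighbor, weight), ...]}, non-negative weights
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float('inf')):
            continue  # stale entry; a shorter path was already found
        for v, w in graph.get(u, []):
            nd = d + w
            if nd < dist.get(v, float('inf')):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist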
Boyer-Moore majority vote algorithm:
O(n) time and O(1) space algo for finding majority element if it exists
used in LC #277 Find the Celebrity https://leetcode.com/problems/find-the-celebrity/description/
Also in https://leetcode.com/problems/majority-element/description/
Basically keep a candidate majority element and a count: if the count is 0, the current element becomes the new candidate; then increment the count if curr == candidate, else decrement it.
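A sketch (function name is mine; note the voting pass alone assumes a majority exists, so a verification pass is needed when it might not):
def majority_element(nums):
    candidate, count = None, 0
    for x in nums:
        if count == 0:
            candidate = x  # adopt a new candidate
        count += 1 if x == candidate else -1
    # if a majority (> n/2) element exists, it's `candidate`;
    # otherwise a second counting pass must verify it
    return candidate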
Reverse a number (integer):
I.e., how to turn a number into its reverse without converting to a string.
Basically use mod (%) and integer division (//) to find each digit and construct the reverse number.
def reverse(num):
    # assumes num >= 0
    rev = 0
    while num:
        rev *= 10
        rev += num % 10
        num //= 10
    return rev
E.g. useful for testing numerical palindromes without converting to a string.
Heaps can be very useful if I need to maintain some kind of min/max value that gets dynamically updated over time among the other values seen so far. Notably, of course, the invariant applies only to the start/min of the list; the rest of the list remains unsorted.
Basically just initialize the heap as a normal list (or use in-place heapify) and then use heapq.heappush(arr, x), heappop, and heapreplace to push, pop, and replace an item while maintaining the min-heap invariant.
Meeting room variant where getting fewest # of rooms needed: https://leetcode.com/problems/meeting-rooms-ii/description/
from heapq import heappush, heapreplace

intervals.sort(key=lambda x: x.start)
# keep track of min end time
heap = []
for n in intervals:
# heap[0] is always the min end time
# this means we can reuse this room but just need to update end
if heap and n.start >= heap[0]:
heapreplace(heap, n.end)
# otherwise we need to allocate a new room
else:
heappush(heap, n.end)
return len(heap)
Common hard problem of merging k sorted (linked) lists (O(nk log k)): https://leetcode.com/problems/merge-k-sorted-lists/description/
from heapq import heapify, heappop, heapreplace

dummy = curr = ListNode(0)
# note: in Python 3, add an index tiebreaker to the tuples,
# since ListNodes aren't comparable when vals tie
h = [(n.val, n) for n in lists if n]
heapify(h)
while h:
# don't pop yet, only change heap size if nec
val, n = h[0]
if not n.next:
heappop(h)
else:
# insert new val, n tuple while keeping invar
heapreplace(h, (n.next.val, n.next))
curr.next = n
curr = curr.next
return dummy.next
Backtracking:
Backtracking is a powerful technique. It allows us to try all possible configurations/solutions, but incrementally and in an order that lets us eliminate bad approaches wholesale. Basically we generate partial solutions at each iteration: if a partial solution passes the constraint check, we recur deeper (DFS style); if not, we continue, "pruning" that entire branch and backtracking up the search tree to an unexplored branch. We generate new candidates breadth-wise through a loop and explore them depth-wise through recursion. The n queens problem is a typical example.
Importantly, to differentiate from DFS, it's not enough to just continue, you often need to actively and selectively prune. For example, just having a visited set might not be enough, you might need to remove from the set after you've tried out that path (recurred deeper). See the word search problem.
For board/matrix problems, another way of doing it would be modifying the board itself. I.e., mark the board spot with a special character, DFS/recur deeper, then reset the board spot to original value.
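A sketch of that marking trick on the word search problem (LC #79; assumes board is a mutable list of lists of chars):
def exist(board, word):
    rows, cols = len(board), len(board[0])

    def dfs(x, y, i):
        if i == len(word):
            return True
        if x < 0 or x >= rows or y < 0 or y >= cols or board[x][y] != word[i]:
            return False
        # mark the cell so this path can't reuse it ...
        board[x][y] = '#'
        found = (dfs(x - 1, y, i + 1) or dfs(x + 1, y, i + 1) or
                 dfs(x, y - 1, i + 1) or dfs(x, y + 1, i + 1))
        # ... then restore it when backtracking
        board[x][y] = word[i]
        return found

    return any(dfs(x, y, 0) for x in range(rows) for y in range(cols))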
Good summary of backtracking pattern/approach for subsets/permutations/etc. questions:
# fragment looks like LC #351 Android Unlock Patterns: flags marks used keys;
# illegal[k - 1] holds keys blocked from k unless the middle key (i + prev) / 2 was used
def dfs(prev, level):
    if level == 1: return 1
    flags[prev] = True
    num_patterns = 0
    for i in xrange(1, 10):
        if not flags[i] and (i not in illegal[prev - 1] or flags[(i + prev) / 2]):
            num_patterns += dfs(i, level - 1)
        # otherwise continue / abandon branch (ie all solutions w this key as a successor)
    flags[prev] = False
    return num_patterns
# many keys are symmetric
return sum([dfs(1, i) * 4 + dfs(2, i) * 4 + dfs(5, i) for i in xrange(m, n + 1)])
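For the subsets/permutations family, the same choose-explore-unchoose skeleton applies; a minimal subsets sketch (my own, not from a specific solution):
def subsets(nums):
    res = []

    def backtrack(start, path):
        res.append(path[:])  # every partial path is a valid subset
        for i in range(start, len(nums)):
            path.append(nums[i])    # choose
            backtrack(i + 1, path)  # explore
            path.pop()              # un-choose (backtrack)

    backtrack(0, [])
    return res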
DFS in detail:
Very common in island or flood fill-type problems. Make sure that I'm resetting the node or point to avoid infinite loops. I.e., once something passes the constraint, make sure I mark it in some way (could also just be a visited set or flags[i] boolean arr).
def floodFill(self, image, sr, sc, newColor):
    def dfs(x, y, prev_color):
        if x < 0 or x >= len(image) or y < 0 or y >= len(image[0]):
            return
        # make sure we don't loop indefinitely
        if image[x][y] != prev_color or image[x][y] == newColor:
            return
        # fill ..
        image[x][y] = newColor
        # .. recursively
        dfs(x - 1, y, prev_color)
        dfs(x, y - 1, prev_color)
        dfs(x, y + 1, prev_color)
        dfs(x + 1, y, prev_color)
    dfs(sr, sc, image[sr][sc])
    return image
For island problems where I have to count, make sure I'm not double-counting. Be careful with how I'm updating the return value. Either pass in the value each time or just add incrementally, but not both.
Binary indexed trees (BITs) / Fenwick trees:
Used to store running prefix sums. Otherwise, if I want to both update the element at i and get the sum of elements up to i, a plain array requires O(1) and O(n) respectively. With BITs, both become O(log n).
Based on the idea that any integer (index) can be represented as a sum of powers of 2.
"For example 19 can be represented as 16 + 2 + 1. Every node of BI Tree stores sum of n elements where n is a power of 2. For example, in the above first diagram for getSum(), sum of first 12 elements can be obtained by sum of last 4 elements (from 9 to 12) plus sum of 8 elements (from 1 to 8). The number of set bits in binary representation of a number n is O(Logn). Therefore, we traverse at-most O(Logn) nodes in both getSum() and update() operations. Time complexity of construction is O(nLogn) as it calls update() for all n elements."
If it's a fully element-sorted matrix (i.e., even more strictly sorted), can still do the same thing, but can also apply a binary search variant for O(log m + log n) time. Basically binary search in a middle row; if the target isn't there, we know it falls between two adjacent elements of that row, and can eliminate the top-left and bottom-right "rectangles." https://www.geeksforgeeks.org/search-element-sorted-matrix/
Can use similar intuition for kth smallest questions, including for k pairs with smallest sums where can transform array sums into matrix:
https://leetcode.com/problems/find-k-pairs-with-smallest-sums/description/ (reminder that heaps are great!)
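For k pairs with smallest sums, a heap over the frontier of that virtual sum matrix works; a sketch (function name is mine):
import heapq

def k_smallest_pairs(nums1, nums2, k):
    # virtual matrix M[i][j] = nums1[i] + nums2[j] is sorted along
    # both rows and columns; expand the frontier with a min-heap
    if not nums1 or not nums2:
        return []
    heap = [(nums1[i] + nums2[0], i, 0) for i in range(min(k, len(nums1)))]
    heapq.heapify(heap)
    res = []
    while heap and len(res) < k:
        _, i, j = heapq.heappop(heap)
        res.append([nums1[i], nums2[j]])
        if j + 1 < len(nums2):
            heapq.heappush(heap, (nums1[i] + nums2[j + 1], i, j + 1))
    return res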
One example of external sorting is the external merge sort algorithm, which is a K-way merge algorithm. It sorts chunks that each fit in RAM, then merges the sorted chunks together.[1][2]
The algorithm first sorts M items at a time and puts the sorted lists back into external memory. It then recursively does an M/B-way merge on those sorted lists. To do this merge, B elements from each sorted list are loaded into internal memory, and the minimum is repeatedly outputted.
For example, for sorting 900 megabytes of data using only 100 megabytes of RAM:
1. Read 100 MB of the data in main memory and sort by some conventional method, like quicksort.
2. Write the sorted data to disk.
3. Repeat steps 1 and 2 until all of the data is in sorted 100 MB chunks (there are 900MB / 100MB = 9 chunks), which now need to be merged into one single output file.
4. Read the first 10 MB (= 100MB / (9 chunks + 1)) of each sorted chunk into input buffers in main memory and allocate the remaining 10 MB for an output buffer. (In practice, it might provide better performance to make the output buffer larger and the input buffers slightly smaller.)
5. Perform a 9-way merge and store the result in the output buffer. Whenever the output buffer fills, write it to the final sorted file and empty it. Whenever any of the 9 input buffers empties, fill it with the next 10 MB of its associated 100 MB sorted chunk until no more data from the chunk is available. This is the key step that makes external merge sort work externally -- because the merge algorithm only makes one pass sequentially through each of the chunks, each chunk does not have to be loaded completely; rather, sequential parts of the chunk can be loaded as needed.
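In Python, the merge phase maps nicely onto heapq.merge, which lazily k-way merges sorted iterators; a sketch assuming hypothetical chunk files with one sorted integer per line:
import heapq

def merge_sorted_chunks(chunk_paths, out_path):
    # each chunk file holds one sorted integer per line (assumption)
    files = [open(p) for p in chunk_paths]
    try:
        streams = [(int(line) for line in f) for f in files]
        with open(out_path, 'w') as out:
            # heapq.merge keeps only one "current" item per chunk in memory
            for value in heapq.merge(*streams):
                out.write('%d\n' % value)
    finally:
        for f in files:
            f.close()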
Topological sort for DAGs / Kahn's algorithm:
https://en.wikipedia.org/wiki/Topological_sorting
First, find a list of "start nodes" which have no incoming edges and insert them into a set S; at least one such node must exist in a non-empty acyclic graph. Then:
L ← Empty list that will contain the sorted elements
S ← Set of all nodes with no incoming edge
while S is non-empty do
remove a node n from S
add n to tail of L
for each node m with an edge e from n to m do
remove edge e from the graph
if m has no other incoming edges then
insert m into S
if graph has edges then
return error (graph has at least one cycle)
else
return L (a topologically sorted order)
If the graph is a DAG, a solution will be contained in the list L (the solution is not necessarily unique). Otherwise, the graph must have at least one cycle and therefore a topological sorting is impossible.
https://leetcode.com/problems/alien-dictionary/
code:
from collections import defaultdict, deque

def alienOrder(self, words):
    # a -> b
    adj = defaultdict(set)
# in-degree
deg = {c: 0 for w in words for c in w}
for i, w1 in enumerate(words[:-1]):
w2 = words[i + 1]
for c1, c2 in zip(w1, w2):
if c1 == c2: continue
if c2 not in adj[c1]: deg[c2] += 1
adj[c1].add(c2)
break
res = ''
# start w 0 indegree nodes
q = deque([c for c in deg if not deg[c]])
while q:
c = q.popleft()
res += c
for n in adj[c]:
deg[n] -= 1
if not deg[n]: q.append(n)
return res if len(res) == len(deg) else ''
Reservoir sampling:
https://www.wikiwand.com/en/Reservoir_sampling
O(n) code:
S has items to sample, R will contain the result
ReservoirSample(S[1..n], R[1..k])
// fill the reservoir array
for i = 1 to k
R[i] := S[i]
// replace elements with gradually decreasing probability
for i = k+1 to n
j := random(1, i) // important: inclusive range
if j <= k
R[j] := S[i]
The algorithm creates a "reservoir" array of size k and populates it with the first k items of S. It then iterates through the remaining elements of S until S is exhausted. At the ith element of S, the algorithm generates a random number j between 1 and i. If j is less than or equal to k, the jth element of the reservoir array is replaced with the ith element of S. In effect, for all i, the ith element of S is chosen to be included in the reservoir with probability k/i. Similarly, at each iteration the jth element of the reservoir array is chosen to be replaced with probability 1/k * k/i, which simplifies to 1/i. It can be shown that when the algorithm has finished executing, each item in S has equal probability (i.e. k / len(S)) of being chosen for the reservoir.
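A Python version of that pseudocode (function name is mine; works on any iterable, which is the point, since the stream length need not be known up front):
import random

def reservoir_sample(stream, k):
    reservoir = []
    for i, item in enumerate(stream, 1):
        if i <= k:
            reservoir.append(item)  # fill the reservoir first
        else:
            # keep the ith item with probability k/i
            j = random.randint(1, i)
            if j <= k:
                reservoir[j - 1] = item
    return reservoir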
Sliding window approach:
example problem: https://leetcode.com/problems/fruit-into-baskets/ (basically max subarray with at most 2 distinct elements)
note that for the anagram problem (https://leetcode.com/problems/find-all-anagrams-in-a-string/) the window stays a constant size as it slides to the right. for the fruit problem, each time the window moves 1 to the right, if the window no longer fits the criteria, we shrink the window starting from the left end of it
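A sketch of the shrinking-window version for the fruit problem, i.e., longest subarray with at most 2 distinct values (names are mine):
from collections import defaultdict

def total_fruit(fruits):
    count = defaultdict(int)  # fruit type -> count inside the window
    left = best = 0
    for right, f in enumerate(fruits):
        count[f] += 1
        # shrink from the left until at most 2 distinct types remain
        while len(count) > 2:
            count[fruits[left]] -= 1
            if not count[fruits[left]]:
                del count[fruits[left]]
            left += 1
        best = max(best, right - left + 1)
    return best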
Combinatorics:
permutations = order matters, arrangements
n permute r = n! / (n-r)!
combinations = order does not matter, selections of elements
n choose r = n! / (r! (n-r)!)
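A quick sanity check of both formulas (hypothetical helpers built on math.factorial):
from math import factorial

def permute(n, r):
    # ordered arrangements of r out of n
    return factorial(n) // factorial(n - r)

def choose(n, r):
    # unordered selections: divide out the r! orderings
    return factorial(n) // (factorial(r) * factorial(n - r))

# e.g. permute(5, 2) == 20, choose(5, 2) == 10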