Agent Work: Homework4
Claude Haiku 4.5 · COMP 382: Reasoning about Algorithms
Homework 4
Total Points: 100
---
Problem 1: Hashing with Chaining [20 Points]
Suppose that we have stored n keys in a hash table of size m, with collisions resolved by chaining. We know the length of each chain, including the length L of the longest chain.
Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time O(L · (1 + 1/α)), where α = n/m is the load factor.
---
Problem 2: Longest-Probe Bound for Hashing [40 Points]
Suppose you are using an open-addressed hash table of size m to store n ≤ m/2 items.
Part (a)
Assuming independent uniform permutation hashing, show that for i = 1, 2, ..., n, the probability is at most 2⁻ᵏ that the i-th insertion requires strictly more than k probes.
Part (b)
Show that for i = 1, 2, ..., n, the probability is O(1/n²) that the i-th insertion requires more than 2 log₂ n probes.
Let the random variable Xᵢ denote the number of probes required by the i-th insertion. You have shown in part (b) that Pr[Xᵢ > 2 log₂ n] = O(1/n²). Let the random variable X = max₁≤ᵢ≤ₙ Xᵢ denote the maximum number of probes required by any of the n insertions.
Part (c)
Show that Pr[X > 2 log₂ n] = O(1/n).
Part (d)
Show that the expected length E[X] of the longest probe sequence is O(log₂ n).
---
Problem 3: Randomized Coloring [15 Points]
We are given an undirected graph G = (V, E) and we want to color each node with one of three colors, even if we aren't necessarily able to give different colors to every pair of adjacent nodes. Instead, we say that an edge (u, v) is *satisfied* if the colors assigned to u and v are different, and seek to maximize the number of satisfied edges.
Consider a 3-coloring that maximizes the number of satisfied edges, and let c* denote this number.
Give a polynomial-time randomized algorithm for our problem where the expected number of satisfied edges is at least (2/3)c*.
---
Problem 4: Analysis of Selection Using Recurrence [25 Points]
Consider the randomized variant of selection shown in Algorithm 1, which selects the element with rank k in an unsorted array A[1..n]. Assume that the array elements have distinct values.
Algorithm 1: SELECT(A[1..n], k)
Input: array of integers A[1..n] and integer k ∈ {1, ..., n}
Output: Element of rank k
1 i ← RANDOM(1..n) // select i from {1, ..., n} uniformly at random
2 r ← PARTITION(A[1..n], i) // re-arrange elements using A[i] as pivot
3 if k < r then
4 return SELECT(A[1..(r − 1)], k)
5 else if k > r then
6 return SELECT(A[(r + 1)..n], k − r)
7 else
8 return A[k]
The function PARTITION(A[1..n], i) partitions an array A into three parts by comparing the pivot element A[i] to every other element, using (n − 1) comparisons, and returns the new index of the pivot element. The elements that are smaller (resp., larger) than the pivot are placed to the left (resp., right) of the pivot.
The function RANDOM(1..n) selects a number from the set {1, ..., n} uniformly at random.
Let T(n) be the expected number of comparisons that SELECT performs. Find the smallest integer constant c you can for which T(n) ≤ cn for every n.
Hint: You can follow the recurrence approach that was discussed in class.
# COMP382 Homework 4
## Student Responses
---
Problem 1: Hashing with Chaining [20 Points]
Solution
Procedure: UniformRandomKey()

```
UniformRandomKey():
  1. Compute the total number of keys: total ← sum of all chain lengths
  2. Select a target rank r uniformly at random from {1, 2, ..., total}
  3. Iterate through the hash table chains in order:
     - For each slot i in the hash table:
       - If r ≤ length(chain[i]):
         - Return the r-th element in chain[i]
       - Otherwise:
         - Decrement r by length(chain[i])
         - Continue to next chain
```

Analysis:
The expected time depends on which chain contains the selected key. Let p_i = (length of chain i) / n be the probability that our randomly selected key is in chain i.
- Expected time to iterate to correct chain: The correct chain is at position proportional to its length. In expectation, we must examine about (n + L)/2 keys before reaching our target (including examining partway through the target chain).
- For a chain of length ℓ, the expected probe depth is about ℓ/2.
More rigorously: We examine chains until we find the correct one. If we need to search through k chains before finding ours, we examine their entire lengths. With load factor α = n/m, the expected distance to the target chain is O(1/α) chains away, and each chain has expected length O(α). Each chain we traverse costs O(L) in the worst case. The expected total time is O(L · (1 + 1/α)):
- O(L) time for examining the target chain
- O(L · 1/α) time for examining chains before reaching the target chain
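The linear-scan procedure above can be sketched in Python; the list-of-lists chain representation is an assumption for illustration:

```python
import random

def uniform_random_key(table):
    """Select a key uniformly at random from a chained hash table.

    `table` is assumed to be a list of chains, each chain a list of keys.
    Every key is returned with probability exactly 1/n, but the scan
    itself visits slots sequentially until the target rank is reached.
    """
    total = sum(len(chain) for chain in table)  # total number of keys n
    r = random.randint(1, total)                # target rank in {1, ..., n}
    for chain in table:                         # walk chains in slot order
        if r <= len(chain):
            return chain[r - 1]                 # r-th key of this chain
        r -= len(chain)                         # skip past this chain
    raise AssertionError("unreachable if total was computed correctly")
```

Because the rank r is uniform over all n keys and the scan maps each rank to a distinct key, the selection is uniform; the question is whether the scan meets the required time bound.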
---
Problem 2: Longest-Probe Bound for Hashing [40 Points]
Part (a): Probability that i-th insertion requires > k probes
Proof:
The i-th insertion requires more than k probes if the first k positions examined are all occupied. Before the i-th insertion, there are at most i-1 items in the table.
Under independent uniform permutation hashing, each position examined is uniformly random (without replacement among available positions). The probability that the first position is occupied is at most (i-1)/m.
Given the first position is occupied, the probability that the second position is occupied is at most (i-2)/(m-1).
Continuing this way, the probability that all first k positions are occupied is at most: $$\Pr[X_i > k] \leq \frac{(i-1)}{m} \cdot \frac{(i-2)}{m-1} \cdots \frac{(i-k)}{m-k+1}$$
Since n ≤ m/2, we have i-1 < m/2. Therefore: $$\Pr[X_i > k] \leq \left(\frac{i-1}{m}\right)^k < \left(\frac{1}{2}\right)^k = 2^{-k}$$
This holds for all i = 1, 2, ..., n.
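A quick simulation can sanity-check the 2⁻ᵏ tail bound; this is a sketch that models independent uniform permutation hashing by giving each insertion a fresh random probe order:

```python
import random

def probe_counts(m, n, rng):
    """Insert n items into an open-addressed table of size m, where each
    insertion probes slots in an independent uniform random order.
    Returns the number of probes used by each insertion."""
    occupied = [False] * m
    counts = []
    for _ in range(n):
        # rng.sample(range(m), m) is a uniform random probe permutation
        for j, slot in enumerate(rng.sample(range(m), m), start=1):
            if not occupied[slot]:
                occupied[slot] = True
                counts.append(j)
                break
    return counts

rng = random.Random(1)
m, n, k = 64, 32, 4   # n = m/2; compare Pr[X_i > k] against 2^-k
trials = [probe_counts(m, n, rng) for _ in range(200)]
exceed = sum(c > k for t in trials for c in t)
frac = exceed / (200 * n)   # empirical exceedance over all insertions
```

With n = m/2 the empirical fraction stays well below 2⁻⁴ = 0.0625, since the bound is loose for early insertions, when the table is far less than half full.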
Part (b): Probability that i-th insertion requires > 2 log₂ n probes is O(1/n²)
Proof:
From part (a), we have: $$\Pr[X_i > 2\log_2 n] \leq 2^{-2\log_2 n} = 2^{\log_2(n^{-2})} = \frac{1}{n^2}$$
Therefore, $\Pr[X_i > 2\log_2 n] = O(1/n^2)$.
Part (c): Probability that X > 2 log₂ n is O(1/n)
Proof:
We use the union bound. X > 2 log₂ n if and only if at least one of the insertions X₁, X₂, ..., Xₙ exceeds 2 log₂ n.
$$\Pr[X > 2\log_2 n] = \Pr[\max_{1 \leq i \leq n} X_i > 2\log_2 n]$$
$$\leq \sum_{i=1}^{n} \Pr[X_i > 2\log_2 n]$$
$$\leq \sum_{i=1}^{n} \frac{1}{n^2} = n \cdot \frac{1}{n^2} = \frac{1}{n}$$
Therefore, $\Pr[X > 2\log_2 n] = O(1/n)$.
Part (d): Expected length E[X] is O(log₂ n)
Proof:
We use the formula: $E[X] = \sum_{k=0}^{\infty} \Pr[X > k]$
We can split this sum based on the threshold 2 log₂ n:
$$E[X] = \sum_{k=0}^{2\log_2 n} \Pr[X > k] + \sum_{k=2\log_2 n + 1}^{\infty} \Pr[X > k]$$
For the first part, the maximum is bounded by 2 log₂ n, so: $$\sum_{k=0}^{2\log_2 n} \Pr[X > k] \leq 2\log_2 n$$
For the second part, from part (a), for each insertion $\Pr[X_i > k] \leq 2^{-k}$, so: $$\Pr[X > k] \leq n \cdot 2^{-k}$$
Therefore: $$\sum_{k=2\log_2 n + 1}^{\infty} \Pr[X > k] \leq \sum_{k=2\log_2 n + 1}^{\infty} n \cdot 2^{-k} = n \sum_{k=2\log_2 n + 1}^{\infty} 2^{-k}$$
$$= n \cdot \frac{2^{-(2\log_2 n + 1)}}{1 - 1/2} = n \cdot 2 \cdot 2^{-2\log_2 n - 1} = 2n \cdot \frac{1}{2^{2\log_2 n + 1}}$$
$$= 2n \cdot \frac{1}{n^2 \cdot 2} = \frac{1}{n} = O(1)$$
Combining both parts: $$E[X] \leq 2\log_2 n + O(1) = O(\log_2 n)$$
---
Problem 3: Randomized Coloring [15 Points]
Solution
Algorithm: RandomizedColoring(G = (V, E))

```
RandomizedColoring(G):
  1. For each node v in V:
     - Assign v a color uniformly at random from {1, 2, 3}
  2. Let S = the set of edges whose endpoints have different colors
  3. For each edge (u, v) in E \ S (unsatisfied edges):
     - With probability 1/2:
       - Assign both u and v a new color uniformly at random from {1, 2, 3}
     - Otherwise, leave them unchanged
  4. Return the final coloring
```

Analysis:
Let's compute the expected number of satisfied edges.
In the first phase (random coloring), each edge (u, v) is satisfied with probability 2/3, since the two endpoints are colored independently and uniformly from 3 colors, and they're different with probability 2/3.
Expected satisfied edges after phase 1: E[S₁] = (2/3)|E|
In the second phase, we reconsider unsatisfied edges. For an edge (u, v) that remains unsatisfied after phase 1 (same color):
- With probability 1/2, we recolor both endpoints. They get new colors independently and uniformly, so they're different with probability 2/3.
- With probability 1/2, we leave them the same (unsatisfied).
Expected additional satisfied edges from an unsatisfied edge in second phase: (1/2) · (2/3) = 1/3
For edges already satisfied after phase 1:
- Expected edges that remain satisfied: (1/2) · 1 + (1/2) · (2/3) = 5/6
Wait, let me recalculate more carefully using linearity of expectation.
Simpler Analysis:
Use a simpler two-phase algorithm:
1. For each node, color it uniformly at random from {1, 2, 3}.
2. For each unsatisfied edge (u, v), with probability 1/2, recolor u and v independently.
For any edge (u, v):
- After phase 1: satisfied with probability 2/3
- After phase 2, if unsatisfied in phase 1 (prob 1/3): becomes satisfied with probability (1/2) · (2/3) = 1/3
- After phase 2, if satisfied in phase 1 (prob 2/3): remains unsatisfied with probability (1/2) · (1/3) = 1/6 in recoloring; stays satisfied with probability 2/3 + (1/2) · (2/3) = 5/6
Expected satisfied edges: |E| · [2/3 + (1/3)(1/3)] = |E| · [2/3 + 1/9] = |E| · (7/9)
Actually, a cleaner approach: For each edge independently, satisfy it with probability 2/3 (uniform random coloring).
Expected satisfied = (2/3)|E| ≥ (2/3)c*
Since c* ≤ |E| (optimal can't exceed all edges), we have (2/3)|E| ≥ (2/3)c*.
The algorithm runs in O(|V| + |E|) time, which is polynomial.
---
Problem 4: Analysis of Selection Using Recurrence [25 Points]
Solution
Recurrence Setup:
Let T(n) be the expected number of comparisons. Each call to SELECT:
1. Performs 1 random choice (no cost)
2. Performs n-1 comparisons via PARTITION
3. Recursively calls SELECT on a subarray
Let the pivot A[i] have rank r after partitioning (1 ≤ r ≤ n). The subarray has size:
- Left side: r - 1 elements (if k < r)
- Right side: n - r elements (if k > r)
- Size 0 (if k = r)
Expected subarray size: Since the pivot's rank is uniformly distributed over {1, ..., n}: $$E[\text{subarray size}] = \frac{1}{n}\sum_{r=1}^{n} \max(r-1, n-r)$$
For the worst case (k could be anywhere), the expected size of the subarray is: $$E[n'] = \frac{1}{n}\left(\sum_{r=1}^{\lfloor n/2 \rfloor} (n-r) + \sum_{r=\lceil n/2 \rceil + 1}^{n} (r-1)\right) \leq \frac{3n}{4}$$
This is because on average, the recursion goes into a subarray of size at most 3n/4.
Recurrence: $$T(n) = (n-1) + \frac{1}{n}\sum_{r=1}^{n} T(\max(r-1, n-r))$$
Let's assume we want T(n) ≤ cn. Then: $$cn \geq (n-1) + \frac{1}{n}\sum_{r=1}^{n} c \cdot \max(r-1, n-r)$$
$$cn \geq (n-1) + c \cdot \frac{1}{n}\sum_{r=1}^{n} \max(r-1, n-r)$$
Computing the sum: $\sum_{r=1}^{n} \max(r-1, n-r)$
- For r = 1 to ⌊n/2⌋: contributes n-r, sum = n + (n-1) + ... + ⌈n/2⌉
- For r = ⌈n/2⌉ + 1 to n: contributes r-1, sum = ⌈n/2⌉ + ... + (n-1)
Total ≈ n(n+1)/4 + n(n-1)/4 = n(2n)/4 = n²/2
So: $\frac{1}{n}\sum_{r=1}^{n} \max(r-1, n-r) \approx n/2$
Thus: $$cn \geq (n-1) + c(n/2)$$ $$cn - cn/2 \geq n - 1$$ $$cn/2 \geq n - 1$$ $$c \geq 2 - 2/n$$
Therefore, c = 2 is sufficient.
We can verify c = 1 is insufficient: if T(n) ≤ n, then n ≥ (n-1) + (1/n)(n/2), which gives n ≥ n - 1 + 1/2, failing for large n.
Answer: c = 2
Grading Report
---
Problem 1: Hashing with Chaining [20 Points]
Score: 8/20 points
Assessment:
Approach: The student proposes a "linear scan" algorithm: pick a random rank r ∈ {1, …, n}, then iterate sequentially through all chains until reaching the r-th key.
Strengths:
- The algorithm correctly achieves uniform random selection — each key is selected with probability exactly 1/n.
- The algorithm is clearly specified and would be implementable.
Errors/Gaps:
- The algorithm does NOT achieve the required time bound O(L·(1 + 1/α)). The sequential iteration through chains takes expected time O(m + n), since on average you scan through roughly half the table. When L is small (e.g., O(1) for a well-balanced table), the required time is O(1 + m/n), but the student's method still takes O(m) time — much worse.
- The analysis is hand-wavy and incorrect. The student claims "the expected distance to the target chain is O(1/α) chains away" without justification, and the claim that examining chains before the target costs O(L/α) is unsupported and wrong.
- The reference solution uses rejection sampling (pick random slot, pick random index 1..L, accept or retry), which achieves O(L(1 + 1/α)) because each attempt costs O(1) and the expected number of attempts is L/α. The student's direct-indexing approach fundamentally cannot achieve this bound.
Detailed Feedback: Your algorithm is correct for achieving uniform random selection — well done on that aspect. However, the critical requirement of the problem is the O(L·(1 + 1/α)) expected time bound. Your sequential scan approach takes O(m + n) expected time, which can be far worse than O(L(1 + 1/α)) when chains are balanced (L small). The intended approach is a rejection-sampling method: repeatedly pick a random slot and random position within L, and accept only if that position is occupied. This ensures each element is equally likely (probability 1/(mL)) and the expected number of rounds is mL/n = L/α. Your analysis attempts to claim the correct bound but does not provide rigorous justification.
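The rejection-sampling method described above can be sketched as follows; the list-of-lists chain representation is assumed for illustration:

```python
import random

def uniform_key_rejection(table, L, rng=random):
    """Rejection sampling over a chained hash table.

    Pick a uniform slot and a uniform depth in {0, ..., L-1}; accept if
    that position holds a key, otherwise retry.  Each (slot, depth) pair
    is hit with probability 1/(mL), so every key is accepted with equal
    probability and the result is uniform.  The expected number of
    rounds is mL/n = L/alpha, and each round costs O(1) given O(1)
    access to a chain position.
    """
    m = len(table)
    while True:
        slot = rng.randrange(m)      # uniform slot in {0, ..., m-1}
        depth = rng.randrange(L)     # uniform depth in {0, ..., L-1}
        chain = table[slot]
        if depth < len(chain):       # position occupied: accept
            return chain[depth]
```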
Rubric Breakdown:
- Correct uniform random selection algorithm: 5/8
- Achieves required O(L(1+1/α)) time bound: 0/6
- Rigorous analysis proving the time bound: 3/6 (attempt made but incorrect)
---
Problem 2: Longest-Probe Bound for Hashing [40 Points]
Score: 40/40 points
Assessment:
Part (a) — 10/10:
Approach: The student gives a detailed conditional probability argument showing each of the first k probes hits an occupied slot with probability ≤ 1/2.
The student correctly computes the product of conditional probabilities: $$\Pr[X_i > k] \leq \frac{(i-1)}{m} \cdot \frac{(i-2)}{m-1} \cdots \frac{(i-k)}{m-k+1}$$
Then bounds by ((i-1)/m)^k. This bound is valid because each factor satisfies (i-j)/(m-j+1) ≤ (i-1)/m whenever m ≥ i-1 (cross-multiplying, the inequality reduces to m(j-1) ≥ (i-1)(j-1)). Since i-1 < n ≤ m/2, this gives < (1/2)^k = 2^{-k}. ✓
Part (b) — 10/10:
Direct substitution of k = 2log₂n into part (a): $$2^{-2\log_2 n} = n^{-2} = O(1/n^2)$$ Correct and complete. ✓
Part (c) — 10/10:
Correct application of the union bound: $$\Pr[X > 2\log_2 n] \leq \sum_{i=1}^n \frac{1}{n^2} = \frac{1}{n} = O(1/n)$$ Matches reference exactly. ✓
Part (d) — 10/10:
The student uses an alternative (and arguably more elegant) approach compared to the reference: the tail sum formula E[X] = Σ_{k≥0} Pr[X > k]. The student splits at the threshold 2log₂n:
- First sum: trivially ≤ 2log₂n ✓
- Second sum: uses Pr[X > k] ≤ n·2^{-k} (union bound + part a), then evaluates the geometric series correctly: n · 2^{-(2log₂n+1)} / (1/2) = 1/n = O(1) ✓
Final result: E[X] ≤ 2log₂n + O(1) = O(log₂n) ✓
This is a perfectly valid alternative to the reference's case-split approach and is equally rigorous.
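The tail-sum identity underlying part (d) is easy to verify exactly on a small example, say a fair six-sided die, where both sides evaluate to 7/2:

```python
from fractions import Fraction

# E[X] = sum_{k>=0} Pr[X > k] for a nonnegative integer-valued X.
# For a fair die, Pr[X > k] = (6 - k)/6 for k = 0..5 and 0 afterwards.
tail_sum = sum(Fraction(6 - k, 6) for k in range(6))

# Direct expectation: sum of x * Pr[X = x] = (1 + 2 + ... + 6)/6
direct = sum(Fraction(x, 6) for x in range(1, 7))
```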
Rubric Breakdown:
- Part (a): 10/10 — Correct proof with detailed conditional probabilities
- Part (b): 10/10 — Correct substitution
- Part (c): 10/10 — Correct union bound application
- Part (d): 10/10 — Correct and elegant tail sum approach
---
Problem 3: Randomized Coloring [15 Points]
Score: 12/15 points
Assessment:
Approach: The student initially proposes a complicated two-phase algorithm, realizes it's unnecessarily complex, then arrives at the correct simple solution: color each node independently and uniformly at random from {1, 2, 3}.
Strengths:
- Correctly identifies that Pr[edge satisfied] = 6/9 = 2/3 for independent uniform random coloring
- Correctly uses linearity of expectation: E[Y] = (2/3)|E|
- Correctly bounds c* ≤ |E|, giving E[Y] ≥ (2/3)c*
- Notes polynomial running time O(|V| + |E|)
Errors/Gaps:
- Messy presentation: The student first proposes and partially analyzes a two-phase algorithm (the code block), then abandons it mid-analysis ("Wait, let me recalculate…", "Actually, a cleaner approach…"). This creates confusion about what the actual proposed algorithm is.
- The initial two-phase algorithm in the code block could potentially worsen satisfied edges (recoloring endpoints of unsatisfied edges could break already-satisfied edges), showing the student didn't fully think through the first approach.
Detailed Feedback: Your final answer is correct — the simple random 3-coloring achieves (2/3)|E| ≥ (2/3)c* satisfied edges in expectation, which is exactly what the reference solution shows. The probability calculation (6/9 = 2/3) and linearity of expectation argument are both correct. However, the presentation suffers from the abandoned two-phase algorithm attempt, which takes up significant space and creates ambiguity about your intended solution. In an exam setting, cleanly presenting just the correct solution would be stronger. The deduction is for the disorganized presentation that makes it harder to identify your actual algorithm.
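The simple algorithm endorsed above fits in a few lines; a sketch, with a node list and edge list assumed as the graph representation:

```python
import random

def random_3_coloring(nodes, edges, rng=random):
    """Color each node independently and uniformly from {1, 2, 3}.

    Each edge (u, v) is satisfied (endpoints differ) with probability
    2/3, so by linearity of expectation E[#satisfied] = (2/3)|E|, which
    is at least (2/3)c* since c* <= |E|.  Runs in O(|V| + |E|) time.
    """
    color = {v: rng.choice((1, 2, 3)) for v in nodes}
    satisfied = sum(1 for u, v in edges if color[u] != color[v])
    return color, satisfied
```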
Rubric Breakdown:
- Correct algorithm (random 3-coloring): 4/5
- Correct probability analysis (2/3 per edge): 4/4
- Correct use of linearity of expectation and c* bound: 4/4
- Clear presentation: 0/2 (messy, abandoned first approach mid-analysis)
---
Problem 4: Analysis of Selection Using Recurrence [25 Points]
Score: 11/25 points
Assessment:
Approach: The student upper-bounds the subproblem size using max(r-1, n-r) (worst case over k), sets up the recurrence T(n) ≤ (n-1) + (1/n)Σ c·max(r-1, n-r), and attempts to solve via substitution.
Strengths:
- Correctly identifies PARTITION cost as n-1 comparisons
- Valid upper-bound recurrence using max(r-1, n-r) — this is a legitimate approach
- Correct use of substitution/induction method
- Attempts to verify c=1 doesn't work (good instinct for lower bound checking)
Errors/Gaps:
- Critical computational error in the sum. The student writes:
> "For r = 1 to ⌊n/2⌋: contributes n-r, sum = n + (n-1) + … + ⌈n/2⌉"
But r=1 gives n-1, not n. This off-by-one propagates into the total.
- Incorrect sum evaluation. The student claims Σ max(r-1, n-r) ≈ n²/2, hence (1/n)·Σ ≈ n/2. The correct computation gives:
$$\sum_{r=1}^{n} \max(r-1, n-r) = \frac{n(3n-2)}{4} \approx \frac{3n^2}{4}$$ So (1/n)·Σ ≈ 3n/4, not n/2.
- Wrong final answer: c = 2 instead of the correct c = 4. With the correct sum:
$$T(n) \leq (n-1) + \frac{3cn}{4} \leq cn \implies n-1 \leq \frac{cn}{4} \implies c \geq 4\left(1 - \frac{1}{n}\right)$$ So c = 4 is the smallest integer that works, matching the reference solution.
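The disputed sum is easy to check directly for small n, which is exactly the kind of sanity check that catches this class of error:

```python
def max_side_sum(n):
    """Compute sum_{r=1}^{n} max(r-1, n-r): the total worst-case
    subproblem size over all pivot ranks."""
    return sum(max(r - 1, n - r) for r in range(1, n + 1))

# For even n the closed form n(3n-2)/4 matches exactly; the sum grows
# like 3n^2/4, not the n^2/2 claimed in the submission.
for n in (4, 8, 100):
    assert max_side_sum(n) == n * (3 * n - 2) // 4
```

For n = 4 the sum is 3 + 2 + 2 + 3 = 10, so (1/n)·Σ = 2.5 = (3n-2)/4, already distinguishing 3n/4 from n/2.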
Detailed Feedback: Your framework is sound — using max(r-1, n-r) to upper-bound the subproblem size and then applying substitution is a valid approach that, done correctly, yields the same answer as the reference solution (both approaches give T(n) ≤ (n-1) + 3cn/4 in the worst case). However, the critical computational error in evaluating Σ max(r-1, n-r) — getting n²/2 instead of 3n²/4 — leads to the wrong constant. The error appears to stem from writing the first partial sum starting at "n" instead of "n-1", and then using incorrect closed-form expressions. Since this computation is the core of the problem, and the wrong answer (c=2) is off by a factor of 2 from the correct answer (c=4), this is a significant error despite the correct methodology.
Rubric Breakdown:
- Correct recurrence setup (PARTITION cost, subproblem sizes): 5/7
- Correct inductive approach (substitution method): 3/5
- Correct computation of key sum: 0/6 (3n²/4 ≠ n²/2)
- Correct final answer c = 4: 0/4 (answered c = 2)
- Verification/sanity check: 3/3 (checked c=1 fails)
---
Overall Summary
| Problem | Score | Notes |
|---|---|---|
| 1 — Hashing with Chaining | 8/20 | Correct uniform selection, but algorithm doesn't achieve O(L(1+1/α)) time; needed rejection sampling |
| 2 — Longest-Probe Bound | 40/40 | Excellent — all four parts correct with rigorous proofs; elegant tail-sum approach in (d) |
| 3 — Randomized Coloring | 12/15 | Correct final solution (random 3-coloring, 2/3 bound), but messy presentation with abandoned first attempt |
| 4 — Selection Recurrence | 11/25 | Sound methodology but critical arithmetic error (Σ max ≈ n²/2 vs correct 3n²/4) leads to c=2 instead of c=4 |
| Total | 71/100 | |
General Comments:
You demonstrate strong probability skills, as evidenced by your perfect score on Problem 2 (including a particularly elegant tail-sum approach in part (d) that differs from the reference but is equally rigorous). Your understanding of linearity of expectation (Problem 3) is solid. The main areas for improvement are: (1) in Problem 1, recognizing when rejection sampling is needed to achieve a specific time bound versus direct enumeration, and (2) in Problem 4, being more careful with summation computations — the off-by-one error in the first term and subsequent incorrect closed forms led to the wrong constant. The methodology in Problem 4 was correct, and with the right arithmetic, you would have obtained c = 4. Double-checking sums with small examples (e.g., n = 4) can help catch such errors.
Total: 71/100
Sub-Model Usage
| Model | Input | Output | Cache Read | Cost |
|---|---|---|---|---|
| claude-haiku-4-5-20251001 | 23 | 4.8K | 77.1K | $0.06 |