Agent Work: Homework4
Claude Sonnet 4.6 · COMP 382: Reasoning about Algorithms
Homework 4
Total Points: 100
---
Problem 1: Hashing with Chaining [20 Points]
Suppose that we have stored n keys in a hash table of size m, with collisions resolved by chaining. We know the length of each chain, including the length L of the longest chain.
Describe a procedure that selects a key uniformly at random from among the keys in the hash table and returns it in expected time O(L · (1 + 1/α)), where α = n/m is the load factor.
---
Problem 2: Longest-Probe Bound for Hashing [40 Points]
Suppose you are using an open-addressed hash table of size m to store n ≤ m/2 items.
Part (a)
Assuming independent uniform permutation hashing, show that for i = 1, 2, ..., n, the probability is at most 2⁻ᵏ that the i-th insertion requires strictly more than k probes.
Part (b)
Show that for i = 1, 2, ..., n, the probability is O(1/n²) that the i-th insertion requires more than 2 log₂ n probes.
Let the random variable Xᵢ denote the number of probes required by the i-th insertion. You have shown in part (b) that Pr[Xᵢ > 2 log₂ n] = O(1/n²). Let the random variable X = max₁≤ᵢ≤ₙ Xᵢ denote the maximum number of probes required by any of the n insertions.
Part (c)
Show that Pr[X > 2 log₂ n] = O(1/n).
Part (d)
Show that the expected length E[X] of the longest probe sequence is O(log₂ n).
---
Problem 3: Randomized Coloring [15 Points]
We are given an undirected graph G = (V, E) and we want to color each node with one of three colors, even if we aren't necessarily able to give different colors to every pair of adjacent nodes. Instead, we say that an edge (u, v) is *satisfied* if the colors assigned to u and v are different, and seek to maximize the number of satisfied edges.
Consider a 3-coloring that maximizes the number of satisfied edges, and let c* denote this number.
Give a polynomial-time randomized algorithm for our problem where the expected number of satisfied edges is at least (2/3)c*.
---
Problem 4: Analysis of Selection Using Recurrence [25 Points]
Consider the randomized variant of selection shown in Algorithm 1, which selects the element with rank k in an unsorted array A[1..n]. Assume that the array elements have distinct values.
Algorithm 1: SELECT(A[1..n], k)
Input: array of integers A[1..n] and integer k ∈ {1, ..., n}
Output: Element of rank k
1 i ← RANDOM(1..n) // select i from {1, ..., n} uniformly at random
2 r ← PARTITION(A[1..n], i) // re-arrange elements using A[i] as pivot
3 if k < r then
4 return SELECT(A[1..(r − 1)], k)
5 else if k > r then
6 return SELECT(A[(r + 1)..n], k − r)
7 else
8 return A[k]

The function PARTITION(A[1..n], i) partitions an array A into three parts by comparing the pivot element A[i] to every other element, using (n − 1) comparisons, and returns the new index of the pivot element. The elements that are smaller (resp., larger) than the pivot are placed to the left (resp., right) of the pivot.
The function RANDOM(1..n) selects a number from the set {1, ..., n} uniformly at random.
Let T(n) be the expected number of comparisons that SELECT performs. Find the smallest integer constant c you can for which T(n) ≤ cn for every n.
Hint: You can follow the recurrence approach that was discussed in class.
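For concreteness, Algorithm 1 can be transcribed directly into Python (an illustrative sketch; the helper names `partition` and `select` are ours, and indices are 0-based internally while the rank $k$ stays 1-based):

```python
import random

def partition(A, lo, hi, i):
    """Rearrange A[lo..hi] around pivot A[i] using hi - lo comparisons;
    return the pivot's final index (Lomuto-style partition)."""
    A[i], A[hi] = A[hi], A[i]          # move pivot out of the way
    pivot = A[hi]
    store = lo
    for j in range(lo, hi):
        if A[j] < pivot:
            A[store], A[j] = A[j], A[store]
            store += 1
    A[store], A[hi] = A[hi], A[store]  # place pivot at its rank
    return store

def select(A, k):
    """Return the element of rank k (1-based) in A (distinct values).
    Iterative form of the tail-recursive SELECT; mutates A in place."""
    lo, hi = 0, len(A) - 1
    while True:
        i = random.randint(lo, hi)     # line 1: uniform pivot choice
        r = partition(A, lo, hi, i)    # line 2: partition around A[i]
        rank = r - lo + 1              # pivot's rank within A[lo..hi]
        if k < rank:                   # lines 3-4: recurse left
            hi = r - 1
        elif k > rank:                 # lines 5-6: recurse right
            lo = r + 1
            k -= rank
        else:                          # lines 7-8: pivot is the answer
            return A[r]
```

The loop mirrors the two recursive calls of Algorithm 1; since each call is a tail call, no explicit recursion is needed.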
COMP382 Homework 4
Student Responses
---
Problem 1: Hashing with Chaining [20 Points]
Procedure:
1. Pick a slot $j$ uniformly at random from $\{1, \ldots, m\}$: $O(1)$ time.
2. Pick an index $i$ uniformly at random from $\{1, \ldots, L\}$: $O(1)$ time.
3. If $\text{len}(\text{chain}[j]) \geq i$, traverse chain $j$ to its $i$-th element and return it: $O(L)$ time.
4. Otherwise, reject and go back to step 1.
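The procedure can be sketched in Python (an illustrative sketch, not part of the graded solution; `chains[j]` is assumed to hold the list of keys in slot $j$, with 0-based positions):

```python
import random

def random_key(chains, L):
    """Select a key uniformly at random via rejection sampling.

    chains: list of m buckets, chains[j] = keys stored in slot j.
    L: length of the longest chain.
    Expected time O(L * (1 + 1/alpha)) where alpha = n/m.
    """
    m = len(chains)
    while True:
        j = random.randrange(m)     # step 1: random slot, O(1)
        i = random.randrange(L)     # step 2: random position in [0, L), O(1)
        if i < len(chains[j]):      # step 3: accept if position is occupied
            return chains[j][i]     # (a linked-list walk would cost O(L))
        # step 4: reject and retry
```

Each key is hit with probability $1/(mL)$ per trial, so conditioned on acceptance the draw is uniform over all stored keys.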
Correctness (uniform distribution):
For any key $k$ stored at slot $j^*$ at position $i^*$ within its chain:
$$\Pr[\text{key } k \text{ selected in one trial}] = \Pr[\text{pick slot } j^*] \cdot \Pr[\text{pick index } i^*] = \frac{1}{m} \cdot \frac{1}{L} = \frac{1}{mL}$$
This probability is identical for all $n$ keys regardless of which slot or position they occupy. Therefore, conditioned on a trial succeeding (i.e., not being rejected), each key is equally likely to be returned, giving a uniform distribution over all $n$ keys.
Expected running time:
The probability that a trial succeeds (i.e., the chosen $(j, i)$ pair corresponds to an actual key) is:
$$\Pr[\text{success}] = \frac{n}{mL} = \frac{\alpha}{L}$$
since exactly $n$ of the $mL$ possible $(j, i)$ pairs correspond to actual stored keys.
- Each failed trial costs $O(1)$ (just pick $j$ and $i$, check $\text{len}(\text{chain}[j])$, and reject).
- The successful trial costs $O(L)$ to traverse up to $L$ elements.
- The expected number of trials is $\frac{1}{\Pr[\text{success}]} = \frac{L}{\alpha}$ (geometric distribution).
Total expected cost:
$$O(1) \cdot \frac{L}{\alpha} + O(L) = O\!\left(\frac{L}{\alpha} + L\right) = O\!\left(L \cdot \left(1 + \frac{1}{\alpha}\right)\right)$$
This matches the required bound. $\square$
---
Problem 2: Longest-Probe Bound for Hashing [40 Points]
Part (a)
Claim: For $i = 1, 2, \ldots, n$, the probability that the $i$-th insertion requires strictly more than $k$ probes is at most $2^{-k}$.
Proof:
The $i$-th insertion requires more than $k$ probes if and only if all of the first $k$ probe positions it tries are already occupied. Before the $i$-th insertion, there are exactly $i - 1$ occupied slots and $m - (i-1)$ empty slots.
With independent uniform permutation hashing, each probe sequence is an independent random permutation of $\{1, \ldots, m\}$. Given that the first $j$ probes were all occupied, the $(j+1)$-th probe is chosen uniformly from the remaining $m - j$ unprobed positions, of which exactly $i - 1 - j$ are occupied. Thus:
$$\Pr[\text{$j$-th probe is occupied} \mid \text{first } j-1 \text{ probes occupied}] = \frac{i-1-(j-1)}{m-(j-1)} = \frac{i-j}{m-j+1}$$
So the probability that the first $k$ probes are all occupied is:
$$\Pr[X_i > k] = \prod_{j=1}^{k} \frac{i-j}{m-j+1}$$
We claim each factor is at most $1/2$. Since $n \leq m/2$, we have $i \leq n \leq m/2$, so $i - 1 \leq m/2 - 1 < m/2$. For each $j = 1, \ldots, k$:
$$\frac{i - j}{m - j + 1} \leq \frac{i - 1}{m - 1 + 1} = \frac{i-1}{m} \leq \frac{m/2 - 1}{m} < \frac{1}{2}$$
Wait, more carefully: we want $\frac{i - j}{m - j + 1} \leq \frac{1}{2}$, i.e., $2(i - j) \leq m - j + 1$. Since $i \leq m/2$, we have $2i \leq m$, so $2i - 2j \leq m - 2j \leq m - j + 1$ for $j \geq 1$. Thus each factor is $\leq 1/2$, and:
$$\Pr[X_i > k] \leq \left(\frac{1}{2}\right)^k = 2^{-k} \qquad \square$$
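As a quick numeric sanity check (illustrative, not part of the proof), the key inequality — each factor $(i-j)/(m-j+1) \leq 1/2$ whenever $i \leq m/2$ — can be verified by brute force:

```python
def factors_bounded(m):
    """Check (i - j) / (m - j + 1) <= 1/2 for every insertion index
    i <= m/2 and every probe number j with a positive numerator."""
    for i in range(1, m // 2 + 1):   # i-th insertion, i <= n <= m/2
        for j in range(1, i):        # j-th probe; numerator i - j > 0
            if (i - j) / (m - j + 1) > 0.5:
                return False
    return True
```

The check passes for every table size tried, as the algebraic argument above predicts.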
Part (b)
Claim: $\Pr[X_i > 2\log_2 n] = O(1/n^2)$.
Proof:
Setting $k = 2\log_2 n$ in part (a):
$$\Pr[X_i > 2\log_2 n] \leq 2^{-2\log_2 n} = (2^{\log_2 n})^{-2} = n^{-2} = \frac{1}{n^2} = O\!\left(\frac{1}{n^2}\right) \qquad \square$$
Part (c)
Claim: $\Pr[X > 2\log_2 n] = O(1/n)$, where $X = \max_{1 \leq i \leq n} X_i$.
Proof:
$$\Pr[X > 2\log_2 n] = \Pr\!\left[\bigcup_{i=1}^{n} \{X_i > 2\log_2 n\}\right]$$
Applying the union bound:
$$\leq \sum_{i=1}^{n} \Pr[X_i > 2\log_2 n] \leq n \cdot O\!\left(\frac{1}{n^2}\right) = O\!\left(\frac{1}{n}\right) \qquad \square$$
Part (d)
Claim: $E[X] = O(\log_2 n)$.
Proof:
We use the identity $E[X] = \sum_{t=0}^{\infty} \Pr[X > t]$ and split the sum at threshold $t^* = 2\lfloor \log_2 n \rfloor$:
$$E[X] = \sum_{t=0}^{t^*} \Pr[X > t] + \sum_{t=t^*+1}^{\infty} \Pr[X > t]$$
First sum: Since $\Pr[X > t] \leq 1$, the first sum is at most $t^* + 1 = O(\log n)$.
Second sum: For $t > t^*$, by union bound and part (a):
$$\Pr[X > t] \leq \sum_{i=1}^{n} \Pr[X_i > t] \leq n \cdot 2^{-t}$$
Therefore:
$$\sum_{t=t^*+1}^{\infty} \Pr[X > t] \leq \sum_{t=t^*+1}^{\infty} n \cdot 2^{-t} = n \cdot \frac{2^{-(t^*+1)}}{1 - 1/2} = n \cdot 2^{-t^*}$$
Since $t^* = 2\lfloor \log_2 n \rfloor \geq 2(\log_2 n - 1) = 2\log_2 n - 2$:
$$n \cdot 2^{-t^*} \leq n \cdot 2^{-(2\log_2 n - 2)} = n \cdot \frac{4}{n^2} = \frac{4}{n} = O(1)$$
Combining:
$$E[X] = O(\log n) + O(1) = O(\log_2 n) \qquad \square$$
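A small simulation (an illustrative sketch under the idealized uniform-permutation model; the parameters are chosen arbitrarily) agrees with these bounds:

```python
import random

def max_probes(m, n):
    """Insert n items into an m-slot table, each insertion probing slots in
    a fresh uniform random permutation order; return the longest probe
    sequence used by any insertion (the random variable X)."""
    occupied = [False] * m
    worst = 0
    for _ in range(n):
        probes = 0
        for slot in random.sample(range(m), m):  # uniform permutation probing
            probes += 1
            if not occupied[slot]:               # stop at first empty slot
                occupied[slot] = True
                break
        worst = max(worst, probes)
    return worst
```

With $n = 256$ and $m = 512$ (so $\alpha = 1/2$), the threshold $2\log_2 n = 16$ is rarely exceeded, consistent with $\Pr[X > 2\log_2 n] = O(1/n)$.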
---
Problem 3: Randomized Coloring [15 Points]
Algorithm:
Assign each vertex $v \in V$ a color $c(v)$ chosen independently and uniformly at random from $\{1, 2, 3\}$. This takes $O(|V|)$ time, which is polynomial.
Analysis:
For any edge $(u, v) \in E$, the probability that it is satisfied (i.e., $c(u) \neq c(v)$) is:
$$\Pr[c(u) \neq c(v)] = 1 - \Pr[c(u) = c(v)]$$
Since $c(u)$ and $c(v)$ are chosen independently and uniformly from $\{1, 2, 3\}$:
$$\Pr[c(u) = c(v)] = \sum_{j=1}^{3} \Pr[c(u) = j] \cdot \Pr[c(v) = j] = 3 \cdot \frac{1}{3} \cdot \frac{1}{3} = \frac{1}{3}$$
Therefore $\Pr[\text{edge $(u,v)$ satisfied}] = \frac{2}{3}$.
Let $S$ be the random variable counting the number of satisfied edges. By linearity of expectation:
$$E[S] = \sum_{(u,v) \in E} \Pr[(u,v) \text{ satisfied}] = \frac{2}{3}|E|$$
**Bounding against $c^*$:**
Since the optimal coloring satisfies at most $|E|$ edges (each edge is either satisfied or not), we have $c^* \leq |E|$. Therefore:
$$E[S] = \frac{2}{3}|E| \geq \frac{2}{3} c^*$$
The expected number of satisfied edges is at least $\frac{2}{3}c^*$, as required. $\square$
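A minimal Python sketch of the algorithm (illustrative; the graph is assumed to be given as a vertex list and an edge list):

```python
import random

def random_3_coloring(vertices):
    """Color each vertex independently and uniformly from {1, 2, 3}."""
    return {v: random.randint(1, 3) for v in vertices}

def count_satisfied(edges, coloring):
    """Number of edges whose endpoints received different colors."""
    return sum(1 for u, v in edges if coloring[u] != coloring[v])
```

Averaged over many runs, the satisfied fraction concentrates near $2/3$ of $|E|$, matching the expectation computed above.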
---
Problem 4: Analysis of Selection Using Recurrence [25 Points]
Setting up the recurrence:
Let $T(n)$ be the worst-case expected number of comparisons for SELECT on an array of size $n$. Each call partitions the array using $n - 1$ comparisons. The pivot's rank $r$ is uniformly distributed in $\{1, \ldots, n\}$ (since the pivot is chosen uniformly at random and all elements are distinct).
For a given $k$, the recursive subproblem has size:
- $0$ if $r = k$ (pivot is the desired element),
- $r - 1$ if $r > k$ (recurse on the left part),
- $n - r$ if $r < k$ (recurse on the right part).
In every case, the subproblem size is at most $\max(r-1, n-r)$. Taking the worst case over all choices of $k$, this gives the upper bound:
$$T(n) \leq (n-1) + \frac{1}{n} \sum_{r=1}^{n} T(\max(r-1, n-r))$$
Simplifying the sum:
For even $n$, as $r$ ranges from $1$ to $n$, the value $\max(r-1, n-r)$ takes each value from $n/2$ to $n-1$ exactly twice (by symmetry: $r$ and $n+1-r$ give the same max). For odd $n$, the value $\lfloor n/2 \rfloor$ occurs only once (at $r = (n+1)/2$) and each larger value twice. Since $T$ is nonnegative, in either case:
$$\sum_{r=1}^{n} T(\max(r-1, n-r)) \leq 2\sum_{j=\lfloor n/2 \rfloor}^{n-1} T(j)$$
So the recurrence becomes:
$$T(n) \leq (n-1) + \frac{2}{n} \sum_{j=\lfloor n/2 \rfloor}^{n-1} T(j)$$
Proving $T(n) \leq 4n$ by induction:
Base cases: $T(0) = T(1) = 0 \leq 4 \cdot 1$. ✓
Inductive step (even $n = 2m$): Assume $T(j) \leq 4j$ for all $j < n$. Then:
$$\sum_{j=m}^{2m-1} j = m + (m+1) + \cdots + (2m-1) = \frac{m(3m-1)}{2}$$
$$T(2m) \leq (2m-1) + \frac{2}{2m} \cdot 4 \cdot \frac{m(3m-1)}{2} = (2m-1) + 2(3m-1) = 8m - 3 < 8m = 4n \checkmark$$
Inductive step (odd $n = 2m+1$):
$$\sum_{j=m}^{2m} j = m + (m+1) + \cdots + 2m = \frac{3m(m+1)}{2}$$
$$T(2m+1) \leq 2m + \frac{2}{2m+1} \cdot 4 \cdot \frac{3m(m+1)}{2} = 2m + \frac{12m(m+1)}{2m+1}$$
We verify $2m + \frac{12m(m+1)}{2m+1} \leq 4(2m+1) = 8m+4$:
$$\frac{12m(m+1)}{2m+1} \leq 6m+4 \iff 12m(m+1) \leq (6m+4)(2m+1) = 12m^2 + 14m + 4$$
$$12m^2 + 12m \leq 12m^2 + 14m + 4 \iff 0 \leq 2m + 4 \checkmark$$
So $T(n) \leq 4n$ for all $n$. ✓
Showing $c = 3$ is insufficient:
From the induction, the even case $n = 2m$ with hypothesis $T(j) \leq cj$ gives $T(2m) \leq (2m-1) + \frac{c(3m-1)}{2}$, and requiring this to be at most $c \cdot 2m$ forces $c \geq \frac{2(2m-1)}{m+1}$ for all $m \geq 1$. For $m = 6$ ($n = 12$):
$$\frac{2(11)}{7} = \frac{22}{7} \approx 3.14 > 3$$
So $c = 3$ fails for $n = 12$.
Conclusion:
The requirement $c \geq \frac{2(2m-1)}{m+1}$ approaches $4$ as $m \to \infty$ (and is always strictly less than $4$ for finite $m$). Therefore, $c = 3$ is insufficient, while $c = 4$ is both sufficient and necessary.
$$\boxed{c = 4}$$
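As a numeric check of this conclusion (illustrative, not part of the proof), the upper-bound recurrence can be evaluated exactly, taken with equality: the ratio $T(n)/n$ stays below $4$ but eventually exceeds $3$, consistent with $c = 4$ being the smallest integer constant.

```python
def recurrence_values(N):
    """Evaluate T(n) = (n - 1) + (2/n) * sum_{j=floor(n/2)}^{n-1} T(j)
    (the upper-bound recurrence, taken with equality) for n = 0..N."""
    T = [0.0] * (N + 1)  # T(0) = T(1) = 0
    for n in range(2, N + 1):
        T[n] = (n - 1) + (2.0 / n) * sum(T[j] for j in range(n // 2, n))
    return T
```

For large $n$ the ratio approaches $4$ from below, mirroring the constraint $\frac{2(2m-1)}{m+1} \to 4$.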
Problem 1: Hashing with Chaining [20 Points]
Score: 20/20 points
Assessment:
Approach: Rejection sampling — pick a random slot and random index within [1, L]; accept if the position is occupied, else reject and repeat.
Strengths:
- Algorithm is clearly stated and correct
- Correctness argument that each key is selected with equal probability 1/(mL) is rigorous
- Expected time analysis is clean and well-structured: separates cost of failed trials (O(1) each) from the successful trial (O(L)), then combines via the geometric distribution
Errors/Gaps:
- None
Detailed Feedback: This is a complete, correct solution that matches the reference approach. The student clearly identifies the acceptance probability as α/L, correctly computes the expected number of trials as L/α, and properly accounts for the O(L) cost of the final successful traversal versus the O(1) cost of each rejection. The final combination O(L/α + L) = O(L(1 + 1/α)) is correctly derived.
Rubric Breakdown:
- Algorithm description: 8/8 - Clear, correct rejection sampling procedure
- Correctness proof (uniform selection): 6/6 - Rigorous argument that each key has equal probability
- Expected time analysis: 6/6 - Correct derivation of O(L(1 + 1/α))
---
Problem 2: Longest-Probe Bound for Hashing [40 Points]
Score: 40/40 points
Assessment:
Approach: Standard probabilistic analysis with explicit conditional probability computation in part (a), direct substitution in (b), union bound in (c), and a tail-sum decomposition in (d).
Strengths:
- Part (a) is more rigorous than the reference: explicitly computes each conditional probability factor (i−j)/(m−j+1) and carefully verifies each is ≤ 1/2
- The self-correction in part (a) ("Wait, more carefully...") shows careful reasoning — the student catches a potential gap and fixes it
- Part (d) uses the elegant identity E[X] = Σ Pr[X > t] rather than the simpler but equivalent conditioning approach in the reference. This is arguably a more sophisticated technique.
Errors/Gaps:
- None
Detailed Feedback:
Part (a): The student provides a detailed computation showing that each probe's conditional probability of hitting an occupied slot is at most 1/2. The key step — verifying that 2(i−j) ≤ m−j+1 given i ≤ m/2 — is handled carefully. Full marks.
Part (b): Direct substitution of k = 2log₂n into part (a). Clean and correct.
Part (c): Standard union bound application. Correct.
Part (d): The student uses the tail-sum identity E[X] = Σ_{t≥0} Pr[X > t], splitting at t* = 2⌊log₂n⌋. The first sum is trivially O(log n). For the second sum, they apply the union bound combined with part (a) to get Pr[X > t] ≤ n·2^{−t}, then sum the geometric series to obtain n·2^{−t*} = O(1). This is a perfectly valid alternative to the reference's conditioning approach, and is equally rigorous.
Rubric Breakdown:
- Part (a): 10/10 - Rigorous proof with explicit conditional probabilities
- Part (b): 10/10 - Correct application of part (a)
- Part (c): 10/10 - Correct union bound
- Part (d): 10/10 - Elegant tail-sum approach, fully correct
---
Problem 3: Randomized Coloring [15 Points]
Score: 15/15 points
Assessment:
Approach: Random 3-coloring with independent uniform color assignment per vertex.
Strengths:
- Algorithm is clearly polynomial-time (O(|V|))
- Probability calculation is explicit: computes Pr[c(u) = c(v)] = 1/3 by summing over all three colors
- Clean application of linearity of expectation
- Correctly relates E[S] to c* via the bound c* ≤ |E|
Errors/Gaps:
- None
Detailed Feedback: This solution matches the reference exactly. The student correctly identifies that each edge is independently satisfied with probability 2/3, applies linearity of expectation to get E[S] = (2/3)|E|, and uses the trivial upper bound c* ≤ |E| to establish the desired guarantee. The presentation is clean and well-organized.
Rubric Breakdown:
- Algorithm description: 5/5 - Clear polynomial-time algorithm
- Probability analysis per edge: 5/5 - Correct computation of 2/3
- Linearity of expectation + bound against c*: 5/5 - Complete and correct
---
Problem 4: Analysis of Selection Using Recurrence [25 Points]
Score: 25/25 points
Assessment:
Approach: Worst-case over k recurrence with the max(r−1, n−r) formulation, followed by induction proof for T(n) ≤ 4n, with a tightness argument showing c = 3 fails.
Strengths:
- The worst-case-over-k approach T(n) ≤ (n−1) + (1/n)Σ T(max(r−1, n−r)) is a valid and clean upper bound
- The symmetry argument reducing to 2Σ_{j=⌊n/2⌋}^{n−1} T(j) is correct
- Induction handles both even and odd cases separately with explicit arithmetic verification
- The tightness argument is excellent: derives the exact condition c ≥ 2(2m−1)/(m+1), evaluates it at m=6 to show c=3 fails, and notes the limit approaches 4
Errors/Gaps:
- None
Detailed Feedback: The student takes a slightly different approach from the reference by working with the worst-case-over-k recurrence rather than the k-specific recurrence. This is a standard and valid technique that simplifies the analysis. The resulting recurrence T(n) ≤ (n−1) + (2/n)Σ_{j=⌊n/2⌋}^{n−1} T(j) is correct.
The induction proof is thorough, handling both parities of n. For even n = 2m: T(2m) ≤ 8m − 3 < 8m = 4n. For odd n = 2m+1: the student carefully verifies the required polynomial inequality, which holds for all m ≥ 0.
The tightness analysis is a nice addition beyond what the reference provides. By extracting the exact constraint c ≥ 2(2m−1)/(m+1) from the induction step, the student conclusively shows that c = 4 is the smallest integer constant that works, since the expression exceeds 3 for m = 6 (giving 22/7 ≈ 3.14) and approaches 4 in the limit.
Rubric Breakdown:
- Recurrence setup: 5/5 - Valid worst-case-over-k formulation with correct simplification
- Induction proof for T(n) ≤ 4n: 12/12 - Both parities handled, base cases included
- Tightness argument (c = 3 fails): 4/4 - Concrete counterexample and limiting argument
- Final answer c = 4: 4/4 - Correct
---
Overall Summary
| Problem | Score | Notes |
|---|---|---|
| 1 | 20/20 | Correct rejection sampling with complete analysis |
| 2 | 40/40 | All four parts rigorous; part (d) uses elegant tail-sum method |
| 3 | 15/15 | Clean random coloring with linearity of expectation |
| 4 | 25/25 | Complete induction with tightness argument |
| Total | 100/100 | |
General Comments:
This is an outstanding submission. Every problem is solved correctly with rigorous proofs that demonstrate deep understanding of the material. Several solutions are arguably more detailed or elegant than the reference: Problem 2(a) provides explicit conditional probability bounds rather than hand-waving, Problem 2(d) uses the tail-sum identity for a clean decomposition, and Problem 4 includes a thorough tightness analysis showing exactly why c = 3 fails and c = 4 is necessary. The mathematical writing is clear, well-organized, and free of errors.
Total: 100/100
Sub-Model Usage
| Model | Input | Output | Cache Read | Cost |
|---|---|---|---|---|
| claude-sonnet-4-6 | 7 | 32.6K | 228.1K | $1.22 |