Agent Work: Homework3
Claude Sonnet 4.6 · COMP 382: Reasoning about Algorithms
COMP 382: Reasoning about Algorithms
Homework 3
Total Points: 100
---
Problem 1: String Merging [15 points]
Let X, Y, and Z be three strings, where |X| = m, |Y| = n, and |Z| = m + n. We say that Z *merges* X and Y iff Z can be formed by interleaving the characters from X and Y in a way that maintains the left-to-right ordering of the characters from each string. For example, both abucvwx and uavwbxc merge the strings abc and uvwx.
Part 1 [5 points]
Give an efficient dynamic programming algorithm that determines whether Z merges X and Y. What is the time complexity of your algorithm?
Part 2 [10 points]
We will now generalize the problem so that the input includes a number K (in addition to the strings X, Y, Z). K is the maximum number of errors (substitutions) that we can allow during the merging. That is, the computational problem is to determine if there exists a string Z' (of length m + n) such that: (i) Z' merges X and Y, and (ii) Z, Z' differ at a maximum of K positions (i.e., the Hamming distance between Z and Z' is at most K).
Give an efficient dynamic programming algorithm for this problem. What is the time complexity of your algorithm?
---
Problem 2: Inventory Planning [20 Points]
The Rinky Dink Company makes machines that resurface ice rinks. The demand for such products varies from month to month, and so the company needs to develop a strategy to plan its manufacturing given the fluctuating, but predictable, demand.
The company wishes to design a plan for the next n months. For each month i, the company knows the demand dᵢ, that is, the number of machines that it will sell. Let D = Σᵢ₌₁ⁿ dᵢ be the total demand over the next n months.
The company keeps a full-time staff who provide labor to manufacture up to m machines per month. If the company needs to make more than m machines in a given month, it can hire additional, part-time labor, at a cost that works out to c dollars per machine.
Furthermore, if, at the end of a month, the company is holding any unsold machines, it must pay inventory costs. The cost for holding j machines is given as a function h(j) for j = 1, 2, ..., D, where h(j) ≥ 0 for 1 ≤ j ≤ D and h(j) ≤ h(j + 1) for 1 ≤ j ≤ D − 1.
Give an algorithm that calculates a plan for the company that minimizes its costs while fulfilling all the demand. The running time should be polynomial in n and D.
---
Problem 3: Optimal Matching of Sequences [30 Points]
For an integer n ≥ 1, we define [n] = {1, ..., n}. Suppose we are given as input sequences (i.e., arrays) of integers X = [x₁, ..., xₘ] and Y = [y₁, ..., yₙ]. A *matching* for (X, Y) is a subset M ⊆ [m] × [n] (containing pairs of indexes) that satisfies the following properties:
(i) (Left covered) For every index i ∈ [m] there is some j ∈ [n] such that (i, j) ∈ M.
(ii) (Right covered) For every index j ∈ [n] there is some i ∈ [m] such that (i, j) ∈ M.
(iii) (No crossing) There are no indexes i, i' ∈ [m] and j, j' ∈ [n] with i < i' and j < j' such that (i, j') ∈ M and (i', j) ∈ M.
The *cost* of a matching M for (X, Y) is defined as follows:
$$\text{cost}(M) = \sum_{(i,j) \in M} (x_i - y_j)^2$$
We are interested in finding an *optimal matching*, that is, a matching with minimal cost.
Part 1 [3 points]
Prove that every matching contains the pairs (1, 1) and (m, n).
Part 2 [7 points]
Let f(m, n) be the number of all possible matchings for (X, Y), where X = [x₁, ..., xₘ] and Y = [y₁, ..., yₙ]. Give a recursive definition for the function f and carefully explain how you obtained it.
Part 3 [20 points]
Design an algorithm that computes a matching with minimal cost. Explain why it is correct and discuss its time and space complexity.
---
Problem 4: Edit Distance & LCS [35 Points]
We will consider a generalization of the concept of edit distance, where the cost of an insertion, deletion and substitution is cᵢ, c_d and c_s respectively. Each parameter can be chosen to be any positive extended real number, i.e., an element of {x ∈ ℝ | x > 0} ∪ {∞}.
Keep in mind the following properties of ∞:
- ∞ + ∞ = ∞
- x + ∞ = ∞
- x < ∞
for every x ∈ ℝ.
Part 1 [3 points]
Provide pseudocode for the procedure EDITDISTANCE(X, Y, cᵢ, c_d, c_s), where X[1..m] and Y[1..n] are strings and cᵢ, c_d, c_s are the edit operation costs. This procedure should return the edit distance from X to Y, which we denote by D(X, Y). Recall that:
D(X, Y) = min{cost(W) | edit sequence W for (X, Y)}
as we defined in the lectures.
Part 2 [12 points]
Prove that D(xa, ya) = D(x, y) for all strings x, y and every letter a.
Note: This is not a trivial claim. You should justify it with a careful proof using the definition of edit distance.
Part 3 [20 points]
Let L(x, y) be the length of the longest common subsequence of strings x, y. We know from CLRS (page 396) that L is given by the following recursive definition:
- L(x, ε) = 0
- L(ε, y) = 0
- L(xa, ya) = L(x, y) + 1
- L(xa, yb) = max(L(xa, y), L(x, yb)) when a ≠ b
for all strings x, y and letters a, b.
For strings X and Y, show how L(X, Y) can be computed in one line of code using only one invocation of the procedure EDITDISTANCE(X, Y, cᵢ, c_d, c_s). Justify the correctness of your approach with a careful proof.
# COMP382 Homework 3
## Student Responses
<!-- Write your solutions below. Add sections for each problem as needed. -->
<!-- Use proper mathematical notation where applicable. -->
[Your solutions here]
COMP382 Homework 3
Student Responses
---
Problem 1: String Merging
Part 1 [5 points]
Algorithm:
Define dp[i][j] = true if Z[1..i+j] can be formed by interleaving X[1..i] and Y[1..j].
Base case: dp[0][0] = true
Recurrence:
dp[i][j] = (i > 0 AND dp[i-1][j] AND Z[i+j] == X[i])
OR (j > 0 AND dp[i][j-1] AND Z[i+j] == Y[j])Return: dp[m][n]
Pseudocode:
MERGE_CHECK(X, Y, Z):
m = |X|, n = |Y|
dp[0][0] = true
dp[i][j] = false for all other (i, j)
for i = 0 to m:
for j = 0 to n:
if (i, j) == (0, 0): continue
if i > 0 and dp[i-1][j] and Z[i+j] == X[i]:
dp[i][j] = true
if j > 0 and dp[i][j-1] and Z[i+j] == Y[j]:
dp[i][j] = true
return dp[m][n]Correctness: dp[i][j] is true iff there is a way to interleave X[1..i] and Y[1..j] to form Z[1..i+j]. At each cell, we consider whether the last character came from X (moving from dp[i-1][j]) or from Y (moving from dp[i][j-1]).
Time complexity: O(mn), since we fill an (m+1) × (n+1) table with O(1) work per cell.
---
Part 2 [10 points]
Algorithm:
Define dp[i][j] = minimum Hamming distance between Z[1..i+j] and any string Z'[1..i+j] that is a valid interleaving of X[1..i] and Y[1..j].
Base case: dp[0][0] = 0
Recurrence: When building Z' at position i+j, the character comes from either X[i] (moving from dp[i-1][j]) or Y[j] (moving from dp[i][j-1]):
dp[i][j] = min(
dp[i-1][j] + (X[i] != Z[i+j] ? 1 : 0), // take X[i] as next char
dp[i][j-1] + (Y[j] != Z[i+j] ? 1 : 0) // take Y[j] as next char
)with the convention that dp[i][j] = ∞ if neither predecessor is valid (i.e., when i = j = 0 and both terms don't apply).
Pseudocode:
MERGE_WITH_ERRORS(X, Y, Z, K):
m = |X|, n = |Y|
Initialize dp[i][j] = ∞ for all i, j
dp[0][0] = 0
for i = 0 to m:
for j = 0 to n:
if i == 0 and j == 0: continue
cost_x = ∞ if i == 0 else dp[i-1][j] + (X[i] != Z[i+j] ? 1 : 0)
cost_y = ∞ if j == 0 else dp[i][j-1] + (Y[j] != Z[i+j] ? 1 : 0)
dp[i][j] = min(cost_x, cost_y)
return dp[m][n] <= KCorrectness: dp[i][j] gives the minimum number of positions where Z' must differ from Z to produce Z[1..i+j] as an interleaving of X[1..i] and Y[1..j]. At each position i+j of Z', we choose the character from either X[i] or Y[j], incurring a cost of 1 if that character differs from Z[i+j]. We return true iff this minimum cost is ≤ K.
Time complexity: O(mn), using the same (m+1) × (n+1) table with O(1) work per cell.
---
Problem 2: Inventory Planning [20 Points]
DP Formulation:
Let dp[i][j] = minimum total cost for months 1 through i, with j machines left in inventory at the end of month i.
State space: i ∈ {0, ..., n}, j ∈ {0, ..., D} where D = Σ dᵢ is total demand.
Base case: dp[0][0] = 0, dp[0][j] = ∞ for j > 0 (start with no inventory).
Transition: For each month i with demand dᵢ and ending inventory j, the starting inventory was prev_j (end of month i−1), so production is:
p_i = j - prev_j + d_iThis requires p_i ≥ 0, i.e., prev_j ≤ j + dᵢ.
The holding cost is h(j) for j ≥ 1, and 0 for j = 0. Define h(0) = 0.
dp[i][j] = min over 0 ≤ prev_j ≤ min(D, j + d_i):
dp[i-1][prev_j] + c · max(0, (j - prev_j + d_i) - m) + h(j)Algorithm:
INVENTORY_PLAN(d[1..n], m, c, h[1..D]):
D = sum of d[i]
Initialize dp[0][0] = 0, dp[0][j] = ∞ for j > 0
for i = 1 to n:
for j = 0 to D:
dp[i][j] = ∞
for prev_j = 0 to min(D, j + d[i]):
p_i = j - prev_j + d[i] // production this month
overtime = c * max(0, p_i - m) // overtime labor cost
holding = h(j) // holding cost (h(0) = 0)
dp[i][j] = min(dp[i][j], dp[i-1][prev_j] + overtime + holding)
return min over j of dp[n][j]Correctness: The DP correctly accounts for all costs. In each month i, given prev_j units of inventory carried over, producing p_i = j - prev_j + dᵢ ≥ 0 machines fulfills demand dᵢ and leaves j units in inventory. Overtime cost applies when p_i > m. The DP considers all reachable states and takes the minimum cost.
Time complexity: O(n · D²), since for each of the n months we iterate over O(D) values of j and O(D) values of prev_j.
Space complexity: O(D) using a rolling array (only need previous row).
Since n and D are the problem parameters and this running time is polynomial in both, the algorithm satisfies the requirement.
---
Problem 3: Optimal Matching of Sequences [30 Points]
Part 1 [3 points]
Claim: Every matching M for (X, Y) contains the pairs (1, 1) and (m, n).
Proof that (1, 1) ∈ M:
By left-coverage, ∃j ≥ 1 such that (1, j) ∈ M. By right-coverage, ∃i ≥ 1 such that (i, 1) ∈ M.
If i = 1, then (1, 1) ∈ M and we are done.
Suppose i > 1 (so i' := i > 1). We have (1, j) ∈ M and (i', 1) ∈ M. If j > 1, then taking i = 1 < i' and j_small = 1 < j_large = j, we have (1, j_large) = (1, j) ∈ M and (i', j_small) = (i', 1) ∈ M, which violates the no-crossing condition. Therefore j = 1, giving (1, 1) ∈ M. ∎
Proof that (m, n) ∈ M:
By left-coverage, ∃j ≤ n such that (m, j) ∈ M. By right-coverage, ∃i ≤ m such that (i, n) ∈ M.
If j = n, then (m, n) ∈ M and we are done.
Suppose j < n. If i = m, then (m, n) ∈ M and we are done. Suppose i < m. Then we have (i, n) ∈ M and (m, j) ∈ M with i < m and j < n. Taking i_small = i < i_large = m and j_small = j < j_large = n, we have (i_small, j_large) = (i, n) ∈ M and (i_large, j_small) = (m, j) ∈ M, which violates no-crossing. Therefore i = m and (m, n) ∈ M. ∎
---
Part 2 [7 points]
Recursive definition:
f(1, n) = 1 for all n ≥ 1
f(m, 1) = 1 for all m ≥ 1
f(m, n) = f(m-1, n) + f(m, n-1) + f(m-1, n-1) for m, n ≥ 2Explanation:
First, we observe the no-crossing condition implies that the pairs of any matching, sorted by first index, must have non-decreasing second indices. (If (i₁, j₁) and (i₂, j₂) are in M with i₁ < i₂ but j₁ > j₂, this is precisely a crossing.) Therefore every matching M corresponds to a monotone path of cells from (1,1) to (m,n) in the grid [m] × [n], where consecutive cells in the path differ by (+1,0), (0,+1), or (+1,+1). The pairs in M are exactly the cells visited by the path.
The left-coverage, right-coverage, and no-crossing properties are equivalent to this path visiting every row 1…m (path starts at row 1 and ends at row m, visiting each row at least once) and every column 1…n (similarly).
Base cases: For m = 1 (only one row), the path must visit all n columns, so it moves right at each step: the unique path is (1,1), (1,2), ..., (1,n), giving f(1,n) = 1. Symmetrically, f(m,1) = 1.
Recursive case: Consider a matching M for [m] × [n]. By Part 1, (m,n) ∈ M. We show: it is impossible to have both (m, j) ∈ M with j < n and (i, n) ∈ M with i < m simultaneously, because that would give a crossing (with i_small = i, i_large = m, j_small = j, j_large = n). Therefore exactly one of three cases holds:
- Case A — No extra pairs in row m or column n: removing (m,n) yields a valid matching for [m−1] × [n−1].
- Left-coverage: row m is covered by (m,n) in the original; rows 1..m−1 had no pairs in column n (by case assumption), so all their pairs remain and they are still covered. - Right-coverage: column n is covered by (m,n); columns 1..n−1 had no pairs in row m (by case assumption), so they remain covered. - These matchings are in bijection with matchings for [m−1] × [n−1]: count = f(m−1, n−1).
- Case B — Extra pairs in row m (some (m, j) ∈ M with j < n), but no extra pairs in column n: removing (m,n) gives a valid matching for [m] × [n−1].
- Right-coverage for [n−1]: each column j < n must be covered; column n has no pairs other than (m,n) (case assumption), so column n−1 must be covered in M by some i < m, which remains. ✓ - Left-coverage for row m: (m, j) ∈ M for some j < n, so row m is still covered after removing (m,n). ✓ - These matchings are in bijection with matchings for [m] × [n−1]: count = f(m, n−1).
- Case C — Extra pairs in column n (some (i, n) ∈ M with i < m), but no extra pairs in row m: by symmetry, removing (m,n) gives a valid matching for [m−1] × [n]: count = f(m−1, n).
Since the three cases are mutually exclusive and exhaustive: $$f(m,n) = f(m-1,n-1) + f(m,n-1) + f(m-1,n)$$
(These are the Delannoy numbers, counting monotone paths with steps right, down, or diagonal.)
---
Part 3 [20 points]
DP Algorithm:
Let OPT[i][j] = minimum cost of a matching for (X[1..i], Y[1..j]).
Base cases:
OPT[1][1] = (x₁ - y₁)²
OPT[1][j] = OPT[1][j-1] + (x₁ - yⱼ)² for j ≥ 2
OPT[i][1] = OPT[i-1][1] + (xᵢ - y₁)² for i ≥ 2Recurrence:
OPT[i][j] = (xᵢ - yⱼ)² + min(OPT[i-1][j], OPT[i][j-1], OPT[i-1][j-1])Answer: OPT[m][n]. To recover the actual matching, backtrack through the table.
Pseudocode:
OPTIMAL_MATCH(X[1..m], Y[1..n]):
OPT[1][1] = (x₁ - y₁)²
for j = 2 to n: OPT[1][j] = OPT[1][j-1] + (x₁ - yⱼ)²
for i = 2 to m: OPT[i][1] = OPT[i-1][1] + (xᵢ - y₁)²
for i = 2 to m:
for j = 2 to n:
OPT[i][j] = (xᵢ - yⱼ)² + min(OPT[i-1][j], OPT[i-1][j-1], OPT[i][j-1])
return OPT[m][n]Correctness:
By Part 1 (generalized to any sub-rectangle), every matching for (X[1..i], Y[1..j]) must contain the pair (i, j). Combined with the structure from Part 2, any matching for [i] × [j] is a monotone path ending at (i,j), whose second-to-last step was one of (i−1,j), (i,j−1), or (i−1,j−1).
We prove correctness by induction on i+j:
*Base case:* OPT[1][1] = (x₁−y₁)² is correct: the only valid matching for [1]×[1] is {(1,1)}.
*Inductive step:* Assume OPT[i'][j'] is correct for all i'+j' < i+j. Any matching M for [i]×[j] contains (i,j) and falls into one of three cases from Part 2:
- Restriction to [i−1]×[j] is a valid matching with cost OPT[i−1][j] (by IH)
- Restriction to [i]×[j−1] is a valid matching with cost OPT[i][j−1] (by IH)
- Restriction to [i−1]×[j−1] is a valid matching with cost OPT[i−1][j−1] (by IH)
The cost of M equals (xᵢ−yⱼ)² plus the cost of the restriction. Therefore: $$\text{OPT}[i][j] = (x_i - y_j)^2 + \min(\text{OPT}[i-1][j],\ \text{OPT}[i][j-1],\ \text{OPT}[i-1][j-1])$$
is correct by optimal substructure. ✓
Time complexity: O(mn) — filling an m×n table with O(1) work per cell.
Space complexity: O(mn) to store the table (reducible to O(min(m,n)) using two rows if we only need the cost, not the full matching).
---
Problem 4: Edit Distance & LCS [35 Points]
Part 1 [3 points]
EDITDISTANCE(X, Y, cᵢ, c_d, c_s):
m = |X|, n = |Y|
Create table D[0..m][0..n]
D[0][0] = 0
for i = 1 to m:
D[i][0] = i * c_d // delete X[1..i] entirely
for j = 1 to n:
D[0][j] = j * cᵢ // insert Y[1..j] from empty string
for i = 1 to m:
for j = 1 to n:
sub_cost = 0 if X[i] == Y[j] else c_s
D[i][j] = min(
D[i-1][j] + c_d, // delete X[i]
D[i][j-1] + cᵢ, // insert Y[j]
D[i-1][j-1] + sub_cost // match or substitute
)
return D[m][n]---
Part 2 [12 points]
Claim: D(xa, ya) = D(x, y) for all strings x, y and every letter a.
Proof:
We use the edit distance recurrence, which follows directly from the definition D(X,Y) = min{cost(W) | edit sequence W for (X,Y)}: for any strings X[1..m] and Y[1..n], an optimal edit sequence for transforming X to Y must either delete X[m], insert Y[n], or match/substitute X[m] with Y[n] as its last operation, giving: $$D(X[1..m], Y[1..n]) = \min\bigl(D(X[1..m{-}1], Y[1..n]) + c_d,\ D(X[1..m], Y[1..n{-}1]) + c_i,\ D(X[1..m{-}1], Y[1..n{-}1]) + \text{cost}(X[m], Y[n])\bigr)$$
Applying this to (xa, ya) where both strings end in the same letter a (so the substitution cost is 0): $$D(xa, ya) = \min\bigl(D(x, ya) + c_d,\ D(xa, y) + c_i,\ D(x, y) + 0\bigr) \tag{*}$$
We now establish two bounds.
Step 1: D(xa, ya) ≤ D(x, y).
This follows immediately from (*): D(xa, ya) = min{..., D(x,y),...} ≤ D(x, y).
Alternatively: take any optimal edit sequence W for (x, y). Appending the operation "match a with a" (cost 0) to W yields a valid edit sequence for (xa, ya) of cost D(x, y). Hence D(xa, ya) ≤ D(x, y).
Step 2: D(xa, ya) ≥ D(x, y), i.e., the minimum in (*) equals D(x, y).
We show each of the first two terms in (*) is ≥ D(x, y):
- D(x, ya) + c_d ≥ D(x, y): By the triangle inequality for edit distance, D(x, y) ≤ D(x, ya) + D(ya, y). Since D(ya, y) ≤ c_d (delete the last character a from ya), we get D(x, y) ≤ D(x, ya) + c_d, i.e., D(x, ya) + c_d ≥ D(x, y). ✓
- D(xa, y) + cᵢ ≥ D(x, y): By the triangle inequality, D(x, y) ≤ D(x, xa) + D(xa, y). Since D(x, xa) ≤ cᵢ (insert a at the end of x), we get D(x, y) ≤ cᵢ + D(xa, y), i.e., D(xa, y) + cᵢ ≥ D(x, y). ✓
Since both D(x, ya) + c_d ≥ D(x, y) and D(xa, y) + cᵢ ≥ D(x, y), and the third term equals D(x, y), the minimum in (*) is exactly D(x, y).
Conclusion: Combining Steps 1 and 2: $$D(xa, ya) = D(x, y) \qquad \square$$
---
Part 3 [20 points]
One-line computation:
L(X, Y) = (|X| + |Y| - EDITDISTANCE(X, Y, 1, 1, ∞)) / 2Explanation: We call EDITDISTANCE(X, Y, 1, 1, ∞) with insertion cost cᵢ = 1, deletion cost c_d = 1, and substitution cost c_s = ∞.
Proof of correctness:
We prove that D(X, Y, 1, 1, ∞) = |X| + |Y| − 2·L(X, Y), which immediately gives the formula.
Let m = |X|, n = |Y|, L = L(X, Y).
Setting c_s = ∞ means substitutions are never used. Any edit sequence that uses a substitution has infinite cost, so no optimal sequence uses substitutions (since there always exists a finite-cost sequence using only insertions and deletions).
Upper bound: D(X, Y, 1, 1, ∞) ≤ m + n − 2L.
Let (i₁, j₁), ..., (i_L, j_L) be the index pairs of a longest common subsequence (so X[i_k] = Y[j_k] for each k). Construct an edit sequence:
- Delete the m − L characters of X not in the LCS (cost m − L).
- Insert the n − L characters of Y not in the LCS in the correct positions (cost n − L).
The result is Y, with total cost (m − L) + (n − L) = m + n − 2L.
Lower bound: D(X, Y, 1, 1, ∞) ≥ m + n − 2L.
Consider any edit sequence W for (X, Y) using only insertions and deletions. We can analyze W as follows: some characters of X are deleted (say m − k are deleted) and some characters of Y are inserted (say n − k are inserted). The remaining k characters of X are "kept": they must appear in Y without insertion, in the same relative order. Since these kept characters must match characters of Y exactly (substitutions are forbidden), they form a common subsequence of X and Y of length k. By definition of L(X, Y), we have k ≤ L(X, Y). Therefore: $$\text{cost}(W) = (m - k) \cdot 1 + (n - k) \cdot 1 = m + n - 2k \geq m + n - 2L$$
Combining: D(X, Y, 1, 1, ∞) = m + n − 2L(X, Y), so: $$L(X, Y) = \frac{|X| + |Y| - D(X, Y, 1, 1, \infty)}{2} = \frac{|X| + |Y| - \texttt{EDITDISTANCE}(X, Y, 1, 1, \infty)}{2}$$
Note that the result is always a non-negative integer since D(X,Y,1,1,∞) and |X|+|Y| have the same parity (both differ from 2L by equal amounts). ∎
Problem 1: String Merging [15 points]
Part 1
Score: 5/5
Approach: Standard DP with Boolean table dp[i][j] indicating whether Z[1..i+j] merges X[1..i] and Y[1..j].
Assessment: The subproblem definition, recurrence, base cases, pseudocode, correctness argument, and time complexity analysis are all correct and match the reference solution. The recurrence correctly checks whether the last character of Z[1..i+j] came from X[i] or Y[j]. Time complexity O(mn) is correctly stated.
Part 2
Score: 10/10
Approach: DP table dp[i][j] storing the minimum Hamming distance, with the same transition structure as Part 1 but accumulating mismatch costs.
Assessment: The formulation is correct. The cost function (X[i] != Z[i+j] ? 1 : 0) is equivalent to the reference solution's ν(a,b) helper. The recurrence correctly takes the minimum over the two choices (take from X or take from Y), each adding a cost of 0 or 1 depending on whether the chosen character matches Z at that position. The final check dp[m][n] ≤ K is correct. Pseudocode is clear and handles boundary cases properly. Time complexity O(mn) is correct.
Rubric Breakdown:
- Subproblem definition: 2/2
- Recurrence + base cases: 5/5
- Correctness explanation: 4/4
- Time complexity: 4/4
---
Problem 2: Inventory Planning [20 Points]
Score: 20/20
Approach: Forward DP with dp[i][j] = minimum cost for months 1 through i with j machines in inventory at end of month i, iterating over previous inventory level prev_j.
Assessment: The student uses a forward formulation (months 1..i) rather than the reference's backward formulation (months i..n), but this is entirely equivalent and correct.
Strengths:
- The transition correctly computes production as
p_i = j - prev_j + d_iand enforces non-negativity via the constraintprev_j ≤ j + d_i - Overtime cost
c · max(0, p_i - m)is correct - Holding cost
h(j)is correctly applied, with the note thath(0) = 0 - Base case
dp[0][0] = 0(start with no inventory) is correct - Final answer
min over j of dp[n][j]correctly allows any ending inventory level - Time complexity O(n·D²) is correct and matches the reference
- Space optimization note (rolling array) is a nice addition
Rubric Breakdown:
- DP formulation (states, base cases): 5/5
- Recurrence relation: 7/7
- Correctness argument: 4/4
- Time complexity analysis: 4/4
---
Problem 3: Optimal Matching of Sequences [30 Points]
Part 1
Score: 3/3
Assessment: The proof is correct and well-structured. For (1,1): the student shows that if (1,j) ∈ M with j > 1 and (i,1) ∈ M with i > 1, we get a crossing between (1,j) and (i,1). For (m,n): symmetric argument showing that (i,n) and (m,j) with i < m and j < n produces a crossing. The case analysis is careful and complete.
Part 2
Score: 7/7
Assessment: The recursive definition is correct:
- f(1,n) = 1, f(m,1) = 1
- f(m,n) = f(m−1,n) + f(m,n−1) + f(m−1,n−1)
The explanation is exceptionally thorough — more detailed than the reference solution. The student: 1. Establishes the connection to monotone paths (Delannoy numbers) 2. Identifies the three mutually exclusive cases based on removing (m,n) 3. Correctly argues why extra pairs in both row m and column n simultaneously is impossible (it would create a crossing) 4. Proves each case maps bijectively to the corresponding sub-problem, verifying that left-coverage and right-coverage are maintained
Part 3
Score: 20/20
Assessment: The DP is correct and matches the reference solution:
OPT[i][j] = (x_i - y_j)² + min(OPT[i-1][j], OPT[i][j-1], OPT[i-1][j-1])- Base cases correctly handle the single-row and single-column matchings
- Correctness proof by induction on i+j is rigorous, correctly referencing Parts 1 and 2 for the structural argument
- Time complexity O(mn) and space complexity O(mn) are correct
Rubric Breakdown:
- Part 1 proof: 3/3
- Part 2 recursive definition: 3/3
- Part 2 explanation/justification: 4/4
- Part 3 DP formulation: 8/8
- Part 3 correctness proof: 8/8
- Part 3 complexity analysis: 4/4
---
Problem 4: Edit Distance & LCS [35 Points]
Part 1
Score: 3/3
Assessment: The pseudocode is correct. It properly handles:
- Base cases:
D[i][0] = i·c_d(delete all of X),D[0][j] = j·c_i(insert all of Y) - Main recurrence with the three operations (deletion, insertion, substitution/match)
- Returns
D[m][n]
Matches the reference solution exactly.
Part 2
Score: 12/12
Assessment: The student's proof takes a different but equally valid approach compared to the reference solution.
≤ direction: Shown in two ways — both via the recurrence (the third term in the min gives D(x,y) + 0 ≤ D(x,y)) and via constructing an edit sequence (append copy(a→a) to an optimal sequence for (x,y)). Both are correct.
≥ direction: The student uses the triangle inequality for edit distance, which is a valid and elegant approach:
- D(x, ya) + c_d ≥ D(x, y): from D(x,y) ≤ D(x,ya) + D(ya,y) and D(ya,y) ≤ c_d
- D(xa, y) + c_i ≥ D(x, y): from D(x,y) ≤ D(x,xa) + D(xa,y) and D(x,xa) ≤ c_i
The triangle inequality D(A,C) ≤ D(A,B) + D(B,C) holds for edit distance because edit sequences can be composed. This is correctly applied. Since both non-copy terms in the recurrence (*) are ≥ D(x,y), and the copy term equals D(x,y), the minimum is exactly D(x,y).
This is actually a cleaner proof than the reference solution's case analysis.
Part 3
Score: 20/20
Assessment: The student correctly states L(X,Y) = (|X| + |Y| - EDITDISTANCE(X, Y, 1, 1, ∞)) / 2 and provides a rigorous proof.
The proof approach differs from the reference (which uses induction on the recursive structure) but is equally valid — the student uses a direct counting argument:
- Upper bound (D ≤ m+n−2L): Construct an edit sequence from the LCS — delete m−L characters not in the LCS, insert n−L characters not in the LCS. Total cost = m+n−2L.
- Lower bound (D ≥ m+n−2L): Any valid edit sequence (no substitutions since c_s = ∞) keeps k characters of X that form a common subsequence with Y, so k ≤ L. The cost is (m−k) + (n−k) = m+n−2k ≥ m+n−2L.
This is a clean, direct proof. The parity observation at the end is a nice touch.
Rubric Breakdown:
- Part 1 pseudocode: 3/3
- Part 2 ≤ direction: 4/4
- Part 2 ≥ direction: 8/8
- Part 3 formula: 4/4
- Part 3 proof (upper bound): 8/8
- Part 3 proof (lower bound): 8/8
---
Overall Summary
| Problem | Score | Notes |
|---|---|---|
| 1.1 - String Merging (basic) | 5/5 | Correct DP, clean pseudocode |
| 1.2 - String Merging (with errors) | 10/10 | Correct extension to Hamming distance |
| 2 - Inventory Planning | 20/20 | Correct forward DP formulation, O(nD²) |
| 3.1 - Matching contains (1,1),(m,n) | 3/3 | Clean contradiction proof |
| 3.2 - Recursive definition of f | 7/7 | Thorough case analysis, Delannoy numbers |
| 3.3 - Optimal matching algorithm | 20/20 | Correct DP with inductive proof |
| 4.1 - Edit distance pseudocode | 3/3 | Standard correct implementation |
| 4.2 - D(xa,ya) = D(x,y) | 12/12 | Elegant triangle inequality proof |
| 4.3 - LCS via edit distance | 20/20 | Clean direct counting argument |
| Total | 100/100 |
General Comments:
This is an outstanding submission. Every problem is solved correctly with clear, rigorous justifications. Several proofs demonstrate insight beyond the reference solution — notably the triangle inequality approach for Problem 4 Part 2 and the thorough case analysis with bijection argument for Problem 3 Part 2. The student shows strong command of dynamic programming formulations, proof techniques (contradiction, induction, triangle inequality), and complexity analysis. The pseudocode is consistently clear and implementable. No errors or gaps were found in any part of the submission.
Total: 100/100
diff --git a/writeup.md b/writeup.md
index 5119c04..1545869 100644
--- a/writeup.md
+++ b/writeup.md
@@ -2,8 +2,347 @@
## Student Responses
-<!-- Write your solutions below. Add sections for each problem as needed. -->
-<!-- Use proper mathematical notation where applicable. -->
+---
-[Your solutions here]
+## Problem 1: String Merging
+### Part 1 [5 points]
+
+**Algorithm:**
+
+Define `dp[i][j]` = `true` if `Z[1..i+j]` can be formed by interleaving `X[1..i]` and `Y[1..j]`.
+
+**Base case:** `dp[0][0] = true`
+
+**Recurrence:**
+```
+dp[i][j] = (i > 0 AND dp[i-1][j] AND Z[i+j] == X[i])
+ OR (j > 0 AND dp[i][j-1] AND Z[i+j] == Y[j])
+```
+
+**Return:** `dp[m][n]`
+
+**Pseudocode:**
+```
+MERGE_CHECK(X, Y, Z):
+ m = |X|, n = |Y|
+ dp[0][0] = true
+ dp[i][j] = false for all other (i, j)
+ for i = 0 to m:
+ for j = 0 to n:
+ if (i, j) == (0, 0): continue
+ if i > 0 and dp[i-1][j] and Z[i+j] == X[i]:
+ dp[i][j] = true
+ if j > 0 and dp[i][j-1] and Z[i+j] == Y[j]:
+ dp[i][j] = true
+ return dp[m][n]
+```
+
+**Correctness:** `dp[i][j]` is `true` iff there is a way to interleave `X[1..i]` and `Y[1..j]` to form `Z[1..i+j]`. At each cell, we consider whether the last character came from `X` (moving from `dp[i-1][j]`) or from `Y` (moving from `dp[i][j-1]`).
+
+**Time complexity:** O(mn), since we fill an (m+1) × (n+1) table with O(1) work per cell.
+
+---
+
+### Part 2 [10 points]
+
+**Algorithm:**
+
+Define `dp[i][j]` = minimum Hamming distance between `Z[1..i+j]` and any string `Z'[1..i+j]` that is a valid interleaving of `X[1..i]` and `Y[1..j]`.
+
+**Base case:** `dp[0][0] = 0`
+
+**Recurrence:** When building Z' at position i+j, the character comes from either X[i] (moving from dp[i-1][j]) or Y[j] (moving from dp[i][j-1]):
+```
+dp[i][j] = min(
+ dp[i-1][j] + (X[i] != Z[i+j] ? 1 : 0), // take X[i] as next char
+ dp[i][j-1] + (Y[j] != Z[i+j] ? 1 : 0) // take Y[j] as next char
+)
+```
+with the convention that `dp[i][j] = ∞` if neither predecessor is valid (i.e., when i = j = 0 and both terms don't apply).
+
+**Pseudocode:**
+```
+MERGE_WITH_ERRORS(X, Y, Z, K):
+ m = |X|, n = |Y|
+ Initialize dp[i][j] = ∞ for all i, j
+ dp[0][0] = 0
+ for i = 0 to m:
+ for j = 0 to n:
+ if i == 0 and j == 0: continue
+ cost_x = ∞ if i == 0 else dp[i-1][j] + (X[i] != Z[i+j] ? 1 : 0)
+ cost_y = ∞ if j == 0 else dp[i][j-1] + (Y[j] != Z[i+j] ? 1 : 0)
+ dp[i][j] = min(cost_x, cost_y)
+ return dp[m][n] <= K
+```
+
+**Correctness:** `dp[i][j]` gives the minimum number of positions where Z' must differ from Z to produce Z[1..i+j] as an interleaving of X[1..i] and Y[1..j]. At each position i+j of Z', we choose the character from either X[i] or Y[j], incurring a cost of 1 if that character differs from Z[i+j]. We return `true` iff this minimum cost is ≤ K.
+
+**Time complexity:** O(mn), using the same (m+1) × (n+1) table with O(1) work per cell.
+
+---
+
+## Problem 2: Inventory Planning [20 Points]
+
+**DP Formulation:**
+
+Let `dp[i][j]` = minimum total cost for months 1 through i, with j machines left in inventory at the end of month i.
+
+**State space:** i ∈ {0, ..., n}, j ∈ {0, ..., D} where D = Σ dᵢ is total demand.
+
+**Base case:** `dp[0][0] = 0`, `dp[0][j] = ∞` for j > 0 (start with no inventory).
+
+**Transition:** For each month i with demand dᵢ and ending inventory j, the starting inventory was prev_j (end of month i−1), so production is:
+```
+p_i = j - prev_j + d_i
+```
+This requires p_i ≥ 0, i.e., prev_j ≤ j + dᵢ.
+
+The holding cost is h(j) for j ≥ 1, and 0 for j = 0. Define h(0) = 0.
+
+```
+dp[i][j] = min over 0 ≤ prev_j ≤ min(D, j + d_i):
+ dp[i-1][prev_j] + c · max(0, (j - prev_j + d_i) - m) + h(j)
+```
+
+**Algorithm:**
+```
+INVENTORY_PLAN(d[1..n], m, c, h[1..D]):
+ D = sum of d[i]
+ Initialize dp[0][0] = 0, dp[0][j] = ∞ for j > 0
+ for i = 1 to n:
+ for j = 0 to D:
+ dp[i][j] = ∞
+ for prev_j = 0 to min(D, j + d[i]):
+ p_i = j - prev_j + d[i] // production this month
+ overtime = c * max(0, p_i - m) // overtime labor cost
+ holding = h(j) // holding cost (h(0) = 0)
+ dp[i][j] = min(dp[i][j], dp[i-1][prev_j] + overtime + holding)
+ return min over j of dp[n][j]
+```
+
+**Correctness:** The DP correctly accounts for all costs. In each month i, given prev_j units of inventory carried over, producing p_i = j - prev_j + dᵢ ≥ 0 machines fulfills demand dᵢ and leaves j units in inventory. Overtime cost applies when p_i > m. The DP considers all reachable states and takes the minimum cost.
+
+**Time complexity:** O(n · D²), since for each of the n months we iterate over O(D) values of j and O(D) values of prev_j.
+
+**Space complexity:** O(D) using a rolling array (only need previous row).
+
+Since n and D are the problem parameters and this running time is polynomial in both, the algorithm satisfies the requirement.
+
+---
+
+## Problem 3: Optimal Matching of Sequences [30 Points]
+
+### Part 1 [3 points]
+
+**Claim:** Every matching M for (X, Y) contains the pairs (1, 1) and (m, n).
+
+**Proof that (1, 1) ∈ M:**
+
+By left-coverage, ∃j ≥ 1 such that (1, j) ∈ M.
+By right-coverage, ∃i ≥ 1 such that (i, 1) ∈ M.
+
+If i = 1, then (1, 1) ∈ M and we are done.
+
+Suppose i > 1 (so i' := i > 1). We have (1, j) ∈ M and (i', 1) ∈ M. If j > 1, then taking i = 1 < i' and j_small = 1 < j_large = j, we have (1, j_large) = (1, j) ∈ M and (i', j_small) = (i', 1) ∈ M, which violates the no-crossing condition. Therefore j = 1, giving (1, 1) ∈ M. ∎
+
+**Proof that (m, n) ∈ M:**
+
+By left-coverage, ∃j ≤ n such that (m, j) ∈ M.
+By right-coverage, ∃i ≤ m such that (i, n) ∈ M.
+
+If j = n, then (m, n) ∈ M and we are done.
+
+Suppose j < n. If i = m, then (m, n) ∈ M and we are done. Suppose i < m. Then we have (i, n) ∈ M and (m, j) ∈ M with i < m and j < n. Taking i_small = i < i_large = m and j_small = j < j_large = n, we have (i_small, j_large) = (i, n) ∈ M and (i_large, j_small) = (m, j) ∈ M, which violates no-crossing. Therefore i = m and (m, n) ∈ M. ∎
+
+---
+
+### Part 2 [7 points]
+
+**Recursive definition:**
+
+```
+f(1, n) = 1 for all n ≥ 1
+f(m, 1) = 1 for all m ≥ 1
+f(m, n) = f(m-1, n) + f(m, n-1) + f(m-1, n-1) for m, n ≥ 2
+```
+
+**Explanation:**
+
+First, we observe the no-crossing condition implies that the pairs of any matching, sorted by first index, must have non-decreasing second indices. (If (i₁, j₁) and (i₂, j₂) are in M with i₁ < i₂ but j₁ > j₂, this is precisely a crossing.) Therefore every matching M corresponds to a **monotone path** of cells from (1,1) to (m,n) in the grid [m] × [n], where consecutive cells in the path differ by (+1,0), (0,+1), or (+1,+1). The pairs in M are exactly the cells visited by the path.
+
+The left-coverage, right-coverage, and no-crossing properties are equivalent to this path visiting every row 1…m (path starts at row 1 and ends at row m, visiting each row at least once) and every column 1…n (similarly).
+
+**Base cases:** For m = 1 (only one row), the path must visit all n columns, so it moves right at each step: the unique path is (1,1), (1,2), ..., (1,n), giving f(1,n) = 1. Symmetrically, f(m,1) = 1.
+
+**Recursive case:** Consider a matching M for [m] × [n]. By Part 1, (m,n) ∈ M. We show: it is impossible to have both (m, j) ∈ M with j < n and (i, n) ∈ M with i < m simultaneously, because that would give a crossing (with i_small = i, i_large = m, j_small = j, j_large = n). Therefore exactly one of three cases holds:
+
+- **Case A** — No extra pairs in row m or column n: removing (m,n) yields a valid matching for [m−1] × [n−1].
+ - Left-coverage: row m is covered by (m,n) in the original; rows 1..m−1 had no pairs in column n (by case assumption), so all their pairs remain and they are still covered.
+ - Right-coverage: column n is covered by (m,n); columns 1..n−1 had no pairs in row m (by case assumption), so they remain covered.
+ - These matchings are in bijection with matchings for [m−1] × [n−1]: count = f(m−1, n−1).
+
+- **Case B** — Extra pairs in row m (some (m, j) ∈ M with j < n), but no extra pairs in column n: removing (m,n) gives a valid matching for [m] × [n−1].
+ - Right-coverage for [n−1]: each column j < n must be covered; column n has no pairs other than (m,n) (case assumption), so column n−1 must be covered in M by some i < m, which remains. ✓
+ - Left-coverage for row m: (m, j) ∈ M for some j < n, so row m is still covered after removing (m,n). ✓
+ - These matchings are in bijection with matchings for [m] × [n−1]: count = f(m, n−1).
+
+- **Case C** — Extra pairs in column n (some (i, n) ∈ M with i < m), but no extra pairs in row m: by symmetry, removing (m,n) gives a valid matching for [m−1] × [n]: count = f(m−1, n).
+
+Since the three cases are mutually exclusive and exhaustive:
+$$f(m,n) = f(m-1,n-1) + f(m,n-1) + f(m-1,n)$$
+
+(These are the **Delannoy numbers**, counting monotone paths with steps right, down, or diagonal.)
+
+---
+
+### Part 3 [20 points]
+
+**DP Algorithm:**
+
+Let `OPT[i][j]` = minimum cost of a matching for (X[1..i], Y[1..j]).
+
+**Base cases:**
+```
+OPT[1][1] = (x₁ - y₁)²
+OPT[1][j] = OPT[1][j-1] + (x₁ - yⱼ)² for j ≥ 2
+OPT[i][1] = OPT[i-1][1] + (xᵢ - y₁)² for i ≥ 2
+```
+
+**Recurrence:**
+```
+OPT[i][j] = (xᵢ - yⱼ)² + min(OPT[i-1][j], OPT[i][j-1], OPT[i-1][j-1])
+```
+
+**Answer:** OPT[m][n]. To recover the actual matching, backtrack through the table.
+
+**Pseudocode:**
+```
+OPTIMAL_MATCH(X[1..m], Y[1..n]):
+ OPT[1][1] = (x₁ - y₁)²
+ for j = 2 to n: OPT[1][j] = OPT[1][j-1] + (x₁ - yⱼ)²
+ for i = 2 to m: OPT[i][1] = OPT[i-1][1] + (xᵢ - y₁)²
+ for i = 2 to m:
+ for j = 2 to n:
+ OPT[i][j] = (xᵢ - yⱼ)² + min(OPT[i-1][j], OPT[i-1][j-1], OPT[i][j-1])
+ return OPT[m][n]
+```
+
+**Correctness:**
+
+By Part 1 (generalized to any sub-rectangle), every matching for (X[1..i], Y[1..j]) must contain the pair (i, j). Combined with the structure from Part 2, any matching for [i] × [j] is a monotone path ending at (i,j), whose second-to-last step was one of (i−1,j), (i,j−1), or (i−1,j−1).
+
+We prove correctness by induction on i+j:
+
+*Base case:* OPT[1][1] = (x₁−y₁)² is correct: the only valid matching for [1]×[1] is {(1,1)}.
+
+*Inductive step:* Assume OPT[i'][j'] is correct for all i'+j' < i+j. Any matching M for [i]×[j] contains (i,j) and falls into one of three cases from Part 2:
+- Restriction to [i−1]×[j] is a valid matching with cost OPT[i−1][j] (by IH)
+- Restriction to [i]×[j−1] is a valid matching with cost OPT[i][j−1] (by IH)
+- Restriction to [i−1]×[j−1] is a valid matching with cost OPT[i−1][j−1] (by IH)
+
+The cost of M equals (xᵢ−yⱼ)² plus the cost of the restriction. Therefore:
+$$\text{OPT}[i][j] = (x_i - y_j)^2 + \min(\text{OPT}[i-1][j],\ \text{OPT}[i][j-1],\ \text{OPT}[i-1][j-1])$$
+
+is correct by optimal substructure. ✓
+
+**Time complexity:** O(mn) — filling an m×n table with O(1) work per cell.
+
+**Space complexity:** O(mn) to store the table (reducible to O(min(m,n)) using two rows if we only need the cost, not the full matching).
+
+---
+
+## Problem 4: Edit Distance & LCS [35 Points]
+
+### Part 1 [3 points]
+
+```
+EDITDISTANCE(X, Y, cᵢ, c_d, c_s):
+ m = |X|, n = |Y|
+ Create table D[0..m][0..n]
+ D[0][0] = 0
+ for i = 1 to m:
+ D[i][0] = i * c_d // delete X[1..i] entirely
+ for j = 1 to n:
+ D[0][j] = j * cᵢ // insert Y[1..j] from empty string
+ for i = 1 to m:
+ for j = 1 to n:
+ sub_cost = 0 if X[i] == Y[j] else c_s
+ D[i][j] = min(
+ D[i-1][j] + c_d, // delete X[i]
+ D[i][j-1] + cᵢ, // insert Y[j]
+ D[i-1][j-1] + sub_cost // match or substitute
+ )
+ return D[m][n]
+```
+
+---
+
+### Part 2 [12 points]
+
+**Claim:** D(xa, ya) = D(x, y) for all strings x, y and every letter a.
+
+**Proof:**
+
+We use the edit distance recurrence, which follows directly from the definition D(X,Y) = min{cost(W) | edit sequence W for (X,Y)}: for any strings X[1..m] and Y[1..n], an optimal edit sequence for transforming X to Y must either delete X[m], insert Y[n], or match/substitute X[m] with Y[n] as its last operation, giving:
+$$D(X[1..m], Y[1..n]) = \min\bigl(D(X[1..m{-}1], Y[1..n]) + c_d,\ D(X[1..m], Y[1..n{-}1]) + c_i,\ D(X[1..m{-}1], Y[1..n{-}1]) + \text{cost}(X[m], Y[n])\bigr)$$
+
+Applying this to (xa, ya) where both strings end in the same letter a (so the substitution cost is 0):
+$$D(xa, ya) = \min\bigl(D(x, ya) + c_d,\ D(xa, y) + c_i,\ D(x, y) + 0\bigr) \tag{*}$$
+
+We now establish two bounds.
+
+**Step 1:** D(xa, ya) ≤ D(x, y).
+
+This follows immediately from (*): D(xa, ya) = min{..., D(x,y),...} ≤ D(x, y).
+
+Alternatively: take any optimal edit sequence W for (x, y). Appending the operation "match a with a" (cost 0) to W yields a valid edit sequence for (xa, ya) of cost D(x, y). Hence D(xa, ya) ≤ D(x, y).
+
+**Step 2:** D(xa, ya) ≥ D(x, y), i.e., the minimum in (*) equals D(x, y).
+
+We show each of the first two terms in (*) is ≥ D(x, y):
+
+- **D(x, ya) + c_d ≥ D(x, y):** By the triangle inequality for edit distance, D(x, y) ≤ D(x, ya) + D(ya, y). Since D(ya, y) ≤ c_d (delete the last character a from ya), we get D(x, y) ≤ D(x, ya) + c_d, i.e., D(x, ya) + c_d ≥ D(x, y). ✓
+
+- **D(xa, y) + cᵢ ≥ D(x, y):** By the triangle inequality, D(x, y) ≤ D(x, xa) + D(xa, y). Since D(x, xa) ≤ cᵢ (insert a at the end of x), we get D(x, y) ≤ cᵢ + D(xa, y), i.e., D(xa, y) + cᵢ ≥ D(x, y). ✓
+
+Since both D(x, ya) + c_d ≥ D(x, y) and D(xa, y) + cᵢ ≥ D(x, y), and the third term equals D(x, y), the minimum in (*) is exactly D(x, y).
+
+**Conclusion:** Combining Steps 1 and 2:
+$$D(xa, ya) = D(x, y) \qquad \square$$
+
+---
+
+### Part 3 [20 points]
+
+**One-line computation:**
+```
+L(X, Y) = (|X| + |Y| - EDITDISTANCE(X, Y, 1, 1, ∞)) / 2
+```
+
+**Explanation:** We call `EDITDISTANCE(X, Y, 1, 1, ∞)` with insertion cost cᵢ = 1, deletion cost c_d = 1, and substitution cost c_s = ∞.
+
+**Proof of correctness:**
+
+We prove that D(X, Y, 1, 1, ∞) = |X| + |Y| − 2·L(X, Y), which immediately gives the formula.
+
+Let m = |X|, n = |Y|, L = L(X, Y).
+
+**Setting c_s = ∞ means substitutions are never used.** Any edit sequence that uses a substitution has infinite cost, so no optimal sequence uses substitutions (since there always exists a finite-cost sequence using only insertions and deletions).
+
+**Upper bound:** D(X, Y, 1, 1, ∞) ≤ m + n − 2L.
+
+Let (i₁, j₁), ..., (i_L, j_L) be the index pairs of a longest common subsequence (so X[i_k] = Y[j_k] for each k). Construct an edit sequence:
+- Delete the m − L characters of X not in the LCS (cost m − L).
+- Insert the n − L characters of Y not in the LCS in the correct positions (cost n − L).
+The result is Y, with total cost (m − L) + (n − L) = m + n − 2L.
+
+**Lower bound:** D(X, Y, 1, 1, ∞) ≥ m + n − 2L.
+
+Consider any edit sequence W for (X, Y) using only insertions and deletions. We can analyze W as follows: some characters of X are deleted (say m − k are deleted) and some characters of Y are inserted (say n − k are inserted). The remaining k characters of X are "kept": they must appear in Y without insertion, in the same relative order. Since these kept characters must match characters of Y exactly (substitutions are forbidden), they form a **common subsequence** of X and Y of length k. By definition of L(X, Y), we have k ≤ L(X, Y). Therefore:
+$$\text{cost}(W) = (m - k) \cdot 1 + (n - k) \cdot 1 = m + n - 2k \geq m + n - 2L$$
+
+**Combining:** D(X, Y, 1, 1, ∞) = m + n − 2L(X, Y), so:
+$$L(X, Y) = \frac{|X| + |Y| - D(X, Y, 1, 1, \infty)}{2} = \frac{|X| + |Y| - \texttt{EDITDISTANCE}(X, Y, 1, 1, \infty)}{2}$$
+
+Note that the result is always a non-negative integer since D(X,Y,1,1,∞) and |X|+|Y| have the same parity (both differ from 2L by equal amounts). ∎
Sub-Model Usage
| Model | Input | Output | Cache Read | Cost |
|---|---|---|---|---|
| claude-sonnet-4-6 | 6 | 36.3K | 204.6K | $1.33 |
BSCS Bench