COMP 382: Reasoning about Algorithms
Homework 1
Total Points: 50
---
Problem 1: Radix Trees [20 points]
This problem is based on Problem 12.2 from CLRS. A *radix tree* is a data structure for the efficient representation of sets of strings. The following radix tree data structure stores the bit strings 1011, 10, 011, 100, and 0.
```
[root]
 ├─ 0              key "0"    (in S)
 │   └─ 1          key "01"
 │       └─ 1      key "011"  (in S)
 └─ 1              key "1"
     └─ 0          key "10"   (in S)
         ├─ 0      key "100"  (in S)
         └─ 1      key "101"
             └─ 1  key "1011" (in S)
```
Figure 12.5: A radix tree storing the bit strings 1011, 10, 011, 100, and 0. Each left edge is labeled 0 and each right edge 1; a node's key is read off the simple path from the root to that node, so there is no need to store the keys in the nodes (they appear here for illustrative purposes only). Keys of nodes not marked "(in S)" are not in the tree; such nodes are present only to establish a path to other nodes.
Part 1 [3 points]
Define formally the radix tree data structure.
Part 2 [10 points]
Let T be a radix tree and w be a bit string. Write the pseudo-code of RADIX-TREE-SEARCH(T, w) and RADIX-TREE-INSERT(T, w) so that each of the functions takes Θ(|w|) time. Justify that your algorithms indeed take Θ(|w|) time.
Part 3 [7 points]
Given two strings a = a₀a₁...aₚ and b = b₀b₁...b_q, where each aᵢ and bⱼ is in some ordered alphabet, we say that string a is *lexicographically less than* string b if either:
(i) there exists an integer j, where 0 ≤ j ≤ min{p, q}, such that aᵢ = bᵢ for all i = 0, 1, ..., j − 1 and aⱼ < bⱼ, or
(ii) p < q and aᵢ = bᵢ for all i = 0, 1, ..., p.
Let S be a set of distinct bit strings whose lengths sum to n (i.e., the sum of the lengths of all strings in S equals n). Show how to use a radix tree to sort S lexicographically in Θ(n) time. Describe the algorithm idea in English, but please make sure your description is very clear. Justify that your algorithm indeed takes Θ(n) time.
---
Problem 2: Multisets [20 points]
Let K be a set of elements. A *(finite) multiset* or *bag* over K is a collection of elements from K, each of which can occur any finite number of times. The *count* of an element a ∈ K in a multiset S is the number of occurrences of a in S. The count of an element that does not appear in the multiset is equal to 0. For example, the multiset S = {a, b, b, b, c, c, d} contains the elements a, b, c, d with counts 1, 3, 2, 1 respectively. The count of e in S is 0.
Part 1 [10 points]
Design a data structure that can represent finite multisets and can support the following operations with the specified time complexities.
| Operation | Description | Time (worst-case) |
|---|---|---|
| CREATE(S) | Set S to be the empty multiset | O(1) |
| INSERT(S, k) | Insert the element k in the multiset S | O(log n) |
| GETCOUNT(S, k) | Return the count of k in S | O(log n) |
| DELETE(S, k) | Delete one occurrence of k from S | O(log n) |
| GETCOUNTS≥(S, k) | Return the total count of elements ≥ k in S | O(log n) |
Assume that the elements are linearly ordered. In the table above, n is the number of unique elements in the multiset.
Part 2 [10 points]
Adapt the multiset data structure to support all operations from Part 1, as well as the following ones:
| Operation | Description | Time (worst-case) |
|---|---|---|
| WITHCOUNT(S, c) | Return the number of elements with count c in S | O(log n) |
| FREQUENT(S, c) | Return the number of elements with count ≥ c in S | O(log n) |
In the table above, n is the number of unique elements in the multiset.
Notes for both parts: Try to make your data structure as efficient as possible in terms of information stored. Describe the design of the data structure, and briefly describe in English how you would implement each of the operations so that it has the prescribed worst-case time complexity.
You can assume a known fact for AVL trees (in case you use AVL trees for your implementation). There is an algorithm for performing node deletion in an AVL tree with running time O(log n), where n is the size of the tree.
---
Problem 3: AVL Trees: Height & Size [10 points]
In class, we discuss the main ideas that are needed to prove that an AVL tree with n nodes has height O(log n). Give the best upper bound you can find on the height of an AVL tree with n nodes. The upper bound that you give should be of the form c · log₂(n + k) + b, where c, k, b are constants. Provide a careful and very thorough justification of every step.
Note: Recall that for AVL trees we use the following definition: The *height* of a tree is the number of nodes in the longest path from the root to a leaf.
# COMP382 Homework 1
## Student Responses
---
Problem 1: Radix Trees [20 points]
Part 1: Formal Definition [3 points]
A radix tree over the binary alphabet {0, 1} is a rooted binary tree T with the following structure:
- T.root is the root node, representing the empty string ε.
- Each node x has the following fields:
- x.left: pointer to the left child of x (representing extending the path with bit '0'), or NIL if no such child exists.
- x.right: pointer to the right child of x (representing extending the path with bit '1'), or NIL if no such child exists.
- x.mark: a boolean flag. x.mark = TRUE if and only if the bit string spelled out by the root-to-x path belongs to the stored set.
The key (bit string) of a node x is defined implicitly as the sequence of bits read along the unique simple path from the root to x: each left-child edge contributes '0' and each right-child edge contributes '1'. The root's key is the empty string ε.
A radix tree storing a set S of bit strings contains exactly those internal (unmarked) nodes that are needed to establish paths to marked nodes, plus all marked nodes corresponding to strings in S.
---
Part 2: RADIX-TREE-SEARCH and RADIX-TREE-INSERT [10 points]
Pseudocode:

```
RADIX-TREE-SEARCH(T, w)
    // w = w[0]w[1]...w[m-1] is a bit string of length m = |w|
    x = T.root
    for i = 0 to |w| - 1:
        if w[i] == 0:
            if x.left == NIL:
                return NIL        // prefix not in tree → w not stored
            x = x.left
        else:                     // w[i] == 1
            if x.right == NIL:
                return NIL
            x = x.right
    if x.mark == TRUE:
        return x                  // w is stored in the tree
    else:
        return NIL                // w is a prefix path but not marked

RADIX-TREE-INSERT(T, w)
    // Insert bit string w into the radix tree T
    x = T.root
    for i = 0 to |w| - 1:
        if w[i] == 0:
            if x.left == NIL:
                x.left = ALLOCATE-NODE()
                x.left.left = NIL
                x.left.right = NIL
                x.left.mark = FALSE
            x = x.left
        else:                     // w[i] == 1
            if x.right == NIL:
                x.right = ALLOCATE-NODE()
                x.right.left = NIL
                x.right.right = NIL
                x.right.mark = FALSE
            x = x.right
    x.mark = TRUE                 // mark the terminal node for w
```

Time complexity justification:
In both algorithms, the only loop runs exactly |w| iterations (once per bit of w). Each iteration performs a constant number of operations: a comparison, a pointer follow (or pointer assignment with node allocation), and an index increment. No other loops or recursive calls are made. Therefore:
- RADIX-TREE-SEARCH: performs exactly |w| iterations of the loop, each taking Θ(1) time, so the total time is Θ(|w|).
- RADIX-TREE-INSERT: performs exactly |w| iterations of the loop, each taking Θ(1) time (node allocation is O(1)), so the total time is Θ(|w|).
Both algorithms run in Θ(|w|) time. ∎
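To sanity-check the pseudocode, here is a minimal executable Python sketch of the same two procedures; the class and function names are my own for illustration, not part of the assignment.

```python
class Node:
    """Radix tree node: left = bit '0' child, right = bit '1' child."""
    __slots__ = ("left", "right", "mark")

    def __init__(self):
        self.left = None
        self.right = None
        self.mark = False   # True iff the root-to-node string is stored


def radix_insert(root: Node, w: str) -> None:
    """Insert bit string w; one O(1) step per bit, hence Theta(|w|)."""
    x = root
    for bit in w:
        if bit == "0":
            if x.left is None:
                x.left = Node()
            x = x.left
        else:
            if x.right is None:
                x.right = Node()
            x = x.right
    x.mark = True


def radix_search(root: Node, w: str):
    """Return the node storing w, or None if w is not in the set."""
    x = root
    for bit in w:
        x = x.left if bit == "0" else x.right
        if x is None:
            return None          # path runs out of the tree
    return x if x.mark else None # reached node must also be marked


# Build the tree from Figure 12.5.
root = Node()
for s in ["1011", "10", "011", "100", "0"]:
    radix_insert(root, s)
```

Note that searching for "1" or "101" returns None even though those nodes exist: they are unmarked path nodes.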
---
Part 3: Lexicographic Sorting in Θ(n) Time [7 points]
Algorithm:
1. Build the radix tree. Insert every string in S into a radix tree T using RADIX-TREE-INSERT.
2. Traverse the tree in preorder (DFS, left before right). Perform a depth-first traversal of T, always visiting the left subtree (bit '0') before the right subtree (bit '1'). Maintain a current path string (the sequence of bits on the root-to-current-node path). Whenever we visit a node x with x.mark = TRUE, output the current path string as the next element in the sorted order.
Why the output is in lexicographic order:
Consider any two distinct strings a, b ∈ S. We claim a is output before b if and only if a <_lex b.
- Case (ii): a is a proper prefix of b. Since a is shorter, the node for a lies strictly above the node for b in the tree. In a preorder traversal, a parent is visited before its descendants, so a is output before b. This matches the lexicographic order (condition (ii): p < q and all a_i = b_i).
- Case (i): a and b differ at position j (first differing bit). The path to a and the path to b agree for the first j steps and diverge at step j. At the divergence point, a goes left (a_j = 0) and b goes right (b_j = 1), or vice versa. Since we always visit the left subtree before the right subtree, strings with a '0' at position j are output before strings with a '1' at position j. Hence if a_j = 0 < 1 = b_j, then a is output before b, matching a <_lex b.
Time complexity:
- *Step 1 (building the tree):* Each string w ∈ S takes Θ(|w|) to insert. The total insertion time is Σ_{w ∈ S} Θ(|w|) = Θ(n), since the sum of all string lengths is n.
- *Step 2 (DFS traversal):* The radix tree has at most O(n) nodes. To see this: each node (except the root) corresponds to one edge on the root-to-node path, and each such edge is "used" by at least one string in S. More precisely, the tree has at most n + 1 nodes total (at most one new node is created per bit inserted, and n total bits are inserted). The DFS visits each node exactly once, doing O(1) work per node (output the path string if marked, follow child pointers). Thus the traversal takes O(n) time.
- Total time: Θ(n) + O(n) = Θ(n). ∎
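The two steps above can be sketched in Python (a dict-based trie stands in for the radix tree; names are illustrative). The current path is kept as a mutable list, so the join cost is paid only when a marked node is emitted and the total work stays proportional to the sum of the string lengths:

```python
def radix_sort_strings(strings):
    """Sort distinct bit strings lexicographically via a radix tree."""
    # Step 1: build the trie, one node per stored prefix.
    root = {"0": None, "1": None, "mark": False}
    for w in strings:
        x = root
        for bit in w:
            if x[bit] is None:
                x[bit] = {"0": None, "1": None, "mark": False}
            x = x[bit]
        x["mark"] = True

    # Step 2: preorder DFS, '0' subtree before '1' subtree.
    out, path = [], []
    def dfs(x):
        if x is None:
            return
        if x["mark"]:
            out.append("".join(path))   # emit current root-to-node string
        for bit in ("0", "1"):          # left ('0') before right ('1')
            path.append(bit)
            dfs(x[bit])
            path.pop()
    dfs(root)
    return out
```

On the assignment's example set this reproduces the lexicographic order 0, 011, 10, 100, 1011. (Recursion depth equals the longest string, which is fine for a sketch.)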
---
Problem 2: Multisets [20 points]
Part 1: Multiset Data Structure [10 points]
Data structure design:
Use an augmented AVL tree T₁ where each node x stores:
- x.key: the unique element k ∈ K
- x.count: the multiplicity of k in the multiset (a positive integer)
- x.sub_total: the sum of counts of all elements in x's subtree, i.e., x.count + (x.left.sub_total if x.left else 0) + (x.right.sub_total if x.right else 0)
Nodes are ordered by x.key in the standard BST order (using the linear order on K).
Operations:
CREATE(S): Set T₁ = NIL (empty tree). Time: O(1). ✓
INSERT(S, k): Search for k in T₁ (O(log n) BST search).
- If found: increment x.count by 1. Walk back up to the root updating x.sub_total for each ancestor. The AVL tree has height O(log n), so this is O(log n).
- If not found: insert a new node with key k, count 1, sub_total 1 using the standard AVL insert. Rebalancing (rotations) updates sub_total for O(1) rotated nodes. Walking up the insertion path updates sub_total for O(log n) ancestors.
Total time: O(log n). ✓
GETCOUNT(S, k): Search for k in T₁. If found, return x.count; otherwise return 0. Time: O(log n) BST search. ✓
DELETE(S, k): Search for k in T₁. If not found, do nothing (or signal an error). If found with x.count > 1: decrement x.count by 1 and update sub_total for ancestors. O(log n). If found with x.count = 1: delete the node using the standard AVL deletion (O(log n) by the stated fact). Update sub_total during the deletion path. O(log n). ✓
GETCOUNTS≥(S, k): Traverse from the root, accumulating the total count of all elements with key ≥ k:
```
GETCOUNTS≥(T₁, k):
    total = 0
    x = T₁.root
    while x ≠ NIL:
        if k == x.key:
            total += x.count + (x.right.sub_total if x.right else 0)
            return total
        elif k < x.key:
            // x and everything in x's right subtree have key ≥ k
            total += x.count + (x.right.sub_total if x.right else 0)
            x = x.left
        else:   // k > x.key
            x = x.right
    return total
```

This traverses exactly one root-to-leaf path, performing O(1) work per step. Since the AVL tree has height O(log n), the total time is O(log n). ✓
Maintaining sub_total during rebalancing: For each rotation (single or double), only the rotated nodes change their sub_total, and each can be recomputed from its children in O(1). This does not increase the asymptotic complexity of any operation.
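As an illustration of the aggregation logic only (not of the balancing), here is a Python sketch on a plain, unbalanced BST carrying the same sub_total field; on an AVL tree the traversal is identical, and the height bound is what delivers O(log n). All names are my own.

```python
class MNode:
    """Multiset node: key, multiplicity, and subtree count total."""
    def __init__(self, key):
        self.key = key
        self.count = 0
        self.sub_total = 0
        self.left = None
        self.right = None


def insert(root, k):
    """Insert one occurrence of k; recompute sub_total on the way up."""
    if root is None:
        root = MNode(k)
    if k < root.key:
        root.left = insert(root.left, k)
    elif k > root.key:
        root.right = insert(root.right, k)
    else:
        root.count += 1
    root.sub_total = (root.count
                      + (root.left.sub_total if root.left else 0)
                      + (root.right.sub_total if root.right else 0))
    return root


def get_counts_geq(root, k):
    """Total count of elements with key >= k: one root-to-leaf walk."""
    total = 0
    x = root
    while x is not None:
        if k <= x.key:
            # x and its whole right subtree have key >= k
            total += x.count + (x.right.sub_total if x.right else 0)
            if k == x.key:
                return total
            x = x.left
        else:
            x = x.right
    return total
```

For the example multiset {a, b, b, b, c, c, d}, GETCOUNTS≥(S, b) should return 3 + 2 + 1 = 6.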
---
Part 2: Extended Multiset with WITHCOUNT and FREQUENT [10 points]
Data structure design:
Maintain the augmented AVL tree T₁ from Part 1, plus a second augmented AVL tree T₂ indexed by count values, where each node y stores:
- y.count_val: an integer c (the count value this node tracks)
- y.freq: the number of distinct elements in S that have exactly c occurrences
- y.sub_freq: the sum of freq in y's subtree (augmented, analogous to sub_total in T₁)
Nodes in T₂ are ordered by y.count_val.
Operations (new):
WITHCOUNT(S, c): Search T₂ for node with count_val = c. If found, return y.freq; otherwise return 0. Time: O(log n). (T₂ has at most n nodes since there are at most n distinct count values among n unique elements.) ✓
FREQUENT(S, c): Return the number of distinct elements with count ≥ c, using the same traversal strategy as GETCOUNTS≥ but on T₂:
```
FREQUENT(T₂, c):
    total = 0
    y = T₂.root
    while y ≠ NIL:
        if c == y.count_val:
            total += y.freq + (y.right.sub_freq if y.right else 0)
            return total
        elif c < y.count_val:
            total += y.freq + (y.right.sub_freq if y.right else 0)
            y = y.left
        else:
            y = y.right
    return total
```

Time: O(log n). ✓
Maintaining T₂ when T₁ is modified:
Whenever an element k's count in T₁ changes from c_old to c_new (by INSERT or DELETE):
1. In T₂: find the node with count_val = c_old and decrement its freq by 1. If freq drops to 0, delete that node from T₂. Update sub_freq for ancestors. Time: O(log n).
2. In T₂: find the node with count_val = c_new and increment its freq by 1. If no such node exists, insert a new node with freq = 1. Update sub_freq for ancestors. Time: O(log n).
For INSERT(S, k): c_old = current count of k (0 if k is new, but a new element has no T₂ entry, so we only add to c_new = 1's node). For DELETE(S, k): c_old = current count, c_new = c_old − 1 (or if c_old = 1, we remove k's T₁ node and remove from T₂'s count_val = 1 node).
All five operations from Part 1 plus the two new operations remain O(log n) since each touches O(log n) nodes in each of T₁ and T₂. ✓
Summary of operations:
| Operation | T₁ work | T₂ work | Total |
|---|---|---|---|
| CREATE(S) | O(1) | O(1) | O(1) |
| INSERT(S, k) | O(log n) | O(log n) | O(log n) |
| GETCOUNT(S, k) | O(log n) | — | O(log n) |
| DELETE(S, k) | O(log n) | O(log n) | O(log n) |
| GETCOUNTS≥(S, k) | O(log n) | — | O(log n) |
| WITHCOUNT(S, c) | — | O(log n) | O(log n) |
| FREQUENT(S, c) | — | O(log n) | O(log n) |
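The c_old → c_new bookkeeping can be checked with a small dict-based reference model. Hash maps stand in for T₁ and T₂ here, so this validates the update protocol rather than the O(log n) bounds; all names are illustrative.

```python
class MultisetModel:
    """Reference model of the two-tree design using plain dicts."""

    def __init__(self):
        self.count = {}   # T1 role: element -> its count
        self.freq = {}    # T2 role: count value c -> #elements with count c

    def _move(self, c_old, c_new):
        """Move one element's T2 contribution from c_old to c_new."""
        if c_old > 0:
            self.freq[c_old] -= 1
            if self.freq[c_old] == 0:
                del self.freq[c_old]          # node deleted from T2
        if c_new > 0:
            self.freq[c_new] = self.freq.get(c_new, 0) + 1

    def insert(self, k):
        c = self.count.get(k, 0)
        self.count[k] = c + 1
        self._move(c, c + 1)

    def delete(self, k):
        c = self.count.get(k, 0)
        if c == 0:
            return                            # k not present
        if c == 1:
            del self.count[k]                 # node deleted from T1
        else:
            self.count[k] = c - 1
        self._move(c, c - 1)

    def with_count(self, c):
        return self.freq.get(c, 0)

    def frequent(self, c):
        # On T2 this is one O(log n) walk; the model simply sums.
        return sum(f for cv, f in self.freq.items() if cv >= c)
```

Running the example multiset {a, b, b, b, c, c, d} through this model reproduces the counts from the problem statement (WITHCOUNT(S, 1) = 2, WITHCOUNT(S, 3) = 1, FREQUENT(S, 2) = 2).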
---
Problem 3: AVL Trees: Height & Size [10 points]
Claim: An AVL tree with n nodes has height at most
$$h \leq \frac{1}{\log_2 \varphi} \cdot \log_2(n + 1)$$
where φ = (1 + √5)/2 ≈ 1.618 is the golden ratio. This is of the form c · log₂(n + k) + b with c = 1/log₂(φ) ≈ 1.44042, k = 1, b = 0.
---
Proof:
Step 1: Define N(h) — minimum nodes in an AVL tree of height h.
Using the height convention that height = number of nodes on the longest root-to-leaf path (so a single-node tree has height 1, an empty tree has height 0):
To minimize the number of nodes in an AVL tree of height h, we make one subtree of height h − 1 (minimum) and the other of height h − 2 (minimum), since the AVL property allows subtree heights to differ by at most 1. This gives the recurrence:
$$N(0) = 0, \quad N(1) = 1, \quad N(h) = 1 + N(h-1) + N(h-2) \text{ for } h \geq 2$$
Any AVL tree of height h has at least N(h) nodes, so n ≥ N(h).
Step 2: Relate N(h) to Fibonacci numbers.
Define the Fibonacci sequence: F(1) = 1, F(2) = 1, F(k) = F(k−1) + F(k−2) for k ≥ 3.
Claim: N(h) = F(h + 2) − 1 for all h ≥ 0.
*Proof by induction:*
- Base case h = 0: N(0) = 0 and F(2) − 1 = 1 − 1 = 0. ✓
- Base case h = 1: N(1) = 1 and F(3) − 1 = 2 − 1 = 1. ✓
- Inductive step (h ≥ 2): Assume N(h−1) = F(h+1) − 1 and N(h−2) = F(h) − 1. Then:
N(h) = 1 + N(h−1) + N(h−2) = 1 + (F(h+1) − 1) + (F(h) − 1) = F(h+1) + F(h) − 1 = F(h+2) − 1. ✓
Therefore N(h) + 1 = F(h + 2).
Step 3: Lower bound on Fibonacci numbers.
Claim: F(k) ≥ φ^(k−2) for all k ≥ 2, where φ = (1 + √5)/2.
*Proof by strong induction:*
- Base case k = 2: F(2) = 1 = φ^0 = 1. ✓ (equality)
- Base case k = 3: F(3) = 2 ≥ φ^1 ≈ 1.618. ✓
- Inductive step (k ≥ 4): Assume F(k−1) ≥ φ^(k−3) and F(k−2) ≥ φ^(k−4). Then:
F(k) = F(k−1) + F(k−2) ≥ φ^(k−3) + φ^(k−4) = φ^(k−4)(φ + 1) = φ^(k−4) · φ^2 = φ^(k−2).
The key identity used is φ + 1 = φ² (the defining property of the golden ratio, since φ² − φ − 1 = 0). ✓
Step 4: Derive the upper bound on h.
Since n ≥ N(h), we have:
$$n + 1 \geq N(h) + 1 = F(h + 2) \geq \varphi^{(h+2)-2} = \varphi^h$$
Therefore:
$$\varphi^h \leq n + 1$$
Taking log base φ of both sides (log_φ is an increasing function):
$$h \leq \log_\varphi(n + 1) = \frac{\log_2(n + 1)}{\log_2 \varphi}$$
Step 5: Express in the required form.
$$\boxed{h \leq \frac{1}{\log_2 \varphi} \cdot \log_2(n + 1)}$$
with c = 1/log₂(φ) ≈ 1.44042, k = 1, b = 0.
Verification at small values:
- h = 1, n_min = 1: (1/log₂φ)·log₂(2) = 1/log₂φ ≈ 1.44 ≥ 1. ✓
- h = 2, n_min = 2: (1/log₂φ)·log₂(3) ≈ 1.44·1.585 ≈ 2.28 ≥ 2. ✓
- h = 3, n_min = 4: (1/log₂φ)·log₂(5) ≈ 1.44·2.322 ≈ 3.34 ≥ 3. ✓
- h = 4, n_min = 7: (1/log₂φ)·log₂(8) ≈ 1.44·3 = 4.32 ≥ 4. ✓
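The spot checks above can be extended mechanically; the following sketch recomputes N(h) from the Step 1 recurrence, checks the Step 2 identity N(h) = F(h+2) − 1, and confirms the Step 4 bound at every minimum-node tree, where the bound is tightest.

```python
import math

phi = (1 + math.sqrt(5)) / 2          # golden ratio
c = 1 / math.log2(phi)                # leading constant, ~1.44042

# Fibonacci numbers: F[1] = F[2] = 1.
F = [0, 1, 1]
for k in range(3, 40):
    F.append(F[k - 1] + F[k - 2])

# Minimum node counts: N(0) = 0, N(1) = 1, N(h) = 1 + N(h-1) + N(h-2).
N = [0, 1]
for h in range(2, 30):
    N.append(1 + N[h - 1] + N[h - 2])

for h in range(30):
    assert N[h] == F[h + 2] - 1                    # Step 2 identity
    if h >= 1:
        n = N[h]                                   # fewest nodes at height h
        assert h <= c * math.log2(n + 1) + 1e-9    # Step 4 bound
```

All assertions pass for h up to 29, matching the hand-checked values above.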
Why the constant c is tight:
As h → ∞, the minimum-node AVL tree satisfies N(h) ≈ φ^h · (φ²/√5) − 1, so n ≈ φ^h · 1.171. Taking log: h ≈ log_φ(n) − log_φ(1.171). This shows that h grows as exactly (1/log₂φ)·log₂(n) asymptotically, so the coefficient c = 1/log₂φ cannot be improved. ∎
COMP 382 Homework 1 — Grading Report
---
Problem 1: Radix Trees [20 points]
Score: 20/20 points
Assessment:
Approach: The student provides a clean, well-structured solution using a mark boolean flag (instead of the reference's grey/black node terminology) and left/right child pointers. This is a completely valid alternative formalization.
Strengths:
- Part 1 definition is precise and formally complete — nodes have `left`, `right`, `mark` fields; keys are implicitly defined by root-to-node paths; edge semantics (0 = left, 1 = right) are clearly stated.
- Part 2 pseudocode is clear, correct, and well-commented. Both `SEARCH` and `INSERT` correctly handle all cases (missing children, mark flag).
- Part 3 provides a particularly thorough correctness argument, covering both cases of the lexicographic ordering definition (prefix case and first-differing-bit case) with explicit reference back to the problem's formal definition.
- Time complexity arguments are rigorous throughout.
Errors/Gaps:
- None of substance.
Detailed Feedback:
The student's formalization in Part 1 using x.mark (boolean) rather than grey/black node coloring is an equivalent and arguably cleaner representation. The pseudocode in Part 2 is essentially identical to the reference in logic, differing only in notation (mark vs. grey/black). The time complexity justifications are correct and concise. Part 3 is excellent — the correctness argument explicitly addresses both conditions (i) and (ii) of the lexicographic ordering definition, which is more thorough than the reference. The bound on tree size (at most n + 1 nodes) is correctly argued.
Rubric Breakdown:
- Part 1 (Formal Definition): 3/3 — Complete and correct formalization
- Part 2 (Pseudocode + Justification): 10/10 — Correct algorithms with proper Θ(|w|) justification
- Part 3 (Sorting Algorithm): 7/7 — Correct algorithm, thorough correctness proof, valid Θ(n) analysis
---
Problem 2: Multisets [20 points]
Score: 20/20 points
Assessment:
Approach: The student uses an augmented AVL tree (T₁) with key, count, and sub_total fields for Part 1, and adds a second augmented AVL tree (T₂) indexed by count values for Part 2. This matches the reference solution's two-tree approach, using AVL trees instead of Red-Black trees (both are valid balanced BSTs).
Strengths:
- Clear description of data structure fields and their semantics
- Explicit pseudocode for `GETCOUNTS≥` and `FREQUENT` operations, making the algorithms concrete and verifiable
- Thorough analysis of how T₂ is maintained during INSERT/DELETE operations
- Excellent summary table at the end showing all operations and their costs across both trees
- Correctly notes that `sub_total` (and `sub_freq`) can be maintained during AVL rotations in O(1) per rotation
Errors/Gaps:
- None of substance.
Detailed Feedback:
Part 1 is exemplary. The augmented AVL tree with sub_total is exactly the right approach. The GETCOUNTS≥ pseudocode correctly accumulates counts from right subtrees and current nodes when traversing left, which is the standard technique for aggregate range queries on augmented BSTs. The student correctly identifies that rotations affect only O(1) nodes' augmented fields.
Part 2 is also excellent. The second tree T₂ keyed by count values with freq and sub_freq is the natural extension. The maintenance protocol (updating T₂ when counts change in T₁) is correctly described with proper handling of edge cases (new elements, elements with count dropping to 0). The FREQUENT operation correctly mirrors the GETCOUNTS≥ logic on T₂. The observation that T₂ has at most n nodes (since there are at most n distinct count values) correctly establishes the O(log n) bound.
Rubric Breakdown:
- Part 1 (Basic Multiset): 10/10 — Correct data structure with all operations at specified complexities
- Part 2 (Extended with WITHCOUNT/FREQUENT): 10/10 — Correct two-tree design with proper maintenance
---
Problem 3: AVL Trees: Height & Size [10 points]
Score: 9/10 points
Assessment:
Approach: The student derives h ≤ (1/log₂φ) · log₂(n + 1), i.e., c = 1/log₂φ ≈ 1.44042, k = 1, b = 0. This is achieved through a clean chain: minimum-node recurrence → Fibonacci relation → lower bound F(k) ≥ φ^(k−2) → height bound.
Strengths:
- Impeccably structured five-step proof with clear logical flow
- Correct induction proof that N(h) = F(h+2) − 1
- Correct strong induction proof that F(k) ≥ φ^(k−2), using the golden ratio property φ² = φ + 1
- Numerical verification at small values (h = 1, 2, 3, 4) to sanity-check the bound
- Insightful discussion of why c = 1/log₂φ is asymptotically tight
- All steps are rigorous with no logical gaps
Errors/Gaps:
- The bound is not the tightest achievable. The student uses F(k) ≥ φ^(k−2), which is weaker than F(k) ≥ φ^k/√5 (obtainable from the exact Binet formula). Using the tighter Fibonacci bound yields h < 1.44042 · log₂(n + 2) − 0.328 (the reference solution), which is roughly 0.33 smaller in the additive constant. Since the problem explicitly asks for "the best upper bound you can find," a small deduction is warranted.
Detailed Feedback: The proof is one of the best-organized submissions for this problem. Every step is carefully justified with proper induction proofs and all key identities are verified. The leading constant c = 1/log₂φ ≈ 1.44042 is optimal and correctly derived. The only shortcoming is that the Fibonacci lower bound F(k) ≥ φ^(k−2) is not as tight as F(k) ≥ φ^k/√5, which follows from the Binet formula. Using the tighter bound would give n + 1 ≥ F(h+2) ≥ φ^(h+2)/√5, leading to h < (1/log₂φ)·log₂(n+2) + (log₂5)/(2·log₂φ) − 2 ≈ 1.44042·log₂(n+2) − 0.328. This tighter bound shaves about 0.33 off the additive constant. The student's discussion of asymptotic tightness is a nice touch showing deep understanding.
Rubric Breakdown:
- Recurrence and Fibonacci relation: 3/3 — Correct N(h) = F(h+2) − 1 with induction proof
- Fibonacci lower bound: 2/3 — Correct but not the tightest available (used φ^(k−2) instead of φ^k/√5)
- Final derivation and form: 3/3 — Correctly expressed in required form with right leading constant
- Rigor and justification: 1/1 — Thorough, no logical gaps
---
Overall Summary
| Problem | Score | Notes |
|---|---|---|
| 1 (Radix Trees) | 20/20 | Excellent across all parts; Part 3 correctness argument is especially thorough |
| 2 (Multisets) | 20/20 | Clean two-tree design with correct augmentation and pseudocode |
| 3 (AVL Height) | 9/10 | Rigorous proof with optimal leading constant; additive constant slightly loose |
| Total | 49/50 | |
General Comments:
This is an outstanding submission. The student demonstrates deep understanding of data structures, augmented trees, and mathematical proof techniques. Solutions are consistently well-organized with clear structure and rigorous justification. The pseudocode is clean and implementable. The only point of improvement is in Problem 3, where using the Binet formula for Fibonacci numbers would yield a slightly tighter bound. The student's additional analysis (numerical verification, asymptotic tightness discussion) shows mathematical maturity beyond what is required.
Total: 49/50
diff --git a/writeup.md b/writeup.md
index 7d45f57..618e63d 100644
--- a/writeup.md
+++ b/writeup.md
@@ -2,8 +2,305 @@
## Student Responses
-<!-- Write your solutions below. Add sections for each problem as needed. -->
-<!-- Use proper mathematical notation where applicable. -->
+---
-[Your solutions here]
+## Problem 1: Radix Trees [20 points]
+### Part 1: Formal Definition [3 points]
+
+A **radix tree** over the binary alphabet {0, 1} is a rooted binary tree T with the following structure:
+
+- **T.root** is the root node, representing the empty string ε.
+- Each node x has the following fields:
+ - **x.left**: pointer to the left child of x (representing extending the path with bit '0'), or NIL if no such child exists.
+ - **x.right**: pointer to the right child of x (representing extending the path with bit '1'), or NIL if no such child exists.
+ - **x.mark**: a boolean flag. x.mark = TRUE if and only if the bit string spelled out by the root-to-x path belongs to the stored set.
+
+The **key** (bit string) of a node x is defined implicitly as the sequence of bits read along the unique simple path from the root to x: each left-child edge contributes '0' and each right-child edge contributes '1'. The root's key is the empty string ε.
+
+A radix tree storing a set S of bit strings contains exactly those internal (unmarked) nodes that are needed to establish paths to marked nodes, plus all marked nodes corresponding to strings in S.
+
+---
+
+### Part 2: RADIX-TREE-SEARCH and RADIX-TREE-INSERT [10 points]
+
+**Pseudocode:**
+
+```
+RADIX-TREE-SEARCH(T, w)
+ // w = w[0]w[1]...w[m-1] is a bit string of length m = |w|
+ x = T.root
+ for i = 0 to |w| - 1:
+ if w[i] == 0:
+ if x.left == NIL:
+ return NIL // prefix not in tree → w not stored
+ x = x.left
+ else: // w[i] == 1
+ if x.right == NIL:
+ return NIL
+ x = x.right
+ if x.mark == TRUE:
+ return x // w is stored in the tree
+ else:
+ return NIL // w is a prefix path but not marked
+
+
+RADIX-TREE-INSERT(T, w)
+ // Insert bit string w into the radix tree T
+ x = T.root
+ for i = 0 to |w| - 1:
+ if w[i] == 0:
+ if x.left == NIL:
+ x.left = ALLOCATE-NODE()
+ x.left.left = NIL
+ x.left.right = NIL
+ x.left.mark = FALSE
+ x = x.left
+ else: // w[i] == 1
+ if x.right == NIL:
+ x.right = ALLOCATE-NODE()
+ x.right.left = NIL
+ x.right.right = NIL
+ x.right.mark = FALSE
+ x = x.right
+ x.mark = TRUE // mark the terminal node for w
+```
+
+**Time complexity justification:**
+
+In both algorithms, the only loop runs exactly |w| iterations (once per bit of w). Each iteration performs a constant number of operations: a comparison, a pointer follow (or pointer assignment with node allocation), and an index increment. No other loops or recursive calls are made. Therefore:
+
+- **RADIX-TREE-SEARCH**: performs exactly |w| iterations of the loop, each taking Θ(1) time, so the total time is Θ(|w|).
+- **RADIX-TREE-INSERT**: performs exactly |w| iterations of the loop, each taking Θ(1) time (node allocation is O(1)), so the total time is Θ(|w|).
+
+Both algorithms run in Θ(|w|) time. ∎
+
+---
+
+### Part 3: Lexicographic Sorting in Θ(n) Time [7 points]
+
+**Algorithm:**
+
+1. **Build the radix tree.** Insert every string in S into a radix tree T using RADIX-TREE-INSERT.
+
+2. **Traverse the tree in preorder (DFS, left before right).** Perform a depth-first traversal of T, always visiting the left subtree (bit '0') before the right subtree (bit '1'). Maintain a current path string (the sequence of bits on the root-to-current-node path). Whenever we visit a node x with x.mark = TRUE, output the current path string as the next element in the sorted order.
+
+**Why the output is in lexicographic order:**
+
+Consider any two distinct strings a, b ∈ S. We claim a is output before b if and only if a <_lex b.
+
+- **Case (ii): a is a proper prefix of b.** Since a is shorter, the node for a lies strictly above the node for b in the tree. In a preorder traversal, a parent is visited before its descendants, so a is output before b. This matches the lexicographic order (condition (ii): p < q and all a_i = b_i).
+
+- **Case (i): a and b differ at position j (first differing bit).** The path to a and the path to b agree for the first j steps and diverge at step j. At the divergence point, a goes left (a_j = 0) and b goes right (b_j = 1), or vice versa. Since we always visit the left subtree before the right subtree, strings with a '0' at position j are output before strings with a '1' at position j. Hence if a_j = 0 < 1 = b_j, then a is output before b, matching a <_lex b.
+
+**Time complexity:**
+
+- *Step 1 (building the tree):* Each string w ∈ S takes Θ(|w|) to insert. The total insertion time is Σ_{w ∈ S} Θ(|w|) = Θ(n), since the sum of all string lengths is n.
+
+- *Step 2 (DFS traversal):* The radix tree has at most O(n) nodes. To see this: each node (except the root) corresponds to one edge on the root-to-node path, and each such edge is "used" by at least one string in S. More precisely, the tree has at most n + 1 nodes total (at most one new node is created per bit inserted, and n total bits are inserted). The DFS visits each node exactly once, doing O(1) work per node (output the path string if marked, follow child pointers). Thus the traversal takes O(n) time.
+
+- **Total time:** Θ(n) + O(n) = Θ(n). ∎
+
+---
+
+## Problem 2: Multisets [20 points]
+
+### Part 1: Multiset Data Structure [10 points]
+
+**Data structure design:**
+
+Use an **augmented AVL tree** T₁ where each node x stores:
+- **x.key**: the unique element k ∈ K
+- **x.count**: the multiplicity of k in the multiset (a positive integer)
+- **x.sub_total**: the sum of counts of all elements in x's subtree, i.e., x.count + (x.left.sub_total if x.left else 0) + (x.right.sub_total if x.right else 0)
+
+Nodes are ordered by x.key in the standard BST order (using the linear order on K).
+
+**Operations:**
+
+**CREATE(S):** Set T₁ = NIL (empty tree). Time: O(1). ✓
+
+**INSERT(S, k):**
+Search for k in T₁ (O(log n) BST search).
+- If found: increment x.count by 1. Walk back up to the root updating x.sub_total for each ancestor. The AVL tree has height O(log n), so this is O(log n).
+- If not found: insert a new node with key k, count 1, sub_total 1 using the standard AVL insert. Rebalancing (rotations) updates sub_total for O(1) rotated nodes. Walking up the insertion path updates sub_total for O(log n) ancestors.
+
+Total time: O(log n). ✓
+
+**GETCOUNT(S, k):**
+Search for k in T₁. If found, return x.count; otherwise return 0.
+Time: O(log n) BST search. ✓
+
+**DELETE(S, k):**
+Search for k in T₁. If not found, do nothing (or signal an error).
+If found with x.count > 1: decrement x.count by 1 and update sub_total for ancestors. O(log n).
+If found with x.count = 1: delete the node using the standard AVL deletion (O(log n) by the stated fact). Update sub_total during the deletion path. O(log n). ✓
+
+**GETCOUNTS≥(S, k):**
+Traverse from the root, accumulating the total count of all elements with key ≥ k:
+
+```
+GETCOUNTS≥(T₁, k):
+ total = 0
+ x = T₁.root
+ while x ≠ NIL:
+ if k == x.key:
+ total += x.count + (x.right.sub_total if x.right else 0)
+ return total
+ elif k < x.key:
+ // x and everything in x's right subtree have key ≥ k
+ total += x.count + (x.right.sub_total if x.right else 0)
+ x = x.left
+ else: // k > x.key
+ x = x.right
+ return total
+```
+
+This traverses exactly one root-to-leaf path, performing O(1) work per step. Since the AVL tree has height O(log n), the total time is O(log n). ✓
+
+**Maintaining sub_total during rebalancing:**
+For each rotation (single or double), only the rotated nodes change their sub_total, and each can be recomputed from its children in O(1). This does not increase the asymptotic complexity of any operation.
+
+---
+
+### Part 2: Extended Multiset with WITHCOUNT and FREQUENT [10 points]
+
+**Data structure design:**
+
+Maintain the augmented AVL tree T₁ from Part 1, plus a second **augmented AVL tree** T₂ indexed by count values, where each node y stores:
+- **y.count_val**: an integer c (the count value this node tracks)
+- **y.freq**: the number of distinct elements in S that have exactly c occurrences
+- **y.sub_freq**: the sum of freq in y's subtree (augmented, analogous to sub_total in T₁)
+
+Nodes in T₂ are ordered by y.count_val.
+
+**Operations (new):**
+
+**WITHCOUNT(S, c):**
+Search T₂ for node with count_val = c. If found, return y.freq; otherwise return 0.
+Time: O(log n). (T₂ has at most n nodes: each node corresponds to a count value held by at least one of the at most n distinct elements of S.) ✓
+
+**FREQUENT(S, c):**
+Return the number of distinct elements with count ≥ c, using the same traversal strategy as GETCOUNTS≥ but on T₂:
+
+```
+FREQUENT(T₂, c):
+ total = 0
+ y = T₂.root
+ while y ≠ NIL:
+ if c == y.count_val:
+ total += y.freq + (y.right.sub_freq if y.right else 0)
+ return total
+ elif c < y.count_val:
+ total += y.freq + (y.right.sub_freq if y.right else 0)
+ y = y.left
+ else:
+ y = y.right
+ return total
+```
+
+Time: O(log n). ✓
+
+**Maintaining T₂ when T₁ is modified:**
+
+Whenever an element k's count in T₁ changes from c_old to c_new (by INSERT or DELETE):
+1. In T₂: find node with count_val = c_old and decrement its freq by 1. If freq drops to 0, delete that node from T₂. Update sub_freq for ancestors. Time: O(log n).
+2. In T₂: find node with count_val = c_new and increment its freq by 1. If no such node exists, insert a new node with freq = 1. Update sub_freq for ancestors. Time: O(log n).
+
+For INSERT(S, k): c_new = c_old + 1, where c_old is k's current count. If k is new (c_old = 0), there is no T₂ node to decrement, so step 1 is skipped and only the count_val = 1 node gains freq. For DELETE(S, k): c_new = c_old − 1. If c_old = 1, then c_new = 0: we remove k's node from T₁ and only decrement (and possibly delete) T₂'s count_val = 1 node.
+
+All five operations from Part 1 plus the two new operations remain O(log n) since each touches O(log n) nodes in each of T₁ and T₂. ✓
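The two-step T₂ update can be sketched with a dict standing in for T₂, keeping only the count_val → freq bookkeeping visible (the real structure is the augmented AVL, which additionally answers the ordered FREQUENT query). Names are illustrative.

```python
def on_count_change(freq_by_count, c_old, c_new):
    """Update the count_val -> freq map when an element's count changes."""
    if c_old > 0:                         # step 1: leave the c_old bucket
        freq_by_count[c_old] -= 1
        if freq_by_count[c_old] == 0:
            del freq_by_count[c_old]      # delete the emptied node
    if c_new > 0:                         # step 2: join the c_new bucket
        freq_by_count[c_new] = freq_by_count.get(c_new, 0) + 1
```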
+
+**Summary of operations:**
+
+| Operation | T₁ work | T₂ work | Total |
+|-----------|---------|---------|-------|
+| CREATE(S) | O(1) | O(1) | O(1) |
+| INSERT(S, k) | O(log n) | O(log n) | O(log n) |
+| GETCOUNT(S, k) | O(log n) | — | O(log n) |
+| DELETE(S, k) | O(log n) | O(log n) | O(log n) |
+| GETCOUNTS≥(S, k) | O(log n) | — | O(log n) |
+| WITHCOUNT(S, c) | — | O(log n) | O(log n) |
+| FREQUENT(S, c) | — | O(log n) | O(log n) |
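To pin down the intended semantics of all seven operations, a brute-force reference model (a plain list, everything O(n)) can serve as an executable specification to test the tree-based design against. Class and method names are illustrative.

```python
class NaiveMultiset:
    def __init__(self):                  # CREATE
        self.items = []
    def insert(self, k):                 # INSERT
        self.items.append(k)
    def get_count(self, k):              # GETCOUNT
        return self.items.count(k)
    def delete(self, k):                 # DELETE (one occurrence)
        if k in self.items:
            self.items.remove(k)
    def counts_geq(self, k):             # GETCOUNTS>=
        return sum(1 for x in self.items if x >= k)
    def with_count(self, c):             # WITHCOUNT
        return sum(1 for x in set(self.items) if self.items.count(x) == c)
    def frequent(self, c):               # FREQUENT
        return sum(1 for x in set(self.items) if self.items.count(x) >= c)
```

On the example S = {a, b, b, b, c, c, d} from the problem statement, `frequent(2)` is 2 (elements b and c).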
+
+---
+
+## Problem 3: AVL Trees: Height & Size [10 points]
+
+**Claim:** An AVL tree with n nodes has height at most
+
+$$h \leq \frac{1}{\log_2 \varphi} \cdot \log_2(n + 1)$$
+
+where φ = (1 + √5)/2 ≈ 1.618 is the golden ratio. This is of the form c · log₂(n + k) + b with **c = 1/log₂(φ) ≈ 1.44042**, **k = 1**, **b = 0**.
+
+---
+
+**Proof:**
+
+**Step 1: Define N(h) — minimum nodes in an AVL tree of height h.**
+
+Using the height convention that height = number of nodes on the longest root-to-leaf path (so a single-node tree has height 1, an empty tree has height 0):
+
+To minimize the number of nodes in an AVL tree of height h, we make one subtree of height h − 1 (minimum) and the other of height h − 2 (minimum), since the AVL property allows subtree heights to differ by at most 1. This gives the recurrence:
+
+$$N(0) = 0, \quad N(1) = 1, \quad N(h) = 1 + N(h-1) + N(h-2) \text{ for } h \geq 2$$
+
+Any AVL tree of height h has at least N(h) nodes, so n ≥ N(h).
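The recurrence is easy to tabulate; a short sketch (function name illustrative):

```python
def min_nodes(h):
    """N(h): minimum number of nodes in an AVL tree of height h."""
    N = [0, 1]                           # N(0), N(1)
    for i in range(2, h + 1):
        N.append(1 + N[i - 1] + N[i - 2])
    return N[h]
```

The first few values are 0, 1, 2, 4, 7, 12, 20, ...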
+
+**Step 2: Relate N(h) to Fibonacci numbers.**
+
+Define the Fibonacci sequence: F(1) = 1, F(2) = 1, F(k) = F(k−1) + F(k−2) for k ≥ 3.
+
+**Claim:** N(h) = F(h + 2) − 1 for all h ≥ 0.
+
+*Proof by induction:*
+- Base case h = 0: N(0) = 0 and F(2) − 1 = 1 − 1 = 0. ✓
+- Base case h = 1: N(1) = 1 and F(3) − 1 = 2 − 1 = 1. ✓
+- Inductive step (h ≥ 2): Assume N(h−1) = F(h+1) − 1 and N(h−2) = F(h) − 1. Then:
+
+ N(h) = 1 + N(h−1) + N(h−2) = 1 + (F(h+1) − 1) + (F(h) − 1) = F(h+1) + F(h) − 1 = F(h+2) − 1. ✓
+
+Therefore N(h) + 1 = F(h + 2).
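The identity can be spot-checked numerically for small h (a sketch; function names illustrative):

```python
def fib(k):
    """F(k) with F(1) = F(2) = 1."""
    a, b = 1, 1
    for _ in range(k - 2):
        a, b = b, a + b
    return b

def min_nodes(h):
    """N(h) from the recurrence N(h) = 1 + N(h-1) + N(h-2)."""
    N = [0, 1]
    for i in range(2, h + 1):
        N.append(1 + N[i - 1] + N[i - 2])
    return N[h]
```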
+
+**Step 3: Lower bound on Fibonacci numbers.**
+
+**Claim:** F(k) ≥ φ^(k−2) for all k ≥ 2, where φ = (1 + √5)/2.
+
+*Proof by strong induction:*
+- Base case k = 2: F(2) = 1 = φ^0 = 1. ✓ (equality)
+- Base case k = 3: F(3) = 2 ≥ φ^1 ≈ 1.618. ✓
+- Inductive step (k ≥ 4): Assume F(k−1) ≥ φ^(k−3) and F(k−2) ≥ φ^(k−4). Then:
+
+ F(k) = F(k−1) + F(k−2) ≥ φ^(k−3) + φ^(k−4) = φ^(k−4)(φ + 1) = φ^(k−4) · φ^2 = φ^(k−2).
+
+ The key identity used is **φ + 1 = φ²** (the defining property of the golden ratio, since φ² − φ − 1 = 0). ✓
+
+**Step 4: Derive the upper bound on h.**
+
+Since n ≥ N(h), we have:
+
+$$n + 1 \geq N(h) + 1 = F(h + 2) \geq \varphi^{(h+2)-2} = \varphi^h$$
+
+Therefore:
+
+$$\varphi^h \leq n + 1$$
+
+Taking log base φ of both sides (log_φ is an increasing function):
+
+$$h \leq \log_\varphi(n + 1) = \frac{\log_2(n + 1)}{\log_2 \varphi}$$
+
+**Step 5: Express in the required form.**
+
+$$\boxed{h \leq \frac{1}{\log_2 \varphi} \cdot \log_2(n + 1)}$$
+
+with **c = 1/log₂(φ) ≈ 1.44042**, **k = 1**, **b = 0**.
+
+**Verification at small values:**
+- h = 1, n_min = 1: (1/log₂φ)·log₂(2) = 1/log₂φ ≈ 1.44 ≥ 1. ✓
+- h = 2, n_min = 2: (1/log₂φ)·log₂(3) ≈ 1.44·1.585 ≈ 2.28 ≥ 2. ✓
+- h = 3, n_min = 4: (1/log₂φ)·log₂(5) ≈ 1.44·2.322 ≈ 3.34 ≥ 3. ✓
+- h = 4, n_min = 7: (1/log₂φ)·log₂(8) ≈ 1.44·3 ≈ 4.32 ≥ 4. ✓
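The bound can also be checked mechanically at the extremal trees (height h with the minimum n = N(h) nodes), where it is tightest:

```python
import math

# c = 1/log2(phi), the constant from the claim (≈ 1.44042)
C = 1 / math.log2((1 + math.sqrt(5)) / 2)

def min_nodes(h):
    """N(h) from the recurrence N(h) = 1 + N(h-1) + N(h-2)."""
    N = [0, 1]
    for i in range(2, h + 1):
        N.append(1 + N[i - 1] + N[i - 2])
    return N[h]
```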
+
+**Why the constant c is tight:**
+
+As h → ∞, the minimum-node AVL tree satisfies N(h) ≈ φ^h · (φ²/√5) − 1, so n ≈ φ^h · 1.171. Taking log: h ≈ log_φ(n) − log_φ(1.171). This shows that h grows as exactly (1/log₂φ)·log₂(n) asymptotically, so the coefficient c = 1/log₂φ cannot be improved. ∎