Agent Work: Homework1
Claude Opus 4.6 · COMP 382: Reasoning about Algorithms
COMP 382: Reasoning about Algorithms
Homework 1
Total Points: 50
---
Problem 1: Radix Trees [20 points]
This problem is based on Problem 12.2 from CLRS. A *radix tree* is a data structure for the efficient representation of sets of strings. The following radix tree data structure stores the bit strings 1011, 10, 011, 100, and 0.
```
root
├─ 0   key "0"    (in S)
│  └─ 1   key "01"
│     └─ 1   key "011"  (in S)
└─ 1   key "1"
   └─ 0   key "10"   (in S)
      ├─ 0   key "100"  (in S)
      └─ 1   key "101"
         └─ 1   key "1011" (in S)
```
Figure 12.5: A radix tree storing the bit strings 1011, 10, 011, 100, and 0. Each node is labeled with the bit on its incoming edge; to determine a node's key, traverse the simple path from the root to that node. There is no need, therefore, to store the keys in the nodes; they appear here for illustrative purposes only. Nodes not marked "(in S)" hold keys that are not in the tree. Such nodes are present only to establish a path to other nodes.
Part 1 [3 points]
Define formally the radix tree data structure.
Part 2 [10 points]
Let T be a radix tree and w be a bit string. Write the pseudo-code of RADIX-TREE-SEARCH(T, w) and RADIX-TREE-INSERT(T, w) so that each of the functions takes Θ(|w|) time. Justify that your algorithms indeed take Θ(|w|) time.
Part 3 [7 points]
Given two strings a = a₀a₁...aₚ and b = b₀b₁...b_q, where each aᵢ and bⱼ is in some ordered alphabet, we say that string a is *lexicographically less than* string b if either:
(i) there exists an integer j, where 0 ≤ j ≤ min{p, q}, such that aᵢ = bᵢ for all i = 0, 1, ..., j − 1 and aⱼ < bⱼ, or
(ii) p < q and aᵢ = bᵢ for all i = 0, 1, ..., p.
Let S be a set of distinct bit strings whose lengths sum to n (i.e., the sum of the lengths of all strings in S equals n). Show how to use a radix tree to sort S lexicographically in Θ(n) time. Describe the algorithm idea in English, but please make sure your description is very clear. Justify that your algorithm indeed takes Θ(n) time.
---
Problem 2: Multisets [20 Points]
Let K be a set of elements. A *(finite) multiset* or *bag* over K is a collection of elements from K, each of which can occur any finite number of times. The *count* of an element a ∈ K in a multiset S is the number of occurrences of a in S. The count of an element that does not appear in the multiset is equal to 0. For example, the multiset S = {a, b, b, b, c, c, d} contains the elements a, b, c, d with counts 1, 3, 2, 1 respectively. The count of e in S is 0.
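As an illustrative aside (not part of the problem), this count behavior matches Python's collections.Counter, which also reports count 0 for absent elements:

```python
from collections import Counter

# The multiset S = {a, b, b, b, c, c, d} as a mapping from element to count.
S = Counter(["a", "b", "b", "b", "c", "c", "d"])

print(S["b"])  # → 3
print(S["e"])  # → 0: absent elements have count 0
```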
Part 1 [10 points]
Design a data structure that can represent finite multisets and can support the following operations with the specified time complexities.
| Operation | Description | Time (worst-case) |
|---|---|---|
| CREATE(S) | Set S to be the empty multiset | O(1) |
| INSERT(S, k) | Insert the element k in the multiset S | O(log n) |
| GETCOUNT(S, k) | Return the count of k in S | O(log n) |
| DELETE(S, k) | Delete one occurrence of k from S | O(log n) |
| GETCOUNTS≥(S, k) | Return the total count of elements ≥ k in S | O(log n) |
Assume that the elements are linearly ordered. In the table above, n is the number of unique elements in the multiset.
Part 2 [10 points]
Adapt the multiset data structure to support all operations from Part 1, as well as the following ones:
| Operation | Description | Time (worst-case) |
|---|---|---|
| WITHCOUNT(S, c) | Return the number of elements with count c in S | O(log n) |
| FREQUENT(S, c) | Return the number of elements with count ≥ c in S | O(log n) |
In the table above, n is the number of unique elements in the multiset.
Notes for both parts: Try to make your data structure as efficient as possible in terms of information stored. Describe the design of the data structure, and briefly describe in English how you would implement each of the operations so that it has the prescribed worst-case time complexity.
You can assume the following known fact about AVL trees (in case you use them in your implementation): there is an algorithm for performing node deletion in an AVL tree in O(log n) time, where n is the size of the tree.
---
Problem 3: AVL Trees: Height & Size [10 Points]
In class, we discuss the main ideas that are needed to prove that an AVL tree with n nodes has height O(log n). Give the best upper bound you can find on the height of an AVL tree with n nodes. The upper bound that you give should be of the form c · log₂(n + k) + b, where c, k, b are constants. Provide a careful and very thorough justification of every step.
Note: Recall that for AVL trees we use the following definition: The *height* of a tree is the number of nodes in the longest path from the root to a leaf.
# COMP382 Homework 1
## Student Responses
---
Problem 1: Radix Trees [20 points]
Part 1: Formal Definition [3 points]
A radix tree for storing a set S of bit strings is a rooted tree T with the following properties:
1. Binary branching: Each node v has at most two children: a left child (corresponding to bit 0) and a right child (corresponding to bit 1).
2. Membership flag: Each node v has a boolean attribute v.member, which is TRUE if and only if the bit string represented by the path from the root to v belongs to S.
3. Implicit keys: The key (bit string) associated with a node v is not stored explicitly. Instead, it is determined by the sequence of edge labels (0 for left, 1 for right) on the unique path from the root to v. The root represents the empty string ε.
4. No redundant leaves: Every leaf node has v.member = TRUE. Internal nodes may have v.member = TRUE or FALSE. (A node with member = FALSE exists only because it lies on the path to some descendant with member = TRUE.)
Formally, a bit string w = w₁w₂...wₖ belongs to S if and only if the node reached by starting at the root and following child w₁, then child w₂, ..., then child wₖ exists and has member = TRUE.
---
Part 2: Pseudocode for Search and Insert [10 points]
RADIX-TREE-SEARCH(T, w):

```
RADIX-TREE-SEARCH(T, w):
    // w = w[1]w[2]...w[m] is a bit string of length m = |w|
    node = T.root
    for i = 1 to |w|:
        if w[i] == 0:
            node = node.left
        else:
            node = node.right
        if node == NIL:
            return FALSE
    return node.member
```

Time complexity justification: The algorithm performs exactly |w| iterations of the for-loop. Each iteration does O(1) work: one comparison and one pointer traversal. The final check of node.member is O(1). Therefore, the total running time is Θ(|w|).
RADIX-TREE-INSERT(T, w):

```
RADIX-TREE-INSERT(T, w):
    // w = w[1]w[2]...w[m] is a bit string of length m = |w|
    node = T.root
    for i = 1 to |w|:
        if w[i] == 0:
            if node.left == NIL:
                node.left = new Node(member = FALSE, left = NIL, right = NIL)
            node = node.left
        else:
            if node.right == NIL:
                node.right = new Node(member = FALSE, left = NIL, right = NIL)
            node = node.right
    node.member = TRUE
```

Time complexity justification: The algorithm performs exactly |w| iterations of the for-loop. Each iteration does O(1) work: one comparison, possibly one node creation (which is O(1)), and one pointer traversal. Setting node.member = TRUE at the end is O(1). Therefore, the total running time is Θ(|w|).
Note: Both algorithms traverse exactly the path from the root to the node at depth |w|, performing constant work per level. Neither algorithm ever backtracks or visits additional nodes. Hence both run in Θ(|w|) time.
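The two routines above translate directly into runnable Python. This is an illustrative sketch, not mandated by the problem: the Node class and the bit-string-as-Python-str representation are choices made here.

```python
class Node:
    """Radix tree node; the key is implicit in the root-to-node path."""
    def __init__(self):
        self.left = None     # child along a 0-edge
        self.right = None    # child along a 1-edge
        self.member = False  # True iff the path's bit string is in S

def radix_tree_search(root, w):
    """Return True iff the bit string w (e.g. "1011") is stored. Theta(|w|)."""
    node = root
    for bit in w:
        node = node.left if bit == "0" else node.right
        if node is None:       # path missing, so w is not in the tree
            return False
    return node.member

def radix_tree_insert(root, w):
    """Store w, creating member=False path nodes as needed. Theta(|w|)."""
    node = root
    for bit in w:
        if bit == "0":
            if node.left is None:
                node.left = Node()
            node = node.left
        else:
            if node.right is None:
                node.right = Node()
            node = node.right
    node.member = True

# Example with the strings from Figure 12.5:
root = Node()
for s in ["1011", "10", "011", "100", "0"]:
    radix_tree_insert(root, s)
assert radix_tree_search(root, "10") and not radix_tree_search(root, "11")
```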
---
Part 3: Sorting Bit Strings Using a Radix Tree [7 points]
Algorithm:
Given a set S of distinct bit strings whose lengths sum to n:
1. Build the radix tree: Create an empty radix tree T (just a root with member = FALSE). For each string w ∈ S, call RADIX-TREE-INSERT(T, w).
2. Traverse the tree in pre-order, visiting 0-children before 1-children: Perform a depth-first traversal starting from the root. At each node, first check if the node is a member (output its string if so), then recursively visit the left child (edge labeled 0), then recursively visit the right child (edge labeled 1). Maintain the current path string by appending the edge label when descending and removing it when backtracking.
```
RADIX-TREE-SORT(T):
    result = empty list
    path = empty string
    TRAVERSE(T.root, path, result)
    return result

TRAVERSE(node, path, result):
    if node == NIL:
        return
    if node.member == TRUE:
        append path to result
    TRAVERSE(node.left, path + "0", result)
    TRAVERSE(node.right, path + "1", result)
```

Correctness (lexicographic order):
The traversal produces strings in lexicographic order because:
- Rule (ii) — prefixes first: Since we visit a node before its descendants (pre-order), if a string s is a prefix of a string s', then s is output before s'. This satisfies rule (ii) of lexicographic ordering: a shorter string that is a prefix of a longer string comes first.
- Rule (i) — 0 before 1: At every node, we visit the left subtree (bit 0) before the right subtree (bit 1). This means that among two strings that first differ at position j, the one with bit 0 at position j is output first. Since 0 < 1, this satisfies rule (i).
Time complexity justification — Θ(n):
- Insertion phase: Each string wᵢ takes Θ(|wᵢ|) time to insert (as shown in Part 2). The total insertion time is Θ(Σᵢ |wᵢ|) = Θ(n).
- Traversal phase: The traversal visits each node in the tree exactly once. The total number of nodes in the tree is at most n + 1 (the root plus at most one node created per bit across all insertions; shared prefixes only reduce this count). At each node, we do O(1) work (check membership, recursive calls). Thus the traversal takes O(number of nodes) = O(n) time. (Note: appending the current path to the result can be done in O(1) per node if we store strings as pointers into the path array, or the total copying cost across all output strings is Σ|wᵢ| = n, which is also O(n).)
- Total time: Θ(n) + O(n) = Θ(n).
The lower bound Ω(n) holds because we must read all n bits of input.
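The whole pipeline can be sketched in runnable Python (illustrative only; the shared mutable path buffer realizes the append-on-descent, remove-on-backtrack idea from step 2, so no per-node string copying occurs except when emitting output):

```python
class Node:
    """Radix tree node; the key is implicit in the root-to-node path."""
    def __init__(self):
        self.left = None     # 0-edge child
        self.right = None    # 1-edge child
        self.member = False

def insert(root, w):
    """Walk/create the path for bit string w and mark its end as a member."""
    node = root
    for bit in w:
        if bit == "0":
            if node.left is None:
                node.left = Node()
            node = node.left
        else:
            if node.right is None:
                node.right = Node()
            node = node.right
    node.member = True

def radix_tree_sort(strings):
    """Sort distinct bit strings lexicographically; Theta(n) in total bits."""
    root = Node()
    for w in strings:
        insert(root, w)
    result = []
    path = []  # shared buffer: append on descent, pop on backtrack
    def traverse(node):
        if node is None:
            return
        if node.member:                        # pre-order: prefixes first
            result.append("".join(path))
        for bit, child in (("0", node.left), ("1", node.right)):
            path.append(bit)                   # 0-subtree before 1-subtree
            traverse(child)
            path.pop()
    traverse(root)
    return result

# Example: the strings from Figure 12.5 come out in lexicographic order.
print(radix_tree_sort(["1011", "10", "011", "100", "0"]))
# → ['0', '011', '10', '100', '1011']
```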
---
Problem 2: Multisets [20 Points]
Part 1: Data Structure Design [10 points]
Data Structure: An augmented AVL tree (keyed by element value) where each node stores additional information about the multiplicity of the element and the total count in its subtree.
Node structure:
- key: the element k
- count: the number of occurrences of k in the multiset (multiplicity)
- subtree_total: the sum of count values for all nodes in the subtree rooted at this node
- height: the height of the subtree (for AVL balancing)
- left, right: pointers to children
The tree is ordered by key values (using the linear ordering of elements). We define subtree_total(NIL) = 0.
Invariant: For every node v: v.subtree_total = v.count + v.left.subtree_total + v.right.subtree_total.
Operations:
CREATE(S) — O(1): Set S.root = NIL. This creates an empty multiset in O(1) time.
INSERT(S, k) — O(log n): Search the AVL tree for key k.
- *If k is found* at node v: Increment v.count by 1. Walk back up to the root, incrementing subtree_total by 1 for every ancestor of v (including v itself). No structural change to the tree. Time: O(log n) for the search + O(log n) for the ancestor updates = O(log n).
- *If k is not found:* Insert a new node with key = k, count = 1, subtree_total = 1. Perform standard AVL rebalancing (rotations). After each rotation, update subtree_total for the O(1) affected nodes. Update subtree_total for all ancestors along the insertion path. Time: O(log n).
Note: During AVL rotations, subtree_total can be recomputed in O(1) per rotation since each rotated node's subtree_total is just the sum of its children's subtree_total values plus its own count.
GETCOUNT(S, k) — O(log n): Search the AVL tree for key k. If found at node v, return v.count. If not found, return 0. Time: O(log n) for the search.
DELETE(S, k) — O(log n): Search the AVL tree for key k.
- *If k is not found:* Do nothing (or report error).
- *If k is found* at node v and v.count > 1: Decrement v.count by 1. Walk back up to the root, decrementing subtree_total by 1 for every ancestor (including v). Time: O(log n).
- *If k is found* at node v and v.count = 1: Remove the node using standard AVL deletion (with rebalancing). Update subtree_total for all affected ancestors. Time: O(log n), using the known fact that AVL deletion takes O(log n) time. The subtree_total updates along the path and during rotations add only O(log n) overhead.
GETCOUNTS≥(S, k) — O(log n): Return the total count of all elements ≥ k. Traverse from the root to a leaf, accumulating counts:
```
GETCOUNTS≥(S, k):
    result = 0
    node = S.root
    while node ≠ NIL:
        if node.key == k:
            return result + node.count + node.right.subtree_total
        else if node.key > k:
            result = result + node.count + node.right.subtree_total
            node = node.left
        else: // node.key < k
            node = node.right
    return result
```

Justification: At each step, we follow a single root-to-leaf path. When we encounter a node with key > k, all elements in its right subtree are also > k, so we add the node's count and its right subtree's total. When key < k, we go right. When key == k, we add the node's count and its right subtree's total, and we are done. Since the AVL tree has height O(log n), this traversal visits O(log n) nodes, doing O(1) work per node. Total time: O(log n).
---
Part 2: Extended Data Structure [10 points]
Additional Data Structure: We maintain a second augmented AVL tree (the "count tree"), keyed by count values, alongside the primary tree from Part 1.
Count tree node structure:
- key: a count value c (a positive integer)
- frequency: the number of unique elements in the multiset that have exactly this count
- subtree_freq: the sum of frequency values for all nodes in the subtree
- height, left, right: for AVL balancing
Invariant: For every node u in the count tree: u.subtree_freq = u.frequency + u.left.subtree_freq + u.right.subtree_freq.
The count tree has at most n nodes (one per distinct count value), and since there are at most n unique elements, there are at most n distinct count values. So operations on the count tree take O(log n) time.
Modified operations (INSERT and DELETE now also update the count tree):
INSERT(S, k) — O(log n):
1. Search the primary tree for k.
2. If k is found with current count = old_c:
   - In the count tree, find the node with key = old_c and decrement its frequency. If frequency becomes 0, delete this node from the count tree.
   - Set new_c = old_c + 1. In the count tree, find the node with key = new_c. If found, increment its frequency. If not found, insert a new node with key = new_c, frequency = 1.
   - In the primary tree, set k's count to new_c and update subtree_total values.
3. If k is not found:
   - Insert k into the primary tree with count = 1.
   - In the count tree, find the node with key = 1. If found, increment its frequency. If not found, insert (key = 1, frequency = 1).
4. Update subtree_freq values in the count tree along affected paths.
5. Total: O(log n) for the primary tree operations + O(log n) for the count tree operations = O(log n).
DELETE(S, k) — O(log n):
1. Search the primary tree for k. If not found, do nothing.
2. If found with current count = old_c:
   - In the count tree, find the node with key = old_c and decrement its frequency. If frequency becomes 0, delete this node.
   - If old_c > 1: set new_c = old_c − 1. In the count tree, find or insert the node with key = new_c and increment its frequency. Update k's count in the primary tree.
   - If old_c = 1: remove k from the primary tree entirely.
3. Update subtree_freq values in the count tree.
4. Total: O(log n).
New operations:
WITHCOUNT(S, c) — O(log n): Search the count tree for key = c. If found at node u, return u.frequency. If not found, return 0. Time: O(log n) for the search.
FREQUENT(S, c) — O(log n): Return the number of unique elements with count ≥ c. This is analogous to GETCOUNTS≥ but on the count tree, using subtree_freq instead of subtree_total:
```
FREQUENT(S, c):
    result = 0
    node = count_tree.root
    while node ≠ NIL:
        if node.key == c:
            return result + node.frequency + node.right.subtree_freq
        else if node.key > c:
            result = result + node.frequency + node.right.subtree_freq
            node = node.left
        else: // node.key < c
            node = node.right
    return result
```

Justification: Identical reasoning to GETCOUNTS≥ from Part 1. We traverse a single root-to-leaf path in the count tree (height O(log n)), doing O(1) work per node. When a node has key ≥ c, its right subtree also has keys ≥ c, so we accumulate frequency and the right child's subtree_freq. Total time: O(log n).
All operations from Part 1 (CREATE, GETCOUNT, GETCOUNTS≥) are unchanged and still run in their specified time complexities.
Space: O(n) for the primary tree + O(n) for the count tree = O(n) total, where n is the number of unique elements.
---
Problem 3: AVL Trees: Height & Size [10 Points]
Claim: An AVL tree with n nodes has height at most
$$h \leq \frac{1}{\log_2 \varphi} \cdot \log_2(n + 2) + \left(\log_\varphi \sqrt{5} - 2\right)$$
where φ = (1 + √5)/2 is the golden ratio. Numerically, this gives:
h ≤ 1.4404 · log₂(n + 2) − 0.3277
with c = 1/log₂(φ) ≈ 1.4404, k = 2, b = log_φ(√5) − 2 ≈ −0.3277.
Proof:
Step 1: Recurrence for minimum number of nodes.
Using the convention that height = number of nodes on the longest root-to-leaf path, let N(h) denote the minimum number of nodes in an AVL tree of height h.
- N(1) = 1 (a single node, which is both root and leaf).
- N(2) = 2 (a root with one child).
- For h ≥ 3: An AVL tree of height h has a root node whose two subtrees have heights that differ by at most 1. To minimize nodes, one subtree has height h − 1 and the other has height h − 2. Therefore:
N(h) = 1 + N(h − 1) + N(h − 2), for h ≥ 3.
Step 2: Relate N(h) to Fibonacci numbers.
Define F(h) = N(h) + 1. Then:
- F(1) = N(1) + 1 = 2
- F(2) = N(2) + 1 = 3
- For h ≥ 3: F(h) = N(h) + 1 = 1 + N(h−1) + N(h−2) + 1 = (N(h−1) + 1) + (N(h−2) + 1) = F(h−1) + F(h−2).
So F(h) satisfies the Fibonacci recurrence. Using the standard Fibonacci numbering where Fib(1) = 1, Fib(2) = 1, Fib(3) = 2, Fib(4) = 3, Fib(5) = 5, ..., we verify:
- F(1) = 2 = Fib(3) ✓
- F(2) = 3 = Fib(4) ✓
- F(h) = F(h−1) + F(h−2) matches Fib(h+2) = Fib(h+1) + Fib(h) ✓
Therefore F(h) = Fib(h + 2), and so:
N(h) = Fib(h + 2) − 1.
Step 3: Lower bound on Fibonacci numbers.
By the closed-form formula (Binet's formula):
Fib(k) = (φ^k − ψ^k) / √5
where φ = (1 + √5)/2 ≈ 1.618 and ψ = (1 − √5)/2 ≈ −0.618.
Since |ψ| < 1, we have |ψ^k| < 1 for all k ≥ 1, and thus |ψ^k / √5| < 1/√5 < 1/2 for all k ≥ 1. Therefore:
Fib(k) > φ^k / √5 − 1/2, for all k ≥ 1.
Step 4: Derive the height bound.
Since any AVL tree with n nodes and height h satisfies n ≥ N(h):
n ≥ N(h) = Fib(h + 2) − 1
So:
n + 1 ≥ Fib(h + 2) > φ^(h+2) / √5 − 1/2
Therefore:
n + 3/2 > φ^(h+2) / √5
Multiplying both sides by √5:
√5 · (n + 3/2) > φ^(h+2)
Since n + 3/2 ≤ n + 2, we can write the slightly looser but cleaner bound:
√5 · (n + 2) > φ^(h+2)
Taking log base 2 of both sides:
log₂(√5) + log₂(n + 2) > (h + 2) · log₂(φ)
Solving for h:
h + 2 < [log₂(n + 2) + log₂(√5)] / log₂(φ)
h < (1/log₂φ) · log₂(n + 2) + log₂(√5)/log₂(φ) − 2
Step 5: Compute the constants.
- φ = (1 + √5)/2 ≈ 1.6180
- log₂(φ) ≈ 0.6942
- c = 1/log₂(φ) = log_φ(2) ≈ 1.4404
- log₂(√5) = (1/2)log₂(5) ≈ 1.1610
- log₂(√5)/log₂(φ) = log_φ(√5) ≈ 1.6722
- b = log_φ(√5) − 2 ≈ −0.3277
Step 6: Final result.
$$\boxed{h \leq \frac{1}{\log_2 \varphi} \cdot \log_2(n + 2) + (\log_\varphi\!\sqrt{5} - 2) \approx 1.4404 \cdot \log_2(n + 2) - 0.3277}$$
where c = 1/log₂(φ) ≈ 1.4404, k = 2, and b = log_φ(√5) − 2 ≈ −0.3277.
Verification with small cases:
- h = 1, N(1) = 1, n ≥ 1: bound gives h ≤ 1.4404 · log₂(3) − 0.328 ≈ 1.4404 · 1.585 − 0.328 ≈ 1.955. So h = 1 ≤ 1.955 ✓
- h = 2, N(2) = 2, n ≥ 2: bound gives h ≤ 1.4404 · log₂(4) − 0.328 = 1.4404 · 2 − 0.328 = 2.553. So h = 2 ≤ 2.553 ✓
- h = 3, N(3) = 4, n ≥ 4: bound gives h ≤ 1.4404 · log₂(6) − 0.328 ≈ 1.4404 · 2.585 − 0.328 ≈ 3.396. So h = 3 ≤ 3.396 ✓
- h = 7, N(7) = 32, n ≥ 32: bound gives h ≤ 1.4404 · log₂(34) − 0.328 ≈ 1.4404 · 5.087 − 0.328 ≈ 6.998. So h = 7 ≤ 6.998? Let me check: N(7) = Fib(9) − 1 = 34 − 1 = 33. So n ≥ 33: bound gives 1.4404 · log₂(35) − 0.328 ≈ 1.4404 · 5.129 − 0.328 ≈ 7.058. So h = 7 ≤ 7.058 ✓
This confirms the bound is tight (close to being achieved by worst-case AVL trees), which demonstrates it is indeed a strong upper bound.
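The recurrence and the final bound can also be checked numerically; the following is a quick sanity script, not part of the proof. It computes N(h) from the Step 1 recurrence and verifies the strict inequality h < 1.4404 · log₂(n + 2) − 0.3277 at the tight case n = N(h):

```python
import math

def min_nodes(h_max):
    """N(h): minimum nodes in an AVL tree of height h.
    N(1) = 1, N(2) = 2, N(h) = 1 + N(h-1) + N(h-2)."""
    N = [0, 1, 2]  # N[0] is unused padding
    for h in range(3, h_max + 1):
        N.append(1 + N[h - 1] + N[h - 2])
    return N

phi = (1 + math.sqrt(5)) / 2           # golden ratio
c = 1 / math.log2(phi)                 # ≈ 1.4404
b = math.log(math.sqrt(5), phi) - 2    # ≈ -0.3277

N = min_nodes(30)
for h in range(1, 31):
    n = N[h]                           # fewest nodes achieving height h
    bound = c * math.log2(n + 2) + b
    assert h < bound, (h, n, bound)    # strict inequality from Step 4
```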
---
Grading Review
Problem 1: Radix Trees [20 points]
Part 1: Formal Definition
Score: 3/3 points
Approach: The student defines a radix tree using binary branching, a boolean membership flag, implicit keys via edge labels, and a no-redundant-leaves constraint.
Strengths:
- The definition is clean, formal, and complete
- The member boolean is equivalent to the reference solution's grey/black node distinction
- The "no redundant leaves" constraint correctly captures that internal non-member nodes exist only as path intermediaries
- The formal condition at the end (bit string w ∈ S iff the corresponding node exists and has member = TRUE) is a nice touch
Errors/Gaps: None.
Detailed Feedback: This is a fully correct and arguably cleaner formulation than the reference solution's grey/black node approach. Both capture the same structure — the student's boolean member flag is isomorphic to the reference's node coloring. All essential properties are stated.
Rubric Breakdown:
- Tree structure (binary, rooted): 1/1
- Membership/key representation: 1/1
- Edge labeling and implicit key property: 1/1
---
Part 2: Pseudocode for Search and Insert
Score: 10/10 points
Approach: Standard traversal-based search and insert, following bits left (0) or right (1), with node creation on insert.
Strengths:
- Both algorithms are correct and clearly written
- The search correctly returns FALSE when hitting NIL (missing path) and returns node.member after full traversal
- The insert correctly creates intermediate nodes with member = FALSE and sets member = TRUE at the terminal node
- Time complexity justification is precise: exactly |w| iterations, O(1) work per iteration, Θ(|w|) total
- The additional note about no backtracking strengthens the Θ (not just O) argument
Errors/Gaps: None.
Detailed Feedback: Both algorithms match the reference solution in correctness and logic. The student's pseudocode is clean and uses consistent notation. The Θ(|w|) justification correctly argues both the upper and lower bound (exactly |w| iterations with Θ(1) work each).
Rubric Breakdown:
- RADIX-TREE-SEARCH correctness: 4/4
- RADIX-TREE-INSERT correctness: 4/4
- Time complexity justification (Θ(|w|) for both): 2/2
---
Part 3: Sorting Bit Strings
Score: 7/7 points
Approach: Insert all strings into a radix tree, then perform pre-order DFS visiting 0-children before 1-children.
Strengths:
- Algorithm is correct and includes pseudocode (going beyond the reference)
- Correctness argument explicitly addresses both lexicographic rules: (i) 0 < 1 ordering and (ii) prefix ordering via pre-order traversal
- Time analysis is thorough: Θ(n) for insertion, O(n) for traversal (at most n+1 nodes), Θ(n) total
- Notes the Ω(n) lower bound to justify Θ(n) rather than just O(n)
- Correctly handles the subtlety of string copying cost during traversal
Errors/Gaps: None.
Detailed Feedback: This is an excellent answer that exceeds the reference solution in detail. The correctness argument mapping to both lexicographic rules is well-structured. The time analysis correctly bounds the number of tree nodes by n+1 and accounts for the output cost. The Ω(n) lower bound argument (must read all input bits) completes the Θ(n) claim.
Rubric Breakdown:
- Algorithm description (insert + traverse): 3/3
- Correctness of lexicographic ordering: 2/2
- Time complexity justification (Θ(n)): 2/2
---
Problem 2: Multisets [20 Points]
Part 1: Data Structure Design
Score: 10/10 points
Approach: Augmented AVL tree keyed by element value, with count (multiplicity) and subtree_total (sum of counts in subtree) fields.
Strengths:
- Data structure is well-defined with clear node structure and invariant
- All five operations are correctly described with proper time complexity
- INSERT correctly handles both found (increment count) and not-found (new node) cases
- DELETE correctly handles count > 1 (decrement) vs. count = 1 (remove node)
- GETCOUNTS≥ has clean pseudocode with correct accumulation logic
- Correctly addresses subtree_total maintenance during AVL rotations (O(1) per rotation)
Errors/Gaps: None.
Detailed Feedback: The student's AVL tree approach is equivalent to the reference solution's Red-Black tree approach — both are balanced BSTs with the same augmentation. The GETCOUNTS≥ algorithm is particularly well-done, with clear pseudocode showing exactly how to accumulate counts along a root-to-leaf path. The invariant is explicitly stated, which adds rigor.
Rubric Breakdown:
- Data structure design (node fields, invariant): 3/3
- CREATE, INSERT, GETCOUNT, DELETE implementations: 4/4
- GETCOUNTS≥ implementation with justification: 3/3
---
Part 2: Extended Data Structure
Score: 10/10 points
Approach: Second augmented AVL tree ("count tree") keyed by count values, with frequency and subtree_freq fields.
Strengths:
- The two-tree design matches the reference solution's approach
- Count tree node structure is clearly defined with proper invariant
- INSERT and DELETE correctly maintain both trees, including transitions between count values
- WITHCOUNT is a simple search in the count tree — correct
- FREQUENT uses the same accumulation technique as GETCOUNTS≥, adapted for the count tree — correct with pseudocode
- Correctly notes that the count tree has at most n nodes
- Space analysis (O(n) total) is provided
Errors/Gaps: None.
Detailed Feedback: This is a thorough and correct solution. The student correctly identifies that INSERT and DELETE must update both trees, and carefully describes the old_c → new_c transitions (decrement old count's frequency, increment new count's frequency, handle creation/deletion of count tree nodes). The FREQUENT operation elegantly mirrors GETCOUNTS≥ on the count tree. All time complexities are correctly justified as O(log n).
Rubric Breakdown:
- Count tree design (structure, invariant): 3/3
- Modified INSERT/DELETE maintaining both trees: 3/3
- WITHCOUNT and FREQUENT implementations: 3/3
- Time complexity justification: 1/1
---
Problem 3: AVL Trees: Height & Size [10 Points]
Score: 10/10 points
Approach: Derive the tight bound h ≤ 1.4404 · log₂(n+2) − 0.3277 via Fibonacci number analysis.
Strengths:
- Step 1: Recurrence N(h) = 1 + N(h−1) + N(h−2) is correctly established with proper base cases
- Step 2: The substitution F(h) = N(h) + 1 cleanly yields the Fibonacci recurrence, and N(h) = Fib(h+2) − 1 is proven by induction with verified base cases
- Step 3: Binet's formula is correctly applied to get Fib(k) > φ^k/√5 − 1/2
- Step 4: The chain of inequalities is correct: n + 3/2 > φ^(h+2)/√5, relaxed to n + 2 for cleanliness
- Step 5: All constants are correctly computed
- Step 6: Verification with small cases is excellent, and the student catches and corrects their own arithmetic error (N(7) = 33, not 32), demonstrating intellectual honesty
Errors/Gaps:
- Very minor: The student uses ≤ in the final claim while the derivation yields strict inequality (<). This is technically correct (a weaker but still valid statement since h is an integer), so no deduction.
Detailed Feedback: This is an exemplary proof that matches the reference solution's result exactly: c ≈ 1.4404, k = 2, b ≈ −0.3277. Every step is carefully justified. The self-correction during verification (initially writing N(7)=32, then catching that Fib(9)=34 so N(7)=33) actually strengthens the submission by showing careful checking. The verification with small cases demonstrates the bound is tight, which is a valuable addition beyond the reference solution.
Rubric Breakdown:
- Recurrence and base cases: 2/2
- Fibonacci connection (N(h) = Fib(h+2) − 1): 2/2
- Fibonacci lower bound via Binet's formula: 2/2
- Derivation of final bound with correct constants: 3/3
- Presentation and rigor: 1/1
---
Overall Summary
| Problem | Score | Notes |
|---|---|---|
| 1.1 (Formal Definition) | 3/3 | Clean, correct formal definition |
| 1.2 (Search/Insert) | 10/10 | Correct pseudocode with solid Θ(\|w\|) justification |
| 1.3 (Sorting) | 7/7 | Excellent — includes pseudocode, correctness proof, and Ω(n) lower bound |
| 2.1 (Multiset DS) | 10/10 | Well-designed augmented AVL tree with clear invariant |
| 2.2 (Extended DS) | 10/10 | Correct two-tree approach with thorough operation descriptions |
| 3 (AVL Height) | 10/10 | Complete, rigorous proof matching the optimal bound |
| Total | 50/50 | |
General Comments:
This is an outstanding submission. Every problem is solved correctly with thorough justification. The student demonstrates strong command of data structure design, algorithm analysis, and mathematical proof techniques. Notable strengths include: providing pseudocode even when not strictly required (Problem 1.3), explicitly stating invariants (Problem 2), self-correcting during verification (Problem 3), and arguing both upper and lower bounds for Θ claims. The solutions are equivalent to or exceed the reference solutions in detail and clarity.
Total: 50/50
diff --git a/writeup.md b/writeup.md
index 7d45f57..8e9c01b 100644
--- a/writeup.md
+++ b/writeup.md
@@ -2,8 +2,348 @@
## Student Responses
-<!-- Write your solutions below. Add sections for each problem as needed. -->
-<!-- Use proper mathematical notation where applicable. -->
+---
-[Your solutions here]
+## Problem 1: Radix Trees [20 points]
+### Part 1: Formal Definition [3 points]
+
+A **radix tree** for storing a set S of bit strings is a rooted tree T with the following properties:
+
+1. **Binary branching:** Each node v has at most two children: a left child (corresponding to bit 0) and a right child (corresponding to bit 1).
+
+2. **Membership flag:** Each node v has a boolean attribute v.member, which is TRUE if and only if the bit string represented by the path from the root to v belongs to S.
+
+3. **Implicit keys:** The key (bit string) associated with a node v is not stored explicitly. Instead, it is determined by the sequence of edge labels (0 for left, 1 for right) on the unique path from the root to v. The root represents the empty string ε.
+
+4. **No redundant leaves:** Every leaf node has v.member = TRUE. Internal nodes may have v.member = TRUE or FALSE. (A node with member = FALSE exists only because it lies on the path to some descendant with member = TRUE.)
+
+Formally, a bit string w = w₁w₂...wₖ belongs to S if and only if the node reached by starting at the root and following child w₁, then child w₂, ..., then child wₖ exists and has member = TRUE.
+
+---
+
+### Part 2: Pseudocode for Search and Insert [10 points]
+
+**RADIX-TREE-SEARCH(T, w):**
+
+```
+RADIX-TREE-SEARCH(T, w):
+ // w = w[1]w[2]...w[m] is a bit string of length m = |w|
+ node = T.root
+ for i = 1 to |w|:
+ if w[i] == 0:
+ node = node.left
+ else:
+ node = node.right
+ if node == NIL:
+ return FALSE
+ return node.member
+```
+
+**Time complexity justification:** The algorithm performs exactly |w| iterations of the for-loop. Each iteration does O(1) work: one comparison and one pointer traversal. The final check of node.member is O(1). Therefore, the total running time is **Θ(|w|)**.
+
+**RADIX-TREE-INSERT(T, w):**
+
+```
+RADIX-TREE-INSERT(T, w):
+ // w = w[1]w[2]...w[m] is a bit string of length m = |w|
+ node = T.root
+ for i = 1 to |w|:
+ if w[i] == 0:
+ if node.left == NIL:
+ node.left = new Node(member = FALSE, left = NIL, right = NIL)
+ node = node.left
+ else:
+ if node.right == NIL:
+ node.right = new Node(member = FALSE, left = NIL, right = NIL)
+ node = node.right
+ node.member = TRUE
+```
+
+**Time complexity justification:** The algorithm performs exactly |w| iterations of the for-loop. Each iteration does O(1) work: one comparison, possibly one node creation (which is O(1)), and one pointer traversal. Setting node.member = TRUE at the end is O(1). Therefore, the total running time is **Θ(|w|)**.
+
+Note: Both algorithms traverse exactly the path from the root to the node at depth |w|, performing constant work per level. Neither algorithm ever backtracks or visits additional nodes. Hence both run in Θ(|w|) time.
+
+---
+
+### Part 3: Sorting Bit Strings Using a Radix Tree [7 points]
+
+**Algorithm:**
+
+Given a set S of distinct bit strings whose lengths sum to n:
+
+1. **Build the radix tree:** Create an empty radix tree T (just a root with member = FALSE). For each string w ∈ S, call RADIX-TREE-INSERT(T, w).
+
+2. **Traverse the tree in pre-order, visiting 0-children before 1-children:** Perform a depth-first traversal starting from the root. At each node, first check if the node is a member (output its string if so), then recursively visit the left child (edge labeled 0), then recursively visit the right child (edge labeled 1). Maintain the current path string by appending the edge label when descending and removing it when backtracking.
+
+```
+RADIX-TREE-SORT(T):
+ result = empty list
+ path = empty string
+ TRAVERSE(T.root, path, result)
+ return result
+
+TRAVERSE(node, path, result):
+    if node == NIL:
+        return
+    if node.member == TRUE:
+        append a copy of path to result
+    append "0" to path
+    TRAVERSE(node.left, path, result)
+    remove the last character of path
+    append "1" to path
+    TRAVERSE(node.right, path, result)
+    remove the last character of path
+```
+
+**Correctness (lexicographic order):**
+
+The traversal produces strings in lexicographic order because:
+
+- **Rule (ii) — prefixes first:** Since we visit a node before its descendants (pre-order), if a string s is a prefix of a string s', then s is output before s'. This satisfies rule (ii) of lexicographic ordering: a shorter string that is a prefix of a longer string comes first.
+
+- **Rule (i) — 0 before 1:** At every node, we visit the left subtree (bit 0) before the right subtree (bit 1). This means that among two strings that first differ at position j, the one with bit 0 at position j is output first. Since 0 < 1, this satisfies rule (i).
+
+**Time complexity justification — Θ(n):**
+
+- **Insertion phase:** Each string wᵢ takes Θ(|wᵢ|) time to insert (as shown in Part 2). The total insertion time is Θ(Σᵢ |wᵢ|) = Θ(n).
+
+- **Traversal phase:** The traversal visits each node in the tree exactly once. The total number of nodes in the tree is at most n + 1 (the root plus at most one node created per bit across all insertions; shared prefixes only reduce this count). At each node we do O(1) work: check membership and append or remove a single bit of the current path when descending and backtracking. Outputting a member string copies the current path, and the total copying cost across all output strings is Σᵢ |wᵢ| = n. Thus the traversal takes O(number of nodes) + O(n) = O(n) time.
+
+- **Total time: Θ(n) + O(n) = Θ(n).**
+
+The lower bound Ω(n) holds because we must read all n bits of input.
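The whole Part 3 algorithm can be sketched in Python. The `radix_tree_sort` name and the mutable-list representation of the current path are illustrative choices; the important points are that the 0-child is visited strictly before the 1-child and that descending/backtracking costs O(1):

```python
# Illustrative Python version of the Part 3 sort (RADIX-TREE-SORT).
class Node:
    def __init__(self):
        self.member = False
        self.child = [None, None]      # child[0] = bit 0, child[1] = bit 1

def insert(root, w):
    node = root
    for bit in w:
        b = int(bit)
        if node.child[b] is None:
            node.child[b] = Node()
        node = node.child[b]
    node.member = True

def radix_tree_sort(strings):
    """Sort distinct bit strings lexicographically in Theta(n) total time."""
    root = Node()
    for w in strings:                  # insertion phase: Theta(sum of lengths)
        insert(root, w)
    result, path = [], []

    def traverse(node):
        if node is None:
            return
        if node.member:
            result.append("".join(path))   # total copying = sum of lengths
        for b in (0, 1):               # 0-child strictly before 1-child
            path.append(str(b))
            traverse(node.child[b])
            path.pop()                 # backtrack in O(1)

    traverse(root)
    return result
```

On the example set from the figure, `radix_tree_sort(["1011", "10", "011", "100", "0"])` returns `["0", "011", "10", "100", "1011"]`, which is the lexicographic order.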
+
+---
+
+## Problem 2: Multisets [20 Points]
+
+### Part 1: Data Structure Design [10 points]
+
+**Data Structure:** An **augmented AVL tree** (keyed by element value) where each node stores additional information about the multiplicity of the element and the total count in its subtree.
+
+**Node structure:**
+- `key`: the element k
+- `count`: the number of occurrences of k in the multiset (multiplicity)
+- `subtree_total`: the sum of `count` values for all nodes in the subtree rooted at this node
+- `height`: the height of the subtree (for AVL balancing)
+- `left`, `right`: pointers to children
+
+The tree is ordered by key values (using the linear ordering of elements). We define `subtree_total(NIL) = 0`.
+
+**Invariant:** For every node v: `v.subtree_total = v.count + v.left.subtree_total + v.right.subtree_total`.
+
+**Operations:**
+
+**CREATE(S) — O(1):**
+Set S.root = NIL. This creates an empty multiset in O(1) time.
+
+**INSERT(S, k) — O(log n):**
+Search the AVL tree for key k.
+- *If k is found* at node v: Increment v.count by 1. Walk back up to the root, incrementing `subtree_total` by 1 for every ancestor of v (including v itself). No structural change to the tree. Time: O(log n) for the search + O(log n) for the ancestor updates = O(log n).
+- *If k is not found:* Insert a new node with key = k, count = 1, subtree_total = 1. Perform standard AVL rebalancing (rotations). After each rotation, update `subtree_total` for the O(1) affected nodes. Update `subtree_total` for all ancestors along the insertion path. Time: O(log n).
+
+Note: During AVL rotations, `subtree_total` can be recomputed in O(1) per rotation since each rotated node's `subtree_total` is just the sum of its children's `subtree_total` values plus its own `count`.
+
+**GETCOUNT(S, k) — O(log n):**
+Search the AVL tree for key k. If found at node v, return v.count. If not found, return 0. Time: O(log n) for the search.
+
+**DELETE(S, k) — O(log n):**
+Search the AVL tree for key k.
+- *If k is not found:* Do nothing (or report error).
+- *If k is found* at node v and v.count > 1: Decrement v.count by 1. Walk back up to the root, decrementing `subtree_total` by 1 for every ancestor (including v). Time: O(log n).
+- *If k is found* at node v and v.count = 1: Remove the node using standard AVL deletion (with rebalancing). Update `subtree_total` for all affected ancestors. Time: O(log n), using the known fact that AVL deletion takes O(log n) time. The `subtree_total` updates along the path and during rotations add only O(log n) overhead.
+
+**GETCOUNTS≥(S, k) — O(log n):**
+Return the total count of all elements ≥ k. Traverse from the root to a leaf, accumulating counts:
+
+```
+GETCOUNTS≥(S, k):
+ result = 0
+ node = S.root
+ while node ≠ NIL:
+ if node.key == k:
+ return result + node.count + node.right.subtree_total
+ else if node.key > k:
+ result = result + node.count + node.right.subtree_total
+ node = node.left
+ else: // node.key < k
+ node = node.right
+ return result
+```
+
+**Justification:** At each step, we follow a single root-to-leaf path. When we encounter a node with key > k, all elements in its right subtree are also > k, so we add the node's count and its right subtree's total. When key < k, we go right. When key == k, we add the node's count and its right subtree's total, and we are done. Since the AVL tree has height O(log n), this traversal visits O(log n) nodes, doing O(1) work per node. Total time: **O(log n)**.
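A small Python sketch of GETCOUNTS≥ over the augmented node layout. The tree is built by hand and AVL rebalancing is omitted for brevity (balancing matters only for guaranteeing O(log n) height, not for the accumulation logic); names such as `MNode` are illustrative:

```python
# Sketch of GETCOUNTS>= on an augmented BST; rebalancing omitted for brevity.
def subtotal(node):
    return 0 if node is None else node.subtree_total   # subtree_total(NIL) = 0

class MNode:
    def __init__(self, key, count, left=None, right=None):
        self.key, self.count = key, count
        self.left, self.right = left, right
        self.subtree_total = count + subtotal(left) + subtotal(right)

def getcounts_ge(root, k):
    """Total count of all elements >= k, following one root-to-leaf path."""
    result, node = 0, root
    while node is not None:
        if node.key == k:
            return result + node.count + subtotal(node.right)
        elif node.key > k:                 # everything here and to the right is >= k
            result += node.count + subtotal(node.right)
            node = node.left
        else:                              # node.key < k: discard left side
            node = node.right
    return result

# The multiset {a, b, b, b, c, c, d} from the problem statement:
root = MNode("b", 3, MNode("a", 1), MNode("c", 2, right=MNode("d", 1)))
```

Here `getcounts_ge(root, "b")` accumulates b's count (3) plus the right subtree total (c and d: 3), giving 6, which matches counting by hand.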
+
+---
+
+### Part 2: Extended Data Structure [10 points]
+
+**Additional Data Structure:** We maintain a **second augmented AVL tree** (the "count tree"), keyed by count values, alongside the primary tree from Part 1.
+
+**Count tree node structure:**
+- `key`: a count value c (a positive integer)
+- `frequency`: the number of unique elements in the multiset that have exactly this count
+- `subtree_freq`: the sum of `frequency` values for all nodes in the subtree
+- `height`, `left`, `right`: for AVL balancing
+
+**Invariant:** For every node u in the count tree: `u.subtree_freq = u.frequency + u.left.subtree_freq + u.right.subtree_freq`, with `subtree_freq(NIL) = 0`.
+
+The count tree has one node per distinct count value. Since the multiset has at most n unique elements, there are at most n distinct count values, so the count tree has at most n nodes and operations on it take O(log n) time.
+
+**Modified operations (INSERT and DELETE now also update the count tree):**
+
+**INSERT(S, k) — O(log n):**
+1. Search the primary tree for k.
+2. If k is found with current count = old_c:
+ - In the count tree, find node with key = old_c and decrement its frequency. If frequency becomes 0, delete this node from the count tree.
+ - Set new_c = old_c + 1. In the count tree, find node with key = new_c. If found, increment its frequency. If not found, insert a new node with key = new_c, frequency = 1.
+ - In the primary tree, set k's count to new_c and update subtree_total values.
+3. If k is not found:
+ - Insert k into the primary tree with count = 1.
+ - In the count tree, find node with key = 1. If found, increment its frequency. If not found, insert (key = 1, frequency = 1).
+4. Update subtree_freq values in the count tree along affected paths.
+5. **Total: O(log n)** for the primary tree operations + O(log n) for the count tree operations = O(log n).
+
+**DELETE(S, k) — O(log n):**
+1. Search the primary tree for k. If not found, do nothing.
+2. If found with current count = old_c:
+ - In the count tree, find node with key = old_c and decrement its frequency. If frequency becomes 0, delete this node.
+ - If old_c > 1: set new_c = old_c − 1. In the count tree, find or insert node with key = new_c and increment its frequency. Update k's count in the primary tree.
+ - If old_c = 1: remove k from the primary tree entirely.
+3. Update subtree_freq values in the count tree.
+4. **Total: O(log n).**
+
+**New operations:**
+
+**WITHCOUNT(S, c) — O(log n):**
+Search the count tree for key = c. If found at node u, return u.frequency. If not found, return 0. Time: O(log n) for the search.
+
+**FREQUENT(S, c) — O(log n):**
+Return the number of unique elements with count ≥ c. This is analogous to GETCOUNTS≥ but on the count tree, using subtree_freq instead of subtree_total:
+
+```
+FREQUENT(S, c):
+ result = 0
+ node = count_tree.root
+ while node ≠ NIL:
+ if node.key == c:
+ return result + node.frequency + node.right.subtree_freq
+ else if node.key > c:
+ result = result + node.frequency + node.right.subtree_freq
+ node = node.left
+ else: // node.key < c
+ node = node.right
+ return result
+```
+
+**Justification:** Identical reasoning to GETCOUNTS≥ from Part 1. We traverse a single root-to-leaf path in the count tree (height O(log n)), doing O(1) work per node. When a node has key ≥ c, its right subtree also has keys ≥ c, so we accumulate frequency and the right child's subtree_freq. **Total time: O(log n).**
+
+**All operations from Part 1 (CREATE, GETCOUNT, GETCOUNTS≥) are unchanged** and still run in their specified time complexities.
+
+**Space:** O(n) for the primary tree + O(n) for the count tree = O(n) total, where n is the number of unique elements.
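As a semantics check (deliberately not an O(log n) implementation), WITHCOUNT and FREQUENT can be computed by brute force from a plain dictionary of counts; this is exactly the behavior the count tree must reproduce. The function names are illustrative:

```python
# Brute-force reference semantics for WITHCOUNT and FREQUENT.
from collections import Counter

def withcount(counts, c):
    """Number of unique elements whose count is exactly c."""
    return sum(1 for v in counts.values() if v == c)

def frequent(counts, c):
    """Number of unique elements whose count is >= c."""
    return sum(1 for v in counts.values() if v >= c)

# The multiset S = {a, b, b, b, c, c, d} from the problem statement:
counts = Counter("abbbccd")   # {'a': 1, 'b': 3, 'c': 2, 'd': 1}
```

For this S, `withcount(counts, 1)` is 2 (elements a and d) and `frequent(counts, 2)` is 2 (elements b and c), matching what the count tree would return.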
+
+---
+
+## Problem 3: AVL Trees: Height & Size [10 Points]
+
+**Claim:** An AVL tree with n nodes has height at most
+
+$$h \leq \frac{1}{\log_2 \varphi} \cdot \log_2(n + 2) + \left(\log_\varphi \sqrt{5} - 2\right)$$
+
+where φ = (1 + √5)/2 is the golden ratio. Numerically, this gives:
+
+**h ≤ 1.4404 · log₂(n + 2) − 0.3277**
+
+with c = 1/log₂(φ) ≈ 1.4404, k = 2, b = log_φ(√5) − 2 ≈ −0.3277.
+
+**Proof:**
+
+**Step 1: Recurrence for minimum number of nodes.**
+
+Using the convention that height = number of nodes on the longest root-to-leaf path, let N(h) denote the minimum number of nodes in an AVL tree of height h.
+
+- N(1) = 1 (a single node, which is both root and leaf).
+- N(2) = 2 (a root with one child).
+- For h ≥ 3: An AVL tree of height h has a root node whose two subtrees have heights that differ by at most 1. To minimize nodes, one subtree has height h − 1 and the other has height h − 2. Therefore:
+
+ N(h) = 1 + N(h − 1) + N(h − 2), for h ≥ 3.
+
+**Step 2: Relate N(h) to Fibonacci numbers.**
+
+Define F(h) = N(h) + 1. Then:
+- F(1) = N(1) + 1 = 2
+- F(2) = N(2) + 1 = 3
+- For h ≥ 3: F(h) = N(h) + 1 = 1 + N(h−1) + N(h−2) + 1 = (N(h−1) + 1) + (N(h−2) + 1) = F(h−1) + F(h−2).
+
+So F(h) satisfies the Fibonacci recurrence. Using the standard Fibonacci numbering where Fib(1) = 1, Fib(2) = 1, Fib(3) = 2, Fib(4) = 3, Fib(5) = 5, ..., we verify:
+
+- F(1) = 2 = Fib(3) ✓
+- F(2) = 3 = Fib(4) ✓
+- F(h) = F(h−1) + F(h−2) matches Fib(h+2) = Fib(h+1) + Fib(h) ✓
+
+Therefore **F(h) = Fib(h + 2)**, and so:
+
+**N(h) = Fib(h + 2) − 1.**
+
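The identity N(h) = Fib(h + 2) − 1 can be checked numerically with a short script. The function names are illustrative; both recurrences are computed iteratively:

```python
# Check N(h) = Fib(h + 2) - 1, where N(1) = 1, N(2) = 2,
# and N(h) = 1 + N(h-1) + N(h-2) for h >= 3.
def min_nodes(h):
    if h == 1:
        return 1
    if h == 2:
        return 2
    a, b = 1, 2                  # N(h-2), N(h-1)
    for _ in range(h - 2):
        a, b = b, 1 + b + a
    return b

def fib(k):
    """Standard numbering: Fib(1) = Fib(2) = 1."""
    a, b = 0, 1
    for _ in range(k - 1):
        a, b = b, a + b
    return b

assert all(min_nodes(h) == fib(h + 2) - 1 for h in range(1, 40))
```

For instance `min_nodes(7)` is 33 and `fib(9)` is 34, consistent with N(7) = Fib(9) − 1.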
+**Step 3: Lower bound on Fibonacci numbers.**
+
+By the closed-form formula (Binet's formula):
+
+Fib(k) = (φ^k − ψ^k) / √5
+
+where φ = (1 + √5)/2 ≈ 1.618 and ψ = (1 − √5)/2 ≈ −0.618.
+
+Since |ψ| < 1, we have |ψ^k| < 1 for all k ≥ 1, and thus |ψ^k / √5| < 1/√5 < 1/2 for all k ≥ 1. Therefore:
+
+**Fib(k) > φ^k / √5 − 1/2, for all k ≥ 1.**
+
+**Step 4: Derive the height bound.**
+
+Since any AVL tree with n nodes and height h satisfies n ≥ N(h):
+
+n ≥ N(h) = Fib(h + 2) − 1
+
+So:
+
+n + 1 ≥ Fib(h + 2) > φ^(h+2) / √5 − 1/2
+
+Therefore:
+
+n + 3/2 > φ^(h+2) / √5
+
+Multiplying both sides by √5:
+
+√5 · (n + 3/2) > φ^(h+2)
+
+Since n + 3/2 ≤ n + 2, we can write the slightly looser but cleaner bound:
+
+√5 · (n + 2) > φ^(h+2)
+
+Taking log base 2 of both sides:
+
+log₂(√5) + log₂(n + 2) > (h + 2) · log₂(φ)
+
+Solving for h:
+
+h + 2 < [log₂(n + 2) + log₂(√5)] / log₂(φ)
+
+h < (1/log₂φ) · log₂(n + 2) + log₂(√5)/log₂(φ) − 2
+
+**Step 5: Compute the constants.**
+
+- φ = (1 + √5)/2 ≈ 1.6180
+- log₂(φ) ≈ 0.6942
+- c = 1/log₂(φ) = log_φ(2) ≈ 1.4404
+- log₂(√5) = (1/2)log₂(5) ≈ 1.1610
+- log₂(√5)/log₂(φ) = log_φ(√5) ≈ 1.6722
+- b = log_φ(√5) − 2 ≈ −0.3277
+
+**Step 6: Final result.**
+
+$$\boxed{h \leq \frac{1}{\log_2 \varphi} \cdot \log_2(n + 2) + (\log_\varphi\!\sqrt{5} - 2) \approx 1.4404 \cdot \log_2(n + 2) - 0.3277}$$
+
+where c = 1/log₂(φ) ≈ 1.4404, k = 2, and b = log_φ(√5) − 2 ≈ −0.3277.
+
+**Verification with small cases:**
+- h = 1, N(1) = 1, n ≥ 1: bound gives h ≤ 1.4404 · log₂(3) − 0.328 ≈ 1.4404 · 1.585 − 0.328 ≈ 1.955. So h = 1 ≤ 1.955 ✓
+- h = 2, N(2) = 2, n ≥ 2: bound gives h ≤ 1.4404 · log₂(4) − 0.328 = 1.4404 · 2 − 0.328 = 2.553. So h = 2 ≤ 2.553 ✓
+- h = 3, N(3) = 4, n ≥ 4: bound gives h ≤ 1.4404 · log₂(6) − 0.328 ≈ 1.4404 · 2.585 − 0.328 ≈ 3.396. So h = 3 ≤ 3.396 ✓
+- h = 7, N(7) = Fib(9) − 1 = 34 − 1 = 33, n ≥ 33: bound gives h ≤ 1.4404 · log₂(35) − 0.328 ≈ 1.4404 · 5.129 − 0.328 ≈ 7.058. So h = 7 ≤ 7.058 ✓
+
+The h = 7 case shows that worst-case (minimum-size) AVL trees come within about 0.06 of the bound, so the bound is essentially tight.
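The constants and the final bound can also be verified numerically. This sketch evaluates the bound at the worst case n = N(h) for each height; variable names mirror the text:

```python
# Numeric check of c, b, and of h < c*log2(n + 2) + b at the worst case n = N(h).
import math

phi = (1 + math.sqrt(5)) / 2               # golden ratio
c = 1 / math.log2(phi)                     # ~ 1.4404
b = math.log(math.sqrt(5), phi) - 2        # log_phi(sqrt(5)) - 2 ~ -0.3277

def min_nodes(h):
    """Minimum number of nodes in an AVL tree of height h (Step 1 recurrence)."""
    if h <= 2:
        return h                           # N(1) = 1, N(2) = 2
    x, y = 1, 2
    for _ in range(h - 2):
        x, y = y, 1 + y + x
    return y

for h in range(1, 31):
    n = min_nodes(h)                       # fewest nodes achievable at height h
    assert h < c * math.log2(n + 2) + b    # the bound holds, strictly
```

The loop passes for every height checked, and the margin at h = 7 (n = 33) is about 0.058, matching the verification above.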