Agent Work: Homework2
Claude Sonnet 4.6 · COMP 382: Reasoning about Algorithms
COMP 382: Reasoning about Algorithms
Homework 2
Total Points: 100
---
Problem 1: Persistent Dynamic Sets [25 points]
During the course of an algorithm, you sometimes find that you need to maintain past versions of a dynamic set as it is updated. We call such a set persistent. One way to implement a persistent set is to copy the entire set whenever it is modified, but this approach can slow down a program and also consume a lot of space. Sometimes, you can do much better.
Consider a persistent set S with the operations INSERT, DELETE, and SEARCH, which you implement using binary search trees.
Maintain a separate root for every version of the set. In order to insert a key (e.g., key 5) into the set, create a new node with that key. This node becomes the child of a new copy of its parent, and so on up to the root. Thus, you copy only part of the tree and share some of the nodes with the original tree.
Assume that each tree node has the attributes key, left, and right but no parent.
Part (a)
For a persistent binary search tree (not a red-black tree, just a binary search tree), identify the nodes that need to change to insert or delete a node.
Part (b)
Write a procedure PERSISTENT-TREE-INSERT(T, z) that, given a persistent binary search tree T and a node z to insert, returns a new persistent tree T' that is the result of inserting z into T. Assume that you have a procedure COPY-NODE(x) that makes a copy of node x, including all of its attributes.
Part (c)
If the height of the persistent binary search tree T is h, what are the time and space requirements of your implementation of PERSISTENT-TREE-INSERT? (The space requirement is proportional to the number of nodes that are copied.)
Part (d)
Suppose that you include the parent attribute in each node. In this case, the PERSISTENT-TREE-INSERT procedure needs to perform additional copying. Prove that PERSISTENT-TREE-INSERT then requires Ω(n) time and space, where n is the number of nodes in the tree.
Part (e)
Show how to use red-black trees to guarantee that the worst-case running time and space are O(log n) per insertion. You may assume that all keys are distinct.
---
Problem 2: Join Operation on Red-Black Trees [35 points]
The join operation takes two dynamic sets S₁ and S₂ and an element x such that for any x₁ ∈ S₁ and x₂ ∈ S₂, we have x₁.key ≤ x.key ≤ x₂.key. It returns a set S = S₁ ∪ {x} ∪ S₂.
In this problem, we investigate how to implement the join operation on red-black trees.
Part (a)
Given a red-black tree T, let us store its black-height as the new attribute T.bh. Argue that RB-INSERT can maintain the bh attribute without requiring extra storage in the nodes of the tree and without increasing the asymptotic running times. Show how to determine the black-height of each node visited while descending through T, using O(1) time per node visited.
Let T₁ and T₂ be red-black trees and x be a key value such that for any nodes x₁ in T₁ and x₂ in T₂, we have x₁.key ≤ x.key ≤ x₂.key. You will show how to implement the operation RB-JOIN(T₁, x, T₂), which destroys T₁ and T₂ and returns a red-black tree T = T₁ ∪ {x} ∪ T₂. Let n be the total number of nodes in T₁ and T₂.
Part (b)
Assume that T₁.bh ≥ T₂.bh. Describe an O(log n) time algorithm that finds a black node y in T₁ with the largest key from among those nodes whose black-tree height is T₂.bh.
Part (c)
Let Tᵧ be the subtree rooted at y. Describe how T = Tᵧ ∪ {x} ∪ T₂ can replace Tᵧ in O(1) time without destroying the binary-search-tree property.
Part (d)
What color should we make x so that red-black properties 1, 3, and 5 are maintained? Describe how to enforce properties 2 and 4 in O(log n) time.
Part (e)
Argue that no generality is lost by making the assumption in part (b). Describe the symmetric situation that arises when T₁.bh ≤ T₂.bh.
Part (f)
Show that the running time of RB-JOIN is O(log n).
---
Problem 3: Joining and Splitting 2-3-4 Trees [40 Points]
The join operation takes two dynamic sets S' and S'' and an element x such that x'.key < x.key < x''.key for any x' ∈ S' and x'' ∈ S''. It returns a set S = S' ∪ {x} ∪ S''.
The split operation is like an "inverse" join: given a dynamic set S and an element x ∈ S, it creates a set S' that consists of all elements in S − {x} whose keys are less than x.key and a set S'' that consists of all elements in S − {x} whose keys are greater than x.key.
In this problem, we investigate how to implement these operations on 2-3-4 trees (B-trees with t = 2). We assume for convenience that elements consist only of keys and that all key values are distinct.
Part (a)
Show how to maintain, for every node x of a 2-3-4 tree, the height of the subtree rooted at x as an attribute x.height. Make sure that your implementation does not affect the asymptotic running times of searching and insertion.
Part (b)
Show how to implement the join operation. Given two 2-3-4 trees T' and T'' and a key k, the join operation should run in O(1 + |h' − h''|) time, where h' and h'' are the heights of T' and T'', respectively.
Part (c)
Consider the simple path p from the root of a 2-3-4 tree T to a given key k, the set S' of keys in T that are less than k, and the set S'' of keys in T that are greater than k. Show that p breaks S' into a set of trees {T'₀, T'₁, ..., T'ₘ} and a set of keys {k'₁, k'₂, ..., k'ₘ} such that y < k'ᵢ < z for i = 1, 2, ..., m and any keys y ∈ T'ᵢ₋₁ and z ∈ T'ᵢ. What is the relationship between the heights of T'ᵢ₋₁ and T'ᵢ? Describe how p breaks S'' into sets of trees and keys.
Part (d)
Show how to implement the split operation on T. Use the join operation to assemble the keys in S' into a single 2-3-4 tree T' and the keys in S'' into a single 2-3-4 tree T''. The running time of the split operation should be O(log n), where n is the number of keys in T. (Hint: The costs for joining should telescope.)
# COMP382 Homework 2
## Student Responses
<!-- Write your solutions below. Add sections for each problem as needed. -->
<!-- Use proper mathematical notation where applicable. -->
[Your solutions here]
COMP382 Homework 2
Student Responses
---
Problem 1: Persistent Dynamic Sets [25 points]
Part (a): Nodes That Change During Insert or Delete
For a persistent binary search tree (no parent pointers), the nodes that need to change during an INSERT or DELETE are exactly the nodes on the path from the root to the point of modification.
INSERT: To insert a new key z into BST T, we traverse from the root down to the insertion point (the NIL slot where z will be attached). Along this path, every node must have its left or right child pointer updated to point to the new copy of its child. Therefore, all nodes on the root-to-insertion-point path must be copied (their child pointers change).
DELETE: For a node with 0 or 1 child, every node on the path from the root to the deleted node must be copied (since their child pointers change). For a node with 2 children, we typically replace it with its in-order successor (or predecessor); in that case, every node on both the root-to-deleted-node path and the root-to-successor path must change. In any case, it is the nodes on the root-to-modification path that change.
All nodes not on this path can be shared between the old and new versions of the tree without modification.
---
Part (b): PERSISTENT-TREE-INSERT(T, z)
PERSISTENT-TREE-INSERT(T, z)
1 if T.root == NIL
2 T'.root = z
3 return T'
4 new-root = COPY-NODE(T.root)
5 trail = new-root // trail tracks the new copy of the current node
6 x = T.root // x tracks the original node
7 while TRUE
8 if z.key < x.key
9 if x.left == NIL
10 trail.left = z
11 break
12 else
13 new-node = COPY-NODE(x.left)
14 trail.left = new-node
15 x = x.left
16 trail = new-node
17 else // z.key > x.key (assuming distinct keys)
18 if x.right == NIL
19 trail.right = z
20 break
21 else
22 new-node = COPY-NODE(x.right)
23 trail.right = new-node
24 x = x.right
25 trail = new-node
26 T'.root = new-root
27 return T'Explanation: Starting at the root, we simultaneously walk the original tree (via x) and build the new spine of copies (via trail). At each step, we copy the next node on the insertion path and link it into the new spine. The subtrees branching off the insertion path are *not* copied—they are shared between T and T'. When we reach the insertion point, we attach the new node z. The new tree T' has its own root (new-root) but shares all nodes not on the insertion path with T.
---
Part (c): Time and Space Requirements
If the height of persistent BST T is h:
- Time: O(h). We traverse the tree from root to the insertion point, visiting at most h + 1 nodes. At each step we perform O(1) work (COPY-NODE + pointer updates).
- Space: O(h). We copy exactly the nodes on the path from root to the insertion point, which is at most h + 1 nodes. All other nodes are shared with the original tree.
---
Part (d): With Parent Attribute, PERSISTENT-TREE-INSERT Requires Ω(n) Time and Space
Claim: If nodes include a parent attribute, PERSISTENT-TREE-INSERT requires Ω(n) time and space.
Proof: Consider a complete binary tree T with n nodes and height h = Θ(log n). Suppose we call PERSISTENT-TREE-INSERT to insert node z, and z becomes a new leaf at depth h.
When we copy any node v on the insertion path (of length h + 1), we create a copy v'. The copy v' replaces v in the new tree. Now consider v's *sibling* s (the other child of v's parent). In the *original* tree T, s.parent points to v's original parent p. In the *new* tree T', v's parent has also been copied (to p'), so we need s.parent to point to p' (not p). But s is a node in the persistent tree T—we cannot modify it without breaking T. Therefore, we must create a copy s' of s, setting s'.parent = p'.
However, s' is now a newly created node. All of s's children still have their parent pointers pointing to s (the original), not to s'. To fix their parent pointers, we must copy each of s's children too, and in turn each of their children, and so on recursively.
More precisely: at each depth d (0 ≤ d ≤ h − 1), the path contains 1 node (which we copy). Its sibling at depth d has a subtree of size 2^(h−d) − 1 (in a complete binary tree). The sibling's parent pointer must be fixed (requiring a copy), and that copy's children's parent pointers must be fixed, etc. The entire sibling subtree must be copied.
The total number of nodes copied is:
- h + 1 nodes on the insertion path, plus
- At each depth d (0 ≤ d ≤ h−1), a sibling subtree of size 2^(h−d) − 1.
Summing the sibling subtrees: $$\sum_{d=0}^{h-1} (2^{h-d} - 1) = \sum_{j=1}^{h} (2^j - 1) = (2^{h+1} - 2) - h = \Theta(n)$$
(since 2^{h+1} - 2 = n + (n-1) - 1 = Θ(n) for a complete binary tree).
Therefore, PERSISTENT-TREE-INSERT must copy Ω(n) nodes, requiring Ω(n) time and space. ∎
---
Part (e): Using Red-Black Trees to Guarantee O(log n) per Insertion
We can guarantee O(log n) time and space per persistent insertion by using red-black trees (assuming all keys are distinct).
Key properties that enable this:
1. Height is O(log n): A red-black tree with n nodes has height h ≤ 2 lg(n + 1), so the insertion path copies O(log n) nodes.
2. At most O(log n) recolorings, O(1) rotations: During RB-INSERT-FIXUP, fixup proceeds in three cases: - *Case 1* (uncle is red): Recolor the parent, uncle, and grandparent (O(1) nodes); the current pointer moves up 2 levels. This can repeat O(log n) times but each iteration touches O(1) nodes. - *Cases 2 & 3* (uncle is black): Perform O(1) rotations; the loop terminates after at most one rotation.
3. Counting nodes that must be copied: Without parent pointers, we only copy nodes whose attributes (key, left-pointer, right-pointer, or color) change. These are: - All nodes on the root-to-insertion path: O(log n) nodes (their child pointers change). - In Case 1 of fixup: the uncle (off-path) is recolored. We must copy the uncle: O(1) extra copy per Case-1 iteration, for O(log n) total uncle copies. - In Cases 2 & 3: The rotations involve O(1) nodes, all already on or adjacent to the path.
Total: O(log n) node copies.
Persistent RB-INSERT procedure:
- Walk down the tree (root to insertion point), copying each node on the path: O(log n) copies.
- Attach the new (red) node z.
- Run a modified RB-INSERT-FIXUP on the new tree, using COPY-NODE whenever a node outside the current path must be changed (e.g., the uncle in Case 1). Each recoloring or rotation produces a new copy rather than modifying in place.
- The root of the new tree is the copy of the original root (with updated pointers/colors).
Since the total number of copies is O(log n) and each copy takes O(1) time, both the time and space requirements are O(log n) per insertion.
---
Problem 2: Join Operation on Red-Black Trees [35 points]
Part (a): Maintaining the bh Attribute
Claim: RB-INSERT can maintain T.bh (the black-height of the tree, stored at the root) without extra storage in individual nodes and without increasing asymptotic running times.
Maintaining T.bh during RB-INSERT: The black-height of a red-black tree only changes when a new root is created with a higher black-height. This happens precisely when RB-INSERT-FIXUP propagates Case 1 all the way up to the root, after which the root (forced to be black) has its color changed from red to black. In this situation, T.bh increases by 1. In all other cases (the loop terminates early via Cases 2/3, or the root was already black), T.bh remains unchanged.
Implementation: At the end of RB-INSERT, if the root's color was changed from red to black (line T.root.color = BLACK at the end of RB-INSERT), increment T.bh by 1. This requires O(1) extra work and O(1) extra storage (just the T.bh field).
Determining bh of each node during descent: We use T.bh at the root and update locally as we descend. The recurrence is:
> bh(x) = bh(x.parent) − (1 if x is black, else 0)
More precisely, as we descend from node x to child c:
if c.color == BLACK:
bh(c) = bh(x) - 1
else: // c.color == RED
bh(c) = bh(x)This computes bh at each visited node in O(1) time per node.
---
Part (b): Finding Black Node y in T₁ with Largest Key at Black-Height T₂.bh
Assumption: T₁.bh ≥ T₂.bh. We want the black node y in T₁ with the *largest key* among all nodes whose black-height equals T₂.bh.
Algorithm: Descend rightward from the root of T₁, tracking the current black-height, until we reach the first black node at black-height T₂.bh.
FIND-RIGHT-BLACK-NODE(T₁, b)
1 x = T₁.root
2 bh = T₁.bh
3 while not (x.color == BLACK and bh == b)
4 x = x.right // always go right (maximizes key)
5 if x.color == BLACK
6 bh = bh - 1
7 return xCorrectness: We greedily go right to maximize the key value. The black-height decreases by 1 each time we step into a black node. Since T₁.bh ≥ b, we will encounter every integer value from T₁.bh down to 0 along this path; in particular, we will encounter a black node with bh = b before reaching NIL.
Once we find a black node y with bh = b, any black node with bh = b in T₁ to the right of y would have to be a descendant of y's right subtree. But the right children of y have bh ≤ b − 1 (black right child) or bh = b (red right child, but then its children are black with bh = b − 1). So no black node with bh = b lies strictly to the right of y in the tree order. Hence y has the largest key among all black nodes at black-height b.
Time: O(depth of y) = O(log n), since the tree has height O(log n).
---
Part (c): Replacing Tᵧ with T = Tᵧ ∪ {x} ∪ T₂
Let y be the black node in T₁ found in Part (b), with bh(y) = T₂.bh. Let Tᵧ denote the subtree of T₁ rooted at y. Let p = y.parent in T₁.
Construction of the new subtree: Create a new node x (with the join key). Set:
- x.left = y (root of Tᵧ)
- x.right = T₂.root
Replacing Tᵧ: Update p's child pointer to x:
- If y was p's left child: set p.left = x
- If y was p's right child: set p.right = x
(If y was the root of T₁, then x becomes the new root.)
BST property is maintained:
- All keys in Tᵧ ≤ x.key (given, since all keys in T₁ ≤ x.key).
- All keys in T₂ ≥ x.key (given).
- Keys in T₁ above y (ancestors of y) are still correctly ordered relative to x and x's descendants, since we only replaced the subtree rooted at y with x's subtree.
This replacement takes O(1) time.
---
Part (d): Color of x and Enforcing Properties 2 and 4
Color x RED. Here is why this maintains properties 1, 3, and 5, but may violate properties 2 and 4:
Property 5 (black-height consistency): With x colored red:
- bh(x) must equal both bh(x.left) + [1 if x.left is black] and bh(x.right) + [1 if x.right is black].
- x.left = y (black node with bh(y) = T₂.bh), so contribution from left: bh(y) + 1 = T₂.bh + 1.
- x.right = T₂.root (black, since property 2 of T₂), so contribution from right: bh(T₂.root) + 1 = T₂.bh + 1.
- Both sides equal T₂.bh + 1. ✓ Property 5 is maintained.
Properties 1 and 3: Trivially maintained (x is colored red, no leaf/NIL nodes changed). ✓
Potential violations:
- Property 2 (root is black): If y was the root of T₁ (so T₁ = Tᵧ), then x is the new root and x is red. This violates property 2.
- Property 4 (no red node has a red child): If p (x's parent) is red, then we have a red–red violation at (p, x).
Enforcing properties 2 and 4 in O(log n) time: Run RB-INSERT-FIXUP starting at x. This procedure handles exactly these two violations:
- It fixes property 4 violations by rotations and recolorings, propagating up the tree at most O(log n) levels.
- At the end, it forces the root to black (fixing any property 2 violation).
RB-INSERT-FIXUP runs in O(log n) time (it processes O(log n) iterations of its loop, each taking O(1) time).
---
Part (e): Symmetry When T₁.bh ≤ T₂.bh
No generality is lost by assuming T₁.bh ≥ T₂.bh, because the roles of T₁ and T₂ can be swapped symmetrically.
Symmetric situation (T₁.bh ≤ T₂.bh):
Find a black node y in T₂ with the *smallest key* among all nodes in T₂ with black-height T₁.bh. To find such a node, descend leftward from the root of T₂ (to minimize key value), stopping at the first black node with bh = T₁.bh:
FIND-LEFT-BLACK-NODE(T₂, b)
1 x = T₂.root
2 bh = T₂.bh
3 while not (x.color == BLACK and bh == b)
4 x = x.left
5 if x.color == BLACK
6 bh = bh - 1
7 return xLet Tᵧ be the subtree of T₂ rooted at y. Create new node x with:
- x.left = T₁.root
- x.right = y
Replace y with x in T₂ (at y's parent), color x RED. The reasoning for why property 5 holds is identical:
- bh(T₁.root) = T₁.bh, and T₁.root is black, so left contribution = T₁.bh + 1.
- bh(y) = T₁.bh, and y is black, so right contribution = T₁.bh + 1. ✓
Then run RB-INSERT-FIXUP from x to restore properties 2 and 4.
---
Part (f): Running Time of RB-JOIN is O(log n)
Let n be the total number of nodes in T₁ and T₂.
1. Determine bh attributes: Both T₁.bh and T₂.bh are stored as attributes; comparing them takes O(1).
2. Finding node y (Part (b) or symmetric Part (e)): Descend O(T₁.bh) or O(T₂.bh) levels to find y. Since both black-heights are O(log n), this takes O(log n).
3. Constructing and replacing Tᵧ (Part (c)): Creating the new node x and updating one pointer takes O(1).
4. Coloring x and running RB-INSERT-FIXUP (Part (d)): RB-INSERT-FIXUP performs at most O(log n) iterations (it moves the "current node" up by 2 levels per Case-1 iteration), each O(1), plus O(1) rotations at the end. Total: O(log n).
Total running time: O(log n) + O(1) + O(log n) = O(log n). ∎
---
Problem 3: Joining and Splitting 2-3-4 Trees [40 points]
Part (a): Maintaining the Height Attribute
For every node x in a 2-3-4 tree, define x.height as:
- x.height = 0 if x is a leaf node.
- x.height = x.child[1].height + 1 if x is an internal node.
(Since all leaves in a 2-3-4 tree are at the same depth, all children of an internal node have the same height, so this is well-defined.)
Maintenance during Search: Read-only; no changes required.
Maintenance during Insert: During B-tree insertion (with the "split on the way down" variant or the "split when full" variant), the only structural changes are:
- Node splits: When a node x splits into two nodes x' and x'', both new nodes have the same height as x (they're at the same depth). The median key goes up to x's parent p. The parent p gains one more child pointer but its height does not change (since p's children are still at the same depth). We set x'.height = x.height and x''.height = x.height (no change needed for the existing nodes' heights).
- New root creation: When the root splits, a new root r is created with two children (the two halves of the old root). Set r.height = old root's height + 1.
In all cases, height updates require O(1) work per node modified, which does not affect the O(log n) asymptotic running time of search or insertion.
---
Part (b): Implementing the Join Operation
Given: 2-3-4 trees T' and T'' and key k such that all keys in T' are less than k and all keys in T'' are greater than k. Let h' = T'.root.height and h'' = T''.root.height.
Goal: Return a 2-3-4 tree T = T' ∪ {k} ∪ T'' in O(1 + |h' − h''|) time.
Algorithm (WLOG assume h' ≥ h''; the case h' < h'' is symmetric):
Case h' = h'': Create a new 2-node (root) r with key k, left child T'.root, right child T''.root, and height h' + 1. Return the tree rooted at r. O(1) time.
Case h' > h'': We need to attach T'' at the correct height in T'.
1. Find attachment point: Walk right in T' from the root, always going to the *rightmost child* of the current node, for exactly (h' − h'') steps. Let y be the node reached; y.height = h''. Let p be y's parent.
2. Create a new 2-node: Create a node z with key k, left child y, right child T''.root, and height h'' + 1.
3. Insert z into p: In 2-3-4 tree terms, we are inserting the key k (with right subtree T''.root) into node p at the position immediately after y (replacing the pointer to y with a pointer to z, and adding key k to p with the right subtree pointer going to T''.root). This "insertion" adds one key/pointer pair to p.
4. Fixup: If p now has 4 keys (overflowed to 5 children), split p and push the median key up to p's parent. Repeat this splitting process upward. Splits can cascade at most up the path from p to the root, which has length h' − h''. Each split takes O(1) time, so the fixup takes O(h' − h'') time.
Total time: O(h' − h'') for finding y + O(h' − h'') for fixup = O(1 + |h' − h''|).
Height update: During splits, each split creates a sibling with the same height as the split node, which is correct. If a new root is created (root split during fixup), its height is set to h' + 1.
---
Part (c): How Path p Breaks S' and S'' into Trees and Keys
Let T be a 2-3-4 tree with root height h, and let k be a given key (either in T or a query key). Let p be the simple path from the root to the node containing k (or the leaf position where k would be inserted).
Breaking S' = {keys in T less than k}:
As we walk down path p, at each node v on the path (say v has keys v.k₁ < v.k₂ < ... < v.kₘ and children v.c₁, v.c₂, ..., v.c_{m+1}), suppose the path continues through child v.c_i. Then:
- Children v.c₁, ..., v.c_{i−1} are subtrees entirely in S' (their ranges are below v.k_{i−1} < k).
- Keys v.k₁, ..., v.k_{i−1} are in S' (they are less than k).
Collecting these top-to-bottom along the path, S' is broken into:
- Trees: T'₀, T'₁, ..., T'ₘ (left-branching subtrees and possibly a leaf, ordered by key value)
- Separator keys: k'₁, k'₂, ..., k'ₘ (the keys on path nodes that are less than k)
such that all keys in T'ᵢ₋₁ are less than k'ᵢ, and k'ᵢ is less than all keys in T'ᵢ.
Height relationship between consecutive T'ᵢ: Since the path visits one node per level (one for each depth 0, 1, ..., h), and the left-branching subtrees at depth d all have height h − d − 1, the heights satisfy:
$$h(T'_0) \geq h(T'_1) \geq \cdots \geq h(T'_m)$$
The heights are non-increasing (weakly decreasing). Two consecutive T'ᵢ have the *same* height only if they both branch off from the same node on the path (i.e., the path-node's left-branching subtrees are enumerated left to right at the same depth). Adjacent trees from different depths have strictly decreasing heights (specifically, h(T'ᵢ) decreases by at least 1 each time the depth increases).
Breaking S'' = {keys in T greater than k}:
Symmetrically, at each path node v, the right-branching children (v.c_{i+1}, ..., v.c_{m+1}) and keys (v.k_i, ..., v.k_m) contribute to S''. This yields:
- Trees T''₀, T''₁, ..., T''ₗ (right-branching subtrees, ordered by key value: T''₀ has the smallest keys in S'', T''ₗ has the largest)
- Separator keys k''₁, ..., k''ₗ
The heights of T''₀, T''₁, ..., T''ₗ are non-decreasing: $$h(T''_0) \leq h(T''_1) \leq \cdots \leq h(T''_l)$$
because T''₀ (the leftmost subtrees of S'', closest to k) come from the deepest part of the path (small height), and T''ₗ (farthest from k) comes from the shallowest part of the path (large height).
---
Part (d): Implementing Split in O(log n) Time
Algorithm: Given a 2-3-4 tree T with n keys and key k ∈ T:
1. Traverse path p from the root to the node containing k: O(log n) time.
2. Extract the separator trees and keys for S' and S'' from path p (as described in Part (c)). - Remove key k itself from its node. - S' is represented by (T'₀, k'₁, T'₁, k'₂, ..., T'ₘ) - S'' is represented by (T''₀, k''₁, T''₁, k''₂, ..., T''ₗ)
3. Assemble T' using join operations from right to left (inside-out):
Start with R = T'ₘ. For i = m, m−1, ..., 1: $$R \leftarrow \text{JOIN}(T'_{i-1},\; k'_i,\; R)$$
The final R is T'.
4. Assemble T'' using join operations from left to right:
Start with R = T''₀. For i = 1, 2, ..., l: $$R \leftarrow \text{JOIN}(R,\; k''_i,\; T''_i)$$
The final R is T''.
Running time analysis via telescoping:
For assembling T': By Part (c), the heights satisfy h(T'₀) ≥ h(T'₁) ≥ ... ≥ h(T'ₘ). When we join T'ᵢ₋₁ with the right portion R (which has accumulated to height approximately h(T'ᵢ₋₁) after previous joins), the cost of the i-th join is:
$$\text{cost}_i = O(1 + |h(T'_{i-1}) - h(R)|)$$
After joining T'ᵢ₋₁ with R, the result has height at most h(T'ᵢ₋₁) + 1. Therefore, for the next join involving T'ᵢ₋₂:
$$|h(T'_{i-2}) - h(\text{new }R)| \leq h(T'_{i-2}) - h(T'_{i-1}) + 1$$
Summing the costs: $$\sum_{i=1}^{m} \text{cost}_i = O(m) + O\!\left(\sum_{i=1}^{m} (h(T'_{i-1}) - h(T'_i) + 1)\right)$$
The height differences telescope: $$\sum_{i=1}^{m} (h(T'_{i-1}) - h(T'_i)) = h(T'_0) - h(T'_m) \leq h' - 0 = h'$$
and $m \leq 3h'$ (since each of the h' nodes on the path contributes at most 3 left-branching children). Thus:
$$\text{total cost for assembling } T' = O(m) + O(h') = O(h') = O(\log n)$$
The same telescoping argument applies to assembling T'', giving O(log n).
Total running time of SPLIT:
- Path traversal: O(log n)
- Assembling T': O(log n)
- Assembling T'': O(log n)
- Total: O(log n) ∎
Now let me provide a thorough grading of this submission.
---
Problem 1: Persistent Dynamic Sets [25 points]
Score: 25/25 points
Assessment:
Approach: The student addresses all five parts with detailed, correct reasoning. They provide more thorough explanations than the reference solution in several places.
Strengths:
- Part (a): Goes beyond the reference by correctly analyzing the delete case with 2 children (successor path must also be copied). This is more complete than the reference.
- Part (b): The pseudocode is clean, correct, and well-commented. The dual-pointer technique (x for original, trail for new copy) is clear and correct. Correctly handles the empty tree case and shares non-path subtrees.
- Part (c): Correct O(h) time and space with clear justification.
- Part (d): Outstanding proof. Uses a complete binary tree as a concrete example and provides a rigorous summation showing Ω(n) total copies: ∑(2^(h-d) − 1) = Θ(n). This is significantly more rigorous than the reference's brief induction sketch.
- Part (e): Excellent analysis of RB-INSERT-FIXUP cases, correctly counting O(log n) node copies including uncle nodes in Case 1. The observation about using a stack instead of parent pointers is correct.
Errors/Gaps:
- None of substance.
Detailed Feedback: This is an exemplary solution for Problem 1. Every part is correct, well-justified, and in several cases (particularly parts (d) and (e)) substantially more detailed and rigorous than the reference solution. The Ω(n) proof in part (d) with the explicit summation is particularly impressive. The persistent RB-INSERT description in part (e) correctly identifies all the nodes that need copying during fixup and explains why the total remains O(log n).
Rubric Breakdown:
- Part (a): 5/5 - Correct identification of path nodes; bonus detail on delete cases
- Part (b): 8/8 - Correct, clear pseudocode with proper sharing of subtrees
- Part (c): 4/4 - Correct O(h) time and O(h) space
- Part (d): 4/4 - Rigorous Ω(n) proof with explicit summation
- Part (e): 4/4 - Correct use of RB-trees with detailed case analysis
---
Problem 2: Join Operation on Red-Black Trees [35 points]
Score: 35/35 points
Assessment:
Approach: The student provides thorough, correct solutions for all six parts with detailed justifications for each step.
Strengths:
- Part (a): Correctly explains that T.bh only changes when the root's color flips from red to black at the end of RB-INSERT. The descent recurrence bh(c) = bh(x) − [1 if c is black] is stated clearly and correctly.
- Part (b): The FIND-RIGHT-BLACK-NODE algorithm is correct and includes a good correctness argument explaining why y has the largest key among black nodes at the target black-height.
- Part (c): Correctly describes the O(1) replacement operation, updating the parent's child pointer, and argues BST property preservation.
- Part (d): Correctly colors x red, carefully verifies property 5 (both subtrees contribute T₂.bh + 1), identifies exactly which properties may be violated (2 and 4), and proposes RB-INSERT-FIXUP.
- Part (e): Correctly describes the symmetric case with the FIND-LEFT-BLACK-NODE algorithm descending the left spine of T₂, and re-verifies property 5.
- Part (f): Clean O(log n) analysis breaking down each step.
Errors/Gaps:
- None of substance.
Detailed Feedback: This is a complete, correct solution to all parts of Problem 2. The student demonstrates deep understanding of red-black tree properties throughout. The correctness argument in part (b) — explaining why no black node with the target black-height lies to the right of y — is a nice touch not present in the reference. The verification of property 5 in part (d) is explicit and rigorous, directly computing the black-height contributions from both children. The symmetric case in part (e) is fully worked out rather than merely stated.
Rubric Breakdown:
- Part (a): 6/6 - Correct bh maintenance and O(1) per-node descent computation
- Part (b): 6/6 - Correct algorithm with correctness proof and O(log n) analysis
- Part (c): 6/6 - Correct O(1) replacement with BST property argument
- Part (d): 7/7 - Correct coloring, property verification, and fixup in O(log n)
- Part (e): 5/5 - Complete symmetric case description
- Part (f): 5/5 - Correct O(log n) total running time analysis
---
Problem 3: Joining and Splitting 2-3-4 Trees [40 points]
Score: 39/40 points
Assessment:
Approach: The student provides detailed solutions for all four parts, with correct algorithms, complexity analyses, and a well-executed telescoping argument for the split operation.
Strengths:
- Part (a): Clear definition of height (0 for leaves, child.height + 1 for internal), correct analysis of height maintenance during splits and root creation, properly argues O(1) work per modification.
- Part (b): Correctly handles both cases (h' = h'' and h' > h''), identifies the attachment point, handles cascading splits, and achieves the target O(1 + |h' − h''|) time.
- Part (c): Thorough decomposition of S' and S'' with explicit description of which subtrees and keys are extracted at each path node. Correctly identifies that heights of S' trees are non-increasing and S'' trees are non-decreasing (under the student's chosen ordering convention).
- Part (d): Excellent implementation of split using inside-out join assembly. The telescoping argument is rigorous: height differences sum to h(T'₀) − h(T'ₘ) ≤ h', and m ≤ 3h', giving O(log n) total.
Errors/Gaps:
- Part (b): The description of the join operation mixes two different conceptual frameworks. The student first describes creating a 2-node z with left child y and right child T''.root, then says "In 2-3-4 tree terms" and describes the correct operation of adding key k to node p. The literal description of "replacing the pointer to y with a pointer to z" would insert an extra level in the tree, which would violate the balanced-leaf property. The "In 2-3-4 tree terms" clarification is correct (adding k as a key to p with T''.root as an additional child), but the initial description is misleading. Minor deduction (−1 point) for this ambiguous presentation.
Detailed Feedback: This is an excellent solution overall. Parts (a), (c), and (d) are essentially flawless. The telescoping argument in part (d) is particularly well-executed — the student correctly tracks how the accumulated tree height grows after each join and shows the total cost telescopes to O(h') = O(log n), with the additional O(m) = O(log n) term from the constant cost per join properly accounted for.
The only issue is in part (b), where the student describes creating an intermediate 2-node z and then "replacing the pointer to y with a pointer to z" in p. In a 2-3-4 tree, this would add an extra internal level, breaking the property that all leaves are at the same depth. The student's own clarification ("In 2-3-4 tree terms, we are inserting the key k... into node p... adding key k to p with the right subtree pointer going to T''.root") is the correct operation, but the two descriptions are contradictory and a reader could be confused about which is intended. The time complexity analysis is correct regardless.
Rubric Breakdown:
- Part (a): 10/10 - Correct height definition and maintenance during search/insert
- Part (b): 9/10 - Correct algorithm and complexity, but ambiguous description mixing BST-style intermediate node creation with 2-3-4 tree key insertion (−1)
- Part (c): 10/10 - Thorough path decomposition with correct height relationships
- Part (d): 10/10 - Correct split algorithm with rigorous telescoping argument
---
Overall Summary
| Problem | Score | Notes |
|---|---|---|
| 1 - Persistent Dynamic Sets | 25/25 | Exemplary; more rigorous than reference in parts (d) and (e) |
| 2 - Join on Red-Black Trees | 35/35 | Complete and correct throughout; excellent property verification |
| 3 - Joining/Splitting 2-3-4 Trees | 39/40 | Excellent; minor ambiguity in join description (part b) |
| Total | 99/100 |
General Comments:
This is an outstanding submission demonstrating deep understanding of persistent data structures, red-black trees, and B-tree variants. The student consistently goes beyond the reference solution in rigor — the Ω(n) proof with explicit summation (Problem 1d), the correctness argument for the rightmost black node (Problem 2b), the careful property-by-property verification (Problem 2d), and the detailed telescoping argument (Problem 3d) are all stronger than the reference. The writing is clear, well-organized, and mathematically precise throughout. The only deduction is for a slightly confusing description in Problem 3(b) where two contradictory framings of the join operation are presented, though the correct one is identified. Overall, this reflects excellent mastery of the material.
Total: 99/100
diff --git a/writeup.md b/writeup.md
index 0ea5554..6323061 100644
--- a/writeup.md
+++ b/writeup.md
@@ -2,8 +2,399 @@
## Student Responses
-<!-- Write your solutions below. Add sections for each problem as needed. -->
-<!-- Use proper mathematical notation where applicable. -->
+---
-[Your solutions here]
+## Problem 1: Persistent Dynamic Sets [25 points]
+### Part (a): Nodes That Change During Insert or Delete
+
+For a persistent binary search tree (no parent pointers), the nodes that need to change during an INSERT or DELETE are exactly the nodes on the **path from the root to the point of modification**.
+
+**INSERT:** To insert a new key z into BST T, we traverse from the root down to the insertion point (the NIL slot where z will be attached). Along this path, every node must have its left or right child pointer updated to point to the new copy of its child. Therefore, all nodes on the root-to-insertion-point path must be copied (their child pointers change).
+
+**DELETE:** For a node with 0 or 1 child, every node on the path from the root to the deleted node must be copied (since their child pointers change). For a node with 2 children, we typically replace it with its in-order successor (or predecessor); in that case, every node on both the root-to-deleted-node path and the root-to-successor path must change. In any case, it is the nodes on the root-to-modification path that change.
+
+All nodes **not** on this path can be **shared** between the old and new versions of the tree without modification.
+
+---
+
+### Part (b): PERSISTENT-TREE-INSERT(T, z)
+
+```
+PERSISTENT-TREE-INSERT(T, z)
+ 1 if T.root == NIL
+ 2 T'.root = z
+ 3 return T'
+ 4 new-root = COPY-NODE(T.root)
+ 5 trail = new-root // trail tracks the new copy of the current node
+ 6 x = T.root // x tracks the original node
+ 7 while TRUE
+ 8 if z.key < x.key
+ 9 if x.left == NIL
+10 trail.left = z
+11 break
+12 else
+13 new-node = COPY-NODE(x.left)
+14 trail.left = new-node
+15 x = x.left
+16 trail = new-node
+17 else // z.key > x.key (assuming distinct keys)
+18 if x.right == NIL
+19 trail.right = z
+20 break
+21 else
+22 new-node = COPY-NODE(x.right)
+23 trail.right = new-node
+24 x = x.right
+25 trail = new-node
+26 T'.root = new-root
+27 return T'
+```
+
+**Explanation:** Starting at the root, we simultaneously walk the original tree (via `x`) and build the new spine of copies (via `trail`). At each step, we copy the next node on the insertion path and link it into the new spine. The subtrees branching off the insertion path are *not* copied—they are shared between T and T'. When we reach the insertion point, we attach the new node z. The new tree T' has its own root (`new-root`) but shares all nodes not on the insertion path with T.
+
+---
+
+### Part (c): Time and Space Requirements
+
+If the height of persistent BST T is h:
+
+- **Time: O(h)**. We traverse the tree from root to the insertion point, visiting at most h + 1 nodes. At each step we perform O(1) work (COPY-NODE + pointer updates).
+
+- **Space: O(h)**. We copy exactly the nodes on the path from root to the insertion point, which is at most h + 1 nodes. All other nodes are shared with the original tree.
+
+---
+
+### Part (d): With Parent Attribute, PERSISTENT-TREE-INSERT Requires Ω(n) Time and Space
+
+**Claim:** If nodes include a `parent` attribute, PERSISTENT-TREE-INSERT requires Ω(n) time and space.
+
+**Proof:** Consider a complete binary tree T with n nodes and height h = Θ(log n). Suppose we call PERSISTENT-TREE-INSERT to insert node z, and z becomes a new leaf at depth h.
+
+When we copy any node v on the insertion path (of length h + 1), we create a copy v'. The copy v' replaces v in the new tree. Now consider v's *sibling* s (the other child of v's parent). In the *original* tree T, s.parent points to v's original parent p. In the *new* tree T', v's parent has also been copied (to p'), so we need s.parent to point to p' (not p). But s is a node in the persistent tree T—we cannot modify it without breaking T. Therefore, we must create a copy s' of s, setting s'.parent = p'.
+
+However, s' is now a newly created node. All of s's children still have their parent pointers pointing to s (the original), not to s'. To fix their parent pointers, we must copy each of s's children too, and in turn each of their children, and so on recursively.
+
+More precisely: at each depth d (0 ≤ d ≤ h − 1), the path contains 1 node (which we copy). Its sibling at depth d has a subtree of size 2^(h−d) − 1 (in a complete binary tree). The sibling's parent pointer must be fixed (requiring a copy), and that copy's children's parent pointers must be fixed, etc. The entire sibling subtree must be copied.
+
+The total number of nodes copied is:
+- h + 1 nodes on the insertion path, plus
+- At each depth d (0 ≤ d ≤ h−1), a sibling subtree of size 2^(h−d) − 1.
+
+Summing the sibling subtrees:
+$$\sum_{d=0}^{h-1} (2^{h-d} - 1) = \sum_{j=1}^{h} (2^j - 1) = (2^{h+1} - 2) - h = \Theta(n)$$
+
+(since 2^{h+1} - 2 = n + (n-1) - 1 = Θ(n) for a complete binary tree).
+
+Therefore, PERSISTENT-TREE-INSERT must copy Ω(n) nodes, requiring Ω(n) time and space. ∎
+
+---
+
+### Part (e): Using Red-Black Trees to Guarantee O(log n) per Insertion
+
+We can guarantee O(log n) time and space per persistent insertion by using **red-black trees** (assuming all keys are distinct).
+
+**Key properties that enable this:**
+
+1. **Height is O(log n):** A red-black tree with n nodes has height h ≤ 2 lg(n + 1), so the insertion path copies O(log n) nodes.
+
+2. **At most O(log n) recolorings, O(1) rotations:** During RB-INSERT-FIXUP, fixup proceeds in three cases:
+ - *Case 1* (uncle is red): Recolor the parent, uncle, and grandparent (O(1) nodes); the current pointer moves up 2 levels. This can repeat O(log n) times but each iteration touches O(1) nodes.
+ - *Cases 2 & 3* (uncle is black): Perform O(1) rotations; the loop terminates after at most one rotation.
+
+3. **Counting nodes that must be copied:** Without parent pointers, we only copy nodes whose attributes (key, left-pointer, right-pointer, or color) change. These are:
+ - All nodes on the root-to-insertion path: O(log n) nodes (their child pointers change).
+ - In Case 1 of fixup: the uncle (off-path) is recolored. We must copy the uncle: O(1) extra copy per Case-1 iteration, for O(log n) total uncle copies.
+ - In Cases 2 & 3: The rotations involve O(1) nodes, all already on or adjacent to the path.
+
+ **Total: O(log n) node copies.**
+
+**Persistent RB-INSERT procedure:**
+- Walk down the tree (root to insertion point), copying each node on the path: O(log n) copies.
+- Attach the new (red) node z.
+- Run a modified RB-INSERT-FIXUP on the new tree, using COPY-NODE whenever a node outside the current path must be changed (e.g., the uncle in Case 1). Each recoloring or rotation produces a new copy rather than modifying in place.
+- The root of the new tree is the copy of the original root (with updated pointers/colors).
+
+Since the total number of copies is O(log n) and each copy takes O(1) time, both the time and space requirements are **O(log n) per insertion**.
+
+---
+
+## Problem 2: Join Operation on Red-Black Trees [35 points]
+
+### Part (a): Maintaining the bh Attribute
+
+**Claim:** RB-INSERT can maintain T.bh (the black-height of the tree, stored at the root) without extra storage in individual nodes and without increasing asymptotic running times.
+
+**Maintaining T.bh during RB-INSERT:** The black-height of a red-black tree only changes when a new root is created with a higher black-height. This happens precisely when RB-INSERT-FIXUP propagates Case 1 all the way up to the root, after which the root (forced to be black) has its color changed from red to black. In this situation, T.bh increases by 1. In all other cases (the loop terminates early via Cases 2/3, or the root was already black), T.bh remains unchanged.
+
+**Implementation:** At the end of RB-INSERT, if the root's color was changed from red to black (line `T.root.color = BLACK` at the end of RB-INSERT), increment T.bh by 1. This requires O(1) extra work and O(1) extra storage (just the T.bh field).
+
+**Determining bh of each node during descent:** We use T.bh at the root and update locally as we descend. The recurrence is:
+
+> bh(x) = bh(x.parent) − (1 if x is black, else 0)
+
+More precisely, as we descend from node `x` to child `c`:
+
+```
+if c.color == BLACK:
+ bh(c) = bh(x) - 1
+else: // c.color == RED
+ bh(c) = bh(x)
+```
+
+This computes bh at each visited node in **O(1) time per node**.
+
+---
+
+### Part (b): Finding Black Node y in T₁ with Largest Key at Black-Height T₂.bh
+
+**Assumption:** T₁.bh ≥ T₂.bh. We want the black node y in T₁ with the *largest key* among all nodes whose black-height equals T₂.bh.
+
+**Algorithm:** Descend rightward from the root of T₁, tracking the current black-height, until we reach the first black node at black-height T₂.bh.
+
+```
+FIND-RIGHT-BLACK-NODE(T₁, b)
+1 x = T₁.root
+2 bh = T₁.bh
+3 while not (x.color == BLACK and bh == b)
+4 x = x.right // always go right (maximizes key)
+5 if x.color == BLACK
+6 bh = bh - 1
+7 return x
+```
+
+**Correctness:** We greedily go right to maximize the key value. The black-height decreases by 1 each time we step into a black node. Since T₁.bh ≥ b, we will encounter every integer value from T₁.bh down to 0 along this path; in particular, we will encounter a black node with bh = b before reaching NIL.
+
+Once we find a black node y with bh = b, any black node with bh = b in T₁ to the right of y would have to be a descendant of y's right subtree. But the right children of y have bh ≤ b − 1 (black right child) or bh = b (red right child, but then its children are black with bh = b − 1). So no black node with bh = b lies strictly to the right of y in the tree order. Hence y has the largest key among all black nodes at black-height b.
+
+**Time:** O(depth of y) = O(log n), since the tree has height O(log n).
+
+---
+
+### Part (c): Replacing Tᵧ with T = Tᵧ ∪ {x} ∪ T₂
+
+Let y be the black node in T₁ found in Part (b), with bh(y) = T₂.bh. Let Tᵧ denote the subtree of T₁ rooted at y. Let p = y.parent in T₁.
+
+**Construction of the new subtree:** Create a new node x (with the join key). Set:
+- x.left = y (root of Tᵧ)
+- x.right = T₂.root
+
+**Replacing Tᵧ:** Update p's child pointer to x:
+- If y was p's left child: set p.left = x
+- If y was p's right child: set p.right = x
+
+(If y was the root of T₁, then x becomes the new root.)
+
+**BST property is maintained:**
+- All keys in Tᵧ ≤ x.key (given, since all keys in T₁ ≤ x.key).
+- All keys in T₂ ≥ x.key (given).
+- Keys in T₁ above y (ancestors of y) are still correctly ordered relative to x and x's descendants, since we only replaced the subtree rooted at y with x's subtree.
+
+This replacement takes **O(1) time**.
+
+---
+
+### Part (d): Color of x and Enforcing Properties 2 and 4
+
+**Color x RED.** Here is why this maintains properties 1, 3, and 5, but may violate properties 2 and 4:
+
+**Property 5 (black-height consistency):** With x colored red:
+- bh(x) must equal both bh(x.left) + [1 if x.left is black] and bh(x.right) + [1 if x.right is black].
+- x.left = y (black node with bh(y) = T₂.bh), so contribution from left: bh(y) + 1 = T₂.bh + 1.
+- x.right = T₂.root (black, since property 2 of T₂), so contribution from right: bh(T₂.root) + 1 = T₂.bh + 1.
+- Both sides equal T₂.bh + 1. ✓ Property 5 is maintained.
+
+**Properties 1 and 3:** Trivially maintained (x is colored red, no leaf/NIL nodes changed). ✓
+
+**Potential violations:**
+- **Property 2** (root is black): If y was the root of T₁ (so T₁ = Tᵧ), then x is the new root and x is red. This violates property 2.
+- **Property 4** (no red node has a red child): If p (x's parent) is red, then we have a red–red violation at (p, x).
+
+**Enforcing properties 2 and 4 in O(log n) time:** Run **RB-INSERT-FIXUP** starting at x. This procedure handles exactly these two violations:
+- It fixes property 4 violations by rotations and recolorings, propagating up the tree at most O(log n) levels.
+- At the end, it forces the root to black (fixing any property 2 violation).
+
+RB-INSERT-FIXUP runs in O(log n) time (it processes O(log n) iterations of its loop, each taking O(1) time).
+
+---
+
+### Part (e): Symmetry When T₁.bh ≤ T₂.bh
+
+**No generality is lost** by assuming T₁.bh ≥ T₂.bh, because the roles of T₁ and T₂ can be swapped symmetrically.
+
+**Symmetric situation (T₁.bh ≤ T₂.bh):**
+
+Find a black node y in **T₂** with the *smallest key* among all nodes in T₂ with black-height T₁.bh. To find such a node, descend **leftward** from the root of T₂ (to minimize key value), stopping at the first black node with bh = T₁.bh:
+
+```
+FIND-LEFT-BLACK-NODE(T₂, b)
+1 x = T₂.root
+2 bh = T₂.bh
+3 while not (x.color == BLACK and bh == b)
+4 x = x.left
+5 if x.color == BLACK
+6 bh = bh - 1
+7 return x
+```
+
+Let Tᵧ be the subtree of T₂ rooted at y. Create new node x with:
+- x.left = T₁.root
+- x.right = y
+
+Replace y with x in T₂ (at y's parent), color x RED. The reasoning for why property 5 holds is identical:
+- bh(T₁.root) = T₁.bh, and T₁.root is black, so left contribution = T₁.bh + 1.
+- bh(y) = T₁.bh, and y is black, so right contribution = T₁.bh + 1. ✓
+
+Then run RB-INSERT-FIXUP from x to restore properties 2 and 4.
+
+---
+
+### Part (f): Running Time of RB-JOIN is O(log n)
+
+Let n be the total number of nodes in T₁ and T₂.
+
+1. **Determine bh attributes:** Both T₁.bh and T₂.bh are stored as attributes; comparing them takes O(1).
+
+2. **Finding node y** (Part (b) or symmetric Part (e)): Descend O(T₁.bh) or O(T₂.bh) levels to find y. Since both black-heights are O(log n), this takes **O(log n)**.
+
+3. **Constructing and replacing Tᵧ** (Part (c)): Creating the new node x and updating one pointer takes **O(1)**.
+
+4. **Coloring x and running RB-INSERT-FIXUP** (Part (d)): RB-INSERT-FIXUP performs at most O(log n) iterations (it moves the "current node" up by 2 levels per Case-1 iteration), each O(1), plus O(1) rotations at the end. Total: **O(log n)**.
+
+**Total running time:** O(log n) + O(1) + O(log n) = **O(log n)**. ∎
+
+---
+
+## Problem 3: Joining and Splitting 2-3-4 Trees [40 points]
+
+### Part (a): Maintaining the Height Attribute
+
+For every node x in a 2-3-4 tree, define x.height as:
+- x.height = 0 if x is a leaf node.
+- x.height = x.child[1].height + 1 if x is an internal node.
+
+(Since all leaves in a 2-3-4 tree are at the same depth, all children of an internal node have the same height, so this is well-defined.)
+
+**Maintenance during Search:** Read-only; no changes required.
+
+**Maintenance during Insert:** During B-tree insertion (with the "split on the way down" variant or the "split when full" variant), the only structural changes are:
+- **Node splits:** When a node x splits into two nodes x' and x'', both new nodes have the same height as x (they're at the same depth). The median key goes up to x's parent p. The parent p gains one more child pointer but its height does not change (since p's children are still at the same depth). We set x'.height = x.height and x''.height = x.height (no change needed for the existing nodes' heights).
+- **New root creation:** When the root splits, a new root r is created with two children (the two halves of the old root). Set r.height = old root's height + 1.
+
+In all cases, height updates require **O(1)** work per node modified, which does not affect the O(log n) asymptotic running time of search or insertion.
+
+---
+
+### Part (b): Implementing the Join Operation
+
+**Given:** 2-3-4 trees T' and T'' and key k such that all keys in T' are less than k and all keys in T'' are greater than k. Let h' = T'.root.height and h'' = T''.root.height.
+
+**Goal:** Return a 2-3-4 tree T = T' ∪ {k} ∪ T'' in O(1 + |h' − h''|) time.
+
+**Algorithm (WLOG assume h' ≥ h''; the case h' < h'' is symmetric):**
+
+**Case h' = h'':** Create a new 2-node (root) r with key k, left child T'.root, right child T''.root, and height h' + 1. Return the tree rooted at r. **O(1) time.**
+
+**Case h' > h'':** We need to attach T'' at the correct height in T'.
+
+1. **Find attachment point:** Walk right in T' from the root, always going to the *rightmost child* of the current node, for exactly (h' − h'') steps. Let y be the node reached; y.height = h''. Let p be y's parent.
+
+2. **Create a new 2-node:** Create a node z with key k, left child y, right child T''.root, and height h'' + 1.
+
+3. **Insert z into p:** In 2-3-4 tree terms, we are inserting the key k (with right subtree T''.root) into node p at the position immediately after y (replacing the pointer to y with a pointer to z, and adding key k to p with the right subtree pointer going to T''.root). This "insertion" adds one key/pointer pair to p.
+
+4. **Fixup:** If p now has 4 keys (overflowed to 5 children), split p and push the median key up to p's parent. Repeat this splitting process upward. Splits can cascade at most up the path from p to the root, which has length h' − h''. Each split takes O(1) time, so the fixup takes **O(h' − h'')** time.
+
+**Total time:** O(h' − h'') for finding y + O(h' − h'') for fixup = **O(1 + |h' − h''|)**.
+
+**Height update:** During splits, each split creates a sibling with the same height as the split node, which is correct. If a new root is created (root split during fixup), its height is set to h' + 1.
+
+---
+
+### Part (c): How Path p Breaks S' and S'' into Trees and Keys
+
+Let T be a 2-3-4 tree with root height h, and let k be a given key (either in T or a query key). Let p be the simple path from the root to the node containing k (or the leaf position where k would be inserted).
+
+**Breaking S' = {keys in T less than k}:**
+
+As we walk down path p, at each node v on the path (say v has keys v.k₁ < v.k₂ < ... < v.kₘ and children v.c₁, v.c₂, ..., v.c_{m+1}), suppose the path continues through child v.c_i. Then:
+- Children v.c₁, ..., v.c_{i−1} are subtrees entirely in S' (their ranges are below v.k_{i−1} < k).
+- Keys v.k₁, ..., v.k_{i−1} are in S' (they are less than k).
+
+Collecting these top-to-bottom along the path, S' is broken into:
+- Trees: T'₀, T'₁, ..., T'ₘ (left-branching subtrees and possibly a leaf, ordered by key value)
+- Separator keys: k'₁, k'₂, ..., k'ₘ (the keys on path nodes that are less than k)
+
+such that all keys in T'ᵢ₋₁ are less than k'ᵢ, and k'ᵢ is less than all keys in T'ᵢ.
+
+**Height relationship between consecutive T'ᵢ:** Since the path visits one node per level (one for each depth 0, 1, ..., h), and the left-branching subtrees at depth d all have height h − d − 1, the heights satisfy:
+
+$$h(T'_0) \geq h(T'_1) \geq \cdots \geq h(T'_m)$$
+
+The heights are **non-increasing** (weakly decreasing). Two consecutive T'ᵢ have the *same* height only if they both branch off from the same node on the path (i.e., the path-node's left-branching subtrees are enumerated left to right at the same depth). Adjacent trees from different depths have strictly decreasing heights (specifically, h(T'ᵢ) decreases by at least 1 each time the depth increases).
+
+**Breaking S'' = {keys in T greater than k}:**
+
+Symmetrically, at each path node v, the right-branching children (v.c_{i+1}, ..., v.c_{m+1}) and keys (v.k_i, ..., v.k_m) contribute to S''. This yields:
+- Trees T''₀, T''₁, ..., T''ₗ (right-branching subtrees, ordered by key value: T''₀ has the smallest keys in S'', T''ₗ has the largest)
+- Separator keys k''₁, ..., k''ₗ
+
+The heights of T''₀, T''₁, ..., T''ₗ are **non-decreasing**:
+$$h(T''_0) \leq h(T''_1) \leq \cdots \leq h(T''_l)$$
+
+because T''₀ (the leftmost subtrees of S'', closest to k) come from the deepest part of the path (small height), and T''ₗ (farthest from k) comes from the shallowest part of the path (large height).
+
+---
+
+### Part (d): Implementing Split in O(log n) Time
+
+**Algorithm:** Given a 2-3-4 tree T with n keys and key k ∈ T:
+
+1. **Traverse path p** from the root to the node containing k: O(log n) time.
+
+2. **Extract the separator trees and keys** for S' and S'' from path p (as described in Part (c)).
+ - Remove key k itself from its node.
+ - S' is represented by (T'₀, k'₁, T'₁, k'₂, ..., T'ₘ)
+ - S'' is represented by (T''₀, k''₁, T''₁, k''₂, ..., T''ₗ)
+
+3. **Assemble T' using join operations from right to left (inside-out):**
+
+ Start with R = T'ₘ. For i = m, m−1, ..., 1:
+ $$R \leftarrow \text{JOIN}(T'_{i-1},\; k'_i,\; R)$$
+
+ The final R is T'.
+
+4. **Assemble T'' using join operations from left to right:**
+
+ Start with R = T''₀. For i = 1, 2, ..., l:
+ $$R \leftarrow \text{JOIN}(R,\; k''_i,\; T''_i)$$
+
+ The final R is T''.
+
+**Running time analysis via telescoping:**
+
+For assembling T': By Part (c), the heights satisfy h(T'₀) ≥ h(T'₁) ≥ ... ≥ h(T'ₘ). When we join T'ᵢ₋₁ with the right portion R (which has accumulated to height approximately h(T'ᵢ₋₁) after previous joins), the cost of the i-th join is:
+
+$$\text{cost}_i = O(1 + |h(T'_{i-1}) - h(R)|)$$
+
+After joining T'ᵢ₋₁ with R, the result has height at most h(T'ᵢ₋₁) + 1. Therefore, for the next join involving T'ᵢ₋₂:
+
+$$|h(T'_{i-2}) - h(\text{new }R)| \leq h(T'_{i-2}) - h(T'_{i-1}) + 1$$
+
+Summing the costs:
+$$\sum_{i=1}^{m} \text{cost}_i = O(m) + O\!\left(\sum_{i=1}^{m} (h(T'_{i-1}) - h(T'_i) + 1)\right)$$
+
+The height differences telescope:
+$$\sum_{i=1}^{m} (h(T'_{i-1}) - h(T'_i)) = h(T'_0) - h(T'_m) \leq h' - 0 = h'$$
+
+and $m \leq 3h'$ (since each of the h' nodes on the path contributes at most 3 left-branching children). Thus:
+
+$$\text{total cost for assembling } T' = O(m) + O(h') = O(h') = O(\log n)$$
+
+The same telescoping argument applies to assembling T'', giving O(log n).
+
+**Total running time of SPLIT:**
+- Path traversal: O(log n)
+- Assembling T': O(log n)
+- Assembling T'': O(log n)
+- **Total: O(log n)** ∎
Sub-Model Usage
| Model | Input | Output | Cache Read | Cost |
|---|---|---|---|---|
| claude-sonnet-4-6 | 8 | 29.4K | 255.0K | $1.14 |
BSCS Bench