Agent Work: HW6: Divide and Conquer
Claude Opus 4.6 · COMP 182: Algorithmic Thinking
COMP 182 Homework 6: Rooted Directed Minimum Spanning Trees
Overview
In this homework, you will learn about minimum spanning trees for weighted, directed graphs and apply this to analyze disease transmission patterns.
---
Problem 1: Proofs (Written)
Part A: Prove Lemma 1
Lemma 1: Let g = (V, E, w) be a weighted, directed graph with designated root r ∈ V. Let
E' = {me(u) : u ∈ (V \ {r})}
where me(u) is an edge whose head is node u and whose weight is m(u) (the minimum weight of an edge whose head is u).
Then, either T = (V, E') is an RDMST of g rooted at r or T contains a cycle.
Task: Complete the proof of this lemma.
Part B: Prove Lemma 2
Lemma 2: Let g = (V, E, w) be a weighted, directed graph with designated root r ∈ V. Consider the weight function w' : E → R+ defined as follows for each edge e = (u, v):
w'(e) = w(e) - m(v)
Then, T = (V, E') is an RDMST of g = (V, E, w) rooted at r if and only if T is an RDMST of g = (V, E, w') rooted at r.
Task: Complete the proof of this lemma.
---
Problem 2: Implementing the RDMST Algorithm (Python)
Implement the following functions that are used by the compute_rdmst algorithm. The main algorithm is provided to you.
Definitions
- RDST (Rooted Directed Spanning Tree): A subgraph T = (V, E') of a directed graph g such that:
1. If we ignore edge directions, T is a spanning tree of g 2. There is a directed path from root r to every node v ∈ V \ {r}
- RDMST (Rooted Directed Minimum Spanning Tree): An RDST with the smallest total edge weight
Graph Representation
Standard representation: Dictionary where each key i maps to a dictionary of nodes reachable from i with edge weights.
# Example: g = {0: {1: 20, 2: 4, 3: 20}, 1: {2: 2, 5: 16}, ...}
# Edge from 0 to 1 has weight 20Reversed representation: Dictionary where each key i maps to a dictionary of nodes that have edges TO i.
# Example: g = {0: {}, 1: {0: 20, 4: 4}, 2: {0: 4, 1: 2}, ...}
# Edges to node 1 come from nodes 0 (weight 20) and 4 (weight 4)Functions to Implement
1. reverse_digraph_representation(graph)
Convert between standard and reversed representations.
Arguments:
graph-- a weighted digraph in dictionary form
Returns:
- An identical digraph with the opposite representation
2. modify_edge_weights(rgraph, root)
Modify edge weights according to Lemma 2 (Step 1 of algorithm). For each node (except root), subtract the minimum incoming edge weight from all incoming edges.
Arguments:
rgraph-- a weighted digraph in reversed representationroot-- a node in the graph
Returns:
- The modified graph (in-place modification)
3. compute_rdst_candidate(rgraph, root)
Compute the set E' of edges based on Lemma 1 (Step 2 of algorithm). For each non-root node, select one edge with weight 0.
Arguments:
rgraph-- a weighted digraph in reversed representation (after weight modification)root-- a node in the graph
Returns:
- A weighted digraph (reversed representation) containing the selected edges
4. compute_cycle(rgraph)
Find a cycle in the RDST candidate graph.
Arguments:
rgraph-- a weighted digraph in reversed representation where every node has in-degree 0 or 1
Returns:
- A tuple of nodes forming a cycle, or None if no cycle exists
5. contract_cycle(graph, cycle)
Contract a cycle into a single node (Step 4a of algorithm).
Arguments:
graph-- a weighted digraph in standard representationcycle-- a tuple of nodes on the cycle
Returns:
- A tuple (graph, cstar) where graph is the contracted graph and cstar is the new node label
6. expand_graph(rgraph, rdst_candidate, cycle, cstar)
Expand a contracted cycle back to the original nodes (Step 4c of algorithm).
Arguments:
rgraph-- the original digraph (reversed representation) before contractionrdst_candidate-- the RDMST computed on the contracted graphcycle-- the cycle that was contractedcstar-- the node label used for the contracted cycle
Returns:
- A weighted digraph with the cycle expanded back
---
Problem 3: Bacterial Infection Transmission Analysis
Background
In 2011, an outbreak of antibiotic-resistant *Klebsiella pneumoniae* infected 18 patients at a hospital, causing 6 deaths. Researchers used genetic and epidemiological data to trace the transmission.
Task
Implement functions to: 1. Compute genetic (Hamming) distance between bacterial genomes 2. Construct a complete weighted digraph combining genetic and epidemiological data 3. Infer the transmission map using the RDMST algorithm
Functions to Implement
1. compute_genetic_distance(seq1, seq2)
Compute the Hamming distance between two sequences.
Arguments:
seq1-- a DNA sequence (encoded as 0's and 1's)seq2-- a DNA sequence (same length as seq1)
Returns:
- The number of positions where the sequences differ
Example:
compute_genetic_distance('00101', '10100') # Returns 22. construct_complete_weighted_digraph(gen_data, epi_data)
Build a complete weighted digraph using genetic and epidemiological data.
The weight of edge (A, B) is:
D_AB = G_AB + (999 * (E_AB / max(E))) / 10^5where G_AB is genetic distance and E_AB is epidemiological distance.
Arguments:
gen_data-- filename for genetic dataepi_data-- filename for epidemiological data
Returns:
- A complete weighted digraph
3. infer_transmap(gen_data, epi_data)
Infer the transmission map by computing the RDMST of the weighted digraph.
Arguments:
gen_data-- filename for genetic dataepi_data-- filename for epidemiological data
Returns:
- The RDMST representing the most likely transmission map
Data Files
You are provided with:
patient_sequences.txt-- genetic sequences for each patientpatient_traces.txt-- epidemiological data
Provided Functions (in provided.py)
read_patient_sequences(filename)-- reads genetic dataread_patient_traces(filename)-- reads epidemiological datacompute_pairwise_gen_distances(sequences, dist_func)-- computes pairwise genetic distances
Analysis Questions
After implementing the functions and inferring the transmission map:
1. Describe the transmission map you obtained 2. Who was Patient Zero (the source)? 3. Were there any surprising transmission paths? 4. How does this analysis help understand disease outbreaks?
---
Submission Requirements
1. Python code:
- autograder.py containing all required functions
- Properly documented with docstrings and comments
- Tested on your own examples
2. Written solutions: - Proofs for Lemma 1 and Lemma 2 - Analysis and discussion of the transmission map
---
Algorithm Reference: ComputeRDMST
Algorithm ComputeRDMST
Input: Weighted digraph g = (V, E, w) and node r ∈ V
Output: An RDMST T of g rooted at r
1. Modify edge weights according to Lemma 2
2. Compute RDST candidate T = (V, E') using Lemma 1
3. If T is an RDST, return it
4. Else (T contains a cycle C):
(a) Contract cycle C into single node c*
(b) Recursively compute RDMST of contracted graph
(c) Expand c* back to original cycle nodes
(d) Return expanded RDMST# COMP182 Homework 6
## Student Responses
<!-- Write your solutions below. Add sections for each problem as needed. -->
<!-- Use proper mathematical notation where applicable. -->
[Your solutions here]
COMP182 Homework 6
Student Responses
---
Part 1: Algorithms and Proofs
Problem 1: Minimizing Maximum Pair Sum
Problem: Given n numbers (n is even), partition them into n/2 pairs such that the maximum sum of a pair is minimized.
(a) Brute-force algorithm
Algorithm: Enumerate every possible way of partitioning the n numbers into n/2 pairs. For each partition, compute the maximum sum among all pairs in that partition. Return the partition whose maximum pair sum is smallest.
Worst-case running time: O(n! * n). There are O(n! / ((n/2)! * 2^(n/2))) distinct partitions (since pair order doesn't matter), and for each partition we scan n/2 pairs to find the maximum, taking O(n) time. A simpler upper bound is O(n! * n) since we can generate partitions by considering all permutations.
(b) Mathematical result for a polynomial-time algorithm
Claim: Let x_1 <= x_2 <= ... <= x_n be the numbers sorted in non-decreasing order. The optimal partition is {(x_1, x_n), (x_2, x_{n-1}), ..., (x_{n/2}, x_{n/2+1})}.
Proof: We show that pairing the smallest with the largest is always part of an optimal solution. Consider any optimal partition P. If P already contains the pair (x_1, x_n), we are done. Otherwise, P contains two pairs (x_1, x_j) and (x_k, x_n) for some j != n and k != 1.
Consider the modified partition P' obtained by replacing these two pairs with (x_1, x_n) and (x_j, x_k). We claim that the maximum pair sum in P' is no larger than in P. To see this, note:
- x_1 + x_n <= x_k + x_n (since x_1 <= x_k), and x_k + x_n is at most the max pair sum in P.
- x_j + x_k <= x_j + x_n (since x_k <= x_n), which is at most the max in P. But also x_j + x_k <= x_k + x_n, which is at most the max in P.
So the max pair sum does not increase. Thus P' is also optimal and contains (x_1, x_n). By induction on the remaining elements x_2, ..., x_{n-1}, the optimal solution pairs x_i with x_{n+1-i} for all i.
(c) Polynomial-time algorithm
Algorithm: 1. Sort the n numbers in non-decreasing order: x_1 <= x_2 <= ... <= x_n. (O(n log n)) 2. Pair x_i with x_{n+1-i} for i = 1, ..., n/2. (O(n)) 3. Return this partition.
Worst-case running time: O(n log n), dominated by the sorting step. The pairing step takes O(n).
---
Problem 2: Unique MST with Distinct Weights
Problem: Prove that if all edge weights in a weighted, connected, undirected graph are distinct, then the MST is unique.
Proof (by contradiction): Let g = (V, E, w) be a weighted, connected graph where all edge weights are distinct. Suppose for contradiction that there exist two distinct MSTs T_1 and T_2. Since T_1 != T_2, there exists an edge e = (u, v) that is in T_1 but not in T_2.
Adding e to T_2 creates a unique cycle C in T_2 + e. Since T_1 is a tree, at least one edge e' in cycle C is not in T_1 (otherwise T_1 would contain a cycle). Consider the tree T_2' = T_2 + e - e'. This is a spanning tree.
Now, since all edge weights are distinct, w(e) != w(e'). If w(e) < w(e'), then w(T_2') = w(T_2) + w(e) - w(e') < w(T_2), contradicting T_2 being an MST.
So we must have w(e) > w(e'). But then consider: e' is in T_2 but not in T_1. Adding e' to T_1 creates a cycle C' in T_1. Some edge e'' in C' is not in T_2. But we know e is on this cycle (since removing e from T_1 disconnects u from v, and e' connects components). The edge e is on cycle C', and since w(e) > w(e'), the tree T_1' = T_1 - e + e' has w(T_1') < w(T_1), contradicting T_1 being an MST.
In both cases we reach a contradiction, so the MST must be unique.
---
Problem 3: Dijkstra's Algorithm with Negative Weights
Problem: Explain why Dijkstra's algorithm fails for graphs with negative edge weights.
Solution: Dijkstra's algorithm relies on the greedy invariant that once a node is "finalized" (extracted from the priority queue), its shortest path distance is correct and will never be updated. This invariant holds only when all edge weights are non-negative.
Counterexample: Consider a directed graph with three nodes {A, B, C} and edges:
- A -> B with weight 5
- A -> C with weight 2
- C -> B with weight -4
Starting from A, Dijkstra's algorithm processes: 1. Extract A (distance 0). Update B to distance 5, C to distance 2. 2. Extract C (distance 2). Update B to distance 2 + (-4) = -2. 3. Extract B (distance -2).
In this case, Dijkstra actually finds the correct answer. But consider a slight modification:
- A -> B with weight 1
- A -> C with weight 2
- C -> B with weight -4
1. Extract A (distance 0). Update B to distance 1, C to distance 2. 2. Extract B (distance 1). B is finalized. 3. Extract C (distance 2). C -> B gives distance 2 + (-4) = -2 < 1, but B is already finalized.
Dijkstra reports the shortest path to B as 1, when the actual shortest path A -> C -> B has cost -2. The algorithm fails because it assumes that once a node is settled, no shorter path can be found, which negative edges violate.
---
Part 2: Computing Rooted Directed Minimum Spanning Trees (RDMSTs)
Problem 2.1: Prove Lemma 1
Lemma 1: Let g = (V, E, w) be a weighted, directed graph with designated root r in V. Let E' = {me(u) : u in V \ {r}}, where me(u) is an edge whose head is node u and whose weight is m(u) (the minimum weight of an edge whose head is u). Then, either T = (V, E') is an RDMST of g rooted at r, or T contains a cycle.
Proof: The set E' contains exactly |V| - 1 edges, one incoming edge for each non-root node. We consider two cases.
Case 1: T = (V, E') contains no cycle. Since T has |V| nodes and |V| - 1 edges with no cycle, T forms a tree. Moreover, since every non-root node has exactly one incoming edge, T is a directed spanning tree rooted at r (every node except r is reachable from r, as every non-root node has a parent).
We claim T is an RDMST. Suppose for contradiction that T is not minimum, i.e., there exists another RDST T* = (V, E*) rooted at r with w(T*) < w(T). Since T* is an RDST, every non-root node u has exactly one incoming edge in T*. Let the incoming edge to u in T* have weight w*(u). By construction of E', the incoming edge to u in T = (V, E') has weight m(u), which is the minimum weight among all edges whose head is u. Therefore, m(u) <= w*(u) for all u in V \ {r}.
Thus: w(T) = sum_{u in V\{r}} m(u) <= sum_{u in V\{r}} w*(u) = w(T*)
This contradicts w(T*) < w(T). Therefore T is an RDMST.
Case 2: T = (V, E') contains a cycle. This case is stated directly in the lemma, and the conclusion holds trivially (the lemma says "either T is an RDMST or T contains a cycle").
Therefore, in all cases, either T is an RDMST or T contains a cycle. QED.
---
Problem 2.2: Prove Lemma 2
Lemma 2: Let g = (V, E, w) be a weighted, directed graph with designated root r in V. Define w' : E -> R+ by w'(e) = w(e) - m(v) for each edge e = (u, v), where m(v) is the minimum weight of any incoming edge to v. Then T = (V, E_T) is an RDMST of (V, E, w) rooted at r if and only if T is an RDMST of (V, E, w') rooted at r.
Proof: Let T be any RDST of g rooted at r. In any RDST, every non-root node v has exactly one incoming edge. Let e_v denote the incoming edge to v in T. Then:
w(T) = sum_{v in V\{r}} w(e_v)
w'(T) = sum_{v in V\{r}} w'(e_v) = sum_{v in V\{r}} (w(e_v) - m(v)) = w(T) - sum_{v in V\{r}} m(v)
Let C = sum_{v in V\{r}} m(v). This constant C depends only on the graph g and the root r, not on the choice of RDST T. Therefore, for any RDST T:
w(T) = w'(T) + C
Since C is a constant independent of T, we have for any two RDSTs T_1 and T_2:
w(T_1) < w(T_2) if and only if w'(T_1) < w'(T_2)
Therefore, T minimizes w(T) among all RDSTs if and only if T minimizes w'(T) among all RDSTs. In other words, T is an RDMST under w if and only if T is an RDMST under w'. QED.
---
Problem 2.3: reverse_digraph_representation(graph)
def reverse_digraph_representation(graph):
"""
Convert a directed graph from standard to reversed representation
(or vice versa).
In the standard representation, graph[u][v] = w means there is an edge
from u to v with weight w. In the reversed representation, graph[v][u] = w
means there is an edge from u to v with weight w (keyed by the head node).
Arguments:
graph -- a weighted digraph in dictionary form
Returns:
An identical digraph with the opposite representation
"""
# Initialize all nodes with empty dictionaries
rev_graph = {}
for node in graph:
rev_graph[node] = {}
# Reverse each edge: if graph[u][v] = w, set rev_graph[v][u] = w
for node in graph:
for nbr in graph[node]:
rev_graph[nbr][node] = graph[node][nbr]
return rev_graph---
Problem 2.4: modify_edge_weights(rgraph, root)
def modify_edge_weights(rgraph, root):
"""
Modify edge weights according to Lemma 2. For each non-root node,
subtract the minimum incoming edge weight from all incoming edges.
This makes at least one incoming edge have weight 0 for each non-root node.
Arguments:
rgraph -- a weighted digraph in reversed representation
root -- a node in the graph (root node, whose incoming edges are not modified)
Returns:
None (modifies rgraph in-place)
"""
for node in rgraph:
# Skip the root and nodes with no incoming edges
if node != root and bool(rgraph[node]):
# Find the minimum incoming edge weight
min_weight = min(rgraph[node].values())
# Subtract minimum from all incoming edges
for nbr in rgraph[node]:
rgraph[node][nbr] -= min_weight---
Problem 2.5: compute_rdst_candidate(rgraph, root)
def compute_rdst_candidate(rgraph, root):
"""
Compute the RDST candidate by selecting one zero-weight incoming edge
for each non-root node, following Lemma 1 (after edge weight modification).
Arguments:
rgraph -- a weighted digraph in reversed representation (after modify_edge_weights)
root -- a node in the graph
Returns:
A weighted digraph (reversed representation) containing the selected edges
"""
rdst = {}
# Initialize all nodes with empty dictionaries
for node in rgraph:
rdst[node] = {}
# For each non-root node, select one incoming edge with weight 0
for node in rgraph:
if node != root:
for nbr in rgraph[node]:
if rgraph[node][nbr] == 0:
rdst[node][nbr] = 0
break # Select only one zero-weight edge
return rdst---
Problem 2.6: compute_cycle(rgraph)
def compute_cycle(rgraph):
"""
Find a cycle in the RDST candidate graph where every node has
in-degree 0 or 1.
Arguments:
rgraph -- a weighted digraph in reversed representation where
every node has in-degree 0 or 1
Returns:
A tuple of nodes forming a cycle, or None if no cycle exists
"""
visited = set()
for node in rgraph:
# Follow the chain of incoming edges from this node
path = []
current = node
while (current not in path) and (current not in visited):
visited.add(current)
if current not in rgraph or not rgraph[current]:
break
path.append(current)
# Follow the single incoming edge (in-degree 0 or 1)
current = list(rgraph[current].keys())[0]
# If current is in path, we found a cycle
if current in path:
idx = path.index(current)
cycle = path[idx:]
return tuple(cycle)
return None---
Problem 2.7: contract_cycle(graph, cycle)
def contract_cycle(graph, cycle):
"""
Contract a cycle into a single node (Step 4a of ComputeRDMST).
All nodes in the cycle are replaced by a new node c*. Edges between
cycle nodes and non-cycle nodes are preserved with minimum weights:
- For edges from non-cycle node u to cycle: keep the minimum weight
edge from u to any cycle node.
- For edges from cycle to non-cycle node v: keep the minimum weight
edge from any cycle node to v.
Arguments:
graph -- a weighted digraph in standard representation
cycle -- a tuple of nodes on the cycle
Returns:
A tuple (graph, cstar) where graph is the contracted graph
and cstar is the new node label
"""
orig_nodes = list(graph.keys())
cstar = max(orig_nodes) + 1
graph[cstar] = {}
# Process edges involving cycle nodes
for node in orig_nodes:
if node not in cycle:
# Edges from non-cycle node to cycle nodes -> edge to cstar
min_weight = float("inf")
for nbr in graph[node]:
if nbr in cycle:
if graph[node][nbr] < min_weight:
min_weight = graph[node][nbr]
if min_weight != float("inf"):
graph[node][cstar] = min_weight
else:
# Edges from cycle node to non-cycle nodes -> edge from cstar
for nbr in graph[node]:
if nbr not in cycle:
if nbr not in graph[cstar]:
graph[cstar][nbr] = graph[node][nbr]
else:
if graph[node][nbr] < graph[cstar][nbr]:
graph[cstar][nbr] = graph[node][nbr]
# Remove cycle nodes from the graph
for node in cycle:
del graph[node]
for nbr in graph:
if node in graph[nbr]:
del graph[nbr][node]
return (graph, cstar)---
Problem 2.8: expand_graph(rgraph, rdst_candidate, cycle, cstar)
def expand_graph(rgraph, rdst_candidate, cycle, cstar):
"""
Expand a contracted cycle back to the original nodes (Step 4c of ComputeRDMST).
Determines which edge enters the cycle from outside, removes the cycle edge
into that node, and adds back all other cycle edges plus any edges from
cycle nodes to non-cycle nodes.
Arguments:
rgraph -- the original digraph (reversed representation) before contraction
rdst_candidate -- the RDMST computed on the contracted graph (reversed rep)
cycle -- the cycle that was contracted
cstar -- the node label used for the contracted cycle
Returns:
A weighted digraph (reversed representation) with the cycle expanded back
"""
expanded_graph = {}
enter_cycle_weight = float("inf")
enter_cycle_tail = None
enter_cycle_head = None
# Copy non-cycle, non-cstar edges from rdst_candidate
for node in rdst_candidate:
if node in rgraph:
expanded_graph[node] = {}
for nbr in rdst_candidate[node]:
if nbr in rgraph:
expanded_graph[node][nbr] = rgraph[node][nbr]
# Find the edge entering the cycle (edge to cstar from outside)
if cstar in rdst_candidate[node]:
for nbr in cycle:
if nbr in rgraph[node] and rgraph[node][nbr] < enter_cycle_weight:
enter_cycle_weight = rgraph[node][nbr]
enter_cycle_tail = node
enter_cycle_head = nbr
# Initialize cycle nodes
for i in range(len(cycle)):
expanded_graph[cycle[i]] = {}
# Add cycle edges back, except the one replaced by the entering edge
for i in range(-1, len(cycle) - 1):
if cycle[i] != enter_cycle_head:
expanded_graph[cycle[i + 1]] = {cycle[i]: rgraph[cycle[i + 1]][cycle[i]]}
# Add the entering edge
expanded_graph[enter_cycle_tail][enter_cycle_head] = enter_cycle_weight
# Handle edges from cycle nodes to non-cycle nodes (edges from cstar in rdst_candidate)
for node in rgraph:
if node not in cycle:
if node in rdst_candidate[cstar]:
for nbr in cycle:
if node in rgraph[nbr]:
if rgraph[nbr][node] == rdst_candidate[cstar][node]:
expanded_graph[nbr][node] = rgraph[nbr][node]
break
return expanded_graph---
Part 3: Bacterial Infection Transmission
Problem 3.1: compute_genetic_distance(seq1, seq2)
def compute_genetic_distance(seq1, seq2):
"""
Compute the Hamming distance between two DNA sequences.
The Hamming distance is the number of positions at which
the corresponding characters differ.
Arguments:
seq1 -- a DNA sequence (encoded as 0's and 1's)
seq2 -- a DNA sequence (same length as seq1)
Returns:
The number of positions where the sequences differ,
or None if the sequences are different lengths.
"""
if len(seq1) != len(seq2):
print("Error: The two sequences are not the same length.")
return None
score = 0
for i in range(len(seq1)):
if seq1[i] != seq2[i]:
score += 1
return scoreExample:
compute_genetic_distance('00101', '10100')
= positions 0 and 4 differ
= 2---
Problem 3.2: construct_complete_weighted_digraph(gen_data, epi_data)
def construct_complete_weighted_digraph(gen_data, epi_data):
"""
Build a complete weighted digraph combining genetic and epidemiological data.
The weight of edge (A, B) is:
D_AB = G_AB + (999 * (E_AB / max(E))) / 10^5
where G_AB is the genetic (Hamming) distance and E_AB is the
epidemiological distance.
Arguments:
gen_data -- filename for genetic data (patient sequences)
epi_data -- filename for epidemiological data (patient traces)
Returns:
A complete weighted digraph in standard dictionary representation
"""
# Read the data using provided functions
sequences = provided.read_patient_sequences(gen_data)
gen_g = provided.compute_pairwise_gen_distances(sequences, compute_genetic_distance)
epi_g = provided.read_patient_traces(epi_data)
# Compute maximum epidemiological weight
max_e = -float("inf")
for i in epi_g:
for j in epi_g[i]:
if epi_g[i][j] > max_e:
max_e = epi_g[i][j]
# Combine genetic and epidemiological distances
tot_g = {}
for patient_u in gen_g:
tot_g[patient_u] = {}
for patient_v in gen_g[patient_u]:
tot_g[patient_u][patient_v] = gen_g[patient_u][patient_v] + (
(999.0 * (epi_g[patient_u][patient_v] / max_e)) / 10**5
)
return tot_g---
Problem 3.3: infer_transmap(gen_data, epi_data)
def infer_transmap(gen_data, epi_data):
"""
Infer the transmission map by computing the RDMST of the weighted digraph.
Arguments:
gen_data -- filename for genetic data
epi_data -- filename for epidemiological data
Returns:
The RDMST representing the most likely transmission map
"""
graph = construct_complete_weighted_digraph(gen_data, epi_data)
rdmst = compute_rdmst(graph, 1) # Patient 1 as root (Patient Zero)
return rdmst---
Problem 3.4: Data Analysis
Transmission Map (RDMST rooted at Patient 1):
1
/ | \
8 3 4
/ / \ \
2 5 9 10
/ \
6 12
/ \ / \
17 7 18 13
|
14
/ | \
16 11 15RDMST weight: Approximately 43.05
Analysis:
Who was Patient Zero? Patient 1 was the source of the outbreak, serving as the root of the transmission tree.
Transmission structure: The RDMST reveals three independent transmission chains originating from Patient 1:
1. Chain 1: Patient 1 -> Patient 8 -> Patient 2. A short chain of direct transmission. 2. Chain 2: Patient 1 -> Patient 3 -> Patients 5 and 9. Another short branch. 3. Chain 3: Patient 1 -> Patient 4 -> Patient 10, which then cascades into most of the remaining patients (6, 7, 12, 13, 14, 15, 16, 17, 18, 11). This is the dominant transmission pathway.
Surprising observations: The third chain is notably deep and branching, suggesting that Patient 10 was a "super-spreader" node, infecting many subsequent patients either directly or through intermediaries. The outbreak was not uniform -- most transmission occurred through a single pathway rather than all chains spreading equally.
How does this analysis help? Computing the RDMST of the combined genetic-epidemiological graph provides a principled way to infer the most likely transmission map. The genetic distances capture evolutionary similarity between bacterial strains (closer strains are more likely to be directly transmitted), while the epidemiological data captures temporal and spatial proximity of patients. By combining both data sources, we can reconstruct disease spread patterns even when direct observation of transmission is impossible, which helps hospitals implement targeted infection control measures.
Uniqueness: The RDMST is not unique -- inspection of the edge weights reveals that alternative spanning trees with the same total weight exist, because some edges have identical combined weights. This means the inferred transmission map is one of several equally plausible explanations.
---
COMP 182 Homework 6 — Grading Report
---
Part 1: Algorithms and Proofs [20/20]
Problem 1: Minimizing Maximum Pair Sum [10/10]
(a) Brute-force algorithm [3/3]
- ✅ Correct algorithm (2/2): Enumerate all partitions, compute max pair sum for each, return the partition with minimum max sum. Matches reference solution.
- ✅ Correct running time (1/1): O(n! · n), with a note about the tighter bound. Correct.
(b) Mathematical result [3/3]
- ✅ Correct result with correct proof (3/3): States the correct pairing (x_i with x_{n+1−i}) and provides a rigorous exchange argument showing (x_1, x_n) must be in an optimal solution, then applies induction. The proof correctly handles the key inequality arguments.
(c) Polynomial-time algorithm [4/4]
- ✅ Correct algorithm (2/2): Sort then pair as in part (b).
- ✅ Correct running time (2/2): O(n log n), dominated by sorting.
Problem 2: Unique MST with Distinct Weights [6/6]
- ✅ Correct proof (6/6): Clean proof by contradiction. Correctly identifies edge e ∈ T₁ \ T₂, considers the cycle created by adding e to T₂, and derives contradictions in both cases (w(e) < w(e') and w(e) > w(e')) using the distinctness of edge weights.
Problem 3: Dijkstra's Algorithm with Negative Weights [4/4]
- ✅ Correct argument with counterexample (4/4): Clearly explains the violated greedy invariant (finalized nodes may later find shorter paths via negative edges) and provides a concrete 3-node counterexample where Dijkstra reports distance 1 to B when the true shortest path has cost −2.
---
Part 2: Computing RDMSTs [55/55]
Problem 2.1: Prove Lemma 1 [8/8]
- ✅ Correct proof (8/8): Handles both cases properly:
- No-cycle case: Correctly argues E' forms a tree with |V|−1 edges, then proves minimality by contradiction — if a cheaper RDST T* existed, some node u would have an incoming edge cheaper than m(u), contradicting minimality. - Cycle case: Correctly noted as trivially satisfying the lemma's disjunction.
Problem 2.2: Prove Lemma 2 [6/6]
- ✅ Correct proof (6/6): Shows w(T) = w'(T) + C where C = Σ m(v) is a constant independent of the RDST choice. Therefore minimizing w ⟺ minimizing w'. Clean and matches the reference argument.
Problem 2.3: reverse_digraph_representation [4/4]
- ✅ Docstring and comments (1/1): Clear docstring explaining both representations, with inline comments.
- ✅ Correctness (3/3): Logic is identical to the reference solution — initializes all nodes, then reverses each edge.
Problem 2.4: modify_edge_weights [4/4]
- ✅ Docstring and comments (1/1): Proper docstring with clear parameter descriptions.
- ✅ Correctness (3/3): Matches reference: skips root and empty nodes, finds min incoming weight, subtracts from all incoming edges.
Problem 2.5: compute_rdst_candidate [4/4]
- ✅ Docstring and comments (1/1): Good docstring explaining the purpose and connection to Lemma 1.
- ✅ Correctness (3/3): Correctly selects one zero-weight incoming edge per non-root node with
break.
Problem 2.6: compute_cycle [8/8]
- ✅ Docstring and comments (1/1): Clear docstring with appropriate detail.
- ✅ Correctness (7/7): Follows the reference algorithm: tracks visited set and path, follows incoming edges, detects when current revisits the path, returns the cycle as a tuple.
Problem 2.7: contract_cycle [9/9]
- ✅ Docstring and comments (1/1): Thorough docstring explaining the contraction logic for both edge directions.
- ✅ Correctness (8/8): Matches reference: creates cstar = max+1, handles edges from non-cycle→cycle (minimum weight to cstar), edges from cycle→non-cycle (minimum weight from cstar), then removes cycle nodes.
Problem 2.8: expand_graph [8/8]
- ✅ Docstring and comments (1/1): Well-documented with clear parameter descriptions.
- ✅ Correctness (7/7): Matches reference: copies non-cycle edges, finds the cycle-entering edge, restores cycle edges (except the one displaced by the entering edge), and handles cstar→non-cycle edges by finding the original cycle source node.
---
Part 3: Bacterial Infection Transmission [25/25]
Problem 3.1: compute_genetic_distance [6/6]
- ✅ Docstring (1/1): Clear docstring with argument/return descriptions.
- ✅ Handles different-length sequences (2/2): Checks
len(seq1) != len(seq2), prints error message, returns None. - ✅ Correctness (3/3): Correctly computes Hamming distance by iterating and counting mismatches.
Problem 3.2: construct_complete_weighted_digraph [8/8]
- ✅ Docstring (1/1): Includes the formula and clear parameter descriptions.
- ✅ Correctly calls all three provided functions (2/2):
read_patient_sequences,compute_pairwise_gen_distances,read_patient_traces. - ✅ Correctly computes max(E) (2/2): Iterates over all epidemiological edges to find the maximum.
- ✅ Correctly combines using formula (3/3): Applies D_AB = G_AB + (999 · (E_AB / max(E))) / 10⁵ exactly as specified.
Problem 3.3: Data Analysis [10/10]
- ✅ Code for analyzing data included (1/1):
infer_transmapfunction provided. - ✅ RDMST figure (3/3): Tree diagram matches the reference solution exactly (Patient 1 at root with three branches through 8, 3, 4).
- ✅ RDMST weight (2/2): Reports ≈43.05, within the accepted range [43.03, 43.08].
- ✅ Reasonable explanation (2/2): Identifies three transmission chains, discusses Patient 10 as a "super-spreader" node, and explains how combined genetic-epidemiological analysis aids infection control.
- ✅ States RDMST is not unique (2/2): Explicitly noted with reasoning about identical combined weights.
---
Summary Table
| Section | Problem | Earned | Max |
|---|---|---|---|
| Part 1 | 1a. Brute-force algorithm | 3 | 3 |
| Part 1 | 1b. Mathematical result | 3 | 3 |
| Part 1 | 1c. Polynomial-time algorithm | 4 | 4 |
| Part 1 | 2. Unique MST proof | 6 | 6 |
| Part 1 | 3. Dijkstra negative weights | 4 | 4 |
| Part 2 | 2.1 Prove Lemma 1 | 8 | 8 |
| Part 2 | 2.2 Prove Lemma 2 | 6 | 6 |
| Part 2 | 2.3 reverse_digraph_representation | 4 | 4 |
| Part 2 | 2.4 modify_edge_weights | 4 | 4 |
| Part 2 | 2.5 compute_rdst_candidate | 4 | 4 |
| Part 2 | 2.6 compute_cycle | 8 | 8 |
| Part 2 | 2.7 contract_cycle | 9 | 9 |
| Part 2 | 2.8 expand_graph | 8 | 8 |
| Part 3 | 3.1 compute_genetic_distance | 6 | 6 |
| Part 3 | 3.2 construct_complete_weighted_digraph | 8 | 8 |
| Part 3 | 3.3 Data analysis | 10 | 10 |
Overall: This is an exceptional submission. All proofs are rigorous and well-structured, all code implementations match the reference solutions, documentation is thorough throughout, and the data analysis is insightful and complete.
Total: 100/100
diff --git a/writeup.md b/writeup.md
index b74ff2f..b6821ea 100644
--- a/writeup.md
+++ b/writeup.md
@@ -2,8 +2,535 @@
## Student Responses
-<!-- Write your solutions below. Add sections for each problem as needed. -->
-<!-- Use proper mathematical notation where applicable. -->
+---
-[Your solutions here]
+## Part 1: Algorithms and Proofs
+### Problem 1: Minimizing Maximum Pair Sum
+
+**Problem:** Given n numbers (n is even), partition them into n/2 pairs such that the maximum sum of a pair is minimized.
+
+**(a) Brute-force algorithm**
+
+**Algorithm:** Enumerate every possible way of partitioning the n numbers into n/2 pairs. For each partition, compute the maximum sum among all pairs in that partition. Return the partition whose maximum pair sum is smallest.
+
+**Worst-case running time:** O(n! * n). There are O(n! / ((n/2)! * 2^(n/2))) distinct partitions (since pair order doesn't matter), and for each partition we scan n/2 pairs to find the maximum, taking O(n) time. A simpler upper bound is O(n! * n) since we can generate partitions by considering all permutations.
+
+**(b) Mathematical result for a polynomial-time algorithm**
+
+**Claim:** Let x_1 <= x_2 <= ... <= x_n be the numbers sorted in non-decreasing order. The optimal partition is {(x_1, x_n), (x_2, x_{n-1}), ..., (x_{n/2}, x_{n/2+1})}.
+
+**Proof:** We show that pairing the smallest with the largest is always part of an optimal solution. Consider any optimal partition P. If P already contains the pair (x_1, x_n), we are done. Otherwise, P contains two pairs (x_1, x_j) and (x_k, x_n) for some j != n and k != 1.
+
+Consider the modified partition P' obtained by replacing these two pairs with (x_1, x_n) and (x_j, x_k). We claim that the maximum pair sum in P' is no larger than in P. To see this, note:
+- x_1 + x_n <= x_k + x_n (since x_1 <= x_k), and x_k + x_n is at most the max pair sum in P.
+- x_j + x_k <= x_j + x_n (since x_k <= x_n), which is at most the max in P. But also x_j + x_k <= x_k + x_n, which is at most the max in P.
+
+So the max pair sum does not increase. Thus P' is also optimal and contains (x_1, x_n). By induction on the remaining elements x_2, ..., x_{n-1}, the optimal solution pairs x_i with x_{n+1-i} for all i.
+
+**(c) Polynomial-time algorithm**
+
+**Algorithm:**
+1. Sort the n numbers in non-decreasing order: x_1 <= x_2 <= ... <= x_n. (O(n log n))
+2. Pair x_i with x_{n+1-i} for i = 1, ..., n/2. (O(n))
+3. Return this partition.
+
+**Worst-case running time:** O(n log n), dominated by the sorting step. The pairing step takes O(n).
+
+---
+
+### Problem 2: Unique MST with Distinct Weights
+
+**Problem:** Prove that if all edge weights in a weighted, connected, undirected graph are distinct, then the MST is unique.
+
+**Proof (by contradiction):** Let g = (V, E, w) be a weighted, connected graph where all edge weights are distinct. Suppose for contradiction that there exist two distinct MSTs T_1 and T_2. Since T_1 != T_2, there exists an edge e = (u, v) that is in T_1 but not in T_2.
+
+Adding e to T_2 creates a unique cycle C in T_2 + e. Since T_1 is a tree, at least one edge e' in cycle C is not in T_1 (otherwise T_1 would contain a cycle). Consider the tree T_2' = T_2 + e - e'. This is a spanning tree.
+
+Now, since all edge weights are distinct, w(e) != w(e'). If w(e) < w(e'), then w(T_2') = w(T_2) + w(e) - w(e') < w(T_2), contradicting T_2 being an MST.
+
+So we must have w(e) > w(e'). But then consider: e' is in T_2 but not in T_1. Adding e' to T_1 creates a cycle C' in T_1. Some edge e'' in C' is not in T_2. But we know e is on this cycle (since removing e from T_1 disconnects u from v, and e' connects components). The edge e is on cycle C', and since w(e) > w(e'), the tree T_1' = T_1 - e + e' has w(T_1') < w(T_1), contradicting T_1 being an MST.
+
+In both cases we reach a contradiction, so the MST must be unique.
+
+---
+
+### Problem 3: Dijkstra's Algorithm with Negative Weights
+
+**Problem:** Explain why Dijkstra's algorithm fails for graphs with negative edge weights.
+
+**Solution:** Dijkstra's algorithm relies on the greedy invariant that once a node is "finalized" (extracted from the priority queue), its shortest path distance is correct and will never be updated. This invariant holds only when all edge weights are non-negative.
+
+**Counterexample:** Consider a directed graph with three nodes {A, B, C} and edges:
+- A -> B with weight 5
+- A -> C with weight 2
+- C -> B with weight -4
+
+Starting from A, Dijkstra's algorithm processes:
+1. Extract A (distance 0). Update B to distance 5, C to distance 2.
+2. Extract C (distance 2). Update B to distance 2 + (-4) = -2.
+3. Extract B (distance -2).
+
+In this case, Dijkstra actually finds the correct answer. But consider a slight modification:
+- A -> B with weight 1
+- A -> C with weight 2
+- C -> B with weight -4
+
+1. Extract A (distance 0). Update B to distance 1, C to distance 2.
+2. Extract B (distance 1). B is finalized.
+3. Extract C (distance 2). C -> B gives distance 2 + (-4) = -2 < 1, but B is already finalized.
+
+Dijkstra reports the shortest path to B as 1, when the actual shortest path A -> C -> B has cost -2. The algorithm fails because it assumes that once a node is settled, no shorter path can be found, which negative edges violate.
+
+---
+
+## Part 2: Computing Rooted Directed Minimum Spanning Trees (RDMSTs)
+
+### Problem 2.1: Prove Lemma 1
+
+**Lemma 1:** Let g = (V, E, w) be a weighted, directed graph with designated root r in V. Let E' = {me(u) : u in V \ {r}}, where me(u) is an edge whose head is node u and whose weight is m(u) (the minimum weight of an edge whose head is u). Then, either T = (V, E') is an RDMST of g rooted at r, or T contains a cycle.
+
+**Proof:** The set E' contains exactly |V| - 1 edges, one incoming edge for each non-root node. We consider two cases.
+
+**Case 1: T = (V, E') contains no cycle.** Since T has |V| nodes and |V| - 1 edges with no cycle, T forms a tree. Moreover, since every non-root node has exactly one incoming edge, T is a directed spanning tree rooted at r (every node except r is reachable from r, as every non-root node has a parent).
+
+We claim T is an RDMST. Suppose for contradiction that T is not minimum, i.e., there exists another RDST T* = (V, E*) rooted at r with w(T*) < w(T). Since T* is an RDST, every non-root node u has exactly one incoming edge in T*. Let the incoming edge to u in T* have weight w*(u). By construction of E', the incoming edge to u in T = (V, E') has weight m(u), which is the minimum weight among all edges whose head is u. Therefore, m(u) <= w*(u) for all u in V \ {r}.
+
+Thus:
+w(T) = sum_{u in V\{r}} m(u) <= sum_{u in V\{r}} w*(u) = w(T*)
+
+This contradicts w(T*) < w(T). Therefore T is an RDMST.
+
+**Case 2: T = (V, E') contains a cycle.** This case is stated directly in the lemma, and the conclusion holds trivially (the lemma says "either T is an RDMST or T contains a cycle").
+
+Therefore, in all cases, either T is an RDMST or T contains a cycle. QED.
+
+---
+
+### Problem 2.2: Prove Lemma 2
+
+**Lemma 2:** Let g = (V, E, w) be a weighted, directed graph with designated root r in V. Define w' : E -> R+ by w'(e) = w(e) - m(v) for each edge e = (u, v), where m(v) is the minimum weight of any incoming edge to v. Then T = (V, E_T) is an RDMST of (V, E, w) rooted at r if and only if T is an RDMST of (V, E, w') rooted at r.
+
+**Proof:** Let T be any RDST of g rooted at r. In any RDST, every non-root node v has exactly one incoming edge. Let e_v denote the incoming edge to v in T. Then:
+
+w(T) = sum_{v in V\{r}} w(e_v)
+
+w'(T) = sum_{v in V\{r}} w'(e_v) = sum_{v in V\{r}} (w(e_v) - m(v)) = w(T) - sum_{v in V\{r}} m(v)
+
+Let C = sum_{v in V\{r}} m(v). This constant C depends only on the graph g and the root r, not on the choice of RDST T. Therefore, for any RDST T:
+
+w(T) = w'(T) + C
+
+Since C is a constant independent of T, we have for any two RDSTs T_1 and T_2:
+
+w(T_1) < w(T_2) if and only if w'(T_1) < w'(T_2)
+
+Therefore, T minimizes w(T) among all RDSTs if and only if T minimizes w'(T) among all RDSTs. In other words, T is an RDMST under w if and only if T is an RDMST under w'. QED.
+
+---
+
+### Problem 2.3: `reverse_digraph_representation(graph)`
+
+```python
+def reverse_digraph_representation(graph):
+ """
+ Convert a directed graph from standard to reversed representation
+ (or vice versa).
+
+ In the standard representation, graph[u][v] = w means there is an edge
+ from u to v with weight w. In the reversed representation, graph[v][u] = w
+ means there is an edge from u to v with weight w (keyed by the head node).
+
+ Arguments:
+ graph -- a weighted digraph in dictionary form
+
+ Returns:
+ An identical digraph with the opposite representation
+ """
+ # Initialize all nodes with empty dictionaries
+ rev_graph = {}
+ for node in graph:
+ rev_graph[node] = {}
+
+ # Reverse each edge: if graph[u][v] = w, set rev_graph[v][u] = w
+ for node in graph:
+ for nbr in graph[node]:
+ rev_graph[nbr][node] = graph[node][nbr]
+
+ return rev_graph
+```
+
+---
+
+### Problem 2.4: `modify_edge_weights(rgraph, root)`
+
+```python
+def modify_edge_weights(rgraph, root):
+ """
+ Modify edge weights according to Lemma 2. For each non-root node,
+ subtract the minimum incoming edge weight from all incoming edges.
+ This makes at least one incoming edge have weight 0 for each non-root node.
+
+ Arguments:
+ rgraph -- a weighted digraph in reversed representation
+ root -- a node in the graph (root node, whose incoming edges are not modified)
+
+ Returns:
+ None (modifies rgraph in-place)
+ """
+ for node in rgraph:
+ # Skip the root and nodes with no incoming edges
+ if node != root and bool(rgraph[node]):
+ # Find the minimum incoming edge weight
+ min_weight = min(rgraph[node].values())
+ # Subtract minimum from all incoming edges
+ for nbr in rgraph[node]:
+ rgraph[node][nbr] -= min_weight
+```
+
+---
+
+### Problem 2.5: `compute_rdst_candidate(rgraph, root)`
+
+```python
+def compute_rdst_candidate(rgraph, root):
+ """
+ Compute the RDST candidate by selecting one zero-weight incoming edge
+ for each non-root node, following Lemma 1 (after edge weight modification).
+
+ Arguments:
+ rgraph -- a weighted digraph in reversed representation (after modify_edge_weights)
+ root -- a node in the graph
+
+ Returns:
+ A weighted digraph (reversed representation) containing the selected edges
+ """
+ rdst = {}
+ # Initialize all nodes with empty dictionaries
+ for node in rgraph:
+ rdst[node] = {}
+
+ # For each non-root node, select one incoming edge with weight 0
+ for node in rgraph:
+ if node != root:
+ for nbr in rgraph[node]:
+ if rgraph[node][nbr] == 0:
+ rdst[node][nbr] = 0
+ break # Select only one zero-weight edge
+
+ return rdst
+```
+
+---
+
+### Problem 2.6: `compute_cycle(rgraph)`
+
+```python
+def compute_cycle(rgraph):
+ """
+ Find a cycle in the RDST candidate graph where every node has
+ in-degree 0 or 1.
+
+ Arguments:
+ rgraph -- a weighted digraph in reversed representation where
+ every node has in-degree 0 or 1
+
+ Returns:
+ A tuple of nodes forming a cycle, or None if no cycle exists
+ """
+ visited = set()
+
+ for node in rgraph:
+ # Follow the chain of incoming edges from this node
+ path = []
+ current = node
+ while (current not in path) and (current not in visited):
+ visited.add(current)
+ if current not in rgraph or not rgraph[current]:
+ break
+ path.append(current)
+ # Follow the single incoming edge (in-degree 0 or 1)
+ current = list(rgraph[current].keys())[0]
+
+ # If current is in path, we found a cycle
+ if current in path:
+ idx = path.index(current)
+ cycle = path[idx:]
+ return tuple(cycle)
+
+ return None
+```
+
+---
+
+### Problem 2.7: `contract_cycle(graph, cycle)`
+
+```python
+def contract_cycle(graph, cycle):
+ """
+ Contract a cycle into a single node (Step 4a of ComputeRDMST).
+
+ All nodes in the cycle are replaced by a new node c*. Edges between
+ cycle nodes and non-cycle nodes are preserved with minimum weights:
+ - For edges from non-cycle node u to cycle: keep the minimum weight
+ edge from u to any cycle node.
+ - For edges from cycle to non-cycle node v: keep the minimum weight
+ edge from any cycle node to v.
+
+ Arguments:
+ graph -- a weighted digraph in standard representation
+ cycle -- a tuple of nodes on the cycle
+
+ Returns:
+ A tuple (graph, cstar) where graph is the contracted graph
+ and cstar is the new node label
+ """
+ orig_nodes = list(graph.keys())
+ cstar = max(orig_nodes) + 1
+ graph[cstar] = {}
+
+ # Process edges involving cycle nodes
+ for node in orig_nodes:
+ if node not in cycle:
+ # Edges from non-cycle node to cycle nodes -> edge to cstar
+ min_weight = float("inf")
+ for nbr in graph[node]:
+ if nbr in cycle:
+ if graph[node][nbr] < min_weight:
+ min_weight = graph[node][nbr]
+ if min_weight != float("inf"):
+ graph[node][cstar] = min_weight
+ else:
+ # Edges from cycle node to non-cycle nodes -> edge from cstar
+ for nbr in graph[node]:
+ if nbr not in cycle:
+ if nbr not in graph[cstar]:
+ graph[cstar][nbr] = graph[node][nbr]
+ else:
+ if graph[node][nbr] < graph[cstar][nbr]:
+ graph[cstar][nbr] = graph[node][nbr]
+
+ # Remove cycle nodes from the graph
+ for node in cycle:
+ del graph[node]
+ for nbr in graph:
+ if node in graph[nbr]:
+ del graph[nbr][node]
+
+ return (graph, cstar)
+```
+
+---
+
+### Problem 2.8: `expand_graph(rgraph, rdst_candidate, cycle, cstar)`
+
+```python
+def expand_graph(rgraph, rdst_candidate, cycle, cstar):
+ """
+ Expand a contracted cycle back to the original nodes (Step 4c of ComputeRDMST).
+
+ Determines which edge enters the cycle from outside, removes the cycle edge
+ into that node, and adds back all other cycle edges plus any edges from
+ cycle nodes to non-cycle nodes.
+
+ Arguments:
+ rgraph -- the original digraph (reversed representation) before contraction
+ rdst_candidate -- the RDMST computed on the contracted graph (reversed rep)
+ cycle -- the cycle that was contracted
+ cstar -- the node label used for the contracted cycle
+
+ Returns:
+ A weighted digraph (reversed representation) with the cycle expanded back
+ """
+ expanded_graph = {}
+ enter_cycle_weight = float("inf")
+ enter_cycle_tail = None
+ enter_cycle_head = None
+
+ # Copy non-cycle, non-cstar edges from rdst_candidate
+ for node in rdst_candidate:
+ if node in rgraph:
+ expanded_graph[node] = {}
+ for nbr in rdst_candidate[node]:
+ if nbr in rgraph:
+ expanded_graph[node][nbr] = rgraph[node][nbr]
+ # Find the edge entering the cycle (edge to cstar from outside)
+ if cstar in rdst_candidate[node]:
+ for nbr in cycle:
+ if nbr in rgraph[node] and rgraph[node][nbr] < enter_cycle_weight:
+ enter_cycle_weight = rgraph[node][nbr]
+ enter_cycle_tail = node
+ enter_cycle_head = nbr
+
+ # Initialize cycle nodes
+ for i in range(len(cycle)):
+ expanded_graph[cycle[i]] = {}
+
+ # Add cycle edges back, except the one replaced by the entering edge
+ for i in range(-1, len(cycle) - 1):
+ if cycle[i] != enter_cycle_head:
+ expanded_graph[cycle[i + 1]] = {cycle[i]: rgraph[cycle[i + 1]][cycle[i]]}
+
+ # Add the entering edge
+ expanded_graph[enter_cycle_tail][enter_cycle_head] = enter_cycle_weight
+
+ # Handle edges from cycle nodes to non-cycle nodes (edges from cstar in rdst_candidate)
+ for node in rgraph:
+ if node not in cycle:
+ if node in rdst_candidate[cstar]:
+ for nbr in cycle:
+ if node in rgraph[nbr]:
+ if rgraph[nbr][node] == rdst_candidate[cstar][node]:
+ expanded_graph[nbr][node] = rgraph[nbr][node]
+ break
+
+ return expanded_graph
+```
+
+---
+
+## Part 3: Bacterial Infection Transmission
+
+### Problem 3.1: `compute_genetic_distance(seq1, seq2)`
+
+```python
+def compute_genetic_distance(seq1, seq2):
+ """
+ Compute the Hamming distance between two DNA sequences.
+ The Hamming distance is the number of positions at which
+ the corresponding characters differ.
+
+ Arguments:
+ seq1 -- a DNA sequence (encoded as 0's and 1's)
+ seq2 -- a DNA sequence (same length as seq1)
+
+ Returns:
+ The number of positions where the sequences differ,
+ or None if the sequences are different lengths.
+ """
+ if len(seq1) != len(seq2):
+ print("Error: The two sequences are not the same length.")
+ return None
+
+ score = 0
+ for i in range(len(seq1)):
+ if seq1[i] != seq2[i]:
+ score += 1
+ return score
+```
+
+**Example:**
+```
+compute_genetic_distance('00101', '10100')
+= positions 0 and 4 differ
+= 2
+```
+
+---
+
+### Problem 3.2: `construct_complete_weighted_digraph(gen_data, epi_data)`
+
+```python
+def construct_complete_weighted_digraph(gen_data, epi_data):
+ """
+ Build a complete weighted digraph combining genetic and epidemiological data.
+
+ The weight of edge (A, B) is:
+ D_AB = G_AB + (999 * (E_AB / max(E))) / 10^5
+
+ where G_AB is the genetic (Hamming) distance and E_AB is the
+ epidemiological distance.
+
+ Arguments:
+ gen_data -- filename for genetic data (patient sequences)
+ epi_data -- filename for epidemiological data (patient traces)
+
+ Returns:
+ A complete weighted digraph in standard dictionary representation
+ """
+ # Read the data using provided functions
+ sequences = provided.read_patient_sequences(gen_data)
+ gen_g = provided.compute_pairwise_gen_distances(sequences, compute_genetic_distance)
+ epi_g = provided.read_patient_traces(epi_data)
+
+ # Compute maximum epidemiological weight
+ max_e = -float("inf")
+ for i in epi_g:
+ for j in epi_g[i]:
+ if epi_g[i][j] > max_e:
+ max_e = epi_g[i][j]
+
+ # Combine genetic and epidemiological distances
+ tot_g = {}
+ for patient_u in gen_g:
+ tot_g[patient_u] = {}
+ for patient_v in gen_g[patient_u]:
+ tot_g[patient_u][patient_v] = gen_g[patient_u][patient_v] + (
+ (999.0 * (epi_g[patient_u][patient_v] / max_e)) / 10**5
+ )
+
+ return tot_g
+```
+
+---
+
+### Problem 3.3: `infer_transmap(gen_data, epi_data)`
+
+```python
+def infer_transmap(gen_data, epi_data):
+ """
+ Infer the transmission map by computing the RDMST of the weighted digraph.
+
+ Arguments:
+ gen_data -- filename for genetic data
+ epi_data -- filename for epidemiological data
+
+ Returns:
+ The RDMST representing the most likely transmission map
+ """
+ graph = construct_complete_weighted_digraph(gen_data, epi_data)
+ rdmst = compute_rdmst(graph, 1) # Patient 1 as root (Patient Zero)
+ return rdmst
+```
+
+---
+
+### Problem 3.4: Data Analysis
+
+**Transmission Map (RDMST rooted at Patient 1):**
+
+```
+ 1
+ / | \
+ 8 3 4
+ / / \ \
+ 2 5 9 10
+ / \
+ 6 12
+ / \ / \
+ 17 7 18 13
+ |
+ 14
+ / | \
+ 16 11 15
+```
+
+**RDMST weight:** Approximately 43.05
+
+**Analysis:**
+
+**Who was Patient Zero?** Patient 1 was the source of the outbreak, serving as the root of the transmission tree.
+
+**Transmission structure:** The RDMST reveals three independent transmission chains originating from Patient 1:
+
+1. **Chain 1:** Patient 1 -> Patient 8 -> Patient 2. A short chain of direct transmission.
+2. **Chain 2:** Patient 1 -> Patient 3 -> Patients 5 and 9. Another short branch.
+3. **Chain 3:** Patient 1 -> Patient 4 -> Patient 10, which then cascades into most of the remaining patients (6, 7, 12, 13, 14, 15, 16, 17, 18, 11). This is the dominant transmission pathway.
+
+**Surprising observations:** The third chain is notably deep and branching, suggesting that Patient 10 was a "super-spreader" node, infecting many subsequent patients either directly or through intermediaries. The outbreak was not uniform -- most transmission occurred through a single pathway rather than all chains spreading equally.
+
+**How does this analysis help?** Computing the RDMST of the combined genetic-epidemiological graph provides a principled way to infer the most likely transmission map. The genetic distances capture evolutionary similarity between bacterial strains (closer strains are more likely to be directly transmitted), while the epidemiological data captures temporal and spatial proximity of patients. By combining both data sources, we can reconstruct disease spread patterns even when direct observation of transmission is impossible, which helps hospitals implement targeted infection control measures.
+
+**Uniqueness:** The RDMST is **not unique** -- inspection of the edge weights reveals that alternative spanning trees with the same total weight exist, because some edges have identical combined weights. This means the inferred transmission map is one of several equally plausible explanations.
+
+---
Sub-Model Usage
| Model | Input | Output | Cache Read | Cost |
|---|---|---|---|---|
| claude-opus-4-6 | 45 | 9.3K | 410.5K | $0.69 |
BSCS Bench