Agent Work: Web Proxy

Claude Opus 4.6 · COMP 321: Introduction to Computer Systems

COMP 321: Introduction to Computer Systems

Project 6: Web Proxy

Overview

In this project, you will write a concurrent web proxy that logs requests. A web proxy is a program that acts as a middleman between a web browser and an *end server*. Instead of contacting the end server directly to get a web page, the browser contacts the proxy, which forwards the request on to the end server. When the end server replies to the proxy, the proxy sends the reply on to the browser.

Proxies are used for many purposes. Sometimes proxies are used in firewalls, such that the proxy is the only way for a browser inside the firewall to contact an end server outside. The proxy may perform transformations on web pages, for instance, removing ads from web pages. Proxies are also used as *anonymizers*. By stripping a request of all identifying information, a proxy can make the browser anonymous to the end server. Proxies can even be used to cache web pages, by storing a copy of, say, an image when a request for it is first made, and then serving that image in response to future requests rather than going to the end server.

In the first part of the project, you will write a simple sequential proxy that repeatedly waits for a request, forwards the request to the end server, and returns the result to the browser, keeping a log of such requests in a disk file. This part will help you understand the basics of network programming and the HTTP protocol.

In the second part of the project, you will upgrade your proxy so that it uses threads to deal with multiple clients concurrently. This part will give you some experience with concurrency and synchronization, which are crucial computer systems concepts.

Part I: Implementing a Sequential Web Proxy

In this part, you will implement a sequential logging web proxy. Your proxy must open a socket and listen for a connection request. When your proxy receives a connection request, it must accept the connection, read the HTTP request, and parse the request to determine the name of the end server. Your proxy must then open a connection to that end server, forward the request to it, receive the reply, and forward the reply to the browser.

Your proxy must be able to support both HTTP/1.0 and HTTP/1.1 requests. However, your proxy does not need to support persistent connections.

Your proxy need only support the HTTP GET method. It is not required to support HEAD, POST, or any other HTTP method.

Since your proxy is a middleman between client and end server, it will have elements of both. It will act as a server to the web browser, and as a client to the end server. Thus you will get experience with both client and server programming.

Logging

Your proxy must keep track of all completed transactions in a log file named proxy.log. Each entry in this log file must be a line of the form:

Date: browserIP URL size

where browserIP is the IP address of the client, URL is the requested URL, and size is the total size in bytes of the response that was returned by the end server, including the first line, headers, and body. For instance:

Thu 16 Apr 2020 20:06:08 CDT: 10.87.76.143 http://www.cs.cmu.edu/~conglonl/ 2584

In effect, since your proxy does not support persistent connections, size is the total number of bytes that your proxy receives from the end server between the opening and closing of the connection.

Only requests that are met by a response from an end server must be logged.

Since opening and closing files are costly operations, your proxy must only open the log file once, during initialization. In other words, your proxy may not open and close the log file each time it writes a log entry.

We have provided the function create_log_entry in proxy.c to create a log entry in the required format. Note that create_log_entry does not put a trailing newline on the returned string. Also, note that you are responsible for freeing the memory that stores the returned string.
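As a minimal sketch of the two notes above, the following shows one way a caller might write a string returned by create_log_entry: append the missing newline, flush, and free the string. The helper name write_log_entry is hypothetical and not part of the provided proxy.c, and the sketch assumes log_fp was opened once during initialization.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper (not in the provided proxy.c).
 *
 * Requires:
 *   "log_fp" is an open stream and "entry" is a heap-allocated,
 *   NUL-terminated string without a trailing newline.
 *
 * Effects:
 *   Writes the entry and a newline with a single fprintf call, flushes
 *   the stream, and frees "entry".  Returns 0 on success, -1 on error.
 */
static int
write_log_entry(FILE *log_fp, char *entry)
{
	int result = 0;

	assert(log_fp != NULL);
	assert(entry != NULL);

	/* One call emits the whole line, newline included. */
	if (fprintf(log_fp, "%s\n", entry) < 0)
		result = -1;
	/* Flush so the entry reaches the file even if the proxy is killed. */
	if (fflush(log_fp) == EOF)
		result = -1;
	/* The caller owns the string returned by create_log_entry. */
	free(entry);
	return (result);
}
```

Freeing the entry inside the helper is a design choice made for this sketch; freeing it at the call site works equally well, as long as it happens exactly once.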

Error Handling

In most cases, when your proxy is unable to complete a transaction, it must itself create an HTTP response containing an HTML error page and write this response to the client. This error page must describe why the transaction failed, for example, because the request is malformed or the end server does not exist. The exception to this mandate is when your proxy has already begun forwarding an HTTP response from the end server to the client. Once the end server's response is partially written to the client, your proxy cannot send an error page.

We have provided the function client_error in proxy.c to write an HTTP response containing an HTML error page to the client. There are standardized meanings for a small set of the possible err_num and short_msg parameter values. For example, 400 and "Bad Request" are appropriate when the request is malformed (see RFC 2616 for a complete list). In contrast, there are no standardized values for the cause and long_msg parameters. Use them as you see fit.

Port Numbers

Your proxy must listen for its connection requests on the port number passed in on the command line:

./proxy 18181

You may use any port number p, where 18000 <= p <= 18200, and where p is not currently being used by any other services.
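The range check above could be sketched as follows. The helper parse_port and the MIN_PORT and MAX_PORT constants are hypothetical names introduced for illustration; the sketch only validates the command-line string and does not check whether another service already holds the port.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MIN_PORT 18000	/* Lower bound from the handout. */
#define MAX_PORT 18200	/* Upper bound from the handout. */

/*
 * Hypothetical helper (not in the provided proxy.c).
 *
 * Requires:
 *   "arg" points to a NUL-terminated string.
 *
 * Effects:
 *   Returns the port number if "arg" is a decimal integer p with
 *   MIN_PORT <= p <= MAX_PORT, and -1 otherwise.
 */
static int
parse_port(const char *arg)
{
	char *end;
	long port;

	assert(arg != NULL);

	errno = 0;
	port = strtol(arg, &end, 10);
	/* Reject overflow, empty input, and trailing garbage. */
	if (errno != 0 || end == arg || *end != '\0')
		return (-1);
	if (port < MIN_PORT || port > MAX_PORT)
		return (-1);
	return ((int)port);
}
```

Whether the port is actually free only becomes known when the listening socket is bound, so a real proxy still needs to handle a failure at that point.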

Part II: Dealing with multiple requests concurrently

Real proxies do not process requests sequentially. They deal with multiple requests concurrently. Once you have a working sequential logging proxy, you will alter it to handle multiple requests concurrently. Specifically, in this part, you will implement the *prethreading* approach to handling each new connection request (CS:APP 12.5.5).

To implement the prethreading approach, you must buffer the new connection requests as they come in. To implement this buffering, you may adapt the sbuf that is discussed in the notes and textbook. However, to synchronize access to your buffer, you must use Pthread mutex and condition variables. You may not use semaphores.

Note that, with prethreading, it is possible for multiple peer threads to write to the log file concurrently. Thus, you will need to ensure that the log file entries are written atomically. Otherwise, the log file might become corrupted.

Files

Your workspace contains:

proxy.c - Main proxy source file (implement your solution here)
Makefile - Build specification
writeup.md - Skeleton writeup file
lib/ - Symlink to csapp library (csapp.h, csapp.c)

The provided proxy.c file contains:

parse_uri - Helper function to parse URIs
create_log_entry - Helper function to create log entries
client_error - Helper function to write HTTP error responses

Notes

A good way to get going on your proxy is to start with the basic echo server (CS:APP 11.4.9) and then gradually add functionality that turns the server into a proxy.

Initially, you should debug your proxy using telnet as the client (CS:APP 11.5.3).

Later, test your proxy with a real browser. Explore the browser settings until you find "proxies", then enter the host and port where you're running yours. You should only set the HTTP proxy, as that is all your code is going to be able to handle.

When you test your proxy with a real browser, you should start with very simple web pages. There is a simple, text-only web page at:

``http://www.cs.cmu.edu/~conglonl``

Once you get that working, you can try more complicated sites with multiple files and images.

You will find it very useful to assign each thread a small unique integer ID and then pass this ID as one of the arguments to the thread routine. If you display this ID in each of your debugging output statements, then you can accurately track the activity of each thread.
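One possible way to hand each thread its own ID is sketched below. All names here (NWORKERS, worker_ids, seen, worker, run_workers) are hypothetical. The sketch joins its threads so that it terminates, which a real proxy's worker threads would not do, since they loop forever.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

#define NWORKERS 4	/* Hypothetical pool size for this sketch. */

static int worker_ids[NWORKERS];	/* One stable ID slot per thread. */
static int seen[NWORKERS];		/* Set by each worker when it runs. */

/* Thread routine: reads its ID from the argument and tags its output. */
static void *
worker(void *arg)
{
	int id = *(int *)arg;

	fprintf(stderr, "[thread %d] started\n", id);
	seen[id] = 1;
	return (NULL);
}

/* Creates the workers, waits for them, and returns how many were created. */
static int
run_workers(void)
{
	pthread_t tids[NWORKERS];
	int created = 0;
	int i;

	for (i = 0; i < NWORKERS; i++) {
		/*
		 * Pass the address of a per-thread slot, not of the loop
		 * variable, so the ID cannot change before the thread reads it.
		 */
		worker_ids[i] = i;
		if (pthread_create(&tids[i], NULL, worker,
		    &worker_ids[i]) != 0)
			break;
		created++;
	}
	for (i = 0; i < created; i++)
		pthread_join(tids[i], NULL);
	return (created);
}
```

Passing `&i` directly would be a race: the main thread may increment `i` before the new thread dereferences the pointer.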

Be very careful about calling thread-unsafe functions inside a thread.

Since the log file is being written to by multiple threads, in order for the output from different threads to not be scrambled, all of the output from a single thread needs to be printed out atomically. Fortunately, the POSIX standard requires that individual stream operations, such as fprintf and fwrite, be atomic.

Be careful about memory and file descriptor leaks. When the processing for an HTTP request fails for any reason, the thread must close all open socket descriptors and free all memory resources.

Use the RIO (Robust I/O) package (CS:APP 10.4) for all I/O on sockets. Do not use standard I/O on sockets. You will quickly run into problems if you do. However, standard I/O calls, such as fopen, fprintf, and fwrite, are fine for I/O on the log file.

The Rio_readn, Rio_readlineb, and Rio_writen error checking wrappers in csapp.c are not appropriate for a proxy because they terminate the process when they encounter an error.

The HTTP/1.1 specification does not place an upper limit on the length of a URI. Moreover, in testing your proxy at web sites with rich content, you may encounter a URI that is longer than csapp.h's defined MAXLINE. You should explore how rio_readlineb behaves when the length of the line being read exceeds the given buffer size.

When a browser is using a proxy, it sends a full URI in the first line, e.g., http://www.cs.cmu.edu/~conglonl, to the proxy. In contrast, when a browser is directly connected to an end server, it only sends the path part of the URI, e.g., /~conglonl, to the server. So that the request message received by the end server is as if it came from a browser, your proxy should rewrite the start line that it sends to the end server to include just the path, not the full URI.
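The rewrite described above might look like the following sketch. The helper build_start_line is hypothetical; in the actual project the provided parse_uri already extracts the path, so this only illustrates the transformation. Emitting HTTP/1.0 as the version is an assumption of the sketch, not a requirement stated in the handout.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper (not in the provided proxy.c).
 *
 * Requires:
 *   "uri" is a NUL-terminated string and "out" has room for "outlen" bytes.
 *
 * Effects:
 *   Given an absolute URI of the form http://host[:port][/path], writes
 *   "GET <path> HTTP/1.0\r\n" into "out", using "/" when the URI has no
 *   path.  Returns 0 on success, or -1 if the URI does not start with
 *   "http://" or the result does not fit.
 */
static int
build_start_line(const char *uri, char *out, size_t outlen)
{
	const char *path;
	int n;

	assert(uri != NULL);
	assert(out != NULL);

	if (strncasecmp(uri, "http://", 7) != 0)
		return (-1);
	/* The path begins at the first '/' after the scheme and host. */
	path = strchr(uri + 7, '/');
	if (path == NULL)
		path = "/";
	n = snprintf(out, outlen, "GET %s HTTP/1.0\r\n", path);
	if (n < 0 || (size_t)n >= outlen)
		return (-1);
	return (0);
}
```

For example, the URI http://www.cs.cmu.edu/~conglonl becomes the start line `GET /~conglonl HTTP/1.0`.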

Reads and writes can fail for a variety of reasons. The most common read failure is an errno == ECONNRESET error caused by reading from a connection that has already been closed by the peer on the other end, typically an overloaded end server. The most common write failure is an errno == EPIPE error caused by writing to a connection that has been closed by its peer on the other end. This can occur, for example, when a user hits their browser's "Stop" button during a long transfer.

Writing to a connection that has been closed by the peer elicits an error with errno set to EPIPE the first time. Writing to such a connection a second time elicits a SIGPIPE signal whose default action is to terminate the process. One relatively easy way to keep your proxy from crashing is to use the SIG_IGN argument to the Signal function (CS:APP 8.5.3) to explicitly ignore these SIGPIPE signals (or you can catch them and print an appropriate warning message using the sio functions).
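The effect of ignoring SIGPIPE can be seen in a small, self-contained sketch. It uses a pipe rather than a socket so that it needs no network, and it calls the standard signal function instead of the CS:APP Signal wrapper; both are simplifications for illustration, and the function name is hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <unistd.h>

/*
 * Demonstration only (hypothetical helper).
 *
 * Effects:
 *   Ignores SIGPIPE, then writes to a pipe whose read end has been
 *   closed.  Returns the errno value from the failed write, 0 if the
 *   write unexpectedly succeeded, or -1 if setup failed.
 */
static int
errno_after_write_to_closed_pipe(void)
{
	int fds[2];
	int saved_errno = 0;

	/* Without this, the write below would terminate the process. */
	if (signal(SIGPIPE, SIG_IGN) == SIG_ERR)
		return (-1);
	if (pipe(fds) < 0)
		return (-1);

	/* Closing the read end plays the role of the peer going away. */
	close(fds[0]);
	if (write(fds[1], "x", 1) < 0)
		saved_errno = errno;
	close(fds[1]);
	return (saved_errno);
}
```

With SIGPIPE ignored, the failed write simply returns -1 with errno set to EPIPE, which the proxy can handle like any other write error.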

Modern web browsers and servers support persistent connections, which allow back-to-back requests to reuse the same connection. Your proxy will not do so. However, your browser is likely to set the headers Connection, Keep-Alive, and/or Proxy-Connection to indicate that it would like to use persistent connections. If you pass these headers on to the end server, it will assume that you can support them. If you do not support persistent connections, then subsequent requests on that connection will fail, so some or all of the web page will not load in your browser. Therefore, you should strip the Connection, Keep-Alive, and Proxy-Connection headers out of all requests, if they are present. Furthermore, HTTP/1.1 requires a Connection: close header be sent if you want the connection to close. Note that you must leave the other headers intact as many browsers and servers make use of them and will not work correctly without them.
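A minimal sketch of the header filter described above follows. The helper name is_connection_header is hypothetical, and the sketch assumes each header line starts with the field name and has no leading whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper (not in the provided proxy.c).
 *
 * Requires:
 *   "line" points to a NUL-terminated header line.
 *
 * Effects:
 *   Returns 1 if the line is a Connection, Keep-Alive, or
 *   Proxy-Connection header (compared case-insensitively), and 0
 *   otherwise.
 */
static int
is_connection_header(const char *line)
{
	static const char *const names[] = {
		"Connection:", "Keep-Alive:", "Proxy-Connection:"
	};
	size_t i;

	assert(line != NULL);

	for (i = 0; i < sizeof(names) / sizeof(names[0]); i++) {
		/* Including the colon avoids matching longer field names. */
		if (strncasecmp(line, names[i], strlen(names[i])) == 0)
			return (1);
	}
	return (0);
}
```

A forwarding loop would skip every line for which this returns 1, copy all other headers unchanged, and add its own `Connection: close` header.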

Testing

Use the grade tool to run the automated test suite:

# From the workspace directory
bin/grade .

# Or specify the workspace path
bin/grade ./workspaces/agent_proxy

The grade tool will compile your code and run tests against various web sites. Your proxy should correctly handle:

http://neverssl.com
http://www.cs.cmu.edu/~conglonl
http://httpforever.com
http://csapp.cs.cmu.edu
http://www.squid-cache.org
http://www.openoffice.org
http://www.unm.edu
http://www.washington.edu

Focus on getting basic functionality working first (sequential proxy with logging), then add concurrent handling.

--- *COMP 321: Introduction to Computer Systems, Rice University, Spring 2024*

proxy.c (6.3 KB)

/*
 * COMP 321 Project 6: Web Proxy
 *
 * This program implements a multithreaded HTTP proxy.
 *
 * <Replace with your name(s) and NetID(s).>
 */ 

#include <assert.h>

#include "csapp.h"

static void	client_error(int fd, const char *cause, int err_num, 
		    const char *short_msg, const char *long_msg);
static char    *create_log_entry(const struct sockaddr_in *sockaddr,
		    const char *uri, int size);
static int	parse_uri(const char *uri, char **hostnamep, char **portp,
		    char **pathnamep);

/* 
 * Requires:
 *   <to be filled in by the student(s)> 
 *
 * Effects:
 *   <to be filled in by the student(s)> 
 */
int
main(int argc, char **argv)
{

	/* Check the arguments. */
	if (argc != 2) {
		fprintf(stderr, "Usage: %s <port number>\n", argv[0]);
		exit(0);
	}

	/* Return success. */
	return (0);
}

/*
 * Requires:
 *   The parameter "uri" must point to a properly NUL-terminated string.
 *
 * Effects:
 *   Given a URI from an HTTP proxy GET request (i.e., a URL), extract the
 *   host name, port, and path name.  Create strings containing the host name,
 *   port, and path name, and return them through the parameters "hostnamep",
 *   "portp", "pathnamep", respectively.  (The caller must free the memory
 *   storing these strings.)  Return -1 if there are any problems and 0
 *   otherwise.
 */
static int
parse_uri(const char *uri, char **hostnamep, char **portp, char **pathnamep)
{
	const char *pathname_begin, *port_begin, *port_end;

	if (strncasecmp(uri, "http://", 7) != 0)
		return (-1);

	/* Extract the host name. */
	const char *host_begin = uri + 7;
	const char *host_end = strpbrk(host_begin, ":/ \r\n");
	if (host_end == NULL)
		host_end = host_begin + strlen(host_begin);
	int len = host_end - host_begin;
	char *hostname = Malloc(len + 1);
	strncpy(hostname, host_begin, len);
	hostname[len] = '\0';
	*hostnamep = hostname;

	/* Look for a port number.  If none is found, use port 80. */
	if (*host_end == ':') {
		port_begin = host_end + 1;
		port_end = strpbrk(port_begin, "/ \r\n");
		if (port_end == NULL)
			port_end = port_begin + strlen(port_begin);
		len = port_end - port_begin;
	} else {
		port_begin = "80";
		port_end = host_end;
		len = 2;
	}
	char *port = Malloc(len + 1);
	strncpy(port, port_begin, len);
	port[len] = '\0';
	*portp = port;

	/* Extract the path. */
	if (*port_end == '/') {
		pathname_begin = port_end;
		const char *pathname_end = strpbrk(pathname_begin, " \r\n");
		if (pathname_end == NULL)
			pathname_end = pathname_begin + strlen(pathname_begin);
		len = pathname_end - pathname_begin;
	} else {
		pathname_begin = "/";
		len = 1;
	}
	char *pathname = Malloc(len + 1);
	strncpy(pathname, pathname_begin, len);
	pathname[len] = '\0';
	*pathnamep = pathname;

	return (0);
}

/*
 * Requires:
 *   The parameter "sockaddr" must point to a valid sockaddr_in structure.  The
 *   parameter "uri" must point to a properly NUL-terminated string.
 *
 * Effects:
 *   Returns a string containing a properly formatted log entry.  This log
 *   entry is based upon the socket address of the requesting client
 *   ("sockaddr"), the URI from the request ("uri"), and the size in bytes of
 *   the response from the server ("size").
 */
static char *
create_log_entry(const struct sockaddr_in *sockaddr, const char *uri, int size)
{
	struct tm result;

	/*
	 * Create a large enough array of characters to store a log entry.
	 * Although the length of the URI can exceed MAXLINE, the combined
	 * lengths of the other fields and separators cannot.
	 */
	const size_t log_maxlen = MAXLINE + strlen(uri);
	char *const log_str = Malloc(log_maxlen + 1);

	/* Get a formatted time string. */
	time_t now = time(NULL);
	int log_strlen = strftime(log_str, MAXLINE, "%a %d %b %Y %H:%M:%S %Z: ",
	    localtime_r(&now, &result));

	/*
	 * Convert the IP address in network byte order to dotted decimal
	 * form.
	 */
	Inet_ntop(AF_INET, &sockaddr->sin_addr, &log_str[log_strlen],
	    INET_ADDRSTRLEN);
	log_strlen += strlen(&log_str[log_strlen]);

	/*
	 * Assert that the time and IP address fields occupy less than half of
	 * the space that is reserved for the non-URI fields.
	 */
	assert(log_strlen < MAXLINE / 2);

	/*
	 * Add the URI and response size onto the end of the log entry.
	 */
	snprintf(&log_str[log_strlen], log_maxlen - log_strlen, " %s %d", uri,
	    size);

	return (log_str);
}

/*
 * Requires:
 *   The parameter "fd" must be an open socket that is connected to the client.
 *   The parameters "cause", "short_msg", and "long_msg" must point to properly 
 *   NUL-terminated strings that describe the reason why the HTTP transaction
 *   failed.  The string "short_msg" may not exceed 32 characters in length,
 *   and the string "long_msg" may not exceed 80 characters in length.
 *
 * Effects:
 *   Constructs an HTML page describing the reason why the HTTP transaction
 *   failed, and writes an HTTP/1.0 response containing that page as the
 *   content.  The cause appearing in the HTML page is truncated if the
 *   string "cause" exceeds 2048 characters in length.
 */
static void
client_error(int fd, const char *cause, int err_num, const char *short_msg,
    const char *long_msg)
{
	char body[MAXBUF], headers[MAXBUF], truncated_cause[2049];

	assert(strlen(short_msg) <= 32);
	assert(strlen(long_msg) <= 80);
	/* Ensure that "body" is much larger than "truncated_cause". */
	assert(sizeof(truncated_cause) < MAXBUF / 2);

	/*
	 * Create a truncated "cause" string so that the response body will not
	 * exceed MAXBUF.
	 */
	strncpy(truncated_cause, cause, sizeof(truncated_cause) - 1);
	truncated_cause[sizeof(truncated_cause) - 1] = '\0';

	/* Build the HTTP response body. */
	snprintf(body, MAXBUF,
	    "<html><title>Proxy Error</title><body bgcolor=\"ffffff\">\r\n"
	    "%d: %s\r\n"
	    "<p>%s: %s\r\n"
	    "<hr><em>The COMP 321 Web proxy</em>\r\n",
	    err_num, short_msg, long_msg, truncated_cause);

	/* Build the HTTP response headers. */
	snprintf(headers, MAXBUF,
	    "HTTP/1.0 %d %s\r\n"
	    "Content-type: text/html\r\n"
	    "Content-length: %d\r\n"
	    "\r\n",
	    err_num, short_msg, (int)strlen(body));

	/* Write the HTTP response. */
	if (rio_writen(fd, headers, strlen(headers)) != -1)
		rio_writen(fd, body, strlen(body));
}

// Prevent "unused function" and "unused variable" warnings.
static const void *dummy_ref[] = { client_error, create_log_entry, dummy_ref,
    parse_uri };

proxy.c (16.5 KB)

/*
 * COMP 321 Project 6: Web Proxy
 *
 * This program implements a multithreaded HTTP proxy.
 *
 * Implements a prethreaded concurrent proxy that:
 * - Accepts HTTP GET requests from clients
 * - Forwards requests to end servers (rewriting URI to path only)
 * - Forwards responses back to clients
 * - Logs completed transactions to proxy.log
 * - Uses bounded buffer with mutex/condvar for concurrency
 */

#include <assert.h>

#include "csapp.h"

static void	client_error(int fd, const char *cause, int err_num,
		    const char *short_msg, const char *long_msg);
static char    *create_log_entry(const struct sockaddr_in *sockaddr,
		    const char *uri, int size);
static int	parse_uri(const char *uri, char **hostnamep, char **portp,
		    char **pathnamep);

/*
 * Prethreading constants.
 */
#define NTHREADS  8
#define SBUFSIZE  64

/*
 * Bounded buffer for connection file descriptors.
 * Synchronized with Pthread mutex and condition variables (no semaphores).
 */
typedef struct {
	int *buf;               /* Buffer array */
	int n;                  /* Maximum number of slots */
	int front;              /* Index for next removal */
	int rear;               /* Index for next insertion */
	pthread_mutex_t mutex;  /* Protects accesses to buf */
	pthread_cond_t slots;   /* Signaled when a slot becomes available */
	pthread_cond_t items;   /* Signaled when an item becomes available */
} sbuf_t;

/* Global variables. */
static sbuf_t sbuf;               /* Shared buffer of connected descriptors */
static FILE *log_fp;              /* Log file pointer (opened once) */
static pthread_mutex_t log_mutex; /* Protects log file writes */

/*
 * Requires:
 *   sp points to a valid sbuf_t, n > 0.
 *
 * Effects:
 *   Initializes the bounded buffer with capacity n.
 */
static void
sbuf_init(sbuf_t *sp, int n)
{

	sp->buf = Calloc(n, sizeof(int));
	sp->n = n;
	sp->front = sp->rear = 0;
	pthread_mutex_init(&sp->mutex, NULL);
	pthread_cond_init(&sp->slots, NULL);
	pthread_cond_init(&sp->items, NULL);
}

/*
 * Requires:
 *   sp points to an initialized sbuf_t.
 *
 * Effects:
 *   Inserts item into the buffer, blocking if the buffer is full.
 */
static void
sbuf_insert(sbuf_t *sp, int item)
{

	pthread_mutex_lock(&sp->mutex);
	while ((sp->rear - sp->front) >= sp->n)
		pthread_cond_wait(&sp->slots, &sp->mutex);
	sp->buf[(sp->rear++) % sp->n] = item;
	pthread_cond_signal(&sp->items);
	pthread_mutex_unlock(&sp->mutex);
}

/*
 * Requires:
 *   sp points to an initialized sbuf_t.
 *
 * Effects:
 *   Removes and returns an item from the buffer, blocking if empty.
 */
static int
sbuf_remove(sbuf_t *sp)
{
	int item;

	pthread_mutex_lock(&sp->mutex);
	while (sp->rear == sp->front)
		pthread_cond_wait(&sp->items, &sp->mutex);
	item = sp->buf[(sp->front++) % sp->n];
	pthread_cond_signal(&sp->slots);
	pthread_mutex_unlock(&sp->mutex);
	return (item);
}

/*
 * Requires:
 *   rp points to an initialized rio_t.
 *
 * Effects:
 *   Reads a complete line from the rio buffer, handling lines longer
 *   than MAXLINE by dynamically allocating memory.  Returns a
 *   dynamically allocated string, or NULL on error/EOF with no data.
 *   The caller must free the returned string.
 */
static char *
read_full_line(rio_t *rp)
{
	char buf[MAXLINE];
	char *line = NULL;
	size_t total_len = 0;
	ssize_t n;

	for (;;) {
		n = rio_readlineb(rp, buf, MAXLINE);
		if (n <= 0) {
			if (total_len == 0) {
				free(line);
				return (NULL);
			}
			break;
		}
		char *new_line = realloc(line, total_len + n + 1);
		if (new_line == NULL) {
			free(line);
			return (NULL);
		}
		line = new_line;
		memcpy(line + total_len, buf, n);
		total_len += n;
		line[total_len] = '\0';
		if (buf[n - 1] == '\n')
			break;
	}
	return (line);
}

/*
 * Requires:
 *   connfd is an open socket connected to a client.
 *
 * Effects:
 *   Handles a single HTTP GET request: reads the request from the client,
 *   forwards it to the end server, forwards the response back to the
 *   client, and logs the completed transaction.  Closes connfd and any
 *   server connection when done.  Frees all allocated memory.
 */
static void
handle_request(int connfd)
{
	rio_t rio_client;
	char *hostname = NULL, *port = NULL, *pathname = NULL;
	char *request_line = NULL, *uri = NULL;
	char *request = NULL;
	int serverfd = -1;
	char buf[MAXLINE];
	struct sockaddr_in clientaddr;
	socklen_t clientlen;

	/* Get client address for logging. */
	clientlen = sizeof(clientaddr);
	if (getpeername(connfd, (SA *)&clientaddr, &clientlen) < 0) {
		close(connfd);
		return;
	}

	/* Initialize RIO for client socket. */
	rio_readinitb(&rio_client, connfd);

	/* Read the request line (may be longer than MAXLINE). */
	request_line = read_full_line(&rio_client);
	if (request_line == NULL) {
		close(connfd);
		return;
	}

	/*
	 * Parse the request line: METHOD URI VERSION
	 */
	char *p = request_line;

	/* Skip leading whitespace. */
	while (*p && isspace((unsigned char)*p))
		p++;

	/* Extract method. */
	char *method_start = p;
	while (*p && !isspace((unsigned char)*p))
		p++;
	if (*p == '\0') {
		client_error(connfd, request_line, 400, "Bad Request",
		    "Malformed request line");
		free(request_line);
		close(connfd);
		return;
	}
	*p++ = '\0';

	/* Skip spaces between method and URI. */
	while (*p == ' ' || *p == '\t')
		p++;

	/* Extract URI. */
	char *uri_start = p;
	while (*p && !isspace((unsigned char)*p))
		p++;
	if (uri_start == p) {
		client_error(connfd, method_start, 400, "Bad Request",
		    "Missing URI");
		free(request_line);
		close(connfd);
		return;
	}
	*p = '\0';

	/* Make a copy of the URI for logging. */
	uri = strdup(uri_start);
	if (uri == NULL) {
		free(request_line);
		close(connfd);
		return;
	}

	/* Only support GET. */
	if (strcasecmp(method_start, "GET") != 0) {
		client_error(connfd, method_start, 501, "Not Implemented",
		    "Proxy only supports GET");
		free(uri);
		free(request_line);
		close(connfd);
		return;
	}

	/* Parse the URI to extract hostname, port, pathname. */
	if (parse_uri(uri, &hostname, &port, &pathname) < 0) {
		client_error(connfd, uri, 400, "Bad Request",
		    "Could not parse URI");
		free(uri);
		free(request_line);
		close(connfd);
		return;
	}
	free(request_line);
	request_line = NULL;

	/*
	 * Build the request to forward to the end server.
	 */
	size_t req_cap = strlen(pathname) + strlen(hostname) +
	    strlen(port) + MAXBUF;
	size_t req_len = 0;
	request = malloc(req_cap);
	if (request == NULL)
		goto cleanup_no_server;

	/* Request line: GET <path> HTTP/1.0 */
	int n = snprintf(request, req_cap, "GET %s HTTP/1.0\r\n", pathname);
	req_len = n;

	/* Host header. */
	if (strcmp(port, "80") == 0)
		n = snprintf(request + req_len, req_cap - req_len,
		    "Host: %s\r\n", hostname);
	else
		n = snprintf(request + req_len, req_cap - req_len,
		    "Host: %s:%s\r\n", hostname, port);
	req_len += n;

	/* Connection: close header. */
	n = snprintf(request + req_len, req_cap - req_len,
	    "Connection: close\r\n");
	req_len += n;

	/* Read and forward remaining client headers. */
	char *hdr_line;
	while ((hdr_line = read_full_line(&rio_client)) != NULL) {
		/* Empty line terminates headers. */
		if (strcmp(hdr_line, "\r\n") == 0 ||
		    strcmp(hdr_line, "\n") == 0) {
			free(hdr_line);
			break;
		}

		/* Strip Connection, Keep-Alive, Proxy-Connection, Host. */
		if (strncasecmp(hdr_line, "Connection:", 11) == 0 ||
		    strncasecmp(hdr_line, "Keep-Alive:", 11) == 0 ||
		    strncasecmp(hdr_line, "Proxy-Connection:", 17) == 0 ||
		    strncasecmp(hdr_line, "Host:", 5) == 0) {
			free(hdr_line);
			continue;
		}

		/* Grow buffer if needed. */
		size_t hdr_len = strlen(hdr_line);
		while (req_len + hdr_len + 3 > req_cap) {
			req_cap *= 2;
			char *new_req = realloc(request, req_cap);
			if (new_req == NULL) {
				free(hdr_line);
				free(request);
				request = NULL;
				goto cleanup_no_server;
			}
			request = new_req;
		}
		memcpy(request + req_len, hdr_line, hdr_len);
		req_len += hdr_len;
		free(hdr_line);
	}

	/* Terminate headers with blank line. */
	memcpy(request + req_len, "\r\n", 2);
	req_len += 2;
	request[req_len] = '\0';

	/* Connect to end server. */
	serverfd = open_clientfd(hostname, port);
	if (serverfd < 0) {
		client_error(connfd, hostname, 502, "Bad Gateway",
		    "Could not connect to end server");
		free(request);
		request = NULL;
		goto cleanup_no_server;
	}

	/* Send request to end server. */
	if (rio_writen(serverfd, request, req_len) < 0) {
		free(request);
		request = NULL;
		goto cleanup;
	}
	free(request);
	request = NULL;

	/*
	 * Forward response from end server to client, counting bytes.
	 * Use read() instead of rio_readn so that each chunk is forwarded
	 * as soon as it arrives; rio_readn would block until MAXLINE bytes
	 * have been read or EOF is reached.
	 */
	int total_size = 0;
	ssize_t nread;
	int write_failed = 0;
	while (1) {
		nread = read(serverfd, buf, MAXLINE);
		if (nread > 0) {
			total_size += nread;
			if (!write_failed) {
				if (rio_writen(connfd, buf, nread) < 0)
					write_failed = 1;
			}
		} else if (nread == 0) {
			break; /* EOF */
		} else {
			if (errno == EINTR)
				continue;
			break; /* Error */
		}
	}

	/* Log the completed transaction. */
	if (total_size > 0) {
		char *log_entry = create_log_entry(&clientaddr, uri,
		    total_size);
		pthread_mutex_lock(&log_mutex);
		fprintf(log_fp, "%s\n", log_entry);
		fflush(log_fp);
		pthread_mutex_unlock(&log_mutex);
		free(log_entry);
	}

cleanup:
	close(serverfd);
cleanup_no_server:
	free(uri);
	free(hostname);
	free(port);
	free(pathname);
	close(connfd);
}

/*
 * Requires:
 *   vargp is unused.
 *
 * Effects:
 *   Worker thread routine.  Detaches itself, then repeatedly removes
 *   a connected descriptor from the shared buffer and handles the request.
 */
static void *
thread(void *vargp)
{
	int connfd;

	(void)vargp;
	Pthread_detach(Pthread_self());
	while (1) {
		connfd = sbuf_remove(&sbuf);
		handle_request(connfd);
	}
	return (NULL);
}

/*
 * Requires:
 *   argc == 2, argv[1] is a valid port number string.
 *
 * Effects:
 *   Starts the proxy server: ignores SIGPIPE, opens the log file,
 *   initializes the bounded buffer, creates worker threads, opens a
 *   listening socket, and accepts connections in a loop.
 */
int
main(int argc, char **argv)
{
	int listenfd, connfd;
	struct sockaddr_in clientaddr;
	socklen_t clientlen;
	pthread_t tid;
	int i;

	/* Check the arguments. */
	if (argc != 2) {
		fprintf(stderr, "Usage: %s <port number>\n", argv[0]);
		exit(1);
	}

	/* Ignore SIGPIPE to prevent crashing on broken connections. */
	Signal(SIGPIPE, SIG_IGN);

	/* Open log file once during initialization. */
	log_fp = Fopen("proxy.log", "a");

	/* Initialize log mutex. */
	pthread_mutex_init(&log_mutex, NULL);

	/* Initialize connection buffer. */
	sbuf_init(&sbuf, SBUFSIZE);

	/* Create worker threads. */
	for (i = 0; i < NTHREADS; i++)
		Pthread_create(&tid, NULL, thread, NULL);

	/* Open listening socket. */
	listenfd = open_listenfd(argv[1]);
	if (listenfd < 0) {
		fprintf(stderr, "open_listenfd error on port %s\n", argv[1]);
		exit(1);
	}

	/* Accept connections and dispatch to worker threads. */
	while (1) {
		clientlen = sizeof(clientaddr);
		connfd = accept(listenfd, (SA *)&clientaddr, &clientlen);
		if (connfd < 0) {
			/* Retry on EINTR and on any other accept error. */
			continue;
		}
		sbuf_insert(&sbuf, connfd);
	}

	/* Return success. */
	return (0);
}

/*
 * Requires:
 *   The parameter "uri" must point to a properly NUL-terminated string.
 *
 * Effects:
 *   Given a URI from an HTTP proxy GET request (i.e., a URL), extract the
 *   host name, port, and path name.  Create strings containing the host name,
 *   port, and path name, and return them through the parameters "hostnamep",
 *   "portp", "pathnamep", respectively.  (The caller must free the memory
 *   storing these strings.)  Return -1 if there are any problems and 0
 *   otherwise.
 */
static int
parse_uri(const char *uri, char **hostnamep, char **portp, char **pathnamep)
{
	const char *pathname_begin, *port_begin, *port_end;

	if (strncasecmp(uri, "http://", 7) != 0)
		return (-1);

	/* Extract the host name. */
	const char *host_begin = uri + 7;
	const char *host_end = strpbrk(host_begin, ":/ \r\n");
	if (host_end == NULL)
		host_end = host_begin + strlen(host_begin);
	int len = host_end - host_begin;
	char *hostname = Malloc(len + 1);
	strncpy(hostname, host_begin, len);
	hostname[len] = '\0';
	*hostnamep = hostname;

	/* Look for a port number.  If none is found, use port 80. */
	if (*host_end == ':') {
		port_begin = host_end + 1;
		port_end = strpbrk(port_begin, "/ \r\n");
		if (port_end == NULL)
			port_end = port_begin + strlen(port_begin);
		len = port_end - port_begin;
	} else {
		port_begin = "80";
		port_end = host_end;
		len = 2;
	}
	char *port = Malloc(len + 1);
	strncpy(port, port_begin, len);
	port[len] = '\0';
	*portp = port;

	/* Extract the path. */
	if (*port_end == '/') {
		pathname_begin = port_end;
		const char *pathname_end = strpbrk(pathname_begin, " \r\n");
		if (pathname_end == NULL)
			pathname_end = pathname_begin + strlen(pathname_begin);
		len = pathname_end - pathname_begin;
	} else {
		pathname_begin = "/";
		len = 1;
	}
	char *pathname = Malloc(len + 1);
	strncpy(pathname, pathname_begin, len);
	pathname[len] = '\0';
	*pathnamep = pathname;

	return (0);
}

/*
 * Requires:
 *   The parameter "sockaddr" must point to a valid sockaddr_in structure.  The
 *   parameter "uri" must point to a properly NUL-terminated string.
 *
 * Effects:
 *   Returns a string containing a properly formatted log entry.  This log
 *   entry is based upon the socket address of the requesting client
 *   ("sockaddr"), the URI from the request ("uri"), and the size in bytes of
 *   the response from the server ("size").
 */
static char *
create_log_entry(const struct sockaddr_in *sockaddr, const char *uri, int size)
{
	struct tm result;

	/*
	 * Create a large enough array of characters to store a log entry.
	 * Although the length of the URI can exceed MAXLINE, the combined
	 * lengths of the other fields and separators cannot.
	 */
	const size_t log_maxlen = MAXLINE + strlen(uri);
	char *const log_str = Malloc(log_maxlen + 1);

	/* Get a formatted time string. */
	time_t now = time(NULL);
	int log_strlen = strftime(log_str, MAXLINE, "%a %d %b %Y %H:%M:%S %Z: ",
	    localtime_r(&now, &result));

	/*
	 * Convert the IP address in network byte order to dotted decimal
	 * form.
	 */
	Inet_ntop(AF_INET, &sockaddr->sin_addr, &log_str[log_strlen],
	    INET_ADDRSTRLEN);
	log_strlen += strlen(&log_str[log_strlen]);

	/*
	 * Assert that the time and IP address fields occupy less than half of
	 * the space that is reserved for the non-URI fields.
	 */
	assert(log_strlen < MAXLINE / 2);

	/*
	 * Add the URI and response size onto the end of the log entry.
	 */
	snprintf(&log_str[log_strlen], log_maxlen - log_strlen, " %s %d", uri,
	    size);

	return (log_str);
}

/*
 * Requires:
 *   The parameter "fd" must be an open socket that is connected to the client.
 *   The parameters "cause", "short_msg", and "long_msg" must point to properly
 *   NUL-terminated strings that describe the reason why the HTTP transaction
 *   failed.  The string "short_msg" may not exceed 32 characters in length,
 *   and the string "long_msg" may not exceed 80 characters in length.
 *
 * Effects:
 *   Constructs an HTML page describing the reason why the HTTP transaction
 *   failed, and writes an HTTP/1.0 response containing that page as the
 *   content.  The cause appearing in the HTML page is truncated if the
 *   string "cause" exceeds 2048 characters in length.
 */
static void
client_error(int fd, const char *cause, int err_num, const char *short_msg,
    const char *long_msg)
{
	char body[MAXBUF], headers[MAXBUF], truncated_cause[2049];

	assert(strlen(short_msg) <= 32);
	assert(strlen(long_msg) <= 80);
	/* Ensure that "body" is much larger than "truncated_cause". */
	assert(sizeof(truncated_cause) < MAXBUF / 2);

	/*
	 * Create a truncated "cause" string so that the response body will not
	 * exceed MAXBUF.
	 */
	strncpy(truncated_cause, cause, sizeof(truncated_cause) - 1);
	truncated_cause[sizeof(truncated_cause) - 1] = '\0';

	/* Build the HTTP response body. */
	snprintf(body, MAXBUF,
	    "<html><title>Proxy Error</title><body bgcolor=\"ffffff\">\r\n"
	    "%d: %s\r\n"
	    "<p>%s: %s\r\n"
	    "<hr><em>The COMP 321 Web proxy</em>\r\n",
	    err_num, short_msg, long_msg, truncated_cause);

	/* Build the HTTP response headers. */
	snprintf(headers, MAXBUF,
	    "HTTP/1.0 %d %s\r\n"
	    "Content-type: text/html\r\n"
	    "Content-length: %d\r\n"
	    "\r\n",
	    err_num, short_msg, (int)strlen(body));

	/* Write the HTTP response. */
	if (rio_writen(fd, headers, strlen(headers)) != -1)
		rio_writen(fd, body, strlen(body));
}

// Prevent "unused function" and "unused variable" warnings.
static const void *dummy_ref[] = { client_error, create_log_entry, dummy_ref,
    parse_uri };

COMP 321 Project 6: Web Proxy

Student Information

[Replace with your name(s) and NetID(s).]

Design Discussion

1. Describe the steps taken by your proxy to service a single HTTP transaction from beginning to end. (10-14 sentences)

When a client connects, the main thread accepts the connection and inserts the connected file descriptor into a bounded buffer. A pre-created worker thread removes the descriptor from the buffer and begins handling the request. The worker first obtains the client's address using getpeername for later logging. It then initializes a RIO read buffer on the client socket and reads the full request line, handling lines longer than MAXLINE via dynamic memory allocation. The request line is parsed to extract the method, URI, and version. If the method is not GET, or the URI cannot be parsed, the proxy sends an appropriate error response to the client and closes the connection. Otherwise, parse_uri extracts the hostname, port, and path from the URI. The proxy builds a new HTTP request with the path (not the full URI) in the start line, adds a Host header and Connection: close header, and forwards any other client headers (stripping Connection, Keep-Alive, and Proxy-Connection). The proxy opens a connection to the end server using open_clientfd with the parsed hostname and port. It writes the constructed request to the server using rio_writen. It then enters a loop reading the server's response using rio_readn and forwarding each chunk to the client via rio_writen, counting the total bytes received. After the server closes the connection (EOF), the proxy creates a log entry using create_log_entry and writes it to the log file under mutex protection. Finally, the proxy closes both the server and client sockets and frees all dynamically allocated memory.

2. Did you modify the first line of the request message? If so, how? (1-3 sentences)

Yes. The proxy rewrites the request line to use only the path portion of the URI instead of the full URL, and changes the HTTP version to HTTP/1.0. For example, "GET http://www.example.com/page HTTP/1.1" becomes "GET /page HTTP/1.0".
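
The rewrite described in this answer can be illustrated with a small sketch. The helper name rewrite_request_line is hypothetical and is not part of the submission, which builds the line inside handle_request after calling parse_uri; the sketch only assumes an absolute "http://" URI as input.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical helper (illustration only): writes
 * "GET <path> HTTP/1.0\r\n" into "out" for an absolute http:// URI.
 * Returns the length of the line, or -1 if the URI does not start with
 * "http://" or the output buffer is too small.
 */
static int
rewrite_request_line(const char *uri, char *out, size_t out_cap)
{
	assert(uri != NULL && out != NULL);

	if (strncasecmp(uri, "http://", 7) != 0)
		return (-1);

	/* The path begins at the first '/' after the host[:port] part. */
	const char *path = strchr(uri + 7, '/');
	if (path == NULL)
		path = "/";

	int n = snprintf(out, out_cap, "GET %s HTTP/1.0\r\n", path);
	if (n < 0 || (size_t)n >= out_cap)
		return (-1);
	return (n);
}
```

Called with "http://www.example.com/page", the sketch produces the "GET /page HTTP/1.0" line from the example above, followed by CRLF. A URI with no path component falls back to "/".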

3. Did you add/remove/modify any request headers? If so, how? (1-3 sentences)

The proxy strips the Connection, Keep-Alive, Proxy-Connection, and Host headers from the client's request. It then adds its own Host header (constructed from the parsed URI hostname and port) and a Connection: close header to ensure the end server closes the connection after responding.
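
As a rough sketch of the filtering step, the case-insensitive prefix checks could be collected into one predicate. The function name is_stripped_header is an assumption made for illustration; the submission performs the same four comparisons inline with strncasecmp.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>

/*
 * Hypothetical predicate (illustration only): returns 1 if "hdr_line"
 * begins with the name of a header that the proxy drops or replaces,
 * and 0 otherwise.  The comparison ignores case.
 */
static int
is_stripped_header(const char *hdr_line)
{
	static const char *const stripped[] = {
		"Connection:", "Keep-Alive:", "Proxy-Connection:", "Host:"
	};

	assert(hdr_line != NULL);
	for (size_t i = 0; i < sizeof(stripped) / sizeof(stripped[0]); i++) {
		if (strncasecmp(hdr_line, stripped[i],
		    strlen(stripped[i])) == 0)
			return (1);
	}
	return (0);
}
```

A header loop would skip every line for which the predicate returns 1 and copy the rest into the outgoing request unchanged.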

4. How did you forward the response message? (2-4 sentences)

The proxy reads the entire response from the end server using rio_readn in a loop with MAXLINE-sized chunks, forwarding each chunk to the client via rio_writen. It counts the total number of bytes received from the server for logging purposes. If writing to the client fails (e.g., broken pipe), the proxy sets a flag and continues draining the server response to get an accurate byte count. Since we send Connection: close, the server closes its end after sending the full response, causing rio_readn to return 0 and terminate the loop.
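
The read-until-EOF strategy, including draining after a failed client write, might look roughly like the sketch below. It assumes plain read(2) and write(2) calls rather than the course's RIO wrappers, and the names forward_until_eof and write_all are hypothetical.

```c
#include <assert.h>
#include <errno.h>
#include <unistd.h>

/* Hypothetical helper: write all n bytes, retrying short writes and EINTR. */
static int
write_all(int fd, const char *buf, size_t n)
{
	while (n > 0) {
		ssize_t w = write(fd, buf, n);
		if (w < 0) {
			if (errno == EINTR)
				continue;
			return (-1);
		}
		buf += w;
		n -= (size_t)w;
	}
	return (0);
}

/*
 * Sketch: copy bytes from src_fd to dst_fd until EOF or an unrecoverable
 * read error.  Returns the number of bytes read from src_fd, which keeps
 * growing even after a write to dst_fd has failed.
 */
static long
forward_until_eof(int src_fd, int dst_fd)
{
	char buf[4096];
	long total = 0;
	int write_failed = 0;

	assert(src_fd >= 0 && dst_fd >= 0);
	for (;;) {
		ssize_t nread = read(src_fd, buf, sizeof(buf));
		if (nread == 0)
			break;			/* EOF */
		if (nread < 0) {
			if (errno == EINTR)
				continue;	/* Interrupted: retry. */
			break;			/* Other error: stop. */
		}
		total += nread;
		if (!write_failed &&
		    write_all(dst_fd, buf, (size_t)nread) < 0)
			write_failed = 1;	/* Keep draining src_fd. */
	}
	return (total);
}
```

Because the loop stops only at EOF or a read error, it depends on the end server closing the connection, which is what the Connection: close header requests.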

6. How many threads did your proxy use to implement concurrency? Explain how you chose this number. (3-6 sentences)

The proxy uses 8 pre-created worker threads plus the main thread, for 9 threads total. The main thread handles only accept calls and inserts connected descriptors into the bounded buffer. The 8 worker threads each loop removing descriptors and handling requests. I chose 8 worker threads as a reasonable number that provides good concurrency for typical web proxy workloads without excessive thread overhead. The bounded buffer has a capacity of 64 slots, which allows the main thread to queue up connections even when all workers are busy. Using prethreading avoids the overhead of creating and destroying threads for each connection.
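
For readers unfamiliar with the bounded buffer that connects the main thread to the workers, here is a minimal sketch built on a Pthread mutex and two condition variables. The struct and function names (fd_queue, fd_queue_insert, and so on) are hypothetical and are not the submission's sbuf implementation.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical bounded FIFO of connected descriptors. */
struct fd_queue {
	int *slots;			/* Circular array of descriptors. */
	int cap, count, front, rear;
	pthread_mutex_t mutex;		/* Protects all fields above. */
	pthread_cond_t not_empty;	/* Signaled after an insert. */
	pthread_cond_t not_full;	/* Signaled after a remove. */
};

static int
fd_queue_init(struct fd_queue *q, int cap)
{
	assert(q != NULL && cap > 0);
	q->slots = calloc((size_t)cap, sizeof(int));
	if (q->slots == NULL)
		return (-1);
	q->cap = cap;
	q->count = q->front = q->rear = 0;
	pthread_mutex_init(&q->mutex, NULL);
	pthread_cond_init(&q->not_empty, NULL);
	pthread_cond_init(&q->not_full, NULL);
	return (0);
}

/* Producer side: blocks while the queue is full. */
static void
fd_queue_insert(struct fd_queue *q, int fd)
{
	pthread_mutex_lock(&q->mutex);
	while (q->count == q->cap)
		pthread_cond_wait(&q->not_full, &q->mutex);
	q->slots[q->rear] = fd;
	q->rear = (q->rear + 1) % q->cap;
	q->count++;
	pthread_cond_signal(&q->not_empty);
	pthread_mutex_unlock(&q->mutex);
}

/* Consumer side: blocks while the queue is empty. */
static int
fd_queue_remove(struct fd_queue *q)
{
	pthread_mutex_lock(&q->mutex);
	while (q->count == 0)
		pthread_cond_wait(&q->not_empty, &q->mutex);
	int fd = q->slots[q->front];
	q->front = (q->front + 1) % q->cap;
	q->count--;
	pthread_cond_signal(&q->not_full);
	pthread_mutex_unlock(&q->mutex);
	return (fd);
}
```

In this arrangement the main thread would call the insert function after each accept, and each worker would call the remove function at the top of its loop. Descriptors come out in the order they went in.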

7. How did you write to the access log file? (1-2 sentences)

After completing a transaction, the worker thread calls create_log_entry to format the log string, then uses fprintf to write the entry followed by a newline to the log file, and fflush to ensure the entry is written to disk immediately.

8. How do you ensure atomicity when writing to the access log file? (1-2 sentences)

A pthread mutex (log_mutex) is acquired before writing to the log file and released after the write and flush complete. This ensures that log entries from different worker threads do not interleave.
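
The critical section described here can be sketched as a small wrapper. The names log_append and demo_log_mutex are hypothetical; the submission locks log_mutex directly around fprintf and fflush inside handle_request.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>

/* Hypothetical mutex for this sketch; statically initialized. */
static pthread_mutex_t demo_log_mutex = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical wrapper (illustration only): appends "entry" and a
 * newline to "fp", flushing before the mutex is released so that a
 * complete line reaches the file before another thread can write.
 * Returns 0 on success and -1 if the write or the flush fails.
 */
static int
log_append(FILE *fp, const char *entry)
{
	assert(fp != NULL && entry != NULL);

	pthread_mutex_lock(&demo_log_mutex);
	int ok = fprintf(fp, "%s\n", entry) >= 0 && fflush(fp) == 0;
	pthread_mutex_unlock(&demo_log_mutex);
	return (ok ? 0 : -1);
}
```

Keeping the flush inside the lock means the line is handed to the operating system before any other thread can start its own entry.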

9. When did you open/close the access log file? (2 sentences)

The log file is opened once in main during initialization using Fopen in append mode, before any worker threads are created. The log file is never explicitly closed since the proxy runs indefinitely until terminated, at which point the OS reclaims all resources.

Section 1: HTTP Transaction Steps (10 points)

Score: 10/10 points

Assessment:

What was provided: A comprehensive, 14-sentence description covering the entire HTTP transaction lifecycle, including prethreading details, request parsing, URI rewriting, header modification, server connection, response forwarding, logging, and cleanup.

Strengths:

Covers all major steps in logical order: accept → buffer → worker pickup → read request → parse → connect → forward → read response → forward response → log → close → free
Includes prethreading details (bounded buffer, worker thread removal) which shows deep understanding of the concurrent architecture
Mentions important implementation details: dynamic memory allocation for long lines, getpeername for logging, stripping persistent-connection headers
Addresses error handling (non-GET methods, unparsable URIs)
Mentions mutex protection for log writing
Discusses memory management (freeing dynamically allocated memory)

Errors/Gaps:

None significant. The answer is thorough and well-organized.

Detailed Feedback: This is an excellent answer that covers all expected steps with specific implementation details. The student demonstrates a strong understanding of both the HTTP protocol mechanics and the systems-level concerns (RIO I/O, memory management, synchronization). The answer naturally incorporates the concurrent design rather than treating it as separate from the transaction flow.

Rubric Breakdown:

Accept connection: ✓ (accept + bounded buffer insertion)
Read/parse request: ✓ (RIO buffer, request line parsing, method/URI/version extraction)
Parse URI: ✓ (hostname, port, path via parse_uri)
Connect to server: ✓ (open_clientfd)
Forward request: ✓ (rewritten request line, modified headers, rio_writen)
Read response: ✓ (rio_readn loop, byte counting)
Forward response: ✓ (rio_writen chunks to client)
Log transaction: ✓ (create_log_entry, mutex-protected write)
Close connections: ✓ (both server and client sockets)
Error handling: ✓ (non-GET, unparsable URI → error response)
Cleanup: ✓ (free dynamically allocated memory)

---

Section 2: Request Line Modification (4 points)

Score: 4/4 points

Assessment:

What was provided: A clear, concise explanation stating both modifications with a concrete example.

Strengths:

Explicitly identifies both modifications: absolute URI → relative path, and HTTP/1.1 → HTTP/1.0
Provides a concrete before/after example
Concise and precise

Errors/Gaps:

None.

Detailed Feedback: The answer perfectly matches the expected response. The student clearly explains both the URI conversion and the version downgrade, and the example makes the transformation unambiguous.

Rubric Breakdown:

URI conversion (absolute → relative): 2/2
HTTP version change (1.1 → 1.0): 2/2

---

Section 3: Header Modification (4 points)

Score: 4/4 points

Assessment:

What was provided: A detailed explanation of header stripping and addition, covering Connection, Keep-Alive, Proxy-Connection, and Host headers.

Strengths:

Specifically lists all headers that are stripped: Connection, Keep-Alive, Proxy-Connection, Host
Explains that Host is re-added with the parsed hostname and port
Explains that Connection: close is added
Provides reasoning (ensure server closes connection after responding)

Errors/Gaps:

None significant. Stripping the original Host header and re-adding a constructed one is a valid and careful approach.

Detailed Feedback: This is a complete answer that covers all the key header modifications. The student demonstrates understanding of why these modifications are necessary—particularly the need to prevent persistent connections and to ensure the Host header matches the target server.

Rubric Breakdown:

Host header handling: ✓ (strip original, add constructed one)
Connection header modification: ✓ (strip original, add "Connection: close")
Additional headers (Keep-Alive, Proxy-Connection): ✓ (stripped)

---

Section 4: Response Forwarding (4 points)

Score: 4/4 points

Assessment:

What was provided: A detailed 4-sentence explanation of the response forwarding strategy, including reading approach, chunk-based forwarding, error handling, and EOF detection.

Strengths:

Clearly describes the reading strategy: rio_readn in a loop with MAXLINE-sized chunks
Explains byte counting for logging
Addresses error handling: broken pipe detection, draining server response for accurate byte count
Explains reliance on Connection: close for EOF-based termination
Shows understanding that the proxy doesn't need Content-Length parsing when using read-until-EOF

Errors/Gaps:

None. The approach is sound and well-explained.

Detailed Feedback: This is an excellent answer. The student describes a straightforward but robust forwarding strategy. The detail about continuing to drain the server response even after a client write failure (to get an accurate byte count) shows sophisticated systems thinking. The explanation of how Connection: close enables the read-until-EOF approach demonstrates understanding of the protocol mechanics.

Rubric Breakdown:

Reading strategy (EOF-based with rio_readn): 1.5/1.5
Buffering/forwarding approach (chunk-based): 1/1
Error handling (broken pipe, draining): 1.5/1.5

---

Section 5: Thread Count (6 points)

Score: 6/6 points

Assessment:

What was provided: A 6-sentence explanation covering thread count, architecture, and justification including bounded buffer sizing.

Strengths:

States specific number: 8 worker threads + 1 main thread = 9 total
Clearly describes the prethreading architecture (main thread accepts, workers process)
Explains bounded buffer capacity (64 slots) and its purpose
Provides practical justification: good concurrency without excessive overhead
Notes the advantage of prethreading over thread-per-connection (avoids creation/destruction overhead)

Errors/Gaps:

The justification for specifically choosing 8 could be slightly stronger (e.g., relating to CPU cores or benchmarking), but the practical reasoning given is adequate.

Detailed Feedback: The answer is thorough and well-structured. The student demonstrates clear understanding of the prethreading model from CS:APP 12.5.5. The explanation covers the role of each thread type, the bounded buffer design, and the rationale for the approach. The justification, while not tied to specific hardware metrics, is reasonable and practical.

Rubric Breakdown:

Specific thread count stated: ✓ (8 workers + 1 main)
Design choice explained (prethreading): ✓
Reasoning for number: ✓ (reasonable concurrency, avoids overhead)
Additional design details (buffer size, architecture): ✓

---

Section 6: Log File Writing (4 points)

Score: 4/4 points

Assessment:

What was provided: A clear description of the logging mechanism including formatting, writing, and flushing.

Strengths:

Specifies create_log_entry for formatting
Identifies fprintf as the I/O function
Mentions newline addition (important since create_log_entry doesn't add one)
Mentions fflush for immediate disk persistence
States when writes occur (after completing a transaction)

Errors/Gaps:

None.

Detailed Feedback: The answer covers all expected elements: the I/O function (fprintf), the format (via create_log_entry + newline), and the timing (after transaction completion). The mention of fflush shows awareness of buffering concerns in a multi-threaded environment.

Rubric Breakdown:

I/O function specified (fprintf): ✓
Format/content described: ✓
Timing of writes: ✓
Flushing behavior: ✓ (bonus detail)

---

Section 7: Log File Atomicity (4 points)

Score: 4/4 points

Assessment:

What was provided: A concise 2-sentence explanation of mutex-based synchronization for log file access.

Strengths:

Identifies the specific mechanism: pthread mutex (log_mutex)
Describes the acquire-write-flush-release protocol
Explains the purpose: preventing interleaving of log entries from different threads

Errors/Gaps:

None.

Detailed Feedback: This is a textbook-correct answer. The student uses a pthread mutex, which aligns with the project requirement to use Pthread mutex and condition variables (not semaphores). The description clearly conveys the critical section: acquire mutex → write + flush → release mutex.

Rubric Breakdown:

Valid synchronization mechanism (pthread mutex): ✓
How it's used (acquire before write, release after): ✓
Purpose explained (prevent interleaving): ✓

---

Section 8: Log File Open/Close Timing (4 points)

Score: 4/4 points

Assessment:

What was provided: A 2-sentence explanation of the file open/close strategy.

Strengths:

Clearly states: opened once during initialization in main, before worker threads are created
Specifies append mode (Fopen in append mode)
Addresses the close timing honestly: not explicitly closed, OS reclaims on termination
Consistent with project instructions (open once, not per-write)

Errors/Gaps:

The lack of explicit close could be seen as a minor concern, but the student's reasoning is valid—the proxy runs indefinitely and the OS handles cleanup. This is an acceptable design choice.

Detailed Feedback: The answer correctly follows the project requirement that the log file be opened only once during initialization. The student's choice not to explicitly close the file is pragmatic for a long-running server process and is a valid design decision. The mention of append mode shows attention to correctness (entries are preserved across restarts).

Rubric Breakdown:

When opened: ✓ (once at startup, before threads)
When closed: ✓ (OS reclaims on termination—valid reasoning)
Reasoning: ✓ (efficiency, proxy runs indefinitely)

---

Overall Summary

Section	Score	Notes
1. HTTP Transaction Steps	10/10	Comprehensive, all steps covered with implementation details
2. Request Line Modification	4/4	Both URI conversion and version change clearly explained
3. Header Modification	4/4	All relevant headers addressed with reasoning
4. Response Forwarding	4/4	Strong explanation including error handling and EOF strategy
5. Thread Count	6/6	Clear prethreading design with practical justification
6. Log File Writing	4/4	Complete: function, format, timing, and flushing
7. Log File Atomicity	4/4	Correct mutex-based synchronization
8. Log File Open/Close Timing	4/4	Opened once at startup, valid close strategy
Total	40/40

General Comments:

This is an outstanding submission that demonstrates deep understanding of network programming, concurrent systems design, and the HTTP protocol. Every answer is technically accurate, specific, and well-organized. The student consistently goes beyond the minimum by providing implementation-level details (e.g., specific function names like rio_readn, getpeername, open_clientfd), addressing edge cases (e.g., broken pipe handling, long URIs), and explaining design rationale. The answers reflect genuine implementation experience rather than theoretical knowledge alone. The prethreading architecture is correctly described throughout, and the synchronization strategy is sound. Note: Question numbering in the submission skips from 4 to 6, but all 8 required sections are answered.

Total: 40/40

.mcp.json

diff --git a/.mcp.json b/.mcp.json
deleted file mode 100644
index 6d27c9a..0000000
--- a/.mcp.json
+++ /dev/null
@@ -1,11 +0,0 @@
-{
-  "mcpServers": {
-    "bscs-bench": {
-      "command": "/Users/bebe/Code/bscs-bench/bin/bscs-mcp",
-      "args": [
-        "--workspace",
-        "./lib b/lib
index af4ebc4..7620d8e 120000
--- a/lib
+++ b/lib
@@ -1 +1 @@
-/Users/bebe/Code/bscs-bench/comp321/lib
\ No newline at end of file
+/opt/csapp
\ No newline at end of file

proxy.log

diff --git a/proxy.log b/proxy.log
deleted file mode 100644
index 7e9b90a..0000000
--- a/proxy.log
+++ /dev/null
@@ -1,6 +0,0 @@
-Mon 02 Mar 2026 03:36:06 PST: 0.0.0.0 http://neverssl.com/ 373
-Mon 02 Mar 2026 03:36:10 PST: 0.0.0.0 http://neverssl.com/ 373
-Mon 02 Mar 2026 03:36:12 PST: 0.0.0.0 http://www.openoffice.org/ 44030
-Mon 02 Mar 2026 03:36:13 PST: 0.0.0.0 http://www.squid-cache.org/ 7407
-Mon 02 Mar 2026 03:36:15 PST: 0.0.0.0 http://www.unm.edu/ 119
-Mon 02 Mar 2026 03:36:16 PST: 0.0.0.0 http://www.washington.edu/ 77928

Duration	28m 59s
Turns	
Cost	$5.16
Status	Success
API Time	20m 26s
Tokens	5.2M
Claude Code	v2.1.63

Sub-Model Usage

Model	Input	Output	Cache Read	Cost
claude-opus-4-6	50	67.5K	5.0M	$5.02
claude-haiku-4-5-20251001	47	3.5K	544.9K	$0.14