Back to Claude Sonnet 4.6

Agent Work: Stock Prediction

Claude Sonnet 4.6 · COMP 140: Computational Thinking

Module 3: Stock Market Prediction

The goal of this assignment is to use the computational thinking process to solve a real-world problem. We will follow the process of: 1. Reading and understanding the problem description. 2. Determining the inputs and outputs. 3. Decomposing the problem into subproblems, as appropriate. 4. Designing a computational recipe (algorithm) to solve the subproblems/problem. 5. Implementing your solution.

Be sure to read the entire assignment before beginning.

Testing Your Solution

Use the grade tool to test your implementation:

bin/grade ./workspaces/<your_workspace>

Or if working in a workspace, simply use the grade tool provided by the agent harness.

Important Note: Because there is randomness in this assignment, the tests must run your code many times to ensure it is working properly. This means that it will take longer to test your code (possibly a minute or more). Further, if you have print statements in your predict function or anything that it calls, tests may time out.

---

1. The Problem

In this assignment, you will predict the behavior of stocks in the stock market. You will first build a model of a particular stock's (or stock index's) behavior based on past performance. You will then use that model to predict what the stock's price will do (go up a little/lot, go down a little/lot) for the next few days in the future.

2. A Solution Strategy

First, make sure you understand the problem we are trying to solve. Once you do, we need to develop a solution strategy. We are going to use Markov chains to solve this problem.

We will use a Markov chain as a statistical model of a stock's performance. We will determine the likelihood that a stock will go up or down by building this model from the history of the stock's behavior. Modeling stock market performance is a difficult and important problem. We will only be scratching the surface of what is possible. While we will be able to get reasonably accurate predictions, you should not run off and start investing your money based on the results of this project!

It is quite difficult to predict the exact price fluctuations of a stock from day to day. Instead we will try to predict how large the change will be. Given $P_i$, the price on day $i$, and $P_{i+1}$, the price on day $i+1$, we will consider $\delta_{i+1} = (P_{i+1} - P_i) / P_i$ to be the change in price from day $i$ to day $i+1$. Using our model, we will try to predict in which of the following four ranges $\delta_{i+1}$ will lie: 1. $(-\infty, -0.01)$ (larger than 1% decrease) 2. $[-0.01, 0)$ 3. $[0, 0.01)$ 4. $[0.01,\infty)$ (1% or larger increase)

Note that the mathematical notation $[a, b)$ means from $a$ up to, but not including, $b$. Further, $(a, b)$ means from $a$ to $b$, but not including either $a$ or $b$.

We will represent these four "bins" with the numbers 0 through 3, respectively, as indicated in the list above; *i.e.*, bin 2 is $[0, 0.01)$.

To make this prediction, we will construct an $n^{th}$ order Markov chain using the history of these changes for 2 years for each of two stocks, Google (GOOG) and First Solar Inc. (FSLR), and a stock index, the Dow Jones Industrial Average (DJIA).

3. Breaking Down the Problem

You should start your implementation with the provided template (solution.py in your workspace).

A. Modeling

i. Recipe

Clearly describe the recipe for taking a sequence of numbers (corresponding to the bins 0-3 above) and a number $n$ (assume $n \ge 1$) as input. Your recipe should produce an $n^{th}$ order Markov chain as output.

You should describe this recipe in clear, concise English. Do not use any Python! You can use bulleted or numbered lists for clarity, if you desire.

ii. Code

Write a function, markov_chain(data, order), which implements your recipe. This function should take a list of numbers, the order of the model to be created (which must be at least 1), and return a dictionary. The keys of the returned dictionary should be tuples with order elements in them. The values of the returned dictionary should be another dictionary with keys that are the bin for the next day and values that are the probability of the change in stock price ($\delta_{i+1}$) being in that bin the following day. For example, the return value of the function with order = 2 might be:

{(1, 2): {1: 0.66666, 3: 0.33333}, (2, 1): {2: 0.5, 1: 0.5}, (1, 1): {2: 1.0}}

This dictionary indicates that if you are currently in a state in which the changes for the past two days have fallen into bins 1 and 2, respectively, then there is a probability of approximately $\frac{2}{3}$ that the following day will be in bin 1 and a probability of approximately $\frac{1}{3}$ that the following day will be in bin 3. Note that the transition probabilities from the other states that are not keys in the dictionary (such as (2, 2), (2, 3), etc.) are unknown.

As always, you may write whatever helper functions you deem necessary.

B. Predicting

i. Recipe

Clearly describe the recipe for predicting the next $m$ states given an $n^{th}$ order Markov chain and the last $n$ states. Your recipe should take a Markov chain, a list of numbers (corresponding to the bins 0-3 above) of the last $n$ states, and a number $m$ as input. Your recipe should produce a list of numbers (corresponding to the bins 0-3 above) of the next $m$ states.

If you find yourself in a state that is not in the Markov chain, you should randomly predict that the next day's change will be in any of the 4 bins with equal probability.

You should describe this recipe in clear, concise English. Do not use any Python! You can use bulleted or numbered lists for clarity, if you desire.

ii. Code

Write a function, predict(model, last, num) which implements your recipe. The model parameter should be a Markov chain in the form produced by markov_chain. The last parameter should be a list of numbers (0-3) whose length is the same as the order of the model. The num parameter is the number of states to predict in the future. The function should return a list of numbers (0-3) of length num.

Note: You may not use Python's random.choices function when implementing this function. You must implement your own method of making a random weighted choice.

C. Computing Error

Write a function, mse(result, expected), that takes two sequences of equal length as input and returns the mean squared error.

4. Running Experiments

A. Code

Write a function, run_experiment(train, order, test, future, actual, trials), that uses the code you have written so far to predict the behavior of a given stock or index. The parameters to this function are as follows:

  • train: a list of historical data of how the stock's price has changed. Each element is a number from 0-3, corresponding to the bins above.
  • order: the order of the Markov chain that should be used.
  • test: a list of historical data of how the stock's price has changed. This is in the same format as the train data, but is from a different time period.
  • future: the number of days that you should predict into the future, past the test data.
  • actual: the actual behavior of the stock for future days past the end of the test data.
  • trials: the number of times you should run the experiment.

The function should return the average of the mean squared error across all trials runs. This gives you an indication of how well your predictor is working.

Note: This function is actually quite simple, as it primarily makes use of code that you have already written. If you are writing anything complex, you need to step back and look at what you have already written for part 3!

B. Experiments

Run the experiments that are given in the template file. The provided code calls your run_experiment function with different values of order so you can see how well each model predicts each stock. Report the results in your writeup along with your recipes and discussion.

5. Discussion (Written Response)

Discuss the results of the experiments you ran in 4.B (be sure to include the results). In particular, discuss the following questions: 1. What is the order of the model that works best for each stock/index? If the orders are not the same, discuss why that might be the case. 2. Which stock/index can you predict with the lowest error? Based on the plots of the day-to-day change in stocks and the histogram of bins, can you guess why that stock/index is easiest to predict? 3. Given that we have divided the day-to-day price change into 4 bins, how many possible states are there in an $n^{th}$ order Markov chain for predicting the change in stock price? 4. The training data we gave you covers two years of data, with 502 data points per stock/index. With that data, is it possible to see all of the possible states in an $n^{th}$ order Markov chain? What are the constraints on $n$? How do you think it would affect the accuracy of the model if there were not enough data?

Answer these questions in clear, precise English. Do not write more than a few sentences for each question.

Write your response in writeup.md.

--- *COMP 140: Computational Thinking, Rice University*