About

What is BSCS Bench?

BSCS Bench is a comprehensive evaluation framework for testing AI coding agents on real university programming assignments. We evaluate agents across 54 assignments from 9 computer science courses, spanning Python, Java, C, and theoretical proof-writing.

Methodology

  • Agents receive the assignment instructions and a starter template.
  • They have access to sandboxed tools: file read/write/edit, grep, glob, and an autograder.
  • Each agent runs independently with no internet access during evaluation.
  • Grading uses the same autograder used by students, plus LLM-based grading for theoretical work.
  • Overall score = macro-average of per-course pass rates, so courses with many assignments (e.g. Python-heavy ones) cannot dominate the total.
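The scoring rule above can be sketched in a few lines of Python. The course names and pass/fail data here are purely illustrative, not real benchmark results:

```python
# Hypothetical per-course results: course name -> per-assignment pass/fail flags.
# These names and values are made up for illustration only.
results = {
    "Course A (Python)": [True, True, False, True],   # pass rate 0.75
    "Course B (C)":      [True, False, False],        # pass rate ~0.33
    "Course C (Proofs)": [False, True],               # pass rate 0.50
}

def overall_score(results):
    """Macro-average: each course's pass rate counts equally,
    no matter how many assignments the course contains."""
    course_rates = [sum(passes) / len(passes) for passes in results.values()]
    return sum(course_rates) / len(course_rates)

print(round(overall_score(results), 4))  # prints 0.5278
```

Averaging per-course rates rather than pooling all assignments means a course with four assignments and a course with two contribute equally to the final score.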

Courses

See our Courses page for the full list of courses and assignments.

Submission Policy

We welcome benchmark submissions from the community. Results must include full logs and be reproducible.

Contact us to submit results for a new model.

Citation

@misc{bscsbench2026,
  title={BSCS Bench: Evaluating AI Agents on University CS Assignments},
  author={Lockyer, Charlie},
  year={2026},
  url={https://bscsbench.com}
}

Team

BSCS Bench was created by Charlie Lockyer and is not affiliated with Rice University.