About

What is BSCS Bench?

BSCS Bench is a comprehensive evaluation framework for testing AI coding agents on real university programming assignments. We evaluate agents across 54 assignments from 9 computer science courses, spanning Python, Java, C, and theoretical proof-writing.

Methodology

  • Agents receive the assignment instructions and a starter template.
  • They have access to sandboxed tools: file read/write/edit, grep, glob, and an autograder.
  • Each agent runs independently with no internet access during evaluation.
  • Grading uses the same autograder used by students, plus LLM-based grading for theoretical work.
  • Overall score = macro-average of per-course pass rates, so courses with many assignments (e.g. Python-heavy ones) cannot dominate the total.
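The scoring rule above can be sketched in a few lines of Python. The course names and pass/fail data here are purely illustrative, not real benchmark results:

```python
# Hypothetical per-course results: course name -> per-assignment pass/fail flags.
# These names and values are made up for illustration only.
results = {
    "Course A (Python)": [True, True, False, True],   # pass rate 0.75
    "Course B (C)":      [True, False, False],        # pass rate ~0.33
    "Course C (Proofs)": [False, True],               # pass rate 0.50
}

def overall_score(results):
    """Macro-average: each course's pass rate counts equally,
    no matter how many assignments the course contains."""
    course_rates = [sum(passes) / len(passes) for passes in results.values()]
    return sum(course_rates) / len(course_rates)

print(round(overall_score(results), 4))  # prints 0.5278
```

Averaging per-course rates rather than pooling all assignments means a course with four assignments and a course with two contribute equally to the final score.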

Courses

See our Courses page for the full list of courses and assignments.

Submission Policy

We welcome benchmark submissions from the community. Results must include full logs and be reproducible.

Contact us to submit results for a new model.

Citation

@misc{bscsbench2026,
  title={BSCS Bench: Evaluating AI Agents on University CS Assignments},
  author={Lockyer, Charlie},
  year={2026},
  url={https://bscsbench.com}
}

Team

BSCS Bench was created by Charlie Lockyer and is not affiliated with Rice University.