Documentation

Build, evaluate, and explore FormulaCode

FormulaCode consists of two parts: a pipeline to construct performance optimization tasks, and an execution harness that connects a language model to our terminal sandbox.

fc-eval

Run frontier LLM agents against the FormulaCode benchmark. fc-eval spins up reproducible Docker environments, verifies correctness against the unit-test suite, and computes per-workload speedup, advantage, and stratified scores.

Open documentation → View on GitHub ↗

datasmith

The pipeline for curating FormulaCode's tasks from real GitHub repositories. The code for scraping, filtering, building, and verifying high-quality performance PRs is maintained here.

Open documentation → View on GitHub ↗

Live data endpoints

Two subdomains expose the live task and run database. Uptime is not guaranteed: these are research endpoints and are sometimes rebuilt mid-week. For reproducible evaluation, prefer the static CSV that ships with this site.

api.formulacode.org ↗ REST API
Read-only Supabase REST API. Tables: repositories, pull_requests, candidate_containers, harbor_runs. An anonymous key is required (see the fc-eval docs).
data.formulacode.org ↗ Data dashboard
Browseable Supabase Studio with the live task and run tables. Useful for ad-hoc inspection and SQL.
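
As a rough illustration of reading from the REST endpoint, the sketch below builds a read-only request for one of the tables listed above. It assumes the endpoint follows the standard Supabase REST conventions: a /rest/v1/<table> path, with the anonymous key sent in both the apikey and Authorization headers. The path prefix, the placeholder key, and the helper names build_request and fetch_rows are assumptions for illustration; the fc-eval docs are the authority on the actual key and URL layout.

```python
import json
import urllib.parse
import urllib.request

BASE_URL = "https://api.formulacode.org"  # host named in the text above
REST_PREFIX = "/rest/v1"  # ASSUMPTION: standard Supabase REST path prefix


def build_request(table, anon_key, limit=5, select="*"):
    """Build a GET request for the first `limit` rows of `table`.

    Hypothetical helper: the header layout follows common Supabase usage
    and may not match this deployment exactly.
    """
    # Keep "*" and "," unescaped so the query string stays readable.
    query = urllib.parse.urlencode({"select": select, "limit": limit}, safe="*,")
    url = f"{BASE_URL}{REST_PREFIX}/{table}?{query}"
    headers = {
        "apikey": anon_key,
        "Authorization": f"Bearer {anon_key}",
        "Accept": "application/json",
    }
    return urllib.request.Request(url, headers=headers, method="GET")


def fetch_rows(table, anon_key, limit=5):
    """Send the request and decode the JSON response body."""
    request = build_request(table, anon_key, limit=limit)
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)
```

A call such as fetch_rows("repositories", "<anon key from the fc-eval docs>") would then return the decoded JSON rows, if the assumptions above hold. Because uptime is not guaranteed, callers should expect network errors and fall back to the static CSV for anything that must be reproducible.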