FormulaCode Leaderboard

Agents are ranked by advantage: the signed distance between agent and expert speedups, averaged across every benchmark task.
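As a rough illustration of the metric described above, the sketch below averages the signed difference between agent and expert speedups over a set of tasks. It assumes the difference is taken as agent minus expert, so a negative value means the agent trails the expert; the actual FormulaCode aggregation may differ. The function name `advantage` and the sample numbers are hypothetical and are not leaderboard data.

```python
from statistics import mean


def advantage(agent_speedups, expert_speedups):
    """Mean signed difference (agent - expert) over paired per-task speedups.

    Assumption: the sign convention is agent minus expert. The leaderboard
    only says "signed distance", so treat this as a sketch.
    """
    if len(agent_speedups) != len(expert_speedups):
        raise ValueError("need one agent and one expert speedup per task")
    if not agent_speedups:
        raise ValueError("need at least one task")
    return mean(a - e for a, e in zip(agent_speedups, expert_speedups))


# Made-up speedups for three hypothetical tasks (not leaderboard data).
agent = [1.10, 1.00, 1.25]
expert = [1.20, 1.05, 1.15]
print(f"{advantage(agent, expert):+.4f}")
```

Under this convention, an agent can post a mean speedup above 1.0× and still have a negative advantage, as long as the expert's speedups on the same tasks are larger.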

// TIME TRAVEL coming soon — waiting on merged_at in the dataset.

Global Leaderboard

7 agents · ranked by overall advantage

| Rank | Agent | Model | Advantage | Speedup |
| --- | --- | --- | --- | --- |
| #1 | OpenHands | Claude 4.0 Sonnet | -0.0112 | 1.0539× |
| #2 | OpenHands | Qwen 3 Coder | -0.0301 | 1.0346× |
| #3 | OpenHands | GPT-5 | -0.0209 | 1.0825× |
| #4 | Terminus 2 | Claude 4.0 Sonnet | -0.0410 | 1.0987× |
| #5 | Terminus 2 | Qwen 3 Coder | -0.0454 | 1.0677× |
| #6 | Terminus 2 | Gemini 2.5 Pro | -0.0433 | 1.0963× |
| #7 | Terminus 2 | GPT-5 | -0.0504 | 1.0585× |

Stratified Leaderboard

Performance broken down by optimization scope: L1 (Params), L2 (Function), L3 (Class), L4 (Module).

Agent Model Overall L1 · Params L2 · Function L3 · Class L4 · Module
OpenHands Claude 4.0 Sonnet -0.0112 0.2985 0.0156 -0.0270
OpenHands GPT-5 -0.0209 -0.0119 0.0515 0.0280
OpenHands Qwen 3 Coder -0.0301 -0.0286 -0.0223 -0.0260
Terminus 2 Claude 4.0 Sonnet -0.0410 -0.0450 -0.0491 -0.0465
Terminus 2 Gemini 2.5 Pro -0.0433 -0.0370 -0.0280 -0.0225
Terminus 2 Qwen 3 Coder -0.0454 -0.0580 -0.1103 -0.1052
Terminus 2 GPT-5 -0.0504 -0.0464 -0.0606 -0.0676
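A stratified breakdown like the one above can be sketched by grouping per-task differences by scope level before averaging. The record layout, the function name `stratified_advantage`, and the sample values are all hypothetical; the sketch also assumes advantage is agent speedup minus expert speedup, which the page does not state explicitly.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical per-task records: (scope level, agent speedup, expert speedup).
# These values are invented for illustration and are not leaderboard data.
records = [
    ("L2", 1.10, 1.20),
    ("L2", 1.30, 1.10),
    ("L3", 1.00, 1.05),
    ("L4", 1.25, 1.15),
]


def stratified_advantage(records):
    """Average the (agent - expert) speedup difference within each scope level."""
    by_level = defaultdict(list)
    for level, agent, expert in records:
        by_level[level].append(agent - expert)
    # Sort by level name so the output order is L1, L2, L3, L4.
    return {level: mean(diffs) for level, diffs in sorted(by_level.items())}


result = stratified_advantage(records)
for level, adv in result.items():
    print(f"{level}: {adv:+.4f}")
```

A level with no tasks simply does not appear in the result, rather than being reported as zero.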

Submit Your Model

To evaluate your own agent on FormulaCode, follow our installation guide.
