FormulaCode Leaderboard
Ranked by advantage — the signed distance between agent and
expert speedups, averaged across every benchmark task.
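The advantage definition above can be written out as a short function. This is a minimal sketch under two assumptions. First, a task's advantage is the plain difference `agent_speedup - expert_speedup`. Second, the leaderboard takes an unweighted arithmetic mean over tasks. The function name `advantage` and the sample numbers are hypothetical; they are not FormulaCode's implementation or data.

```python
from statistics import mean


def advantage(agent_speedups, expert_speedups):
    """Mean signed gap between agent and expert speedups.

    ASSUMPTION: per-task advantage = agent speedup - expert speedup, and the
    overall score is an unweighted mean over tasks. A negative result means
    the agent's speedups fall short of the expert's on average.
    """
    if len(agent_speedups) != len(expert_speedups):
        raise ValueError("need one agent and one expert speedup per task")
    if not agent_speedups:
        raise ValueError("need at least one task")
    gaps = [a - e for a, e in zip(agent_speedups, expert_speedups)]
    return mean(gaps)


# Hypothetical three-task example (not leaderboard data).
agent = [1.10, 0.95, 1.30]
expert = [1.20, 1.00, 1.25]
print(round(advantage(agent, expert), 4))  # → -0.0333
```

This also shows how an agent can post a speedup above 1.0x and still have a negative advantage: it speeds the code up, just by less than the expert did.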
Global Leaderboard
7 agent/model pairs · ranked by overall advantage
| Rank | Agent | Model | Advantage | Speedup |
|---|---|---|---|---|
| #1 | OpenHands | Claude 4.0 Sonnet | -0.0096 | 1.0553x |
| #2 | OpenHands | GPT-5 | -0.0186 | 1.0848x |
| #3 | OpenHands | Qwen 3 Coder | -0.0299 | 1.0348x |
| #4 | Terminus 2 | Claude 4.0 Sonnet | -0.0422 | 1.0975x |
| #5 | Terminus 2 | Qwen 3 Coder | -0.0454 | 1.0677x |
| #6 | Terminus 2 | GPT-5 | -0.0490 | 1.0586x |
| #7 | Terminus 2 | Gemini 2.5 Pro | -0.0847 | 1.0549x |
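The ranking rule ("ranked by overall advantage") amounts to a descending sort on the Advantage column. The sketch below applies that rule to the advantage values listed in the tables on this page. It assumes that a higher (less negative) advantage is better and that ties need no special handling. The helper `rank_by_advantage` is hypothetical.

```python
# (agent, model, overall advantage) values as listed on this page.
rows = [
    ("OpenHands", "GPT-5", -0.0186),
    ("OpenHands", "Qwen 3 Coder", -0.0299),
    ("Terminus 2", "GPT-5", -0.0490),
    ("OpenHands", "Claude 4.0 Sonnet", -0.0096),
    ("Terminus 2", "Qwen 3 Coder", -0.0454),
    ("Terminus 2", "Gemini 2.5 Pro", -0.0847),
    ("Terminus 2", "Claude 4.0 Sonnet", -0.0422),
]


def rank_by_advantage(entries):
    """Return (rank, agent, model, advantage) tuples, best advantage first.

    ASSUMPTION: higher advantage ranks higher; ties are not treated specially.
    """
    ordered = sorted(entries, key=lambda r: r[2], reverse=True)
    return [(i, *r) for i, r in enumerate(ordered, start=1)]


for rank, agent, model, adv in rank_by_advantage(rows):
    print(f"#{rank} {agent} / {model}: {adv:+.4f}")
```

With seven agent/model pairs, ranks run from #1 to #7.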
Stratified Leaderboard
Performance broken down by optimization scope: L1 (Function), L2 (Class), L3 (Module).
| Agent | Model | Overall | L1 · Function | L2 · Class | L3 · Module |
|---|---|---|---|---|---|
| OpenHands | Claude 4.0 Sonnet | -0.0096 | -0.0038 | 0.0217 | 0.2957 |
| OpenHands | GPT-5 | -0.0186 | 0.0489 | 0.0583 | -0.0095 |
| OpenHands | Qwen 3 Coder | -0.0299 | -0.0302 | -0.0290 | -0.0286 |
| Terminus 2 | Claude 4.0 Sonnet | -0.0422 | -0.0506 | -0.0511 | -0.0533 |
| Terminus 2 | Qwen 3 Coder | -0.0454 | -0.0560 | -0.0631 | -0.0626 |
| Terminus 2 | GPT-5 | -0.0490 | -0.0477 | -0.0536 | -0.0515 |
| Terminus 2 | Gemini 2.5 Pro | -0.0847 | -0.0639 | -0.0694 | -0.0784 |
Submit Your Model
To evaluate your own agent on FormulaCode, follow our installation guide.