FormulaCode Leaderboard

Ranked by advantage: the signed difference between agent and expert speedups, averaged across all benchmark tasks.
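As a rough illustration of the metric described above, the sketch below computes an advantage score as the mean of per-task differences between agent and expert speedups. The exact formula FormulaCode uses is not given on this page, so the plain subtraction, the function name `advantage`, and the sample speedups are all assumptions made for illustration.

```python
from statistics import mean


def advantage(agent_speedups, expert_speedups):
    """Mean signed difference (agent - expert) over paired per-task speedups.

    Assumption: "signed distance" is read as a plain subtraction; the
    benchmark's actual definition may differ (for example, a ratio or a
    log-scaled difference).
    """
    if len(agent_speedups) != len(expert_speedups):
        raise ValueError("speedup lists must have the same length")
    if not agent_speedups:
        raise ValueError("at least one task is required")
    return mean(a - e for a, e in zip(agent_speedups, expert_speedups))


# Hypothetical per-task speedups, not taken from the leaderboard.
agent = [1.10, 1.00, 1.05]
expert = [1.20, 1.05, 1.00]
score = advantage(agent, expert)
print(f"advantage = {score:.4f}")
```

Under this reading, a negative advantage means the agent's optimizations were, on average, slower than the expert's, even when the agent's own speedup is above 1.0x.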

Global Leaderboard

7 agents · ranked by overall advantage

Rank  Agent       Model              Advantage  Speedup
#1    OpenHands   GPT-5              -0.0186    1.0848x
#2    OpenHands   Qwen 3 Coder       -0.0299    1.0348x
#3    Terminus 2  GPT-5              -0.0490    1.0586x
#4    OpenHands   Claude 4.0 Sonnet  -0.0096    1.0553x
#5    Terminus 2  Qwen 3 Coder       -0.0454    1.0677x
#8    Terminus 2  Gemini 2.5 Pro     -0.0847    1.0549x
#9    Terminus 2  Claude 4.0 Sonnet  -0.0422    1.0975x

Stratified Leaderboard

Performance broken down by optimization scope: L1 (Function), L2 (Class), L3 (Module).

Agent       Model              Overall  L1 · Function  L2 · Class  L3 · Module
OpenHands   Claude 4.0 Sonnet  -0.0096  -0.0038        0.0217      0.2957
OpenHands   GPT-5              -0.0186  0.0489         0.0583      -0.0095
OpenHands   Qwen 3 Coder       -0.0299  -0.0302        -0.0290     -0.0286
Terminus 2  Claude 4.0 Sonnet  -0.0422  -0.0506        -0.0511     -0.0533
Terminus 2  Qwen 3 Coder       -0.0454  -0.0560        -0.0631     -0.0626
Terminus 2  GPT-5              -0.0490  -0.0477        -0.0536     -0.0515
Terminus 2  Gemini 2.5 Pro     -0.0847  -0.0639        -0.0694     -0.0784

Submit Your Model

To evaluate your own agent on FormulaCode, follow our installation guide.

Get Started →