# API Reference
Auto-generated documentation from source code docstrings.
## Package Overview
FC-Eval is organized into the following modules:
| Module | Description |
|---|---|
| `fceval.harness` | Execution harness: orchestrates task runs and collects results |
| `fceval.agents` | Agent interface and built-in implementations |
| `fceval.parsers` | Result parsers (pytest, FormulaCode) |
| `fceval.terminal` | Terminal environment management (Docker, tmux) |
| `fceval.dataset` | Dataset loading, registry, and task configuration |
| `fceval.llms` | LLM integrations (LiteLLM, Portkey) |
| `fceval.harness.models` | Result data models (`TrialResults`, `BenchmarkResults`) |
## Top-Level Exports
## Architecture

```
┌─────────┐     ┌──────────┐     ┌──────────┐
│   CLI   │────▶│ Harness  │────▶│ Terminal │
└─────────┘     └────┬─────┘     └────┬─────┘
                     │                │
              ┌──────┴──────┐   ┌─────┴──────┐
              │    Agent    │   │    Tmux    │
              │   Factory   │   │  Session   │
              └──────┬──────┘   └────────────┘
                     │
              ┌──────┴──────┐
              │  BaseAgent  │
              │   └─ LLM    │
              └─────────────┘
```
- The CLI parses options and creates a Harness
- The Harness loads a Dataset and creates Terminal environments (Docker containers)
- For each task, it instantiates an Agent via the AgentFactory
- The agent interacts with a TmuxSession inside the container
- After the agent finishes, the harness runs the task's tests and passes their output to a Parser
- Results are aggregated into BenchmarkResults
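The flow above can be sketched in plain Python. This is a minimal, hypothetical illustration, not FC-Eval's implementation: every name in it (`Terminal`, `BaseAgent`, `AGENT_REGISTRY`, `parse_pytest_summary`, `run_benchmark`, and the simplified `TrialResult`/`BenchmarkResult` stand-ins for `TrialResults`/`BenchmarkResults`) is an assumption chosen to mirror the roles in the diagram, and the real harness, agent factory, and parsers may be structured differently. Docker and tmux are replaced by an in-memory fake terminal so the sketch runs on its own.

```python
# Hypothetical, self-contained sketch of the run loop described above.
# None of these names are FC-Eval's actual API; they only mirror the roles
# in the architecture (Harness, AgentFactory, Terminal, Parser, results).
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable, Protocol


class Terminal(Protocol):
    """Stand-in for a task container plus the tmux session the agent drives."""
    def send_keys(self, keys: str) -> None: ...
    def run_tests(self) -> str: ...


class BaseAgent(Protocol):
    def perform_task(self, instruction: str, terminal: Terminal) -> None: ...


@dataclass
class TrialResult:
    task_id: str
    is_resolved: bool


@dataclass
class BenchmarkResult:
    trials: list[TrialResult] = field(default_factory=list)

    @property
    def accuracy(self) -> float:
        # Fraction of trials whose tests passed; 0.0 for an empty run.
        if not self.trials:
            return 0.0
        return sum(t.is_resolved for t in self.trials) / len(self.trials)


# Plays the AgentFactory role (assumed design): agent name -> zero-arg constructor.
AGENT_REGISTRY: dict[str, Callable[[], BaseAgent]] = {}


def parse_pytest_summary(output: str) -> bool:
    """Toy parser: resolved iff the final summary line shows passes and no failures/errors."""
    lines = output.strip().splitlines()
    last = lines[-1] if lines else ""
    return "passed" in last and "failed" not in last and "error" not in last


def run_benchmark(tasks: dict[str, str], agent_name: str,
                  make_terminal: Callable[[str], Terminal]) -> BenchmarkResult:
    results = BenchmarkResult()
    for task_id, instruction in tasks.items():
        terminal = make_terminal(task_id)      # one isolated environment per task
        agent = AGENT_REGISTRY[agent_name]()   # fresh agent instance per task
        agent.perform_task(instruction, terminal)
        output = terminal.run_tests()          # tests run only after the agent finishes
        results.trials.append(TrialResult(task_id, parse_pytest_summary(output)))
    return results


# --- Usage with in-memory fakes (no Docker or tmux needed) ---
class FakeTerminal:
    def __init__(self, task_id: str) -> None:
        self.task_id = task_id
        self.history: list[str] = []

    def send_keys(self, keys: str) -> None:
        self.history.append(keys)

    def run_tests(self) -> str:
        # Pretend the tests pass only if the agent typed at least one command.
        return "== 3 passed in 0.10s ==" if self.history else "== 1 failed in 0.10s =="


class EchoAgent:
    def perform_task(self, instruction: str, terminal: Terminal) -> None:
        terminal.send_keys(f"echo {instruction!r}")


AGENT_REGISTRY["echo"] = EchoAgent
result = run_benchmark({"t1": "fix bug", "t2": "speed up"}, "echo", FakeTerminal)
print(f"{result.accuracy:.0%}")  # → 100%
```

A name-to-constructor dictionary is one simple way to realize a factory that selects agents by name; whether FC-Eval's `AgentFactory` works this way is an assumption, as is the pass/fail rule used by the toy parser.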