Dataset
The dataset module handles loading, filtering, and iterating over task directories.
Dataset
Dataset(name=None, version=None, path=None, task_ids=None, n_tasks=None, exclude_task_ids=None, registry_url=None, local_registry_path=None)
A class for loading and iterating over tasks in a dataset.
Initialize the dataset.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `name` | `str \| None` | Name of the dataset to cache from fc-eval server | `None` |
| `version` | `str \| None` | Version of the dataset to cache | `None` |
| `path` | `Path \| None` | Path to the dataset directory | `None` |
| `task_ids` | `list[str] \| None` | Optional list of specific task IDs or glob patterns to load. | `None` |
| `n_tasks` | `int \| None` | Optional number of tasks to load | `None` |
| `exclude_task_ids` | `list[str] \| None` | Optional list of task IDs or glob patterns to exclude. | `None` |
| `registry_url` | `str \| None` | The URL of the registry to use for the dataset. | `None` |
| `local_registry_path` | `Path \| None` | The path to the local registry file to use for the dataset. If provided, will use the local registry instead of the remote registry. | `None` |
Source code in fceval/dataset/dataset.py
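The interaction between `task_ids`, `exclude_task_ids`, and `n_tasks` can be illustrated with a minimal, self-contained sketch. It does not call fceval; `select_task_ids` is a hypothetical helper, and both the use of `fnmatch`-style globbing and the order of the steps (include, then exclude, then truncate) are assumptions rather than documented behavior.

```python
from fnmatch import fnmatchcase


def select_task_ids(available, task_ids=None, exclude_task_ids=None, n_tasks=None):
    """Hypothetical helper: filter task IDs by include/exclude globs, then cap the count."""
    # With no include patterns, every available task is a candidate.
    if task_ids is None:
        selected = list(available)
    else:
        selected = [t for t in available if any(fnmatchcase(t, p) for p in task_ids)]
    # Drop anything that matches an exclude pattern.
    excluded = exclude_task_ids or []
    selected = [t for t in selected if not any(fnmatchcase(t, p) for p in excluded)]
    # Assumption: n_tasks keeps the first n tasks that survive filtering.
    if n_tasks is not None:
        selected = selected[:n_tasks]
    return selected


available = ["build-kernel", "build-docs", "fix-typo", "train-model"]
chosen = select_task_ids(
    available,
    task_ids=["build-*", "fix-typo"],
    exclude_task_ids=["*-docs"],
    n_tasks=2,
)
print(chosen)  # ['build-kernel', 'fix-typo']
```

Exact IDs and glob patterns go through the same matcher here, since an ID without wildcard characters only matches itself.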
config
instance-attribute
config = DatasetConfig(name=name, version=version, path=path, task_ids=task_ids, n_tasks=n_tasks, exclude_task_ids=exclude_task_ids or [], registry_url=registry_url, local_registry_path=local_registry_path)
from_config
classmethod
Create a Dataset instance from a DatasetConfig.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `config` | `DatasetConfig` | DatasetConfig instance | required |
Returns:
| Type | Description |
|---|---|
| `Dataset` | A new Dataset instance |
Source code in fceval/dataset/dataset.py
from_yaml
classmethod
Create a Dataset instance from a YAML configuration file.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `yaml_path` | `Path \| str` | Path to the YAML configuration file | required |
Returns:
| Type | Description |
|---|---|
| `Dataset` | A new Dataset instance |
Source code in fceval/dataset/dataset.py
__iter__
__len__
sort_by_duration
Sort tasks by estimated duration (longest first) for optimal concurrent execution.
Source code in fceval/dataset/dataset.py
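Why longest-first ordering helps concurrent runs can be shown with a small scheduling sketch. This is an illustration only: `TaskStub`, `sort_longest_first`, and `makespan` are hypothetical names, and the simulation assumes each task is handed to whichever worker frees up first, which may not match how fceval dispatches tasks.

```python
import heapq
from dataclasses import dataclass


@dataclass
class TaskStub:
    """Hypothetical stand-in for a task; only the fields needed for sorting."""
    task_id: str
    duration_sec: float


def sort_longest_first(tasks):
    # Longest tasks start first so short ones can fill the remaining worker time.
    return sorted(tasks, key=lambda t: t.duration_sec, reverse=True)


def makespan(durations, n_workers):
    # Greedy list scheduling: each task goes to the worker that becomes free first.
    workers = [0.0] * n_workers
    heapq.heapify(workers)
    for duration in durations:
        heapq.heappush(workers, heapq.heappop(workers) + duration)
    return max(workers)


tasks = [TaskStub("a", 30.0), TaskStub("b", 600.0), TaskStub("c", 120.0)]
ordered = sort_longest_first(tasks)
ordered_ids = [t.task_id for t in ordered]

longest_first = makespan([t.duration_sec for t in ordered], n_workers=2)
shortest_first = makespan(sorted(t.duration_sec for t in tasks), n_workers=2)
print(ordered_ids, longest_first, shortest_first)  # ['b', 'c', 'a'] 600.0 630.0
```

With two workers, starting the 600-second task last leaves one worker idle while the other finishes it, which is the effect the sort is meant to avoid.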
DatasetConfig
Bases: BaseModel
Configuration for loading a dataset.
exclude_task_ids
class-attribute
instance-attribute
validate_config
Source code in fceval/dataset/dataset.py
Dataset Metadata
Task Models
Task
Bases: BaseModel
author_name
class-attribute
instance-attribute
author_email
class-attribute
instance-attribute
category
class-attribute
instance-attribute
category = Field(default='software_engineering', description='High-level category that describes the type of task this is.')
tags
class-attribute
instance-attribute
tags = Field(default=[], description='Tags that describe the type of task this is. Reference other tasks to see examples.')
parser_name
class-attribute
instance-attribute
parser_name = Field(default=PYTEST, description='Name of the parser to use for test results')
max_agent_timeout_sec
class-attribute
instance-attribute
max_agent_timeout_sec = Field(default=360.0, description='Maximum timeout in seconds for the agent to run.')
max_test_timeout_sec
class-attribute
instance-attribute
max_test_timeout_sec = Field(default=60.0, description='Maximum timeout in seconds for each individual test')
max_setup_timeout_sec
class-attribute
instance-attribute
max_setup_timeout_sec = Field(default=600.0, description='Maximum timeout in seconds for the setup script')
run_tests_in_same_shell
class-attribute
instance-attribute
run_tests_in_same_shell = Field(default=False, description='Run the tests in the same shell as the agent. This is useful if you need to test shell-scoped attributes.')
disable_asciinema
class-attribute
instance-attribute
disable_asciinema = Field(default=False, description='Disable asciinema recording for this task while keeping Docker containerization. When enabled, the task runs in Docker but without terminal recording.')
estimated_duration_sec
class-attribute
instance-attribute
estimated_duration_sec = Field(default=None, description='Estimated duration in seconds for this task based on historical data.')
expert_time_estimate_min
class-attribute
instance-attribute
expert_time_estimate_min = Field(default=None, description='Estimated time in minutes for an expert in the field to solve this task using the Internet but no LLMs or oracle solutions.')
junior_time_estimate_min
class-attribute
instance-attribute
junior_time_estimate_min = Field(default=None, description='Estimated time in minutes for an average junior engineer to solve this task using the Internet but no LLMs or oracle solutions.')
effective_estimated_duration_sec
property
Get the estimated duration, using a calculated default if not specified.
from_yaml
classmethod
to_yaml
Source code in fceval/handlers/trial_handler.py
TaskPaths
Manages paths for task input files and directories.
Folder structure:
    input_path/
    ├── task.yaml            # Task configuration
    ├── solution.sh          # Solution script (or solution.yaml)
    ├── run-tests.sh         # Test runner script
    ├── run-setup.sh         # Setup script
    ├── docker-compose.yaml  # Docker configuration
    └── tests/               # Test directory
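The layout above maps naturally onto a small path helper. The sketch below is a hypothetical stand-in, not the real `TaskPaths` class: the class name `TaskPathsSketch`, every property name, and the rule for choosing between `solution.sh` and `solution.yaml` are assumptions made for illustration.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TaskPathsSketch:
    """Hypothetical stand-in for TaskPaths; property names are illustrative."""
    input_path: Path

    @property
    def task_config_path(self) -> Path:
        return self.input_path / "task.yaml"

    @property
    def solution_path(self) -> Path:
        # Assumption: prefer solution.sh, fall back to solution.yaml when only that exists.
        sh_path = self.input_path / "solution.sh"
        yaml_path = self.input_path / "solution.yaml"
        if yaml_path.exists() and not sh_path.exists():
            return yaml_path
        return sh_path

    @property
    def run_tests_path(self) -> Path:
        return self.input_path / "run-tests.sh"

    @property
    def run_setup_path(self) -> Path:
        return self.input_path / "run-setup.sh"

    @property
    def docker_compose_path(self) -> Path:
        return self.input_path / "docker-compose.yaml"

    @property
    def test_dir(self) -> Path:
        return self.input_path / "tests"


task_paths = TaskPathsSketch(Path("tasks") / "hello-world")
print(task_paths.task_config_path.as_posix())  # tasks/hello-world/task.yaml
```

Deriving each path from a single `input_path` keeps the file names in one place, so a renamed file only needs to change in one property.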
Source code in fceval/handlers/trial_handler.py
TrialPaths
Manages paths for trial output files and directories.
Folder structure:
    output_path/
    └── {task_id}/
        └── {trial_name}/
            ├── sessions/          # Session data
            ├── panes/             # Terminal pane outputs
            │   ├── pre-agent.txt
            │   ├── post-agent.txt
            │   └── post-test.txt
            ├── commands.txt       # Command history
            ├── results.json       # Test results
            └── agent-logs/        # Agent logging directory
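As a rough illustration of the output layout above, the following sketch builds each path from an output root, a task ID, and a trial name. The function `trial_paths` and its dictionary keys are hypothetical and do not reflect the attributes of the real `TrialPaths` class; only the directory and file names come from the folder structure.

```python
from pathlib import Path


def trial_paths(output_path: Path, task_id: str, trial_name: str) -> dict:
    """Hypothetical helper: map illustrative keys to paths in the trial output tree."""
    # Every trial lives under output_path/{task_id}/{trial_name}/.
    root = output_path / task_id / trial_name
    panes = root / "panes"
    return {
        "root": root,
        "sessions": root / "sessions",
        "pre_agent_pane": panes / "pre-agent.txt",
        "post_agent_pane": panes / "post-agent.txt",
        "post_test_pane": panes / "post-test.txt",
        "commands": root / "commands.txt",
        "results": root / "results.json",
        "agent_logs": root / "agent-logs",
    }


# The task ID and trial name below are made-up example values.
paths = trial_paths(Path("runs"), "hello-world", "trial-1")
print(paths["results"].as_posix())  # runs/hello-world/trial-1/results.json
```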