Agents
Agents are the core abstraction for interacting with tasks. Every agent receives a task instruction and a tmux session, and must return an AgentResult.
Base Agent
BaseAgent
Bases: ABC
Source code in fceval/agents/base_agent.py
version
property
The version of the agent. This can be any string, such as a date, a semantic version, or a single digit.
It can be set dynamically from a keyword argument in the constructor.
Examples:
Default version (latest, or undefined)
agent = ClaudeCodeAgent()
agent.version  # -> "latest"
Custom version
agent = ClaudeCodeAgent(version="1.2.3")
agent.version  # -> "1.2.3"
With other parameters
agent = ClaudeCodeAgent(
    model_name="anthropic/claude-3-sonnet-20240229",
    version="0.5.0"
)
agent.version  # -> "0.5.0"
The version, if defined, is used in the installation script: npm install -g @anthropic-ai/claude-code@{version}
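As a rough illustration of how a version string could flow into that installation command, the sketch below uses a hypothetical helper, `build_install_command`, which is not part of the library. The fallback to "latest" mirrors the default shown in the examples above; the real installation script may handle an undefined version differently.

```python
from typing import Optional


def build_install_command(version: Optional[str] = None) -> str:
    """Hypothetical helper: format the npm install command for a version."""
    # Assumption: an undefined version falls back to "latest",
    # matching the default value shown in the examples above.
    resolved = version or "latest"
    return f"npm install -g @anthropic-ai/claude-code@{resolved}"


default_command = build_install_command()
pinned_command = build_install_command("1.2.3")
print(default_command)
print(pinned_command)
```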
prompt_template
property
The path to a custom prompt template file. If specified, this template will be used to render the instruction before passing it to the agent.
The template must be a Jinja2 template that includes an "instruction" variable.
Examples:
Default behavior (no template)
agent = OpenHandsAgent()
agent.prompt_template  # -> None
Custom template
agent = OpenHandsAgent(prompt_template="./custom_prompt.j2")
agent.prompt_template  # -> "./custom_prompt.j2"
Usage with CLI:
uv run tb run --agent openhands --task-id hello-world --agent-kwarg prompt_template=./openhands_template.j2
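Because the template must reference an "instruction" variable, it can be worth checking a template file before passing it as `prompt_template`. The sketch below is a standard-library heuristic only: `template_mentions_instruction` is a hypothetical helper, the regular expression is a rough approximation of a Jinja2 expression, and none of this reflects how the library itself validates or renders templates.

```python
import re
import tempfile
from pathlib import Path

# Rough pattern for a Jinja2 expression that starts with the `instruction`
# variable, e.g. {{ instruction }} or {{ instruction | trim }}.
INSTRUCTION_PATTERN = re.compile(r"\{\{-?\s*instruction\b")


def template_mentions_instruction(template_path: Path) -> bool:
    """Hypothetical pre-flight check: does the template reference `instruction`?"""
    text = template_path.read_text(encoding="utf-8")
    return INSTRUCTION_PATTERN.search(text) is not None


with tempfile.TemporaryDirectory() as tmp:
    # A template that wraps the task instruction in extra context.
    good = Path(tmp) / "custom_prompt.j2"
    good.write_text(
        "You are working in a terminal.\n\nTask: {{ instruction }}\n",
        encoding="utf-8",
    )
    # A template that forgets the required variable.
    bad = Path(tmp) / "broken_prompt.j2"
    bad.write_text("You are working in a terminal.\n", encoding="utf-8")

    good_ok = template_mentions_instruction(good)
    bad_ok = template_mentions_instruction(bad)

print(good_ok, bad_ok)
```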
name
abstractmethod
staticmethod
perform_task
abstractmethod
Source code in fceval/agents/base_agent.py
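The shape of a custom agent can be sketched without the library installed. Everything below is a stand-in written for illustration: `BaseAgentStandIn`, `AgentResultStandIn`, and `FakeSession` (including its `send_keys` method) are hypothetical names that only mirror the documented interface, namely a static abstract `name`, an abstract `perform_task`, and a `version` property. The real `perform_task` signature may take more parameters.

```python
import shlex
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import List


@dataclass
class AgentResultStandIn:
    """Stand-in for AgentResult, reduced to two documented fields."""
    total_input_tokens: int = 0
    total_output_tokens: int = 0


class FakeSession:
    """Hypothetical stand-in for a tmux session; it only records keystrokes."""

    def __init__(self) -> None:
        self.sent: List[str] = []

    def send_keys(self, keys: str) -> None:
        self.sent.append(keys)


class BaseAgentStandIn(ABC):
    """Stand-in mirroring the documented BaseAgent members."""

    def __init__(self, version: str = "latest") -> None:
        self._version = version

    @property
    def version(self) -> str:
        return self._version

    @staticmethod
    @abstractmethod
    def name() -> str:
        ...

    @abstractmethod
    def perform_task(self, instruction: str, session: FakeSession) -> AgentResultStandIn:
        ...


class EchoAgent(BaseAgentStandIn):
    """Toy agent that echoes the instruction into the session."""

    @staticmethod
    def name() -> str:
        return "echo"

    def perform_task(self, instruction: str, session: FakeSession) -> AgentResultStandIn:
        # Quote the instruction so it reaches the shell as a single argument.
        session.send_keys(f"echo {shlex.quote(instruction)}")
        # No LLM calls were made, so the token counts stay at their defaults.
        return AgentResultStandIn()


session = FakeSession()
agent = EchoAgent(version="1.2.3")
result = agent.perform_task("say hello", session)
print(EchoAgent.name(), agent.version, session.sent, result)
```

Declaring `name` and `perform_task` as abstract means a subclass that omits either one cannot be instantiated, so a missing method surfaces when the agent is constructed.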
AgentResult
Bases: BaseModel
total_input_tokens
class-attribute
instance-attribute
total_input_tokens = Field(default=0, description='The total number of input tokens used by the agent to complete the task.')
total_output_tokens
class-attribute
instance-attribute
total_output_tokens = Field(default=0, description='The total number of output tokens used by the agent to complete the task.')
total_cost
class-attribute
instance-attribute
total_cost = Field(default=0.0, description='The total cost in USD of all LLM calls made by the agent.')
failure_mode
class-attribute
instance-attribute
failure_mode = Field(default=NONE, description="The failure mode of the agent's execution, if any.")
timestamped_markers
class-attribute
instance-attribute
timestamped_markers = Field(default=[], description="A list of timestamped markers from the agent's execution of the task. Each marker is a tuple of (timestamp, marker_text). The timestamp is the time in seconds since the start of the task, and the marker_text is the text of the marker. The timestamp can be found by calling `get_asciinema_timestamp()` on the `TmuxSession` object.")
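To make the field semantics concrete, the sketch below accumulates token usage, cost, and (timestamp, marker_text) tuples on a result object. It uses a hypothetical dataclass, `AgentResultStandIn`, rather than the real Pydantic model, and omits `failure_mode` because only its default is shown here. The helper functions are illustrative, and the timestamps are hard-coded where a real agent would call `get_asciinema_timestamp()` on its session.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class AgentResultStandIn:
    """Stand-in mirroring the documented AgentResult fields (failure_mode omitted)."""
    total_input_tokens: int = 0
    total_output_tokens: int = 0
    total_cost: float = 0.0
    timestamped_markers: List[Tuple[float, str]] = field(default_factory=list)


def add_usage(result: AgentResultStandIn, input_tokens: int,
              output_tokens: int, cost_usd: float) -> None:
    """Hypothetical helper: fold one LLM call's usage into the running totals."""
    result.total_input_tokens += input_tokens
    result.total_output_tokens += output_tokens
    result.total_cost += cost_usd


def record_marker(result: AgentResultStandIn, timestamp: float, text: str) -> None:
    """Hypothetical helper: append a (timestamp, marker_text) tuple."""
    result.timestamped_markers.append((timestamp, text))


result = AgentResultStandIn()

# First LLM call, followed by a marker 2.5 seconds into the task.
add_usage(result, input_tokens=1200, output_tokens=300, cost_usd=0.01)
record_marker(result, 2.5, "started planning")

# Second LLM call, followed by a marker 9.0 seconds into the task.
add_usage(result, input_tokens=800, output_tokens=150, cost_usd=0.005)
record_marker(result, 9.0, "ran tests")

print(result)
```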
Agent Factory
AgentFactory
get_agent
staticmethod
Source code in fceval/agents/agent_factory.py
get_agent_class
staticmethod
Source code in fceval/agents/agent_factory.py
get_agent_from_import_path
staticmethod
Source code in fceval/agents/agent_factory.py
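The factory methods are listed here without signatures, so the following is only a general sketch of resolving a class from an import path, not the behavior of `get_agent_from_import_path`. It assumes a "package.module:ClassName" format, which this page does not confirm, and `load_class_from_import_path` is a hypothetical name.

```python
import importlib


def load_class_from_import_path(import_path: str) -> type:
    """Hypothetical helper: resolve "package.module:ClassName" to a class object."""
    # Assumption: the module path and class name are separated by a colon.
    module_path, separator, class_name = import_path.partition(":")
    if not separator or not module_path or not class_name:
        raise ValueError(
            f"Expected 'package.module:ClassName', got {import_path!r}"
        )

    module = importlib.import_module(module_path)
    obj = getattr(module, class_name)
    if not isinstance(obj, type):
        raise TypeError(f"{import_path!r} does not refer to a class")
    return obj


# Demonstrated with a standard-library class so the example runs anywhere.
ordered_dict_cls = load_class_from_import_path("collections:OrderedDict")
print(ordered_dict_cls.__name__)
```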
Agent Name
AgentName
Bases: Enum
Failure Modes
FailureMode
Bases: Enum
Built-in Agents
Oracle Agent
OracleAgent
Bases: BaseAgent
Source code in fceval/agents/oracle_agent.py
name
staticmethod
perform_task
Source code in fceval/agents/oracle_agent.py
Nop Agent
NopAgent
Naive Agent
NaiveAgent
Bases: BaseAgent
Source code in fceval/agents/naive_agent.py
name
staticmethod
perform_task
Source code in fceval/agents/naive_agent.py
Terminus-2
Terminus2
Terminus2(model_name, max_episodes=None, parser_name='json', api_base=None, temperature=0.7, model_kwargs=None, **kwargs)
Bases: BaseAgent
Source code in fceval/agents/terminus_2/terminus_2.py
name
staticmethod
perform_task
perform_task(instruction, session, logging_dir=None, time_limit_seconds=None, portkey_metadata=None, portkey_trace_id=None)
Source code in fceval/agents/terminus_2/terminus_2.py
Installed Agents
AbstractInstalledAgent
Bases: BaseAgent, ABC
The container agent logs path is mounted to the host. Your agent can log to it if you want to view trajectories after the task finishes.
Source code in fceval/agents/installed_agents/abstract_installed_agent.py
perform_task
Source code in fceval/agents/installed_agents/abstract_installed_agent.py