agent_trace is a structured record of what your agent did. It is framework-agnostic: it works with LangChain, LlamaIndex, CrewAI, or any custom agent.
Setting up an agent trace
ToolCallAccuracy
Checks that the agent called the expected tools. By default, order doesn’t matter (set match).
| require_order=False | Score |
|---|---|
| All expected tools called | 1.0 |
| Half expected tools called | 0.5 |
| No expected tools called | 0.0 |
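The set-match scores in the table can be read as the fraction of expected tools that appear among the tools the agent called. The sketch below illustrates that reading only; it is not the library's implementation. The function name `set_match_score`, the plain lists of tool names as input, and the choice to return 1.0 when no tools are expected are all assumptions made for the example.

```python
def set_match_score(expected_tools, called_tools):
    """Fraction of expected tool names found among the called tools, ignoring order.

    Hypothetical helper for illustration; not part of any library API.
    """
    expected = set(expected_tools)
    if not expected:
        # Assumption: with nothing expected, there is nothing to miss.
        return 1.0
    called = set(called_tools)
    # Count expected tools that were called at least once, in any position.
    return len(expected & called) / len(expected)


print(set_match_score(["search", "calculator"], ["calculator", "search"]))
print(set_match_score(["search", "calculator"], ["search"]))
print(set_match_score(["search", "calculator"], ["weather"]))
```

Extra tool calls that were not expected do not lower the score in this sketch; whether the real metric penalizes them is not stated here.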
With require_order=True, the metric uses sequence alignment: a partially correct order scores between 0 and 1.
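One way to picture order-aware scoring is to align the expected and actual tool sequences and count how many expected calls appear in the right relative order. The sketch below uses the longest common subsequence (LCS) as the alignment, which is an assumption: the text does not say which alignment algorithm is used. The function name `ordered_match_score` and the list-of-names inputs are likewise hypothetical.

```python
def ordered_match_score(expected_tools, called_tools):
    """Length of the longest common subsequence divided by the number of expected tools.

    Hypothetical helper; LCS stands in for whatever alignment the real metric uses.
    """
    expected = list(expected_tools)
    called = list(called_tools)
    if not expected:
        # Assumption: an empty expectation is trivially satisfied.
        return 1.0

    # lcs[i][j] holds the LCS length of expected[:i] and called[:j].
    lcs = [[0] * (len(called) + 1) for _ in range(len(expected) + 1)]
    for i, exp_tool in enumerate(expected, start=1):
        for j, call_tool in enumerate(called, start=1):
            if exp_tool == call_tool:
                lcs[i][j] = lcs[i - 1][j - 1] + 1
            else:
                lcs[i][j] = max(lcs[i - 1][j], lcs[i][j - 1])

    return lcs[-1][-1] / len(expected)


expected = ["search", "fetch", "summarize"]
print(ordered_match_score(expected, ["search", "fetch", "summarize"]))
print(ordered_match_score(expected, ["fetch", "search", "summarize"]))
```

In the second call, two of the three expected tools can still be matched in order, so the score falls between 0 and 1 rather than dropping to 0.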
ToolArgumentAccuracy
LLM judge that evaluates whether the arguments passed to tools were appropriate and well-formed.
PlanQuality
LLM judge that evaluates the overall quality of the agent’s plan: logic, completeness, and efficiency.
- Does the plan address the task?
- Are the steps in a logical order?
- Are there unnecessary or redundant steps?
- Is anything missing?
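To make the checklist above concrete, here is a rough sketch of how such questions might be assembled into a prompt for an LLM judge. Everything in it is hypothetical: the function name `build_plan_quality_prompt`, the prompt wording, and the request for a score between 0 and 1 are assumptions for illustration, not the prompt the metric actually uses.

```python
# The four questions are taken from the checklist; the rest is illustrative.
PLAN_QUALITY_QUESTIONS = [
    "Does the plan address the task?",
    "Are the steps in a logical order?",
    "Are there unnecessary or redundant steps?",
    "Is anything missing?",
]


def build_plan_quality_prompt(task, plan_steps):
    """Build a judge prompt from a task description and a list of plan steps.

    Hypothetical helper; the returned string would be sent to an LLM of your choice.
    """
    steps = "\n".join(f"{i}. {step}" for i, step in enumerate(plan_steps, start=1))
    checklist = "\n".join(f"- {question}" for question in PLAN_QUALITY_QUESTIONS)
    return (
        "You are evaluating an agent's plan.\n\n"
        f"Task:\n{task}\n\n"
        f"Plan:\n{steps}\n\n"
        f"Consider these questions:\n{checklist}\n\n"
        # Assumption: the judge is asked for a numeric score plus a short reason.
        "Reply with a score between 0 and 1 and a one-sentence reason."
    )


print(build_plan_quality_prompt("Book a flight to Lisbon", ["search flights", "pay"]))
```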

