NotEmpty
Passes if the output is non-empty after stripping whitespace.
ExactMatch
Passes if the output exactly matches expected_output after stripping whitespace. Case-insensitive by default.
Reads expected_output from the EvalCase.
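The comparison described for ExactMatch can be sketched in a few lines of standalone Python. This is an illustration only, not the library's code: the function name `exact_match` and the `case_sensitive` flag are hypothetical, and the sketch assumes both strings are stripped before comparing.

```python
def exact_match(output: str, expected_output: str, case_sensitive: bool = False) -> bool:
    """Compare stripped strings; ignore case unless case_sensitive is True."""
    got, want = output.strip(), expected_output.strip()
    if not case_sensitive:
        # casefold() is a more aggressive lower() intended for caseless matching
        got, want = got.casefold(), want.casefold()
    return got == want


print(exact_match("  Paris\n", "paris"))                   # True
print(exact_match("Paris", "paris", case_sensitive=True))  # False
```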
Contains
Passes if the output contains the required substrings. Score is the fraction found; with the default threshold of 1.0, all of them must be present.

| Parameter | Default | Description |
|---|---|---|
| substrings | required | List of strings to look for |
| case_sensitive | False | Case-sensitive matching |
| threshold | 1.0 | Minimum fraction to pass |

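To make the interaction between score and threshold concrete, here is a minimal standalone sketch of the scoring rule for Contains. The function name `contains_score` and the returned `(score, passed)` tuple are assumptions for illustration; only the three parameter names and their defaults come from the table above.

```python
def contains_score(output, substrings, case_sensitive=False, threshold=1.0):
    """Return (score, passed), where score is the fraction of substrings found."""
    if not substrings:
        # Assumption: an empty list is treated as an error, since substrings is required
        raise ValueError("substrings must contain at least one string")
    haystack = output if case_sensitive else output.lower()
    needles = substrings if case_sensitive else [s.lower() for s in substrings]
    found = sum(1 for needle in needles if needle in haystack)
    score = found / len(substrings)
    return score, score >= threshold


score, passed = contains_score(
    "The capital of France is Paris.",
    ["paris", "france", "berlin"],
    threshold=0.6,
)
print(round(score, 2), passed)  # 0.67 True
```

With the default threshold of 1.0 the same call would fail, because only two of the three substrings are present.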
RegexMatch
Passes if a regex pattern matches anywhere in the output.
JSONSchemaEval
Passes if the output is valid JSON that conforms to a JSON Schema.
WordCount
Passes if the word count is within [min_words, max_words].
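A small sketch of the WordCount range check, assuming words are whitespace-separated tokens and both bounds are inclusive, as the bracket notation suggests. The function name `word_count_ok` and the default values are hypothetical.

```python
def word_count_ok(output, min_words=0, max_words=None):
    """Check that the whitespace-separated word count lies in [min_words, max_words]."""
    count = len(output.split())
    if count < min_words:
        return False
    # Assumption: max_words=None means no upper bound
    return max_words is None or count <= max_words


print(word_count_ok("one two three", min_words=2, max_words=3))  # True
print(word_count_ok("one two three four", max_words=3))          # False
```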
Latency
Passes if response latency is under max_ms milliseconds. Requires latency_ms to be passed to evaluate(); the suite handles this automatically.
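When calling an evaluator outside the suite, the latency has to be measured by the caller. The sketch below shows one way to obtain a millisecond figure and apply the "under max_ms" rule. The helpers `timed_call` and `latency_ok` are hypothetical and not part of the library; how the value is then handed to evaluate() is not shown, since that signature is not documented here.

```python
import time


def timed_call(fn, *args, **kwargs):
    """Run fn and return (result, latency_ms) using a monotonic clock."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    latency_ms = (time.perf_counter() - start) * 1000.0
    return result, latency_ms


def latency_ok(latency_ms, max_ms):
    """Strictly under the limit, following the 'under max_ms' wording."""
    return latency_ms < max_ms


result, latency_ms = timed_call(str.upper, "hello")
print(result, latency_ok(latency_ms, max_ms=500.0))
```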
BLEU
BLEU-n score between output and expected_output. Pure Python, no dependencies.
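For readers unfamiliar with the metric, the following is a minimal sentence-level BLEU sketch in pure Python: clipped n-gram precisions for n = 1..max_n, combined by geometric mean and multiplied by the brevity penalty. It is not the library's implementation. The function name `bleu`, the `max_n` parameter, lowercased whitespace tokenization, and the lack of smoothing are all assumptions, so scores may differ from what the evaluator reports.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Count all contiguous n-grams in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(output, expected_output, max_n=4):
    hyp, ref = output.lower().split(), expected_output.lower().split()
    if not hyp or not ref:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        hyp_counts, ref_counts = ngrams(hyp, n), ngrams(ref, n)
        total = sum(hyp_counts.values())
        # Clip each n-gram's count by how often it occurs in the reference
        overlap = sum(min(count, ref_counts[gram]) for gram, count in hyp_counts.items())
        if total == 0 or overlap == 0:
            return 0.0  # unsmoothed: any zero precision gives a zero score
        log_precisions.append(math.log(overlap / total))
    # Brevity penalty: penalize outputs shorter than the reference
    bp = 1.0 if len(hyp) >= len(ref) else math.exp(1 - len(ref) / len(hyp))
    return bp * math.exp(sum(log_precisions) / max_n)


print(round(bleu("a b c d", "a b c e", max_n=2), 4))  # 0.7071
```

In the example, unigram precision is 3/4 and bigram precision is 2/3, so the geometric mean is sqrt(0.5).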
ROUGE
ROUGE-L F1 score (longest common subsequence) between output and expected_output.
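The ROUGE-L computation can be sketched as follows: find the length of the longest common subsequence (LCS) of the two token lists, derive precision and recall from it, and take their harmonic mean. This is an illustrative sketch under assumed lowercased whitespace tokenization; the names `lcs_length` and `rouge_l_f1` are hypothetical and the library's tokenization may differ.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence, using a rolling DP row."""
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f1(output, expected_output):
    hyp, ref = output.lower().split(), expected_output.lower().split()
    lcs = lcs_length(hyp, ref)
    if lcs == 0:
        return 0.0  # also covers empty output or empty reference
    precision, recall = lcs / len(hyp), lcs / len(ref)
    return 2 * precision * recall / (precision + recall)


print(round(rouge_l_f1("the cat sat on the mat", "the cat is on the mat"), 4))  # 0.8333
```

Here the LCS is "the cat on the mat" (5 tokens) against two 6-token sentences, so precision, recall, and F1 all equal 5/6.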

