Configuration
Set your judge model via environment variables:
Faithfulness
Checks that the output is grounded in the provided context, with no invented facts. Requires context on the EvalCase.
Hallucination
Checks that the output doesn’t introduce claims not supported by context. Requires context. Complements Faithfulness: Faithfulness checks what’s in the answer, Hallucination checks what shouldn’t be.
Relevance
Checks that the output actually addresses the input question. context required.
Coherence
Checks that the output is clear, well-structured, and logically sound.
Toxicity
Checks that the output is safe, non-harmful, and appropriate.
Bias
Checks that the output is free of demographic, political, or cultural bias.
Summarization
Checks that a summary captures the key points of the source faithfully. Requires context (the source document).
AnswerAccuracy
Checks factual correctness of the output against expected_output.
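The metrics above read different fields of a test case, so it can help to check a case before running a metric on it. The sketch below is illustrative only: EvalCaseSketch is a hypothetical stand-in, not the library's EvalCase class, and the field names (input, output, context, expected_output) and the REQUIRED_FIELDS table are assumptions drawn from the descriptions above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class EvalCaseSketch:
    """Hypothetical stand-in for an eval case; not the library's EvalCase."""
    input: str
    output: str
    context: Optional[str] = None
    expected_output: Optional[str] = None


# Assumed mapping from metric name to the optional fields it reads,
# based only on the metric descriptions in this section.
REQUIRED_FIELDS: Dict[str, List[str]] = {
    "Faithfulness": ["context"],
    "Hallucination": ["context"],
    "Summarization": ["context"],
    "AnswerAccuracy": ["expected_output"],
}


def missing_fields(metric: str, case: EvalCaseSketch) -> List[str]:
    """Return the fields the metric needs that are not set on the case."""
    return [
        name
        for name in REQUIRED_FIELDS.get(metric, [])
        if getattr(case, name) is None
    ]


case = EvalCaseSketch(
    input="What is the refund window?",
    output="Refunds are accepted within 30 days.",
    context="Our policy allows refunds within 30 days of purchase.",
)
print(missing_fields("Faithfulness", case))
print(missing_fields("AnswerAccuracy", case))
```

A check like this reports a missing context or expected_output up front, before any judge model is called.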
ContextPrecision
For RAG systems: checks that retrieved context is actually relevant to the question.
ContextRecall
For RAG systems: checks that all information needed to answer the question was retrieved.
CustomRubric
Define your own yes/no criteria. Each criterion is a (question, expected_answer) tuple.
expected_answer.
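To make the (question, expected_answer) shape concrete, here is a minimal sketch of how such criteria could be scored. It does not use the library's CustomRubric class: score_rubric and toy_judge are hypothetical names, the keyword-matching judge stands in for a real judge model, and scoring as the fraction of matched criteria is an assumption.

```python
from typing import Callable, List, Tuple

# One criterion: a yes/no question and the answer that counts as a pass.
Criterion = Tuple[str, str]


def score_rubric(
    output: str,
    criteria: List[Criterion],
    judge: Callable[[str, str], str],
) -> float:
    """Return the fraction of criteria where the judge's answer matches.

    The fraction-based score is an assumption made for this sketch.
    """
    if not criteria:
        raise ValueError("at least one criterion is required")
    passed = sum(
        1
        for question, expected in criteria
        if judge(question, output).strip().lower() == expected.strip().lower()
    )
    return passed / len(criteria)


def toy_judge(question: str, output: str) -> str:
    """Hypothetical judge: a keyword check standing in for a judge model."""
    keyword = "refund" if "refund" in question.lower() else "sorry"
    return "yes" if keyword in output.lower() else "no"


criteria = [
    ("Does the answer mention a refund?", "yes"),
    ("Does the answer say sorry?", "no"),
]
score = score_rubric("You can request a refund within 30 days.", criteria, toy_judge)
print(score)
```

Keeping each criterion as a plain tuple means a rubric is just a list, which is easy to store next to the test cases it applies to.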

