Point `multivon-eval` at your existing docs, knowledge base, or transcripts and have it produce ready-to-run cases. Useful for cold-starting an eval suite, expanding coverage, or building hallucination benchmarks.
Generation uses the same LLM judge backend as the rest of the SDK, so set `ANTHROPIC_API_KEY` or `OPENAI_API_KEY` before running.
## From raw text
`generate_from_text` parameters:
| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `string` | required | Source text — docs, knowledge base, FAQ, transcripts, etc. |
| `n` | `int` | `10` | Number of cases to generate. |
| `task` | `string` | `"qa"` | One of `"qa"`, `"summarization"`, or `"hallucination"`. |
| `context_window` | `int` | `3000` | Maximum characters of source included per generation prompt. Long inputs are split into overlapping chunks. |
Returns a `list[EvalCase]` ready to pass to `suite.add_cases()`.
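The overlapping-chunk behavior of `context_window` can be sketched in plain Python. The helper name and the overlap size of 200 characters are illustrative assumptions, not the SDK's actual internals:

```python
def split_into_chunks(text: str, context_window: int = 3000, overlap: int = 200) -> list[str]:
    """Split text into chunks of at most context_window characters,
    where each chunk repeats the last `overlap` characters of the previous one
    so facts that straddle a boundary still appear whole in some chunk."""
    if len(text) <= context_window:
        return [text]
    chunks = []
    step = context_window - overlap
    for start in range(0, len(text), step):
        chunks.append(text[start:start + context_window])
        if start + context_window >= len(text):
            break
    return chunks
```

The overlap is what keeps a sentence that spans a chunk boundary answerable: without it, a question generated from the tail of one chunk might lack its grounding context.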
## From a file
Reads a source file (`.txt`, `.md`, `.rst`, `.py`, etc.) and forwards its contents to `generate_from_text`.
| Parameter | Type | Default | Description |
|---|---|---|---|
| `path` | `string` | required | Path to the source file. |
| `n` | `int` | `10` | Number of cases to generate. |
| `task` | `string` | `"qa"` | Same task choices as `generate_from_text`. |
## Task types
- `qa` — produces question/answer pairs grounded in the source. Each `EvalCase` has `input` (the question), `expected_output` (the answer), and `context` (the source excerpt).
- `summarization` — produces source chunks with reference summaries. `input` is the chunk, `expected_output` is the expected summary.
- `hallucination` — produces faithful-answer cases with `expected_output="faithful"`, suitable for pairing with `Hallucination` or `Faithfulness` evaluators.
## Hallucination benchmark pairs
For building hallucination detection benchmarks (HaluEval-style), generate explicit faithful + hallucinated answer pairs:

| Parameter | Type | Default | Description |
|---|---|---|---|
| `text` | `string` | required | Source text to ground questions in. |
| `n` | `int` | `10` | Number of pairs to generate. |
Returns a `list[dict]`. Each dict has:
| Key | Description |
|---|---|
| `question` | A specific factual question answerable from the text. |
| `context` | The relevant excerpt from the source. |
| `faithful_answer` | An answer directly grounded in the context. |
| `hallucinated_answer` | A plausible-sounding answer with at least one false claim. |
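Each pair expands naturally into two labeled benchmark rows, one per answer, HaluEval-style. A minimal sketch — the `to_benchmark_rows` helper and the `label` field are illustrative, not part of the SDK:

```python
def to_benchmark_rows(pair: dict) -> list[dict]:
    """Expand one faithful/hallucinated pair into two labeled rows
    sharing the same question and context, with opposite labels."""
    base = {"question": pair["question"], "context": pair["context"]}
    return [
        {**base, "answer": pair["faithful_answer"], "label": "faithful"},
        {**base, "answer": pair["hallucinated_answer"], "label": "hallucinated"},
    ]
```

Keeping the question and context identical across the two rows means any difference in evaluator verdicts is attributable to the answer alone.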
## End-to-end example
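A minimal sketch of the flow: generate cases from source text, then persist them as JSONL. The `EvalCase` dataclass and `write_jsonl` helper below are illustrative stand-ins mirroring the `qa` task fields described above; in real use the cases would come from `generate_from_text` and feed into `suite.add_cases()`.

```python
import json
from dataclasses import dataclass, asdict

@dataclass
class EvalCase:
    # Stand-in mirroring the documented qa-task fields.
    input: str            # the question
    expected_output: str  # the grounded answer
    context: str          # the source excerpt

def write_jsonl(cases: list[EvalCase], path: str) -> None:
    """Persist generated cases as one JSON object per line."""
    with open(path, "w", encoding="utf-8") as f:
        for case in cases:
            f.write(json.dumps(asdict(case)) + "\n")

# In real use: cases = generate_from_text(text, n=10, task="qa")
cases = [EvalCase(input="What does X do?", expected_output="X does Y.", context="X does Y.")]
write_jsonl(cases, "cases.jsonl")
```

JSONL matches what the CLI's `--output` flag writes, so cases generated either way can be loaded by the same downstream tooling.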
## CLI
Generate cases from the terminal and write them to JSONL:

| Flag | Description |
|---|---|
| `--from <path>` | Source file. |
| `--text <text>` | Raw text source (alternative to `--from`). |
| `--n <int>` | Number of cases. Defaults to `10`. |
| `--task` | One of `qa`, `summarization`, `hallucination`. Defaults to `qa`. |
| `--output`, `-o` | Save to JSONL. If omitted, prints a preview to stdout. |

