`generate_from_file()` eliminates that: point it at your documentation, knowledge base, or any text and get ready-to-run eval cases in seconds.
## Quickstart

### From raw text
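In code, that looks roughly like the sketch below. It assumes the package is importable as `evalgen` and that the call returns a list of case objects; the import path, keyword arguments, and field names are illustrative, not the confirmed API.

```python
# Minimal quickstart sketch. generate_from_file() is named in this doc;
# the import path, keyword arguments, and case fields are assumptions.
from evalgen import generate_from_file  # assumed import path

# Point the generator at any text file; the task defaults to "qa".
cases = generate_from_file("docs/getting_started.md", num_cases=10)

for case in cases:
    print(case.question)         # generated question
    print(case.expected_answer)  # answer grounded in the source text
    print(case.context)          # the excerpt the pair was drawn from
```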
## Task types

### `qa` (default)
Generates question-answer pairs with context excerpts. Best for RAG pipelines, chatbots, and knowledge base evaluation.
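A sketch of selecting the task explicitly; the `task=` keyword and the case fields are assumptions carried over from the quickstart sketch above.

```python
# Hypothetical sketch: explicit task selection. The task name "qa" comes
# from this doc; the task= keyword and case fields are assumptions.
from evalgen import generate_from_file  # assumed import path

cases = generate_from_file("kb/billing_faq.md", task="qa", num_cases=15)

# Each case pairs a question with an answer and the excerpt it came from,
# which is the shape a RAG pipeline needs for retrieval and answer checks.
sample = cases[0]
print(sample.question, "->", sample.expected_answer)
print("grounded in:", sample.context[:200])
```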
### `summarization`
Generates document chunks with faithful reference summaries. Use for evaluating summarization models.
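Roughly, assuming the same hypothetical import path; the `document` and `reference_summary` field names are illustrative, not confirmed.

```python
# Hypothetical sketch: summarization task. Field names are assumptions.
from evalgen import generate_from_file  # assumed import path

cases = generate_from_file("reports/q3_review.txt", task="summarization", num_cases=5)

for case in cases:
    print(case.document[:200])     # chunk of the source document
    print(case.reference_summary)  # faithful summary to score outputs against
```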
### `hallucination`
Generates QA pairs where the expected answer is “faithful”. Pair with the Hallucination evaluator to verify your model doesn’t fabricate.
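A sketch of the full loop, assuming a `HallucinationEvaluator` class with an `evaluate()` method; none of these names are confirmed beyond the task name itself.

```python
# Hypothetical sketch: hallucination task wired to an evaluator. The task
# name comes from this doc; the import paths, HallucinationEvaluator
# interface, and case fields are all assumptions.
from evalgen import generate_from_file  # assumed import path
from evalgen.evaluators import HallucinationEvaluator  # assumed location


def my_model(question: str) -> str:
    """Stand-in for the system under test (your chatbot, RAG pipeline, etc.)."""
    return "..."


cases = generate_from_file("docs/api_reference.md", task="hallucination", num_cases=10)
evaluator = HallucinationEvaluator()

for case in cases:
    answer = my_model(case.question)
    result = evaluator.evaluate(output=answer, context=case.context)
    print(case.question, "->", result.label)  # e.g. "faithful" vs. "hallucinated"
```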
## Build your own benchmark dataset

`generate_hallucination_pairs()` returns both faithful and hallucinated answer variants — useful for building your own labeled benchmark:
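For example, a hedged sketch of writing the pairs out as a JSONL benchmark; the function is named in this doc, but its return shape, the field names, and the file layout below are assumptions.

```python
# Hypothetical sketch: building a labeled benchmark from answer pairs.
import json

from evalgen import generate_hallucination_pairs  # assumed import path

pairs = generate_hallucination_pairs("docs/pricing.md", num_cases=20)

with open("hallucination_benchmark.jsonl", "w") as f:
    for pair in pairs:
        # One positive and one negative example per generated pair.
        for answer, label in [
            (pair.faithful_answer, "faithful"),
            (pair.hallucinated_answer, "hallucinated"),
        ]:
            f.write(json.dumps({
                "question": pair.question,
                "context": pair.context,
                "answer": answer,
                "label": label,
            }) + "\n")
```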
## CLI
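If the package installs a console entry point, a shell invocation might look like the following. The command name and every flag here are assumptions for illustration; check the real binary's `--help`.

```bash
# Hypothetical invocation: command name and flags are assumed, not confirmed.
evalgen generate docs/getting_started.md \
  --task qa \
  --num-cases 10 \
  --output cases.jsonl
```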
## Tips
- Start small — generate 10-20 cases first, review them, then scale up.
- Use your actual docs — cases generated from your real content catch real problems.
- Mix with manual cases — generated cases cover breadth; manual cases cover the edge cases you already know about (see the sketch after this list).
- Task choice matters — use `qa` for RAG evaluation, `summarization` for summarization pipelines, and `hallucination` when you want to stress-test faithfulness.
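Putting the first and third tips together, a small sketch; `generate_from_file()` comes from this doc, while the `Case` constructor and list-based mixing are assumptions about the API.

```python
# Hypothetical sketch: start small, then mix generated and manual cases.
from evalgen import Case, generate_from_file  # assumed import path; Case is assumed

# Start small: a reviewable batch, not hundreds of cases.
generated = generate_from_file("docs/setup.md", task="qa", num_cases=10)

# Manual cases cover the edge cases you already know about.
manual = [
    Case(
        question="What happens if the config file is missing?",
        expected_answer="The CLI exits with an error telling you to run init first.",
    ),
]

dataset = generated + manual  # breadth from generation, depth from hand-written cases
print(f"{len(dataset)} cases ready for review")
```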

