# multivon-eval

## Docs

- [Agent Evaluators](https://evaldocs.multivon.ai/evaluators/agent.md): Evaluate tool use, planning, and task completion in agentic systems.
- [Compliance Evaluators](https://evaldocs.multivon.ai/evaluators/compliance.md): Local PII detection and schema validation — no API calls required.
- [Consistency Evaluators](https://evaldocs.multivon.ai/evaluators/consistency.md): Zero-resource hallucination detection via repeated sampling.
- [Conversation Evaluators](https://evaldocs.multivon.ai/evaluators/conversation.md): Evaluate multi-turn chat quality across an entire session.
- [Deterministic Evaluators](https://evaldocs.multivon.ai/evaluators/deterministic.md): Instant, free checks that need no LLM.
- [LLM Judge Evaluators](https://evaldocs.multivon.ai/evaluators/llm-judge.md): QAG-based scoring for quality you can't measure with strings.
- [CI/CD Integration](https://evaldocs.multivon.ai/guides/ci-cd.md): Run evals as a quality gate in your deployment pipeline.
- [Compliance & Privacy](https://evaldocs.multivon.ai/guides/compliance.md): PII detection, schema validation, and tamper-evident audit trails — all local, no API calls.
- [Custom Evaluators](https://evaldocs.multivon.ai/guides/custom-evaluators.md): Build your own evaluators by extending the base class.
- [Loading Datasets](https://evaldocs.multivon.ai/guides/datasets.md): Load eval cases from JSONL and CSV files.
- [Real-World Examples](https://evaldocs.multivon.ai/guides/examples.md): End-to-end walkthroughs for support bots, RAG pipelines, and coding agents.
- [Experiment Tracking](https://evaldocs.multivon.ai/guides/experiments.md): Compare eval runs across model versions and catch regressions before they ship.
- [Factory Suites](https://evaldocs.multivon.ai/guides/factory-suites.md): Pre-configured eval suites by use case — no evaluator selection needed.
- [Synthetic Dataset Generation](https://evaldocs.multivon.ai/guides/generate.md): Generate eval cases from your docs — no labeled data required.
- [Framework Integrations](https://evaldocs.multivon.ai/guides/integrations.md): Capture agent traces from LangChain, LangSmith, or any custom agent with a consistent OOP interface.
- [Reliability & Flakiness Detection](https://evaldocs.multivon.ai/guides/reliability.md): Handle LLM non-determinism with multi-run evaluation, flakiness detection, and statistical significance testing.
- [Statistical Rigor](https://evaldocs.multivon.ai/guides/statistical-rigor.md): Confidence intervals, power analysis, and when to trust your eval scores.
- [Synthetic Dataset Generation](https://evaldocs.multivon.ai/guides/synthetic-data.md): Generate eval cases from raw text or files — no labeled data required.
- [Introduction](https://evaldocs.multivon.ai/introduction.md): AI evaluation for teams that ship models to production.
- [Quickstart](https://evaldocs.multivon.ai/quickstart.md): Run your first eval in 5 minutes.

## OpenAPI Specs

- [openapi](https://evaldocs.multivon.ai/api-reference/openapi.json)