multivon-eval is an evaluation library that produces evidence. It is not a compliance certification, and it does not absolve the deploying organization of any obligation. When configured against a regulated framework, its output is the kind of record an auditor can use to assess conformance with the EU AI Act, NIST AI RMF, or HIPAA Security Rule technical safeguards. This page is the scope statement. The pages that follow (EU AI Act, Audit trail, Compliance Bundle) get specific.
What multivon-eval is, mechanically
When you call ComplianceReporter.record(report), the library appends a JSON record to a local NDJSON file. The record contains:
| Field | What it captures |
|---|---|
| record_id | 12-char hex ID (truncated from a UUID4). Sufficient for cross-referencing within a suite's log; pair it with timestamp and suite_name for global uniqueness |
| timestamp | UTC ISO-8601 at record time |
| framework | "eu-ai-act" / "nist-ai-rmf" / "hipaa" / "none" |
| chain_version | Format version of the chained payload (currently 1) |
| prev_hash | SHA-256 of the previous record's payload (64 zeros for the first record) |
| summary | Pass/fail counts, error counts, pass rate, stability score, and your tags |
| evaluator_results | Per-evaluator average score, pass rate, and control mappings |
| provenance | Package version, git SHA (with dirty flag), Python version and OS, the full SuiteLock with evaluator fingerprints, judge configs used, calibration entries hit, and the cases hash |
| record_hash | SHA-256 of the entire payload above (excluding this field) |

Source: multivon_eval/compliance.py:563–642.
The records are linked into a hash chain — deleting or editing any record mid-log invalidates every subsequent record’s prev_hash. See the audit-trail page for the algorithm and the verifier.
Data flow — where bytes go
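Apart from LLM-judge calls, multivon-eval sends nothing over the network. Judge calls go wherever you point JudgeConfig: Anthropic, OpenAI, Google, an on-prem vLLM/Ollama instance, or any OpenAI-compatible URL. If your DPIA precludes cloud judges, point JudgeConfig at a local model and no judge data leaves your infrastructure either.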
Implication for your DPIA / RoPA: for the eval workflow, your organization is both the data controller and the data processor. Multivon is not a sub-processor. If you use a cloud LLM judge, that vendor is the sub-processor for the judge call only; the rest of the eval (cases, outputs, audit log) never reaches them.
Frameworks mapped today
| Framework | Measurable controls | Process controls | Source |
|---|---|---|---|
| EU AI Act (Regulation (EU) 2024/1689) | 5 (Art. 9(2)(b), 10(2)(f-g), 10(5), 15(1), 15(2)) | 5 (Art. 11, 12, 13, 14, 15(4-5)) | compliance.py:163–180 |
| NIST AI RMF 1.0 | 5 (MEASURE 2.3, 2.5, 2.6, 2.10, 2.11) | 5 (GOVERN 1.1, MEASURE 2.7, 2.8, 2.9, MANAGE 4.1) | compliance.py:231–245 |
| HIPAA Security Rule (45 CFR §164.312) + Privacy Rule (§164.514(b)(2) Safe Harbor) | 4 — three Security Rule technical safeguards (§164.312(a), (b), (c)) + one Privacy Rule de-identification standard (§164.514(b)(2)) | 4 (§164.308, §164.310, §164.316, BAA) | compliance.py:299–316 |
In ComplianceReporter, every evaluator result is annotated with the controls it provides evidence for. The mappings live in _EU_AI_ACT_BY_EVALUATOR, _NIST_BY_EVALUATOR, and _HIPAA_BY_EVALUATOR; these dictionaries are auditable in the source. We list an evaluator against a control only when its output is direct evidence for that control, so an auditor can re-derive every claim by reading the mapping tables and the evaluator implementations they reference.
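Because the mappings are plain tables, inverting them gives an auditor a per-control view of the evidence. The sketch below assumes a simple evaluator-name-to-control-list shape, which may not match the real dictionaries; EXAMPLE_BY_EVALUATOR, its evaluator names, and its control assignments are hypothetical placeholders.

```python
from collections import defaultdict

# Hypothetical table: evaluator name -> controls it evidences.
# Names and assignments are placeholders, not the library's mappings.
EXAMPLE_BY_EVALUATOR = {
    "evaluator_a": ["Art. 15(1)"],
    "evaluator_b": ["Art. 9(2)(b)", "Art. 15(1)"],
}


def evidence_by_control(by_evaluator: dict) -> dict:
    """Invert evaluator -> controls into control -> sorted evaluator names."""
    index = defaultdict(set)
    for evaluator, controls in by_evaluator.items():
        for control in controls:
            index[control].add(evaluator)
    return {control: sorted(names) for control, names in sorted(index.items())}
```

Each control in the result lists every evaluator whose output is claimed as evidence for it, which is the question an auditor asks control by control.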
Pre-flight coverage analysis
Before you run an eval suite against a regulated system, call reporter.coverage(suite) to see exactly which controls your evaluators exercise.
Source: compliance.py:791–821.
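Conceptually, a coverage check is a set difference between a framework's measurable controls and the controls your suite's evaluators map to. The sketch below illustrates that idea only and is not the behavior of reporter.coverage(): the function coverage_report and the mapping EXAMPLE_NIST_BY_EVALUATOR are hypothetical, while the five control IDs come from the NIST row of the table above.

```python
# Measurable NIST AI RMF controls, as listed in the framework table.
NIST_MEASURABLE = {"MEASURE 2.3", "MEASURE 2.5", "MEASURE 2.6", "MEASURE 2.10", "MEASURE 2.11"}

# Hypothetical evaluator -> controls mapping (placeholder names and assignments).
EXAMPLE_NIST_BY_EVALUATOR = {
    "evaluator_a": ["MEASURE 2.5"],
    "evaluator_b": ["MEASURE 2.10", "MEASURE 2.11"],
}


def coverage_report(evaluator_names, mapping, catalog):
    """Split a control catalog into controls the suite exercises and ones it misses."""
    covered = set()
    for name in evaluator_names:
        covered.update(mapping.get(name, ()))  # unmapped evaluators add nothing
    covered &= catalog  # ignore controls outside this framework
    return {"covered": sorted(covered), "missing": sorted(catalog - covered)}
```

Running it on a suite containing evaluator_a and evaluator_b reports MEASURE 2.3 and MEASURE 2.6 as missing, which is the gap you would want to see before the audit rather than during it.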
What multivon-eval does NOT do
We are explicit about scope so a compliance buyer doesn't discover the boundary in the middle of an audit.
- No certification. multivon-eval produces evidence; auditors decide whether evidence is sufficient. We do not issue certificates of conformity.
- No legal opinion. The Article and subcategory mappings are our best reading of the published frameworks. We are not a law firm. A regulatory question about your specific deployment should go to your legal counsel.
- No organizational governance. The process controls in each framework (training records, role assignments, incident response, third-party risk management, business associate agreements) require organizational measures — multivon-eval cannot produce them.
- No real-time monitoring. A ComplianceReporter records eval runs as you trigger them. Post-deployment monitoring (NIST MANAGE 4.1) requires you to call it from a scheduled job or production loop; the library doesn't pull metrics itself.
- No PHI / PII handling promise beyond evaluator output. PIIEvaluator(jurisdiction="hipaa") regex-matches 13 of the 18 HIPAA Safe Harbor identifier categories; its patterns include MRN, NPI, DEA, license, device IDs, account numbers, certificate numbers, health-plan numbers, VINs, admission/discharge dates, fax numbers, and URLs. The 5 that regex cannot reliably detect (personal names, geographic subdivisions smaller than a state, full-face photos, biometric identifiers, and arbitrary unique identifying numbers or characteristics) require upstream de-identification or human review. The evaluator does not redact PHI in transit, encrypt it at rest, or enforce access control; those are infrastructure concerns owned by the deploying team.
- No vendor-of-record relationship for the cloud judges. If you configure an OpenAI judge, OpenAI is your sub-processor for the judge call. multivon-eval does not proxy or wrap that relationship.
- No telemetry, no account, no callback. The library does not phone home. There is no cloud component.
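The split between the 13 detectable identifiers and the 5 that need human review comes down to whether an identifier has a fixed lexical shape. This sketch uses two hypothetical patterns (they are not PIIEvaluator's) to show the difference: a keyword-anchored record number and a URL are caught, while a personal name and a town name pass straight through.

```python
import re

# Hypothetical patterns for two Safe Harbor categories; not PIIEvaluator's patterns.
PATTERNS = {
    "mrn": re.compile(r"\bMRN[:#]?\s*\d{6,10}\b", re.IGNORECASE),  # keyword + 6-10 digits
    "url": re.compile(r"https?://\S+"),
}


def find_identifiers(text: str) -> set:
    """Return the names of the pattern categories that match somewhere in text."""
    return {name for name, pattern in PATTERNS.items() if pattern.search(text)}
```

A string such as "Patient MRN: 00123456, chart at https://example.org/chart" trips both patterns, but "John Smith lives in Springfield" matches nothing: names and small geographies have no format a regex can anchor on, which is why they are left to upstream de-identification or review.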
When the Compliance Bundle helps
Everything described above is in the open-source library, free under Apache 2.0. The Compliance Bundle adds the human services around it: framework-mapping updates as regulations change, calibrated judge threshold packs per new model release, customer-branded auditor templates, a named technical contact with an SLA, and a legally reviewed attestation letter you can include in your compliance file. It is in early access; the Compliance Bundle page describes what it does and does not include today.
Quick links
- EU AI Act — Article-by-article coverage
- Audit trail — Hash chain mechanics + verifier
- Compliance Bundle — Scope + early-access status
- Sample audit package zip (5.5 KB) — what an auditor actually receives
- Security & data handling
- multivon_eval/compliance.py — full source for the reporter, control catalog, and verifier

