The multivon-eval library is open source under Apache 2.0 and will remain that way. It provides the mechanical compliance plumbing (chained audit logs, Article-level control mappings, calibrated judge thresholds, an auditor-attachable evidence package) that a regulated team can deploy themselves. The Compliance Bundle is a paid subscription that wraps the OSS with services that require ongoing human work: maintenance of framework mappings as regulations move, calibration of judge thresholds against new model releases, a customer-branded auditor template, a named technical contact with a written response time, and a methodology attestation reviewed by counsel in the customer’s jurisdiction. The Bundle is in early access. This page is the current scope.
The Bundle is currently sales-led: the first few customer engagements shape which services move from roadmap to default. Pricing is not yet a public price list — each engagement is sized against scope. If the services below match your need, the first step is a 30-minute scoping call — see How to engage.
What the OSS already covers
If you have not read the Compliance Overview, do that first. The library, free under Apache 2.0, produces:
- A tamper-evident SHA-256-chained NDJSON audit log via ComplianceReporter
- Evaluator → control mappings for EU AI Act (5 measurable + 5 process controls: Art. 9(2)(b), 10(2)(f-g), 10(5), 15(1), 15(2) measurable; Art. 11, 12, 13, 14, 15(4-5) process), NIST AI RMF 1.0 (5 measurable + 5 process subcategories), HIPAA Security Rule + Privacy Rule Safe Harbor (4 + 4 controls). See compliance.py:163–337 for the exact mapping dictionaries
- Pre-flight coverage() report identifying which framework controls a suite exercises and which are gaps
- Per-(judge × evaluator) calibrated thresholds with provenance (dataset hash, N, F1, measurement date) in _calibration_data/v2.json
- audit-package CLI bundling log + verifier + calibration + manifest into an auditor-attachable zip
- Optional anchoring of the chain tip to GitHub Actions output, or your own callback for S3 Object Lock / Sigstore / internal ledger
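To make "tamper-evident SHA-256-chained NDJSON" concrete, here is a minimal sketch of the general technique using only the Python standard library. It is not the ComplianceReporter implementation: the record fields (prev, payload, hash), the all-zero genesis value, and the function names are assumptions chosen for illustration, and the library's real on-disk format may differ.

```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder "previous hash" for the first record


def _canonical(obj):
    # Stable serialization so the same record always produces the same bytes.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def append_record(log_lines, payload):
    """Append one NDJSON line whose hash covers the payload and the previous hash."""
    prev_hash = json.loads(log_lines[-1])["hash"] if log_lines else GENESIS
    body = {"prev": prev_hash, "payload": payload}
    body["hash"] = hashlib.sha256(_canonical(body).encode("utf-8")).hexdigest()
    log_lines.append(_canonical(body))
    return body["hash"]  # the new chain tip


def verify_chain(log_lines):
    """Return the index of the first broken record, or None if the chain is intact."""
    prev_hash = GENESIS
    for index, line in enumerate(log_lines):
        record = json.loads(line)
        claimed = record.pop("hash")
        recomputed = hashlib.sha256(_canonical(record).encode("utf-8")).hexdigest()
        if record["prev"] != prev_hash or recomputed != claimed:
            return index
        prev_hash = claimed
    return None
```

Because each record's hash covers the previous record's hash, editing any line invalidates that line and breaks the link to every later one. Anchoring the returned chain tip somewhere the log's owner cannot rewrite is what lets a third party detect wholesale regeneration of the file.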
What the Compliance Bundle adds
1. Quarterly framework-mapping updates
Frameworks move. The EU AI Act has Member-State implementing acts and Commission guidelines that re-interpret Article scope. NIST AI RMF receives crosswalks (e.g., to ISO/IEC 42001). HIPAA receives OCR guidance. The Bundle includes a quarterly review by a compliance-fluent maintainer of:
- EU AI Act (Regulation (EU) 2024/1689 + delegated acts + Commission guidelines + relevant ENISA outputs and AI Office publications; EDPB outputs are tracked only where they directly bear on Art. 10(5) personal-data processing)
- NIST AI RMF 1.0 (and its companion playbooks / crosswalks)
- HIPAA Security Rule (45 CFR Part 164) + Safe Harbor de-identification guidance
- ISO/IEC 42001 mappings (roadmap, see below)
When a review finds a change, we update compliance.py’s catalog, ship a new release, and notify Bundle subscribers via webhook or email so you can re-run audit-package against the new mapping. Subscribers receive a per-quarter changelog with the regulator citation that drove each change.
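As a rough illustration of what a mapping update means for a suite, the sketch below diffs two versions of a control catalog and recomputes which controls are left without evidence. The catalog shape (control ID mapped to a set of evaluator names), the evaluator names, and both function names are hypothetical; the real dictionaries live in compliance.py and the library's own coverage() report is the authoritative check.

```python
def diff_mappings(old, new):
    """Compare two {control_id: set_of_evaluator_names} catalogs."""
    shared = set(old) & set(new)
    return {
        "added": sorted(set(new) - set(old)),
        "removed": sorted(set(old) - set(new)),
        "changed": sorted(c for c in shared if set(old[c]) != set(new[c])),
    }


def coverage_gaps(catalog, suite_evaluators):
    """Controls in the catalog that no evaluator in the suite gives evidence for."""
    used = set(suite_evaluators)
    return sorted(c for c, evaluators in catalog.items() if not used & set(evaluators))


# Hypothetical catalogs: evaluator names are placeholders, not library identifiers.
previous = {
    "Art. 9(2)(b)": {"hallucination"},
    "Art. 15(1)": {"accuracy"},
}
current = {
    "Art. 9(2)(b)": {"hallucination", "toxicity"},
    "Art. 15(1)": {"accuracy"},
    "Art. 10(5)": {"pii"},
}

changes = diff_mappings(previous, current)
gaps = coverage_gaps(current, ["accuracy", "hallucination"])
```

In this toy example the new catalog adds Art. 10(5) and widens Art. 9(2)(b), and a suite that only runs the two placeholder evaluators would show Art. 10(5) as a gap after the update.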
2. Calibrated threshold packs per new judge model
When a new judge model from a supported provider ships (for example, a next-generation Claude, GPT, or Gemini release; these are illustrative, not references to specific unreleased products), we re-run the calibration benchmark against the same human-labeled datasets and publish an updated _calibration_data/v*.json with new thresholds + provenance (dataset hash, N, measured F1, measurement date).
OSS users get the threshold pack with the next library release; Bundle subscribers receive the pack ahead of release plus a written attestation of the calibration methodology, the human-labeled reference set used, and the F1 confidence interval. This is the bit your internal compliance officer needs to defend the choice of threshold during an audit.
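The sketch below shows one plausible shape for a calibration run: score a human-labeled set with the judge, pick the threshold that maximizes F1, and record provenance next to it. Choosing the threshold by maximum F1, the tie-break rule, and every field name here are assumptions for illustration; the actual methodology and the schema of the calibration files are defined by the library and the attestation, not by this example.

```python
import hashlib
import json
from datetime import date


def f1_at(threshold, scores, labels):
    """F1 when every score >= threshold is predicted positive."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and not y)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


def calibrate(scores, labels):
    """Pick the observed score with the highest F1; ties go to the lower threshold."""
    best = max(sorted(set(scores)), key=lambda t: (f1_at(t, scores, labels), -t))
    return best, f1_at(best, scores, labels)


def provenance_entry(judge, evaluator, scores, labels, measured_on):
    """Bundle the threshold with the evidence needed to defend it later."""
    threshold, f1 = calibrate(scores, labels)
    dataset = json.dumps(list(zip(scores, labels)), separators=(",", ":"))
    return {
        "judge": judge,
        "evaluator": evaluator,
        "threshold": threshold,
        "f1": round(f1, 4),
        "n": len(scores),
        "dataset_sha256": hashlib.sha256(dataset.encode("utf-8")).hexdigest(),
        "measured_on": measured_on.isoformat(),
    }


# Toy data: judge scores and human labels for five items.
entry = provenance_entry(
    judge="example-judge",        # placeholder name
    evaluator="faithfulness",     # placeholder name
    scores=[0.1, 0.4, 0.35, 0.8, 0.9],
    labels=[False, False, True, True, True],
    measured_on=date(2025, 1, 15),
)
```

Hashing the labeled dataset means anyone holding the same reference set can confirm that a published threshold was measured against it and not against a later revision.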
3. Auditor-facing report template with your branding
The OSS audit-package zip ships with a generic README.md cover and a manifest.json listing files. The Bundle includes a customer-branded template:
- Cover page with your organization’s letterhead
- Maintainer-signed methodology statement (PDF) bound into the zip
- “About this report” page tuned to your jurisdiction (EU vs. US vs. UK)
- Direct citations from the framework documents the auditor will be checking against
4. Threshold + framework drift alerts
A subscriber webhook (or email, or Slack) fires when:
- A framework mapping in the catalog changes (e.g., Commission guidance reinterprets Art. 9(2)(b) scope)
- A judge model you previously calibrated against is deprecated by the upstream vendor
- The threshold pack for an evaluator × judge pair you use moves outside its prior 95% CI
- A regulator publishes guidance with a citation we mapped to
This builds on the anchor_fn pattern the OSS already supports; see Audit trail / external anchoring. The Bundle service runs the watch loop.
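The third trigger is the most mechanical of the four, so here is a minimal sketch of how such a check could look. The pack layout (keys are (judge, evaluator) pairs, with threshold and ci95 fields) and the alert fields are hypothetical, and delivery (webhook, email, Slack) is deliberately left out.

```python
def drift_alerts(prior_pack, new_pack, pairs_in_use):
    """Flag pairs whose new threshold falls outside the prior 95% confidence interval."""
    alerts = []
    for pair in pairs_in_use:
        prior, new = prior_pack.get(pair), new_pack.get(pair)
        if prior is None or new is None:
            continue  # nothing to compare in this sketch
        low, high = prior["ci95"]
        if not (low <= new["threshold"] <= high):
            alerts.append({
                "judge": pair[0],
                "evaluator": pair[1],
                "prior_ci95": [low, high],
                "new_threshold": new["threshold"],
            })
    return alerts


# Placeholder judge and evaluator names; numbers are made up for the example.
prior_pack = {
    ("judge-a", "faithfulness"): {"threshold": 0.70, "ci95": (0.66, 0.74)},
    ("judge-a", "pii"): {"threshold": 0.90, "ci95": (0.88, 0.93)},
}
new_pack = {
    ("judge-a", "faithfulness"): {"threshold": 0.78},
    ("judge-a", "pii"): {"threshold": 0.91},
}
alerts = drift_alerts(prior_pack, new_pack, [("judge-a", "faithfulness"), ("judge-a", "pii")])
```

Only the faithfulness pair is flagged here, since 0.78 lies above the prior interval while 0.91 stays inside its own.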
5. 48h business-hours support SLA
A named technical contact at Multivon responds within 48 business hours to:
- Custom calibration runs against your dataset
- Mapping questions (“does Evaluator X count as evidence for Control Y under our specific deployment context?”)
- Audit-package customizations
- Integration with your specific CI / artifact store / log archive
6. Methodology attestation letter
A PDF signed by Multivon’s named technical lead that states:
- The methodology multivon-eval uses (QAG scoring, hash-chain audit, per-judge calibration with named human-labeled reference datasets)
- The limits of that methodology (no real-time monitoring, no conformity assessment, no regulator-issued certification, no warranty of audit outcome)
- Maintenance commitments (quarterly framework review cadence, calibration packs published per new judge release)
- Citations to the specific source-code revisions and calibration files the methodology relies on
What we are NOT including today
Honest list of items that are not in the Bundle as currently scoped:
- SOC 2 attestation. multivon-eval is a library that runs in your environment; we are not the data processor. A SOC 2 audit of Multivon would not cover the workflow your auditor cares about. If your procurement requires SOC 2 of all suppliers including library vendors, we can supply a written security-posture statement that addresses the same controls; we cannot supply a SOC 2 Type II report.
- A notified-body conformity assessment under Article 43. That is a regulator-recognized process and is not within the scope of a library + bundle.
- A guarantee that your deployment is compliant. No vendor can credibly make this guarantee. We provide the evidence; you and your counsel argue conformance from the evidence.
- 24/7 incident response. Best-effort 48h business-hours response is in scope. 24/7 paging is Enterprise.
- Member-State-specific transposition layers. EU AI Act is Union-level; Member States are implementing it with national specifics (Spain’s AESIA, France’s ANSSI guidance, Germany’s BSI). Quarterly review tracks the major Member States in the catalog; per-Member-State transposition layers are Enterprise.
Enterprise tier (custom)
For organizations that need more than the Bundle, the Enterprise tier (priced individually) typically includes:
- SAML/SSO + audit-log ingest from cloud judges
- On-premises threshold calibration service so ground-truth labels never leave your environment
- A dedicated technical-account-manager + quarterly customer review
- 24/7 paging via PagerDuty / Opsgenie integration
- Custom evaluator development against your specific use case
- DPA, DPIA support letters, SCC review, and security-questionnaire turnaround
- Member-State-specific EU AI Act transposition tracking
Roadmap (subject to change based on early-access feedback)
Items below depend on customer commitment; they are not calendar promises. We move forward when 2–3 customers have signed scopes that depend on a given item.

| Tranche | Item |
|---|---|
| Bundle MVP | Framework changelog feed; calibration-pack release flow; branded template; attestation letter (legal review batched per jurisdiction; EU, US, UK on day one) |
| After 3 customers | Drift webhook service (currently customer rolls own via anchor_fn); ISO/IEC 42001 mapping catalog; UK AI Regulation Bill mapping (when enacted) |
| After ~10 customers | On-prem calibration service; per-jurisdiction EU AI Act transposition layers for the largest Member-State markets |
| Speculative | Domain-specialist judges (medical, legal, code) — 1–3B parameter models distilled against the relevant calibration corpora, runnable on-prem |
How to engage
Two paths, depending on where you are:
“We’re evaluating multivon-eval and want to know if the OSS is enough.”
Read the Overview, EU AI Act, and Audit trail pages. Run multivon-eval init -t regulated -d my-eval && cd my-eval && python eval.py. Inspect the produced audit-package. If it covers what you need, ship.
“We need the Bundle services described above. Let’s talk.”
Email hello@multivon.ai with:
- Your industry and regulatory exposure (healthcare / financial services / public sector / other)
- The frameworks you’re working against (EU AI Act, NIST AI RMF, HIPAA, SOC 2 mappings, ISO/IEC 42001 …)
- Your approximate team size and AI-system maturity (pre-production, in production, in audit, post-audit)
- A rough timeline for when you’d want the Bundle live
See also
- Compliance Overview — what multivon-eval is and is not, mechanically
- EU AI Act coverage — Article-by-Article scope
- Audit trail — hash chain, verifier, external anchoring
- Sample audit-package zip — what an auditor actually receives
- Security & data handling — data flow, telemetry stance, vulnerability reporting

