multivon-eval’s audit trail is a SHA-256-chained NDJSON log. Every `ComplianceReporter.record(...)` call appends one record. Each record references the previous record’s hash. Tampering with any record invalidates the chain from that point forward — `verify()` walks the log and reports the first inconsistency.
This page is the mechanical specification. Anyone reproducing or auditing the chain (even without the multivon-eval library installed) can do so from the description below.
## The record format
Each NDJSON line is one record. Fields:

- `record_id` — 12-character hex (truncated from `uuid4().hex`). Sufficient for cross-referencing within a single suite’s log; use it together with `timestamp` and `suite_name` for global uniqueness. Not a UUID — calling it a UUID would be inaccurate.
- `timestamp` — UTC ISO-8601, captured at record append time.
- `chain_version` — Currently `1`. Bumped only on a breaking change to the canonical serialization.
- `prev_hash` — SHA-256 of the previous record’s payload (excluding the previous record’s `record_hash` field). For the first record in a chain, it is 64 zeros — the genesis sentinel.
- `record_type` — `"summary"` for aggregate-per-run records, `"case"` for decision-level records (satisfies Art. 12).
- `record_hash` — SHA-256 of this record’s payload (excluding the `record_hash` field itself).
Source: compliance.py:680–718.
## The hash algorithm
The canonical hash of a payload is computed under these rules:

- `separators=(",", ":")` — no whitespace. Two records that differ only in whitespace would otherwise hash differently.
- `sort_keys=False` — the field order in the source record is part of the hash. We preserve Python dict insertion order, which is deterministic from Python 3.7 onward. This is conservative: if you ever need to verify a chain without the library, replicating the exact insertion order is the verification contract.
- `record_hash` is excluded. The hash covers the payload; the resulting digest is then appended to the payload as `record_hash` before the line is serialized to the log.
Source: compliance.py:849–853.
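The rules above fit in a few lines of standard-library Python. This is a sketch for auditors, not the library’s API; `canonical_hash` is an illustrative name:

```python
import hashlib
import json

def canonical_hash(record: dict) -> str:
    # Exclude record_hash; preserve insertion order (sort_keys=False);
    # compact separators so whitespace can never affect the hash.
    payload = {k: v for k, v in record.items() if k != "record_hash"}
    canonical = json.dumps(payload, separators=(",", ":"), sort_keys=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Note that because `sort_keys=False`, the same fields in a different order produce a different hash — field order is part of the contract.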
The full canonical payload sequence (in order) is the common fields in the order listed above, followed by the type-specific fields: for `record_type="summary"` these are `summary` then `evaluator_results`; for `record_type="case"` it is `case`. Source: compliance.py:694–706.
## The chain
The first record’s `prev_hash` is the genesis sentinel — 64 zero characters. Every subsequent record’s `prev_hash` is the previous record’s `record_hash`. This means:
- Deleting a record breaks the chain at the deletion point.
- Editing a record changes that record’s recomputed hash, breaking the chain at the next record.
- Reordering records breaks the chain at the first out-of-order record.
- Inserting a record breaks the chain at the next genuine record.
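The linking rule can be sketched in standard-library Python. `append_record`, `_payload_hash`, and `GENESIS` are illustrative names for this sketch, not the library’s API:

```python
import hashlib
import json

GENESIS = "0" * 64  # sentinel prev_hash for the first record in a chain

def _payload_hash(record: dict) -> str:
    # Canonical hash: record_hash excluded, insertion order preserved.
    payload = {k: v for k, v in record.items() if k != "record_hash"}
    return hashlib.sha256(
        json.dumps(payload, separators=(",", ":"), sort_keys=False).encode()
    ).hexdigest()

def append_record(chain: list, fields: dict) -> dict:
    # Link the new record to the tip: prev_hash = previous record_hash.
    record = {
        "prev_hash": chain[-1]["record_hash"] if chain else GENESIS,
        **fields,
    }
    record["record_hash"] = _payload_hash(record)
    chain.append(record)
    return record
```

Each of the four tampering cases above falls out of this one rule: any change to a record’s payload changes its recomputed hash, and the next record pins what that hash must be.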
## Verifying the chain
The library’s `verify()` walks the log line by line, recomputing each hash and checking the chain link:
- TAMPERED — the record’s stored hash doesn’t match the recomputed hash. The record itself was modified.
- CHAIN BROKEN — the `prev_hash` doesn’t match what the previous record’s recomputed hash should be. The record itself is intact, but the chain link to the past is wrong (a record was likely deleted or reordered).
- OK (legacy) — the record predates `chain_version=1` (no `chain_version` field). Verified standalone — the chain link is not checked because the legacy format didn’t include `prev_hash`. Surfaced explicitly so the auditor knows the chain coverage is partial.
Source: compliance.py:730–787.
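A minimal walk over the three statuses might look like the sketch below. It is not the bundled verifier; for simplicity it treats any record without a `chain_version` field as legacy and skips it:

```python
import hashlib
import json

GENESIS = "0" * 64

def verify_chain(ndjson_lines):
    """Yield (line_number, status) for each record in the log."""
    prev = GENESIS
    for i, line in enumerate(ndjson_lines, start=1):
        record = json.loads(line)
        if "chain_version" not in record:
            # Legacy record: no prev_hash to check, so no chain link.
            yield i, "OK (legacy)"
            continue
        payload = {k: v for k, v in record.items() if k != "record_hash"}
        recomputed = hashlib.sha256(
            json.dumps(payload, separators=(",", ":"), sort_keys=False).encode()
        ).hexdigest()
        if record.get("record_hash") != recomputed:
            yield i, "TAMPERED"       # the record itself was modified
        elif record.get("prev_hash") != prev:
            yield i, "CHAIN BROKEN"   # link to the previous record is wrong
        else:
            yield i, "OK"
        prev = recomputed
```

The order of the checks matters: a record that fails its own hash is reported as TAMPERED even if its `prev_hash` also happens to be wrong, so the auditor sees the root cause first.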
## Reproducing the verifier without the library
The verifier is also bundled into the audit-package zip as a standalone `verify.py` (no multivon-eval import required). An auditor with only Python’s standard library can run it; the script imports only `hashlib`, `json`, and `pathlib`. A consequence: an auditor can verify the chain even without the multivon-eval library installed, which is the standard expectation for vendor-supplied audit evidence.
The zip’s contents:
## External anchoring
A chain that lives only on your filesystem can be rewritten by anyone with write access. For high-stakes deployments, anchor the chain tip to an external immutable witness. The library supports this via the `anchor_fn` parameter:
With the bundled GitHub Actions anchor, after every `record(...)` call the tip hash is appended to `$GITHUB_OUTPUT` and is captured by GitHub Actions as a workflow output. A workflow run’s output is immutable within GitHub’s platform retention and access-control policies — sufficient for most audit purposes, but not a cryptographic guarantee. Even if a future attacker rewrites your filesystem audit log, the historical GitHub Actions run record still witnesses the original tip, as long as the run record itself has not been deleted by an actor with admin access to your GitHub organization. For higher-assurance anchoring, see the alternatives below.
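The behavior described above could be sketched as follows. This is an illustration, not the library’s implementation, and the `audit_chain_tip=` output key is an assumption:

```python
import os

def github_actions_anchor(tip_hash: str) -> None:
    # Append the tip hash to the file GitHub Actions exposes via
    # $GITHUB_OUTPUT, so it becomes an immutable-ish workflow output.
    output_path = os.environ.get("GITHUB_OUTPUT")
    if output_path is None:
        return  # not running under GitHub Actions; nothing to anchor to
    with open(output_path, "a", encoding="utf-8") as f:
        f.write(f"audit_chain_tip={tip_hash}\n")
```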
The `anchor_fn` signature is `Callable[[str], None]`. Anchor targets that the library does not ship out of the box, but that customers commonly implement:
- AWS S3 with Object Lock + retention (compliance-mode bucket; the tip object becomes immutable for the retention period).
- Sigstore / Rekor (a public transparency-log entry produces a cryptographic timestamp witnessed by an external service).
- AWS QLDB or another internal append-only ledger (Datadog log archive in immutable mode; Kafka topic with retention enforced by topic config).
- An RFC 3161 trusted-timestamp authority when a qualified timestamp is required (e.g., for European eIDAS qualified-electronic-signature workflows).
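Because `anchor_fn` is just `Callable[[str], None]`, a custom target is a small function. The sketch below uses a local file to stand in for a real immutable witness such as those listed above; `make_file_anchor` is a hypothetical helper, not a library API:

```python
from pathlib import Path
from typing import Callable

def make_file_anchor(path: str) -> Callable[[str], None]:
    """Illustrative anchor_fn factory: append each chain tip to a file.

    A real deployment would target an external immutable witness
    (S3 Object Lock, Rekor, an RFC 3161 TSA); a local file is shown
    here only to demonstrate the Callable[[str], None] shape.
    """
    def anchor(tip_hash: str) -> None:
        with Path(path).open("a", encoding="utf-8") as f:
            f.write(tip_hash + "\n")
    return anchor
```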
`github_actions_anchor` is included because CI integration is the most common path and is trivial to implement.
Source: compliance.py:860–891.
## Per-case vs summary recording
Use `mode="summary"` for one chained record per report (one row per eval run). Use `mode="case"` for one chained record per case (one row per AI decision). In either mode, every record carries the full provenance block.
Source: compliance.py:540–559.
## Provenance
Every record carries a `provenance` block so an auditor reading the log a year from now can answer “what code, judge, calibration, and cases produced this score?”:
| Field | Contents |
|---|---|
| `schema_version` | Currently `1`. Bumped on breaking changes to the provenance shape. |
| `package_version` | `multivon_eval.__version__` at record time. |
| `package_git_sha` | If running from a git checkout, the HEAD SHA. Absent for PyPI installs. |
| `package_git_dirty` | If running from a git checkout, whether the working tree had uncommitted changes. A SHA without this flag could point at code that doesn’t fully describe what ran. |
| `host` | Python version, OS, machine architecture. No hostname, no username. |
| `suite_lock` | The full `SuiteLock` (evaluator fingerprints, resolved judge configs, calibration entries used, cases hash) when the report was produced by `EvalSuite.run`. |
| `suite_lock_status` | `"ok"` / `"absent"` / `"serialization_failed"`. Explicit, so the auditor knows why a `suite_lock` is missing rather than guessing. |
Source: compliance.py:582–628.
The suite_lock is the primary artifact establishing reproducibility. Two records that share a suite_lock were produced by the same evaluator set with the same judge configs against the same case hashes — so any score difference between those records is attributable to the model under test, not to drift in the test infrastructure.
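For illustration, the documented `host` fields could be collected like this. The field names here are assumptions for the sketch, not the library’s exact schema:

```python
import platform
import sys

def host_provenance() -> dict:
    # Python version, OS, and machine architecture only —
    # deliberately no hostname and no username.
    return {
        "python": sys.version.split()[0],
        "os": platform.system(),
        "machine": platform.machine(),
    }
```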
## What we are NOT claiming
- Not a notarization service. Local hash chains prove tamper-evidence; they do not prove when a record was written. If you need wall-clock attestation, anchor to a service that does (Sigstore Rekor, AWS QLDB, a trusted timestamp authority).
- Not zero-knowledge. The log contains your eval data. If the data is sensitive, the log itself is sensitive. Apply access control accordingly.
- Not perpetually backward-compatible. If `chain_version` ever bumps to `2`, the new verifier handles both versions, but a 2026-vintage `verify.py` won’t understand 2027-vintage chains. Bundle the verifier with the audit package (which the library does automatically) so contemporary verifiers travel with their logs.
- Not a qualified timestamp. GitHub Actions output, S3 Object Lock, and internal ledgers establish that the chain tip existed at some moment after it was anchored. They are not RFC 3161 qualified timestamps. For jurisdictions where qualified timestamps are required (eIDAS-regulated EU workflows), pair the anchor with a trusted-timestamp authority.
- `sort_keys=False` is a real interoperability constraint. If a third-party verifier re-serializes a record from a parsed JSON object using a language or runtime that does not preserve insertion order, hash recomputation will silently fail. Verifiers should read the raw NDJSON line as bytes and hash the byte sequence directly when possible, or use a JSON parser that preserves key order (Python 3.7+ dicts do; other languages may need an ordered-map JSON type). This is a known limitation of the canonical-JSON-by-insertion-order approach; future `chain_version` bumps may move to RFC 8785 (JCS) for stronger language portability.
- Not a replacement for filesystem permissions. A tamper-evident log proves that tampering happened; it does not prevent it. Pair the chain with filesystem `chattr +a` (Linux append-only), an S3 bucket with versioning and retention, or a write-once-read-many appliance for prevention.
## See also
- `compliance.py` — the full reference implementation.
- Sample audit-package zip (5.5 KB) — an illustrative chain produced from the `regulated` init template plus the standalone verifier. Synthetic eval data, real chain mechanics.
- Compliance Bundle — paid services that wrap the OSS audit trail with framework-update commitments, an attestation letter, and a named technical contact.

