The problem
An AI agent calls a tool, sends an email, modifies a database, or makes a purchase. Did the action stay within authorization? Was the evidence complete? Could another party verify what happened without trusting the operator?
Today, the usual answer is still “check the logs.” But logs are mutable, incomplete, and operator-controlled. There is no visible settlement step. The action simply happens, and the story is reconstructed after the fact.
The four phases
Authorize the action
Policy checks constraints before execution and issues a permit describing what is allowed, for how long, and under which policy hash.
Emit receipts at runtime
The action executes while the runtime emits signed receipts capturing the call path, inputs, outputs, timing, hashes, and causal links.
Verify before trust
A verifier checks integrity, coverage, and claims: was the permit valid, did execution stay inside bounds, and is the evidence complete?
Feed outcomes back
The verified outcome updates trust state for the agent, policy, or workflow, turning evidence into a learning signal rather than a dead log artifact.
1. PREFLIGHT PERMIT
   Agent requests action -> policy gate checks constraints -> issues permit
   Evidence: permit receipt (policy hash, constraints, expiry)
2. EXECUTION WITH EVIDENCE
   Agent executes action -> runtime emits receipts for every operation
   Evidence: execution receipts (inputs, outputs, model, timing, hashes)
3. SETTLEMENT
   Verifier checks: permit valid? execution within constraints? evidence complete?
   Evidence: verification report (integrity check, claim results, coverage)
4. REPUTATION UPDATE
   Outcome feeds back into agent trust scoring
   Evidence: reputation receipt (delta, new score, basis)
Each phase produces evidence that can be bundled into a portable proof artifact. The point is not merely observability. The point is that another party can verify the artifact without server access and without trusting the operator.
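The four phases can be sketched as plain data and functions. This is a minimal illustration under stated assumptions, not Assay's implementation: the names `Permit`, `Receipt`, `issue_permit`, `settle`, and `update_reputation`, the policy dictionary layout, and the fixed reputation step are all hypothetical, and signing is omitted for brevity.

```python
from dataclasses import dataclass
import hashlib
import json


def digest(obj) -> str:
    """SHA-256 over a canonical JSON encoding of obj."""
    return hashlib.sha256(json.dumps(obj, sort_keys=True).encode()).hexdigest()


@dataclass
class Permit:  # hypothetical shape of a permit receipt
    action: str
    max_cost_usd: float
    expires_at: float  # seconds, on whatever clock the caller uses
    policy_hash: str


@dataclass
class Receipt:  # hypothetical shape of an execution receipt
    action: str
    cost_usd: float
    started_at: float
    inputs_hash: str
    outputs_hash: str


def issue_permit(action, policy, now):
    """Phase 1: check constraints, then issue a permit or refuse (None)."""
    if action not in policy["allowed_actions"]:
        return None
    return Permit(action, policy["max_cost_usd"], now + policy["ttl_s"], digest(policy))


def settle(permit, receipts, expected_receipts):
    """Phase 3: answer the three settlement questions from the evidence."""
    return {
        "permit_valid": all(r.started_at <= permit.expires_at for r in receipts),
        "within_bounds": all(r.action == permit.action for r in receipts)
        and sum(r.cost_usd for r in receipts) <= permit.max_cost_usd,
        "evidence_complete": len(receipts) >= expected_receipts,
    }


def update_reputation(score, report, step=0.1):
    """Phase 4: move a trust score in [0, 1] up or down by a fixed step."""
    delta = step if all(report.values()) else -step
    return min(1.0, max(0.0, score + delta))
```

Phase 2 is represented only by the `Receipt` records a runtime would emit. The useful property is that `settle` needs nothing but the permit and the receipts, which is what allows a third party to run it without server access.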
What Assay implements today
| Phase | Status | What exists now |
|---|---|---|
| Preflight Permit | Future | Requires a policy gate such as Guardian or an equivalent permit issuer upstream of execution. |
| Execution with Evidence | Shipping | `assay patch` and `assay run` instrument supported SDK calls and emit signed receipts, including causal links via `parent_receipt_id`. |
| Settlement | Shipping | `assay verify-pack` checks integrity and claims. `assay diff --gate` enforces cost, latency, and error thresholds against baselines. |
| Reputation Update | Future | Requires a trust-scoring layer or downstream policy engine that consumes verified outcomes. |
assay quickstart
assay patch
assay run -c receipt_completeness -- python my_app.py
assay verify-pack ./proof_pack_*/
assay diff ./baseline_pack/ ./proof_pack_*/ --gate-cost-pct 25 --gate-errors 0 --gate-strict
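The thresholds passed to `assay diff` above suggest a simple comparison between a baseline run and a candidate run. The sketch below is a guess at the general shape of such a gate, not Assay's actual logic: it assumes `--gate-cost-pct 25` means "fail if cost rises more than 25% over baseline" and `--gate-errors 0` means "fail if the candidate has more than zero errors". The function name `gate` and the summary dictionaries are hypothetical, and `--gate-strict` is not modeled.

```python
def gate(baseline, candidate, max_cost_increase_pct=25.0, max_errors=0):
    """Return a list of threshold violations; an empty list means the gate passes."""
    failures = []

    # Relative cost change is undefined for a zero-cost baseline, so skip it there.
    if baseline["cost_usd"] > 0:
        increase_pct = (
            100.0 * (candidate["cost_usd"] - baseline["cost_usd"]) / baseline["cost_usd"]
        )
        if increase_pct > max_cost_increase_pct:
            failures.append(
                f"cost rose {increase_pct:.1f}% (limit {max_cost_increase_pct}%)"
            )

    if candidate["errors"] > max_errors:
        failures.append(f"{candidate['errors']} errors (limit {max_errors})")

    return failures
```

In a CI job, a non-empty result would typically translate into a non-zero exit code so that the release step does not run.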
The first-contact message stays simple: Assay instruments existing AI workflows and proves what happened in a real run. Decision Escrow is the protocol model that explains where those runtime artifacts fit as systems move toward stronger consequence gating.
What Assay proves today
- Integrity: Ed25519 signatures and SHA-256 hashes show that evidence was not altered after creation.
- Completeness under contract: contracted call sites can be checked for receipt coverage.
- Claim compliance: declared checks can pass or fail honestly against authentic evidence.
- Budget enforcement: cost, latency, and error thresholds can be gated in CI or release workflows.
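The integrity property can be illustrated with hash-linked receipts. This is a sketch using only SHA-256 from the standard library. It assumes a receipt layout that is not taken from Assay: `parent_receipt_id` appears in the table above, but `parent_hash`, `payload`, and the helper names are hypothetical. Signatures are left out here. In a real pack, an Ed25519 signature over the hashes is what stops an attacker from simply recomputing the whole chain.

```python
import hashlib
import json


def receipt_hash(body: dict) -> str:
    """SHA-256 over a canonical JSON encoding of the receipt body."""
    payload = json.dumps(body, sort_keys=True, separators=(",", ":")).encode()
    return hashlib.sha256(payload).hexdigest()


def make_receipt(receipt_id, payload, parent=None):
    """Build a receipt that commits to its payload and to its parent's hash."""
    body = {
        "receipt_id": receipt_id,
        "payload": payload,
        "parent_receipt_id": parent["receipt_id"] if parent else None,
        "parent_hash": parent["hash"] if parent else None,  # hypothetical field
    }
    return {**body, "hash": receipt_hash(body)}


def verify_chain(receipts) -> bool:
    """True if every receipt matches its own hash and its parent link."""
    by_id = {r["receipt_id"]: r for r in receipts}
    for r in receipts:
        body = {k: v for k, v in r.items() if k != "hash"}
        if receipt_hash(body) != r["hash"]:
            return False  # receipt content changed after hashing
        parent_id = r["parent_receipt_id"]
        if parent_id is not None:
            parent = by_id.get(parent_id)
            if parent is None or parent["hash"] != r["parent_hash"]:
                return False  # parent missing or replaced
    return True
```

Editing any payload changes that receipt's hash, and replacing a parent breaks the link recorded in each child, so tampering has to touch every dependent receipt to go unnoticed.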
What it does not prove by itself
- A dishonest operator can still fabricate a run from scratch at the base self-signed tier.
- Scanning finds gaps but cannot force every call site to be instrumented.
- Timestamps remain local-clock assertions until an external anchor is added.
- Assay proves what happened in the captured execution path, not whether the model behavior was normatively correct.
The cost of cheating scales with the complexity of the lie. Assay does not make fraud impossible. It makes fraud more expensive, more fragile, and easier to catch.
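The coverage limit can be made concrete with a small set comparison. This sketch assumes each receipt records a `call_site` string and that the contract is a list of call sites expected to produce receipts. Both are illustrative assumptions, as is the function name `coverage_report`.

```python
def coverage_report(contracted_sites, receipts):
    """Compare contracted call sites against the call sites seen in receipts."""
    contracted = set(contracted_sites)
    seen = {r["call_site"] for r in receipts}
    return {
        "covered": sorted(contracted & seen),
        "missing": sorted(contracted - seen),       # gaps the check can report
        "uncontracted": sorted(seen - contracted),  # receipts outside the contract
        "complete": contracted <= seen,
    }
```

A call site that is neither contracted nor instrumented appears in none of these sets, which is why coverage can only be claimed relative to the contract.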
Trust tiers and upgrade path
Self-signed
Portable, offline-verifiable artifacts. Useful immediately, but the operator still controls the signing key.
Time-anchored
Add RFC 3161 or Sigstore-style timestamping to prove the artifact existed before a particular date.
Independent witness
Anchor proofs in a transparency log such as assay-ledger so an external system observes publication.
Runtime attestation
Bind evidence to a hardware-backed execution environment so even the runtime itself participates in attestation.
Every tier raises the bar for fabrication. T0 (self-signed) gives portable evidence now. T1 (time-anchored) establishes when the artifact existed. T2 (independent witness) adds an external witness. T3 (runtime attestation) pushes toward attested execution rather than operator-only assertions.
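A reviewer might encode the tiers as an ordered scale and refuse packs below a chosen minimum. The sketch below assumes the tiers are cumulative (each tier also requires the evidence of the tiers below it), which the text suggests but does not state. The field names `signature`, `timestamp_token`, `log_inclusion_proof`, and `attestation` are hypothetical placeholders, not Assay's pack format.

```python
from enum import IntEnum


class TrustTier(IntEnum):
    SELF_SIGNED = 0          # T0
    TIME_ANCHORED = 1        # T1
    INDEPENDENT_WITNESS = 2  # T2
    RUNTIME_ATTESTATION = 3  # T3


# Evidence each tier adds, lowest tier first (hypothetical field names).
REQUIRED_FIELDS = [
    (TrustTier.SELF_SIGNED, "signature"),
    (TrustTier.TIME_ANCHORED, "timestamp_token"),
    (TrustTier.INDEPENDENT_WITNESS, "log_inclusion_proof"),
    (TrustTier.RUNTIME_ATTESTATION, "attestation"),
]


def tier_of(pack: dict):
    """Highest tier reached without skipping a lower one; None if unsigned."""
    reached = None
    for tier, field_name in REQUIRED_FIELDS:
        if not pack.get(field_name):
            break
        reached = tier
    return reached


def accepts(pack: dict, minimum: TrustTier) -> bool:
    """True if the pack reaches at least the minimum tier."""
    tier = tier_of(pack)
    return tier is not None and tier >= minimum
```

This only checks that the evidence fields are present. Validating each one (signature, timestamp token, log proof, attestation) is a separate step for the verifier.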
Who this is for
Ask for one recent proof pack
Request a proof pack from a recent run and verify it independently with `assay verify-pack`. If the operator cannot produce one, they do not yet have evidence infrastructure. If integrity fails, the evidence was altered. If integrity holds but claims fail, the failure itself is still evidence.
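The three outcomes form an ordered decision: a missing pack preempts everything, and claim results only mean something once integrity holds. A minimal sketch, with a hypothetical function name and return strings:

```python
def interpret_review(pack_present: bool, integrity_ok: bool, claims_ok: bool) -> str:
    """Map a reviewer's findings to a conclusion, checking in order of precedence."""
    if not pack_present:
        return "no evidence infrastructure yet"
    if not integrity_ok:
        return "evidence was altered"
    if not claims_ok:
        return "authentic evidence of a failed claim"
    return "verified"
```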
Add a settlement membrane
Decision Escrow frames agent systems as permittable, inspectable, and settlable workflows. The receipt becomes the atomic unit. The proof pack becomes the artifact. The verifier becomes the judge.
Relationship to standards
| Framework | What Assay contributes |
|---|---|
| SOC 2 (CC7.2) | Tamper-evident execution evidence that can be reviewed outside the operator’s dashboard. |
| ISO 42001 | Portable evidence artifacts for AI management system audits and control reviews. |
| EU AI Act (Articles 12 & 19) | Evidence relevant to traceability and documentation for high-risk systems, though not full compliance by itself. |
| NIST AI RMF | Machine-checkable artifacts that strengthen governance, accountability, and reviewability. |
Start with the shipping workflow.
Decision Escrow is the model. The production on-ramp is the shipping review-packet workflow: capture evidence, generate proof packs, sign a bounded verification report, and let another reviewer verify it independently.