Wrap your LLM calls. Get cryptographically signed receipts. Hand them to an auditor who can verify offline -- no access to your server needed. Two lines of code. Four exit codes. Done.
For teams shipping AI agents, RAG pipelines, and autonomous workflows.
Requires Python 3.9+. No API key needed. ~10s install. Uses synthetic data. If bare pip is not on PATH on macOS, use python3 -m pip.
Installing Assay gives you the tools. Instrumentation makes receipts happen.
assay patch . or patch() wires receipt emission into your runtime.
assay run -- ... launches your app with a trace id and packages emitted receipts into proof_pack_*. assay verify-pack checks the artifact offline.
No instrumentation means no receipts.
We scanned the ecosystem. Nobody has tamper-evident audit trails yet.
Scan study methodology (Feb 2026) ↗
1,100+ tests · 5 integrations · Apache-2.0 · on PyPI since Jan 2026
Your agent runs. Something goes wrong. You check the logs. But the logs live on your server, and anyone with access can edit them.
Assay makes this harder to fake. Every LLM call gets a cryptographically signed receipt bundled into a portable proof pack. Edit one byte and verification fails. Skip a contracted call site (for supported frameworks) and completeness checks catch it. The verifier doesn't need access to your server, your database, or your trust.
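To see why "edit one byte and verification fails" holds, here is a minimal, self-contained sketch of the tamper-evidence property. Assay uses Ed25519 signatures; this illustration substitutes an HMAC-SHA256 tag from the Python standard library purely as a stand-in, and every name in it (`sign_receipt`, `verify_receipt`, the demo key) is hypothetical, not Assay's API:

```python
import hashlib
import hmac
import json

# Stand-in secret. The real tool signs with Ed25519; HMAC is used here
# only because it ships in the stdlib and shows the same property.
SIGNING_KEY = b"demo-key-not-a-real-ed25519-key"

def sign_receipt(receipt: dict) -> bytes:
    """Produce a tag over the canonical JSON encoding of a receipt."""
    payload = json.dumps(receipt, sort_keys=True).encode()
    return hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()

def verify_receipt(receipt: dict, tag: bytes) -> bool:
    """Recompute the tag and compare in constant time."""
    payload = json.dumps(receipt, sort_keys=True).encode()
    expected = hmac.new(SIGNING_KEY, payload, hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)

receipt = {"model": "gpt-4o", "prompt_hash": "ab12", "tokens": 512}
tag = sign_receipt(receipt)
assert verify_receipt(receipt, tag)        # untouched evidence verifies

receipt["tokens"] = 513                    # edit one field of the evidence
assert not verify_receipt(receipt, tag)    # verification now fails
```

The point of the sketch is the asymmetry: verification needs only the artifact and the key material, not access to the system that produced it.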
Run assay patch . to auto-insert the integration into your source files (it finds your LLM call sites and adds the import for you), or add it manually.
Then assay run. Every LLM call now produces a signed receipt.
The proof pack is a 5-file evidence bundle that anyone can verify independently.
Works with OpenAI, Anthropic, Google Gemini, LiteLLM, and LangChain.
Your code calls OpenAI. No evidence trail.
Add one import. Every call emits a cryptographically signed receipt.
assay run wraps your program, collects receipts, and bundles them into a proof pack.
-c receipt_completeness runs the built-in check that all receipts are present.
Everything after -- is your normal run command.
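Conceptually, the integration patch is a wrapper around the SDK call that records evidence after the call returns. The sketch below is hypothetical (the `with_receipt` decorator, `RECEIPTS` list, and `call_llm` stub are illustrations, not Assay's actual integration code), assuming receipts carry hashes rather than content:

```python
import functools
import hashlib
import time

RECEIPTS = []  # in a real run, receipts are signed and bundled into the proof pack

def with_receipt(fn):
    """Hypothetical sketch of what an integration patch does: wrap the
    SDK method and emit a receipt once the call has returned."""
    @functools.wraps(fn)
    def wrapper(prompt: str, **kwargs):
        response = fn(prompt, **kwargs)
        RECEIPTS.append({
            "model": kwargs.get("model", "unknown"),
            "timestamp": time.time(),
            # hashes, not content: content capture is opt-in
            "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
            "response_hash": hashlib.sha256(response.encode()).hexdigest(),
        })
        return response
    return wrapper

@with_receipt
def call_llm(prompt: str, model: str = "demo-model") -> str:
    return "stub response"  # stand-in for a real provider SDK call

call_llm("What is 2+2?", model="demo-model")
assert len(RECEIPTS) == 1 and RECEIPTS[0]["model"] == "demo-model"
```

Because the wrapper runs after the response arrives, the instrumented call behaves exactly like the original from the application's point of view.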
Drop this into your pipeline. The lockfile catches config drift. Verify-pack catches tampering. Diff catches regressions and budget overruns.
Decision Escrow — the protocol model behind this workflow.
Five files. One signature. Independently verifiable.
| Integrity | Claims | Exit | Meaning |
|---|---|---|---|
| PASS | PASS | 0 | Evidence checks out, behavior meets standards |
| PASS | FAIL | 1 | Honest failure: authentic evidence of standards violation |
| FAIL | -- | 2 | Evidence has been tampered with |
| -- | -- | 3 | Input validation error (bad arguments, missing files) |
The split is the point. Systems that can prove they failed honestly are more trustworthy than systems that always claim to pass.
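A CI wrapper can act on that split directly. This is a hypothetical sketch of a gate that branches on the four exit codes described above (the `gate` function and its action strings are illustrative, not part of Assay):

```python
# Hypothetical CI gate: map Assay's documented exit codes to pipeline actions.
ACTIONS = {
    0: "merge",                                             # integrity PASS, claims PASS
    1: "block: honest failure, evidence is authentic",      # integrity PASS, claims FAIL
    2: "block: evidence tampered with -- escalate",         # integrity FAIL
    3: "block: bad invocation -- fix arguments or paths",   # input validation error
}

def gate(exit_code: int) -> str:
    return ACTIONS.get(exit_code, "block: unknown exit code")

assert gate(0) == "merge"
assert gate(1).startswith("block: honest failure")
```

Note that exit 1 and exit 2 trigger different responses: an honest failure is a behavior problem to fix, while a tampered pack is an integrity incident.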
Scanning tells you where the gaps are. The completeness contract closes the loop: it bridges static analysis to runtime evidence, so you can prove what percentage of your LLM call sites actually emitted receipts.
1. AST scan writes a contract file listing every LLM call site with a stable ID:
   assay scan --emit-contract coverage.json
2. Integration patches tag each receipt with its callsite_id at runtime:
   assay run -- python app.py
3. Match receipt IDs against the contract; fail if coverage falls below the threshold:
   assay verify-pack --coverage-contract coverage.json --min-coverage 0.8
No other tool connects static scan results to runtime proof. The contract turns "we think we instrumented everything" into "we can prove 82% of call sites emitted signed evidence."
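The coverage arithmetic itself is simple set intersection. A hedged sketch, with made-up callsite IDs (the real contract file format is not shown here), reproducing the kind of ~82% figure quoted above:

```python
# Hypothetical data: callsite IDs from the static-scan contract vs. the
# callsite_id fields found on runtime receipts.
contract_sites = {"app.py:12", "app.py:40", "rag.py:7", "rag.py:88",
                  "agent.py:5", "agent.py:51", "tools.py:19", "tools.py:63",
                  "tools.py:90", "eval.py:14", "eval.py:31"}
receipt_sites = {"app.py:12", "app.py:40", "rag.py:7", "rag.py:88",
                 "agent.py:5", "agent.py:51", "tools.py:19", "tools.py:63",
                 "eval.py:14"}

covered = contract_sites & receipt_sites
coverage = len(covered) / len(contract_sites)

assert abs(coverage - 9 / 11) < 1e-9  # ~82% of call sites emitted evidence
assert coverage >= 0.8                # so --min-coverage 0.8 would pass
```

Two contract sites with no matching receipt (here `tools.py:90` and `eval.py:31`) are exactly the uninstrumented gaps the check surfaces.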
No API key needed. Synthetic data. Real cryptography.
Two-act scenario: a passing run, then someone swaps the model and drops the guardian.
```bash
# Act 1: PASS (exit 0). Act 2: honest FAIL (exit 1).
python3 -m pip install assay-ai
assay demo-incident
```
You'll see: same system, different behavior, caught by the same contract.
CTF-style challenge. One pack is legit, one has been tampered with. Can you tell which?
```bash
# Good pack vs tampered pack
assay demo-challenge
```
You'll see: one pack exits 0, the other exits 2 (tampered). Inline verification shows which bytes changed.
Find every uninstrumented LLM call site. Get a self-contained HTML gap report.
```bash
# Interactive HTML report
assay scan . --report
```
Freeze the verification contract. Block merges when evidence degrades.
```bash
# Generate GitHub Actions workflow
assay ci init github
assay lock write --cards receipt_completeness,coverage_contract
```
Step-by-step flows for trying, adopting, CI, MCP, and audit handoff.
```bash
# See the plan, then --apply to execute
assay flow try
assay flow ci --apply
```
Bundle evidence for an auditor. Self-contained archive with verify instructions.
```bash
# Create portable audit bundle
assay audit bundle ./proof_pack_*/
assay verify-signer ./proof_pack_*/
```
We scanned 30 popular open-source AI projects for tamper-evident audit trails. High-confidence = direct SDK calls (OpenAI, Anthropic). Click column headers to sort.
| Repo | Stars | High | Med | Low | Total | Coverage |
|---|---|---|---|---|---|---|
No. Logs record what you say happened. Receipts prove the integrity and tamper-evidence of recorded events -- a third party can verify the evidence artifact was not modified after creation. Logs live on your server and you can edit them. Proof packs are portable and tamper-evident -- change one byte and verification fails.
Low overhead. Ed25519 signing takes microseconds. Receipt emission happens after the LLM call returns and adds minimal latency to your application. The proof pack is assembled at the end of the run. Benchmark in your environment to confirm.
Each receipt records: model name, provider, timestamp, prompt hash, response hash, and token counts. Full prompt/response content is optional and off by default. You control what goes into the evidence bundle.
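A sketch of that privacy default, assuming the field list above; the `make_receipt` helper is hypothetical, not Assay's API:

```python
import hashlib
import time

def make_receipt(model, provider, prompt, response, usage,
                 include_content=False):
    """Hypothetical receipt builder: hashes always, content only on opt-in."""
    receipt = {
        "model": model,
        "provider": provider,
        "timestamp": time.time(),
        "prompt_hash": hashlib.sha256(prompt.encode()).hexdigest(),
        "response_hash": hashlib.sha256(response.encode()).hexdigest(),
        "tokens": usage,
    }
    if include_content:  # off by default
        receipt["prompt"] = prompt
        receipt["response"] = response
    return receipt

r = make_receipt("gpt-4o", "openai", "hi", "hello", {"in": 1, "out": 1})
assert "prompt" not in r               # content excluded by default
assert len(r["prompt_hash"]) == 64     # but the SHA-256 hash is always there
```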
Run assay verify-pack ./proof_pack_*/ on the bundle. Exit 0 = pass, 1 = claims failed, 2 = tampered, 3 = bad input. No server, no account, no internet connection required. The verifier checks Ed25519 signatures, Merkle tree consistency, manifest completeness, and any declared claims.
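Merkle consistency is what lets one root hash commit to every receipt in the pack. A minimal, self-contained sketch of the idea (this is a generic binary Merkle tree, not Assay's exact tree layout or leaf encoding):

```python
import hashlib

def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()

def merkle_root(leaves):
    """Generic binary Merkle root over receipt bytes (illustrative only)."""
    level = [h(leaf) for leaf in leaves]
    while len(level) > 1:
        if len(level) % 2:
            level.append(level[-1])  # duplicate last node on odd-sized levels
        level = [h(a + b) for a, b in zip(level[::2], level[1::2])]
    return level[0]

receipts = [b"receipt-1", b"receipt-2", b"receipt-3", b"receipt-4"]
root = merkle_root(receipts)

tampered = [b"receipt-1", b"receipt-2", b"receipt-X", b"receipt-4"]
assert merkle_root(tampered) != root   # any edited receipt changes the root
assert merkle_root(receipts) == root   # recomputation is deterministic
```

Because the manifest pins the root, dropping, reordering, or editing any receipt makes recomputation disagree with the signed value.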
Run these in order:
1. assay scan . — finds LLM call sites in your code
2. assay scan . --report — generates an HTML gap report
3. assay run -- python app.py — wraps your program and collects receipts
4. assay doctor — runs 15 preflight checks on your setup
Together these tell you whether your project has supported call sites, whether those files are instrumented, and whether your run command is wired correctly (including the required -- separator).
Five integrations ship built-in: OpenAI, Anthropic, Google Gemini, and LiteLLM use a one-line monkey-patch that wraps SDK methods transparently. LangChain uses a callback handler you pass to your LLM. Your application code doesn't change. The scanner also detects LlamaIndex call sites.
Yes. assay quickstart guards against accidentally scanning your home directory or other system-sized trees. Run it from your project root, or bypass the guard when intentional: assay quickstart --force.
Assay is an evidence integrity and completeness layer, not a lie detector. Here's exactly what it catches and what it doesn't.
The cost of cheating scales with the complexity of the lie. Assay doesn't make fraud impossible -- it makes fraud expensive.
The EU AI Act (Regulation 2024/1689, Articles 12 & 19) requires tamper-resistant logging and retention for high-risk AI systems. These obligations take effect August 2, 2026 under the current timeline. Assay provides one building block for these requirements -- it does not constitute full compliance on its own. See For Compliance Teams for mapping to SOC 2, ISO 42001, and NIST AI RMF.
Open source. Apache-2.0. Works with OpenAI, Anthropic, Google Gemini, LiteLLM, and LangChain today.