AI Evidence Readiness Scorecard
How much of your AI system's behavior is independently verifiable?
Beta. This scorecard measures evidence readiness (instrumentation coverage), not project quality.
Most AI projects score low because tamper-evident audit trails are new.
Methodology: 20 repos scanned with assay-ai v1.6.0 · Last updated: 2026-03-29 · How we score
| Repository | Stars | Call Sites | Instrumented | Score | Report |
|---|---|---|---|---|---|
| 567-labs/instructor | 12,622 | 240 | 22/240 | F 3 | View Report |
| agno-agi/agno | 38,996 | 42 | 9/42 | F 8 | View Report |
| BerriAI/litellm | 41,343 | 994 | 416/994 | F 15 | View Report |
| crewAIInc/crewAI | 47,434 | 27 | 1/27 | F 1 | View Report |
| deepset-ai/haystack | 24,639 | 5 | 0/5 | F 0 | View Report |
| geekan/MetaGPT | 66,375 | 8 | 0/8 | F 0 | View Report |
| gpt-engineer-org/gpt-engineer | 55,231 | 8 | 0/8 | F 0 | View Report |
| langchain-ai/langchain | 131,414 | 1734 | 187/1734 | F 4 | View Report |
| letta-ai/letta | 21,789 | 277 | 76/277 | F 10 | View Report |
| mem0ai/mem0 | 51,346 | 148 | 72/148 | F 17 | View Report |
| microsoft/autogen | 56,347 | 24 | 0/24 | F 0 | View Report |
| openai/openai-agents-python | 20,377 | 7 | 0/7 | F 0 | View Report |
| openai/swarm | 21,247 | 9 | 0/9 | F 0 | View Report |
| OpenInterpreter/open-interpreter | 62,894 | 4 | 0/4 | F 0 | View Report |
| pydantic/pydantic-ai | 15,897 | 5 | 0/5 | F 0 | View Report |
| Pythagora-io/gpt-pilot | 33,804 | 2 | 0/2 | F 0 | View Report |
| run-llama/llama_index | 48,097 | 200 | 25/200 | F 4 | View Report |
| ScrapeGraphAI/Scrapegraph-ai | 23,146 | 42 | 0/42 | F 0 | View Report |
| Significant-Gravitas/AutoGPT | 182,921 | 36 | 2/36 | F 2 | View Report |
| stanfordnlp/dspy | 33,238 | 7 | 0/7 | F 0 | View Report |
Check your own repo:

```shell
pip install assay-ai && assay scan . && assay score .
```