AI Evidence Readiness Scorecard

How much of your AI system's behavior is independently verifiable?

Beta. This scorecard measures evidence readiness (instrumentation coverage), not project quality. Most AI projects score low because tamper-evident audit trails are a new practice.

20 repos scanned with assay-ai v1.6.0 · Last updated: 2026-03-29

| Repository | Stars | Call Sites | Instrumented | Grade | Score |
|---|---:|---:|---:|:---:|---:|
| 567-labs/instructor | 12,622 | 240 | 22/240 | F | 3 |
| agno-agi/agno | 38,996 | 42 | 9/42 | F | 8 |
| BerriAI/litellm | 41,343 | 994 | 416/994 | F | 15 |
| crewAIInc/crewAI | 47,434 | 27 | 1/27 | F | 1 |
| deepset-ai/haystack | 24,639 | 5 | 0/5 | F | 0 |
| geekan/MetaGPT | 66,375 | 8 | 0/8 | F | 0 |
| gpt-engineer-org/gpt-engineer | 55,231 | 8 | 0/8 | F | 0 |
| langchain-ai/langchain | 131,414 | 1734 | 187/1734 | F | 4 |
| letta-ai/letta | 21,789 | 277 | 76/277 | F | 10 |
| mem0ai/mem0 | 51,346 | 148 | 72/148 | F | 17 |
| microsoft/autogen | 56,347 | 24 | 0/24 | F | 0 |
| openai/openai-agents-python | 20,377 | 7 | 0/7 | F | 0 |
| openai/swarm | 21,247 | 9 | 0/9 | F | 0 |
| OpenInterpreter/open-interpreter | 62,894 | 4 | 0/4 | F | 0 |
| pydantic/pydantic-ai | 15,897 | 5 | 0/5 | F | 0 |
| Pythagora-io/gpt-pilot | 33,804 | 2 | 0/2 | F | 0 |
| run-llama/llama_index | 48,097 | 200 | 25/200 | F | 4 |
| ScrapeGraphAI/Scrapegraph-ai | 23,146 | 42 | 0/42 | F | 0 |
| Significant-Gravitas/AutoGPT | 182,921 | 36 | 2/36 | F | 2 |
| stanfordnlp/dspy | 33,238 | 7 | 0/7 | F | 0 |
Check your own repo:
pip install assay-ai && assay scan . && assay score .
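The "Instrumented" column above is a coverage ratio: how many of a repo's LLM call sites emit independently verifiable evidence. As a rough sketch (the exact assay-ai scoring formula is not reproduced here, and the `coverage` helper below is illustrative, not assay-ai's API), the ratio behind each row can be computed as:

```python
def coverage(instrumented: int, call_sites: int) -> float:
    """Fraction of detected LLM call sites that are instrumented.

    Illustrative only: assay-ai's actual score maps this ratio (and
    possibly other signals) onto the 0-100 score and letter grade.
    """
    if call_sites == 0:
        return 0.0  # no call sites detected means nothing to verify
    return instrumented / call_sites

# BerriAI/litellm row from the table: 416 of 994 call sites instrumented.
print(f"{coverage(416, 994):.1%}")  # -> 41.9%
```

Note that even the best-covered repo in the table (mem0ai/mem0 at 72/148) instruments under half of its call sites, which is why every row currently grades F.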