AI Evidence Readiness Scorecard
How much of your AI system's behavior is independently verifiable?
Beta. This scorecard measures evidence readiness (instrumentation coverage), not project quality.
Most AI projects score low because tamper-evident audit trails are new.
Methodology: 20 repos scanned with assay-ai v1.6.0 · Last updated: 2026-03-29 · How we score
| Repository | Stars | Call Sites | Instrumented | Score | Report |
|---|---|---|---|---|---|
| 567-labs/instructor | 12,622 | 240 | 22/240 | F 3 | View Report |
| agno-agi/agno | 38,996 | 42 | 9/42 | F 8 | View Report |
| BerriAI/litellm | 41,343 | 994 | 416/994 | F 15 | View Report |
| crewAIInc/crewAI | 47,434 | 27 | 1/27 | F 1 | View Report |
| deepset-ai/haystack | 24,639 | 5 | 0/5 | F 0 | View Report |
| geekan/MetaGPT | 66,375 | 8 | 0/8 | F 0 | View Report |
| gpt-engineer-org/gpt-engineer | 55,231 | 8 | 0/8 | F 0 | View Report |
| langchain-ai/langchain | 131,414 | 1734 | 187/1734 | F 4 | View Report |
| letta-ai/letta | 21,789 | 277 | 76/277 | F 10 | View Report |
| mem0ai/mem0 | 51,346 | 148 | 72/148 | F 17 | View Report |
| microsoft/autogen | 56,347 | 24 | 0/24 | F 0 | View Report |
| openai/openai-agents-python | 20,377 | 7 | 0/7 | F 0 | View Report |
| openai/swarm | 21,247 | 9 | 0/9 | F 0 | View Report |
| OpenInterpreter/open-interpreter | 62,894 | 4 | 0/4 | F 0 | View Report |
| pydantic/pydantic-ai | 15,897 | 5 | 0/5 | F 0 | View Report |
| Pythagora-io/gpt-pilot | 33,804 | 2 | 0/2 | F 0 | View Report |
| run-llama/llama_index | 48,097 | 200 | 25/200 | F 4 | View Report |
| ScrapeGraphAI/Scrapegraph-ai | 23,146 | 42 | 0/42 | F 0 | View Report |
| Significant-Gravitas/AutoGPT | 182,921 | 36 | 2/36 | F 2 | View Report |
| stanfordnlp/dspy | 33,238 | 7 | 0/7 | F 0 | View Report |
Check your own repo:

```shell
pip install assay-ai && assay scan . && assay score .
```