GitHub Stars Don't Predict AI Agent Trust. I Scored 192 to Prove It.
Every "best AI agent" list ranks the same way: by GitHub stars. Stars are easy to count and easy to game. They measure attention, not trustworthiness — and for software you're about to give access to your terminal, your CI, and your codebase, attention is the wrong metric.
So I built HVTracker to score AI agents on signals you can actually verify: OSSF Scorecard, build provenance, signed commits, license type, maintenance, and adoption — each weighted by how hard it is to fake. It's a curated and ever-growing registry of the most notable agents (the ones people actually use), refreshed every two hours — not an index of every repo on GitHub.
Then I checked the 30 most-starred agents for the one signal that should be table stakes: build provenance — cryptographic proof that the package you install was built from the source you can read.
24 of the 30 don't publish it.
The list
The 24 agents among the 30 most-starred that ship without build provenance, with their HVTrust rank:
| Agent | Stars | Provenance | HVTrust Rank |
|---|---|---|---|
| OpenClaw | 376k | None | #43 |
| AutoGPT | 185k | None | #53 |
| opencode | 168k | None | #120 |
| Langflow | 149k | None | #47 |
| Dify | 143k | None | #36 |
| LangChain | 138k | None | #26 |
| Claude Code | 128k | None | #82 |
| Firecrawl | 126k | None | #38 |
| Gemini CLI | 105k | None | #34 |
| Browser Use | 96k | None | #49 |
| MCP (servers) | 87k | None | #76 |
| RAGFlow | 82k | None | #62 |
| OpenHands | 75k | None | #33 |
| Daytona | 73k | None | #125 |
| DeerFlow | 70k | None | #51 |
| MetaGPT | 68k | None | #124 |
| Crawl4AI | 67k | None | #85 |
| Open Interpreter | 64k | None | #77 |
| AnythingLLM | 61k | None | #118 |
| PrivateGPT | 57k | None | #122 |
| OpenManus | 56k | None | #164 |
| Flowise | 53k | None | #45 |
| CrewAI | 52k | None | #23 |
| LlamaIndex | 50k | None | #117 |
That's not an accusation against any one of them. It's a snapshot of an ecosystem that has collectively decided provenance is optional.
Why provenance matters
When you pip install or npm install an agent, you're trusting that the published artifact matches the public source. Provenance attestation (via SLSA / Sigstore) is the cryptographic receipt that proves it. Without it, a compromised build pipeline or a hijacked publish token can ship malicious code under a trusted name — and you'd have no way to tell.
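To make the "cryptographic receipt" concrete, here is a minimal sketch of the one binding a provenance statement provides: the digest of the artifact you downloaded has to appear among the statement's subjects. The statement layout follows the in-toto Statement format that SLSA provenance uses. The artifact bytes, the file name, and the helper function are hypothetical, and the sketch deliberately skips the step that makes the receipt trustworthy, which is verifying the Sigstore signature over the statement.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Return the hex-encoded SHA-256 digest of some bytes."""
    return hashlib.sha256(data).hexdigest()


def artifact_matches_statement(artifact: bytes, statement: dict) -> bool:
    """True if any subject in the statement lists the artifact's SHA-256 digest.

    This checks the digest binding only. A real verifier must also validate
    the signature on the statement and the identity of the builder.
    """
    digest = sha256_hex(artifact)
    return any(
        subject.get("digest", {}).get("sha256") == digest
        for subject in statement.get("subject", [])
    )


# Hypothetical artifact and a hand-built statement, for illustration only.
artifact = b"example wheel contents"
statement = {
    "_type": "https://in-toto.io/Statement/v1",
    "subject": [
        {"name": "example-agent-1.0.0.whl", "digest": {"sha256": sha256_hex(artifact)}}
    ],
    "predicateType": "https://slsa.dev/provenance/v1",
    "predicate": {},  # builder and source details omitted in this sketch
}

print(artifact_matches_statement(artifact, statement))                # True
print(artifact_matches_statement(artifact + b"tampered", statement))  # False
```

Change a single byte in the artifact and the digest no longer matches, which is the property a hijacked publish token cannot get around without also forging the signed statement.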
This would matter for any dependency. It matters more for AI agents because they execute code, call APIs, access tools, and increasingly spend money on your behalf. The blast radius of a compromised logging library is data exfiltration. The blast radius of a compromised agent with tool access is "whatever it was authorized to do."
The six that get it right
Credit where it's due. Among the 30 most-starred, these publish provenance — and it's no coincidence they also score near the top:
| Agent | Stars | Provenance | HVTrust Rank |
|---|---|---|---|
| Codex | 87k | Verified | #2 |
| n8n | 190k | Verified | #6 |
| Cline | 63k | Verified | #10 |
| Mem0 | 57k | Verified | #16 |
| Open WebUI | 139k | Verified | #19 |
| AutoGen | 59k | Verified | #25 |
Verifiable practices and trust scores move together. Projects that publish provenance also tend to care about signed commits, OSSF Scorecard, and disclosure policies. Trust signals cluster.
Stars vs. trust, head to head
The clearest illustration:
- Claude Code — 128k stars, trust score 61.9 (#82).
- Codex — 87k stars, trust score 92.8 (#2).
Fewer stars, far higher trust. That isn't a quality judgment on Claude Code, which is an excellent tool. It is proprietary, though, with no public OSSF Scorecard and no published provenance, so less of it is externally verifiable. That's the point: popularity and verifiability are different axes, and stars measure only the first.
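If you want to test the "different axes" claim yourself, one option is a Spearman rank correlation between stars and HVTrust rank. The sketch below uses the 30 rows from the two tables in this post (rounded star counts, so ties get averaged ranks). It is an illustration and not part of the HVTrust methodology, and since it only covers the 30 most-starred agents, treat whatever number it prints as a rough read, not a finding.

```python
from statistics import mean

# (agent, stars in thousands, HVTrust rank), copied from the two tables in this post.
AGENTS = [
    ("OpenClaw", 376, 43), ("AutoGPT", 185, 53), ("opencode", 168, 120),
    ("Langflow", 149, 47), ("Dify", 143, 36), ("LangChain", 138, 26),
    ("Claude Code", 128, 82), ("Firecrawl", 126, 38), ("Gemini CLI", 105, 34),
    ("Browser Use", 96, 49), ("MCP (servers)", 87, 76), ("RAGFlow", 82, 62),
    ("OpenHands", 75, 33), ("Daytona", 73, 125), ("DeerFlow", 70, 51),
    ("MetaGPT", 68, 124), ("Crawl4AI", 67, 85), ("Open Interpreter", 64, 77),
    ("AnythingLLM", 61, 118), ("PrivateGPT", 57, 122), ("OpenManus", 56, 164),
    ("Flowise", 53, 45), ("CrewAI", 52, 23), ("LlamaIndex", 50, 117),
    ("Codex", 87, 2), ("n8n", 190, 6), ("Cline", 63, 10),
    ("Mem0", 57, 16), ("Open WebUI", 139, 19), ("AutoGen", 59, 25),
]


def average_ranks(values):
    """Rank values 1..n with the largest first; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to the end of the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation; applied to ranks, this is Spearman's rho."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


star_ranks = average_ranks([stars for _, stars, _ in AGENTS])   # 1 = most stars
trust_ranks = average_ranks([-rank for _, _, rank in AGENTS])   # 1 = best HVTrust rank
rho = pearson(star_ranks, trust_ranks)
print(f"Spearman rho over {len(AGENTS)} agents: {rho:.2f}")
```

A rho near +1 would mean stars and trust rank order the agents the same way; a value near 0 would mean star count tells you little about where an agent lands on trust.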
How we score (and how to argue with it)
Every score is built from public signals, weighted by how hard they are to fake, then scaled by an evidence-confidence factor so a tool with little verifiable evidence can't bluff its way to the top. The full methodology is public, and the entire dataset is free to use under CC BY 4.0.
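As a rough illustration of that shape (weighted signals, then a confidence scale), here is a minimal sketch. The weights, the signal values, and the confidence formula (the share of total weight backed by public evidence) are all assumptions made up for this example. They are not the published HVTrust weights; the methodology is the source of truth.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Signal:
    name: str
    value: float    # normalized to 0..1 (hypothetical values below)
    weight: int     # hypothetical weight; higher = harder to fake
    observed: bool  # False when no public evidence exists for this signal


def trust_score(signals: list[Signal]) -> float:
    """Weighted average of observed signals, scaled by evidence confidence."""
    total_weight = sum(s.weight for s in signals)
    observed = [s for s in signals if s.observed]
    observed_weight = sum(s.weight for s in observed)
    if total_weight == 0 or observed_weight == 0:
        return 0.0
    raw = sum(s.weight * s.value for s in observed) / observed_weight
    confidence = observed_weight / total_weight  # assumed form of the confidence factor
    return round(100 * raw * confidence, 1)


# A hypothetical tool with strong adoption but nothing else verifiable.
popular_but_opaque = [
    Signal("build_provenance", 0.0, 30, observed=False),
    Signal("ossf_scorecard", 0.0, 25, observed=False),
    Signal("signed_commits", 0.0, 15, observed=False),
    Signal("license", 0.0, 10, observed=False),
    Signal("maintenance", 0.0, 10, observed=False),
    Signal("adoption", 1.0, 10, observed=True),
]

print(trust_score(popular_but_opaque))  # 10.0
```

The design point is the last multiplication: a perfect score on the one signal that is easy to inflate still yields a low total, because the confidence factor reflects how little of the picture is actually verifiable.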
If you think a weight is wrong, tell us — the methodology is meant to be argued with, in the open.
See where your stack ranks
Provenance, OSSF Scorecard, signed commits, license type, and more, across the 192 notable AI agents scored so far. Refreshed every two hours.
Browse the trust registry

Data from HVTracker signals as of May 31, 2026. Provenance is checked via npm registry attestations and PyPI PEP 740 metadata. Full methodology.
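For readers who want to spot-check a package themselves, the provenance check named in the data note can be approximated by looking at registry metadata you have already fetched. This is a sketch under assumptions, not HVTracker's implementation: it assumes an npm version document exposes `dist.attestations` with a `url` and a `provenance` entry, and that a PyPI simple-index JSON file entry carries a `provenance` key as described in PEP 740. Check both field names against the current registry documentation before relying on them. The sample documents and package names are hypothetical.

```python
def npm_has_provenance(version_metadata: dict) -> bool:
    """Check an npm version document for an attestation entry with provenance.

    Assumes the layout dist.attestations = {"url": ..., "provenance": {...}}.
    """
    attestations = version_metadata.get("dist", {}).get("attestations") or {}
    return bool(attestations.get("url")) and bool(attestations.get("provenance"))


def pypi_file_has_provenance(simple_index: dict, filename: str) -> bool:
    """Check one file entry of a PyPI simple-index JSON document.

    Assumes each entry in "files" may carry a non-null "provenance" value (PEP 740).
    """
    for entry in simple_index.get("files", []):
        if entry.get("filename") == filename:
            return bool(entry.get("provenance"))
    return False


# Hypothetical, hand-written metadata for illustration only.
npm_with = {
    "name": "example-agent",
    "version": "1.0.0",
    "dist": {
        "attestations": {
            "url": "https://registry.example/attestations/example-agent@1.0.0",
            "provenance": {"predicateType": "https://slsa.dev/provenance/v1"},
        }
    },
}
npm_without = {"name": "example-agent", "version": "0.9.0", "dist": {}}

pypi_index = {
    "files": [
        {"filename": "example_agent-1.0.0-py3-none-any.whl",
         "provenance": "https://pypi.example/provenance/example_agent-1.0.0"},
        {"filename": "example_agent-0.9.0-py3-none-any.whl", "provenance": None},
    ]
}

print(npm_has_provenance(npm_with))     # True
print(npm_has_provenance(npm_without))  # False
print(pypi_file_has_provenance(pypi_index, "example_agent-1.0.0-py3-none-any.whl"))  # True
print(pypi_file_has_provenance(pypi_index, "example_agent-0.9.0-py3-none-any.whl"))  # False
```

Presence of the metadata only tells you an attestation was published. Confirming that it is valid still requires verifying the signature with the registry's own tooling.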