Runtime Trust Is Live on HVTracker
HVTracker v3.2 adds a new public layer to every agent profile: runtime-trust discovery. We already ranked projects on supply-chain trust, provenance, maintenance, and adoption. Now we also expose four lightweight runtime fields that help answer a different question: what does this thing look like once it actually runs?
The four new public fields are:
- MCP server support
- external service dependencies
- tool / plugin surface
- package provenance drift
What actually shipped
This release is intentionally conservative. We are only showing public, checkable signals from repo docs, manifests, and registry metadata. No private annotations, no secret analyst layer, and no pretending we can perfectly infer runtime behavior from static files.
That gives us a useful first snapshot:
- Does the project look MCP-aware, and does it appear to expose an MCP server?
- Which external providers or services appear to be required?
- How broad is the public tool or plugin surface?
- Do the latest published package references still point back to the tracked source repo?
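As a rough illustration of the last question, a provenance-drift check can be reduced to comparing two repository URLs after normalization. The sketch below is not HVTracker's implementation: the `RuntimeTrustSnapshot` type, its field names, and both helper functions are hypothetical, and a real check would likely need to handle redirects, renamed repos, and monorepo subpaths.

```python
from dataclasses import dataclass, field
from typing import List
from urllib.parse import urlparse


@dataclass
class RuntimeTrustSnapshot:
    """Hypothetical container for the four public runtime fields."""
    mcp_server: bool = False
    external_services: List[str] = field(default_factory=list)
    tool_surface: int = 0
    provenance_drift: bool = False


def normalize_repo_url(url: str) -> str:
    """Reduce a repository URL to 'host/owner/name' for comparison."""
    cleaned = url.strip().lower()
    # Package manifests often prefix the URL with 'git+'.
    if cleaned.startswith("git+"):
        cleaned = cleaned[4:]
    parsed = urlparse(cleaned)
    path = parsed.path.rstrip("/")
    if path.endswith(".git"):
        path = path[:-4]
    return f"{parsed.netloc}{path}"


def has_provenance_drift(tracked_repo: str, package_repo: str) -> bool:
    """True when the package metadata points somewhere other than the tracked repo."""
    return normalize_repo_url(tracked_repo) != normalize_repo_url(package_repo)


# Example with placeholder URLs (not real projects).
snapshot = RuntimeTrustSnapshot(
    mcp_server=True,
    external_services=["example-llm-provider"],
    tool_surface=12,
    provenance_drift=has_provenance_drift(
        "https://github.com/example/agent",
        "git+https://github.com/Example/Agent.git",
    ),
)
print(snapshot.provenance_drift)
```

Normalizing before comparing keeps cosmetic differences (case, a trailing `.git`, a `git+` prefix) from registering as drift, which matters if the signal is going to carry a ranking penalty.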
How the current top 10 would change
The current production top 10 is still the official leaderboard. But the first local v2 calibration already tells us something interesting: the biggest movements come from strong MCP and provenance matches on one side, and package-provenance drift warnings on the other.
| Agent | Current rank | v2 rank | Rank delta | Score change (current → v2) |
|---|---|---|---|---|
| Haystack | #1 | #3 | -2 | 94.3 → 96.1 |
| LangGraph | #2 | #17 | -15 | 93.2 → 86.8 |
| Codex | #3 | #1 | +2 | 92.8 → 97.9 |
| Vercel AI SDK | #4 | #2 | +2 | 92.6 → 97.8 |
| n8n | #5 | #4 | +1 | 91.8 → 94.3 |
| OpenAI Agents SDK | #6 | #10 | -4 | 91.2 → 91.4 |
| LiveKit Agents | #7 | #9 | -2 | 91.2 → 92.4 |
| PydanticAI | #8 | #5 | +3 | 91.1 → 93.4 |
| Cline | #9 | #6 | +3 | 91.0 → 92.9 |
| MLflow | #10 | #11 | -1 | 90.7 → 89.5 |
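The delta column is simply the current rank minus the v2 rank, so a positive value means the agent climbs. A small sketch, using only the ranks from the table (the function names are ours, for illustration), shows how the biggest movers can be picked out:

```python
# (agent, current rank, v2 rank), copied from the table.
RANKS = [
    ("Haystack", 1, 3),
    ("LangGraph", 2, 17),
    ("Codex", 3, 1),
    ("Vercel AI SDK", 4, 2),
    ("n8n", 5, 4),
    ("OpenAI Agents SDK", 6, 10),
    ("LiveKit Agents", 7, 9),
    ("PydanticAI", 8, 5),
    ("Cline", 9, 6),
    ("MLflow", 10, 11),
]


def rank_delta(current: int, v2: int) -> int:
    """Positive means the agent climbs under v2, negative means it falls."""
    return current - v2


def biggest_movers(rows, limit=3):
    """Return the agents with the largest absolute rank change."""
    deltas = [(name, rank_delta(cur, v2)) for name, cur, v2 in rows]
    return sorted(deltas, key=lambda item: abs(item[1]), reverse=True)[:limit]


for name, delta in biggest_movers(RANKS):
    print(f"{name}: {delta:+d}")
```

Sorting by absolute change rather than signed change surfaces large drops and large climbs together, which is the view a case-by-case review needs.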
The most interesting upset
LangGraph is the obvious surprise. It is currently ranked #2, but the first v2 calibration drops it to #17. That is not because the project suddenly looks weak overall. It is because the runtime layer found a package provenance drift warning, and that penalty is currently strong enough to dominate the new runtime slice.
That is exactly why Score Lab exists. A result like that does not mean we should immediately change the live ranking. It means we should inspect the evidence, review whether the drift penalty is too sharp, and decide if the formula is rewarding and punishing the right things.
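To make "too sharp" concrete, here is a toy model of how a single large penalty can swamp the bonuses in a runtime slice, and how a cap would soften it. Everything in it is an assumption for illustration: the signal names, the weights, and the cap are invented, and none of them are HVTracker's actual formula or values.

```python
from typing import Iterable, Optional

# Hypothetical weights: positive values are bonuses, negative values are penalties.
WEIGHTS = {
    "mcp_server": 2.0,
    "clean_provenance": 3.0,
    "provenance_drift": -10.0,
}


def runtime_adjustment(signals: Iterable[str], max_penalty: Optional[float] = None) -> float:
    """Sum bonuses and penalties for the observed signals.

    If max_penalty is given, the total penalty is capped at that value
    before it is subtracted from the bonuses.
    """
    observed = list(signals)
    bonus = sum(WEIGHTS[name] for name in observed if WEIGHTS[name] > 0)
    penalty = -sum(WEIGHTS[name] for name in observed if WEIGHTS[name] < 0)
    if max_penalty is not None:
        penalty = min(penalty, max_penalty)
    return bonus - penalty


signals = ["mcp_server", "provenance_drift"]
print(runtime_adjustment(signals))                   # uncapped: penalty dominates
print(runtime_adjustment(signals, max_penalty=4.0))  # capped: penalty still counts, but less
```

With these made-up numbers the uncapped adjustment is -8.0 and the capped one is -2.0. The cap keeps the drift warning visible in the score without letting one signal decide the whole slice.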
What looks good in the first pass
Codex and Vercel AI SDK both move into the top two positions under the first calibration. Both benefit from strong MCP signals plus clean package-source matches, and each gains roughly five points. PydanticAI and Cline also climb, three places each, though on smaller score gains (+2.3 and +1.9 points).
That feels directionally right. The runtime layer should reward projects that are both verifiable and explicit about the surfaces they expose, without letting runtime complexity alone overpower the stronger supply-chain core of HVTrust.
What happens next
For now, the production leaderboard remains unchanged. The next work is calibration, not immediate launch:
- review large rank swings case by case
- soften penalties that create noisy or premature upsets
- separate “interesting signal” from “deserves ranking impact”
- add historical runtime drift and a simple weekly digest later on
See the experimental ranking preview
Open the Score Lab to inspect the raw current-vs-v2 changes, biggest movers, and the full comparison table.
If there is a better product pattern here, it is probably this: keep raw discovery public, keep score changes conservative, and let people inspect the “what if” ranking before we touch the main leaderboard. That gives us more trust, not less.