Runtime Trust Is Live on HVTracker

June 5, 2026 · 5 min read · HVTracker Research

HVTracker v3.2 adds a new public layer to every agent profile: runtime-trust discovery. We already ranked projects on supply-chain trust, provenance, maintenance, and adoption. Now we also expose four lightweight runtime fields that help answer a different question: what does this thing look like once it actually runs?

The four new public fields are:

MCP server support, external service dependencies, tool / plugin surface, and package provenance drift.
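As a rough illustration, the four fields could be carried on an agent profile as one small record. This is only a sketch: the class name, field names, types, and example values below are assumptions made for illustration, not HVTracker's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class RuntimeTrustFields:
    """Hypothetical container for the four public runtime fields."""

    # Whether public docs or manifests show MCP server support (None = no signal found).
    mcp_server_support: Optional[bool] = None
    # External services the agent depends on at run time, as named in public sources.
    external_service_dependencies: List[str] = field(default_factory=list)
    # Tools or plugins the agent exposes.
    tool_plugin_surface: List[str] = field(default_factory=list)
    # Assumed meaning: True when the published package does not match its declared source.
    package_provenance_drift: Optional[bool] = None

    def has_drift_warning(self) -> bool:
        """Only an explicit True counts as a warning; unknown stays neutral."""
        return self.package_provenance_drift is True


# Made-up example values, not taken from any real agent profile.
example = RuntimeTrustFields(
    mcp_server_support=True,
    external_service_dependencies=["example-vector-db"],
    tool_plugin_surface=["search", "code_exec"],
    package_provenance_drift=False,
)
print(example.has_drift_warning())
```

Keeping "unknown" (None) separate from "checked and clean" (False) matches the conservative framing of this release: a missing public signal is not treated as evidence either way.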

Important: these runtime fields are live on agent pages now, but they do not affect the production leaderboard yet. We are calibrating them separately in an experimental Score Lab so we can study upsets before changing the live rank.

What actually shipped

This release is intentionally conservative. We are only showing public, checkable signals from repo docs, manifests, and registry metadata. No private annotations, no secret analyst layer, and no pretending we can perfectly infer runtime behavior from static files.

That gives us a useful first snapshot:

60 agents currently shift by 10 or more ranks in the first experimental v2 calibration.

How the current top 10 would change

The current production top 10 is still the official leaderboard. But the first local v2 calibration already tells us something interesting: the biggest movements come from strong MCP and provenance matches on one side, and package-provenance drift warnings on the other.

Agent              Current  V2    Delta  Score change
Haystack           #1       #3    -2     94.3 → 96.1
LangGraph          #2       #17   -15    93.2 → 86.8
Codex              #3       #1    +2     92.8 → 97.9
Vercel AI SDK      #4       #2    +2     92.6 → 97.8
n8n                #5       #4    +1     91.8 → 94.3
OpenAI Agents SDK  #6       #10   -4     91.2 → 91.4
LiveKit Agents     #7       #9    -2     91.2 → 92.4
PydanticAI         #8       #5    +3     91.1 → 93.4
Cline              #9       #6    +3     91.0 → 92.9
MLflow             #10      #11   -1     90.7 → 89.5
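For readers who want to check the arithmetic, here is a minimal sketch that recomputes the Delta column and the score change from the table above, and flags agents that move 10 or more ranks. The numbers are copied from the table; the `Row` type and function names are hypothetical helpers, not part of any HVTracker API.

```python
from typing import List, NamedTuple


class Row(NamedTuple):
    agent: str
    current_rank: int
    v2_rank: int
    current_score: float
    v2_score: float


# Values copied from the top-10 comparison table.
ROWS = [
    Row("Haystack", 1, 3, 94.3, 96.1),
    Row("LangGraph", 2, 17, 93.2, 86.8),
    Row("Codex", 3, 1, 92.8, 97.9),
    Row("Vercel AI SDK", 4, 2, 92.6, 97.8),
    Row("n8n", 5, 4, 91.8, 94.3),
    Row("OpenAI Agents SDK", 6, 10, 91.2, 91.4),
    Row("LiveKit Agents", 7, 9, 91.2, 92.4),
    Row("PydanticAI", 8, 5, 91.1, 93.4),
    Row("Cline", 9, 6, 91.0, 92.9),
    Row("MLflow", 10, 11, 90.7, 89.5),
]


def rank_delta(row: Row) -> int:
    """Positive means the agent moves up under v2, negative means it drops."""
    return row.current_rank - row.v2_rank


def score_change(row: Row) -> float:
    """v2 score minus current score, rounded to one decimal place."""
    return round(row.v2_score - row.current_score, 1)


def big_movers(rows: List[Row], threshold: int = 10) -> List[str]:
    """Agents whose rank shifts by at least `threshold` places in either direction."""
    return [r.agent for r in rows if abs(rank_delta(r)) >= threshold]


for r in ROWS:
    print(f"{r.agent:<18} {rank_delta(r):+3d} {score_change(r):+5.1f}")
print("10+ rank movers in this slice:", big_movers(ROWS))
```

Within this ten-row slice only LangGraph crosses the 10-rank threshold; the 60-agent figure covers the full leaderboard, which is not reproduced here.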

The most interesting upset

LangGraph is the obvious surprise. It is currently ranked #2, but the first v2 calibration drops it to #17. That is not because the project suddenly looks weak overall. It is because the runtime layer found a package provenance drift warning, and that penalty is currently strong enough to dominate the new runtime slice.

That is exactly why Score Lab exists. A result like that does not mean we should immediately change the live ranking. It means we should inspect the evidence, review whether the drift penalty is too sharp, and decide if the formula is rewarding and punishing the right things.
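One way to reason about whether the penalty is too sharp is a simple what-if sweep. The sketch below is not the Score Lab formula. It assumes, purely for illustration, that LangGraph's entire score drop comes from the drift penalty and that the penalty scales linearly, then checks where LangGraph would sit among the ten agents in the published table at different penalty strengths. The function names and the `scale` parameter are hypothetical.

```python
# v2 scores for the other nine agents in the published top-10 table.
OTHER_V2_SCORES = {
    "Haystack": 96.1,
    "Codex": 97.9,
    "Vercel AI SDK": 97.8,
    "n8n": 94.3,
    "OpenAI Agents SDK": 91.4,
    "LiveKit Agents": 92.4,
    "PydanticAI": 93.4,
    "Cline": 92.9,
    "MLflow": 89.5,
}
LANGGRAPH_CURRENT = 93.2
LANGGRAPH_V2 = 86.8


def scaled_score(scale: float) -> float:
    """LangGraph's score if only `scale` of the observed v2 drop were applied.

    Assumption: the whole drop is attributable to the drift penalty and
    scales linearly. scale=1.0 reproduces the published v2 score.
    """
    return LANGGRAPH_CURRENT + scale * (LANGGRAPH_V2 - LANGGRAPH_CURRENT)


def position_among_listed(score: float) -> int:
    """1-based position among the ten listed agents, not the full leaderboard."""
    return 1 + sum(1 for s in OTHER_V2_SCORES.values() if s > score)


for scale in (0.0, 0.25, 0.5, 0.75, 1.0):
    s = scaled_score(scale)
    print(f"scale={scale:.2f} score={s:.1f} position={position_among_listed(s)}")
```

Because agents outside the table are not included, the positions here are relative to the ten listed agents only and will not match the #17 full-leaderboard rank.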

What looks good in the first pass

Codex and Vercel AI SDK move into the top two positions under the first calibration, each gaining about five points. Both benefit from strong MCP signals plus clean package-source matches. PydanticAI and Cline also climb, three ranks each, though on smaller score gains (+2.3 and +1.9 points).

That feels directionally right. The runtime layer should reward projects that are both verifiable and explicit about the surfaces they expose, without letting runtime complexity alone overpower the stronger supply-chain core of HVTrust.

What happens next

For now, the production leaderboard remains unchanged. The next work is calibration, not immediate launch.

See the experimental ranking preview

Open the Score Lab to inspect the raw current-vs-v2 changes, biggest movers, and the full comparison table.


If there is a better product pattern here, it is probably this: keep raw discovery public, keep score changes conservative, and let people inspect the “what if” ranking before we touch the main leaderboard. That gives us more trust, not less.