Runtime Trust Signals
https://hvtracker.net/spec/runtime-trust/v0.1
1. Purpose
Supply-chain trust tells you whether an agent's code and packages are what they claim. Runtime trust asks a different question: what can this agent reach once it runs? An agent that ships an MCP server, calls many external providers, or exposes a plugin marketplace has a materially different risk surface than a self-contained library — regardless of how clean its build provenance is.
This spec documents the four runtime-trust signals HVTracker discovers for every tracked agent, and the experimental scoring that incorporates them. It exists so the methodology is public before any of it affects the production ranking.
2. Status: Not in the Production Rank
As of this version, runtime signals are discovery-only. The production leaderboard rank is computed exclusively from the published Methodology v2.0 dimensions. The experimental v2 score (§4) is computed and published alongside every agent for transparency and calibration, but does not order the leaderboard.
Promotion of runtime signals into the production rank is gated on calibration evidence (§6) and will ship, if at all, as a separately labelled scoring slice with per-field explanations — never as a silent reweight.
3. Runtime Discovery Fields
Each tracked agent carries four runtime fields, discovered by static analysis of the repository and its published package metadata. Every field reports a status, a confidence (high / medium / low), and an evidence array of human-readable findings, so any consumer can audit why a value was assigned.
| Field | Statuses | What it captures |
|---|---|---|
| mcp_server_support | implemented, declared, none | Whether the project ships or declares a Model Context Protocol server |
| external_service_dependencies | providers list + requires_api_keys | Third-party services the agent calls at runtime (LLM providers, APIs) |
| tool_plugin_surface | plugin_system: marketplace, extension-based, declared, none; plus tool_tags | How much third-party code the agent can load and execute |
| package_provenance_drift | match, partial, unknown, not_applicable, warning | Whether the published package matches the tracked repository |
These fields are recorded in every daily history snapshot, building an append-only time series of runtime-surface drift per agent.
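As an illustration of the field shape described above, the following is a minimal sketch, not the tracker's actual schema. The `RuntimeField` class, the `status_changes` helper, and the `(date, status)` pair layout are hypothetical names introduced here; only the status and confidence vocabularies come from this spec. Validation covers the two fields whose status is a single enumerated value.

```python
from dataclasses import dataclass, field

# Status vocabularies copied from the field table in this section.
ALLOWED_STATUSES = {
    "mcp_server_support": {"implemented", "declared", "none"},
    "package_provenance_drift": {
        "match", "partial", "unknown", "not_applicable", "warning",
    },
}
CONFIDENCE_LEVELS = {"high", "medium", "low"}


@dataclass
class RuntimeField:
    """Hypothetical record: a status, a confidence, and auditable evidence."""
    name: str
    status: str
    confidence: str
    evidence: list[str] = field(default_factory=list)

    def validate(self) -> None:
        """Raise ValueError if status or confidence is outside the spec's vocabulary."""
        allowed = ALLOWED_STATUSES.get(self.name)
        if allowed is not None and self.status not in allowed:
            raise ValueError(f"{self.name}: unexpected status {self.status!r}")
        if self.confidence not in CONFIDENCE_LEVELS:
            raise ValueError(f"{self.name}: unexpected confidence {self.confidence!r}")


def status_changes(series):
    """Return (date, old_status, new_status) for each transition.

    `series` is a chronologically ordered list of (date, status) pairs,
    an assumed layout for values read out of the daily snapshots.
    """
    changes = []
    for (_, prev), (day, cur) in zip(series, series[1:]):
        if prev != cur:
            changes.append((day, prev, cur))
    return changes
```

Because the snapshots are append-only, a transition list like this can be recomputed from the history at any time without storing derived state.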
4. Experimental v2 Scoring
The v2 score is the production trust score plus a bounded runtime adjustment, with the sum clamped to [0, 100]. The reference implementation is compute_trust_score_v2 in fetch_and_build.py. The adjustments are:
| Dimension | Adjustment |
|---|---|
| MCP server support | implemented +2.0 · declared +1.0 · none 0 |
| External dependencies | −0.5 per provider beyond the first, capped at −3.0; additional −1.0 if API keys are required |
| Tool/plugin surface | −0.3 per tool tag, capped at −1.5; plus marketplace −1.0, extension-based −0.6, declared −0.3 |
| Provenance drift | match +4.0 · partial +2.0 · unknown/not_applicable 0 · warning −5.0 |
Each agent publishes trust_score_v2, the net trust_v2_adjustment, and a per-dimension trust_v2_breakdown, so every point of difference from the production score is attributable.
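The adjustment table can be read as the following sketch. It is not the reference implementation (compute_trust_score_v2); the function names and argument layout are hypothetical. Two readings are assumptions: the API-key penalty and the plugin-system penalty are applied outside their respective caps, and values are rounded to two decimals for display.

```python
MCP_ADJ = {"implemented": 2.0, "declared": 1.0, "none": 0.0}
PLUGIN_ADJ = {"marketplace": -1.0, "extension-based": -0.6, "declared": -0.3, "none": 0.0}
DRIFT_ADJ = {"match": 4.0, "partial": 2.0, "unknown": 0.0,
             "not_applicable": 0.0, "warning": -5.0}


def runtime_breakdown_sketch(mcp, providers, requires_api_keys,
                             plugin_system, tool_tags, drift):
    """Per-dimension adjustments, following the table in this section."""
    # -0.5 per provider beyond the first, capped at -3.0.
    extra_providers = max(len(providers) - 1, 0)
    deps = max(-0.5 * extra_providers, -3.0)
    if requires_api_keys:
        deps -= 1.0  # assumption: applied in addition to the -3.0 cap

    # -0.3 per tool tag, capped at -1.5, plus the plugin-system penalty
    # (assumption: the plugin-system penalty sits outside the cap).
    surface = max(-0.3 * len(tool_tags), -1.5) + PLUGIN_ADJ[plugin_system]

    breakdown = {
        "mcp_server_support": MCP_ADJ[mcp],
        "external_service_dependencies": deps,
        "tool_plugin_surface": surface,
        "package_provenance_drift": DRIFT_ADJ[drift],
    }
    return {name: round(value, 2) for name, value in breakdown.items()}


def trust_score_v2_sketch(production_score, breakdown):
    """Return (clamped v2 score, net adjustment)."""
    adjustment = round(sum(breakdown.values()), 2)
    score = min(100.0, max(0.0, production_score + adjustment))
    return round(score, 2), adjustment


# Worked example with made-up inputs: three providers, keys required,
# two tool tags, an extension-based plugin system, matching provenance.
example = runtime_breakdown_sketch(
    mcp="implemented",
    providers=["provider_a", "provider_b", "provider_c"],
    requires_api_keys=True,
    plugin_system="extension-based",
    tool_tags=["shell", "browser"],
    drift="match",
)
print(trust_score_v2_sketch(80.0, example))  # (82.8, 2.8)
```

In the worked example the four dimensions contribute +2.0, −2.0, −1.2, and +4.0, for a net adjustment of +2.8 on a production score of 80.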
5. Data Access
- GET /data/latest.json: all agents, including runtime fields and v2 scores
- GET /data/agents/{slug}.json: per-agent record
- GET /data/history/YYYY-MM-DD.json: daily snapshots (runtime-drift time series)
- Methodology — Runtime-Trust Calibration: the human-readable adjustment reference
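A client might build these endpoint URLs as in the sketch below. The base host is an assumption inferred from this spec's own URL and is not stated anywhere in the endpoint list; the helper names are hypothetical, and the shape of the returned JSON is not assumed here.

```python
import json
from datetime import date
from urllib.parse import quote, urljoin
from urllib.request import urlopen

# Assumption: the data endpoints live on the same host as this spec.
BASE_URL = "https://hvtracker.net/"


def latest_url(base=BASE_URL):
    """URL of the all-agents document."""
    return urljoin(base, "data/latest.json")


def agent_url(slug, base=BASE_URL):
    """URL of a single agent record; the slug is percent-encoded."""
    return urljoin(base, f"data/agents/{quote(slug, safe='')}.json")


def history_url(day, base=BASE_URL):
    """URL of one daily snapshot; `day` is a datetime.date."""
    return urljoin(base, f"data/history/{day.isoformat()}.json")


def fetch_json(url, timeout=10):
    """Fetch and decode a JSON document (performs a network request)."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)
```

Using `date.isoformat()` produces the zero-padded YYYY-MM-DD form the history path expects, so `history_url(date(2025, 1, 5))` ends in `data/history/2025-01-05.json`.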
6. Calibration and Promotion Criteria
Runtime signals move into the production rank only after an upset review demonstrates that the change is evidence-backed. The review applies published criteria covering: the maximum acceptable rank churn, protection of high-grade agents from unexplained drops, no single dimension dominating the adjustment, and this spec being published first. Until those criteria pass, v2 remains a labelled experiment. The review is repeated after any recalibration.
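Two of these criteria, rank churn and single-dimension dominance, lend themselves to simple measurements. The sketch below shows one way they could be computed; it is an illustration only, the function names are hypothetical, and this spec does not define the metrics or any pass/fail thresholds.

```python
def ranks(scores):
    """Map slug -> 1-based rank, highest score first (ties broken by slug)."""
    ordered = sorted(scores, key=lambda slug: (-scores[slug], slug))
    return {slug: position + 1 for position, slug in enumerate(ordered)}


def rank_churn(production_scores, v2_scores):
    """Per-agent rank change if v2 ordered the leaderboard.

    Positive values mean the agent would drop. Both mappings are assumed
    to contain the same slugs.
    """
    before = ranks(production_scores)
    after = ranks(v2_scores)
    return {slug: after[slug] - before[slug] for slug in before}


def dominant_share(breakdown):
    """Largest single dimension's share of the total absolute adjustment."""
    total = sum(abs(value) for value in breakdown.values())
    if total == 0:
        return 0.0
    return max(abs(value) for value in breakdown.values()) / total


# Made-up scores for three agents.
churn = rank_churn({"a": 90.0, "b": 85.0, "c": 80.0},
                   {"a": 88.0, "b": 89.0, "c": 80.0})
print(churn)                             # {'a': 1, 'b': -1, 'c': 0}
print(max(abs(d) for d in churn.values()))  # 1
```

A review could compare the largest absolute churn and the dominant share against whatever limits the published criteria set.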
7. Versioning
This spec versions independently of the scoring methodology. Any change to the adjustment table (§4) requires a version bump and a changelog entry; promotion into production rank requires a new major section documenting the cutover and the evidence that gated it.