HVTracker Provenance Profile

Version: v0.1  |  Status: Published  |  Published: 2026-05-24  |  Authors: HVTracker

1. Abstract

This document specifies the trust signal model used by HVTracker to assess the supply chain integrity of open-source AI agent projects. It formally defines the schema, data source, collection method, freshness expectations, and failure modes for each of the four currently tracked signals: npm provenance attestations, PyPI PEP 740 attestations, OSSF Scorecard, and signed commit ratio.

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119.

v0.x status: Signals are collected and displayed but do not affect the health score defined in the Methodology Specification v2.0. Promotion to v1.0 will occur when the trust model is considered stable enough to inform scoring.

2. Motivation

Open-source AI agent projects are becoming critical infrastructure. Unlike traditional software libraries, AI agents often operate with broad system permissions — reading files, browsing the web, executing code, and calling external APIs on behalf of users. The integrity of the supply chain from which these agents are installed therefore carries higher stakes than for passive libraries.

HVTracker tracks health signals derived from project activity (stars, commits, forks). These signals measure adoption and development momentum but say nothing about whether the artifacts users install are trustworthy. A project with 50,000 stars and daily commits can still ship a compromised release if its publishing pipeline lacks attestations or its commits go unsigned.

The Provenance Profile addresses this gap. It defines a set of unilaterally observable, publicly verifiable signals that characterize the supply chain trustworthiness of a project's release artifacts. No maintainer participation, registration, or opt-in is required or assumed — all signals are derived from public cryptographic infrastructure.

3. Terminology

Trust signal
A binary or scalar value derived from publicly observable cryptographic infrastructure that characterizes one dimension of the supply chain integrity of a software project. Trust signals are not scores and are not aggregated into a composite value in this version.
Attestation
A cryptographically signed statement, produced by a known identity (typically a CI/CD system), asserting facts about a software artifact — most commonly that the artifact was built from a specific source commit in a specific pipeline environment. Attestations are logged to a transparency log so they cannot be silently revoked.
Provenance
The verifiable record of how a software artifact was produced: which source repository, which commit, which build environment, and which pipeline. Provenance is a specific category of attestation that answers "where did this artifact come from?"
Artifact
A published binary or source distribution of a software package — a .whl file on PyPI, a tarball on npm, a container image layer, etc.
Transparency log
An append-only, cryptographically verifiable log of signed records. The primary transparency log used by npm and PyPI provenance is Sigstore's Rekor. Once a record is appended, it cannot be deleted without detection.
Freshness
The degree to which a trust signal reflects the current state of a project rather than a historical state. A signal is fresh if it was collected within one Daily Run of the current build.
Verified signature
A commit signature whose cryptographic validity has been confirmed by GitHub's signature verification API. A verified signature does not imply key quality or trust level — it means the signature is mathematically valid and the signing key is recognized by GitHub.
Trusted Publisher
A mechanism (npm: Provenance, PyPI: OIDC Trusted Publishing) that allows a CI/CD system to publish packages using a short-lived identity token rather than a long-lived API key. Artifacts published via a Trusted Publisher are eligible for provenance attestations.
OSSF Scorecard
An automated tool maintained by the Open Source Security Foundation that evaluates a project's security posture across a fixed set of checks and produces a score from 0 to 10.

4. Trust Signals

4.1 npm Provenance

Field: npm_provenance  |  Type: boolean | null  |  Scope: agents with a configured npm_package

Value   Meaning
true    The latest published version includes at least one provenance attestation in dist.attestations.
false   The latest published version has no provenance attestations (dist.attestations is absent or null).
null    No npm_package configured, or the API request failed.

Source: GET https://registry.npmjs.org/{package}/latest — signal is true iff dist.attestations is present and non-null.
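
The rule above reduces to a small pure function over the registry response. The following is a minimal sketch, not the reference implementation: the function names, the timeout, and the use of urllib are assumptions made for illustration.

```python
import json
import urllib.error
import urllib.request
from typing import Any, Optional


def npm_provenance_from_manifest(manifest: Optional[dict[str, Any]]) -> Optional[bool]:
    """Apply the Section 4.1 rule to a parsed /latest manifest (None = request failed)."""
    if not isinstance(manifest, dict):
        return None
    dist = manifest.get("dist") or {}
    # True iff dist.attestations is present and non-null.
    return dist.get("attestations") is not None


def fetch_npm_provenance(package: Optional[str], timeout: float = 10.0) -> Optional[bool]:
    """Hypothetical collector: null when no package is configured or the request fails."""
    if not package:
        return None
    url = f"https://registry.npmjs.org/{package}/latest"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            manifest = json.load(resp)
    except (urllib.error.URLError, ValueError, TimeoutError):
        return None
    return npm_provenance_from_manifest(manifest)
```

Keeping the decision rule separate from the network call makes it easier to confirm that two implementations agree on the same API response.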

What it means: The package publisher used npm's Provenance feature, requiring a Trusted Publisher (GitHub Actions, GitLab CI, CircleCI). The resulting attestation is an in-toto SLSA provenance statement, signed via Sigstore's keyless protocol and logged to the Rekor transparency log. End users can verify via npm audit signatures.

Freshness: Reflects the version that the latest dist-tag points to at collection time. Changes on the next Daily Run after a new version is published.

Limitations: Only the version under the latest dist-tag is checked. The attestation's content is not cryptographically verified by the reference implementation; presence is checked, not validity.

4.2 PyPI Provenance (PEP 740)

Field: pypi_provenance  |  Type: boolean | null  |  Scope: agents with a configured pypi_package

Value   Meaning
true    The last file entry in the package's Simple API response has a non-null provenance field.
false   The last file entry has no provenance field.
null    No pypi_package configured, or the API returned non-200.

Source: GET https://pypi.org/simple/{package}/ with Accept: application/vnd.pypi.simple.v1+json (PEP 691). The last element of the files array is checked for a provenance field.

Collection: Requests MUST be made serially with a minimum 1.2-second delay between PyPI requests. On HTTP 429 the signal is null for that run.
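
The source rule and the serial collection requirement can be sketched together as follows. This is an illustrative sketch under stated assumptions: the fetcher signature, the function names, and the handling of an empty files array are not defined by this specification.

```python
import time
from typing import Any, Callable, Iterable, Optional

MIN_DELAY_SECONDS = 1.2  # minimum spacing between PyPI requests (Section 4.2)

# Hypothetical fetcher type: package name -> (HTTP status, parsed JSON body or None).
Fetch = Callable[[str], tuple[int, Optional[dict[str, Any]]]]


def pypi_provenance_from_index(status: int, body: Optional[dict[str, Any]]) -> Optional[bool]:
    """Apply the Section 4.2 rule to one Simple API response."""
    if status != 200 or body is None:
        return None  # covers HTTP 429 and every other non-200 response
    files = body.get("files") or []
    if not files:
        return None  # assumption: the specification does not cover an empty files array
    # Only the last element of the files array is inspected.
    return files[-1].get("provenance") is not None


def collect_pypi_provenance(
    packages: Iterable[str],
    fetch: Fetch,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Optional[bool]]:
    """Query packages one at a time, pausing before every request after the first."""
    results: dict[str, Optional[bool]] = {}
    for index, package in enumerate(packages):
        if index > 0:
            sleep(MIN_DELAY_SECONDS)
        status, body = fetch(package)
        results[package] = pypi_provenance_from_index(status, body)
    return results
```

Passing the sleep function as a parameter is a design choice for this sketch only; it allows the pacing to be checked without waiting in real time.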

What it means: The checked distribution file (the last entry in the files array) carries a PEP 740 digital attestation published through PyPI's Trusted Publishing mechanism. The attestation is cryptographically bound to the source repository and CI run that produced the artifact.

Limitations: PEP 740 was accepted in 2024; most packages predate it. Packages published via twine with API tokens cannot carry PEP 740 attestations by design. Only the last entry in the files array is checked, which is not necessarily a file from the latest stable release.

4.3 OSSF Scorecard

Fields: scorecard_score (float | null, range 0–10)  ·  scorecard_checks (object mapping check name → score; {} when no data is available)

Scope: All agents.

Collection: Scorecard data is generated by running the OSSF Scorecard CLI tool directly against each repository, refreshed weekly via GitHub Actions. Results are cached in scorecard-cache.json; daily builds read from this cache. If a repository is absent from the weekly cache, the daily build falls back to remote APIs (tried in order):

  1. deps.dev: GET https://api.deps.dev/v3/projects/github.com%2F{owner}%2F{repo} — reads scorecard.overallScore and scorecard.checks.
  2. securityscorecards.dev: GET https://api.securityscorecards.dev/projects/github.com/{owner}/{repo} — reads score and checks.
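
The lookup order above (weekly cache, then deps.dev, then securityscorecards.dev) can be sketched as a single resolver. This is a sketch under assumptions, not the reference implementation: the layout of scorecard-cache.json (keyed by "owner/repo" with score and checks entries), the fetch_json helper, and the shape of each check entry (an object with name and score keys) are assumed for illustration.

```python
from typing import Any, Callable, Optional

ScorecardResult = tuple[Optional[float], dict[str, float]]
# Hypothetical helper type: URL -> parsed JSON body, or None on any failure.
FetchJson = Callable[[str], Optional[dict[str, Any]]]


def _checks_to_map(checks: Any) -> dict[str, float]:
    # Assumption: each check entry is an object with "name" and "score" keys.
    if not isinstance(checks, list):
        return {}
    return {
        c["name"]: c["score"]
        for c in checks
        if isinstance(c, dict) and "name" in c and "score" in c
    }


def resolve_scorecard(
    owner: str, repo: str, cache: dict[str, Any], fetch_json: FetchJson
) -> ScorecardResult:
    """Return (scorecard_score, scorecard_checks) using the order in Section 4.3."""
    cached = cache.get(f"{owner}/{repo}")  # assumed cache key and entry layout
    if cached is not None:
        return cached.get("score"), dict(cached.get("checks") or {})

    # Fallback 1: deps.dev
    data = fetch_json(f"https://api.deps.dev/v3/projects/github.com%2F{owner}%2F{repo}")
    scorecard = (data or {}).get("scorecard") or {}
    if scorecard.get("overallScore") is not None:
        return scorecard["overallScore"], _checks_to_map(scorecard.get("checks"))

    # Fallback 2: securityscorecards.dev
    data = fetch_json(f"https://api.securityscorecards.dev/projects/github.com/{owner}/{repo}")
    if data and data.get("score") is not None:
        return data["score"], _checks_to_map(data.get("checks"))

    return None, {}  # no source had data: null score, empty checks object
```

A repository that is missing from every source resolves to a null score and an empty checks object, matching the field types declared for this signal.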

What it means: The OSSF Scorecard score is a weighted aggregate of 10–18 individual checks (Maintained, Code-Review, Branch-Protection, Signed-Releases, Pinned-Dependencies, Vulnerabilities, Token-Permissions, Dangerous-Workflow, and others). HVTracker reports the score as-is; it does not reweight or reinterpret individual checks.

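Freshness: The CLI scan runs weekly, so cached results are at most 7 days old provided the most recent weekly scan succeeded. Daily builds serve the cached value unchanged until the next weekly scan. Values obtained through the remote fallbacks reflect whenever the upstream service last scanned the repository.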

Limitations: Absence of a score does not imply poor security posture. The check set and weights are controlled by the OpenSSF Scorecard project, not HVTracker, and may change between Scorecard versions.

4.4 Signed Commit Ratio

Field: signed_commits_ratio  |  Type: float | null, range 0.0–1.0, rounded to 3 decimal places  |  Scope: All agents.

Value           Meaning
1.0             All sampled commits carry a verified signature.
0.0 < x < 1.0   Fraction x of sampled commits carry a verified signature.
0.0             No sampled commits carry a verified signature.
null            API request failed or commits list is empty.

Source: GET https://api.github.com/repos/{owner}/{repo}/commits?per_page=100 — reads commit.verification.verified on each result. Signal = verified_count / total_count.
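
The computation above can be sketched as a pure function over the parsed commits list. The function name is an assumption, and a commit that lacks a verification object is treated here as unverified, which this specification does not state explicitly.

```python
from typing import Any, Optional


def signed_commits_ratio(commits: Optional[list[dict[str, Any]]]) -> Optional[float]:
    """Return verified_count / total_count rounded to 3 decimals, or None."""
    if not commits:
        return None  # request failed (None) or the commits list is empty
    verified = sum(
        1
        for c in commits
        # Assumption: a missing verification object counts as unverified.
        if ((c.get("commit") or {}).get("verification") or {}).get("verified") is True
    )
    return round(verified / len(commits), 3)


# Example: two verified commits out of three sampled.
sample = [
    {"commit": {"verification": {"verified": True}}},
    {"commit": {"verification": {"verified": True}}},
    {"commit": {"verification": {"verified": False}}},
]
print(signed_commits_ratio(sample))  # 0.667
```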

What it means: The fraction of the most recent 100 commits on the default branch that carry a cryptographic signature (GPG, SSH, or S/MIME) verified as mathematically valid by GitHub.

Freshness: Collected fresh each Daily Run. The sample window slides forward as new commits are pushed.

Limitations: Commits made through GitHub's web UI are signed by GitHub's own key and counted as verified, which may inflate the ratio. At most 100 commits are sampled. The signal measures whether GitHub reports each signature as verified, not key quality or trust level.

5. Extension Model

5.1 Inclusion Criteria

A proposed new trust signal MUST satisfy all of the following:

  1. Unilateral observability: Derivable from public APIs or public cryptographic infrastructure without maintainer participation or opt-in.
  2. Binary or scalar output: Produces a boolean, float, or integer value (or null for unavailability). Categorical or free-text signals are not permitted.
  3. Determinism: Given the same API response at the same point in time, two independent implementations MUST produce the same signal value.
  4. Distinct from existing signals: Measures a dimension of supply chain trust not already covered by the four signals in Section 4.
  5. Stable upstream source: The data source MUST be maintained by a known organization and have a documented API.

5.2 Addition Process

  1. The candidate signal is described in a draft section following the format of Section 4 (schema, data source, collection method, meaning, freshness, failure modes, limitations).
  2. The draft is reviewed by the owner and merged into a new minor version of this specification (e.g., v0.1 → v0.2).
  3. The reference implementation is updated to collect the signal. The signal is added to data.json and displayed on agent profile pages.
  4. The signal is collected for at least 30 days before any discussion of incorporating it into the health score.

5.3 Removal Process

A signal MAY be removed if the upstream data source becomes unavailable or undocumented, if the signal is found to be non-deterministic or gameable in a way that undermines its value, or if a superior signal that subsumes it is added. Removal increments the minor version. Historical Snapshots retain all fields from their collection date and are not retroactively modified.

6. Verification Process

HVTracker does not independently verify the cryptographic claims in trust signals — that responsibility lies with the end user and the upstream infrastructure. HVTracker's role is to collect and report whether the relevant cryptographic infrastructure is in use.

Signal                 Who verifies the cryptography
npm_provenance         npm CLI (npm audit signatures); Rekor transparency log
pypi_provenance        PyPI infrastructure; Rekor transparency log
scorecard_score        OpenSSF Scorecard infrastructure; publicly auditable
signed_commits_ratio   GitHub's signature verification API; end users via git verify-commit

A true or high-ratio value means the relevant verification system reported success — it does not constitute an independent endorsement by HVTracker.

7. Versioning and Changelog

v0.x: Signals are collected and displayed but do not affect the health score. The specification is experimental; signals may be added, removed, or redefined with minor version increments.

v1.x: The signal set is considered stable. Changes that affect which signals are collected or how they are defined increment the major version. Promotion from v0.x to v1.0 requires that the signal set has been stable for at least 90 days and that the owner has reviewed the specification for completeness.

All published versions remain permanently accessible at their versioned URLs. A version MUST NOT be modified after it receives Published status.

Version   Date         Summary
v0.1      2026-05-24   Initial publication. Defines four signals: npm provenance, PyPI PEP 740, OSSF Scorecard (deps.dev primary + securityscorecards.dev fallback), signed commit ratio. Extension model and verification process defined.

A. Field Reference

Field                  Type             Signal                                        Since
npm_provenance         boolean | null   npm SLSA provenance attestation               Methodology v2.0
pypi_provenance        boolean | null   PyPI PEP 740 attestation                      Methodology v2.0
has_provenance         boolean | null   npm_provenance OR pypi_provenance (derived)   Methodology v2.0
scorecard_score        float | null     OSSF Scorecard overall score [0–10]           Methodology v2.0
scorecard_checks       object | {}      Per-check Scorecard scores                    Methodology v2.0
signed_commits_ratio   float | null     Fraction of signed commits [0.0–1.0]          Methodology v2.0
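
The field reference above can be expressed as a typed record, together with one possible reading of the derived has_provenance field. This is a sketch only. The class and function names are hypothetical, and the null handling is an assumption: this specification gives the derivation as "npm_provenance OR pypi_provenance" without stating how null inputs combine. Here a single true wins, two nulls yield null, and every other combination yields false.

```python
from typing import Optional, TypedDict


class TrustSignals(TypedDict):
    """Hypothetical record mirroring the fields in Appendix A."""
    npm_provenance: Optional[bool]
    pypi_provenance: Optional[bool]
    has_provenance: Optional[bool]
    scorecard_score: Optional[float]
    scorecard_checks: dict[str, float]
    signed_commits_ratio: Optional[float]


def derive_has_provenance(npm: Optional[bool], pypi: Optional[bool]) -> Optional[bool]:
    """Assumed null semantics; not confirmed by the specification."""
    if npm is True or pypi is True:
        return True   # at least one registry reports provenance
    if npm is None and pypi is None:
        return None   # neither registry produced a signal
    return False      # at least one registry answered, none reported provenance


record: TrustSignals = {
    "npm_provenance": None,
    "pypi_provenance": True,
    "has_provenance": derive_has_provenance(None, True),
    "scorecard_score": None,
    "scorecard_checks": {},
    "signed_commits_ratio": 0.667,
}
```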

B. Coverage as of 2026-05-24

Signal                   Coverage                   Notes
signed_commits_ratio     65/65 (100%)               GitHub API always returns data for accessible repos
npm_provenance = true    4/11 npm-tracked agents    4 of 11 npm packages use Trusted Publishing
pypi_provenance = true   7/46 PyPI-tracked agents   Most packages predate PEP 740 (2024)
has_provenance           11/65 (17%)                At least one of npm or PyPI provenance present
scorecard_score          3/65 (5%)                  securityscorecards.dev fallback added 2026-05-24; untested in production