HVTracker Provenance Profile
https://hvtracker.net/spec/provenance/v0.1
1. Abstract
This document specifies the trust signal model used by HVTracker to assess the supply chain integrity of open-source AI agent projects. It formally defines the schema, data source, collection method, freshness expectations, and failure modes for each of the four currently tracked signals: npm provenance attestations, PyPI PEP 740 attestations, OSSF Scorecard, and signed commit ratio.
The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119.
2. Motivation
Open-source AI agent projects are becoming critical infrastructure. Unlike traditional software libraries, AI agents often operate with broad system permissions — reading files, browsing the web, executing code, and calling external APIs on behalf of users. The integrity of the supply chain from which these agents are installed therefore carries higher stakes than for passive libraries.
HVTracker tracks health signals derived from project activity (stars, commits, forks). These signals measure adoption and development momentum but say nothing about whether the artifacts users install are trustworthy. A project with 50,000 stars and daily commits can still ship a compromised release if its publishing pipeline lacks attestations or its commits go unsigned.
The Provenance Profile addresses this gap. It defines a set of unilaterally observable, publicly verifiable signals that characterize the supply chain trustworthiness of a project's release artifacts. No maintainer participation, registration, or opt-in is required or assumed — all signals are derived from public cryptographic infrastructure.
3. Terminology
- Trust signal
- A binary or scalar value derived from publicly observable cryptographic infrastructure that characterizes one dimension of the supply chain integrity of a software project. Trust signals are not scores and are not aggregated into a composite value in this version.
- Attestation
- A cryptographically signed statement, produced by a known identity (typically a CI/CD system), asserting facts about a software artifact, most commonly that the artifact was built from a specific source commit in a specific pipeline environment. Attestations are recorded in a transparency log so they cannot be silently altered or removed.
- Provenance
- The verifiable record of how a software artifact was produced: which source repository, which commit, which build environment, and which pipeline. Provenance is a specific category of attestation that answers "where did this artifact come from?"
- Artifact
- A published binary or source distribution of a software package: a .whl file on PyPI, a tarball on npm, a container image layer, etc.
- Transparency log
- An append-only, cryptographically verifiable log of signed records. The primary transparency log used by npm and PyPI provenance is Sigstore's Rekor. Once a record is appended, it cannot be deleted without detection.
- Freshness
- The degree to which a trust signal reflects the current state of a project rather than a historical state. A signal is fresh if it was collected within one Daily Run of the current build.
- Verified signature
- A commit signature whose cryptographic validity has been confirmed by GitHub's signature verification API. A verified signature does not imply key quality or trust level — it means the signature is mathematically valid and the signing key is recognized by GitHub.
- Trusted Publisher
- A mechanism (npm: Provenance, PyPI: OIDC Trusted Publishing) that allows a CI/CD system to publish packages using a short-lived identity token rather than a long-lived API key. Artifacts published via a Trusted Publisher are eligible for provenance attestations.
- OSSF Scorecard
- An automated tool maintained by the Open Source Security Foundation that evaluates a project's security posture across a fixed set of checks and produces a score from 0 to 10.
4. Trust Signals
4.1 npm Provenance
Field: npm_provenance | Type: boolean | null | Scope: agents with a configured npm_package
| Value | Meaning |
|---|---|
| true | The latest published version includes at least one provenance attestation in dist.attestations. |
| false | The latest published version has no provenance attestations (dist.attestations is absent or null). |
| null | No npm_package configured, or the API request failed. |
Source: GET https://registry.npmjs.org/{package}/latest — signal is true iff dist.attestations is present and non-null.
What it means: The package publisher used npm's Provenance feature, requiring a Trusted Publisher (GitHub Actions, GitLab CI, CircleCI). The resulting attestation is an in-toto SLSA provenance statement, signed via Sigstore's keyless protocol and logged to the Rekor transparency log. End users can verify via npm audit signatures.
Freshness: Reflects the latest tag at collection time. Changes on the next Daily Run after a new version is published.
Limitations: Only the latest tag is checked. The attestation's content is not cryptographically verified by the reference implementation — presence is checked, not validity.
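The mapping from registry response to signal value can be sketched as follows. This is a minimal illustration, not the reference implementation: the function name derive_npm_provenance and its (status_code, body) parameters are hypothetical, and the caller is assumed to have already performed the GET request and parsed the JSON.

```python
from typing import Any, Optional


def derive_npm_provenance(status_code: int, body: Optional[dict[str, Any]]) -> Optional[bool]:
    """Map a response from registry.npmjs.org/{package}/latest to npm_provenance."""
    # A failed request yields null (None), never false.
    if status_code != 200 or body is None:
        return None
    dist = body.get("dist") or {}
    # Presence check only: the attestation content itself is not verified.
    return dist.get("attestations") is not None
```

A caller with no configured npm_package would skip the request entirely and record None.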
4.2 PyPI Provenance (PEP 740)
Field: pypi_provenance | Type: boolean | null | Scope: agents with a configured pypi_package
| Value | Meaning |
|---|---|
| true | The last file entry in the package's Simple API response has a non-null provenance field. |
| false | The last file entry has no provenance field, or the field is null. |
| null | No pypi_package configured, or the API returned non-200. |
Source: GET https://pypi.org/simple/{package}/ with Accept: application/vnd.pypi.simple.v1+json (PEP 691). The last element of the files array is checked for a provenance field.
Collection: Requests MUST be made serially with a minimum 1.2-second delay between PyPI requests. On HTTP 429 the signal is null for that run.
What it means: The most recently uploaded distribution carries a PEP 740 digital attestation generated by PyPI's Trusted Publishing mechanism. The attestation is cryptographically bound to the source repository and CI run that produced the artifact.
Limitations: PEP 740 was accepted in 2024; most packages predate it. Packages published via twine with API tokens cannot carry PEP 740 attestations by design. Only the last uploaded file is checked, not the latest stable release tag.
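A rough sketch of the derivation and the serial collection rule follows. The names derive_pypi_provenance, collect_serially, and the injected fetch and sleep callables are hypothetical. Returning null for an empty files array is an assumption, since the text above does not define that case.

```python
import time
from typing import Any, Callable, Optional

PYPI_MIN_DELAY_SECONDS = 1.2  # minimum gap between PyPI requests, per the Collection rule

Response = tuple[int, Optional[dict[str, Any]]]


def derive_pypi_provenance(status_code: int, body: Optional[dict[str, Any]]) -> Optional[bool]:
    # Any non-200 response (including HTTP 429) yields null for this run.
    if status_code != 200 or body is None:
        return None
    files = body.get("files") or []
    if not files:
        # Assumption: an empty file list is treated as unavailable (null).
        return None
    # Only the last element of the files array is inspected.
    return files[-1].get("provenance") is not None


def collect_serially(
    packages: list[str],
    fetch: Callable[[str], Response],
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, Optional[bool]]:
    results: dict[str, Optional[bool]] = {}
    for index, name in enumerate(packages):
        if index > 0:
            sleep(PYPI_MIN_DELAY_SECONDS)  # wait before every request after the first
        status_code, body = fetch(name)
        results[name] = derive_pypi_provenance(status_code, body)
    return results
```

Injecting fetch and sleep keeps the sketch testable without network access; a real collector would pass a function that sends the PEP 691 Accept header.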
4.3 OSSF Scorecard
Fields: scorecard_score (float | null, range 0–10) · scorecard_checks (object | {}, check name → score)
Scope: All agents.
Collection: Scorecard data is generated by running the OSSF Scorecard CLI tool directly against each repository, refreshed weekly via GitHub Actions. Results are cached in scorecard-cache.json; daily builds read from this cache. If a repository is absent from the weekly cache, the daily build falls back to remote APIs (tried in order):
- deps.dev: GET https://api.deps.dev/v3/projects/github.com%2F{owner}%2F{repo}; reads scorecard.overallScore and scorecard.checks.
- securityscorecards.dev: GET https://api.securityscorecards.dev/projects/github.com/{owner}/{repo}; reads score and checks.
What it means: The OSSF Scorecard score is a weighted aggregate of 10–18 individual checks (Maintained, Code-Review, Branch-Protection, Signed-Releases, Pinned-Dependencies, Vulnerabilities, Token-Permissions, Dangerous-Workflow, and others). HVTracker reports the score as-is; it does not reweight or reinterpret individual checks.
Freshness: The CLI scan runs weekly; results are at most 7 days old. Daily builds serve the cached value unchanged until the next weekly scan.
Limitations: Absence of a score does not imply poor security posture. The check set and weights are controlled by the OpenSSF Scorecard project, not HVTracker, and may change between Scorecard versions.
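The cache-then-fallback order described under Collection might look roughly like this. It is a sketch under several assumptions: the function and parameter names are hypothetical, the cache key format and the layout of entries in scorecard-cache.json are invented for illustration, and both remote APIs are assumed to return checks as a list of objects with name and score keys.

```python
from typing import Any, Callable, Optional

Fetcher = Callable[[str, str], Optional[dict[str, Any]]]
Result = tuple[Optional[float], dict[str, Any]]


def _checks_to_map(checks: Any) -> dict[str, Any]:
    # Assumption: checks arrive as [{"name": ..., "score": ...}, ...].
    if not isinstance(checks, list):
        return {}
    return {c["name"]: c.get("score") for c in checks if isinstance(c, dict) and "name" in c}


def resolve_scorecard(
    owner: str,
    repo: str,
    cache: dict[str, dict[str, Any]],
    fetch_deps_dev: Fetcher,
    fetch_scorecards_dev: Fetcher,
) -> Result:
    # 1. Weekly CLI cache (hypothetical key format and entry layout).
    cached = cache.get(f"github.com/{owner}/{repo}")
    if cached is not None:
        return cached.get("scorecard_score"), cached.get("scorecard_checks") or {}

    # 2. deps.dev: reads scorecard.overallScore and scorecard.checks.
    scorecard = (fetch_deps_dev(owner, repo) or {}).get("scorecard") or {}
    if scorecard.get("overallScore") is not None:
        return float(scorecard["overallScore"]), _checks_to_map(scorecard.get("checks"))

    # 3. securityscorecards.dev: reads score and checks.
    body = fetch_scorecards_dev(owner, repo) or {}
    if body.get("score") is not None:
        return float(body["score"]), _checks_to_map(body.get("checks"))

    # No source produced a score: scorecard_score is null, scorecard_checks is {}.
    return None, {}
```

Each fetcher returns the parsed JSON body or None on failure, so a failed source simply falls through to the next one.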
4.4 Signed Commit Ratio
Field: signed_commits_ratio | Type: float | null, range 0.0–1.0, rounded to 3 decimal places | Scope: All agents.
| Value | Meaning |
|---|---|
| 1.0 | All sampled commits carry a verified signature. |
| 0.0 < x < 1.0 | Fraction x of sampled commits carry a verified signature. |
| 0.0 | No sampled commits carry a verified signature. |
| null | API request failed or commits list is empty. |
Source: GET https://api.github.com/repos/{owner}/{repo}/commits?per_page=100 — reads commit.verification.verified on each result. Signal = verified_count / total_count.
What it means: The fraction of the most recent 100 commits on the default branch that carry a cryptographic signature (GPG, SSH, or S/MIME) verified as mathematically valid by GitHub.
Freshness: Collected fresh each Daily Run. The sample window slides forward as new commits are pushed.
Limitations: Commits made through GitHub's web UI are signed by GitHub's own key and counted as verified, which may inflate the ratio. Only 100 commits are sampled. Signature presence is measured, not key quality or trust level.
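The ratio computation can be sketched as below. The function name compute_signed_commits_ratio is hypothetical, and the input is assumed to be the already-parsed JSON array from the commits endpoint (or None if the request failed).

```python
from typing import Any, Optional


def compute_signed_commits_ratio(commits: Optional[list[dict[str, Any]]]) -> Optional[float]:
    # A failed request (None) or an empty commits list yields null.
    if not commits:
        return None
    verified_count = sum(
        1
        for item in commits
        # Reads commit.verification.verified; missing keys count as unverified.
        if ((item.get("commit") or {}).get("verification") or {}).get("verified") is True
    )
    # Signal = verified_count / total_count, rounded to 3 decimal places.
    return round(verified_count / len(commits), 3)
```

For example, two verified commits out of three sampled give 0.667.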
5. Extension Model
5.1 Inclusion Criteria
A proposed new trust signal MUST satisfy all of the following:
- Unilateral observability: Derivable from public APIs or public cryptographic infrastructure without maintainer participation or opt-in.
- Binary or scalar output: Produces a boolean, float, or integer value (or null for unavailability). Categorical or free-text signals are not permitted.
- Determinism: Given the same API response at the same point in time, two independent implementations MUST produce the same signal value.
- Distinct from existing signals: Measures a dimension of supply chain trust not already covered by the four signals in Section 4.
- Stable upstream source: The data source MUST be maintained by a known organization and have a documented API.
5.2 Addition Process
- The candidate signal is described in a draft section following the format of Section 4 (schema, data source, collection method, meaning, freshness, failure modes, limitations).
- The draft is reviewed by the owner and merged into a new minor version of this specification (e.g., v0.1 → v0.2).
- The reference implementation is updated to collect the signal. The signal is added to data.json and displayed on agent profile pages.
- The signal is collected for at least 30 days before any discussion of incorporating it into the health score.
5.3 Removal Process
A signal MAY be removed if the upstream data source becomes unavailable or undocumented, if the signal is found to be non-deterministic or gameable in a way that undermines its value, or if a superior signal that subsumes it is added. Removal increments the minor version. Historical Snapshots retain all fields from their collection date and are not retroactively modified.
6. Verification Process
HVTracker does not independently verify the cryptographic claims in trust signals — that responsibility lies with the end user and the upstream infrastructure. HVTracker's role is to collect and report whether the relevant cryptographic infrastructure is in use.
| Signal | Who verifies the cryptography |
|---|---|
| npm_provenance | npm CLI (npm audit signatures); Rekor transparency log |
| pypi_provenance | PyPI infrastructure; Rekor transparency log |
| scorecard_score | OpenSSF Scorecard infrastructure; publicly auditable |
| signed_commits_ratio | GitHub's signature verification API; end users via git verify-commit |
A true or high-ratio value means the relevant verification system reported success — it does not constitute an independent endorsement by HVTracker.
7. Versioning and Changelog
v0.x: Signals are collected and displayed but do not affect the health score. The specification is experimental; signals may be added, removed, or redefined with minor version increments.
v1.x: The signal set is considered stable. Changes that affect which signals are collected or how they are defined increment the major version. Promotion from v0.x to v1.0 requires that the signal set has been stable for at least 90 days and that the owner has reviewed the specification for completeness.
All published versions remain permanently accessible at their versioned URLs. A version MUST NOT be modified after it receives Published status.
| Version | Date | Summary |
|---|---|---|
| v0.1 | 2026-05-24 | Initial publication. Defines four signals: npm provenance, PyPI PEP 740, OSSF Scorecard (deps.dev primary + securityscorecards.dev fallback), signed commit ratio. Extension model and verification process defined. |
A. Field Reference
| Field | Type | Signal | Since |
|---|---|---|---|
| npm_provenance | boolean \| null | npm SLSA provenance attestation | Methodology v2.0 |
| pypi_provenance | boolean \| null | PyPI PEP 740 attestation | Methodology v2.0 |
| has_provenance | boolean \| null | npm_provenance OR pypi_provenance (derived) | Methodology v2.0 |
| scorecard_score | float \| null | OSSF Scorecard overall score [0–10] | Methodology v2.0 |
| scorecard_checks | object \| {} | Per-check Scorecard scores | Methodology v2.0 |
| signed_commits_ratio | float \| null | Fraction of signed commits [0.0–1.0] | Methodology v2.0 |
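The derived has_provenance field combines two nullable booleans, and the table does not say how null inputs are handled. One possible reading is sketched below: true if either input is true, null only if both are null, false otherwise. That null handling is an assumption, and the function name derive_has_provenance is hypothetical.

```python
from typing import Optional


def derive_has_provenance(npm: Optional[bool], pypi: Optional[bool]) -> Optional[bool]:
    # Any positive signal wins, regardless of the other input.
    if npm is True or pypi is True:
        return True
    # Assumption: null only when neither registry produced a value.
    if npm is None and pypi is None:
        return None
    # At least one registry was checked and none reported provenance.
    return False
```

Under this reading, an agent with only an npm package configured and no attestations gets false rather than null.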
B. Coverage as of 2026-05-24
| Signal | Coverage | Notes |
|---|---|---|
| signed_commits_ratio | 65/65 (100%) | GitHub API always returns data for accessible repos |
| npm_provenance = true | 4/11 npm-tracked agents | 4 of 11 npm packages use Trusted Publishing |
| pypi_provenance = true | 7/46 PyPI-tracked agents | Most packages predate PEP 740 (2024) |
| has_provenance | 11/65 (17%) | At least one of npm or PyPI provenance present |
| scorecard_score | 3/65 (5%) | securityscorecards.dev fallback added 2026-05-24; untested in production |