HVTracker Data Schema Specification

Version: v0.1
Status: Published
Published: 2026-05-25
Authors: HVTracker

1. Abstract

This document defines the schema for all machine-readable data published by HVTracker at the /data/ endpoint family. It specifies the URL catalog, field definitions, data types, nullability rules, refresh cadence, and the versioning policy governing schema evolution.

The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119.

2. Motivation

HVTracker publishes daily health scores and trust signals for open-source AI agent projects. As third-party consumers begin building integrations — dashboards, alerts, research pipelines — the absence of a formal schema creates fragility: a field rename or type change silently breaks downstream consumers.

A formal data schema specification serves three purposes:

  • Stability contract: Consumers can depend on documented fields not changing without a version increment.
  • Discovery: The endpoint catalog and field definitions document what data exists, removing the need to reverse-engineer data.json.
  • Trust: A versioned, published schema signals that the dataset is intended as infrastructure, not just a build artifact.

3. Terminology

Snapshot
A complete dataset capture as of a single daily cron run. Each snapshot contains all agent records with values reflecting the state of the world at generation time.
Agent record
A JSON object representing a single tracked project. Every agent record contains at minimum the fields defined in Section 5.2.
Signal
A measured attribute of an agent record. Signals are either activity signals (stars, commits, HN mentions), trust signals (provenance, scorecard, signed commits), or behavioral signals (future: public action counts).
Envelope
The top-level fields present in every endpoint response, defined in Section 5.1. Envelope fields carry metadata about the response itself rather than about individual agents.
Slug
A URL-safe identifier derived from the agent name by lowercasing and replacing non-alphanumeric characters with hyphens. Used to construct per-agent endpoint URLs.
Null
A JSON null value indicating the signal could not be collected during this cron run. null is semantically distinct from zero and from the absence of a field.
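
The slug rule above can be sketched in a few lines of Python. This is an illustration under assumptions the definition does not state: "alphanumeric" is taken to mean ASCII a-z and 0-9, runs of other characters collapse into a single hyphen, and leading or trailing hyphens are trimmed. The function names slugify and agent_url are hypothetical, and HVTracker's own generator may differ in these edge cases.

```python
import re


def slugify(name: str) -> str:
    """Lowercase the name and replace non-alphanumeric characters with hyphens.

    Assumptions (not stated in the definition): ASCII-only alphanumerics,
    runs of other characters collapse to one hyphen, and leading/trailing
    hyphens are trimmed.
    """
    lowered = name.lower()
    hyphenated = re.sub(r"[^a-z0-9]+", "-", lowered)
    return hyphenated.strip("-")


def agent_url(name: str) -> str:
    # URL pattern taken from the per-agent endpoint in Section 4.2.
    return f"https://hvtracker.net/data/agents/{slugify(name)}.json"
```

Because the edge cases are assumptions, a consumer that needs certainty may prefer to compare generated slugs against the links listed in the endpoint catalog.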

4. Endpoint Catalog

All endpoints are static files served from https://hvtracker.net/data/: JSON for every endpoint except the HTML index (Section 4.6). Files are regenerated on each daily cron run at 06:00 UTC. The CORS header Access-Control-Allow-Origin: * is set on all /data/* responses.

4.1 /data/latest.json

Content: Full snapshot — envelope fields plus an agents array containing all active agent records.

Refresh: Daily at 06:00 UTC, atomically replaced.

Use case: Primary integration point. Fetch once daily to get the complete leaderboard.

Size bound: SHOULD remain under 500 KB. If the dataset grows beyond this, the maintainer MUST split the endpoint or switch to pagination before the next schema version.
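
A minimal sketch of a daily consumer for this endpoint follows. It uses only the Python standard library; the function names are hypothetical, and the reading of "500 KB" as 500 * 1024 bytes is an assumption. Parsing is kept separate from the network call so the size check can be exercised without fetching anything.

```python
import json
import urllib.request

LATEST_URL = "https://hvtracker.net/data/latest.json"
SIZE_BOUND_BYTES = 500 * 1024  # assumption: "500 KB" means 500 * 1024 bytes


def parse_snapshot(raw: bytes) -> tuple[dict, bool]:
    """Return (snapshot, over_bound) for a raw latest.json body."""
    over_bound = len(raw) > SIZE_BOUND_BYTES
    return json.loads(raw), over_bound


def fetch_latest(timeout: float = 10.0) -> tuple[dict, bool]:
    # Performs the network request; call at most once per day.
    with urllib.request.urlopen(LATEST_URL, timeout=timeout) as response:
        return parse_snapshot(response.read())
```

A consumer might log a warning when over_bound is true, since the size bound is a SHOULD rather than a guarantee.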

4.2 /data/agents/<slug>.json

Content: Single agent record (all fields from Section 5.2) plus a history array containing the last 90 days of daily snapshots for this agent (Section 5.3).

URL construction: https://hvtracker.net/data/agents/{slug}.json where {slug} is the agent's slug as defined in Section 3.

Refresh: Daily at 06:00 UTC.

Size bound: SHOULD remain under 50 KB per file.

Note: Files for legacy agents are also generated but tagged with "status": "legacy" in the agent record.

4.3 /data/signals/scorecard.json

Content: Envelope fields plus an agents array where each element contains only the fields: repo, name, scorecard_score, scorecard_checks, signed_commits_ratio.

Use case: Supply-chain security consumers who need trust signals without the full dataset.

4.4 /data/signals/provenance.json

Content: Envelope fields plus an agents array where each element contains: repo, name, has_provenance, npm_provenance, pypi_provenance.

Use case: Package provenance monitoring, SBOM pipelines.

4.5 /data/history/<YYYY-MM-DD>.json

Content: Full snapshot for the named calendar date (UTC). Same structure as /data/latest.json.

Permanence: Historical files are never deleted or overwritten. A file at /data/history/2026-05-25.json will remain accessible indefinitely.

Availability: Files exist for every date on which the cron ran successfully. Gaps are possible during outages.
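
The date-keyed URL pattern lends itself to a small helper for walking a date range. The sketch below only builds URLs; the helper names are hypothetical.

```python
from datetime import date, timedelta

HISTORY_BASE = "https://hvtracker.net/data/history"


def history_url(day: date) -> str:
    # date.isoformat() yields YYYY-MM-DD, matching the endpoint pattern.
    return f"{HISTORY_BASE}/{day.isoformat()}.json"


def history_urls(start: date, end: date) -> list[str]:
    """URLs for every calendar date from start to end, inclusive."""
    span = (end - start).days
    return [history_url(start + timedelta(days=offset)) for offset in range(span + 1)]
```

Since gaps are possible, a consumer iterating over this list would need to tolerate missing files. Treating an HTTP 404 as a gap is an assumption about server behavior, not something this document specifies.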

4.6 /data/index.html

Content: Human-readable HTML catalog listing all available endpoints with descriptions, links, and generation metadata. Not machine-readable.

5. Field Definitions

5.1 Envelope Fields

Every endpoint response (except /data/index.html) is a JSON object containing the following envelope fields at the top level.

Field                   Type            Nullable  Description
schema_version          string          No        Schema version string, e.g. "v0.1". Incremented per Section 6.
generated_at            string          No        ISO 8601 UTC timestamp of this cron run, e.g. "2026-05-25 06:00 UTC".
methodology_version     string          No        Methodology spec version used to compute scores, e.g. "v2.0".
license                 string          No        Data license declaration. Current value: "CC BY 4.0 — https://creativecommons.org/licenses/by/4.0/".
updated                 string          No        Human-readable generation time (same as generated_at).
total                   integer         No        Count of active (non-legacy) agent records in this snapshot.
agents                  array           No        Array of agent records. See Section 5.2.
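
A consumer can check the envelope before touching agent data. The following sketch maps each envelope field to the Python type that json.loads would produce for its documented JSON type; the function name envelope_errors is hypothetical, and the check covers presence and type only, not value formats.

```python
ENVELOPE_TYPES = {
    "schema_version": str,
    "generated_at": str,
    "methodology_version": str,
    "license": str,
    "updated": str,
    "total": int,
    "agents": list,
}


def envelope_errors(response: dict) -> list[str]:
    """Return a list of problems; an empty list means the envelope looks valid."""
    errors = []
    for field, expected in ENVELOPE_TYPES.items():
        if field not in response:
            errors.append(f"missing field: {field}")
            continue
        value = response[field]
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(value, bool) or not isinstance(value, expected):
            errors.append(f"wrong type for {field}: {type(value).__name__}")
    return errors
```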

5.2 Agent Record Fields

Each element of the agents array is an agent record with the following fields.

Field                   Type            Nullable  Description
name                    string          No        Display name of the project.
repo                    string          No        GitHub repository path, e.g. "All-Hands-AI/OpenHands".
url                     string          No        Canonical GitHub URL.
rank                    integer         No        Global rank by health score (1 = highest). Active agents only.
previous_rank           integer | null  Yes       Rank from the previous daily snapshot. null for newly added agents.
rank_delta              integer | null  Yes       Change in rank since previous snapshot. Positive = improved rank.
stars                   integer         No        GitHub star count at collection time.
stars_fmt               string          No        Human-formatted star count, e.g. "45.2k".
forks                   integer         No        GitHub fork count at collection time.
forks_fmt               string          No        Human-formatted fork count.
last_push               string          No        ISO 8601 UTC timestamp of the most recent push to the default branch.
days_ago                integer         No        Days since last_push as of the collection date.
weekly_commits          integer | null  Yes       Commit count in the trailing 4 weeks. null if the GitHub stats API did not return data.
commits_low_confidence  boolean         No        true when weekly_commits was derived from a single-page estimate rather than a full count.
score                   number          No        Health score [0–100] computed per Methodology v2.0. One decimal place.
description             string | null   Yes       Repository description from GitHub API.
language                string | null   Yes       Primary programming language reported by GitHub.
open_issues             integer         No        Open issue count at collection time.
category                string          No        HVTracker category. One of: Coding Agents, Agent Frameworks, Workflow Platforms, Browser & Computer Use, LLM Gateways & Infra, Memory & Knowledge, Research & Data, Multi-Agent Systems.
category_rank           integer | null  Yes       Rank within the agent's category.
npm_package             string          No        npm package name if tracked, else empty string.
pypi_package            string          No        PyPI package name if tracked, else empty string.
weekly_downloads        integer | null  Yes       Combined weekly download count (npm + PyPI). null if no package is tracked or download fetch failed.
dl_source               string          No        Source label for weekly_downloads, e.g. "pypi", "npm+pypi". Empty string if no downloads tracked.
hn_mentions_30d         integer | null  Yes       Count of Hacker News story mentions in the trailing 30 days. null if no search term configured.
has_provenance          boolean | null  Yes       Derived: true if npm_provenance or pypi_provenance is true.
npm_provenance          boolean | null  Yes       SLSA provenance attestation detected on npm package. null if no npm package tracked.
pypi_provenance         boolean | null  Yes       PEP 740 attestation detected on PyPI package. null if no PyPI package tracked.
signed_commits_ratio    number | null   Yes       Fraction of recent commits with GPG/SSH signatures [0.0–1.0]. null if unavailable.
scorecard_score         number | null   Yes       OSSF Scorecard overall score [0.0–10.0]. null if not yet scanned.
scorecard_checks        object          No        Per-check Scorecard scores as a flat object. Empty object {} if no scorecard data.
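
Two points in this table tend to trip up consumers: the distinction between null, zero, and an absent field (Section 3), and the sign convention of rank_delta. The sketch below illustrates both. The function names are hypothetical, and the formula previous_rank - rank is an assumption inferred from "Positive = improved rank"; the table does not state it explicitly.

```python
from typing import Optional

_MISSING = object()  # sentinel that distinguishes "key absent" from JSON null


def signal_state(record: dict, field: str) -> str:
    """Classify a signal as 'absent', 'not_collected', or 'collected'."""
    value = record.get(field, _MISSING)
    if value is _MISSING:
        return "absent"  # field is not present in this record at all
    if value is None:
        return "not_collected"  # JSON null: collection failed or did not apply
    return "collected"  # includes legitimate zero values


def expected_rank_delta(rank: int, previous_rank: Optional[int]) -> Optional[int]:
    # Assumed formula: moving from rank 5 to rank 3 yields +2 (improved).
    if previous_rank is None:
        return None
    return previous_rank - rank
```

Under this reading, an agent with hn_mentions_30d equal to 0 had no mentions, while null means the count was never measured, so the two should not be merged when aggregating.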

5.3 History Point Fields

Each element of the history array in per-agent endpoint responses (Section 4.2) is a history point:

Field                   Type            Nullable  Description
date                    string          No        Calendar date of this snapshot, YYYY-MM-DD format (UTC).
rank                    integer | null  Yes       Global rank on this date.
score                   number | null   Yes       Health score on this date.
stars                   integer | null  Yes       Star count on this date.
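
Because every history signal is nullable, trend calculations need to skip points where the value was not collected. As a rough illustration, the hypothetical helper below computes star growth across a history array. It sorts by date rather than assuming the array arrives in chronological order, since this document does not specify the ordering.

```python
from typing import Optional


def star_growth(history: list[dict]) -> Optional[int]:
    """Star change between the earliest and latest points that have a star count."""
    points = sorted(
        (point for point in history if point.get("stars") is not None),
        key=lambda point: point["date"],  # YYYY-MM-DD sorts chronologically as text
    )
    if len(points) < 2:
        return None  # not enough collected data to compute a change
    return points[-1]["stars"] - points[0]["stars"]
```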

5.4 Signal Subset Fields

Signal subset endpoints (Sections 4.3 and 4.4) use the same envelope as Section 5.1 but their agents array contains a reduced record. The exact field set for each subset is defined in Sections 4.3 and 4.4 respectively.
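
The relationship between a full agent record and a subset record can be pictured as a simple projection. The sketch below uses the field lists from Sections 4.3 and 4.4; the function and constant names are hypothetical, and this is an illustration of the data shape rather than a description of how HVTracker generates the files.

```python
SCORECARD_FIELDS = ("repo", "name", "scorecard_score", "scorecard_checks", "signed_commits_ratio")
PROVENANCE_FIELDS = ("repo", "name", "has_provenance", "npm_provenance", "pypi_provenance")


def project(record: dict, fields: tuple[str, ...]) -> dict:
    """Reduce a full agent record to the named subset fields."""
    # Raises KeyError if the record lacks one of the requested fields.
    return {field: record[field] for field in fields}
```

A consumer that already fetches /data/latest.json could apply the same projection locally instead of requesting the subset endpoints.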

6. Schema Evolution

The schema is versioned as v{major}.{minor}. The schema_version envelope field carries the current version string.

6.1 Additive changes (minor version bump)

The following changes increment the minor version and are considered non-breaking:

  • Adding a new field to agent records or envelope.
  • Adding a new endpoint to the catalog.
  • Widening a type (e.g., integer → integer | null).
  • Adding new allowed values to an enum field.

Consumers SHOULD be written to ignore unknown fields so that minor version bumps do not break integrations.
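
One way to follow this recommendation is to copy only the fields a consumer actually uses into a typed structure and drop everything else. The sketch below does this with a dataclass; AgentSummary and summary_from_record are hypothetical names, and the chosen fields are an arbitrary subset of Section 5.2.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AgentSummary:
    # A consumer-chosen subset of the agent record fields.
    name: str
    repo: str
    rank: int
    score: float
    weekly_downloads: Optional[int] = None


def summary_from_record(record: dict) -> AgentSummary:
    known = {f.name for f in fields(AgentSummary)}
    # Keys outside the known set (for example, fields added in a later
    # minor version) are silently dropped instead of raising an error.
    return AgentSummary(**{key: value for key, value in record.items() if key in known})
```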

6.2 Breaking changes (major version bump)

The following changes increment the major version:

  • Removing or renaming an existing field.
  • Changing a field's type in a non-widening way.
  • Changing the meaning of an existing field.
  • Removing an endpoint from the catalog.
  • Changing the URL structure of an existing endpoint.

When a major version is published, the previous major version's /data/latest.json remains accessible for a minimum of 90 days at a versioned URL (e.g., /data/v0/latest.json).
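
A consumer can guard against breaking changes by parsing schema_version and refusing to process a major version it was not written for. The following is a minimal sketch; the function names and the SUPPORTED_MAJOR constant are hypothetical.

```python
import re

SUPPORTED_MAJOR = 0  # the major version this hypothetical consumer targets


def parse_schema_version(version: str) -> tuple[int, int]:
    """Split a v{major}.{minor} string into integers."""
    match = re.fullmatch(r"v(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized schema_version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_compatible(version: str) -> bool:
    # Minor bumps are additive, so only the major component is compared.
    major, _minor = parse_schema_version(version)
    return major == SUPPORTED_MAJOR
```

When is_compatible returns False, a consumer could fall back to the versioned URL of the previous major version during the 90-day window while it is updated.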

6.3 Adding new signal classes

New signal classes (e.g., behavioral signals introduced in Task 3 of the roadmap) are always introduced as additive fields and result in a minor version bump. They are not included in the health score formula without a Methodology spec version bump.

7. Data License

All data published at https://hvtracker.net/data/ is released under Creative Commons Attribution 4.0 International (CC BY 4.0).

You are free to share and adapt the data for any purpose, including commercial use, provided you give appropriate credit to HVTracker and link to https://hvtracker.net.

Source data (GitHub stars, commit activity, etc.) is sourced from the GitHub REST API and is subject to GitHub's terms of service. HVTracker does not grant rights to data that it does not own.

8. Versioning and Changelog

Version  Date        Summary
v0.1     2026-05-25  Initial publication. Defines 5 JSON endpoints and an HTML index, 31 agent record fields, envelope format, history points, and schema evolution policy.

A. Field Quick Reference

Field                   Type            Nullable  Category
name                    string          No        Identity
repo                    string          No        Identity
url                     string          No        Identity
rank                    integer         No        Ranking
previous_rank           integer | null  Yes       Ranking
rank_delta              integer | null  Yes       Ranking
score                   number          No        Score
stars                   integer         No        Activity
forks                   integer         No        Activity
last_push               string          No        Activity
days_ago                integer         No        Activity
weekly_commits          integer | null  Yes       Activity
weekly_downloads        integer | null  Yes       Activity
hn_mentions_30d         integer | null  Yes       Community
category                string          No        Classification
language                string | null   Yes       Classification
has_provenance          boolean | null  Yes       Trust
npm_provenance          boolean | null  Yes       Trust
pypi_provenance         boolean | null  Yes       Trust
signed_commits_ratio    number | null   Yes       Trust
scorecard_score         number | null   Yes       Trust
scorecard_checks        object          No        Trust