HVTracker Data Schema Specification
https://hvtracker.net/spec/data-schema/v0.1
1. Abstract
This document defines the schema for all machine-readable data published by HVTracker at the /data/ endpoint family. It specifies the URL catalog, field definitions, data types, nullability rules, refresh cadence, and the versioning policy governing schema evolution.
The key words MUST, MUST NOT, SHOULD, SHOULD NOT, and MAY in this document are to be interpreted as described in RFC 2119.
2. Motivation
HVTracker publishes daily health scores and trust signals for open-source AI agent projects. As third-party consumers begin building integrations — dashboards, alerts, research pipelines — the absence of a formal schema creates fragility: a field rename or type change silently breaks downstream consumers.
A formal data schema specification serves three purposes:
- Stability contract: Consumers can depend on documented fields not changing without a version increment.
- Discovery: The endpoint catalog and field definitions document what data exists, removing the need to reverse-engineer data.json.
- Trust: A versioned, published schema signals that the dataset is intended as infrastructure, not just a build artifact.
3. Terminology
- Snapshot: A complete dataset capture as of a single daily cron run. Each snapshot contains all agent records with values reflecting the state of the world at generation time.
- Agent record: A JSON object representing a single tracked project. Every agent record contains at minimum the fields defined in Section 5.2.
- Signal: A measured attribute of an agent record. Signals are either activity signals (stars, commits, HN mentions), trust signals (provenance, scorecard, signed commits), or behavioral signals (future: public action counts).
- Envelope: The top-level fields present in every endpoint response, defined in Section 5.1. Envelope fields carry metadata about the response itself rather than about individual agents.
- Slug: A URL-safe identifier derived from the agent name by lowercasing and replacing non-alphanumeric characters with hyphens. Used to construct per-agent endpoint URLs.
- Null: A JSON null value indicating the signal could not be collected during this cron run. null is semantically distinct from zero and from the absence of a field.
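The slug rule can be sketched in a few lines of Python. This is an illustrative reading of the definition, not HVTracker's own generator: it assumes "alphanumeric" means ASCII letters and digits, and that each non-alphanumeric character becomes its own hyphen with no collapsing or trimming. The names `slugify`, `agent_url`, and `AGENTS_BASE` are hypothetical.

```python
import re

# Base path for per-agent files, taken from the endpoint catalog.
AGENTS_BASE = "https://hvtracker.net/data/agents/"


def slugify(name: str) -> str:
    """Lowercase the name, then replace every character outside a-z and 0-9
    with a hyphen (assumption: one hyphen per character, no collapsing)."""
    return re.sub(r"[^a-z0-9]", "-", name.lower())


def agent_url(name: str) -> str:
    """Build the per-agent endpoint URL from a display name."""
    return f"{AGENTS_BASE}{slugify(name)}.json"
```

If the real generator collapses runs of hyphens, slugs for names containing several adjacent separators would differ from this sketch, so consumers may prefer to confirm a slug against a published file before relying on it.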
4. Endpoint Catalog
All endpoints are static JSON files (except the HTML index) served from https://hvtracker.net/data/. Files are regenerated on each daily cron run at 06:00 UTC. CORS header Access-Control-Allow-Origin: * is set on all /data/* responses.
4.1 /data/latest.json
Content: Full snapshot — envelope fields plus an agents array containing all active agent records.
Refresh: Daily at 06:00 UTC, atomically replaced.
Use case: Primary integration point. Fetch once daily to get the complete leaderboard.
Size bound: SHOULD remain under 500 KB. If the dataset grows beyond this, the maintainer MUST split the endpoint or switch to pagination before the next schema version.
4.2 /data/agents/<slug>.json
Content: Single agent record (all fields from Section 5.2) plus a history array containing the last 90 days of daily snapshots for this agent (Section 5.3).
URL construction: https://hvtracker.net/data/agents/{slug}.json where {slug} is the agent's slug as defined in Section 3.
Refresh: Daily at 06:00 UTC.
Size bound: SHOULD remain under 50 KB per file.
Note: Files for legacy agents are also generated but tagged with "status": "legacy" in the agent record.
4.3 /data/signals/scorecard.json
Content: Envelope fields plus an agents array where each element contains only the fields: repo, name, scorecard_score, scorecard_checks, signed_commits_ratio.
Use case: Supply-chain security consumers who need trust signals without the full dataset.
4.4 /data/signals/provenance.json
Content: Envelope fields plus an agents array where each element contains: repo, name, has_provenance, npm_provenance, pypi_provenance.
Use case: Package provenance monitoring, SBOM pipelines.
4.5 /data/history/<YYYY-MM-DD>.json
Content: Full snapshot for the named calendar date (UTC). Same structure as /data/latest.json.
Permanence: Historical files are never deleted or overwritten. A file at /data/history/2026-05-25.json will remain accessible indefinitely.
Availability: Files exist for every date on which the cron ran successfully. Gaps are possible during outages.
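As a rough illustration of the history URL scheme, the following sketch builds the URL for one UTC date and for an inclusive date range. The helper names are hypothetical, and the code performs no network requests; because gaps are possible, a caller would presumably treat a missing file as "no snapshot for that date" rather than as a fatal error.

```python
from datetime import date, timedelta

# Base path for dated snapshots, taken from the endpoint catalog.
HISTORY_BASE = "https://hvtracker.net/data/history/"


def history_url(day: date) -> str:
    """Return the snapshot URL for one calendar date (YYYY-MM-DD)."""
    return f"{HISTORY_BASE}{day.isoformat()}.json"


def history_urls(start: date, end: date) -> list[str]:
    """Return snapshot URLs for every date from start to end, inclusive.
    Some of these may not exist if the cron run failed on that day."""
    span = (end - start).days
    return [history_url(start + timedelta(days=i)) for i in range(span + 1)]
```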
4.6 /data/index.html
Content: Human-readable HTML catalog listing all available endpoints with descriptions, links, and generation metadata. Not machine-readable.
5. Field Definitions
5.1 Envelope Fields
Every endpoint response (except /data/index.html) is a JSON object containing the following envelope fields at the top level.
| Field | Type | Nullable | Description |
|---|---|---|---|
| schema_version | string | No | Schema version string, e.g. "v0.1". Incremented per Section 6. |
| generated_at | string | No | ISO 8601 UTC timestamp of this cron run, e.g. "2026-05-25 06:00 UTC". |
| methodology_version | string | No | Methodology spec version used to compute scores, e.g. "v2.0". |
| license | string | No | Data license declaration. Current value: "CC BY 4.0 — https://creativecommons.org/licenses/by/4.0/". |
| updated | string | No | Human-readable generation time (same as generated_at). |
| total | integer | No | Count of active (non-legacy) agent records in this snapshot. |
| agents | array | No | Array of agent records. See Section 5.2. |
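A consumer might check the envelope before reading any agent records. The sketch below is one possible approach, assuming the response has already been parsed into a Python dict. `ENVELOPE_TYPES` and `envelope_errors` are hypothetical names, and the check covers only presence, nullability, and JSON type, not the format of individual values.

```python
# Expected JSON type of each envelope field, per the table above.
ENVELOPE_TYPES = {
    "schema_version": str,
    "generated_at": str,
    "methodology_version": str,
    "license": str,
    "updated": str,
    "total": int,
    "agents": list,
}


def envelope_errors(doc: dict) -> list[str]:
    """Return a list of problems found in the envelope (empty if none)."""
    errors = []
    for field, expected in ENVELOPE_TYPES.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif doc[field] is None:
            # No envelope field is nullable.
            errors.append(f"null not allowed: {field}")
        elif isinstance(doc[field], bool) or not isinstance(doc[field], expected):
            # bool is rejected explicitly because Python treats it as an int.
            errors.append(f"wrong type: {field}")
    return errors
```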
5.2 Agent Record Fields
Each element of the agents array is an agent record with the following fields.
| Field | Type | Nullable | Description |
|---|---|---|---|
| name | string | No | Display name of the project. |
| repo | string | No | GitHub repository path, e.g. "All-Hands-AI/OpenHands". |
| url | string | No | Canonical GitHub URL. |
| rank | integer | No | Global rank by health score (1 = highest). Active agents only. |
| previous_rank | integer \| null | Yes | Rank from the previous daily snapshot. null for newly added agents. |
| rank_delta | integer \| null | Yes | Change in rank since previous snapshot. Positive = improved rank. |
| stars | integer | No | GitHub star count at collection time. |
| stars_fmt | string | No | Human-formatted star count, e.g. "45.2k". |
| forks | integer | No | GitHub fork count at collection time. |
| forks_fmt | string | No | Human-formatted fork count. |
| last_push | string | No | ISO 8601 UTC timestamp of the most recent push to the default branch. |
| days_ago | integer | No | Days since last_push as of the collection date. |
| weekly_commits | integer \| null | Yes | Commit count in the trailing 4 weeks. null if the GitHub stats API did not return data. |
| commits_low_confidence | boolean | No | true when weekly_commits was derived from a single-page estimate rather than a full count. |
| score | number | No | Health score [0–100] computed per Methodology v2.0. One decimal place. |
| description | string \| null | Yes | Repository description from GitHub API. |
| language | string \| null | Yes | Primary programming language reported by GitHub. |
| open_issues | integer | No | Open issue count at collection time. |
| category | string | No | HVTracker category. One of: Coding Agents, Agent Frameworks, Workflow Platforms, Browser & Computer Use, LLM Gateways & Infra, Memory & Knowledge, Research & Data, Multi-Agent Systems. |
| category_rank | integer \| null | Yes | Rank within the agent's category. |
| npm_package | string | No | npm package name if tracked, else empty string. |
| pypi_package | string | No | PyPI package name if tracked, else empty string. |
| weekly_downloads | integer \| null | Yes | Combined weekly download count (npm + PyPI). null if no package is tracked or download fetch failed. |
| dl_source | string | No | Source label for weekly_downloads, e.g. "pypi", "npm+pypi". Empty string if no downloads tracked. |
| hn_mentions_30d | integer \| null | Yes | Count of Hacker News story mentions in the trailing 30 days. null if no search term configured. |
| has_provenance | boolean \| null | Yes | Derived: true if npm_provenance or pypi_provenance is true. |
| npm_provenance | boolean \| null | Yes | SLSA provenance attestation detected on npm package. null if no npm package tracked. |
| pypi_provenance | boolean \| null | Yes | PEP 740 attestation detected on PyPI package. null if no PyPI package tracked. |
| signed_commits_ratio | number \| null | Yes | Fraction of recent commits with GPG/SSH signatures [0.0–1.0]. null if unavailable. |
| scorecard_score | number \| null | Yes | OSSF Scorecard overall score [0.0–10.0]. null if not yet scanned. |
| scorecard_checks | object | No | Per-check Scorecard scores as a flat object. Empty object {} if no scorecard data. |
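Two of the conventions above are easy to get wrong in consumer code: the sign of rank_delta and the difference between null and zero. The sketch below illustrates both under stated assumptions. It assumes rank_delta equals previous_rank minus rank, which is consistent with "Positive = improved rank" when rank 1 is the highest, but the exact formula is not given in this document. The function names are hypothetical.

```python
from typing import Optional


def rank_delta(rank: int, previous_rank: Optional[int]) -> Optional[int]:
    """Assumed formula: previous_rank - rank, so moving from rank 5 to
    rank 3 yields +2. Returns None when there is no previous rank."""
    if previous_rank is None:
        return None
    return previous_rank - rank


def total_weekly_downloads(agents: list[dict]) -> int:
    """Sum weekly_downloads across records, skipping null values.
    A null means "not collected" and is not counted as zero downloads."""
    return sum(
        a["weekly_downloads"]
        for a in agents
        if a.get("weekly_downloads") is not None
    )
```

Skipping nulls rather than coercing them to zero keeps aggregates from understating projects whose download fetch simply failed on a given run.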
5.3 History Point Fields
Each element of the history array in per-agent endpoint responses (Section 4.2) is a history point:
| Field | Type | Nullable | Description |
|---|---|---|---|
| date | string | No | Calendar date of this snapshot, YYYY-MM-DD format (UTC). |
| rank | integer \| null | Yes | Global rank on this date. |
| score | number \| null | Yes | Health score on this date. |
| stars | integer \| null | Yes | Star count on this date. |
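Because every history value except date is nullable, consumers computing trends need to skip missing points. The following is a minimal sketch, assuming the history array has been parsed into a list of dicts; `star_growth` is a hypothetical helper, not part of the published data.

```python
from typing import Optional


def star_growth(history: list[dict]) -> Optional[int]:
    """Return the change in stars between the earliest and latest history
    points that have a non-null star count, or None if fewer than two
    such points exist."""
    # YYYY-MM-DD strings sort chronologically as plain strings.
    points = sorted(
        (p for p in history if p.get("stars") is not None),
        key=lambda p: p["date"],
    )
    if len(points) < 2:
        return None
    return points[-1]["stars"] - points[0]["stars"]
```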
5.4 Signal Subset Fields
Signal subset endpoints (Sections 4.3 and 4.4) use the same envelope as Section 5.1 but their agents array contains a reduced record. The exact field set for each subset is defined in Sections 4.3 and 4.4 respectively.
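The relationship between a full agent record and a subset record can be pictured as a simple projection. The sketch below is illustrative only and does not claim to be how HVTracker generates the subset files; `project` and the two constant names are hypothetical, while the field lists are copied from Sections 4.3 and 4.4.

```python
# Field sets for the two signal subset endpoints.
SCORECARD_FIELDS = (
    "repo", "name", "scorecard_score", "scorecard_checks", "signed_commits_ratio",
)
PROVENANCE_FIELDS = (
    "repo", "name", "has_provenance", "npm_provenance", "pypi_provenance",
)


def project(record: dict, fields: tuple[str, ...]) -> dict:
    """Return a reduced record containing only the named fields.
    Raises KeyError if the full record lacks one of them."""
    return {field: record[field] for field in fields}
```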
6. Schema Evolution
The schema is versioned as v{major}.{minor}. The schema_version envelope field carries the current version string.
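A consumer could parse the version string and refuse data from a major version it was not written for. This is a sketch of one such guard, assuming version strings always match v{major}.{minor} with decimal integers. Both function names are hypothetical.

```python
import re


def parse_schema_version(version: str) -> tuple[int, int]:
    """Split a string such as "v0.1" into (major, minor) integers."""
    match = re.fullmatch(r"v(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized schema_version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def is_compatible(version: str, supported_major: int, min_minor: int = 0) -> bool:
    """True if the major version matches and the minor version is at least
    the one the consumer was built against."""
    major, minor = parse_schema_version(version)
    return major == supported_major and minor >= min_minor
```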
6.1 Additive changes (minor version bump)
The following changes increment the minor version and are considered non-breaking:
- Adding a new field to agent records or envelope.
- Adding a new endpoint to the catalog.
- Widening a type (e.g., integer → integer | null).
- Adding new allowed values to an enum field.
Consumers SHOULD be written to ignore unknown fields so that minor version bumps do not break integrations.
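One way to follow this recommendation is to copy only the fields the consumer knows about into a typed structure and drop everything else. The sketch below assumes a consumer that needs five fields; `AgentSummary` and `parse_agent` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class AgentSummary:
    """The subset of agent record fields this example consumer uses."""
    name: str
    repo: str
    rank: int
    score: float
    weekly_commits: Optional[int] = None


def parse_agent(record: dict) -> AgentSummary:
    """Build an AgentSummary, silently ignoring fields it does not know,
    so fields added in a minor version do not raise errors."""
    known = {f.name for f in fields(AgentSummary)}
    return AgentSummary(**{k: v for k, v in record.items() if k in known})
```

Passing the raw record straight to the constructor would raise a TypeError as soon as an unknown field appeared, which is exactly the fragility the recommendation is meant to avoid.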
6.2 Breaking changes (major version bump)
The following changes increment the major version:
- Removing or renaming an existing field.
- Changing a field's type in a non-widening way.
- Changing the meaning of an existing field.
- Removing an endpoint from the catalog.
- Changing the URL structure of an existing endpoint.
When a major version is published, the previous major version's /data/latest.json remains accessible for a minimum of 90 days at a versioned URL (e.g., /data/v0/latest.json).
6.3 Adding new signal classes
New signal classes (e.g., behavioral signals introduced in Task 3 of the roadmap) are always introduced as additive fields and result in a minor version bump. They are not included in the health score formula without a Methodology spec version bump.
7. Data License
All data published at https://hvtracker.net/data/ is released under Creative Commons Attribution 4.0 International (CC BY 4.0).
You are free to share and adapt the data for any purpose, including commercial use, provided you give appropriate credit to HVTracker and link to https://hvtracker.net.
Source data (GitHub stars, commit activity, etc.) is sourced from the GitHub REST API and is subject to GitHub's terms of service. HVTracker does not grant rights to data that it does not own.
8. Versioning and Changelog
| Version | Date | Summary |
|---|---|---|
| v0.1 | 2026-05-25 | Initial publication. Defines 5 JSON endpoints plus an HTML index, 31 agent record fields, envelope format, history points, and schema evolution policy. |
A. Field Quick Reference
| Field | Type | Nullable | Category |
|---|---|---|---|
| name | string | No | Identity |
| repo | string | No | Identity |
| url | string | No | Identity |
| rank | integer | No | Ranking |
| previous_rank | integer \| null | Yes | Ranking |
| rank_delta | integer \| null | Yes | Ranking |
| score | number | No | Score |
| stars | integer | No | Activity |
| forks | integer | No | Activity |
| last_push | string | No | Activity |
| days_ago | integer | No | Activity |
| weekly_commits | integer \| null | Yes | Activity |
| weekly_downloads | integer \| null | Yes | Activity |
| hn_mentions_30d | integer \| null | Yes | Community |
| category | string | No | Classification |
| language | string \| null | Yes | Classification |
| has_provenance | boolean \| null | Yes | Trust |
| npm_provenance | boolean \| null | Yes | Trust |
| pypi_provenance | boolean \| null | Yes | Trust |
| signed_commits_ratio | number \| null | Yes | Trust |
| scorecard_score | number \| null | Yes | Trust |
| scorecard_checks | object | No | Trust |