Is AgentBench safe to use?

AgentBench has an HVTrust score of 16.7/100 (Grade D), ranked #297 of 300 tracked AI agent projects. This score reflects verifiable supply-chain integrity, transparency, maintenance, and adoption signals.

What is the trust score for AgentBench?

AgentBench scores 16.7/100 on HVTracker's evidence-weighted HVTrust scale. It has 3.5k GitHub stars and is categorized under Observability & Evaluation.

AgentBench

THUDM/AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Observability & Evaluation · Python · Grade D · Listed · Apache-2.0


Listing state: Listed
HVTrust: 16.7/100 · Grade D
Last push: 2026-02-08 · 134d ago
Recent change: New

Quick Trust Read

Verdict

Thin or incomplete trust evidence. Review carefully before production use.

16.7/100 · Grade D

Strongest Signal: Identity / Provenance (10.8/18)
Weakest Signal: Safety / Integrity (2.5/25)

What Would Improve It

Add or improve OpenSSF Scorecard coverage so safety checks are easier to verify.

Recent Changes

2026-06-21

Newly Listed

First tracked at rank #297

2026-05-25

Removed From Active Tracking

Removed from active tracking

2026-05-24

Rank Moved

Rank dropped 32 spots (#51 → #83)

Maintainer Checklist

Add Scorecard coverage: Expose the repository to OpenSSF Scorecard checks so supply-chain posture is easier to verify.

Publish provenance: Add package provenance or release attestations so users can verify where shipped artifacts came from.

Refresh maintenance signals: The repo was last pushed 134 days ago. Fresh activity helps separate stable projects from stale ones.


Activity Score: 39.1 out of 100
HVTrust Score: 16.7 out of 100
Global Rank: #297 of 300
Rank in Observability & Evaluation: #20

How to read this: HVTrust (0–100) weighs supply-chain signals (provenance, OpenSSF Scorecard, signed commits, open license) alongside real-world adoption. The grade follows the trust score band: A ≥ 80, B ≥ 65, C ≥ 50, D < 50. A score of 16.7 therefore falls in Grade D.
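The grade bands quoted above can be written as a small lookup. This is a minimal sketch: the function name `hvtrust_grade` is hypothetical, and only the thresholds (80, 65, 50) come from the page.

```python
def hvtrust_grade(score: float) -> str:
    """Map a 0-100 HVTrust score to a letter grade using the published bands."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 80:
        return "A"
    if score >= 65:
        return "B"
    if score >= 50:
        return "C"
    return "D"  # everything below 50


print(hvtrust_grade(16.7))  # D
```

Note that the bands are inclusive at the lower edge, so a score of exactly 50 would be a C rather than a D.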

Rank Trend (chart, 2026-05-23 to 2026-06-21)

Activity & Reach

Stars

3.5k

Forks

263

Last Push

2026-02-08

134 days ago

Commits (4 wk)

Downloads (7d)

—

HN mentions (30d)

—

Open Issues

Rank Change

was #297

Analysis

HVTrust Dimensions

16.7 / 100 · 50.0% confidence

Safety / Integrity (OpenSSF, provenance, signatures): 2.5 / 25
Identity / Provenance (listing and build link): 10.8 / 18
Transparency (license and public checks): 8.5 / 17
Maintenance (freshness and commits): 3.1 / 20
Adoption (stars and downloads): 8.5 / 20
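The five dimension scores sum to 33.4 out of a possible 100, which is exactly twice the headline 16.7. One reading that fits the numbers is that the raw total is scaled by the 50.0% confidence figure. The page does not state this formula, so the sketch below is an assumption that merely reproduces the displayed values; the variable names are hypothetical.

```python
# (score, maximum) for each HVTrust dimension, copied from the page.
dimensions = {
    "Safety / Integrity": (2.5, 25),
    "Identity / Provenance": (10.8, 18),
    "Transparency": (8.5, 17),
    "Maintenance": (3.1, 20),
    "Adoption": (8.5, 20),
}

CONFIDENCE = 0.50  # "50.0% confidence" as shown; its role in the formula is assumed

raw_total = sum(score for score, _ in dimensions.values())
max_total = sum(cap for _, cap in dimensions.values())
adjusted = raw_total * CONFIDENCE  # assumed scaling, not a documented rule

print(round(raw_total, 1), max_total, round(adjusted, 1))  # 33.4 100 16.7
```

If the scaling assumption holds, missing evidence lowers the score twice: once through low dimension scores and again through reduced confidence.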

Activity Inputs

39.1 / 100

Stars (repository reach): 21.3 / 30
Freshness (last push recency): 6.4 / 25
Activity (recent commits): 0.0 / 25
Community (fork signal): 11.3 / 20
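As a quick consistency check, the four activity inputs can be summed and compared with the displayed Activity Score. The displayed inputs add up to 39.0, not 39.1; the likeliest explanation is that each input is rounded to one decimal before display, but that is an assumption and the page does not confirm it.

```python
# (score, maximum) for each activity input, copied from the page.
activity_inputs = {
    "Stars": (21.3, 30),
    "Freshness": (6.4, 25),
    "Activity": (0.0, 25),
    "Community": (11.3, 20),
}

DISPLAYED_ACTIVITY_SCORE = 39.1

total = sum(score for score, _ in activity_inputs.values())
gap = DISPLAYED_ACTIVITY_SCORE - total

print(round(total, 1), round(gap, 1))  # 39.0 0.1
```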

Supply Chain Trust

Package Provenance: None (no package attestations found)
OpenSSF Scorecard: Not available
Signed Commits: 51% of last 100 commits verified

Is AgentBench safe?

Public trust evidence for AgentBench is thin: no package provenance is published, no OpenSSF Scorecard data is available, and only about half of the last 100 commits are verified-signed. This does not mean the project is unsafe; it means an outside observer cannot easily verify the usual integrity checks. Treat it with extra scrutiny.

Does AgentBench publish package provenance?

No published build provenance is currently detected for AgentBench. This is common for open-source projects but means consumers cannot independently verify that the package on the registry matches the GitHub source.

Does AgentBench have an OpenSSF Scorecard?

No OpenSSF Scorecard data is currently published for AgentBench. Maintainers can enable the Scorecard GitHub Action to get a public score; without it, automated supply-chain hygiene is harder for outsiders to verify.
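To check whether Scorecard data has appeared since this snapshot, a reader could query the public Scorecard REST API. The sketch below assumes the endpoint `https://api.securityscorecards.dev/projects/github.com/{owner}/{repo}` and a top-level `score` field in the JSON response; verify both against the current OpenSSF Scorecard documentation before relying on them. The helper names are hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.request import urlopen

# Assumed public endpoint; confirm against the OpenSSF Scorecard docs.
SCORECARD_API = "https://api.securityscorecards.dev/projects/github.com"


def scorecard_url(owner: str, repo: str) -> str:
    """Build the lookup URL for a GitHub repository."""
    return f"{SCORECARD_API}/{owner}/{repo}"


def fetch_scorecard(owner: str, repo: str):
    """Return the parsed Scorecard result, or None if nothing is published."""
    try:
        with urlopen(scorecard_url(owner, repo), timeout=30) as resp:
            return json.load(resp)
    except HTTPError as err:
        if err.code == 404:  # treated here as "no data published"
            return None
        raise


def summarize(result) -> str:
    """Render a result the way the listing does: a score or 'Not available'."""
    if result is None:
        return "Not available"
    return f"{result.get('score')} / 10"


# Usage (performs a network request):
# print(summarize(fetch_scorecard("THUDM", "AgentBench")))
```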

Is AgentBench actively maintained?

Activity is slowing. The last push was 134 days ago (2026-02-08), and the recent-commits input scores 0.0 / 25. Keep an eye on whether activity resumes.
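The "days ago" figure is plain date arithmetic and depends on when the snapshot was taken. The page does not state its snapshot date, so the `as_of` values below are assumptions: 2026-06-22 reproduces the 134-day figure, while the latest date shown on the page (2026-06-21) gives 133.

```python
from datetime import date


def days_since_push(last_push: date, as_of: date) -> int:
    """Whole days elapsed between the last push and the observation date."""
    return (as_of - last_push).days


last_push = date(2026, 2, 8)  # "Last push" value from the page

print(days_since_push(last_push, date(2026, 6, 22)))  # 134
print(days_since_push(last_push, date(2026, 6, 21)))  # 133
```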

What license does AgentBench use?

AgentBench ships under Apache-2.0. A declared, OSI-approved license is one of the transparency signals HVTrust scores.

Are AgentBench's commits signed?

51% of the last 100 commits to AgentBench are verified-signed (GPG, SSH, S/MIME, or GitHub's signing flow). Signed commits help confirm that code was authored by who the commit claims.
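A figure like this can be recomputed from the GitHub REST API, whose commit listing includes a `commit.verification.verified` flag per commit. The sketch below is one way to do it, not HVTracker's actual implementation; the function names are hypothetical, and unauthenticated requests are subject to GitHub's rate limits.

```python
import json
from urllib.request import Request, urlopen


def verified_share(commits: list) -> float:
    """Percentage of commits whose signature GitHub marks as verified."""
    if not commits:
        return 0.0
    verified = 0
    for item in commits:
        # Guard against missing or null nested objects.
        verification = (item.get("commit") or {}).get("verification") or {}
        if verification.get("verified") is True:
            verified += 1
    return 100.0 * verified / len(commits)


def fetch_recent_commits(owner: str, repo: str, per_page: int = 100) -> list:
    """Fetch the most recent commits on the default branch (network call)."""
    url = f"https://api.github.com/repos/{owner}/{repo}/commits?per_page={per_page}"
    req = Request(url, headers={"Accept": "application/vnd.github+json"})
    with urlopen(req, timeout=30) as resp:
        return json.load(resp)


# Usage (performs a network request):
# print(verified_share(fetch_recent_commits("THUDM", "AgentBench")))
```

Keeping the calculation separate from the fetch makes it easy to test against saved API responses.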

Not a safety endorsement. HVTracker describes what public signals show, not whether a project is safe for your use case. Run your own security review before adopting in production.

AI agent surface

Profile context only

HVTrust currently ranks supply-chain and project-integrity trust only. This public view shows a compact AI-agent surface snapshot from repo docs and manifests. These fields are descriptive context and do not affect the production HVTrust rank. An experimental local preview remains available in Score Lab, and the policy boundary is tracked on the roadmap.

MCP Server Support

None detected

No MCP server signal detected.

Detailed evidence is not shown in the public view.

External Service Dependencies

high confidence

3 detected

Public provider/service dependencies detected.

Anthropic
OpenAI
Redis

Credential signal: No explicit API-key/config marker detected.

Tool / Plugin Surface

high confidence

Extensions

Extension-based plugin/integration surface detected.

browser
code
database
shell

Detailed evidence is not shown in the public view.

Package Provenance Drift

N/A

No package source configured

Detailed evidence is not shown in the public view.

Maintain AgentBench?

HVTrust scores AgentBench from public signals only — we never contact maintainers first. If a signal is wrong, stale, or missing (provenance you publish, a Scorecard you run, signed releases), tell us and we'll review it. Corrections are public and tracked on GitHub.



Embed Badge

Markdown:
[![HVTrust](https://hvtracker.net/badge/agentbench.svg)](https://hvtracker.net/agents/agentbench)

HTML:
<a href="https://hvtracker.net/agents/agentbench"><img src="https://hvtracker.net/badge/agentbench.svg" alt="HVTrust"></a>


Data sources
GitHub REST API (repo, commits, stars, forks, license)
Each agent's signals refresh once daily across 6 staggered batches. Methodology v3.2.