AgentBench

A Comprehensive Benchmark to Evaluate LLMs as Agents (ICLR'24)

Observability & Evaluation · Python · Grade D · Listed · Apache-2.0
Listing state
Listed
HVTrust
16.7/100 · Grade D
Last push
2026-02-08 · 134d ago
Recent change
New

Quick Trust Read

Verdict
Thin or incomplete trust evidence. Review carefully before production use.
Strongest Signal
Identity / Provenance
10.8/18
Weakest Signal
Safety / Integrity
2.5/25
What Would Improve It
Add or improve OSSF Scorecard coverage so safety checks are easier to verify.
Recent Changes
2026-06-21
Newly Listed
First tracked at rank #297
2026-05-25
Removed From Active Tracking
Removed from active tracking
2026-05-24
Rank Moved
Rank dropped 32 spots (#51 → #83)
Maintainer Checklist
Add Scorecard coverage: Expose the repository to OpenSSF Scorecard checks so supply-chain posture is easier to verify.
Publish provenance: Add package provenance or release attestations so users can verify where shipped artifacts came from.
Refresh maintenance signals: The repo was last pushed 134 days ago. Fresh activity helps separate stable projects from stale ones.
Activity Score: 39.1 out of 100
HVTrust Score: 16.7 out of 100
Global Rank: #297 of 300
#20

How to read this: HVTrust (0–100) weighs supply-chain signals (provenance, OSSF Scorecard, signed commits, open license) alongside real-world adoption. Grade D reflects the trust score band: A ≥ 80, B ≥ 65, C ≥ 50, D < 50.
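
The grade bands quoted above amount to a threshold lookup. The following is a minimal sketch, not HVTracker's implementation: the function name `hvtrust_grade` is hypothetical, and only the cut-offs (A ≥ 80, B ≥ 65, C ≥ 50, D < 50) come from the page.

```python
def hvtrust_grade(score: float) -> str:
    """Map a 0-100 trust score to a letter grade using the published bands."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # Bands as stated on the page: A >= 80, B >= 65, C >= 50, D < 50.
    for cutoff, grade in ((80, "A"), (65, "B"), (50, "C")):
        if score >= cutoff:
            return grade
    return "D"


print(hvtrust_grade(16.7))  # AgentBench's listed score -> D
```

Under these bands, AgentBench would need to gain more than 33 points to reach grade C.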

Signals refreshed 2026-06-22 00:01 UTC · Repo last pushed 134 days ago
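
The "134 days ago" figure can be checked against the two dates shown on the page. This small sketch assumes the age is counted in whole calendar days between the last push date (2026-02-08) and the refresh date (2026-06-22); the variable names are illustrative.

```python
from datetime import date

last_push = date(2026, 2, 8)    # "Last push" date from the page
refreshed = date(2026, 6, 22)   # "Signals refreshed" date from the page

# Subtracting two dates yields a timedelta; .days is the whole-day difference.
days_since_push = (refreshed - last_push).days
print(days_since_push)  # 134
```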

Rank Trend

[Chart: rank trend, 2026-05-23 to 2026-06-21]

Activity & Reach

Stars
3.5k
Forks
263
Last Push
2026-02-08
134 days ago
Commits (4 wk)
0
Downloads (7d)
HN mentions (30d)
Open Issues
74
Rank Change
=
was #297

Analysis

HVTrust Dimensions

16.7 / 100 · 50.0% confidence
Safety / Integrity (OSSF, provenance, signatures): 2.5 / 25
Identity / Provenance (listing and build link): 10.8 / 18
Transparency (license and public checks): 8.5 / 17
Maintenance (freshness and commits): 3.1 / 20
Adoption (stars and downloads): 8.5 / 20
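
The page does not state how the five dimension scores combine into the 16.7 headline. One reading that fits the displayed numbers is that the dimensions are summed and then scaled by the 50.0% confidence figure. This is an observation about the arithmetic and an assumption, not confirmed methodology.

```python
# Dimension scores and their maximums, copied from the page.
dimensions = {
    "Safety / Integrity": (2.5, 25),
    "Identity / Provenance": (10.8, 18),
    "Transparency": (8.5, 17),
    "Maintenance": (3.1, 20),
    "Adoption": (8.5, 20),
}

raw_total = round(sum(score for score, _ in dimensions.values()), 1)
max_total = sum(cap for _, cap in dimensions.values())

# ASSUMPTION: confidence acts as a multiplier on the raw total.
# The page shows "50.0% confidence" but does not say how it is applied.
confidence = 0.50
scaled_total = round(raw_total * confidence, 1)

print(raw_total, max_total, scaled_total)  # 33.4 100 16.7
```

If this reading is right, raising confidence (for example by publishing Scorecard data) would lift the headline score even with unchanged dimension values.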

Activity Inputs

39.1 / 100
Stars (repository reach): 21.3 / 30
Freshness (last push recency): 6.4 / 25
Activity (recent commits): 0.0 / 25
Community (fork signal): 11.3 / 20
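
As a quick consistency check, the four activity inputs can be added up and compared with the 39.1 headline. The sketch below assumes the Activity Score is a plain sum of its inputs, which the page implies but does not state.

```python
# Activity inputs and their maximums, copied from the page.
activity_inputs = {
    "Stars": (21.3, 30),
    "Freshness": (6.4, 25),
    "Activity": (0.0, 25),
    "Community": (11.3, 20),
}

component_sum = round(sum(score for score, _ in activity_inputs.values()), 1)
max_sum = sum(cap for _, cap in activity_inputs.values())
headline = 39.1

# Difference between the displayed headline and the sum of displayed inputs.
gap = round(headline - component_sum, 1)
print(component_sum, max_sum, gap)  # 39.0 100 0.1
```

The 0.1 gap is consistent with each input being rounded to one decimal for display, though that explanation is a guess.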

Supply Chain Trust

Package Provenance
None
No package attestations found
OSSF Scorecard
Not available
Signed Commits
51%
of last 100 commits verified

Is AgentBench safe?

Public trust evidence for AgentBench is thin: several supply-chain signals are missing or weak. This does not mean the project is unsafe — it means an outside observer cannot easily verify the usual integrity checks. Treat with extra scrutiny.
Does AgentBench publish package provenance?
No published build provenance is currently detected for AgentBench. This is common for open-source projects but means consumers cannot independently verify that the package on the registry matches the GitHub source.
Does AgentBench have an OpenSSF Scorecard?
No OpenSSF Scorecard data is currently published for AgentBench. Maintainers can enable the Scorecard GitHub Action to get a public score; without it, automated supply-chain hygiene is harder for outsiders to verify.
Is AgentBench actively maintained?
Slowing down. Last push was 134 days ago — keep an eye on whether activity resumes.
What license does AgentBench use?
AgentBench ships under Apache-2.0. A declared, OSI-approved license is one of the transparency signals HVTrust scores.
Are AgentBench's commits signed?
51% of the last 100 commits to AgentBench are verified-signed (GPG, SSH, S/MIME, or GitHub's signing flow). Signed commits help confirm that code was authored by the person the commit claims.
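
A verified-commit share like this can be reproduced from the GitHub REST API, whose commit listings include a `commit.verification.verified` flag on each entry. The sketch below is a hedged illustration: the helper name `verified_share` and the sample data are hypothetical, fetching the commits is left out, and whether HVTracker computes its figure this way is an assumption.

```python
def verified_share(commits: list[dict]) -> float:
    """Return the percentage of commits whose signature GitHub marks as verified.

    Each item is expected to look like an entry from the GitHub REST
    "list commits" response, where commit.verification.verified is a boolean.
    """
    if not commits:
        return 0.0
    verified = sum(
        1
        for item in commits
        if item.get("commit", {}).get("verification", {}).get("verified") is True
    )
    return 100 * verified / len(commits)


# Synthetic sample (not real AgentBench data): 3 of 4 commits verified.
sample = [
    {"commit": {"verification": {"verified": True}}},
    {"commit": {"verification": {"verified": True}}},
    {"commit": {"verification": {"verified": False}}},
    {"commit": {"verification": {"verified": True}}},
]
print(verified_share(sample))  # 75.0
```

Passing the most recent 100 commits of a repository to this function would give a figure comparable to the one shown above.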

Not a safety endorsement. HVTracker describes what public signals show, not whether a project is safe for your use case. Run your own security review before adopting in production.

AI agent surface

Profile context only

HVTrust currently ranks supply-chain and project-integrity trust only. This public view shows a compact AI-agent surface snapshot from repo docs and manifests. These fields are descriptive context and do not affect the production HVTrust rank. An experimental local preview remains available in Score Lab, and the policy boundary is tracked on the roadmap.

MCP Server Support
None detected
No MCP server signal detected.
Detailed evidence is not shown in the public view.
External Service Dependencies
high confidence
3 detected
Public provider/service dependencies detected.
  • Anthropic
  • OpenAI
  • Redis
Credential signal: No explicit API-key/config marker detected.
Tool / Plugin Surface
high confidence
Extensions
Extension-based plugin/integration surface detected.
  • browser
  • code
  • database
  • shell
Detailed evidence is not shown in the public view.
Package Provenance Drift
N/A
No package source configured
Detailed evidence is not shown in the public view.
  • MCP signal live
  • External deps live
  • Tool / plugin surface live
  • Package provenance drift live

Maintain AgentBench?

HVTrust scores AgentBench from public signals only — we never contact maintainers first. If a signal is wrong, stale, or missing (provenance you publish, a Scorecard you run, signed releases), tell us and we'll review it. Corrections are public and tracked on GitHub.

Embed Badge

HVTrust 16.7 Grade D
Markdown:
[![HVTrust](https://hvtracker.net/badge/agentbench.svg)](https://hvtracker.net/agents/agentbench)
HTML:
<a href="https://hvtracker.net/agents/agentbench"><img src="https://hvtracker.net/badge/agentbench.svg" alt="HVTrust"></a>

Data sources
GitHub REST API (repo, commits, stars, forks, license)
Each agent's signals refresh once daily across 6 staggered batches. Methodology v3.2.