How to Evaluate AI Agent Safety: 5 Signals That Actually Matter
Open-source AI agents are being adopted at an unprecedented rate. Developers are integrating coding assistants, research agents, and workflow automation tools into production systems that handle sensitive data and make real decisions.
But how do you know if an AI agent is safe to use? GitHub stars measure popularity, not trustworthiness. A project with 50,000 stars can still have unsigned releases, no security policy, and dependencies riddled with known vulnerabilities.
At HVTracker, we track trust signals for 170+ open-source AI agents daily. Here are the 5 evidence-based signals that actually predict whether an agent is safe to adopt.
1. OpenSSF Scorecard
What it measures
Supply-chain security practices across 18 automated checks
The OpenSSF Scorecard is a tool maintained by the Open Source Security Foundation that runs automated checks against a GitHub repository. It evaluates branch protection, dependency pinning, vulnerability disclosure policies, CI/CD security, and more. Each check produces a score from 0-10.
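This is the single most informative signal for supply-chain safety. The overall score is a weighted aggregate of the individual checks, so a project scoring 7+/10 typically has branch protection enabled, pins most of its dependencies, uses SAST tools, and has a vulnerability disclosure process. A high aggregate does not guarantee that every individual check passes, so it is worth reading the per-check results as well.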
The problem? Most AI agent projects don't score well. In our dataset, the median Scorecard score across tracked agents is below 5/10. The agents that score 7+ tend to be backed by established organizations with mature security practices.
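To make the per-check view concrete, here is a minimal Python sketch that takes a Scorecard result and lists the checks dragging the score down. It assumes the result has already been loaded as a dictionary with a "checks" list whose entries carry "name" and "score" fields, and that a score of -1 marks an inconclusive check. The function name weak_checks, the threshold of 5, and the sample data are illustrative choices, not part of Scorecard or of HVTracker's pipeline.

```python
from typing import Any


def weak_checks(result: dict[str, Any], threshold: int = 5) -> list[tuple[str, int]]:
    """Return (check name, score) pairs scoring below `threshold`, lowest first.

    Assumes `result` looks like a Scorecard JSON result: a dict with a
    "checks" list of {"name": ..., "score": ...} entries.
    """
    flagged = []
    for check in result.get("checks", []):
        score = check.get("score", -1)
        # Assumption: -1 means the check was inconclusive, so it is skipped
        # rather than treated as a failure.
        if 0 <= score < threshold:
            flagged.append((check["name"], score))
    return sorted(flagged, key=lambda item: item[1])


# Hypothetical sample data, for illustration only.
sample_result = {
    "score": 4.8,
    "checks": [
        {"name": "Branch-Protection", "score": 3},
        {"name": "Pinned-Dependencies", "score": 0},
        {"name": "SAST", "score": 10},
        {"name": "Signed-Releases", "score": -1},
    ],
}

for name, score in weak_checks(sample_result):
    print(f"{name}: {score}/10")
```

Reading the flagged checks alongside the aggregate shows which specific practice is missing, which a single number cannot.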
2. Package Provenance
What it measures
Whether published packages can be traced back to their source code
Package provenance creates a cryptographic link between a published npm or PyPI package and the specific source code commit and CI/CD pipeline that produced it. It uses SLSA (Supply-chain Levels for Software Artifacts) attestations to prove a package wasn't tampered with between the repository and the registry.
This matters because the supply-chain attack surface for AI agents is enormous. When you run pip install or npm install, you're trusting that the published package matches the source code you reviewed on GitHub. Without provenance, there's no straightforward way to verify this short of rebuilding the package yourself and comparing the result.
Provenance adoption is still early. In our tracking, only a fraction of AI agents publish packages with verified provenance attestations. But the ones that do are signaling a level of security maturity that goes beyond the basics.
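As a rough illustration, the Python sketch below checks whether a package version's registry metadata advertises a provenance attestation. It assumes an npm-style metadata dictionary in which a "dist" object may contain an "attestations" entry with a "provenance.predicateType" value pointing at the SLSA provenance schema. That field layout is an assumption made for this example, and the function name has_provenance and the sample records are hypothetical. The check only detects that an attestation is advertised. It does not verify the attestation, which is the job of dedicated tooling such as npm audit signatures.

```python
def has_provenance(version_meta: dict) -> bool:
    """Return True if the metadata advertises a SLSA provenance attestation.

    Assumption: `version_meta` follows an npm-style layout where
    dist.attestations.provenance.predicateType names the SLSA schema.
    This detects presence only; it performs no cryptographic verification.
    """
    attestations = version_meta.get("dist", {}).get("attestations") or {}
    predicate = attestations.get("provenance", {}).get("predicateType", "")
    return predicate.startswith("https://slsa.dev/provenance/")


# Hypothetical metadata records, for illustration only.
with_attestation = {
    "name": "example-agent",
    "version": "1.2.0",
    "dist": {
        "attestations": {
            "provenance": {"predicateType": "https://slsa.dev/provenance/v1"}
        }
    },
}
without_attestation = {"name": "example-agent", "version": "1.1.0", "dist": {}}

print(has_provenance(with_attestation))
print(has_provenance(without_attestation))
```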
3. Signed Commits
What it measures
Whether code contributions are cryptographically authenticated
Signed commits use GPG or SSH keys to cryptographically prove that a commit was actually made by the claimed author. Without signing, anyone who gains write access to a repository (through a compromised token, for example) can push commits that appear to come from a trusted maintainer.
A high signed-commit ratio (80%+) indicates that a project's maintainers take code integrity seriously. It's not a perfect signal — some legitimate maintainers don't sign commits — but when combined with other signals, it's a strong indicator of security culture.
We measure this by sampling recent commits and calculating the ratio that are cryptographically verified. Projects backed by organizations with security policies tend to score highest here.
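The sampling step can be sketched in a few lines of Python. This is not HVTracker's actual implementation. It assumes each sampled commit is a dictionary shaped like an item from GitHub's REST "list commits" response, where commit.verification.verified is a boolean. The function name signed_commit_ratio and the sample data are illustrative.

```python
def signed_commit_ratio(commits: list[dict]) -> float:
    """Return the fraction of sampled commits marked as verified.

    Assumes each item carries commit.verification.verified, as in GitHub's
    REST "list commits" response. A missing field counts as unverified.
    """
    if not commits:
        return 0.0
    verified = sum(
        1
        for item in commits
        if item.get("commit", {}).get("verification", {}).get("verified") is True
    )
    return verified / len(commits)


# Hypothetical sample of four recent commits.
sample_commits = [
    {"commit": {"verification": {"verified": True}}},
    {"commit": {"verification": {"verified": True}}},
    {"commit": {"verification": {"verified": False}}},
    {"commit": {}},  # no verification data, treated as unverified
]

ratio = signed_commit_ratio(sample_commits)
print(f"Signed-commit ratio: {ratio:.0%}")
```

Treating missing verification data as unverified is a conservative choice. A larger sample makes the ratio less sensitive to a handful of unsigned merge or bot commits.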
4. Activity Patterns
What it measures
Whether a project is actively maintained and responsive
Activity isn't just about commit frequency. The pattern matters: regular commits over time indicate sustained maintenance. A burst of commits followed by silence suggests a project might be abandoned. We look at commit cadence over 30 days, time since last push, and issue response patterns.
An unmaintained AI agent is a security liability. Vulnerabilities in dependencies go unpatched. Breaking changes in APIs go unaddressed. Users who report bugs get silence.
But very high activity can also be a warning sign. A project with hundreds of commits per week from a single contributor might be moving too fast for proper review. The healthiest pattern is consistent, multi-contributor activity with a reasonable cadence.
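The pattern described above can be summarized with a few numbers. The Python sketch below is one possible way to compute them, assuming the commit history is available as (timestamp, author) pairs. It uses the most recent commit as a stand-in for the last push, and the function name activity_summary and the sample history are illustrative rather than HVTracker's method.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone


def activity_summary(commits, now, window_days=30):
    """Summarize commit activity from (timestamp, author) pairs.

    Reports how many commits landed in the window, how long ago the most
    recent commit was, and how concentrated recent work is in one author.
    """
    cutoff = now - timedelta(days=window_days)
    recent = [(ts, author) for ts, author in commits if ts >= cutoff]
    last = max((ts for ts, _ in commits), default=None)
    authors = Counter(author for _, author in recent)
    top_share = max(authors.values()) / len(recent) if recent else 0.0
    return {
        "commits_in_window": len(recent),
        "days_since_last_commit": (now - last).days if last else None,
        "contributors_in_window": len(authors),
        "top_contributor_share": top_share,
    }


# Hypothetical history: four commits in the last 30 days, one much older.
now = datetime(2025, 1, 31, tzinfo=timezone.utc)
sample_history = [
    (now - timedelta(days=2), "alice"),
    (now - timedelta(days=5), "alice"),
    (now - timedelta(days=9), "bob"),
    (now - timedelta(days=20), "alice"),
    (now - timedelta(days=90), "carol"),
]

print(activity_summary(sample_history, now))
```

A top contributor share close to 1.0 combined with a very high commit count is the single-contributor pattern flagged above. A large days-since-last-commit value points toward abandonment.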
5. Transparency Indicators
What it measures
Whether the project makes its practices and policies visible
Transparency includes having a clear license, a security policy (SECURITY.md), a code of conduct, contributing guidelines, and documentation. These aren't just nice-to-have — they signal that a project is run with governance and accountability in mind.
A project that lacks a security policy is telling you something: they haven't thought about what happens when a vulnerability is discovered. A project with no license leaves you legally exposed. Missing documentation makes it harder to audit what the agent actually does.
We also look at whether the project uses public GitHub Actions (visible CI/CD) versus private pipelines. Public CI/CD means anyone can verify how the software is built, tested, and released.
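A basic version of the file-level part of this check is easy to automate. The Python sketch below takes a list of repository paths and reports which governance files are present. The candidate filenames and locations in EXPECTED_FILES are common conventions chosen for this example, not an exhaustive or official list, and the function names are illustrative.

```python
# Assumed candidate locations for each indicator; real projects may use others.
EXPECTED_FILES = {
    "license": ("LICENSE", "LICENSE.md", "LICENSE.txt", "COPYING"),
    "security_policy": ("SECURITY.md", ".github/SECURITY.md", "docs/SECURITY.md"),
    "code_of_conduct": ("CODE_OF_CONDUCT.md", ".github/CODE_OF_CONDUCT.md"),
    "contributing": ("CONTRIBUTING.md", ".github/CONTRIBUTING.md"),
    "readme": ("README.md", "README.rst", "README"),
}


def transparency_report(paths):
    """Map each indicator to True/False based on the repository's file paths."""
    present = {path.lower() for path in paths}  # case-insensitive match
    return {
        label: any(candidate.lower() in present for candidate in candidates)
        for label, candidates in EXPECTED_FILES.items()
    }


def missing_indicators(paths):
    """Return the indicators with no matching file, in EXPECTED_FILES order."""
    return [label for label, found in transparency_report(paths).items() if not found]


# Hypothetical repository listing.
sample_paths = ["README.md", "LICENSE", "src/agent.py", ".github/SECURITY.md"]
print(missing_indicators(sample_paths))
```

File presence is only a first pass. An empty SECURITY.md satisfies this check without providing a real disclosure process, so the contents still deserve a read.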
Putting It All Together
No single signal tells the full story. An agent with a perfect Scorecard score but no activity in 6 months might be abandoned. An agent with thousands of commits but no provenance might have supply-chain risks.
The most trustworthy agents score well across multiple dimensions: they have reasonable Scorecard scores, some form of provenance or signing, active maintenance, and transparent governance.
This is exactly what HVTracker's Trust Score measures — a composite of activity, adoption, transparency, safety, and identity signals, weighted to reflect real-world risk. Each agent gets a score from 0-100, updated daily.
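To show how a weighted composite of this kind works in general, here is a small Python sketch. The five category names come from the description above, but the weights, the 0-to-1 input scale, and the function name composite_score are hypothetical choices made for this example. They are not HVTracker's actual weights or formula.

```python
# Hypothetical weights for illustration only; they sum to 100.
WEIGHTS = {
    "activity": 20,
    "adoption": 15,
    "transparency": 20,
    "safety": 30,
    "identity": 15,
}


def composite_score(signals: dict[str, float]) -> float:
    """Combine per-category signals (each in 0..1) into a 0-100 score.

    Missing categories count as 0, and out-of-range inputs are clamped.
    """
    total_weight = sum(WEIGHTS.values())
    weighted = sum(
        weight * min(max(signals.get(name, 0.0), 0.0), 1.0)
        for name, weight in WEIGHTS.items()
    )
    return round(100 * weighted / total_weight, 1)


# Hypothetical agent: strong transparency, no identity signals.
sample_signals = {
    "activity": 0.8,
    "adoption": 0.5,
    "transparency": 1.0,
    "safety": 0.6,
    "identity": 0.0,
}

print(composite_score(sample_signals))
```

With these example weights the sample agent lands at roughly 61.5 out of 100. Counting missing categories as zero means an agent cannot score well simply because a signal was never measured.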
Check the trust score for any AI agent
HVTracker independently evaluates 170+ open-source AI agents daily. Browse the full leaderboard, compare agents side by side, and see exactly which trust signals each agent has.
Browse the Trust Registry
Further Reading
- HVTracker Methodology — how trust scores are computed
- Security & Guardrails Agents — agents specifically built for AI safety
- Coding Agents — ranked by trust score
- OpenSSF Scorecard — check any repo's supply-chain security
- SLSA Framework — supply-chain integrity standard