LangChain vs LangGraph vs CrewAI vs AutoGPT — Ranked by Trust, Not Hype
Every "best AI agent framework" post ranks by stars, features, or vibes. Nobody ranks by whether you can actually verify the thing you're installing.
We do. HVTracker ranks 172 open-source AI agents, including every major framework, by supply-chain trust signals: provenance, OpenSSF Scorecard, signed commits, maintenance, and evidence coverage. Here's the framework tier list nobody asked for but everyone needs.
The full rankings
| Framework | Stars | Trust Rank | Trust Score | Provenance | Scorecard (/10) |
|---|---|---|---|---|---|
| LangGraph | 33k | #1 | 91.9 | Verified | 6.8 |
| PydanticAI | 17k | #2 | 90.6 | Verified | 5.8 |
| OpenAI Agents SDK | 26k | #4 | 90.2 | Verified | 4.2 |
| AutoGen | 58k | #22 | 78.1 | Verified | 5.9 |
| CrewAI | 52k | #23 | 77.9 | None | 8.0 |
| LangChain | 138k | #24 | 74.9 | None | 6.8 |
| AutoGPT | 184k | #39 | 69.2 | None | 5.7 |
| CopilotKit | 31k | #45 | 68.5 | Verified | N/A |
| Agno | 40k | #48 | 68.4 | None | 4.7 |
| LlamaIndex | 49k | #126 | 42.8 | None | N/A |
| smolagents | 27k | #138 | 38.0 | None | N/A |
| OpenManus | 56k | #167 | 22.6 | None | N/A |
The tier breakdown
Tier 1: actually verifiable (rank < 25)
LangGraph, PydanticAI, OpenAI Agents SDK, AutoGen, CrewAI, LangChain
These projects have enough independent signals (provenance, Scorecard, signed commits, active maintenance) to give an outsider confidence in the supply chain. The three highest-ranked frameworks all publish build provenance, as does AutoGen. CrewAI and LangChain don't, but CrewAI compensates with the highest Scorecard in the category (8.0/10).
Tier 2: popular but incomplete (rank 25–80)
AutoGPT, CopilotKit, Agno, Composio
Signal coverage is partial: some have provenance but no Scorecard, others have a Scorecard but no provenance. Trust scores sit in the high 60s, which is decent but leaves gaps an outsider can't verify. AutoGPT, the most-starred framework at 184k stars, ranks #39 because stars are capped and weighted last in HVTrust.
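The post says only that stars are capped and weighted last; it does not give the cap or the weight. As a rough illustration of why a cap flattens very large star counts, here is a minimal sketch. The cap of 50,000, the log scaling, and the name `star_signal` are all assumptions invented for this example, not HVTrust's actual values.

```python
import math

# Hypothetical cap: NOT HVTrust's real value, chosen only for illustration.
STAR_CAP = 50_000


def star_signal(stars: int, cap: int = STAR_CAP) -> float:
    """Map a star count to a 0..1 signal that saturates at `cap`."""
    if stars <= 0:
        return 0.0
    capped = min(stars, cap)
    # Log scaling (an assumption) so early stars count more than later ones.
    return math.log1p(capped) / math.log1p(cap)


# Past the cap, extra stars add nothing: both calls return the same value.
print(star_signal(184_000) == star_signal(56_000))
print(star_signal(17_000))
```

Under a scheme like this, a 184k-star project and a 56k-star project contribute the same star signal, so the ranking is decided by the other signals.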
Tier 3: thin evidence (rank > 100)
LlamaIndex, smolagents, OpenManus
These are active, widely-used projects — but their public trust evidence is thin. No provenance, no Scorecard, low signed-commit ratios or missing data. OpenManus has 56k stars and ranks #167 of 172. LlamaIndex — one of the most widely adopted RAG frameworks — ranks #126.
This is not a quality judgment. These projects may have excellent internal security practices. But from the outside, with only public signals to work with, the verifiable evidence is not there.
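One way an outsider could estimate a signed-commit ratio from a local clone is git's `%G?` log placeholder, which prints one status letter per commit. The sketch below shows that general idea; it is not a description of how HVTracker measures the signal, and the function names are hypothetical.

```python
import subprocess

# Status letters from git's %G? placeholder that indicate a valid signature:
# G = good signature, U = good signature with unknown validity.
# Note: "E" means a signature exists but could not be checked (for example,
# the signer's key is missing locally), so this sketch can undercount.
VALID_CODES = {"G", "U"}


def signed_ratio(codes: list[str]) -> float:
    """Fraction of commits whose %G? status is a valid signature."""
    if not codes:
        return 0.0
    return sum(code in VALID_CODES for code in codes) / len(codes)


def signature_codes(repo_path: str, limit: int = 200) -> list[str]:
    """Collect %G? status letters for the most recent commits in a clone."""
    out = subprocess.run(
        ["git", "-C", repo_path, "log", f"--max-count={limit}", "--format=%G?"],
        capture_output=True, text=True, check=True,
    ).stdout
    return out.split()


# Example with hand-written status letters instead of a real repository.
print(signed_ratio(["G", "N", "U", "E"]))
```

Whether unverifiable signatures (`E`) should count as signed is a design choice; counting only `G` and `U` is the stricter reading.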
What the data says about "best framework" posts
The internet is full of "LangChain vs LlamaIndex" comparison posts. Almost all of them compare features, developer experience, and LLM provider support. None of them ask:
- Can I verify the package I'm installing matches the source code?
- Does the project use branch protection and code review?
- Are the commits signed?
- Is there a security disclosure policy?
If you're evaluating a framework that will execute LLM-generated code in your infrastructure, these questions matter more than whether it supports streaming or has a nicer API for tool calling.
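Several of these questions line up with checks in an OpenSSF Scorecard result. The sketch below takes an already-parsed Scorecard JSON result and turns it into rough answers. The mapping from questions to check names, the passing threshold of 7, and the function name are this example's own assumptions. Commit signing is left out because the sketch does not assume a Scorecard check for it.

```python
# Check names as used by OpenSSF Scorecard; the grouping into questions
# is this sketch's own interpretation.
QUESTIONS = {
    "Are releases signed?": ["Signed-Releases"],
    "Branch protection and code review?": ["Branch-Protection", "Code-Review"],
    "Security disclosure policy?": ["Security-Policy"],
}


def answer_questions(result: dict, passing: int = 7) -> dict[str, str]:
    """Summarize a parsed Scorecard result as yes / partial / no evidence."""
    scores = {c["name"]: c["score"] for c in result.get("checks", [])}
    answers = {}
    for question, names in QUESTIONS.items():
        # A score of -1 marks an inconclusive check, so treat it as missing.
        found = [scores[n] for n in names if scores.get(n, -1) >= 0]
        if len(found) < len(names):
            answers[question] = "no evidence"
        elif all(s >= passing for s in found):
            answers[question] = "yes"
        else:
            answers[question] = "partial"
    return answers


# Hand-written sample in the assumed shape, not real data for any project.
sample = {"checks": [
    {"name": "Branch-Protection", "score": 8},
    {"name": "Code-Review", "score": 9},
    {"name": "Security-Policy", "score": 10},
    {"name": "Signed-Releases", "score": -1},
]}
print(answer_questions(sample))
```

Reporting "no evidence" separately from a low score matches the post's distinction between a weak signal and a missing one.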
Why PydanticAI at #2 matters
PydanticAI has 17k stars — a fraction of LangChain's 138k or AutoGPT's 184k. But it ranks #2 globally because it nails the signals that are hardest to fake: build provenance is published, commits are signed, the Scorecard is public, and evidence coverage is broad enough for a Grade B.
This is the confidence multiplier in action. PydanticAI doesn't have the most impressive number for any single signal. It has consistent, verifiable evidence across all of them. That's what trust looks like when you measure it instead of counting stars.
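The post does not spell out how the confidence multiplier is computed. One plausible shape, sketched here purely under assumptions, is a weighted mean of whatever signals exist, scaled down when evidence coverage is narrow. The weights, the 0.5 to 1.0 multiplier range, and the name `trust_score` are invented for illustration and are not HVTrust's formula.

```python
from typing import Optional

# Hypothetical weights, NOT HVTrust's real ones. Stars are weighted last,
# as the post describes; every number here is invented.
WEIGHTS = {
    "provenance": 0.30,
    "scorecard": 0.30,
    "signed_commits": 0.20,
    "maintenance": 0.15,
    "stars": 0.05,
}


def trust_score(signals: dict[str, Optional[float]]) -> float:
    """Weighted mean of available 0..1 signals, scaled by evidence coverage."""
    present = {k: v for k, v in signals.items() if k in WEIGHTS and v is not None}
    if not present:
        return 0.0
    weight_sum = sum(WEIGHTS[k] for k in present)
    base = sum(WEIGHTS[k] * v for k, v in present.items()) / weight_sum
    # Coverage: share of total weight that has any evidence behind it.
    coverage = weight_sum / sum(WEIGHTS.values())
    # Hypothetical multiplier: 0.5x with almost no coverage, 1.0x with full.
    multiplier = 0.5 + 0.5 * coverage
    return round(100 * base * multiplier, 1)


# Solid evidence everywhere versus perfect values on only two signals.
consistent = trust_score({k: 0.8 for k in WEIGHTS})
spiky = trust_score({"scorecard": 1.0, "stars": 1.0,
                     "provenance": None, "signed_commits": None,
                     "maintenance": None})
print(consistent, spiky)
```

In this sketch a project with good-but-not-perfect evidence on every signal outscores one with perfect values on only two, which is the behaviour the PydanticAI example describes.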
Dig into the data yourself
Every score, signal, and check on this page is independently verifiable from the links on each agent's profile.
Data from HVTracker signals as of May 30, 2026. Rankings change daily.