LangChain vs LangGraph vs CrewAI vs AutoGPT — Ranked by Trust, Not Hype

May 30, 2026 · 7 min read · HVTracker Research

Every "best AI agent framework" post ranks by stars, features, or vibes. Nobody ranks by whether you can actually verify the thing you're installing.

We do. HVTracker ranks 172 open-source AI agents — including every major framework — by supply-chain trust signals: provenance, OSSF Scorecard, signed commits, maintenance, and evidence coverage. Here's the framework tier list nobody asked for but everyone needs.

The full rankings

Framework	Stars	Trust Rank	Trust Score	Provenance	OSSF Scorecard (/10)
LangGraph	33k	#1	91.9	Verified	6.8
PydanticAI	17k	#2	90.6	Verified	5.8
OpenAI Agents SDK	26k	#4	90.2	Verified	4.2
AutoGen	58k	#22	78.1	Verified	5.9
CrewAI	52k	#23	77.9	None	8.0
LangChain	138k	#24	74.9	None	6.8
AutoGPT	184k	#39	69.2	None	5.7
CopilotKit	31k	#45	68.5	Verified	N/A
Agno	40k	#48	68.4	None	4.7
LlamaIndex	49k	#126	42.8	None	N/A
smolagents	27k	#138	38.0	None	N/A
OpenManus	56k	#167	22.6	None	N/A
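
To see how far the two orderings diverge, here is a minimal Python sketch that re-sorts the rows above by stars and by trust rank. The star counts and ranks are copied from the table; the helper names (`order_by_stars`, `order_by_trust`, `position_shift`) are ours, made up for illustration, and are not part of any HVTracker tooling.

```python
# Rows copied from the table above: (framework, stars in thousands, trust rank).
FRAMEWORKS = [
    ("LangGraph", 33, 1),
    ("PydanticAI", 17, 2),
    ("OpenAI Agents SDK", 26, 4),
    ("AutoGen", 58, 22),
    ("CrewAI", 52, 23),
    ("LangChain", 138, 24),
    ("AutoGPT", 184, 39),
    ("CopilotKit", 31, 45),
    ("Agno", 40, 48),
    ("LlamaIndex", 49, 126),
    ("smolagents", 27, 138),
    ("OpenManus", 56, 167),
]


def order_by_stars(rows):
    """Framework names, most-starred first."""
    return [name for name, _, _ in sorted(rows, key=lambda r: r[1], reverse=True)]


def order_by_trust(rows):
    """Framework names, best (lowest) trust rank first."""
    return [name for name, _, _ in sorted(rows, key=lambda r: r[2])]


def position_shift(rows):
    """Position by stars minus position by trust, within this table only.

    Positive values mean the framework places higher on trust than on stars.
    """
    by_stars = order_by_stars(rows)
    by_trust = order_by_trust(rows)
    return {name: by_stars.index(name) - by_trust.index(name) for name in by_trust}


if __name__ == "__main__":
    for name, shift in sorted(position_shift(FRAMEWORKS).items(), key=lambda kv: -kv[1]):
        print(f"{name:20s} {shift:+d}")
```

The shift is computed only among these 12 rows, so it says nothing about the other agents in the full list of 172.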

The tier breakdown

Tier 1: actually verifiable (rank < 25)

LangGraph, PydanticAI, OpenAI Agents SDK, AutoGen, CrewAI, LangChain

These projects have enough independent signals (provenance, Scorecard, signed commits, active maintenance) to give an outsider confidence in the supply chain. The three highest-ranked, LangGraph, PydanticAI, and OpenAI Agents SDK, all publish build provenance, and so does AutoGen. CrewAI and LangChain don't. CrewAI compensates with the highest Scorecard in the category (8.0/10).

Tier 2: popular but incomplete (rank 25–80)

AutoGPT, CopilotKit, Agno, Composio

Signal coverage is partial: some have provenance but no Scorecard, others have a Scorecard but no provenance. Trust scores sit in the high 60s, which is decent but leaves verifiable gaps. AutoGPT, the most-starred framework at 184k stars, ranks #39 because stars are capped and weighted last in HVTrust.

Tier 3: thin evidence (rank > 100)

LlamaIndex, smolagents, OpenManus

These are active, widely used projects, but their public trust evidence is thin: no provenance, no Scorecard, and signed-commit ratios that are low or missing. OpenManus has 56k stars and ranks #167 of 172. LlamaIndex, one of the most widely adopted RAG frameworks, ranks #126.

This is not a quality judgment. These projects may have excellent internal security practices. But from the outside, with only public signals to work with, the verifiable evidence is not there.
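
The rank cutoffs in the tier headings can be written down as a small function. This is a sketch of the thresholds as stated in this post, not HVTracker's own code; `tier_for_rank` is a hypothetical name. Note that the headings leave ranks 81 to 100 unassigned, so the function returns `None` there rather than guessing.

```python
from typing import Optional


def tier_for_rank(rank: int) -> Optional[int]:
    """Map a trust rank to the tier used in this post.

    Tier 1: rank < 25, Tier 2: rank 25-80, Tier 3: rank > 100.
    Ranks 81-100 are not covered by the tier headings, so they map to None.
    """
    if rank < 1:
        raise ValueError("rank must be a positive integer")
    if rank < 25:
        return 1
    if rank <= 80:
        return 2
    if rank > 100:
        return 3
    return None


if __name__ == "__main__":
    for name, rank in [("LangChain", 24), ("AutoGPT", 39), ("LlamaIndex", 126)]:
        print(name, "-> Tier", tier_for_rank(rank))
```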

What the data says about "best framework" posts

The internet is full of "LangChain vs LlamaIndex" comparison posts. Almost all of them compare features, developer experience, and LLM provider support. None of them ask:

Can I verify the package I'm installing matches the source code?
Does the project use branch protection and code review?
Are the commits signed?
Is there a security disclosure policy?

If you're evaluating a framework that will execute LLM-generated code in your infrastructure, these questions matter more than whether it supports streaming or has a nicer API for tool calling.
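
Two of those questions can be spot-checked from a local clone. The sketch below is one rough way to do it in Python, and it is not how HVTracker collects its signals. It relies on git's `%G?` log placeholder, which prints one status letter per commit (`N` means no signature, `G` a good one, `E` a signature that could not be checked, for example because the key is missing locally). The function names and the 200-commit window are our own assumptions.

```python
import subprocess
from pathlib import Path

# %G? codes treated as verified here: "G" (good) and "U" (good, unknown validity).
VERIFIED_CODES = {"G", "U"}


def summarize_signatures(status_codes: str) -> dict:
    """Count commits from `git log --format=%G?` output, one code per line.

    "signed" counts every commit that carries a signature of any status,
    which is weaker than "verified".
    """
    codes = [line.strip() for line in status_codes.splitlines() if line.strip()]
    unsigned = sum(1 for c in codes if c == "N")
    verified = sum(1 for c in codes if c in VERIFIED_CODES)
    return {"total": len(codes), "signed": len(codes) - unsigned, "verified": verified}


def recent_signature_codes(repo: str, limit: int = 200) -> str:
    """Run git in `repo` and return the %G? code for the last `limit` commits."""
    result = subprocess.run(
        ["git", "-C", repo, "log", f"-{limit}", "--format=%G?"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def has_security_policy(repo: str) -> bool:
    """Look for SECURITY.md in the locations GitHub recognizes."""
    candidates = ["SECURITY.md", ".github/SECURITY.md", "docs/SECURITY.md"]
    return any((Path(repo) / c).is_file() for c in candidates)


if __name__ == "__main__":
    repo = "."  # path to a local clone
    print(summarize_signatures(recent_signature_codes(repo)))
    print("SECURITY.md present:", has_security_policy(repo))
```

Without the maintainers' public keys in your keyring, most signed commits will show up as `E`, so the "verified" count from a fresh machine will understate what the project actually does.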

The gap is closeable. Every project in Tier 3 could move to Tier 1 by enabling provenance, adding a SECURITY.md, and running the Scorecard GitHub Action. These are not hard — they're just not prioritized. Yet.

Why PydanticAI at #2 matters

PydanticAI has 17k stars — a fraction of LangChain's 138k or AutoGPT's 184k. But it ranks #2 globally because it nails the signals that are hardest to fake: build provenance is published, commits are signed, the Scorecard is public, and evidence coverage is broad enough for a Grade B.

This is the confidence multiplier in action. PydanticAI doesn't have the most impressive number for any single signal. It has consistent, verifiable evidence across all of them. That's what trust looks like when you measure it instead of counting stars.
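
To make that intuition concrete, here is a toy model in Python. It is not the HVTrust formula: every weight, the star cap, and both example projects are hypothetical, chosen only to show how a capped, low-weight star term and a coverage multiplier could let consistent evidence beat raw popularity.

```python
# HYPOTHETICAL weights and cap, for illustration only. Not HVTrust's real values.
HYPOTHETICAL_WEIGHTS = {
    "provenance": 0.35,
    "scorecard": 0.30,
    "signed_commits": 0.20,
    "maintenance": 0.10,
    "stars": 0.05,  # weighted last
}
HYPOTHETICAL_STAR_CAP = 50_000  # stars beyond this add nothing


def toy_trust_score(signals: dict, stars: int) -> float:
    """Toy score in [0, 100].

    `signals` maps a signal name to a value in [0, 1], or None when there is
    no public evidence for it. The weighted sum is multiplied by the share of
    total weight that is backed by evidence (the "coverage" multiplier).
    """
    values = dict(signals)
    values["stars"] = min(stars, HYPOTHETICAL_STAR_CAP) / HYPOTHETICAL_STAR_CAP
    observed = {k: v for k, v in values.items() if v is not None}
    base = sum(HYPOTHETICAL_WEIGHTS[k] * v for k, v in observed.items())
    coverage = sum(HYPOTHETICAL_WEIGHTS[k] for k in observed)
    return round(100 * base * coverage, 1)


# Two made-up projects: one with modest stars and evidence everywhere,
# one with many stars and two signals missing.
consistent = {"provenance": 1.0, "scorecard": 0.6, "signed_commits": 0.8, "maintenance": 0.9}
popular = {"provenance": None, "scorecard": 0.6, "signed_commits": None, "maintenance": 0.9}

if __name__ == "__main__":
    print("consistent:", toy_trust_score(consistent, stars=15_000))
    print("popular:   ", toy_trust_score(popular, stars=200_000))
```

In this toy setup the missing signals hurt twice, once in the weighted sum and once in the multiplier, while stars past the cap change nothing.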

Dig into the data yourself

Every score, signal, and check on this page is independently verifiable from the links on each agent's profile.


Data from HVTracker signals as of May 30, 2026. Rankings change daily.