AI Agent / Harness Frameworks — Deep Comparison (May 2026)

1. The 2026 landscape — what changed

The agent-framework field went from "the Big Four" (LangChain, LlamaIndex, AutoGen, CrewAI) to ~15 credible options in twelve months. Four structural shifts define the current moment:

2. Quick-pick decision guide

Scenario	Pick	Why
Complex stateful production workflows (Python)	LangGraph	Durable checkpointing, time-travel debugging, first-class human-in-the-loop. Highest enterprise adoption.
Fastest prototype / multi-agent crew	CrewAI	~35 lines to a working crew; lowest boilerplate; built-in semantic memory.
TypeScript full-stack / edge	Mastra	Best TS DX, durable workflows, built-in memory, native MCP, deploys to Cloudflare Workers / Vercel.
TS product team, AI features + light agents	Vercel AI SDK 6	20M+ monthly downloads, unified provider API, best streaming UI, new Agent abstraction.
Type-safe Python API / data pipelines	PydanticAI	Validated structured outputs, dependency injection, clean FastAPI fit.
OpenAI-first, ship fast	OpenAI Agents SDK	Cleanest handoff model, first-party tracing, agent in <30 lines.
Claude-native / compliance-sensitive	Claude Agent SDK	API-first design that powers Claude Code; first-class MCP, permission modes, hooks, self-host.
Google Cloud / auditable pipelines	Google ADK	Explicit Sequential/Parallel/Loop graphs, A2A protocol, best out-of-box visual debugger, 200+ models.
AWS serverless	AWS Strands	Model-driven autonomy, Lambda deploy in seconds, native MCP, Bedrock AgentCore.
Microsoft / Azure / .NET shop	MS Agent Framework 1.0	Lowest-regret enterprise default; one supported product across .NET + Python, YAML agents.
Open-source-first, self-hosted models	smolagents	Code-as-action (~30% fewer steps), ~3K LOC core, HuggingFace ecosystem.
RAG / document-heavy agents	LlamaIndex	Unmatched data connectors and retrieval primitives.

3. Master comparison matrix

Stars are approximate GitHub counts as of May 2026; "Prod" is a synthesized 1–10 production-readiness score triangulated across the cited comparisons.

Framework	Backer	Language	Control-flow model	Stars	Multi-agent	Built-in memory	MCP	Observability	Prod
LangGraph	LangChain	Python / JS	Stateful graph / state machine (cyclic)	~16k ★	Excellent	Checkpoint	Plugin	LangSmith (best)	9.4
OpenAI Agents SDK	OpenAI	Python / JS	Imperative handoff chain	~19k ★	Handoffs	Server state	Native	First-party Traces	9.1
CrewAI	CrewAI Inc.	Python	Role-based crews + Flows	~35k ★	Excellent	Semantic	Plugin	CrewAI+ (trails)	8.6
Mastra	Mastra (ex-Gatsby)	TypeScript	Agents + graph workflow (XState)	~23.8k ★	Native	Working+semantic	Native	Built-in evals	8.5
Vercel AI SDK 6	Vercel	TypeScript	Composable tool loop + Agent	20M+/mo dl	Compose	App concern	Native (v6)	OTel / Langfuse hooks	8.3
Google ADK	Google	Py / TS / Java / Go	Explicit Seq/Parallel/Loop tree	~18.7k ★	Hierarchical + A2A	Yes	Native	`adk web` debugger (best)	8.3
Claude Agent SDK	Anthropic	Python / TS	API-first + subagent spawning	n/a	Subagents	Context	First-class	Hooks / OTel	8.2
MS Agent Framework 1.0	Microsoft	.NET / Python	YAML-declarative agents	~12k ★	Native	Yes	Native + A2A	Azure / OTel	8.2
AutoGen / AG2	Microsoft → community	Python / .NET	Conversational multi-agent	~58k ★	Excellent	Moderate	Plugin	AutoGen Studio	8.0
Agno	Agno (ex-Phidata)	Python	High-perf multi-agent runtime	~29k ★	Yes	Sessions	Native	Built-in	8.0
Semantic Kernel	Microsoft	C# / Py / Java	Plugins + planners (→ MAF)	~27.9k ★	Native	Yes	Native	Azure	7.8
PydanticAI	Pydantic	Python	Code-first, typed, DI	~17.2k ★	Basic	BYO	Not yet	Logfire	7.8
AWS Strands	AWS	Python	Model-driven agent loop	~5.5k ★	Yes	Yes	Native	X-Ray / AgentCore (BYO)	7.7
LlamaIndex	LlamaIndex	Python / TS	Event-driven Workflows (RAG)	~49.5k ★	Basic	Yes	Plugin	Instrumentation	7.6
smolagents	Hugging Face	Python	Code-as-action (writes Python)	~27.4k ★	Basic	BYO	Not yet	Logging / OTel	7.2

4. Graph & stateful frameworks

LangGraph

LangChain · Python · JS · multi-agent · OSS

Model: directed graph of nodes/edges over a typed shared state — cyclic, branching. State: durable checkpointers, time-travel replay. Adoption: the enterprise default (~34.5M monthly downloads of the LangChain/Graph stack). Latest: LangGraph Studio visual IDE shipped 2026.
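To make the graph-over-shared-state idea concrete, here is a minimal, framework-agnostic sketch in plain Python. It is not LangGraph's API: `TinyGraph`, `add_node`, and `resume` are hypothetical names chosen for illustration. It shows a cyclic graph whose state is snapshotted after every node, so a run can be replayed from any earlier checkpoint.

```python
import json
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

State = Dict[str, int]
Node = Callable[[State], State]
Router = Callable[[State], Optional[str]]  # returns next node name, or None to stop


@dataclass
class TinyGraph:
    """Hypothetical stand-in for a stateful graph runtime (not a real library)."""
    nodes: Dict[str, Node] = field(default_factory=dict)
    routers: Dict[str, Router] = field(default_factory=dict)
    checkpoints: List[str] = field(default_factory=list)  # JSON snapshots

    def add_node(self, name: str, fn: Node, router: Router) -> None:
        self.nodes[name] = fn
        self.routers[name] = router

    def run(self, start: str, state: State, max_steps: int = 50) -> State:
        current: Optional[str] = start
        for _ in range(max_steps):
            if current is None:
                break
            state = self.nodes[current](dict(state))
            nxt = self.routers[current](state)
            # Checkpoint after every node: the state plus where to go next.
            self.checkpoints.append(json.dumps({"next": nxt, "state": state}))
            current = nxt
        return state

    def resume(self, index: int) -> State:
        """Replay from a stored checkpoint, the 'time travel' idea."""
        snap = json.loads(self.checkpoints[index])
        if snap["next"] is None:
            return snap["state"]
        return self.run(snap["next"], snap["state"])


graph = TinyGraph()
# draft -> review -> (back to draft until the score is high enough)
graph.add_node("draft", lambda s: {**s, "drafts": s["drafts"] + 1}, lambda s: "review")
graph.add_node(
    "review",
    lambda s: {**s, "score": s["drafts"] * 40},
    lambda s: None if s["score"] >= 100 else "draft",
)
final = graph.run("draft", {"drafts": 0, "score": 0})
```

A production checkpointer would write snapshots to durable storage rather than a Python list; the in-memory list here only illustrates the control flow.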

Pros

Durable state — checkpointers persist across crashes; resume long-running runs.
Time-travel debugging: replay any execution from any node.
Human-in-the-loop is first-class (pause, inspect, edit state, resume).
LangSmith observability is the strongest trace/eval tooling in the industry.
Streaming, retries, parallelism baked in; highest ceiling for complex flows.

Cons

Steepest learning curve; you design the graph yourself.
Graph abstraction is heavy for simple, linear tasks.
TypeScript port lags the Python version.
LangSmith is a paid add-on; LangChain lineage adds surface area.

Ideal for: mission-critical, long-horizon, stateful workflows — insurance claims, legal discovery, support automation — anything needing partial-failure recovery and HITL approvals.

Google ADK

Google · Python · TS · Java/Go · OSS

Model: explicit orchestration via SequentialAgent / ParallelAgent / LoopAgent in a hierarchical tree; blackboard shared State. Models: Model Garden 200+ (Gemini-optimized but model-agnostic). Latest: Python v1.x stable; adk-js TS SDK; A2A v0.3 (gRPC).
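The explicit sequential/parallel/loop composition over a shared blackboard can be sketched in plain Python. This is an illustration of the pattern only, not ADK's API: `sequential`, `parallel`, and `loop` are hypothetical helper names, and the steps are ordinary functions rather than LLM-backed agents.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict

State = Dict[str, object]
Step = Callable[[State], None]  # each step mutates the shared blackboard


def sequential(*steps: Step) -> Step:
    def run(state: State) -> None:
        for step in steps:
            step(state)
    return run


def parallel(*steps: Step) -> Step:
    def run(state: State) -> None:
        # Branches write to distinct keys, so sharing the dict is safe here.
        with ThreadPoolExecutor() as pool:
            list(pool.map(lambda step: step(state), steps))
    return run


def loop(step: Step, until: Callable[[State], bool], max_iterations: int = 5) -> Step:
    def run(state: State) -> None:
        for _ in range(max_iterations):
            step(state)
            if until(state):
                break
    return run


def fetch_prices(state: State) -> None:
    state["prices"] = [3, 5]


def fetch_stock(state: State) -> None:
    state["stock"] = 7


def total(state: State) -> None:
    state["total"] = sum(state["prices"]) + state["stock"]


def refine(state: State) -> None:
    state["passes"] = state.get("passes", 0) + 1


# The whole pipeline is a tree you can read top to bottom.
pipeline = sequential(
    parallel(fetch_prices, fetch_stock),
    total,
    loop(refine, until=lambda s: s["passes"] >= 3),
)
board: State = {}
pipeline(board)
```

Because the tree is built up front, the order of operations is fixed by the code rather than chosen by the model at run time, which is the auditability trade-off described here.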

Pros

Explicit, auditable graphs — point at the graph and tell an auditor exactly how a decision was made.
Best out-of-box debugging: adk web visual debugger.
Native A2A protocol — interop with LangGraph/CrewAI agents without bridging code.
200+ models via Model Garden; bidirectional audio/video multimodal streaming.
Multi-language (Python/TS/Java/Go); deploy to Cloud Run / GKE / Vertex.

Cons

Value concentrates inside GCP; weaker elsewhere.
Gemini-optimized — other models may need tool-call/streaming config.
No model-driven dynamic routing (by design).
"Google sunset" trust concern for some teams.

Ideal for: GCP/Vertex-native teams needing auditable, structured multi-agent pipelines and multimodal interfaces.

5. Role / crew & conversational

CrewAI

CrewAI Inc. · Python · multi-agent · OSS

Model: role/goal/backstory "crews" with Process modes (sequential, hierarchical, consensual); Flows (mid-2025) add deterministic control. Memory: built-in semantic memory (rare). Latest: v1.14.5 (May 2026). One of the fastest-growing frameworks of 2025–26 (~52k stars).
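As a rough illustration of the role/goal/backstory idea with a sequential process, the sketch below passes each agent's output to the next agent as context. It does not use CrewAI itself: `RoleAgent`, `run_sequential`, and the stub model are hypothetical, and only the sequential mode is shown.

```python
from dataclasses import dataclass
from typing import Callable, List

# A "model" is any function from prompt to text; swap in a real client.
Model = Callable[[str], str]


@dataclass
class RoleAgent:
    role: str
    goal: str
    backstory: str

    def work(self, task: str, context: str, model: Model) -> str:
        prompt = (
            f"You are {self.role}. {self.backstory} Goal: {self.goal}\n"
            f"Task: {task}\n"
            f"Context: {context}"
        )
        return model(prompt)


def run_sequential(agents: List[RoleAgent], tasks: List[str], model: Model) -> List[str]:
    """Sequential process: each agent sees the previous agent's output."""
    outputs: List[str] = []
    context = ""
    for agent, task in zip(agents, tasks):
        result = agent.work(task, context, model)
        outputs.append(result)
        context = result
    return outputs


prompts: List[str] = []


def stub_model(prompt: str) -> str:
    # Deterministic stand-in for an LLM: echoes the task line.
    prompts.append(prompt)
    return "done: " + prompt.splitlines()[1]


crew = [
    RoleAgent("Researcher", "find reliable sources", "You verify every claim."),
    RoleAgent("Writer", "produce a short summary", "You write plainly."),
]
outputs = run_sequential(crew, ["find sources", "write summary"], stub_model)
```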

Pros

Lowest boilerplate-to-functionality ratio — working crew in ~35 lines.
Crew metaphor reads like an org chart; intuitive for non-engineers.
Built-in semantic memory + multi-vendor model support (OpenAI/Anthropic/Ollama).
Flows give deterministic workflow control when you need it.
Large, active community; CrewAI+ adds managed deploy, auth, monitoring.

Cons

"Magical" — hard to pinpoint which agent went rogue at scale.
Multi-agent token overhead can be ~3× a single-agent approach.
State persistence less robust than LangGraph; mid-flow failure recovery weaker.
OSS observability requires custom logging.

Ideal for: prototypes, hackathons, MVPs, and SOP-driven automation (marketing / sales / ops) where time-to-first-working-agent wins.

AutoGen / AG2

Microsoft → community fork · Python · .NET · OSS

Model: conversational multi-agent (agents debate/plan/review). Status: Microsoft moved AutoGen (last release v0.7.5, Sep 2025, ~58k stars) to maintenance in favor of the Microsoft Agent Framework; the community AG2 fork drives the lineage forward.

Pros

Pioneered conversational multi-agent patterns; still excellent for research.
AutoGen Studio provides a no-code visual builder.
Strong at code-generation and self-correcting agent loops.
First-class .NET + Python.

Cons

Not recommended for net-new production — Microsoft is steering teams to MAF.
State management is only moderate.
The lineage is split between a maintenance-mode AutoGen and the community-driven AG2, so its long-term direction is uncertain.

Ideal for: research-grade multi-agent experiments and conversational/debate patterns — not new production code.

6. Lab / vendor SDKs

OpenAI Agents SDK

OpenAI · Python · JS · vendor

Model: imperative loop with handoffs; Responses API gives server-side state + tool execution. The default for OpenAI-committed teams.
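The handoff pattern can be sketched without any SDK: each agent either answers or names the agent that should take over. The names below (`HandoffAgent`, `run_with_handoffs`) are hypothetical and the policies are hard-coded functions standing in for model calls, so treat this as a picture of the control flow rather than the OpenAI Agents SDK API.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

# A policy returns (reply, handoff_target); a target of None ends the run.
Policy = Callable[[str], Tuple[str, Optional[str]]]


@dataclass
class HandoffAgent:
    name: str
    policy: Policy


def run_with_handoffs(
    agents: Dict[str, HandoffAgent],
    start: str,
    message: str,
    max_handoffs: int = 5,
) -> Tuple[str, List[str]]:
    trace: List[str] = []  # which agents handled the message, in order
    current = start
    for _ in range(max_handoffs + 1):
        reply, target = agents[current].policy(message)
        trace.append(current)
        if target is None:
            return reply, trace
        current = target
    raise RuntimeError("too many handoffs")


def triage_policy(message: str) -> Tuple[str, Optional[str]]:
    if "refund" in message.lower():
        return "", "billing"  # hand off instead of answering
    return "I can help with that.", None


def billing_policy(message: str) -> Tuple[str, Optional[str]]:
    return "Refund issued.", None


agents = {
    "triage": HandoffAgent("triage", triage_policy),
    "billing": HandoffAgent("billing", billing_policy),
}
reply, trace = run_with_handoffs(agents, "triage", "I want a refund")
```

The returned trace is the simplest possible form of the run history that a tracing UI would display.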

Pros

Smallest mental footprint — an agent in under 30 lines.
Handoffs make multi-agent design clean and readable.
First-class, type-safe guardrails.
First-party Traces UI in the OpenAI dashboard — near-zero setup.

Cons

Model lock-in — built for GPT; non-OpenAI models via a litellm shim feel bolted on.
Linear handoff model is less flexible than cyclic graphs (branching/checkpoints).
Self-hosting is limited; primarily cloud-API dependent.

Ideal for: OpenAI-first shops wanting the shortest, cleanest path from zero to a working, traced agent.

Claude Agent SDK

Anthropic · Python · TS · vendor

Model: API-first design with subagent spawning — the foundation that powers Claude Code. Control: permission modes, allowedTools/disallowedTools, lifecycle hooks. Latest: TS v0.3.146 / Python v0.2.83 (May 2026).
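The combination of allow/deny lists and lifecycle hooks amounts to a gate that every tool call passes through before it runs. The sketch below is a plain-Python illustration of that idea, not the Claude Agent SDK: `ToolPolicy`, the hook signature, and the rule that a deny entry wins over an allow entry are all assumptions made for the example, and the tool names are only illustrative.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set

# A pre-tool hook returns a reason string to block the call, or None to allow it.
PreToolHook = Callable[[str, dict], Optional[str]]


@dataclass
class ToolPolicy:
    allowed: Set[str]
    disallowed: Set[str] = field(default_factory=set)
    pre_tool_hooks: List[PreToolHook] = field(default_factory=list)
    audit_log: List[str] = field(default_factory=list)

    def check(self, tool: str, args: dict) -> bool:
        # Assumption: an explicit deny always beats an allow.
        if tool in self.disallowed or tool not in self.allowed:
            self.audit_log.append(f"deny {tool}: not permitted")
            return False
        for hook in self.pre_tool_hooks:
            reason = hook(tool, args)
            if reason:
                self.audit_log.append(f"deny {tool}: {reason}")
                return False
        self.audit_log.append(f"allow {tool}")
        return True


def block_destructive_shell(tool: str, args: dict) -> Optional[str]:
    if tool == "Bash" and "rm -rf" in args.get("command", ""):
        return "destructive command"
    return None


policy = ToolPolicy(
    allowed={"Read", "Bash"},
    disallowed={"WebFetch"},
    pre_tool_hooks=[block_destructive_shell],
)
read_ok = policy.check("Read", {"path": "notes.txt"})
shell_ok = policy.check("Bash", {"command": "rm -rf /"})
fetch_ok = policy.check("WebFetch", {"url": "https://example.com"})
```

Every decision lands in `audit_log`, which is the property that makes this style of guardrail attractive for compliance-sensitive work.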

Pros

Cleanest API-first design in the lab-SDK group.
First-class MCP — built-in, in-process MCP servers.
Permission modes + hooks give auditable, safety-first guardrails.
Runs locally / in-process / on private infra; Anthropic API, Bedrock, or Vertex.
Computer-use tools and extended-thinking visibility into agent decisions.

Cons

Claude-centric — biases you to Anthropic models.
Thinner built-in multi-agent orchestration story than crew/graph frameworks.
Younger ecosystem of third-party integrations.

Ideal for: Claude-native, compliance-sensitive, and coding-agent use cases needing fine-grained tool governance.

7. TypeScript & edge-native

Mastra

Mastra (team behind Gatsby) · TypeScript · multi-agent · OSS

Model: agents + a graph-based workflow engine (.then()/.branch()/.parallel(), XState under the hood). Stars: ~23.8k · Latest: @mastra/core 1.32.0 (May 2026). Adoption: Marsh McLennan (75k employees), SoftBank Satto Workspace. Built on the Vercel AI SDK.
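The chained `.then()` / `.branch()` / `.parallel()` style can be illustrated with a small fluent builder. Mastra itself is TypeScript; the Python class below is a hypothetical analogue written only to show how such a chain composes, and its method signatures are assumptions rather than Mastra's.

```python
from typing import Any, Callable, List, Tuple

Step = Callable[[Any], Any]


class Workflow:
    """Hypothetical fluent workflow builder; each stage transforms one value."""

    def __init__(self) -> None:
        self._stages: List[Step] = []

    def then(self, step: Step) -> "Workflow":
        self._stages.append(step)
        return self

    def branch(self, cases: List[Tuple[Callable[[Any], bool], Step]]) -> "Workflow":
        def stage(value: Any) -> Any:
            for predicate, step in cases:
                if predicate(value):
                    return step(value)
            return value  # no case matched: pass the value through unchanged

        self._stages.append(stage)
        return self

    def parallel(self, steps: List[Step]) -> "Workflow":
        # Run one after another here; a real engine would run these concurrently.
        self._stages.append(lambda value: [step(value) for step in steps])
        return self

    def run(self, value: Any) -> Any:
        for stage in self._stages:
            value = stage(value)
        return value


workflow = (
    Workflow()
    .then(lambda text: text.strip().lower())
    .branch([
        (lambda text: "refund" in text, lambda text: "billing:" + text),
        (lambda text: True, lambda text: "general:" + text),
    ])
    .parallel([len, str.upper])
)
result = workflow.run("  Refund please ")
```

A durable engine would additionally persist the value between stages so a suspended run can resume; that part is omitted here.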

Pros

Best-in-class TypeScript DX — types flow through tools, outputs, workflows.
Durable workflows with step state, retries, parallel branches, suspend/resume HITL.
Built-in working + semantic memory (one of the few that ship real memory).
Native MCP (consume and author servers); 40+ model providers.
Deploys to Cloudflare Workers / Vercel / Node / standalone; integrates with Next.js, React, CopilotKit.

Cons

Smaller community than Python-first frameworks.
Not a fit for Python or Java teams.
Less battle-tested than LangGraph/CrewAI for very complex enterprise multi-agent systems.

Ideal for: TypeScript full-stack teams shipping production agents on the edge — directly relevant to swarm.ing's Cloudflare Workers + Turnkey stack.

Vercel AI SDK 6

Vercel · TypeScript · OSS

Model: composable tool loop; v6 adds the Agent interface + ToolLoopAgent (default 20-step loop) and DurableAgent via Workflow DevKit. Reach: 20M+ monthly downloads; the leading TS AI toolkit.
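A tool loop with a step cap is simple enough to sketch directly. The function below is a plain-Python illustration, not the AI SDK's `ToolLoopAgent`: the turn format, `tool_loop`, and the scripted model are hypothetical, and the default of 20 steps mirrors the figure quoted above.

```python
from typing import Callable, Dict, List, Optional, Tuple

# A model turn is ("tool", tool_name, argument) or ("final", answer_text, None).
ModelTurn = Tuple[str, str, Optional[str]]
Model = Callable[[List[str]], ModelTurn]


def tool_loop(
    model: Model,
    tools: Dict[str, Callable[[str], str]],
    prompt: str,
    max_steps: int = 20,
) -> Tuple[str, int]:
    """Call the model repeatedly, executing tools, until it gives a final answer."""
    history: List[str] = [prompt]
    for step in range(1, max_steps + 1):
        kind, name_or_text, arg = model(history)
        if kind == "final":
            return name_or_text, step
        # Execute the requested tool and feed the result back to the model.
        history.append(f"{name_or_text} -> {tools[name_or_text](arg)}")
    return "step limit reached", max_steps


def scripted_model(history: List[str]) -> ModelTurn:
    # Stand-in for an LLM: ask for the weather once, then answer.
    if len(history) == 1:
        return ("tool", "weather", "Paris")
    return ("final", f"Answer based on: {history[-1]}", None)


TOOLS = {"weather": lambda city: f"18C in {city}"}
answer, steps_used = tool_loop(scripted_model, TOOLS, "What is the weather in Paris?")
```

The cap matters because a model that keeps requesting tools would otherwise loop forever; the sketch returns a sentinel string in that case, where a real harness might raise or escalate.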

Pros

Unified API across 40+ providers; Next.js/React/Svelte/Vue/Node.
Best typed streaming UI (steps, tool calls, UI messages) for agent front-ends.
Stable MCP (@ai-sdk/mcp) with OAuth, resources, prompts (v6).
Tool-execution approval for HITL; DevTools for debugging.
Rewritten LangChain/LangGraph adapter for interop.

Cons

No graph semantics — you compose control flow yourself.
No native eval suite; wire your own harness.
Persistence is an app concern (use DurableAgent / Workflow DevKit).

Ideal for: TS product teams adding AI features and lightweight agents without needing graph orchestration.

8. Enterprise & cloud-native

Microsoft Agent Framework 1.0

Microsoft · .NET · Python · OSS

Model: YAML-declarative agents; merges Semantic Kernel + AutoGen into one SDK (GA Apr 3, 2026). Native MCP + A2A.

Pros

One supported Microsoft agent product instead of two competing ones.
.NET + Python parity; YAML declarations lower the entry bar.
Native MCP and A2A protocols; Azure-native deployment + identity.
"Lowest-regret default" for enterprises without a strong cloud preference.

Cons

Brand new — consolidation churn and migration paths from SK/AutoGen.
Value concentrates in the Azure ecosystem.
Smaller community than the LangGraph/CrewAI incumbents.

Ideal for: Microsoft-stack / Azure shops with a mixed .NET + Python codebase.

Semantic Kernel

Microsoft · C#/Py/Java · OSS

Model: plugins + planners across C#/Python/Java. Status: folding into MAF 1.0 — still the enterprise .NET/Java answer today, with native Azure AD and MCP.

Pros

Only serious choice for C#/.NET shops; strong Java support.
Native Azure AD identity and Azure deployment paths.
Native MCP; mature enterprise governance.

Cons

Being merged into the Microsoft Agent Framework — plan migration.
Heavier abstractions; steeper than crew frameworks.
Less momentum for net-new projects vs. MAF.

Ideal for: existing enterprise .NET/Java/Azure systems (with a path toward MAF).

AWS Strands Agents

AWS · Python · OSS

Model: model-driven — the LLM decides orchestration dynamically via a simple, customizable agent loop. Stars: ~5.5k · Latest: v1.34.1 (Apr 2026). Model-agnostic (Bedrock, Anthropic, Gemini, LiteLLM, Ollama, OpenAI…).

Pros

Fastest path to serverless — deploy on Lambda in seconds (~5s cold start).
Model-driven autonomy with a tiny, fully customizable loop.
Native MCP (per-invocation on Lambda); 13k+ community MCP servers.
Bidirectional voice/audio streaming; Bedrock AgentCore for production.

Cons

No out-of-box visual debugger — build observability yourself (X-Ray/CloudWatch/OTel).
Model-driven orchestration is less auditable than explicit graphs.
Value concentrates in the AWS ecosystem; youngest of the cloud SDKs.

Ideal for: AWS-native teams wanting fast serverless agents with minimal orchestration ceremony.

9. Lightweight & specialist

PydanticAI

Pydantic · Python · OSS

Model: code-first; agents are ordinary Python objects with dependency injection. Treats validated, typed output as the contract. Observability via Logfire.
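The "typed output as the contract" idea, together with dependency injection, can be shown using only the standard library. This sketch does not use PydanticAI or Pydantic: `Invoice`, `Deps`, `extract_invoice`, and the retry-with-error-feedback loop are assumptions chosen to illustrate the pattern.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Invoice:
    customer: str
    amount_cents: int


def parse_invoice(raw: str) -> Invoice:
    """Validate model output against the schema; raise ValueError if it does not fit."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(data.get("customer"), str) or not data["customer"]:
        raise ValueError("customer must be a non-empty string")
    if not isinstance(data.get("amount_cents"), int) or data["amount_cents"] < 0:
        raise ValueError("amount_cents must be a non-negative integer")
    return Invoice(data["customer"], data["amount_cents"])


@dataclass
class Deps:
    complete: Callable[[str], str]  # injected model client, easy to fake in tests
    max_retries: int = 2


def extract_invoice(text: str, deps: Deps) -> Invoice:
    error = ""
    for _ in range(deps.max_retries + 1):
        raw = deps.complete(f"Extract an invoice as JSON from: {text}\n{error}")
        try:
            return parse_invoice(raw)
        except ValueError as exc:  # json.JSONDecodeError is a ValueError subclass
            error = f"Previous output was invalid: {exc}"
    raise ValueError("model never produced a valid invoice")


# A fake model that fails validation once, then returns a conforming object.
replies = iter([
    '{"customer": "Acme"}',
    '{"customer": "Acme", "amount_cents": 1250}',
])
deps = Deps(complete=lambda prompt: next(replies))
invoice = extract_invoice("Acme owes $12.50", deps)
```

Because the model client arrives through `Deps`, the test double above needs no patching or network access.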

Pros

Type safety + validated structured outputs — agents conform to a strict schema.
Dependency injection makes swapping providers and mocking for tests trivial.
Clean API; "no abstractions for the sake of abstractions."
Excellent fit with FastAPI / the existing Pydantic ecosystem.

Cons

Younger, smaller community than LangGraph/CrewAI.
Multi-agent coordination is basic / less developed.
No native MCP yet; not for highly dynamic agent topologies.

Ideal for: production Python APIs, ETL/data-pipeline agents, and SaaS backends where output must satisfy a schema.

smolagents

Hugging Face · Python · OSS

Model: code-as-action — the agent writes Python that the runtime executes in a sandbox, instead of emitting JSON tool calls. ~27k stars, ~3k lines of core code. Latest: v1.25.0 (May 2026).
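A toy version of code-as-action makes the contrast with JSON tool calls visible: the "model" emits one Python snippet that composes two tools, and the runtime executes it. This is not smolagents' API; `run_code_action` is a hypothetical helper, the tools and rate are made up, and removing builtins is not a real sandbox, which is why isolated execution is listed as a requirement below.

```python
from typing import Callable, Dict


def run_code_action(code: str, tools: Dict[str, Callable]) -> object:
    """Execute model-written code with only the given tools in scope.

    WARNING: this is NOT a real sandbox. Emptying __builtins__ blocks the
    obvious escapes only; untrusted code needs true isolation.
    """
    namespace = {"__builtins__": {}, **tools}
    exec(code, namespace)
    return namespace.get("result")


# Hypothetical tools working in integer cents to keep arithmetic exact.
TOOLS = {
    "price_cents": lambda item: {"widget": 1000}[item],
    "to_eur_cents": lambda usd_cents: usd_cents * 9 // 10,  # made-up fixed rate
}

# One code action composes both tools; JSON dispatch would need two round trips.
model_output = 'result = to_eur_cents(price_cents("widget") * 3)'
value = run_code_action(model_output, TOOLS)
```

The single-snippet composition is the mechanism behind the "fewer steps" claim: intermediate values never travel back through the model.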

Pros

Code-acting completes multi-tool tasks in ~30% fewer steps (composition beats JSON dispatch).
Tiny core — readable in an afternoon; minimal magic.
Open-source-first; pairs naturally with self-hosted models + HF ecosystem.

Cons

Bigger blast radius — needs a sandbox (E2B / SmolVM) for safe code execution.
Basic multi-agent; no built-in memory; no native MCP yet.
Lower production-readiness than the heavyweight frameworks.

Ideal for: open-source-first teams running their own GPUs where code-as-action fits the task.

Agno

Agno (formerly Phidata) · Python · multi-agent · OSS

Model: high-performance, full-stack multi-agent runtime with sessions, multimodal inputs, native MCP, and built-in observability. ~29k stars.

Pros

Strong on speed/latency and multimodal inputs.
Built-in sessions/memory and observability.
Native MCP; full-stack multi-agent out of the box.

Cons

Less enterprise mindshare / fewer reference deployments than LangGraph.
Smaller third-party integration ecosystem.
Rebrand from Phidata means some stale docs/links.

Ideal for: latency-sensitive, multimodal multi-agent applications.

LlamaIndex (Workflows)

LlamaIndex · Python · TS · OSS

Model: event-driven Workflows layered on the best data/retrieval stack in the field. ~40k stars.
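An event-driven workflow can be pictured as handlers keyed by event type, each returning the next event. The sketch below is a standard-library illustration of that shape for a tiny retrieve-then-answer pipeline. It is not the LlamaIndex Workflows API: the event classes, `EventWorkflow`, and the keyword "retriever" are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class QueryReceived:
    query: str


@dataclass
class ChunksRetrieved:
    query: str
    chunks: List[str]


@dataclass
class AnswerReady:
    answer: str


class EventWorkflow:
    """Hypothetical dispatcher: one handler per event type, run until AnswerReady."""

    def __init__(self) -> None:
        self._handlers: Dict[type, Callable] = {}

    def on(self, event_type: type) -> Callable:
        def register(fn: Callable) -> Callable:
            self._handlers[event_type] = fn
            return fn
        return register

    def run(self, event: object) -> str:
        queue = deque([event])
        while queue:
            current = queue.popleft()
            if isinstance(current, AnswerReady):
                return current.answer
            queue.append(self._handlers[type(current)](current))
        raise RuntimeError("workflow ended without an AnswerReady event")


CORPUS = {
    "mcp": "MCP is a protocol for exposing tools to agents.",
    "rag": "RAG grounds answers in retrieved documents.",
}
flow = EventWorkflow()


@flow.on(QueryReceived)
def retrieve(event: QueryReceived) -> ChunksRetrieved:
    # Keyword match stands in for a real vector or hybrid retriever.
    chunks = [text for key, text in CORPUS.items() if key in event.query.lower()]
    return ChunksRetrieved(event.query, chunks)


@flow.on(ChunksRetrieved)
def synthesize(event: ChunksRetrieved) -> AnswerReady:
    return AnswerReady(" ".join(event.chunks) or "no match")


answer_text = flow.run(QueryReceived("What is MCP?"))
```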

Pros

Unmatched data connectors and RAG/retrieval primitives.
Event-driven Workflows for data-centric agent pipelines.
Mature ecosystem; Python + TS.

Cons

Multi-agent orchestration is basic vs. LangGraph/CrewAI.
Less suited to general (non-RAG) agent control flow.
MCP via plugins rather than first-class.

Ideal for: document- and RAG-heavy agents where retrieval quality is the core problem.

10. Verdicts by scenario

(a) TypeScript edge / serverless apps → Mastra for full agent systems (durable workflows, memory, MCP, Cloudflare Workers); Vercel AI SDK 6 if you mostly need AI features + a light tool loop.

(b) Python data / RAG apps → LlamaIndex for retrieval-heavy work; PydanticAI when typed, validated outputs into downstream systems matter most.

(c) Complex stateful multi-agent workflows → LangGraph, decisively — durable checkpointing, time-travel debugging, and HITL are unmatched. Google ADK if you need auditable explicit graphs on GCP.

(d) Quick prototypes → CrewAI (Python, ~35 lines to a crew) or the matching vendor SDK (OpenAI / Claude / ADK) if you're already standardized on one model provider.

The recurring pattern teams report: prototype on CrewAI, migrate to LangGraph the moment they need stateful checkpointing or partial-failure recovery — and on the TS side, start on Vercel AI SDK, graduate to Mastra when they need durable workflows and memory. Keep tools behind MCP and memory behind a thin interface so the harness stays swappable.
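One way to read the "thin interface" advice is to have agent logic depend on two small protocols, with the framework-specific pieces hidden behind them. The sketch below is an assumption-laden illustration: `Memory`, `ToolClient`, `InMemoryStore`, and `LocalTools` are hypothetical names, and `LocalTools` only stands in for a real MCP client.

```python
from typing import Callable, Dict, List, Protocol


class Memory(Protocol):
    def remember(self, key: str, value: str) -> None: ...
    def recall(self, key: str) -> List[str]: ...


class ToolClient(Protocol):
    def call(self, name: str, arguments: Dict[str, str]) -> str: ...


class InMemoryStore:
    """Simplest Memory implementation; swap for a vector store or framework memory."""

    def __init__(self) -> None:
        self._items: Dict[str, List[str]] = {}

    def remember(self, key: str, value: str) -> None:
        self._items.setdefault(key, []).append(value)

    def recall(self, key: str) -> List[str]:
        return list(self._items.get(key, []))


class LocalTools:
    """Stand-in for an MCP client; a real one would talk to an MCP server."""

    def __init__(self, functions: Dict[str, Callable[..., str]]) -> None:
        self._functions = functions

    def call(self, name: str, arguments: Dict[str, str]) -> str:
        return self._functions[name](**arguments)


def answer(question: str, memory: Memory, tools: ToolClient) -> str:
    # Agent logic sees only the two protocols, never a framework type.
    memory.remember("questions", question)
    return tools.call("lookup", {"term": question})


store = InMemoryStore()
tools = LocalTools({"lookup": lambda term: f"definition of {term}"})
reply_text = answer("MCP", store, tools)
```

Moving from one harness to another then means writing new adapters for `Memory` and `ToolClient` while `answer` stays unchanged.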

Where the two research passes disagree. The verdicts above weight the Exa-sourced 2026 comparison consensus. The Parallel.ai deep-research pass agreed on (b) and (c) — LlamaIndex for Python RAG and LangGraph for complex stateful workflows — but weighted two scenarios differently:

(a) TS edge: Parallel ranks Vercel AI SDK #1 (first-class Next.js + edge streaming) with Mastra as runner-up; the Exa consensus puts Mastra first for full agent systems. Net: Vercel for AI-features-plus-light-agents, Mastra when you need durable graph workflows + memory.
(d) Quick prototypes: Parallel ranks the OpenAI Agents SDK #1 (minimal boilerplate, TS + Python, voice) over CrewAI. Net: OpenAI SDK for a fast single/simple agent on one provider; CrewAI when the prototype is specifically a multi-agent crew.

11. Methodology & sources

Compiled May 21, 2026 by combining Exa semantic web search (current 2026 comparison articles, GitHub repo metadata for stars/versions) with a Parallel.ai deep-research run on the pro processor — a structured task spec covering control-flow model, memory, MCP, observability, deployment, licensing, and per-framework pros/cons, returning a report backed by 195 cited sources. The two passes were cross-validated; where their verdicts diverged it is flagged in §10. Star counts and version numbers are point-in-time (May 2026) and drift quickly; production scores are synthesized estimates triangulated across the sources below, not a single benchmark.
