The agent-framework field went from "the Big Four" (LangChain, LlamaIndex, AutoGen, CrewAI) to ~15 credible options in twelve months. Four structural shifts define the current moment:
Every lab now ships its own SDK. OpenAI Agents SDK (Mar 2026), Anthropic Claude Agent SDK (with Claude 4.6), and Google ADK (GA Apr 2025, hardened through 2026) are lightweight, deeply integrated, and erode the case for heavy third-party frameworks on single-provider stacks.
Microsoft consolidated. Microsoft Agent Framework 1.0 (Apr 3, 2026) merges Semantic Kernel + AutoGen into one .NET/Python SDK. AutoGen is in maintenance; the community AG2 fork carries the research lineage.
MCP went native. Model Context Protocol is now the default tool-integration standard. Native MCP: OpenAI Agents SDK, Claude Agent SDK, Semantic Kernel/MAF, Agno, Mastra, ADK, Strands, Vercel AI SDK 6. Still community/plugin on LangGraph, CrewAI, LlamaIndex.
Built-in semantic memory is the remaining gap. Only a handful ship genuine cross-session memory out of the box — CrewAI, Mastra, Agno, Google ADK. Everyone else gives you checkpointing (state snapshots) or expects BYO vector store (Mem0 / Letta / Zep).
The meta-lesson from every 2026 comparison: framework choice matters less than your evaluation, observability, and integration maturity. Pick the one that matches your language and the shape of your control flow (linear vs. graph vs. crew), then keep it replaceable — tools behind MCP, memory behind a thin interface, evals in a vendor-neutral platform.
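As a minimal sketch of "memory behind a thin interface", the agent code below depends only on a small protocol, so the backend can be swapped later (for example for a vector store). Every name here (`MemoryStore`, `InMemoryStore`, `build_prompt`) is hypothetical and not taken from any framework; the keyword-overlap recall is a toy stand-in for semantic search.

```python
from dataclasses import dataclass, field
from typing import Protocol


class MemoryStore(Protocol):
    """Hypothetical thin interface; agent code depends only on this."""

    def remember(self, session_id: str, text: str) -> None: ...

    def recall(self, session_id: str, query: str, limit: int = 3) -> list[str]: ...


@dataclass
class InMemoryStore:
    """Toy backend: keyword-overlap recall standing in for a vector store."""

    _items: dict[str, list[str]] = field(default_factory=dict)

    def remember(self, session_id: str, text: str) -> None:
        self._items.setdefault(session_id, []).append(text)

    def recall(self, session_id: str, query: str, limit: int = 3) -> list[str]:
        words = set(query.lower().split())
        scored = [
            (len(words & set(item.lower().split())), item)
            for item in self._items.get(session_id, [])
        ]
        # Keep only items sharing at least one word, best match first.
        hits = sorted((s for s in scored if s[0] > 0), key=lambda s: -s[0])
        return [item for _, item in hits[:limit]]


def build_prompt(memory: MemoryStore, session_id: str, user_msg: str) -> str:
    """Agent-side code sees only the interface, so backends stay swappable."""
    context = memory.recall(session_id, user_msg)
    return "\n".join(["Known facts:", *context, "User: " + user_msg])


store = InMemoryStore()
store.remember("s1", "The user prefers TypeScript")
store.remember("s1", "Deploy target is Cloudflare Workers")
prompt = build_prompt(store, "s1", "which deploy target")
```

Swapping in another backend would mean writing one more class with the same two methods; `build_prompt` would not change.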
Model-driven autonomy, Lambda deploy in seconds, native MCP, Bedrock AgentCore.
Microsoft / Azure / .NET shop → MS Agent Framework 1.0: Lowest-regret enterprise default; one supported product across .NET + Python, YAML agents.
Open-source-first, self-hosted models → smolagents: Code-as-action (~30% fewer steps), ~3K LOC core, Hugging Face ecosystem.
RAG / document-heavy agents → LlamaIndex: Unmatched data connectors and retrieval primitives.
3. Master comparison matrix
Stars are approximate GitHub counts as of May 2026; "Prod" is a synthesized 1–10 production-readiness score triangulated across the cited comparisons.
| Framework | Backer | Language | Control-flow model | Stars | Multi-agent | Built-in memory | MCP | Observability | Prod |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| LangGraph | LangChain | Python / JS | Stateful graph / state machine (cyclic) | ~16k ★ | Excellent | Checkpoint | Plugin | LangSmith (best) | 9.4 |
| OpenAI Agents SDK | OpenAI | Python / JS | Imperative handoff chain | ~19k ★ | Handoffs | Server state | Native | First-party Traces | 9.1 |
| CrewAI | CrewAI Inc. | Python | Role-based crews + Flows | ~35k ★ | Excellent | Semantic | Plugin | CrewAI+ (trails) | 8.6 |
| Mastra | Mastra (ex-Gatsby) | TypeScript | Agents + graph workflow (XState) | ~23.8k ★ | Native | Working+semantic | Native | Built-in evals | 8.5 |
| Vercel AI SDK 6 | Vercel | TypeScript | Composable tool loop + Agent | 20M+/mo dl | Compose | App concern | Native (v6) | OTel / Langfuse hooks | 8.3 |
| Google ADK | Google | Py / TS / Java / Go | Explicit Seq/Parallel/Loop tree | ~18.7k ★ | Hierarchical + A2A | Yes | Native | adk web debugger (best) | 8.3 |
| Claude Agent SDK | Anthropic | Python / TS | API-first + subagent spawning | n/a | Subagents | Context | First-class | Hooks / OTel | 8.2 |
| MS Agent Framework 1.0 | Microsoft | .NET / Python | YAML-declarative agents | ~12k ★ | Native | Yes | Native + A2A | Azure / OTel | 8.2 |
| AutoGen / AG2 | Microsoft → community | Python / .NET | Conversational multi-agent | ~58k ★ | Excellent | Moderate | Plugin | AutoGen Studio | 8.0 |
| Agno | Agno (ex-Phidata) | Python | High-perf multi-agent runtime | ~29k ★ | Yes | Sessions | Native | Built-in | 8.0 |
| Semantic Kernel | Microsoft | C# / Py / Java | Plugins + planners (→ MAF) | ~27.9k ★ | Native | Yes | Native | Azure | 7.8 |
| PydanticAI | Pydantic | Python | Code-first, typed, DI | ~17.2k ★ | Basic | BYO | Not yet | Logfire | 7.8 |
| AWS Strands | AWS | Python | Model-driven agent loop | ~5.5k ★ | Yes | Yes | Native | X-Ray / AgentCore (BYO) | 7.7 |
| LlamaIndex | LlamaIndex | Python / TS | Event-driven Workflows (RAG) | ~49.5k ★ | Basic | Yes | Plugin | Instrumentation | 7.6 |
| smolagents | Hugging Face | Python | Code-as-action (writes Python) | ~27.4k ★ | Basic | BYO | Not yet | Logging / OTel | 7.2 |
4. Graph & stateful frameworks
LangGraph
· LangChain · Python · JS · multi-agent · OSS
Model: directed graph of nodes/edges over a typed shared state — cyclic, branching. State: durable checkpointers, time-travel replay. Adoption: the enterprise default (~34.5M monthly downloads of the LangChain/Graph stack). Latest: LangGraph Studio visual IDE shipped 2026.
Pros
Durable state — checkpointers persist across crashes; resume long-running runs.
Time-travel debugging: replay any execution from any node.
Human-in-the-loop is first-class (pause, inspect, edit state, resume).
LangSmith observability is the strongest trace/eval tooling in the industry.
Streaming, retries, parallelism baked in; highest ceiling for complex flows.
Cons
Steepest learning curve; you design the graph yourself.
Graph abstraction is heavy for simple, linear tasks.
TypeScript port lags the Python version.
LangSmith is a paid add-on; LangChain lineage adds surface area.
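The graph-plus-checkpoint model described above can be illustrated with a small plain-Python sketch. This is not LangGraph's API: `Graph`, `add_node`, `run`, and `replay_from` are hypothetical names, and the checkpoints live in a list rather than a durable store. It shows a cyclic graph over shared state, a snapshot after every node, and a replay from an earlier snapshot.

```python
import copy
from typing import Callable, Optional

State = dict
Node = Callable[[State], State]
Router = Callable[[State], Optional[str]]  # returns next node name, or None to stop


class Graph:
    """Hypothetical toy graph runner; not the LangGraph API."""

    def __init__(self) -> None:
        self.nodes: dict[str, Node] = {}
        self.routers: dict[str, Router] = {}
        self.checkpoints: list[tuple[str, State]] = []

    def add_node(self, name: str, fn: Node, router: Router) -> None:
        self.nodes[name] = fn
        self.routers[name] = router

    def run(self, start: str, state: State, max_steps: int = 50) -> State:
        current: Optional[str] = start
        steps = 0
        while current is not None:
            if steps >= max_steps:
                raise RuntimeError("step limit reached (possible infinite cycle)")
            state = self.nodes[current](state)
            # Snapshot after every node so a run can be inspected or resumed.
            self.checkpoints.append((current, copy.deepcopy(state)))
            current = self.routers[current](state)
            steps += 1
        return state

    def replay_from(self, index: int) -> State:
        """Resume execution from the state saved at checkpoint `index`."""
        name, snapshot = self.checkpoints[index]
        next_node = self.routers[name](snapshot)
        if next_node is None:
            return copy.deepcopy(snapshot)
        return self.run(next_node, copy.deepcopy(snapshot))


def draft(state: State) -> State:
    return {**state, "draft": state.get("draft", 0) + 1}


def review(state: State) -> State:
    return {**state, "approved": state["draft"] >= 2}


g = Graph()
g.add_node("draft", draft, lambda s: "review")
# The cycle: review routes back to draft until the draft is approved.
g.add_node("review", review, lambda s: None if s["approved"] else "draft")
final = g.run("draft", {})
visited = [name for name, _ in g.checkpoints]
replayed = g.replay_from(1)
```

A durable version would write each snapshot to a database keyed by run ID, which is what lets a crashed run resume rather than restart.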
Google ADK
Model: explicit orchestration via SequentialAgent / ParallelAgent / LoopAgent in a hierarchical tree; blackboard shared State. Models: Model Garden 200+ (Gemini-optimized but model-agnostic). Latest: Python v1.x stable; adk-js TS SDK; A2A v0.3 (gRPC).
Pros
Explicit, auditable graphs — point at the graph and tell an auditor exactly how a decision was made.
Best out-of-box debugging: adk web visual debugger.
Native A2A protocol — interop with LangGraph/CrewAI agents without bridging code.
200+ models via Model Garden; bidirectional audio/video multimodal streaming.
Multi-language (Python/TS/Java/Go); deploy to Cloud Run / GKE / Vertex.
Cons
Value concentrates inside GCP; weaker elsewhere.
Gemini-optimized — other models may need tool-call/streaming config.
No model-driven dynamic routing (by design).
"Google sunset" trust concern for some teams.
Ideal for: GCP/Vertex-native teams needing auditable, structured multi-agent pipelines and multimodal interfaces.
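To make the explicit Sequential / Parallel / Loop tree with a shared blackboard concrete, here is a plain-Python sketch. It does not use the ADK API: `Leaf`, `Sequential`, `Parallel`, and `Loop` are hypothetical stand-ins, and the "agents" are ordinary functions. The point is that the control flow is fixed in the tree, so the recorded trace is easy to audit.

```python
from concurrent.futures import ThreadPoolExecutor
from dataclasses import dataclass
from typing import Callable, Protocol

Blackboard = dict  # shared state every step reads from and writes to


class Step(Protocol):
    def run(self, board: Blackboard) -> None: ...


@dataclass
class Leaf:
    name: str
    fn: Callable[[Blackboard], None]

    def run(self, board: Blackboard) -> None:
        board["trace"].append(self.name)  # audit trail of what ran
        self.fn(board)


@dataclass
class Sequential:
    children: list[Step]

    def run(self, board: Blackboard) -> None:
        for child in self.children:
            child.run(board)


@dataclass
class Parallel:
    children: list[Step]

    def run(self, board: Blackboard) -> None:
        with ThreadPoolExecutor() as pool:
            # list() waits for all children and re-raises any child exception.
            list(pool.map(lambda child: child.run(board), self.children))


@dataclass
class Loop:
    child: Step
    until: Callable[[Blackboard], bool]
    max_iterations: int = 5

    def run(self, board: Blackboard) -> None:
        for _ in range(self.max_iterations):
            self.child.run(board)
            if self.until(board):
                break


pipeline = Sequential([
    Parallel([
        Leaf("fetch_a", lambda b: b.update(a=2)),
        Leaf("fetch_b", lambda b: b.update(b=3)),
    ]),
    Leaf("combine", lambda b: b.update(total=b["a"] + b["b"])),
    Loop(
        Leaf("double", lambda b: b.update(total=b["total"] * 2)),
        until=lambda b: b["total"] >= 20,
    ),
])
board: Blackboard = {"trace": []}
pipeline.run(board)
```

The order of the two parallel leaves in the trace is not deterministic; everything after them is.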
5. Role / crew & conversational
CrewAI
· CrewAI Inc. · Python · multi-agent · OSS
Model: role/goal/backstory "crews" with Process modes (sequential, hierarchical, consensual); Flows (mid-2025) add deterministic control. Memory: built-in semantic memory (rare). Latest: v1.14.5 (May 2026). One of the fastest-growing frameworks of 2025–26 (~52k stars).
Pros
Lowest boilerplate-to-functionality ratio — working crew in ~35 lines.
Crew metaphor reads like an org chart; intuitive for non-engineers.
Built-in semantic memory + multi-vendor model support (OpenAI/Anthropic/Ollama).
Flows give deterministic workflow control when you need it.
Large, active community; CrewAI+ adds managed deploy, auth, monitoring.
Cons
"Magical" — hard to pinpoint which agent went rogue at scale.
Multi-agent token overhead can be ~3× a single-agent approach.
State persistence less robust than LangGraph; mid-flow failure recovery weaker.
OSS observability requires custom logging.
Ideal for: prototypes, hackathons, MVPs, and SOP-driven automation (marketing / sales / ops) where time-to-first-working-agent wins.
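The role/goal/backstory idea with a sequential process can be sketched without CrewAI itself. The names below (`Role`, `Task`, `run_sequential`, `fake_llm`) are hypothetical, and `fake_llm` is a stand-in for a real model call. Each task's prompt carries the previous task's output, which is also a rough picture of where multi-agent token overhead comes from.

```python
from dataclasses import dataclass
from typing import Callable

LLM = Callable[[str], str]  # hypothetical model client: prompt in, text out


@dataclass
class Role:
    role: str
    goal: str
    backstory: str


@dataclass
class Task:
    description: str
    assignee: Role


def run_sequential(tasks: list[Task], llm: LLM) -> list[str]:
    """Run tasks in order, feeding each output into the next task's prompt."""
    outputs: list[str] = []
    context = ""
    for task in tasks:
        agent = task.assignee
        prompt = (
            f"You are {agent.role}. Goal: {agent.goal}. Background: {agent.backstory}\n"
            f"Task: {task.description}\n"
            f"Previous output: {context or 'none'}"
        )
        context = llm(prompt)
        outputs.append(context)
    return outputs


def fake_llm(prompt: str) -> str:
    # Stand-in model: echoes the persona sentence from the first prompt line.
    first_line = prompt.splitlines()[0]
    return f"[{first_line.split('.')[0]}] done"


researcher = Role("Researcher", "find sources", "ten years in market analysis")
writer = Role("Writer", "draft the summary", "former tech journalist")
outputs = run_sequential(
    [Task("Collect three sources", researcher), Task("Write a summary", writer)],
    fake_llm,
)
```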
AutoGen / AG2
· Microsoft → community fork · Python · .NET · OSS
Model: conversational multi-agent (agents debate/plan/review). Status: Microsoft moved AutoGen (last release v0.7.5, Sep 2025, ~58k stars) to maintenance in favor of the Microsoft Agent Framework; the community AG2 fork drives the lineage forward.
Pros
Pioneered conversational multi-agent patterns; still excellent for research.
AutoGen Studio provides a no-code visual builder.
Strong at code-generation and self-correcting agent loops.
First-class .NET + Python.
Cons
Not recommended for net-new production — Microsoft is steering teams to MAF.
State management only moderate.
Future split between maintenance (AutoGen) and community (AG2).
Ideal for: research-grade multi-agent experiments and conversational/debate patterns — not new production code.
6. Lab / vendor SDKs
OpenAI Agents SDK
· OpenAI · Python · JS · vendor
Model: imperative loop with handoffs; Responses API gives server-side state + tool execution. The default for OpenAI-committed teams.
Pros
Smallest mental footprint — an agent in under 30 lines.
Handoffs make multi-agent design clean and readable.
First-class, type-safe guardrails.
First-party Traces UI in the OpenAI dashboard — near-zero setup.
Cons
Model lock-in — built for GPT; non-OpenAI models via a litellm shim feel bolted on.
Linear handoff model is less flexible than cyclic graphs (branching/checkpoints).
Self-hosting is limited; primarily cloud-API dependent.
Ideal for: OpenAI-first shops wanting the shortest, cleanest path from zero to a working, traced agent.
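The handoff pattern can be sketched in plain Python as a loop in which each agent either answers or names the agent that should take over. This is an illustration only and not the OpenAI Agents SDK API: `Agent`, `run_with_handoffs`, and the two handlers are hypothetical, with simple functions in place of model calls.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A handler returns (reply, name of the agent to hand off to, or None when done).
Handler = Callable[[str], tuple[str, Optional[str]]]


@dataclass
class Agent:
    name: str
    handle: Handler


def run_with_handoffs(
    agents: dict[str, Agent], start: str, message: str, max_handoffs: int = 5
) -> tuple[str, list[str]]:
    """Follow handoffs until an agent returns a final reply."""
    current = start
    path = [current]
    for _ in range(max_handoffs + 1):
        reply, target = agents[current].handle(message)
        if target is None:
            return reply, path
        current = target
        path.append(current)
    raise RuntimeError("too many handoffs")


def triage(message: str) -> tuple[str, Optional[str]]:
    if "refund" in message.lower():
        return "", "billing"  # hand off instead of answering
    return "Here is a general answer.", None


def billing(message: str) -> tuple[str, Optional[str]]:
    return "Refund started.", None


agents = {
    "triage": Agent("triage", triage),
    "billing": Agent("billing", billing),
}
reply, path = run_with_handoffs(agents, "triage", "I want a refund")
```

The returned `path` makes the chain readable after the fact, but note the shape is a chain: there is no built-in way to branch, loop back with saved state, or resume mid-run.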
Claude Agent SDK
· Anthropic · Python · TS · vendor
Model: API-first design with subagent spawning — the foundation that powers Claude Code. Control: permission modes, allowedTools/disallowedTools, lifecycle hooks. Latest: TS v0.3.146 / Python v0.2.83 (May 2026).
Mastra
Model: agents + a graph-based workflow engine (.then()/.branch()/.parallel(), XState under the hood). Stars: ~23.8k · Latest: @mastra/core 1.32.0 (May 2026). Adoption: Marsh McLennan (75k employees), SoftBank Satto Workspace. Built on the Vercel AI SDK.
Pros
Best-in-class TypeScript DX — types flow through tools, outputs, workflows.
Durable workflows with step state, retries, parallel branches, suspend/resume HITL.
Built-in working + semantic memory (one of the few that ship real memory).
Native MCP (consume and author servers); 40+ model providers.
Deploys to Cloudflare Workers / Vercel / Node / standalone; integrates with Next.js, React, CopilotKit.
Cons
Smaller community than Python-first frameworks.
Not a fit for Python or Java teams.
Less battle-tested than LangGraph/CrewAI for very complex enterprise multi-agent.
Ideal for: TypeScript full-stack teams shipping production agents on the edge — directly relevant to swarm.ing's Cloudflare Workers + Turnkey stack.
Vercel AI SDK 6
· Vercel · TypeScript · OSS
Model: composable tool loop; v6 adds the Agent interface + ToolLoopAgent (default 20-step loop) and DurableAgent via Workflow DevKit. Reach: 20M+ monthly downloads; the leading TS AI toolkit.
Pros
Unified API across 40+ providers; Next.js/React/Svelte/Vue/Node.
Best typed streaming UI (steps, tool calls, UI messages) for agent front-ends.
Stable MCP (@ai-sdk/mcp) with OAuth, resources, prompts (v6).
Tool-execution approval for HITL; DevTools for debugging.
Rewritten LangChain/LangGraph adapter for interop.
Cons
No graph semantics — you compose control flow yourself.
No native eval suite; wire your own harness.
Persistence is an app concern (use DurableAgent / Workflow DevKit).
Ideal for: TS product teams adding AI features and lightweight agents without needing graph orchestration.
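A composable tool loop with a step cap can be sketched in a few lines. The sketch below is Python, not the Vercel AI SDK: `tool_loop`, `ToolCall`, `Final`, and `scripted_model` are hypothetical names, and the model is a scripted function. The default of 20 steps mirrors the default mentioned above; everything else is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Union


@dataclass
class ToolCall:
    tool: str
    args: dict


@dataclass
class Final:
    text: str


Message = tuple[str, str]  # (role, content)
Model = Callable[[list[Message]], Union[ToolCall, Final]]


def tool_loop(
    model: Model,
    tools: dict[str, Callable[..., object]],
    user_msg: str,
    max_steps: int = 20,
) -> str:
    """Ask the model, run the tool it requests, feed the result back, repeat."""
    messages: list[Message] = [("user", user_msg)]
    for _ in range(max_steps):
        decision = model(messages)
        if isinstance(decision, Final):
            return decision.text
        result = tools[decision.tool](**decision.args)
        messages.append(("tool", f"{decision.tool} -> {result}"))
    return "stopped: step limit reached"


def scripted_model(messages: list[Message]) -> Union[ToolCall, Final]:
    # Stand-in for an LLM: call `add` once, then answer with the tool result.
    tool_results = [content for role, content in messages if role == "tool"]
    if not tool_results:
        return ToolCall("add", {"a": 2, "b": 3})
    return Final("Result: " + tool_results[-1])


tools = {"add": lambda a, b: a + b}
answer = tool_loop(scripted_model, tools, "What is 2 + 3?")
```

Because there are no graph semantics, any branching or retry policy would be ordinary code written around this loop.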
8. Enterprise & cloud-native
Microsoft Agent Framework 1.0
· Microsoft · .NET · Python · OSS
Model: YAML-declarative agents; merges Semantic Kernel + AutoGen into one SDK (GA Apr 3, 2026). Native MCP + A2A.
Pros
One supported Microsoft agent product instead of two competing ones.
.NET + Python parity; YAML declarations lower the entry bar.
Native MCP and A2A protocols; Azure-native deployment + identity.
"Lowest-regret default" for enterprises without a strong cloud preference.
Cons
Brand new — consolidation churn and migration paths from SK/AutoGen.
Value concentrates in the Azure ecosystem.
Smaller community than the LangGraph/CrewAI incumbents.
Ideal for: Microsoft-stack / Azure shops with a mixed .NET + Python codebase.
Semantic Kernel
· Microsoft · C#/Py/Java · OSS
Model: plugins + planners across C#/Python/Java. Status: folding into MAF 1.0 — still the enterprise .NET/Java answer today, with native Azure AD and MCP.
Pros
Only serious choice for C#/.NET shops; strong Java support.
Native Azure AD identity and Azure deployment paths.
Native MCP; mature enterprise governance.
Cons
Being merged into the Microsoft Agent Framework — plan migration.
Heavier abstractions; steeper than crew frameworks.
Less momentum for net-new projects vs. MAF.
Ideal for: existing enterprise .NET/Java/Azure systems (with a path toward MAF).
AWS Strands Agents
· AWS · Python · OSS
Model: model-driven — the LLM decides orchestration dynamically via a simple, customizable agent loop. Stars: ~5.5k · Latest: v1.34.1 (Apr 2026). Model-agnostic (Bedrock, Anthropic, Gemini, LiteLLM, Ollama, OpenAI…).
Pros
Fastest path to serverless — deploy on Lambda in seconds (~5s cold start).
Model-driven autonomy with a tiny, fully customizable loop.
Native MCP (per-invocation on Lambda); 13k+ community MCP servers.
Bidirectional voice/audio streaming; Bedrock AgentCore for production.
Cons
No out-of-box visual debugger — build observability yourself (X-Ray/CloudWatch/OTel).
Model-driven orchestration is less auditable than explicit graphs.
Value concentrates in the AWS ecosystem; youngest of the cloud SDKs.
Ideal for: AWS-native teams wanting fast serverless agents with minimal orchestration ceremony.
9. Lightweight & specialist
PydanticAI
· Pydantic · Python · OSS
Model: code-first; agents are ordinary Python objects with dependency injection. Treats validated, typed output as the contract. Observability via Logfire.
Pros
Type safety + validated structured outputs — agents conform to a strict schema.
Dependency injection makes swapping providers and mocking for tests trivial.
Clean API; "no abstractions for the sake of abstractions."
Excellent fit with FastAPI / the existing Pydantic ecosystem.
Cons
Younger, smaller community than LangGraph/CrewAI.
Multi-agent coordination is basic / less developed.
No native MCP yet; not for highly dynamic agent topologies.
Ideal for: production Python APIs, ETL/data-pipeline agents, and SaaS backends where output must satisfy a schema.
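The "validated, typed output as the contract" idea, together with dependency injection, can be illustrated with the standard library alone. This sketch does not use PydanticAI or Pydantic: `Invoice`, `Deps`, `parse_invoice`, and `extract_invoice` are hypothetical, and the hand-written checks stand in for schema validation. The injected `complete` callable plays the role of the model client, which is what makes the function easy to test with a fake.

```python
import json
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Invoice:
    customer: str
    total_cents: int


class OutputValidationError(ValueError):
    """Raised when model output does not satisfy the Invoice schema."""


def parse_invoice(raw: str) -> Invoice:
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise OutputValidationError(f"not JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise OutputValidationError("expected a JSON object")
    customer, total = data.get("customer"), data.get("total_cents")
    if not isinstance(customer, str) or not customer:
        raise OutputValidationError("customer must be a non-empty string")
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(total, bool) or not isinstance(total, int) or total < 0:
        raise OutputValidationError("total_cents must be a non-negative integer")
    return Invoice(customer, total)


@dataclass
class Deps:
    complete: Callable[[str], str]  # injected model client (real or fake)


def extract_invoice(deps: Deps, text: str, retries: int = 1) -> Invoice:
    """Call the model, validate its output, and retry with the error on failure."""
    error = ""
    for _ in range(retries + 1):
        raw = deps.complete(f"Extract invoice JSON from: {text}\n{error}")
        try:
            return parse_invoice(raw)
        except OutputValidationError as exc:
            error = f"Previous answer was rejected: {exc}"
    raise OutputValidationError(error)


# Fake model for a test: first reply is invalid, second is valid.
replies = iter(["oops", '{"customer": "Acme", "total_cents": 1250}'])
invoice = extract_invoice(Deps(complete=lambda prompt: next(replies)), "Acme owes $12.50")
```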
smolagents
· Hugging Face · Python · OSS
Model: code-as-action, in which the agent writes Python that the runtime executes in a sandbox instead of emitting JSON tool calls. ~27k stars, ~3k lines of core code. Latest: v1.25.0 (May 2026).
LlamaIndex
Model: event-driven Workflows layered on the best data/retrieval stack in the field. ~40k stars.
Pros
Unmatched data connectors and RAG/retrieval primitives.
Event-driven Workflows for data-centric agent pipelines.
Mature ecosystem; Python + TS.
Cons
Multi-agent orchestration is basic vs. LangGraph/CrewAI.
Less suited to general (non-RAG) agent control flow.
MCP via plugins rather than first-class.
Ideal for: document- and RAG-heavy agents where retrieval quality is the core problem.
10. Verdicts by scenario
(a) TypeScript edge / serverless apps → Mastra for full agent systems (durable workflows, memory, MCP, Cloudflare Workers); Vercel AI SDK 6 if you mostly need AI features + a light tool loop.
(b) Python data / RAG apps → LlamaIndex for retrieval-heavy work; PydanticAI when typed, validated outputs into downstream systems matter most.
(c) Complex stateful multi-agent workflows → LangGraph, decisively: durable checkpointing, time-travel debugging, and HITL are unmatched. Google ADK if you need auditable explicit graphs on GCP.
(d) Quick prototypes → CrewAI (Python, ~35 lines to a crew) or the matching vendor SDK (OpenAI / Claude / ADK) if you're already standardized on one model provider.
The recurring pattern teams report: prototype on CrewAI, migrate to LangGraph the moment they need stateful checkpointing or partial-failure recovery — and on the TS side, start on Vercel AI SDK, graduate to Mastra when they need durable workflows and memory. Keep tools behind MCP and memory behind a thin interface so the harness stays swappable.
Where the two research passes disagree. The verdicts above weight the Exa-sourced 2026 comparison consensus. The Parallel.ai deep-research pass agreed on (b) and (c) — LlamaIndex for Python RAG and LangGraph for complex stateful workflows — but weighted two scenarios differently:
(a) TS edge: Parallel ranks Vercel AI SDK #1 (first-class Next.js + edge streaming) with Mastra as runner-up; the Exa consensus puts Mastra first for full agent systems. Net: Vercel for AI-features-plus-light-agents, Mastra when you need durable graph workflows + memory.
(d) Quick prototypes: Parallel ranks the OpenAI Agents SDK #1 (minimal boilerplate, TS + Python, voice) over CrewAI. Net: OpenAI SDK for a fast single/simple agent on one provider; CrewAI when the prototype is specifically a multi-agent crew.
11. Methodology & sources
Compiled May 21, 2026 by combining Exa semantic web search (current 2026 comparison articles, GitHub repo metadata for stars/versions) with a Parallel.ai deep-research run on the pro processor — a structured task spec covering control-flow model, memory, MCP, observability, deployment, licensing, and per-framework pros/cons, returning a report backed by 195 cited sources. The two passes were cross-validated; where their verdicts diverged it is flagged in §10. Star counts and version numbers are point-in-time (May 2026) and drift quickly; production scores are synthesized estimates triangulated across the sources below, not a single benchmark.