claude-flow Research
Comprehensive investigation of ruvnet/claude-flow (branded "Ruflo v3") and its relevance to Legio.
What claude-flow Is
An open-source TypeScript project (v3.1.0-alpha.39, 422MB repo, 797 TS source files in v3/) that bills itself as "the leading agent orchestration platform for Claude." It operates as an MCP server that Claude Code connects to, providing 87+ tools for swarm coordination, memory management, task orchestration, and background workers.
Key distinction: claude-flow orchestrates Claude Code instances (the Anthropic CLI product). Legio orchestrates Claude Agent SDK clients (the programmatic Python SDK). These are fundamentally different runtime models.
Architecture
```
Claude Code (IDE/CLI)
   ↓ MCP protocol
claude-flow MCP server (TypeScript, Node.js 20+)
├─ 26 CLI commands
├─ 87+ MCP tools (agent, memory, swarm, coordination, etc.)
├─ Headless Worker Executor (spawns Claude Code CLI processes)
├─ Memory: SQLite + HNSW vector search
├─ RuVector intelligence layer (WASM, RL routing)
└─ Agent/Task/Swarm state in .claude-flow/ JSON files
```
Technology Stack
| Component | Technology |
|---|---|
| Language | TypeScript (primary) |
| Runtime | Node.js 20+, WASM |
| Package structure | Monorepo: v3/@claude-flow/{cli,shared,claims,guidance,browser,aidefence,...} |
| Memory | SQLite (sql.js WASM) + HNSW for vector search |
| Dependencies | zod, semver (core); optional: agentdb, @ruvector/*, agentic-flow |
| Agent execution | Spawns claude CLI processes in headless mode |
| Integration | MCP server exposing tools to Claude Code |
How Agents Actually Work
After deep investigation, the actual agent execution mechanism is:
- Agent state is tracked in `.claude-flow/agents/store.json` — a JSON file on disk
- Agent "spawning" creates a record in this JSON store with metadata (type, model, status)
- Actual work is done by either:
  - Claude Code's built-in `Task` tool (Claude Code spawns subagents), or
  - the Headless Worker Executor, which literally calls `spawn('claude', [...args])` to fork a claude CLI process
- Coordination state (topology, consensus, load balancing) is stored in `.claude-flow/coordination/store.json` — another local JSON file
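To make the mechanism concrete, here is a minimal Python sketch of what agent "spawning" amounts to under this model. The record shape and field names are assumptions inferred from the description above, not claude-flow's actual schema:

```python
import json
from pathlib import Path

# Path taken from the report; the store is just a JSON file on disk.
STORE = Path(".claude-flow/agents/store.json")

def spawn_agent(agent_type: str, model: str) -> dict:
    """'Spawning' appends a metadata record to the JSON store.

    No process starts here; 'status' is a label, not a running agent.
    """
    store = json.loads(STORE.read_text()) if STORE.exists() else {"agents": []}
    record = {
        "id": f"agent-{len(store['agents']) + 1}",  # hypothetical id scheme
        "type": agent_type,
        "model": model,
        "status": "active",
    }
    store["agents"].append(record)
    STORE.parent.mkdir(parents=True, exist_ok=True)
    STORE.write_text(json.dumps(store, indent=2))
    return record
```

Everything else (topology, consensus, load balancing) is the same pattern applied to other JSON files.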
Critical finding from `coordination-tools.ts`:

```
⚠️ IMPORTANT: These tools provide LOCAL STATE MANAGEMENT.
- Topology/consensus state is tracked locally
- No actual distributed coordination
- Useful for single-machine workflow orchestration
```

The "Byzantine Fault Tolerance", "Raft consensus", "mesh topology", etc. are metadata labels in JSON, not actual distributed consensus implementations. The "swarm" is conceptual state tracking, not runtime coordination.
Detailed Module Analysis
MCP Tools (26 categories; selected groups below)
| Tool Group | Files | Purpose | Actual Implementation |
|---|---|---|---|
| agent-tools | 1 | Agent CRUD, model routing | JSON store read/write |
| memory-tools | 1 | Store/retrieve/search | SQLite + optional HNSW |
| coordination-tools | 1 | Topology, consensus, sync | JSON store (local state only) |
| swarm-tools | 1 | Swarm init, status, control | JSON store + spawn commands |
| task-tools | 1 | Task queue, priority | In-memory priority queue |
| session-tools | 1 | Session management | JSON store |
| workflow-tools | 1 | Multi-step workflows | JSON templates |
| hive-mind-tools | 1 | Queen/worker roles | JSON store + labels |
| neural-tools | 1 | RL routing, training | Optional @ruvector |
| security-tools | 1 | Input validation, CVE | Validation + scanning |
| github-tools | 1 | PR, issues, releases | gh CLI wrappers |
| browser-tools | 1 | Headless browsing | Playwright adapter |
| embeddings-tools | 1 | Vector embeddings | ONNX MiniLM |
| claims-tools | 1 | Work claiming/stealing | DDD with event sourcing |
Memory System
- Production backend: SQLite via sql.js (WASM) with WAL mode
- Vector search: HNSW (Hierarchical Navigable Small World) index for semantic similarity
- Agent memory banks: Per-agent isolated storage with shared global memory
- Pattern storage (ReasoningBank): Stores task→outcome patterns for learning
- Legacy fallback: JSON file-based storage with auto-migration
Headless Worker Executor
The most concrete implementation — spawns Claude Code CLI processes:
- Process pool with configurable concurrency
- Context building from file globs
- Prompt template system with context injection
- Output parsing (text, JSON, markdown)
- Timeout handling and graceful termination
- 8 headless types: audit, optimize, testgaps, document, ultralearn, refactor, deepdive, predict
- 4 local types: map, consolidate, benchmark, preload
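The executor's core loop can be sketched in Python. The `-p` flag and error handling here are illustrative assumptions about a headless CLI invocation, not claude-flow's actual TypeScript code:

```python
import subprocess

def run_headless(prompt: str, timeout: int = 120, cmd: str = "claude") -> str:
    """Run one headless worker: spawn a CLI process with a prompt,
    capture stdout, and kill the process if it exceeds the timeout.
    """
    try:
        result = subprocess.run(
            [cmd, "-p", prompt],  # illustrative flags, not verified
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        # Graceful-termination path: subprocess.run kills the child on timeout.
        return ""
```

A process pool would wrap this in a bounded set of concurrent workers; the output-parsing and context-injection layers sit on either side of this call.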
Skills and Agents (Prompt Libraries)
The .agents/skills/ directory contains 120+ SKILL.md files — these are prompt templates, not code. They define agent personas, capabilities, and workflows via markdown instructions that Claude Code follows. This is consistent with claude-flow's design: the intelligence is in prompts, not in code.
Comparison with Legio
| Dimension | claude-flow | Legio |
|---|---|---|
| Language | TypeScript | Python |
| Agent runtime | Claude Code CLI (headless processes) | Claude Agent SDK (programmatic API) |
| LLM integration | MCP server → Claude Code reads tools | SDK client → direct API calls |
| Interface | Claude Code IDE / CLI | Telegram bot |
| Agent definition | .md skill files + JSON config | castra/ directory with prompt.md |
| Memory | SQLite + HNSW vectors | Memoria (filesystem: edicta, acta, commentarii) |
| Session state | JSON files in .claude-flow/ | Praetorium (SQLite via aiosqlite) |
| Orchestration | Metadata tracking + spawned processes | Legatus (SDK client + router) |
| Auth | None (trusts Claude Code) | TOTP-gated destructive ops |
| Target user | Developer using Claude Code IDE | Caesar via Telegram |
| Scope | Software engineering workflows | General-purpose AI assistant team |
| Maturity | v3 alpha (39 releases), 797 TS files | v3, ~15 Python modules, 100% test coverage |
What Could Enrich Legio
Worth Adopting (Ideas, not code)
1. Vector Memory / Semantic Search
claude-flow's HNSW-based memory enables "find similar past interactions" instead of just key-value lookup. Legio's Memoria is purely filesystem-based (edicta, acta, commentarii). Adding vector embeddings to the praetorium would allow:
- "Find conversations similar to this one" for context
- Semantic search across acta (shared knowledge)
- Pattern matching: "last time Caesar asked about X, the best centurio was Y"
Implementation path: Add optional sentence-transformers embeddings to praetorium nuntii. Store vectors in SQLite (we already use aiosqlite). Use a pure-Python HNSW library like hnswlib for search. This stays within our Python stack.
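A minimal sketch of that path, using only the standard library: vectors stored as JSON in SQLite and a brute-force cosine-similarity scan. The table name `nuntii_vectors` is hypothetical; in practice embeddings would come from sentence-transformers and hnswlib would replace the linear scan:

```python
import json
import math
import sqlite3

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table alongside the existing praetorium schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nuntii_vectors "
        "(id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)"
    )

def store(conn: sqlite3.Connection, text: str, embedding: list[float]) -> None:
    conn.execute(
        "INSERT INTO nuntii_vectors (text, embedding) VALUES (?, ?)",
        (text, json.dumps(embedding)),
    )

def search(conn: sqlite3.Connection, query_vec: list[float], k: int = 3) -> list[str]:
    # Linear scan; an HNSW index (hnswlib) would make this approximate but fast.
    rows = conn.execute("SELECT text, embedding FROM nuntii_vectors").fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), text) for text, emb in rows]
    return [text for _, text in sorted(scored, reverse=True)[:k]]
```

The same two functions cover all three use cases above; only the source of the query vector differs.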
2. Pattern/Outcome Learning (ReasoningBank concept)
claude-flow stores task→outcome→reward patterns to improve routing over time. Legio could:
- Track which centurio handles which tasks well (success/failure feedback)
- Store Caesar's corrections as negative patterns
- Use patterns to suggest centurio routing: "Based on history, @vorenus handles code reviews best"
Implementation path: New patternum table in praetorium. Store (task_hash, centurio, outcome, feedback). Inject patterns into legatus system prompt. Pure prompt-over-code — no RL needed.
3. Background Workers / Scheduled Tasks
claude-flow has background workers for periodic tasks (audit, optimize, consolidate). Legio could benefit from:
- Periodic commentarii consolidation (summarize old notes)
- Scheduled centurio health checks
- Auto-cleanup of stale acta
Implementation path: We already have start_idle_reaper() for session cleanup. Extend the reaper pattern to a general operarii (workers) system with registered periodic tasks.
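A sketch of that generalization with asyncio. The class name `Operarii` and its API are illustrative, not Legio's actual code:

```python
import asyncio
from typing import Callable

class Operarii:
    """Minimal periodic-task runner, generalizing the idle-reaper pattern:
    register named tasks with an interval, then run them concurrently."""

    def __init__(self) -> None:
        self._tasks: list[tuple[str, float, Callable[[], None]]] = []

    def register(self, name: str, interval: float, fn: Callable[[], None]) -> None:
        self._tasks.append((name, interval, fn))

    async def _loop(self, name: str, interval: float, fn: Callable[[], None]) -> None:
        while True:
            await asyncio.sleep(interval)
            fn()  # e.g. consolidate commentarii, clean stale acta

    async def run(self, duration: float) -> None:
        # In Legio this would run until shutdown; duration is for demonstration.
        runners = [asyncio.create_task(self._loop(*t)) for t in self._tasks]
        await asyncio.sleep(duration)
        for r in runners:
            r.cancel()
        await asyncio.gather(*runners, return_exceptions=True)
```

Each bullet above becomes one registered task; the reaper itself would be the first registration.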
4. Model Routing by Task Complexity
claude-flow routes simple tasks to Haiku, complex tasks to Opus. Legio currently uses one model for everything. Adding per-centurio or per-task model selection would save cost and latency.
Implementation path: Already partially designed in the codex model-selection command. Add model field to Centurio config. Legatus chooses model based on @mention target and task complexity heuristics.
Not Worth Adopting
1. Swarm/Topology/Consensus metaphors
claude-flow's "Byzantine Fault Tolerance", "mesh topology", "Raft consensus" are labels stored in JSON — no actual distributed coordination. It's metadata theater. Legio's centurio model is honest: each centurio is a named SDK session. No need to add complexity for appearance.
2. MCP Server architecture
claude-flow is an MCP server because it needs to give Claude Code (a separate process) access to its tools. Legio doesn't need this — the Agent SDK provides tools directly via create_sdk_mcp_server(). Our tools are already in-process.
3. WASM/Agent Booster
claude-flow uses WASM for simple code transforms to avoid LLM calls. Interesting optimization but premature for Legio — our bottleneck is not "var-to-const" transforms, it's LLM reasoning quality.
4. 120+ Skill Templates
claude-flow has 120+ SKILL.md files for different agent personas. This is prompt sprawl. Legio's design is intentionally lean: centuriones are created by Caesar with specific specializations. Quality over quantity.
5. Multi-provider support (GPT, Gemini, Cohere, local models)
Not relevant. Legio is built on Claude Agent SDK — we use Claude by design.
Key Takeaways
claude-flow is primarily a prompt/config library wrapped in an MCP server. The actual intelligence comes from Claude Code following instructions in SKILL.md files. The TypeScript code manages state in JSON files and spawns CLI processes.
Legio's architecture is more direct. We use the Agent SDK's programmatic API instead of shelling out to the `claude` CLI. Our tools are in-process MCP servers, not a separate daemon.
The valuable ideas are conceptual, not code-portable. Vector memory, pattern learning, background workers, and model routing are worth implementing in Legio's Python stack — but using claude-flow's TypeScript code directly would be a mismatch.
Marketing vs. substance. Claims like "150x-12,500x faster retrieval", "84.8% SWE-Bench solve rate", "352x faster than LLM" should be taken with skepticism. The coordination-tools.ts file literally says "No actual distributed coordination" in its own header comment.
Legio's "prompt over code" philosophy already aligns. Both projects put intelligence in prompts rather than hard-coded logic. claude-flow just has more prompts.
Recommended Action Items
Ordered by impact and feasibility:
1. Model routing — Add per-centurio model config (low effort, high cost savings)
2. Pattern learning — Track task outcomes for routing improvement (medium effort, high long-term value)
3. Vector memory — Add embeddings to praetorium for semantic search (medium effort, medium value)
4. Background operarii — Extend idle reaper to general periodic task system (low effort, medium value)
None of these require claude-flow as a dependency. They are architectural ideas implementable natively in Legio's Python stack.