claude-flow Research
Comprehensive investigation of ruvnet/claude-flow (branded "Ruflo v3") and its relevance to Legio.
What claude-flow Is
An open-source TypeScript project (v3.1.0-alpha.39, 422MB repo, 797 TS source files in v3/) that bills itself as "the leading agent orchestration platform for Claude." It operates as an MCP server that Claude Code connects to, providing 87+ tools for swarm coordination, memory management, task orchestration, and background workers.
Key distinction: claude-flow orchestrates Claude Code instances (the Anthropic CLI product). Legio orchestrates Claude Agent SDK clients (the programmatic Python SDK). These are fundamentally different runtime models.
Architecture
```
Claude Code (IDE/CLI)
   ↓ MCP protocol
claude-flow MCP server (TypeScript, Node.js 20+)
├─ 26 CLI commands
├─ 87+ MCP tools (agent, memory, swarm, coordination, etc.)
├─ Headless Worker Executor (spawns Claude Code CLI processes)
├─ Memory: SQLite + HNSW vector search
├─ RuVector intelligence layer (WASM, RL routing)
└─ Agent/Task/Swarm state in .claude-flow/ JSON files
```
Technology Stack
| Component | Technology |
|---|---|
| Language | TypeScript (primary) |
| Runtime | Node.js 20+, WASM |
| Package structure | Monorepo: v3/@claude-flow/{cli,shared,claims,guidance,browser,aidefence,...} |
| Memory | SQLite (sql.js WASM) + HNSW for vector search |
| Dependencies | zod, semver (core); optional: agentdb, @ruvector/*, agentic-flow |
| Agent execution | Spawns claude CLI processes in headless mode |
| Integration | MCP server exposing tools to Claude Code |
How Agents Actually Work
After deep investigation, the actual agent execution mechanism is:
- Agent state is tracked in `.claude-flow/agents/store.json` — a JSON file on disk
- Agent "spawning" creates a record in this JSON store with metadata (type, model, status)
- Actual work is done by either:
  - Claude Code's built-in `Task` tool (Claude Code spawns subagents), or
  - the Headless Worker Executor, which literally calls `spawn('claude', [...args])` to fork a claude CLI process
- Coordination state (topology, consensus, load balancing) is stored in `.claude-flow/coordination/store.json` — another local JSON file
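To make the mechanism concrete, here is a minimal Python sketch of what agent "spawning" amounts to under this model. The record shape and field names are assumptions inferred from the description above, not claude-flow's actual schema:

```python
import json
from pathlib import Path

# Path taken from the report; the store is just a JSON file on disk.
STORE = Path(".claude-flow/agents/store.json")

def spawn_agent(agent_type: str, model: str) -> dict:
    """'Spawning' appends a metadata record to the JSON store.

    No process starts here; 'status' is a label, not a running agent.
    """
    store = json.loads(STORE.read_text()) if STORE.exists() else {"agents": []}
    record = {
        "id": f"agent-{len(store['agents']) + 1}",  # hypothetical id scheme
        "type": agent_type,
        "model": model,
        "status": "active",
    }
    store["agents"].append(record)
    STORE.parent.mkdir(parents=True, exist_ok=True)
    STORE.write_text(json.dumps(store, indent=2))
    return record
```

Everything else (topology, consensus, load balancing) is the same pattern applied to other JSON files.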
Critical finding from `coordination-tools.ts`:

```
⚠️ IMPORTANT: These tools provide LOCAL STATE MANAGEMENT.
- Topology/consensus state is tracked locally
- No actual distributed coordination
- Useful for single-machine workflow orchestration
```

The "Byzantine Fault Tolerance", "Raft consensus", "mesh topology", etc. are metadata labels in JSON, not actual distributed consensus implementations. The "swarm" is conceptual state tracking, not runtime coordination.
Detailed Module Analysis
MCP Tools (26 categories; selected groups below)
| Tool Group | Files | Purpose | Actual Implementation |
|---|---|---|---|
| agent-tools | 1 | Agent CRUD, model routing | JSON store read/write |
| memory-tools | 1 | Store/retrieve/search | SQLite + optional HNSW |
| coordination-tools | 1 | Topology, consensus, sync | JSON store (local state only) |
| swarm-tools | 1 | Swarm init, status, control | JSON store + spawn commands |
| task-tools | 1 | Task queue, priority | In-memory priority queue |
| session-tools | 1 | Session management | JSON store |
| workflow-tools | 1 | Multi-step workflows | JSON templates |
| hive-mind-tools | 1 | Queen/worker roles | JSON store + labels |
| neural-tools | 1 | RL routing, training | Optional @ruvector |
| security-tools | 1 | Input validation, CVE | Validation + scanning |
| github-tools | 1 | PR, issues, releases | gh CLI wrappers |
| browser-tools | 1 | Headless browsing | Playwright adapter |
| embeddings-tools | 1 | Vector embeddings | ONNX MiniLM |
| claims-tools | 1 | Work claiming/stealing | DDD with event sourcing |
Memory System
- Production backend: SQLite via sql.js (WASM) with WAL mode
- Vector search: HNSW (Hierarchical Navigable Small World) index for semantic similarity
- Agent memory banks: Per-agent isolated storage with shared global memory
- Pattern storage (ReasoningBank): Stores task→outcome patterns for learning
- Legacy fallback: JSON file-based storage with auto-migration
Headless Worker Executor
The most concrete implementation — spawns Claude Code CLI processes:
- Process pool with configurable concurrency
- Context building from file globs
- Prompt template system with context injection
- Output parsing (text, JSON, markdown)
- Timeout handling and graceful termination
- 8 headless types: audit, optimize, testgaps, document, ultralearn, refactor, deepdive, predict
- 4 local types: map, consolidate, benchmark, preload
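The executor's core loop can be sketched in Python. The `-p` flag and error handling here are illustrative assumptions about a headless CLI invocation, not claude-flow's actual TypeScript code:

```python
import subprocess

def run_headless(prompt: str, timeout: int = 120, cmd: str = "claude") -> str:
    """Run one headless worker: spawn a CLI process with a prompt,
    capture stdout, and kill the process if it exceeds the timeout.
    """
    try:
        result = subprocess.run(
            [cmd, "-p", prompt],  # illustrative flags, not verified
            capture_output=True,
            text=True,
            timeout=timeout,
        )
        return result.stdout
    except subprocess.TimeoutExpired:
        # Graceful-termination path: subprocess.run kills the child on timeout.
        return ""
```

A process pool would wrap this in a bounded set of concurrent workers; the output-parsing and context-injection layers sit on either side of this call.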
Skills and Agents (Prompt Libraries)
The .agents/skills/ directory contains 120+ SKILL.md files — these are prompt templates, not code. They define agent personas, capabilities, and workflows via markdown instructions that Claude Code follows. This is consistent with claude-flow's design: the intelligence is in prompts, not in code.
Comparison with Legio
| Dimension | claude-flow | Legio |
|---|---|---|
| Language | TypeScript | Python |
| Agent runtime | Claude Code CLI (headless processes) | Claude Agent SDK (programmatic API) |
| LLM integration | MCP server → Claude Code reads tools | SDK client → direct API calls |
| Interface | Claude Code IDE / CLI | Telegram bot |
| Agent definition | .md skill files + JSON config | castra/ directory with prompt.md |
| Memory | SQLite + HNSW vectors | Memoria (filesystem: edicta, acta, commentarii) |
| Session state | JSON files in .claude-flow/ | Praetorium (SQLite via aiosqlite) |
| Orchestration | Metadata tracking + spawned processes | Legatus (SDK client + router) |
| Auth | None (trusts Claude Code) | TOTP-gated destructive ops |
| Target user | Developer using Claude Code IDE | Caesar via Telegram |
| Scope | Software engineering workflows | General-purpose AI assistant team |
| Maturity | v3 alpha (39 releases), 797 TS files | v3, ~15 Python modules, 100% test coverage |
What Could Enrich Legio
Worth Adopting (Ideas, not code)
1. Vector Memory / Semantic Search
claude-flow's HNSW-based memory enables "find similar past interactions" instead of just key-value lookup. Legio's Memoria is purely filesystem-based (edicta, acta, commentarii). Adding vector embeddings to the praetorium would allow:
- "Find conversations similar to this one" for context
- Semantic search across acta (shared knowledge)
- Pattern matching: "last time Caesar asked about X, the best centurio was Y"
Implementation path: Add optional sentence-transformers embeddings to praetorium nuntii. Store vectors in SQLite (we already use aiosqlite). Use a pure-Python HNSW library like hnswlib for search. This stays within our Python stack.
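A minimal sketch of that path, using only the standard library: vectors stored as JSON in SQLite and a brute-force cosine-similarity scan. The table name `nuntii_vectors` is hypothetical; in practice embeddings would come from sentence-transformers and hnswlib would replace the linear scan:

```python
import json
import math
import sqlite3

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def init_db(conn: sqlite3.Connection) -> None:
    # Hypothetical table alongside the existing praetorium schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nuntii_vectors "
        "(id INTEGER PRIMARY KEY, text TEXT, embedding TEXT)"
    )

def store(conn: sqlite3.Connection, text: str, embedding: list[float]) -> None:
    conn.execute(
        "INSERT INTO nuntii_vectors (text, embedding) VALUES (?, ?)",
        (text, json.dumps(embedding)),
    )

def search(conn: sqlite3.Connection, query_vec: list[float], k: int = 3) -> list[str]:
    # Linear scan; an HNSW index (hnswlib) would make this approximate but fast.
    rows = conn.execute("SELECT text, embedding FROM nuntii_vectors").fetchall()
    scored = [(cosine(query_vec, json.loads(emb)), text) for text, emb in rows]
    return [text for _, text in sorted(scored, reverse=True)[:k]]
```

The same two functions cover all three use cases above; only the source of the query vector differs.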
2. Pattern/Outcome Learning (ReasoningBank concept)
claude-flow stores task→outcome→reward patterns to improve routing over time. Legio could:
- Track which centurio handles which tasks well (success/failure feedback)
- Store Caesar's corrections as negative patterns
- Use patterns to suggest centurio routing: "Based on history, @vorenus handles code reviews best"
Implementation path: New patternum table in praetorium. Store (task_hash, centurio, outcome, feedback). Inject patterns into legatus system prompt. Pure prompt-over-code — no RL needed.
3. Background Workers / Scheduled Tasks
claude-flow has background workers for periodic tasks (audit, optimize, consolidate). Legio could benefit from:
- Periodic commentarii consolidation (summarize old notes)
- Scheduled centurio health checks
- Auto-cleanup of stale acta
Implementation path: We already have start_idle_reaper() for session cleanup. Extend the reaper pattern to a general operarii (workers) system with registered periodic tasks.
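A sketch of that generalization with asyncio. The class name `Operarii` and its API are illustrative, not Legio's actual code:

```python
import asyncio
from typing import Callable

class Operarii:
    """Minimal periodic-task runner, generalizing the idle-reaper pattern:
    register named tasks with an interval, then run them concurrently."""

    def __init__(self) -> None:
        self._tasks: list[tuple[str, float, Callable[[], None]]] = []

    def register(self, name: str, interval: float, fn: Callable[[], None]) -> None:
        self._tasks.append((name, interval, fn))

    async def _loop(self, name: str, interval: float, fn: Callable[[], None]) -> None:
        while True:
            await asyncio.sleep(interval)
            fn()  # e.g. consolidate commentarii, clean stale acta

    async def run(self, duration: float) -> None:
        # In Legio this would run until shutdown; duration is for demonstration.
        runners = [asyncio.create_task(self._loop(*t)) for t in self._tasks]
        await asyncio.sleep(duration)
        for r in runners:
            r.cancel()
        await asyncio.gather(*runners, return_exceptions=True)
```

Each bullet above becomes one registered task; the reaper itself would be the first registration.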
4. Model Routing by Task Complexity
claude-flow routes simple tasks to Haiku, complex tasks to Opus. Legio currently uses one model for everything. Adding per-centurio or per-task model selection would save cost and latency.
Implementation path: Already partially designed in the codex model-selection command. Add model field to Centurio config. Legatus chooses model based on @mention target and task complexity heuristics.
Not Worth Adopting
1. Swarm/Topology/Consensus metaphors
claude-flow's "Byzantine Fault Tolerance", "mesh topology", "Raft consensus" are labels stored in JSON — no actual distributed coordination. It's metadata theater. Legio's centurio model is honest: each centurio is a named SDK session. No need to add complexity for appearance.
2. MCP Server architecture
claude-flow is an MCP server because it needs to give Claude Code (a separate process) access to its tools. Legio doesn't need this — the Agent SDK provides tools directly via create_sdk_mcp_server(). Our tools are already in-process.
3. WASM/Agent Booster
claude-flow uses WASM for simple code transforms to avoid LLM calls. Interesting optimization but premature for Legio — our bottleneck is not "var-to-const" transforms, it's LLM reasoning quality.
4. 120+ Skill Templates
claude-flow has 120+ SKILL.md files for different agent personas. This is prompt sprawl. Legio's design is intentionally lean: centuriones are created by Caesar with specific specializations. Quality over quantity.
5. Multi-provider support (GPT, Gemini, Cohere, local models)
Not relevant. Legio is built on Claude Agent SDK — we use Claude by design.
Key Takeaways
claude-flow is primarily a prompt/config library wrapped in an MCP server. The actual intelligence comes from Claude Code following instructions in SKILL.md files. The TypeScript code manages state in JSON files and spawns CLI processes.
Legio's architecture is more direct. We use the Agent SDK's programmatic API instead of shelling out to the `claude` CLI. Our tools are in-process MCP servers, not a separate daemon.
The valuable ideas are conceptual, not code-portable. Vector memory, pattern learning, background workers, and model routing are worth implementing in Legio's Python stack — but using claude-flow's TypeScript code directly would be a mismatch.
Marketing vs. substance. Claims like "150x-12,500x faster retrieval", "84.8% SWE-Bench solve rate", "352x faster than LLM" should be taken with skepticism. The coordination-tools.ts file literally says "No actual distributed coordination" in its own header comment.
Legio's "prompt over code" philosophy already aligns. Both projects put intelligence in prompts rather than hard-coded logic. claude-flow just has more prompts.
Recommended Action Items
Ordered by impact and feasibility:
1. Model routing — Add per-centurio model config (low effort, high cost savings)
2. Pattern learning — Track task outcomes for routing improvement (medium effort, high long-term value)
3. Vector memory — Add embeddings to praetorium for semantic search (medium effort, medium value)
4. Background operarii — Extend idle reaper to general periodic task system (low effort, medium value)
None of these require claude-flow as a dependency. They are architectural ideas implementable natively in Legio's Python stack.