
claude-flow Research

Comprehensive investigation of ruvnet/claude-flow (branded "Ruflo v3") and its relevance to Legio.


What claude-flow Is

An open-source TypeScript project (v3.1.0-alpha.39, 422MB repo, 797 TS source files in v3/) that bills itself as "the leading agent orchestration platform for Claude." It operates as an MCP server that Claude Code connects to, providing 87+ tools for swarm coordination, memory management, task orchestration, and background workers.

Key distinction: claude-flow orchestrates Claude Code instances (the Anthropic CLI product). Legio orchestrates Claude Agent SDK clients (the programmatic Python SDK). These are fundamentally different runtime models.

Architecture

Claude Code (IDE/CLI)
  ↓ MCP protocol
claude-flow MCP server (TypeScript, Node.js 20+)
  ├─ 26 CLI commands
  ├─ 87+ MCP tools (agent, memory, swarm, coordination, etc.)
  ├─ Headless Worker Executor (spawns claude CLI processes)
  ├─ Memory: SQLite + HNSW vector search
  ├─ RuVector intelligence layer (WASM, RL routing)
  └─ Agent/Task/Swarm state in .claude-flow/ JSON files

Technology Stack

| Component | Technology |
| --- | --- |
| Language | TypeScript (primary) |
| Runtime | Node.js 20+, WASM |
| Package structure | Monorepo: v3/@claude-flow/{cli,shared,claims,guidance,browser,aidefence,...} |
| Memory | SQLite (sql.js WASM) + HNSW for vector search |
| Dependencies | zod, semver (core); optional: agentdb, @ruvector/*, agentic-flow |
| Agent execution | Spawns claude CLI processes in headless mode |
| Integration | MCP server exposing tools to Claude Code |

How Agents Actually Work

Reading the source shows that agent execution works as follows:

  1. Agent state is tracked in .claude-flow/agents/store.json — a JSON file on disk
  2. Agent "spawning" creates a record in this JSON store with metadata (type, model, status)
  3. Actual work is done by either:
    • Claude Code's built-in Task tool (Claude Code spawns subagents)
    • Headless Worker Executor: literally spawn('claude', [...args]) — forks a claude CLI process
  4. Coordination state (topology, consensus, load balancing) is stored in .claude-flow/coordination/store.json — another local JSON file
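To make the point concrete, here is a minimal Python sketch of what "spawning" amounts to under this model: appending a record to a JSON file. It is an illustration of the pattern, not claude-flow's TypeScript code. The function name `spawn_agent` and the exact field names are assumptions; only the store path and the metadata kinds (type, model, status) come from the findings above.

```python
import json
import time
import uuid
from pathlib import Path


def spawn_agent(store_path: Path, agent_type: str, model: str) -> dict:
    """Append an agent record to a JSON store. No process is started."""
    store = {"agents": {}}
    if store_path.exists():
        store = json.loads(store_path.read_text())

    agent_id = f"agent-{uuid.uuid4().hex[:8]}"
    record = {
        "id": agent_id,
        "type": agent_type,      # hypothetical field names
        "model": model,
        "status": "idle",
        "createdAt": time.time(),
    }
    store["agents"][agent_id] = record

    # Persist the whole store back to disk; this file is the only "swarm state".
    store_path.parent.mkdir(parents=True, exist_ok=True)
    store_path.write_text(json.dumps(store, indent=2))
    return record
```

Calling `spawn_agent(Path(".claude-flow/agents/store.json"), "coder", "haiku")` only changes the file on disk. Any real work still has to be started separately, through the Task tool or a spawned CLI process.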

Critical finding from coordination-tools.ts:

⚠️ IMPORTANT: These tools provide LOCAL STATE MANAGEMENT.
- Topology/consensus state is tracked locally
- No actual distributed coordination
- Useful for single-machine workflow orchestration

The "Byzantine Fault Tolerance", "Raft consensus", "mesh topology" etc. are metadata labels in JSON, not actual distributed consensus implementations. The "swarm" is conceptual state tracking, not runtime coordination.

Detailed Module Analysis

MCP Tools (26 categories)

| Tool Group | Files | Purpose | Actual Implementation |
| --- | --- | --- | --- |
| agent-tools | 1 | Agent CRUD, model routing | JSON store read/write |
| memory-tools | 1 | Store/retrieve/search | SQLite + optional HNSW |
| coordination-tools | 1 | Topology, consensus, sync | JSON store (local state only) |
| swarm-tools | 1 | Swarm init, status, control | JSON store + spawn commands |
| task-tools | 1 | Task queue, priority | In-memory priority queue |
| session-tools | 1 | Session management | JSON store |
| workflow-tools | 1 | Multi-step workflows | JSON templates |
| hive-mind-tools | 1 | Queen/worker roles | JSON store + labels |
| neural-tools | 1 | RL routing, training | Optional @ruvector |
| security-tools | 1 | Input validation, CVE | Validation + scanning |
| github-tools | 1 | PR, issues, releases | gh CLI wrappers |
| browser-tools | 1 | Headless browsing | Playwright adapter |
| embeddings-tools | 1 | Vector embeddings | ONNX MiniLM |
| claims-tools | 1 | Work claiming/stealing | DDD with event sourcing |

Memory System

  • Production backend: SQLite via sql.js (WASM) with WAL mode
  • Vector search: HNSW (Hierarchical Navigable Small World graphs) for semantic similarity
  • Agent memory banks: Per-agent isolated storage with shared global memory
  • Pattern storage (ReasoningBank): Stores task→outcome patterns for learning
  • Legacy fallback: JSON file-based storage with auto-migration

Headless Worker Executor

This is the most concrete implementation in the codebase. It spawns Claude Code CLI processes and provides:

  • Process pool with configurable concurrency
  • Context building from file globs
  • Prompt template system with context injection
  • Output parsing (text, JSON, markdown)
  • Timeout handling and graceful termination
  • 8 headless types: audit, optimize, testgaps, document, ultralearn, refactor, deepdive, predict
  • 4 local types: map, consolidate, benchmark, preload
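The pool, timeout, and graceful-termination behaviour listed above can be sketched in Python with asyncio. This is an analogue for discussion, not a port of claude-flow's executor: `run_worker` and `run_pool` are hypothetical names, and the command line is a parameter because the exact `claude` arguments are not documented here.

```python
import asyncio
import sys


async def run_worker(cmd: list[str], sem: asyncio.Semaphore, timeout: float) -> dict:
    """Run one CLI process under a concurrency limit with a hard timeout."""
    async with sem:  # at most `concurrency` processes run at once
        proc = await asyncio.create_subprocess_exec(
            *cmd,
            stdout=asyncio.subprocess.PIPE,
            stderr=asyncio.subprocess.PIPE,
        )
        try:
            out, _ = await asyncio.wait_for(proc.communicate(), timeout)
            return {"ok": proc.returncode == 0, "output": out.decode().strip()}
        except asyncio.TimeoutError:
            proc.terminate()  # graceful stop first
            try:
                await asyncio.wait_for(proc.wait(), 2.0)
            except asyncio.TimeoutError:
                proc.kill()  # force stop if the process ignores terminate
                await proc.wait()
            return {"ok": False, "output": "timeout"}


async def run_pool(cmds: list[list[str]], concurrency: int = 2, timeout: float = 30.0) -> list[dict]:
    """Run all commands, returning results in the same order as `cmds`."""
    sem = asyncio.Semaphore(concurrency)
    return await asyncio.gather(*(run_worker(c, sem, timeout) for c in cmds))


if __name__ == "__main__":
    # Stand-in command; a real worker would invoke the claude CLI instead.
    demo = [[sys.executable, "-c", "print('ok')"]]
    print(asyncio.run(run_pool(demo)))
```

The semaphore is the whole "process pool": nothing is pre-forked, and each job simply waits for a free slot before starting its process.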

Skills and Agents (Prompt Libraries)

The .agents/skills/ directory contains 120+ SKILL.md files — these are prompt templates, not code. They define agent personas, capabilities, and workflows via markdown instructions that Claude Code follows. This is consistent with claude-flow's design: the intelligence is in prompts, not in code.

Comparison with Legio

| Dimension | claude-flow | Legio |
| --- | --- | --- |
| Language | TypeScript | Python |
| Agent runtime | Claude Code CLI (headless processes) | Claude Agent SDK (programmatic API) |
| LLM integration | MCP server → Claude Code reads tools | SDK client → direct API calls |
| Interface | Claude Code IDE / CLI | Telegram bot |
| Agent definition | .md skill files + JSON config | castra/ directory with prompt.md |
| Memory | SQLite + HNSW vectors | Memoria (filesystem: edicta, acta, commentarii) |
| Session state | JSON files in .claude-flow/ | Praetorium (SQLite via aiosqlite) |
| Orchestration | Metadata tracking + spawned processes | Legatus (SDK client + router) |
| Auth | None (trusts Claude Code) | TOTP-gated destructive ops |
| Target user | Developer using Claude Code IDE | Caesar via Telegram |
| Scope | Software engineering workflows | General-purpose AI assistant team |
| Maturity | v3 alpha (39 releases), 797 TS files | v3, ~15 Python modules, 100% test coverage |

What Could Enrich Legio

Worth Adopting (Ideas, not code)

1. Vector Memory / Semantic Search

claude-flow's HNSW-based memory enables "find similar past interactions" instead of just key-value lookup. Legio's Memoria is purely filesystem-based (edicta, acta, commentarii). Adding vector embeddings to the praetorium would allow:

  • "Find conversations similar to this one" for context
  • Semantic search across acta (shared knowledge)
  • Pattern matching: "last time Caesar asked about X, the best centurio was Y"

Implementation path: Add optional sentence-transformers embeddings to praetorium nuntii. Store vectors in SQLite (we already use aiosqlite). Use an HNSW library with Python bindings such as hnswlib for search (note that hnswlib is a C++ implementation, not pure Python). This stays within our Python stack.
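A minimal sketch of the storage side of this idea, using only the standard library. Two substitutions are made on purpose so the example runs anywhere: `toy_embed` is a hypothetical hashed bag-of-words stand-in for a real embedding model, and the search is a brute-force cosine scan rather than HNSW. The table name `nuntii_vec` and all function names are assumptions.

```python
import math
import sqlite3
import struct
import zlib


def toy_embed(text: str, dim: int = 64) -> list[float]:
    """Hypothetical stand-in for an embedding model: hashed bag of words, L2-normalized."""
    vec = [0.0] * dim
    for word in text.lower().split():
        vec[zlib.crc32(word.encode()) % dim] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def pack(vec: list[float]) -> bytes:
    """Serialize a vector as float32 values for a BLOB column."""
    return struct.pack(f"{len(vec)}f", *vec)


def unpack(blob: bytes) -> list[float]:
    return list(struct.unpack(f"{len(blob) // 4}f", blob))


def init_db(db: sqlite3.Connection) -> None:
    db.execute("CREATE TABLE IF NOT EXISTS nuntii_vec (text TEXT NOT NULL, vec BLOB NOT NULL)")


def add_nuntius(db: sqlite3.Connection, text: str) -> None:
    db.execute("INSERT INTO nuntii_vec (text, vec) VALUES (?, ?)", (text, pack(toy_embed(text))))


def search(db: sqlite3.Connection, query: str, k: int = 3) -> list[tuple[str, float]]:
    """Brute-force cosine similarity; vectors are unit length, so a dot product suffices."""
    q = toy_embed(query)
    scored = []
    for text, blob in db.execute("SELECT text, vec FROM nuntii_vec"):
        score = sum(a * b for a, b in zip(q, unpack(blob)))
        scored.append((score, text))
    scored.sort(reverse=True)
    return [(text, round(score, 3)) for score, text in scored[:k]]
```

A linear scan is likely adequate for a few thousand nuntii. An HNSW index becomes worthwhile only once the scan itself shows up in latency, and it can replace `search` without changing the stored format.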

2. Pattern/Outcome Learning (ReasoningBank concept)

claude-flow stores task→outcome→reward patterns to improve routing over time. Legio could:

  • Track which centurio handles which tasks well (success/failure feedback)
  • Store Caesar's corrections as negative patterns
  • Use patterns to suggest centurio routing: "Based on history, @vorenus handles code reviews best"

Implementation path: New patternum table in praetorium. Store (task_hash, centurio, outcome, feedback). Inject patterns into legatus system prompt. Pure prompt-over-code — no RL needed.

3. Background Workers / Scheduled Tasks

claude-flow has background workers for periodic tasks (audit, optimize, consolidate). Legio could benefit from:

  • Periodic commentarii consolidation (summarize old notes)
  • Scheduled centurio health checks
  • Auto-cleanup of stale acta

Implementation path: We already have start_idle_reaper() for session cleanup. Extend the reaper pattern to a general operarii (workers) system with registered periodic tasks.
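A sketch of how a general operarii registry might look, assuming an asyncio event loop is already running (as it is for the bot). The class name `Operarii` and its methods are hypothetical, and the internals of `start_idle_reaper()` are not shown in this document, so this is not derived from it.

```python
import asyncio
from typing import Awaitable, Callable

Job = Callable[[], Awaitable[None]]


class Operarii:
    """Registry of periodic background jobs, each running in its own asyncio task."""

    def __init__(self) -> None:
        self._jobs: list[tuple[str, float, Job]] = []
        self._tasks: list[asyncio.Task] = []

    def register(self, name: str, interval: float, fn: Job) -> None:
        """Register `fn` to run every `interval` seconds once started."""
        self._jobs.append((name, interval, fn))

    async def _loop(self, name: str, interval: float, fn: Job) -> None:
        while True:
            await asyncio.sleep(interval)
            try:
                await fn()
            except Exception as exc:  # one failing job must not stop its own loop
                print(f"operarius {name} failed: {exc!r}")

    def start(self) -> None:
        """Must be called from within a running event loop."""
        self._tasks = [asyncio.create_task(self._loop(*job)) for job in self._jobs]

    async def stop(self) -> None:
        for task in self._tasks:
            task.cancel()
        await asyncio.gather(*self._tasks, return_exceptions=True)
        self._tasks = []
```

Under this design the existing idle reaper would become one registered job among several, alongside commentarii consolidation and acta cleanup.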

4. Model Routing by Task Complexity

claude-flow routes simple tasks to Haiku, complex tasks to Opus. Legio currently uses one model for everything. Adding per-centurio or per-task model selection would save cost and latency.

Implementation path: Already partially designed in the codex model-selection command. Add model field to Centurio config. Legatus chooses model based on @mention target and task complexity heuristics.
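The selection rule could be as small as the sketch below. The `Centurio` dataclass here is a simplified stand-in for the real config. The tier labels, keyword list, and word-count threshold are placeholder assumptions to be tuned, and the labels are not real model identifiers.

```python
from dataclasses import dataclass
from typing import Optional

# Placeholder tier labels; map these to actual model IDs in configuration.
FAST_MODEL = "haiku"
DEEP_MODEL = "opus"

# Hypothetical keywords that suggest a task needs deeper reasoning.
COMPLEX_HINTS = ("refactor", "architecture", "design", "debug", "prove")


@dataclass
class Centurio:
    name: str
    model: Optional[str] = None  # explicit per-centurio override, if configured


def choose_model(centurio: Centurio, task: str) -> str:
    """Prefer the centurio's configured model; otherwise apply a complexity heuristic."""
    if centurio.model:
        return centurio.model
    text = task.lower()
    is_complex = len(text.split()) > 40 or any(hint in text for hint in COMPLEX_HINTS)
    return DEEP_MODEL if is_complex else FAST_MODEL
```

For example, `choose_model(Centurio("pullo"), "refactor the router")` selects the deep tier, while a short factual question falls through to the fast tier. An explicit `model` on the centurio always wins, which keeps Caesar's configuration authoritative over the heuristic.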

Not Worth Adopting

1. Swarm/Topology/Consensus metaphors

claude-flow's "Byzantine Fault Tolerance", "mesh topology", "Raft consensus" are labels stored in JSON — no actual distributed coordination. It's metadata theater. Legio's centurio model is honest: each centurio is a named SDK session. No need to add complexity for appearance.

2. MCP Server architecture

claude-flow is an MCP server because it needs to give Claude Code (a separate process) access to its tools. Legio doesn't need this — the Agent SDK provides tools directly via create_sdk_mcp_server(). Our tools are already in-process.

3. WASM/Agent Booster

claude-flow uses WASM for simple code transforms to avoid LLM calls. Interesting optimization but premature for Legio — our bottleneck is not "var-to-const" transforms, it's LLM reasoning quality.

4. 120+ Skill Templates

claude-flow has 120+ SKILL.md files for different agent personas. This is prompt sprawl. Legio's design is intentionally lean: centuriones are created by Caesar with specific specializations. Quality over quantity.

5. Multi-provider support (GPT, Gemini, Cohere, local models)

Not relevant. Legio is built on Claude Agent SDK — we use Claude by design.

Key Takeaways

  1. claude-flow is primarily a prompt/config library wrapped in an MCP server. The actual intelligence comes from Claude Code following instructions in SKILL.md files. The TypeScript code manages state in JSON files and spawns CLI processes.

  2. Legio's architecture is more direct. We use the Agent SDK's programmatic API instead of shelling out to claude CLI. Our tools are in-process MCP servers, not a separate daemon.

  3. The valuable ideas are conceptual, not code-portable. Vector memory, pattern learning, background workers, and model routing are worth implementing in Legio's Python stack — but using claude-flow's TypeScript code directly would be a mismatch.

  4. Marketing vs. substance. Claims like "150x-12,500x faster retrieval", "84.8% SWE-Bench solve rate", and "352x faster than LLM" should be treated with skepticism. The header comment of coordination-tools.ts itself states "No actual distributed coordination".

  5. Legio's "prompt over code" philosophy already aligns. Both projects put intelligence in prompts rather than hard-coded logic. claude-flow just has more prompts.

Recommended adoption order, by impact and feasibility:

  1. Model routing — Add per-centurio model config (low effort, high cost savings)
  2. Pattern learning — Track task outcomes for routing improvement (medium effort, high long-term value)
  3. Vector memory — Add embeddings to praetorium for semantic search (medium effort, medium value)
  4. Background operarii — Extend idle reaper to general periodic task system (low effort, medium value)

None of these require claude-flow as a dependency. They are architectural ideas implementable natively in Legio's Python stack.
