Legio v3 — Implementation Plan

Status: DRAFT — all questions resolved, Codex review amendments applied (D27–D30), awaiting Caesar's final review

Goal

Build a single-process multi-agent system where Caesar commands a team of AI Centuriones through one Telegram chat. Built on Claude Agent SDK (Python).

Decisions Log

#	Decision	Decided by	Date
D1	Legatus is an SDK agent (ClaudeSDKClient with its own prompt), not mechanical code	Caesar	2026-02-14
D2	Centuriones have persistent sessions (ClaudeSDKClient instances kept alive)	Caesar	2026-02-14
D3	Centurio definition lives in prompt files on disk — Caesar can directly edit filesystem	Caesar	2026-02-14
D4	Memoria exposed as MCP tools to all agents	Caesar	2026-02-14
D5	Memoria file format follows Claude conventions: XML tags for structure, plain text body	Caesar	2026-02-14
D6	Praetorium persisted to SQLite (not in-memory only) for crash recovery	Caesar	2026-02-14
D7	Audience uses explicit names only — no glob patterns. `["all"]` is the only wildcard.	Antony	2026-02-14
D8	Every Nuntius carries a UUID (`id` field). Required for `reply_to` threading and SQLite PK.	Antony	2026-02-14
D9	Yes, Caesar can send Legatus-only messages. `audience=["legatus"]` excludes all Centuriones.	Antony	2026-02-14
D10	Centurio has a `status` field (idle / working / error) — tracked in-memory by Legatus, not persisted.	Antony	2026-02-14
D11	Max 10 concurrent Centuriones (configurable in `legio.toml`). Soft limit enforced at creation time.	Antony	2026-02-14
D12	Edicta and Acta are mutable (overwrite). Commentarii are append-only (immutable log).	Antony	2026-02-14
D13	No hard size limits per file. Soft warning at 50KB per file, 1MB total per Centurio's commentarii.	Antony	2026-02-14
D14	SQLite stores all history unbounded. Sliding window of last N nuntii provided as context to agents. Default N=50.	Antony	2026-02-14
D15	Centurio responses are Legatus-mediated — returned to Legatus, which posts them to Praetorium and relays to Telegram.	Antony	2026-02-14
D16	Multi-mention dispatches Centuriones in parallel via `asyncio.gather`. Responses collected, then sent sequentially.	Antony	2026-02-14
D17	Centurio responses returned as `str` from `dispatch_to_centurio()`. Legatus posts to Praetorium and returns to Telegram handler.	Antony	2026-02-14
D18	History formatted as XML context block with `<nuntius>` tags per message. Claude-native, consistent with D5.	Antony	2026-02-14
D19	Token budget: sliding window (D14). When context exceeds 75% of model window, oldest nuntii dropped. No summarization in v3.	Antony	2026-02-14
D20	Both: template in `templates/centurio/` as scaffold, but Legatus LLM customizes the prompt. Caesar can also create manually (D3).	Antony	2026-02-14
D21	Edit-in-place: send "⏳ working..." immediately, edit with result. Typing indicator every 5s for long tasks.	Antony	2026-02-14
D22	One Telegram message per Centurio, with header line identifying the responder. Sent sequentially.	Antony	2026-02-14
D23	HTML parse mode for Telegram messages. Simpler escaping than MarkdownV2, supports code blocks via `<pre>`.	Antony	2026-02-14
D24	Hybrid: support both `/commands` and @mentions. Commands for quick actions, @mentions for conversations.	Antony	2026-02-14
D25	Yes, Legatus LLM can auto-route without @mention — "prompt over code" applied to routing. Configurable in Legatus prompt.	Antony	2026-02-14
D26	Centurio list injected dynamically into Legatus system prompt on each call. MCP tool as backup for detailed queries.	Antony	2026-02-14
D27	Legatus uses persistent `ClaudeSDKClient` (same as Centuriones). System prompt includes full roster — rebuilt only when roster changes (centurio created/removed). Real-time status injected in user message. SDK conversation history preserved for continuity.	Antony	2026-02-14
D28	@mention parsing: case-insensitive `@(\w+)` regex, matched against registered names only. Unrecognized @mentions pass through as text. Email/URL `@` not matched.	Antony	2026-02-14
D29	Idle session timeout: Centurio `ClaudeSDKClient` torn down after 30min inactivity (configurable). Periodic `asyncio.Task` reaps stale sessions. Recreated on next dispatch.	Antony	2026-02-14
D30	Graceful shutdown: SIGTERM/SIGINT handler tears down all SDK clients, closes Praetorium DB, stops Telegram polling.	Antony	2026-02-14
D31	SDK requires async generator input for `client.query()` when MCP servers are attached. All query calls use `async def _prompt(): yield text` wrapper.	Antony	2026-02-14
D32	Per-Centurio MCP server: `build_memoria_centurio_server(store, centurio_name)` bakes identity into closure. No shared server with magic caller variable.	Antony	2026-02-14
D33	Legatus client rebuild detects both roster changes AND legatus prompt.md edits (mtime check). Caesar can edit prompt without restarting.	Antony	2026-02-14

Phase 1 — Data Models (no I/O, pure Python)

The foundation. Every other layer imports these. Zero dependencies on infrastructure.

1.1 `legio/nuntius.py` — Message

Per D7, D8, D9: explicit audience names, UUID for threading, Legatus-only messages supported.

python

from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)
class Nuntius:
    id: str                    # UUID4 — unique identifier (D8)
    sender: str                # "caesar", "legatus", or centurio name
    text: str
    audience: tuple[str, ...]  # ("all",), ("legatus",), ("vorenus", "pullo") (D7, D9)
    timestamp: datetime        # UTC, timezone-aware
    reply_to: str | None       # UUID of prior nuntius, or None (D8)

Design notes:

frozen=True because nuntii are immutable once posted.
audience is tuple[str, ...], not list[str] — a mutable list inside a frozen dataclass is a Python footgun (the list can still be mutated). Tuple enforces true immutability.
audience=("all",) is the only wildcard (D7). No glob patterns — explicit names keep visibility rules simple and auditable.
audience=("legatus",) lets Caesar have private conversations with the Legatus (D9). Example: "Legatus, what do you think of Vorenus's work?" — no Centurio sees this.
id is UUID4 string (D8). Required for reply_to threading and as SQLite primary key.

1.2 `legio/centurio.py` — Agent identity (not behavior)

Per D3, D10: minimal registry entry. Status tracked in-memory by Legatus.

python

from dataclasses import dataclass
from pathlib import Path

@dataclass
class Centurio:
    name: str              # "vorenus", "pullo" — lowercase
    centuria: Path         # workspace/centuriones/vorenus/
    status: str = "idle"   # "idle" | "working" | "error" (D10)

The Centuria folder is the agent definition:

workspace/centuriones/vorenus/
  prompt.md              # System prompt — the full AgentDefinition
  tools.json             # Allowed tools list
  commentarii/           # Private working memory

Status field (D10): Tracked in-memory by the Legatus, not persisted to disk or SQLite. Reason: status is ephemeral runtime state — after a restart, all Centuriones begin as idle. The Legatus sets working before dispatch, idle on completion, error on failure. Status is exposed via /status command (D24) and the list_centuriones MCP tool.

Max concurrent Centuriones (D11): Default limit of 10, configurable in legio.toml as max_centuriones = 10. Enforced at creation time: create_centurio raises CenturioLimitReached (a LegioError subclass, see 1.4) if the limit is reached. Rationale: each Centurio holds a persistent ClaudeSDKClient (D2) with its own context window. 10 is generous for a single-user system; the limit prevents accidental resource exhaustion.
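
The creation-time check could look roughly like the sketch below. The `register_centurio` function and its signature are hypothetical; only the exception names, the lowercase naming rule, and the folder layout come from the plan.

```python
from dataclasses import dataclass
from pathlib import Path


class LegioError(Exception): ...
class CenturioLimitReached(LegioError): ...


@dataclass
class Centurio:
    name: str
    centuria: Path
    status: str = "idle"


def register_centurio(registry: dict[str, Centurio], name: str,
                      workspace: Path, max_centuriones: int = 10) -> Centurio:
    """Hypothetical creation-time check for the D11 limit."""
    name = name.lower()
    # Re-registering an existing name does not count against the limit.
    if name not in registry and len(registry) >= max_centuriones:
        raise CenturioLimitReached(
            f"limit of {max_centuriones} Centuriones reached"
        )
    centurio = Centurio(name=name, centuria=workspace / "centuriones" / name)
    registry[name] = centurio
    return centurio
```

Raising the specific subclass lets callers distinguish "limit reached" from other failures while still catching everything with `except LegioError`.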

1.3 `legio/config.py` — Configuration

python

from dataclasses import dataclass
from pathlib import Path

@dataclass
class CaesarConfig:
    telegram_id: int

@dataclass
class LegioConfig:
    caesar: CaesarConfig
    model: str                           # "sonnet" — Claude model for all agents
    workspace_dir: Path                  # Path to workspace/
    max_centuriones: int = 10            # D11
    history_window: int = 50             # D14 — nuntii visible per dispatch
    session_idle_timeout_minutes: int = 30  # D29 — tear down idle SDK sessions
    telegram_bot_token: str = ""         # From .env — TELEGRAM_BOT_TOKEN
    anthropic_api_key: str = ""          # From .env — ANTHROPIC_API_KEY

Secrets (telegram_bot_token, anthropic_api_key) loaded from .env via os.environ, never from legio.toml. Config loading merges both sources.

The legio.toml file gains:

toml

# Top-level keys must come before any [table] header; otherwise TOML
# assigns them to that table.
model = "sonnet"
workspace_dir = "workspace"
max_centuriones = 10
history_window = 50
session_idle_timeout_minutes = 30

[caesar]
telegram_id = 0

1.4 `legio/errors.py` — Exception hierarchy

python

class LegioError(Exception): ...
class ImperiumDenied(LegioError): ...       # Auth / permission failures
class CenturioNotFound(LegioError): ...     # Unknown centurio name
class CenturioLimitReached(LegioError): ... # D11 — max centuriones exceeded
class MemoriaError(LegioError): ...         # Filesystem I/O failures
class PraetoriumError(LegioError): ...      # SQLite / message bus failures

Phase 2 — Memoria (filesystem + MCP tools)

Per D4: Memoria is exposed as MCP tools so both Legatus and Centuriones access knowledge through prompting, not Python imports.

2.1 `legio/memoria/store.py` — MemoriaStore (internal)

The Python implementation. Reads/writes the three layers:

workspace/
  edicta/           # Legio-wide standing orders (XML files)
  acta/             # Shared knowledge (XML files)
  centuriones/
    vorenus/
      prompt.md
      tools.json
      commentarii/  # Private notes (XML files)

Operations:

list_edicta() -> list[str] — returns names (filename stems)
read_edictum(name: str) -> str — returns raw XML content
publish_edictum(name: str, content: str, author: str) -> None — overwrites (D12)
revoke_edictum(name: str) -> None — deletes the file
list_acta() -> list[str]
read_actum(name: str) -> str
publish_actum(name: str, content: str, author: str) -> None — overwrites (D12)
list_commentarii(centurio_name: str) -> list[str]
read_commentarium(centurio_name: str, name: str) -> str
write_commentarium(centurio_name: str, name: str, content: str) -> None — append-only (D12)

Mutability rules (D12):

Edicta — mutable. Caesar/Legatus can update standing orders as policy evolves. Overwrite semantics.
Acta — mutable. Shared knowledge gets corrected and refined. Overwrite semantics.
Commentarii — append-only. A Centurio's private journal is an immutable log. New entries create new files; existing files are never modified. This preserves audit trail and prevents accidental data loss.
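
One way to enforce the append-only rule at the filesystem level is exclusive-create mode, sketched below. The standalone function signature (taking a `root` path) is an assumption; the plan only specifies the `write_commentarium` operation on MemoriaStore.

```python
from pathlib import Path


class MemoriaError(Exception): ...


def write_commentarium(root: Path, centurio_name: str, name: str, content: str) -> Path:
    """Create a new commentarium; refuse to touch an existing one (D12)."""
    folder = root / "centuriones" / centurio_name / "commentarii"
    folder.mkdir(parents=True, exist_ok=True)
    path = folder / f"{name}.xml"
    try:
        # Mode "x" fails if the file already exists, so nothing is overwritten.
        with path.open("x", encoding="utf-8") as handle:
            handle.write(content)
    except FileExistsError as exc:
        raise MemoriaError(f"commentarium {name!r} already exists") from exc
    return path
```

Edicta and acta would use plain `"w"` mode instead, which gives the overwrite semantics described above.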

Soft size warnings (D13): No hard limits enforced in code. The Legatus prompt instructs agents to keep files under 50KB each, and total commentarii under 1MB per Centurio. These are prompt-level guidelines, not code-level enforcement — consistent with "prompt over code" philosophy.

2.2 Memoria file format (D5)

Per D5: Follow Claude's own conventions. Claude works best with XML tags for structured content.

Edicta and Acta — XML-tagged plain text files:

xml

<edictum name="security-policy" author="caesar" timestamp="2026-02-14T12:00:00Z">
All Centuriones must validate input before processing.
Never expose API keys in responses.
Rate-limit external API calls to 10/minute.
</edictum>

xml

<actum name="api-research" author="vorenus" timestamp="2026-02-14T15:30:00Z">
Researched the payments API. Key findings:
- Authentication uses OAuth 2.0 with PKCE
- Rate limit: 100 req/min per client
- Sandbox endpoint: https://sandbox.api.example.com
</actum>

Commentarii — same XML pattern, per-Centurio, append-only (D12):

xml

<commentarium name="task-notes-001" timestamp="2026-02-14T16:00:00Z">
Caesar asked me to investigate the payments API.
I found the docs at example.com/docs.
Next step: test the sandbox endpoint.
</commentarium>

Why XML, not YAML frontmatter:

Claude's own documentation recommends XML tags for structured content
XML tags are Claude-native — the model parses them more accurately
No parsing library needed — just string formatting on write, Claude reads natively
Metadata (author, timestamp, name) lives in tag attributes, body is free-form text
Files on disk are human-readable and grep-able

File naming: <name>.xml in the respective directory. The name attribute in the tag matches the filename stem.

2.3 `legio/memoria/tools.py` — MCP tool wrappers

Wraps MemoriaStore operations as MCP tools using @tool decorator:

python

from claude_agent_sdk import tool

# `store` (a MemoriaStore) and `author_name` are captured from the enclosing
# builder function, as in build_memoria_centurio_server below (D32).

@tool("read_edictum", "Read a standing order by name", {"name": str})
async def read_edictum_tool(args):
    content = store.read_edictum(args["name"])
    return {"content": [{"type": "text", "text": content}]}

@tool("publish_actum", "Publish shared knowledge", {"name": str, "content": str})
async def publish_actum_tool(args):
    store.publish_actum(args["name"], args["content"], author=author_name)
    return {"content": [{"type": "text", "text": f"Published: {args['name']}"}]}

# ... etc for all operations

Two MCP server patterns built from these tools:

memoria_full — all operations (for Legatus: can write edicta, revoke edicta). Single shared instance.
memoria_centurio — restricted (no edictum writes/revokes, commentarii scoped to caller). Built per-Centurio via build_memoria_centurio_server(store, centurio_name) — the centurio_name is captured in a closure so every write_commentarium / list_commentarii / read_commentarium call is automatically scoped to the owning Centurio. No magic caller_name variable — the identity is baked in at server creation time.

python

def build_memoria_centurio_server(store: MemoriaStore, centurio_name: str):
    """Build a per-Centurio MCP server with commentarii scoped to owner."""

    @tool("write_commentarium", "Write to your private journal", {"name": str, "content": str})
    async def write_commentarium_tool(args):
        store.write_commentarium(centurio_name, args["name"], args["content"])
        return {"content": [{"type": "text", "text": f"Written: {args['name']}"}]}

    # ... other scoped tools ...

    return create_sdk_mcp_server(
        name=f"memoria_{centurio_name}",
        tools=[write_commentarium_tool, ...],
    )

Phase 3 — Praetorium (message bus, SQLite-backed)

3.1 `legio/praetorium.py` — Session & message routing

Per D6: The Praetorium persists to SQLite for crash recovery. Conversations survive process restarts.

Database: workspace/praetorium.db — single SQLite file.

sql

CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,          -- UUID (D8)
    sender TEXT NOT NULL,         -- "caesar", "legatus", "vorenus"
    text TEXT NOT NULL,
    audience TEXT NOT NULL,       -- JSON array: '["all"]', '["legatus"]', '["vorenus","pullo"]'
    timestamp TEXT NOT NULL,      -- ISO 8601 UTC
    reply_to TEXT,                -- FK to nuntii.id, nullable (D8)
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);

CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX idx_nuntii_sender ON nuntii(sender);

python

class Praetorium:
    def __init__(self, db_path: Path): ...

    async def post(self, nuntius: Nuntius) -> None:
        """Insert a nuntius into the database."""
        ...

    async def get_visible_nuntii(self, viewer: str, limit: int = 50) -> list[Nuntius]:
        """Return nuntii visible to a given viewer, most recent first.

        Visibility rules:
        - "caesar" and "legatus" see everything.
        - A Centurio sees: (a) nuntii where its name appears in audience JSON,
          (b) nuntii with audience '["all"]'.
        """
        ...

    async def get_history(self, limit: int = 50) -> list[Nuntius]:
        """Return the last N nuntii, regardless of audience. Admin view."""
        ...

History management (D14):

SQLite stores all nuntii unbounded. Disk is cheap; history is valuable.
When providing context to agents, a sliding window of the last N nuntii is used (default N=50, configurable via legio.toml).
The limit parameter on get_visible_nuntii controls this window.
No automatic summarization or compaction in v3. If history grows very large, Caesar can issue an edictum summarizing past context and reset (manual, prompt-level).

Audience serialization: Nuntius stores audience as tuple[str, ...] (frozen dataclass). SQLite stores it as a JSON array string ('["all"]'). Praetorium handles the conversion:

Write: json.dumps(list(nuntius.audience))
Read: tuple(json.loads(row["audience"]))

Dependency: aiosqlite for async SQLite access (add to pyproject.toml).
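
The serialization round-trip and the visibility rules can be sketched with the standard-library `sqlite3` module (used here instead of aiosqlite only to keep the example self-contained). The `post` and `visible_to` functions are hypothetical stand-ins for the Praetorium methods, and the audience filter runs in Python rather than SQL for simplicity.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,
    sender TEXT NOT NULL,
    text TEXT NOT NULL,
    audience TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    reply_to TEXT REFERENCES nuntii(id)
);
"""


def post(db, id, sender, text, audience, timestamp, reply_to=None):
    """Store the audience tuple as a JSON array string."""
    db.execute(
        "INSERT INTO nuntii VALUES (?, ?, ?, ?, ?, ?)",
        (id, sender, text, json.dumps(list(audience)), timestamp, reply_to),
    )


def visible_to(db, viewer, limit=50):
    """Newest first; Caesar and the Legatus see everything."""
    rows = db.execute(
        "SELECT id, sender, text, audience FROM nuntii ORDER BY timestamp DESC"
    )
    result = []
    for id, sender, text, audience_json in rows:
        audience = tuple(json.loads(audience_json))  # back to a tuple on read
        if viewer in ("caesar", "legatus") or viewer in audience or "all" in audience:
            result.append((id, sender, text, audience))
        if len(result) == limit:
            break
    return result


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
post(db, "a", "caesar", "hello all", ("all",), "2026-02-14T10:00:00Z")
post(db, "b", "caesar", "private", ("legatus",), "2026-02-14T10:01:00Z")
post(db, "c", "caesar", "for pullo", ("pullo",), "2026-02-14T10:02:00Z")
```

ISO 8601 UTC timestamps sort correctly as text, which is what makes `ORDER BY timestamp` usable here.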

3.2 Response flow (D15, D17)

All Centurio responses are Legatus-mediated (D15):

Caesar → Telegram → Legatus.handle_message()
                          ↓
                    dispatch_to_centurio(name, nuntius)
                          ↓
                    Centurio processes, returns str (D17)
                          ↓
                    Legatus posts response to Praetorium
                          ↓
                    Legatus returns response to Telegram handler
                          ↓
                    Telegram handler sends to Caesar

Why Legatus-mediated, not direct:

The Legatus controls what gets posted — it can filter, annotate, or suppress.
Consistent flow: all messages pass through one code path.
The Legatus can add metadata (e.g., "[Vorenus responds]:" header) before posting.
Future: the Legatus could route a Centurio's response to another Centurio for review.

Phase 4 — Legatus (SDK agent + orchestrator)

4.1 `legio/legatus.py`

Per D1: The Legatus is itself a Claude SDK agent with a persistent session (D27). It is not a Centurio; it is the built-in orchestrator, but it thinks via LLM, not if/else chains.

Persistent client with roster-aware rebuild (D27 — SDK constraint resolved): The SDK locks system_prompt at ClaudeSDKClient creation. The Legatus needs the Centurio roster in its system prompt. Solution: the full roster (names + specializations) is baked into the system prompt at client creation. The client is rebuilt only when the roster changes (centurio created or removed) — not on every call. Real-time status (idle/working/error) is injected as an XML block in the user message, since status flickers frequently but doesn't warrant a client rebuild. This preserves SDK conversation history for continuity — the Legatus remembers recent exchanges with Caesar.

Two hats:

Orchestrator (code) — message parsing, @mention extraction, session management, Centurio dispatch, Telegram I/O. This is plumbing that prompts can't do.
Default responder (SDK agent) — when no Centurio is mentioned (and auto-routing doesn't apply), the Legatus's own ClaudeSDKClient handles the conversation. Its prompt defines personality, routing heuristics, and Centurio management commands.

python

class Legatus:
    def __init__(self, config, praetorium, memoria):
        self._config = config
        self._praetorium = praetorium
        self._memoria = memoria
        self._client: ClaudeSDKClient | None = None  # created lazily; connecting needs async
        self._roster_hash: str = ""    # track roster changes for client rebuild
        self._centurio_sessions: dict[str, ClaudeSDKClient] = {}
        self._centuriones: dict[str, Centurio] = {}  # registry

    async def handle_message(self, text: str, telegram_user_id: int) -> list[str]:
        """Main entry point. Returns list of response strings (one per responder).

        Flow:
        1. Parse @mentions from text (see @mention parsing rules below)
        2. Post Caesar's nuntius to Praetorium
        3. If roster or legatus prompt.md changed → rebuild Legatus client (new system prompt)
        4. If @mentions → dispatch to mentioned Centuriones (parallel, D16)
        5. If no @mentions → delegate to Legatus LLM (may auto-route per D25)
        6. Post all responses to Praetorium
        7. Return responses for Telegram delivery
        """
        ...

    async def dispatch_to_centurio(self, name: str, nuntius: Nuntius) -> str:
        """Send a nuntius to a specific Centurio. Returns response text."""
        ...

    def list_centuriones(self) -> list[Centurio]:
        """Scan workspace/centuriones/ directory. Returns registered Centuriones."""
        ...

    def _build_system_prompt(self) -> str:
        """Read workspace/legatus/prompt.md + inject Centurio roster (D26)."""
        ...

    def _should_rebuild_client(self) -> bool:
        """Rebuild Legatus client if roster changed OR legatus prompt.md edited.
        Checks: (a) hash of centurio names+specializations, (b) mtime of prompt.md."""
        ...

    async def shutdown(self) -> None:
        """Tear down all SDK client subprocesses (Legatus + all Centuriones).
        Called by __main__.py on SIGTERM/SIGINT (D30)."""
        ...

Legatus prompt lives at workspace/legatus/prompt.md — Caesar can edit it too.
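
The two checks named in `_should_rebuild_client` (roster hash and prompt.md mtime) might be implemented along these lines. The `RebuildTracker` class and `roster_fingerprint` function are hypothetical names; the roster is assumed to be a mapping of Centurio name to specialization text.

```python
import hashlib
from pathlib import Path


def roster_fingerprint(roster: dict[str, str]) -> str:
    """Hash of names + specializations, independent of dict ordering."""
    digest = hashlib.sha256()
    for name in sorted(roster):
        digest.update(f"{name}\x00{roster[name]}\x00".encode("utf-8"))
    return digest.hexdigest()


class RebuildTracker:
    """Hypothetical helper: remembers what the current client was built from."""

    def __init__(self, prompt_path: Path):
        self._prompt_path = prompt_path
        self._roster_hash = ""
        self._prompt_mtime_ns = -1

    def _current(self, roster: dict[str, str]) -> tuple[str, int]:
        mtime = self._prompt_path.stat().st_mtime_ns
        return roster_fingerprint(roster), mtime

    def should_rebuild(self, roster: dict[str, str]) -> bool:
        return self._current(roster) != (self._roster_hash, self._prompt_mtime_ns)

    def mark_built(self, roster: dict[str, str]) -> None:
        """Call right after the Legatus client has been (re)created."""
        self._roster_hash, self._prompt_mtime_ns = self._current(roster)
```

Status is deliberately absent from the fingerprint, so a Centurio flipping between idle and working never triggers a rebuild.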

Dynamic Centurio roster injection (D26): The full roster (names + specializations) is embedded in the system prompt at client creation. When the roster changes (centurio created/removed) or workspace/legatus/prompt.md is edited on disk, _should_rebuild_client() detects it and the client is torn down and rebuilt. Real-time status is injected per-call as a lightweight block in the user message:

xml

<centurio_status>
  <centurio name="vorenus" status="idle"/>
  <centurio name="pullo" status="working"/>
</centurio_status>

The full roster in the system prompt provides specialization descriptions for routing decisions:

xml

<centuriones>
  <centurio name="vorenus">Code specialist. Handles implementation tasks.</centurio>
  <centurio name="pullo">Research specialist.</centurio>
</centuriones>

The roster is populated by scanning workspace/centuriones/*/prompt.md (first line or summary). Status is deliberately left out of it: the in-memory status field (D10) changes too often to live in a system prompt that is rebuilt only on roster changes, so it travels in the per-call centurio_status block instead. Together the two blocks give the Legatus LLM full awareness of available agents, their specializations, and current state, enabling smart auto-routing (D25).

Why inject, not MCP tool (D26): The Centurio list is needed for every routing decision. Making the LLM call a tool before it can even think about routing adds latency and a wasted API round-trip. Inject it into the system prompt so it's always available. The list_centuriones MCP tool still exists for detailed queries (reading full prompts, checking tools.json).

LLM-driven routing (D25): When Caesar sends a message without @mentions, the Legatus LLM decides whether to:

Handle it itself (general conversation, meta-questions)
Route to a specific Centurio based on message content and Centurio specializations
Route to multiple Centuriones if the task spans specializations

This is configured in the Legatus prompt, not hard-coded. Caesar can adjust routing behavior by editing workspace/legatus/prompt.md — "prompt over code."

@mention parsing rules:

Format: @name — case-insensitive, matched against registered Centurio names.
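Word-boundary match: @vorenus matches, but @vorenus.com does not. Implementation: extract candidate tokens, then filter against the self._centuriones registry. A bare @(\w+) is not enough for this, because it would still extract vorenus from @vorenus.com; the pattern also needs \b plus a negative lookahead for a following .tld, for example (?<![\w./])@(\w+)\b(?!\.\w). Unrecognized @mentions are ignored (passed through as text to the Legatus LLM).
Multiple mentions in one message: @Vorenus @Pullo do X → dispatches to both (D16).
@ in email addresses or URLs: not matched, because the lookbehind rejects an @ that directly follows a word character, ., or / (a bare \w+ would still pick up the domain part of user@vorenus.com).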
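
The parsing rules above can be sketched as follows. The exact pattern and the `parse_mentions` name are assumptions; note that a bare `@(\w+)` would still extract a registered name from text such as `@vorenus.com`, so this sketch adds a lookbehind and a lookahead to get the behaviour the rules describe.

```python
import re

# The lookbehind rejects "@" glued to a preceding word character, dot, or
# slash (emails, URL paths); \b plus the lookahead rejects names that are
# followed by ".tld".
MENTION_RE = re.compile(r"(?<![\w./])@(\w+)\b(?!\.\w)")


def parse_mentions(text: str, registered: set[str]) -> list[str]:
    """Return registered Centurio names mentioned in text, in order, deduplicated."""
    seen: list[str] = []
    for match in MENTION_RE.finditer(text):
        name = match.group(1).lower()  # case-insensitive match against the registry
        if name in registered and name not in seen:
            seen.append(name)
    return seen
```

A trailing sentence period is still accepted (`ask @vorenus.`), because the lookahead only rejects a dot that is followed by another word character.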

Legatus tools (via MCP):

Full Memoria access (read/write/revoke edicta, acta)
Centurio management: create_centurio, remove_centurio, list_centuriones
Praetorium: post_nuntius, get_history
dispatch_to_centurio — so the Legatus LLM can route messages to Centuriones (D25)

4.2 Centurio session management

Per D2: Each Centurio gets a persistent ClaudeSDKClient stored in self._centurio_sessions.

Lifecycle:

First dispatch → create ClaudeSDKClient with prompt from prompt.md, tools from tools.json, Memoria MCP (restricted). Store in _centurio_sessions[name]. Set status = "working" (D10).
Subsequent dispatches → reuse existing client (SDK maintains internal conversation history). Set status = "working". Update last_active timestamp.
Prompt changed on disk → detect on next dispatch (compare file mtime or content hash against last-seen). If changed, tear down old client, create new one. Old conversation context is lost — this is intentional (new prompt = new agent).
Remove Centurio → disconnect client, delete from _centurio_sessions, remove from _centuriones registry, optionally delete folder.
Error → set status = "error", log the error, return error message to Legatus for relay.
Completion → set status = "idle".
Idle timeout → each ClaudeSDKClient is a subprocess. To avoid holding idle processes indefinitely, sessions inactive for 30 minutes are torn down (D29). A periodic asyncio.Task checks last_active timestamps and disconnects stale clients. The next dispatch recreates the session exactly as on first dispatch. Timeout configurable in legio.toml as session_idle_timeout_minutes = 30.
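
The idle-timeout step could be sketched as a small reaper task. The function names, the use of `time.monotonic()` for `last_active`, and the 60-second check interval are assumptions; `client.disconnect()` is assumed to be the async teardown call on the SDK client.

```python
import asyncio
import time


def find_stale(last_active: dict[str, float], now: float, timeout_s: float) -> list[str]:
    """Names whose last activity is older than the timeout."""
    return [name for name, seen in last_active.items() if now - seen > timeout_s]


async def reap_idle_sessions(sessions: dict, last_active: dict[str, float],
                             timeout_s: float, interval_s: float = 60.0) -> None:
    """Periodic task: disconnect and forget stale sessions until cancelled."""
    while True:
        await asyncio.sleep(interval_s)
        for name in find_stale(last_active, time.monotonic(), timeout_s):
            client = sessions.pop(name, None)
            last_active.pop(name, None)
            if client is not None:
                await client.disconnect()  # assumed async teardown method
```

A real implementation would probably also skip Centuriones whose status is "working", so that a long-running task is not reaped mid-dispatch.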

Multi-mention parallelism (D16): When Caesar @mentions multiple Centuriones (e.g., @Vorenus @Pullo investigate this), all are dispatched in parallel via asyncio.gather. Responses are collected, then posted to Praetorium and relayed to Telegram sequentially (one message per responder, D22).

python

async def _dispatch_parallel(self, names: list[str], nuntius: Nuntius) -> list[str]:
    """Dispatch to multiple Centuriones in parallel. Returns list of responses."""
    tasks = [self.dispatch_to_centurio(name, nuntius) for name in names]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    responses = []
    for name, result in zip(names, results):
        if isinstance(result, BaseException):  # also covers CancelledError
            responses.append(f"[{name}] Error: {result}")
        else:
            responses.append(result)
    return responses

Phase 5 — Centurio Runtime (SDK integration)

5.1 Running a Centurio

Each Centurio has a persistent ClaudeSDKClient (D2). On dispatch:

Legatus checks if session exists and prompt is unchanged. Creates/recreates if needed.
Legatus builds visible history via praetorium.get_visible_nuntii(name, limit=config.history_window).
Legatus formats history as XML context block (D18) and prepends to the new message.
Legatus sends to the Centurio's persistent client:

python

client = self._centurio_sessions[name]
context = self._format_history(visible_nuntii)
full_message = f"{context}\n\n{nuntius.text}"

# SDK requires async generator input when MCP servers are attached.
# Plain string query() won't trigger MCP tool handling.
async def _prompt():
    yield full_message

await client.query(_prompt())
# collect response text via client.receive_response()

5.2 History format (D18)

Visible history is formatted as an XML context block — consistent with Memoria format (D5):

xml

<praetorium recent="true" viewer="vorenus">
  <nuntius id="abc-123" sender="caesar" timestamp="2026-02-14T10:00:00Z">
    @Vorenus investigate the payments API
  </nuntius>
  <nuntius id="def-456" sender="vorenus" timestamp="2026-02-14T10:05:00Z">
    I'll look into it. Starting with the OAuth docs.
  </nuntius>
  <nuntius id="ghi-789" sender="caesar" timestamp="2026-02-14T10:30:00Z">
    @Vorenus also check rate limits
  </nuntius>
</praetorium>

Why XML, not raw messages (D18):

Consistent with Memoria XML format (D5) — one convention everywhere
Claude parses XML tags natively and accurately
Metadata (sender, timestamp, id) in attributes keeps body clean
Structured format lets Claude distinguish messages from instructions

5.3 Token budget management (D19)

The Claude Agent SDK does not provide automatic context truncation. We manage it ourselves:

Strategy: sliding window with monitoring.

Default window is last 50 nuntii (D14, configurable). This is the context provided per dispatch.
Token tracking: after each exchange, the SDK's ResultMessage carries usage data including input_tokens and output_tokens. Track cumulative tokens per session.
Overflow threshold: when cumulative input tokens exceed 75% of the model's context window (about 150K tokens, assuming a 200K-token window for Sonnet), trigger a session reset: tear down the ClaudeSDKClient and create a new one. The new session starts fresh with just the sliding window of recent nuntii and no old conversation context.
No summarization in v3: automatic summarization adds complexity and latency. The sliding window + session reset is sufficient for a single-user system. Caesar can always issue an edictum summarizing important context if needed.

python

class SessionTokenTracker:
    """Track cumulative token usage per Centurio session."""

    def __init__(self, max_input_tokens: int = 150_000):
        self.max_input_tokens = max_input_tokens
        self.cumulative_input: int = 0

    def update(self, result: ResultMessage) -> None:
        """Extract token count from SDK ResultMessage. Usage may be None."""
        if result.usage:
            self.cumulative_input += result.usage.get("input_tokens", 0)

    def should_reset(self) -> bool:
        return self.cumulative_input > self.max_input_tokens

What the Centurio sees as MCP tools:

mcp__memoria__list_acta / read_actum / publish_actum
mcp__memoria__list_commentarii / read_commentarium / write_commentarium (scoped to self)
mcp__memoria__list_edicta / read_edictum (read-only)

What the Centurio does NOT see:

Other Centuriones' commentarii
Edictum write access
Centurio management tools
dispatch_to_centurio (only Legatus can route)

5.4 Centurio creation (D20 — template + LLM)

Per D20: Both template and LLM.

Template at templates/centurio/:

templates/centurio/
  prompt.md.template     # Scaffold with placeholders
  tools.json.template    # Default tool permissions

prompt.md.template:

markdown

# {{name}}

You are {{name}}, a Centurio in the Legio system.

## Specialization

{{specialization}}

## Guidelines

- Address the human as Caesar.
- Write your findings to commentarii for persistence.
- Read edicta before starting any task — they contain standing orders.
- Publish important discoveries to acta for other Centuriones.

## Tools

You have access to the Memoria system via MCP tools:
- `list_edicta` / `read_edictum` — standing orders (read-only)
- `list_acta` / `read_actum` / `publish_actum` — shared knowledge
- `list_commentarii` / `read_commentarium` / `write_commentarium` — your private notes

tools.json.template:

json

{
  "allowed_tools": ["memoria_centurio"]
}

Creation flow:

Caesar says: "Create a Centurio named Cicero who specializes in writing."
Legatus LLM receives the request, calls create_centurio tool with name="cicero", specialization="writing".
Tool implementation copies template, substitutes {{name}} and {{specialization}}, writes to workspace/centuriones/cicero/.
Legatus LLM may further customize the prompt by calling a follow-up tool or editing the generated file.
Legatus LLM responds to Caesar confirming creation.

Caesar can also just mkdir workspace/centuriones/cicero/ and write the files directly (D3). The Legatus discovers new Centuriones by scanning the centuriones/ directory.
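
The template step of the creation flow might be implemented roughly as below. Both function names are hypothetical, and creating an empty commentarii/ folder at scaffold time is an assumption of this sketch.

```python
import re
from pathlib import Path

PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace {{key}} placeholders; fail loudly on an unknown key."""
    def replace(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"no value for placeholder {key!r}")
        return values[key]
    return PLACEHOLDER_RE.sub(replace, template)


def scaffold_centurio(templates: Path, workspace: Path,
                      name: str, specialization: str) -> Path:
    """Copy every *.template file into a new Centuria folder."""
    name = name.lower()
    centuria = workspace / "centuriones" / name
    centuria.mkdir(parents=True, exist_ok=False)  # refuse to clobber an existing Centurio
    values = {"name": name, "specialization": specialization}
    for template in templates.glob("*.template"):
        target = centuria / template.name.removesuffix(".template")
        rendered = render_template(template.read_text(encoding="utf-8"), values)
        target.write_text(rendered, encoding="utf-8")
    (centuria / "commentarii").mkdir()
    return centuria
```

The placeholder pattern only matches `{{word}}`, so the single braces in tools.json.template pass through untouched.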

Phase 6 — Telegram Integration

6.1 `legio/telegram/bot.py`

Single Telegram bot. One chat with Caesar.

Message formatting (D23): HTML parse mode.

python

import asyncio
import html  # for html.escape() of response bodies
import logging

from telegram import Update
from telegram.ext import (
    ApplicationBuilder,
    CommandHandler,
    ContextTypes,
    MessageHandler,
    filters,
)

logger = logging.getLogger(__name__)

PARSE_MODE = "HTML"  # D23: simpler escaping than MarkdownV2

async def start(config: LegioConfig, legatus: Legatus) -> None:
    # Assumes python-telegram-bot v20+ (the ApplicationBuilder API).
    app = ApplicationBuilder().token(config.telegram_bot_token).build()
    # Handlers read their dependencies from bot_data rather than globals.
    app.bot_data["config"] = config
    app.bot_data["legatus"] = legatus
    app.add_handler(CommandHandler("status", handle_status))    # D24
    app.add_handler(CommandHandler("list", handle_list))        # D24
    app.add_handler(CommandHandler("create", handle_create))    # D24
    app.add_handler(CommandHandler("help", handle_help))        # D24
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
    # run_polling() is a blocking call that manages its own event loop, so it
    # cannot be awaited here; start the pieces explicitly instead. Polling then
    # runs in the background until shutdown (D30).
    await app.initialize()
    await app.start()
    await app.updater.start_polling()

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    config = context.bot_data["config"]
    legatus = context.bot_data["legatus"]

    # SECURITY: verify Caesar identity
    if update.effective_user.id != config.caesar.telegram_id:
        return  # silent ignore

    text = update.message.text

    # D21: Send "working..." immediately for all messages
    status_msg = await update.message.reply_text("⏳")

    # Start typing indicator BEFORE calling Legatus — it runs concurrently.
    # This matters because dispatch_to_centurio (called as MCP tool by Legatus LLM)
    # can block for 30+ seconds while a Centurio works. The typing indicator
    # keeps Caesar informed during the entire chain: Legatus → MCP tool → Centurio.
    stop_typing = asyncio.Event()
    typing_task = asyncio.create_task(
        _keep_typing(update.effective_chat.id, stop_typing)
    )

    try:
        responses = await legatus.handle_message(text, update.effective_user.id)

        # D22: One message per responder
        # Edit the status message with the first response
        first = responses[0] if responses else "No response."
        await status_msg.edit_text(first, parse_mode=PARSE_MODE)

        # Send remaining responses as new messages
        for response in responses[1:]:
            await update.message.reply_text(response, parse_mode=PARSE_MODE)

    except Exception:
        # Log the exception (never swallow, never leak secrets)
        logger.exception("handle_message failed")
        await status_msg.edit_text("❌ An error occurred.")
    finally:
        stop_typing.set()
        await typing_task

6.2 Long-running task UX (D21)

Per D21: Edit-in-place pattern.

On receiving any message, immediately send a "⏳" status message.
While Centuriones work, send sendChatAction("typing") every 5 seconds to keep the typing indicator alive.
When the first response arrives, edit the status message with the result.
If multiple Centuriones respond (D22), subsequent responses are sent as new reply messages.

Why edit-in-place:

Prevents chat clutter — one evolving message instead of status + result.
A widely used pattern for Telegram bots (per research).
Clear visual feedback — Caesar sees the ⏳ immediately, then it transforms into the answer.

Typing indicator:

python

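async def _keep_typing(bot, chat_id: int, stop_event: asyncio.Event) -> None:
    """Send typing action every 5 seconds until stopped.

    Module-level helper: handlers pass context.bot as the first argument.
    """
    while not stop_event.is_set():
        await bot.send_chat_action(chat_id, "typing")
        try:
            # Wake early if the event is set; otherwise time out and loop again
            await asyncio.wait_for(stop_event.wait(), timeout=5.0)
        except asyncio.TimeoutError:  # alias of TimeoutError on Python 3.11+
            pass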
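The loop's stop behaviour can be exercised without Telegram. The sketch below is illustrative only: `FakeBot`, `keep_typing`, the `interval` parameter, and `demo` are hypothetical names, and the fake bot records chat actions instead of sending them.

```python
import asyncio


class FakeBot:
    """Stand-in for the Telegram bot; records chat actions instead of sending them."""

    def __init__(self) -> None:
        self.actions: list[tuple[int, str]] = []

    async def send_chat_action(self, chat_id: int, action: str) -> None:
        self.actions.append((chat_id, action))


async def keep_typing(
    bot: FakeBot, chat_id: int, stop_event: asyncio.Event, interval: float = 5.0
) -> None:
    """Same loop as the plan's helper, with the interval made configurable."""
    while not stop_event.is_set():
        await bot.send_chat_action(chat_id, "typing")
        try:
            # Returns as soon as the event is set, so shutdown is not delayed
            await asyncio.wait_for(stop_event.wait(), timeout=interval)
        except asyncio.TimeoutError:
            pass


async def demo() -> list[tuple[int, str]]:
    bot = FakeBot()
    stop = asyncio.Event()
    task = asyncio.create_task(keep_typing(bot, 42, stop, interval=0.01))
    await asyncio.sleep(0.05)  # simulated Centurio work
    stop.set()
    await task  # completes promptly because the wait is interrupted
    return bot.actions
```

Waiting on the event with a timeout, rather than calling `asyncio.sleep(5)`, is what lets the `finally` block in the handler finish without waiting out the remainder of the interval.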

6.3 Multi-Centurio response format (D22)

Per D22: One message per Centurio, with an HTML header line identifying the responder.

html

<b>⚔️ Vorenus</b>

Here are the API rate limits I found:
- 100 requests per minute per client
- OAuth 2.0 with PKCE for authentication

Responses arrive sequentially (even though Centuriones work in parallel per D16). The first response edits the status message; subsequent responses are new messages.
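One way to build such a message is sketched below. It assumes the Centurio's reply is plain text that must be escaped before being sent with HTML parse mode; `format_response` and its `emblem` parameter are hypothetical names, not part of the plan.

```python
import html


def format_response(centurio_name: str, body: str, emblem: str = "⚔️") -> str:
    """Prefix a Centurio reply with a bold header line (D22).

    Assumption: body is plain text, so &, < and > are escaped to keep
    Telegram's HTML parser from rejecting the message.
    """
    header = f"<b>{emblem} {html.escape(centurio_name)}</b>"
    return f"{header}\n\n{html.escape(body, quote=False)}"
```

If Centuriones are instead prompted to emit Telegram-compatible HTML themselves, the body escape would have to be dropped, since it would turn their tags into literal text.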

6.4 Telegram message limits

Telegram allows up to 4096 characters per message, counted after entity parsing. For longer responses:

python

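def split_message(text: str, max_len: int = 4000) -> list[str]:
    """Split at paragraph boundaries. Leave 96-char buffer for HTML tags.

    A paragraph longer than max_len is hard-split so no chunk exceeds the limit.
    Note: a hard split can land inside an HTML tag; callers should handle that.
    """
    if len(text) <= max_len:
        return [text]
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        # Break oversized paragraphs into max_len pieces (empty paragraph stays as "")
        pieces = [para[i:i + max_len] for i in range(0, len(para), max_len)] or [""]
        for piece in pieces:
            if current and len(current) + len(piece) + 2 > max_len:
                chunks.append(current)
                current = piece
            else:
                current = f"{current}\n\n{piece}" if current else piece
    if current:
        chunks.append(current)
    return chunks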
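How chunking interacts with the edit-in-place rule (D21/D22) is not spelled out above. The sketch below shows one possible reading, assuming the first chunk of the first response edits the status message and every later chunk is sent as a new message. `plan_deliveries` is a hypothetical helper; the splitter is passed in so the function stays independent of any particular `split_message` implementation.

```python
from typing import Callable


def plan_deliveries(
    responses: list[str],
    split: Callable[[str], list[str]],
) -> list[tuple[str, str]]:
    """Turn responses into ("edit" | "send", text) actions, in order.

    Assumption: only the very first chunk edits the status message;
    all remaining chunks become new reply messages.
    """
    chunks = [chunk for response in responses for chunk in split(response)]
    if not chunks:
        return [("edit", "No response.")]
    return [("edit", chunks[0])] + [("send", chunk) for chunk in chunks[1:]]
```

Keeping the planning step pure (no Telegram calls) makes the ordering easy to unit-test; the handler would then loop over the actions and call `edit_text` or `reply_text` accordingly.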

6.5 Slash commands (D24)

Per D24: Hybrid — both /commands and @mentions.

Command	Action	Implementation
`/status`	Show all Centuriones and their status (D10)	Direct — no LLM call
`/list`	List available Centuriones with descriptions	Direct — scan filesystem
`/create <name> <description>`	Create a new Centurio	Delegates to Legatus LLM
`/help`	Show available commands and usage	Direct — static text

Why both (D24):

Commands are instant, deterministic, no API cost. Perfect for frequent meta-operations.
@mentions engage the LLM for actual work. Natural language for complex requests.
Power users expect both. Commands are discoverable via Telegram's autocomplete.
"prompt over code" applies to agent behavior, not UX shortcuts. Slash commands are UX, not agent logic.
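A "direct, no LLM call" command such as `/status` reduces to formatting in-memory state. The following is a minimal sketch, assuming the Legatus exposes a name-to-status mapping (D10); `render_status` and the output layout are illustrative choices, not part of the plan.

```python
import html


def render_status(statuses: dict[str, str]) -> str:
    """Render Centurio statuses (idle / working / error) as Telegram HTML."""
    if not statuses:
        return "No Centuriones found."
    lines = ["<b>Centuriones</b>"]
    for name in sorted(statuses):
        # Escape both fields: names come from directory names on disk
        status = html.escape(statuses[name])
        lines.append(f"{html.escape(name)}: <code>{status}</code>")
    return "\n".join(lines)
```

Because the function touches neither the SDK nor the network, the command stays instant and deterministic, which is the property D24 asks of slash commands.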

Phase 7 — Entry Point

7.1 `legio/main.py`

python

import asyncio
import logging
import signal

# Assumption: start() from 6.1 is imported under this name
from legio.telegram.bot import start as start_telegram_bot

async def main() -> None:
    config = load_config()

    # Logging: structured, no print() (T20)
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )

    # Ensure workspace directories exist (+ default Legatus prompt)
    ensure_workspace(config.workspace_dir)

    # Infrastructure
    memoria = MemoriaStore(config.workspace_dir)
    memoria_full = build_memoria_full_server(memoria)
    memoria_centurio = build_memoria_centurio_server(memoria)
    praetorium = Praetorium(config.workspace_dir / "praetorium.db")

    # Legatus (SDK agent + orchestrator)
    legatus = Legatus(config, praetorium, memoria, memoria_full, memoria_centurio)

    # Graceful shutdown on SIGTERM/SIGINT
    shutdown_event = asyncio.Event()

    def _signal_handler() -> None:
        logging.getLogger("legio").info("Shutdown signal received")
        shutdown_event.set()

    # Note: loop.add_signal_handler is only available on Unix event loops
    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, _signal_handler)

    try:
        # Telegram bot (blocks until shutdown_event or external stop)
        await start_telegram_bot(config, legatus, shutdown_event)
    finally:
        # Tear down all SDK client subprocesses
        await legatus.shutdown()
        # Close SQLite connection
        await praetorium.close()
        logging.getLogger("legio").info("Legio shut down cleanly")

if __name__ == "__main__":
    asyncio.run(main())

python

import shutil
from pathlib import Path

def ensure_workspace(workspace_dir: Path) -> None:
    """Create workspace directory structure if it doesn't exist.

    Copies default Legatus prompt from templates/ on first run.
    """
    (workspace_dir / "edicta").mkdir(parents=True, exist_ok=True)
    (workspace_dir / "acta").mkdir(parents=True, exist_ok=True)
    (workspace_dir / "centuriones").mkdir(parents=True, exist_ok=True)
    (workspace_dir / "legatus").mkdir(parents=True, exist_ok=True)

    # First-run: copy default Legatus prompt if missing
    legatus_prompt = workspace_dir / "legatus" / "prompt.md"
    if not legatus_prompt.exists():
        template = Path("templates/legatus/prompt.md.template")
        if template.exists():
            shutil.copy(template, legatus_prompt)

Dependency Graph

legio/__main__.py
    └── telegram/bot.py
            └── legatus.py  (SDK agent + orchestrator)
                    ├── ClaudeSDKClient (Legatus own session)
                    ├── ClaudeSDKClient per Centurio (persistent sessions)
                    ├── praetorium.py → nuntius.py
                    ├── memoria/store.py
                    ├── memoria/tools.py → MCP servers
                    ├── centurio.py
                    └── config.py
    errors.py (used everywhere)

Filesystem Layout (runtime)

workspace/
  praetorium.db          # SQLite — conversation history (survives restarts)
  legatus/
    prompt.md            # Legatus system prompt (Caesar-editable)
  edicta/                # Standing orders (XML files)
    security-policy.xml
  acta/                  # Shared knowledge (XML files)
    api-research.xml
  centuriones/
    vorenus/
      prompt.md          # Centurio definition (Caesar-editable)
      tools.json         # Allowed tools (Caesar-editable)
      commentarii/       # Private notes (XML files, append-only)
        task-notes-001.xml
    pullo/
      prompt.md
      tools.json
      commentarii/

templates/               # Committed to repo (not in workspace/)
  legatus/
    prompt.md.template   # Default Legatus prompt (copied on first run)
  centurio/
    prompt.md.template   # Scaffold for new Centuriones
    tools.json.template  # Default tool permissions

Dependencies (additions to `pyproject.toml`)

toml

dependencies = [
    "claude-agent-sdk",
    "python-telegram-bot>=20",  # ApplicationBuilder API requires v20+
    "tomli; python_version < '3.11'",  # stdlib tomllib covers 3.11+
    "aiosqlite",         # D6 — async SQLite for Praetorium
]

What Is NOT in Scope (v3)

Optio / Miles (future agent tiers)
Multi-Legio (multiple processes)
Web UI (Telegram only)
Persistent SDK sessions across process restarts (Praetorium history persists, but ClaudeSDKClient sessions are rebuilt from disk on restart)
Inter-Centurio direct messaging (must go through Praetorium)
Automatic context summarization (D19 — sliding window + session reset is sufficient)
Hard enforcement of Memoria size limits (D13 — soft, prompt-level guidelines)
File attachments / media in Telegram (text only in v3)

Open Questions Summary

All questions resolved.

#	Question	Impact	Status
~~Q1~~	~~Audience glob patterns or explicit names?~~	~~Nuntius model~~	D7: Explicit names only. `["all"]` is the sole wildcard.
~~Q2~~	~~Nuntius UUID for threading?~~	~~Nuntius model~~	D8: Yes, UUID4 `id` field on every Nuntius.
~~Q3~~	~~Legatus-only messages from Caesar?~~	~~Visibility rules~~	D9: Yes. `audience=["legatus"]` excludes all Centuriones.
~~Q4~~	~~Centurio carries AgentDefinition or builds at runtime?~~	~~Centurio model~~	D3: Filesystem is truth. Read from disk each dispatch.
~~Q5~~	~~Centurio status field?~~	~~Centurio model~~	D10: Yes, idle/working/error. In-memory, not persisted.
~~Q6~~	~~Max concurrent Centuriones?~~	~~Resource management~~	D11: Default 10, configurable in `legio.toml`.
~~Q7~~	~~Memoria file format: plain md or YAML frontmatter?~~	~~MemoriaStore~~	D5: XML tags (Claude convention). Attributes for metadata, body for content.
~~Q8~~	~~Memoria append-only or mutable?~~	~~MemoriaStore~~	D12: Edicta/Acta mutable (overwrite). Commentarii append-only.
~~Q9~~	~~Size limits per file/Centurio?~~	~~MemoriaStore~~	D13: Soft limits only (50KB/file, 1MB/commentarii). Prompt-enforced.
~~Q10~~	~~Praetorium in-memory only or persisted?~~	~~Crash recovery~~	D6: SQLite at `workspace/praetorium.db`. Survives restarts.
~~Q11~~	~~History max length?~~	~~Token budget~~	D14: SQLite unbounded. Sliding window of last 50 nuntii for context.
~~Q12~~	~~Centurio response flow — direct or Legatus-mediated?~~	~~Architecture~~	D15: Legatus-mediated. All responses flow through Legatus.
~~Q13~~	~~Legatus as SDK agent or simple query()?~~	~~Legatus design~~	D1: SDK agent with persistent session.
~~Q14~~	~~Multi-mention — parallel or sequential?~~	~~Concurrency~~	D16: Parallel via `asyncio.gather`. Responses sent sequentially.
~~Q15~~	~~Legatus own tools?~~	~~Legatus capabilities~~	D4: Yes, via Memoria MCP + management tools.
~~Q16~~	~~Response delivery mechanism?~~	~~Telegram integration~~	D17: Return `str` from `dispatch_to_centurio()`. Legatus posts + relays.
~~Q17~~	~~Centurio persistent session or stateless?~~	~~SDK usage~~	D2: Persistent ClaudeSDKClient per Centurio.
~~Q18~~	~~Memoria as MCP tools for Centuriones?~~	~~Tool design~~	D4: Yes, MCP tools for all agents.
~~Q19~~	~~History format for Centurio context?~~	~~Prompt engineering~~	D18: XML `<praetorium>` block with `<nuntius>` tags. Consistent with D5.
~~Q20~~	~~Token budget management?~~	~~Scalability~~	D19: Sliding window + session reset at 75% capacity. No summarization in v3.
~~Q21~~	~~Template-based or LLM-generated Centurio creation?~~	~~Self-modification~~	D20: Both. Template scaffold + LLM customization.
~~Q22~~	~~Post-creation prompt editing?~~	~~Centurio lifecycle~~	D3: Yes, Caesar edits files directly.
~~Q23~~	~~Long-running task UX?~~	~~Telegram UX~~	D21: Edit-in-place. Send ⏳, edit with result. Typing indicator every 5s.
~~Q24~~	~~Multi-Centurio response format?~~	~~Telegram UX~~	D22: One message per Centurio with bold header. Sent sequentially.
~~Q25~~	~~Telegram message formatting?~~	~~Telegram UX~~	D23: HTML parse mode. `<b>`, `<code>`, `<pre>` for formatting.
~~Q26~~	~~Slash commands in addition to @mentions?~~	~~Telegram UX~~	D24: Hybrid. `/status`, `/list`, `/create`, `/help` + @mentions.
~~Q27~~	~~Legatus LLM-driven routing?~~	~~Architecture~~	D25: Yes. Auto-routing via LLM. Configured in Legatus prompt.
~~Q28~~	~~Centurio list discovery?~~	~~Architecture~~	D26: Dynamic injection into system prompt. MCP tool as backup.
