
Legio v3 — Implementation Plan

Status: DRAFT — all questions resolved, Codex review amendments applied (D27–D30), awaiting Caesar's final review

Goal

Build a single-process multi-agent system where Caesar commands a team of AI Centuriones through one Telegram chat. Built on Claude Agent SDK (Python).


Decisions Log

| # | Decision | Decided by | Date |
|---|----------|------------|------|
| D1 | Legatus is an SDK agent (ClaudeSDKClient with its own prompt), not mechanical code | Caesar | 2026-02-14 |
| D2 | Centuriones have persistent sessions (ClaudeSDKClient instances kept alive) | Caesar | 2026-02-14 |
| D3 | Centurio definition lives in prompt files on disk; Caesar can directly edit the filesystem | Caesar | 2026-02-14 |
| D4 | Memoria exposed as MCP tools to all agents | Caesar | 2026-02-14 |
| D5 | Memoria file format follows Claude conventions: XML tags for structure, plain text body | Caesar | 2026-02-14 |
| D6 | Praetorium persisted to SQLite (not in-memory only) for crash recovery | Caesar | 2026-02-14 |
| D7 | Audience uses explicit names only, no glob patterns. ["all"] is the only wildcard. | Antony | 2026-02-14 |
| D8 | Every Nuntius carries a UUID (id field). Required for reply_to threading and SQLite PK. | Antony | 2026-02-14 |
| D9 | Yes, Caesar can send Legatus-only messages. audience=["legatus"] excludes all Centuriones. | Antony | 2026-02-14 |
| D10 | Centurio has a status field (idle / working / error), tracked in-memory by Legatus, not persisted. | Antony | 2026-02-14 |
| D11 | Max 10 concurrent Centuriones (configurable in legio.toml). Soft limit enforced at creation time. | Antony | 2026-02-14 |
| D12 | Edicta and Acta are mutable (overwrite). Commentarii are append-only (immutable log). | Antony | 2026-02-14 |
| D13 | No hard size limits per file. Soft warning at 50KB per file, 1MB total per Centurio's commentarii. | Antony | 2026-02-14 |
| D14 | SQLite stores all history unbounded. Sliding window of last N nuntii provided as context to agents. Default N=50. | Antony | 2026-02-14 |
| D15 | Centurio responses are Legatus-mediated: returned to Legatus, which posts them to Praetorium and relays to Telegram. | Antony | 2026-02-14 |
| D16 | Multi-mention dispatches Centuriones in parallel via asyncio.gather. Responses collected, then sent sequentially. | Antony | 2026-02-14 |
| D17 | Centurio responses returned as str from dispatch_to_centurio(). Legatus posts to Praetorium and returns to Telegram handler. | Antony | 2026-02-14 |
| D18 | History formatted as XML context block with <nuntius> tags per message. Claude-native, consistent with D5. | Antony | 2026-02-14 |
| D19 | Token budget: sliding window (D14). When context exceeds 75% of model window, oldest nuntii dropped. No summarization in v3. | Antony | 2026-02-14 |
| D20 | Both: template in templates/centurio/ as scaffold, but Legatus LLM customizes the prompt. Caesar can also create manually (D3). | Antony | 2026-02-14 |
| D21 | Edit-in-place: send "⏳ working..." immediately, edit with result. Typing indicator every 5s for long tasks. | Antony | 2026-02-14 |
| D22 | One Telegram message per Centurio, with header line identifying the responder. Sent sequentially. | Antony | 2026-02-14 |
| D23 | HTML parse mode for Telegram messages. Simpler escaping than MarkdownV2, supports code blocks via <pre>. | Antony | 2026-02-14 |
| D24 | Hybrid: support both /commands and @mentions. Commands for quick actions, @mentions for conversations. | Antony | 2026-02-14 |
| D25 | Yes, Legatus LLM can auto-route without @mention: "prompt over code" applied to routing. Configurable in Legatus prompt. | Antony | 2026-02-14 |
| D26 | Centurio list injected dynamically into Legatus system prompt on each call. MCP tool as backup for detailed queries. | Antony | 2026-02-14 |
| D27 | Legatus uses persistent ClaudeSDKClient (same as Centuriones). System prompt includes full roster, rebuilt only when roster changes (centurio created/removed). Real-time status injected in user message. SDK conversation history preserved for continuity. | Antony | 2026-02-14 |
| D28 | @mention parsing: case-insensitive @(\w+) regex, matched against registered names only. Unrecognized @mentions pass through as text. Email/URL @ not matched. | Antony | 2026-02-14 |
| D29 | Idle session timeout: Centurio ClaudeSDKClient torn down after 30min inactivity (configurable). Periodic asyncio.Task reaps stale sessions. Recreated on next dispatch. | Antony | 2026-02-14 |
| D30 | Graceful shutdown: SIGTERM/SIGINT handler tears down all SDK clients, closes Praetorium DB, stops Telegram polling. | Antony | 2026-02-14 |
| D31 | SDK requires async generator input for client.query() when MCP servers are attached. All query calls use async def _prompt(): yield text wrapper. | Antony | 2026-02-14 |
| D32 | Per-Centurio MCP server: build_memoria_centurio_server(store, centurio_name) bakes identity into closure. No shared server with magic caller variable. | Antony | 2026-02-14 |
| D33 | Legatus client rebuild detects both roster changes AND legatus prompt.md edits (mtime check). Caesar can edit prompt without restarting. | Antony | 2026-02-14 |

Phase 1 — Data Models (no I/O, pure Python)

The foundation. Every other layer imports these. Zero dependencies on infrastructure.

1.1 legio/nuntius.py — Message

Per D7, D8, D9: explicit audience names, UUID for threading, Legatus-only messages supported.

python
@dataclass(frozen=True)
class Nuntius:
    id: str                    # UUID4 — unique identifier (D8)
    sender: str                # "caesar", "legatus", or centurio name
    text: str
    audience: tuple[str, ...]  # ("all",), ("legatus",), ("vorenus", "pullo") (D7, D9)
    timestamp: datetime        # UTC, timezone-aware
    reply_to: str | None       # UUID of prior nuntius, or None (D8)

Design notes:

  • frozen=True because nuntii are immutable once posted.
  • audience is tuple[str, ...], not list[str] — a mutable list inside a frozen dataclass is a Python footgun (the list can still be mutated). Tuple enforces true immutability.
  • audience=("all",) is the only wildcard (D7). No glob patterns — explicit names keep visibility rules simple and auditable.
  • audience=("legatus",) lets Caesar have private conversations with the Legatus (D9). Example: "Legatus, what do you think of Vorenus's work?" — no Centurio sees this.
  • id is UUID4 string (D8). Required for reply_to threading and as SQLite primary key.
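
A minimal sketch of how these design notes play out when a nuntius is created. The `new_nuntius` factory is a hypothetical helper, not part of the plan; it only illustrates filling in the UUID4 id and a timezone-aware UTC timestamp, and converting the audience to a tuple.

```python
import uuid
from dataclasses import FrozenInstanceError, dataclass
from datetime import datetime, timezone
from typing import Optional


@dataclass(frozen=True)
class Nuntius:
    id: str
    sender: str
    text: str
    audience: tuple
    timestamp: datetime
    reply_to: Optional[str]


def new_nuntius(sender, text, audience, reply_to=None) -> Nuntius:
    """Hypothetical factory: generates the id and the UTC timestamp."""
    return Nuntius(
        id=str(uuid.uuid4()),
        sender=sender,
        text=text,
        audience=tuple(audience),  # accept any iterable, store a tuple
        timestamp=datetime.now(timezone.utc),
        reply_to=reply_to,
    )


order = new_nuntius("caesar", "Investigate the payments API", ["vorenus", "pullo"])
reply = new_nuntius("vorenus", "On it.", ("all",), reply_to=order.id)

try:
    order.text = "changed"  # a frozen dataclass rejects attribute assignment
    mutated = True
except FrozenInstanceError:
    mutated = False
```

Because the audience is stored as a tuple, there is also no `append` to call on it after the fact.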

1.2 legio/centurio.py — Agent identity (not behavior)

Per D3, D10: minimal registry entry. Status tracked in-memory by Legatus.

python
@dataclass
class Centurio:
    name: str              # "vorenus", "pullo" — lowercase
    centuria: Path         # workspace/centuriones/vorenus/
    status: str = "idle"   # "idle" | "working" | "error" (D10)

The Centuria folder is the agent definition:

workspace/centuriones/vorenus/
  prompt.md              # System prompt — the full AgentDefinition
  tools.json             # Allowed tools list
  commentarii/           # Private working memory

Status field (D10): Tracked in-memory by the Legatus, not persisted to disk or SQLite. Reason: status is ephemeral runtime state — after a restart, all Centuriones begin as idle. The Legatus sets working before dispatch, idle on completion, error on failure. Status is exposed via /status command (D24) and the list_centuriones MCP tool.

Max concurrent Centuriones (D11): Default limit of 10, configurable in legio.toml as max_centuriones = 10. Enforced at creation time: create_centurio raises CenturioLimitReached (a LegioError subclass, see 1.4) if the limit has been reached. Rationale: each Centurio holds a persistent ClaudeSDKClient (D2) with its own context window. 10 is generous for a single-user system; the limit prevents accidental resource exhaustion.
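
The creation-time check could look roughly like this. `check_centurio_limit` is a hypothetical helper name; the plan only states that `create_centurio` raises when the limit is reached.

```python
class LegioError(Exception):
    pass


class CenturioLimitReached(LegioError):
    pass


def check_centurio_limit(current_count: int, max_centuriones: int = 10) -> None:
    """Raise if creating one more Centurio would exceed the configured limit."""
    if current_count >= max_centuriones:
        raise CenturioLimitReached(
            f"limit of {max_centuriones} Centuriones reached"
        )


check_centurio_limit(9)  # room for one more: no exception

try:
    check_centurio_limit(10)
    limit_hit = False
except LegioError:  # callers can catch the base class
    limit_hit = True
```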

1.3 legio/config.py — Configuration

python
@dataclass
class CaesarConfig:
    telegram_id: int

@dataclass
class LegioConfig:
    caesar: CaesarConfig
    model: str                           # "sonnet" — Claude model for all agents
    workspace_dir: Path                  # Path to workspace/
    max_centuriones: int = 10            # D11
    history_window: int = 50             # D14 — nuntii visible per dispatch
    session_idle_timeout_minutes: int = 30  # D29: tear down idle SDK sessions
    telegram_bot_token: str = ""         # From .env — TELEGRAM_BOT_TOKEN
    anthropic_api_key: str = ""          # From .env — ANTHROPIC_API_KEY

Secrets (telegram_bot_token, anthropic_api_key) loaded from .env via os.environ, never from legio.toml. Config loading merges both sources.

Loaded from legio.toml + .env. The legio.toml gains:

toml
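# Top-level keys must come before the [caesar] table. Keys written after a
# table header belong to that table (caesar.model, caesar.workspace_dir, ...).
model = "sonnet"
workspace_dir = "workspace"
max_centuriones = 10
history_window = 50
session_idle_timeout_minutes = 30

[caesar]
telegram_id = 0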

1.4 legio/errors.py — Exception hierarchy

python
class LegioError(Exception): ...
class ImperiumDenied(LegioError): ...       # Auth / permission failures
class CenturioNotFound(LegioError): ...     # Unknown centurio name
class CenturioLimitReached(LegioError): ... # D11 — max centuriones exceeded
class MemoriaError(LegioError): ...         # Filesystem I/O failures
class PraetoriumError(LegioError): ...      # SQLite / message bus failures

Phase 2 — Memoria (filesystem + MCP tools)

Per D4: Memoria is exposed as MCP tools so both Legatus and Centuriones access knowledge through prompting, not Python imports.

2.1 legio/memoria/store.py — MemoriaStore (internal)

The Python implementation. Reads/writes the three layers:

workspace/
  edicta/           # Legio-wide standing orders (XML files)
  acta/             # Shared knowledge (XML files)
  centuriones/
    vorenus/
      prompt.md
      tools.json
      commentarii/  # Private notes (XML files)

Operations:

  • list_edicta() -> list[str] — returns names (filename stems)
  • read_edictum(name: str) -> str — returns raw XML content
  • publish_edictum(name: str, content: str, author: str) -> None — overwrites (D12)
  • revoke_edictum(name: str) -> None — deletes the file
  • list_acta() -> list[str]
  • read_actum(name: str) -> str
  • publish_actum(name: str, content: str, author: str) -> None — overwrites (D12)
  • list_commentarii(centurio_name: str) -> list[str]
  • read_commentarium(centurio_name: str, name: str) -> str
  • write_commentarium(centurio_name: str, name: str, content: str) -> None — append-only (D12)

Mutability rules (D12):

  • Edicta — mutable. Caesar/Legatus can update standing orders as policy evolves. Overwrite semantics.
  • Acta — mutable. Shared knowledge gets corrected and refined. Overwrite semantics.
  • Commentarii — append-only. A Centurio's private journal is an immutable log. New entries create new files; existing files are never modified. This preserves audit trail and prevents accidental data loss.
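
One way the two write semantics could be implemented on the filesystem, as a sketch only. The function signatures are simplified (a `root` path instead of a `MemoriaStore` instance, no author metadata), and a real store would presumably wrap `FileExistsError` in `MemoriaError`.

```python
import tempfile
from pathlib import Path


def publish_edictum(root: Path, name: str, content: str) -> None:
    """Mutable layer: writing an existing name replaces the file."""
    path = root / "edicta" / f"{name}.xml"
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(content, encoding="utf-8")


def write_commentarium(root: Path, centurio: str, name: str, content: str) -> None:
    """Append-only layer: mode "x" fails if the file already exists."""
    path = root / "centuriones" / centurio / "commentarii" / f"{name}.xml"
    path.parent.mkdir(parents=True, exist_ok=True)
    with open(path, "x", encoding="utf-8") as handle:
        handle.write(content)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    publish_edictum(root, "security-policy", "v1")
    publish_edictum(root, "security-policy", "v2")  # overwrite is allowed
    edictum_text = (root / "edicta" / "security-policy.xml").read_text(encoding="utf-8")

    write_commentarium(root, "vorenus", "task-notes-001", "first entry")
    try:
        write_commentarium(root, "vorenus", "task-notes-001", "second entry")
        rejected = False
    except FileExistsError:
        rejected = True  # the existing entry is left untouched
```

Opening with mode `"x"` makes the "never modify an existing file" rule atomic at the OS level, so no separate existence check is needed.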

Soft size warnings (D13): No hard limits enforced in code. The Legatus prompt instructs agents to keep files under 50KB each, and total commentarii under 1MB per Centurio. These are prompt-level guidelines, not code-level enforcement — consistent with "prompt over code" philosophy.

2.2 Memoria file format (D5)

Per D5: Follow Claude's own conventions. Claude works best with XML tags for structured content.

Edicta and Acta — XML-tagged plain text files:

xml
<edictum name="security-policy" author="caesar" timestamp="2026-02-14T12:00:00Z">
All Centuriones must validate input before processing.
Never expose API keys in responses.
Rate-limit external API calls to 10/minute.
</edictum>
xml
<actum name="api-research" author="vorenus" timestamp="2026-02-14T15:30:00Z">
Researched the payments API. Key findings:
- Authentication uses OAuth 2.0 with PKCE
- Rate limit: 100 req/min per client
- Sandbox endpoint: https://sandbox.api.example.com
</actum>

Commentarii — same XML pattern, per-Centurio, append-only (D12):

xml
<commentarium name="task-notes-001" timestamp="2026-02-14T16:00:00Z">
Caesar asked me to investigate the payments API.
I found the docs at example.com/docs.
Next step: test the sandbox endpoint.
</commentarium>

Why XML, not YAML frontmatter:

  • Claude's own documentation recommends XML tags for structured content
  • XML tags are Claude-native — the model parses them more accurately
  • No parsing library needed — just string formatting on write, Claude reads natively
  • Metadata (author, timestamp, name) lives in tag attributes, body is free-form text
  • Files on disk are human-readable and grep-able

File naming: <name>.xml in the respective directory. The name attribute in the tag matches the filename stem.
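
A sketch of the "string formatting on write" approach for an edictum. `render_edictum` is a hypothetical helper. Escaping the body is an assumption added here, not something the plan specifies: it keeps a stray `</edictum>` in the text from closing the tag early, at the cost of `<` and `&` appearing escaped on disk.

```python
from datetime import datetime, timezone
from xml.sax.saxutils import escape, quoteattr


def render_edictum(name: str, author: str, body: str, when: datetime) -> str:
    """Format one edictum file: metadata in attributes, free text in the body."""
    stamp = when.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    return (
        f"<edictum name={quoteattr(name)} author={quoteattr(author)} "
        f"timestamp={quoteattr(stamp)}>\n"
        f"{escape(body.strip())}\n"
        f"</edictum>\n"
    )


sample = render_edictum(
    "security-policy",
    "caesar",
    "Never expose API keys in responses.",
    datetime(2026, 2, 14, 12, 0, tzinfo=timezone.utc),
)
escaped = render_edictum(
    "limits", "caesar", "Keep calls < 10/minute", datetime.now(timezone.utc)
)
```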

2.3 legio/memoria/tools.py — MCP tool wrappers

Wraps MemoriaStore operations as MCP tools using @tool decorator:

python
# `store` (and `caller_name`, where used) must be bound by the enclosing
# server-builder function, as in build_memoria_centurio_server (D32).
# Neither is a module-level global.
@tool("read_edictum", "Read a standing order by name", {"name": str})
async def read_edictum_tool(args):
    content = store.read_edictum(args["name"])
    # MCP tool results are returned as a list of content blocks.
    return {"content": [{"type": "text", "text": content}]}

@tool("publish_actum", "Publish shared knowledge", {"name": str, "content": str})
async def publish_actum_tool(args):
    store.publish_actum(args["name"], args["content"], author=caller_name)
    return {"content": [{"type": "text", "text": f"Published: {args['name']}"}]}

# ... etc for all operations

Two MCP server patterns built from these tools:

  • memoria_full — all operations (for Legatus: can write edicta, revoke edicta). Single shared instance.
  • memoria_centurio — restricted (no edictum writes/revokes, commentarii scoped to caller). Built per-Centurio via build_memoria_centurio_server(store, centurio_name) — the centurio_name is captured in a closure so every write_commentarium / list_commentarii / read_commentarium call is automatically scoped to the owning Centurio. No magic caller_name variable — the identity is baked in at server creation time.
python
def build_memoria_centurio_server(store: MemoriaStore, centurio_name: str):
    """Build a per-Centurio MCP server with commentarii scoped to owner."""

    @tool("write_commentarium", "Write to your private journal", {"name": str, "content": str})
    async def write_commentarium_tool(args):
        store.write_commentarium(centurio_name, args["name"], args["content"])
        return {"content": [{"type": "text", "text": f"Written: {args['name']}"}]}

    # ... other scoped tools ...

    return create_sdk_mcp_server(
        name=f"memoria_{centurio_name}",
        tools=[write_commentarium_tool, ...],
    )

Phase 3 — Praetorium (message bus, SQLite-backed)

3.1 legio/praetorium.py — Session & message routing

Per D6: The Praetorium persists to SQLite for crash recovery. Conversations survive process restarts.

Database: workspace/praetorium.db — single SQLite file.

sql
CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,          -- UUID (D8)
    sender TEXT NOT NULL,         -- "caesar", "legatus", "vorenus"
    text TEXT NOT NULL,
    audience TEXT NOT NULL,       -- JSON array: '["all"]', '["legatus"]', '["vorenus","pullo"]'
    timestamp TEXT NOT NULL,      -- ISO 8601 UTC
    reply_to TEXT,                -- FK to nuntii.id, nullable (D8)
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);

CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX idx_nuntii_sender ON nuntii(sender);
python
class Praetorium:
    def __init__(self, db_path: Path): ...

    async def post(self, nuntius: Nuntius) -> None:
        """Insert a nuntius into the database."""
        ...

    async def get_visible_nuntii(self, viewer: str, limit: int = 50) -> list[Nuntius]:
        """Return nuntii visible to a given viewer, most recent first.

        Visibility rules:
        - "caesar" and "legatus" see everything.
        - A Centurio sees: (a) nuntii where its name appears in audience JSON,
          (b) nuntii with audience '["all"]'.
        """
        ...

    async def get_history(self, limit: int = 50) -> list[Nuntius]:
        """Return the last N nuntii, regardless of audience. Admin view."""
        ...

History management (D14):

  • SQLite stores all nuntii unbounded. Disk is cheap; history is valuable.
  • When providing context to agents, a sliding window of the last N nuntii is used (default N=50, configurable via legio.toml).
  • The limit parameter on get_visible_nuntii controls this window.
  • No automatic summarization or compaction in v3. If history grows very large, Caesar can issue an edictum summarizing past context and reset (manual, prompt-level).

Audience serialization: Nuntius stores audience as tuple[str, ...] (frozen dataclass). SQLite stores it as a JSON array string ('["all"]'). Praetorium handles the conversion:

  • Write: json.dumps(list(nuntius.audience))
  • Read: tuple(json.loads(row["audience"]))
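
The round trip and the visibility rules can be sketched with the standard-library `sqlite3` module. This is a synchronous stand-in for illustration only: the plan calls for `aiosqlite`, and the helper names `post` and `visible_to` are assumptions. Filtering happens in Python after decoding the JSON audience, which avoids relying on SQLite's JSON functions.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nuntii (id TEXT PRIMARY KEY, sender TEXT NOT NULL, "
    "text TEXT NOT NULL, audience TEXT NOT NULL, timestamp TEXT NOT NULL, "
    "reply_to TEXT REFERENCES nuntii(id))"
)


def post(id_, sender, text, audience, timestamp, reply_to=None):
    """Write path: the audience tuple is stored as a JSON array string."""
    conn.execute(
        "INSERT INTO nuntii VALUES (?, ?, ?, ?, ?, ?)",
        (id_, sender, text, json.dumps(list(audience)), timestamp, reply_to),
    )


def visible_to(viewer: str, limit: int = 50) -> list:
    """Read path: decode the audience and apply the visibility rules."""
    rows = conn.execute(
        "SELECT id, audience FROM nuntii ORDER BY timestamp DESC"
    ).fetchall()
    seen = []
    for id_, audience_json in rows:
        audience = tuple(json.loads(audience_json))
        sees_all = viewer in ("caesar", "legatus")
        if sees_all or viewer in audience or audience == ("all",):
            seen.append(id_)
        if len(seen) == limit:
            break
    return seen


post("n1", "caesar", "Hello legion", ("all",), "2026-02-14T10:00:00Z")
post("n2", "caesar", "Private note", ("legatus",), "2026-02-14T10:01:00Z")
post("n3", "caesar", "Check rate limits", ("vorenus",), "2026-02-14T10:02:00Z")
```

Sorting on the ISO 8601 text column works because UTC timestamps in that format order lexicographically.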

Dependency: aiosqlite for async SQLite access (add to pyproject.toml).

3.2 Response flow (D15, D17)

All Centurio responses are Legatus-mediated (D15):

Caesar → Telegram → Legatus.handle_message()
                         ↓
                    dispatch_to_centurio(name, nuntius)
                         ↓
                    Centurio processes, returns str (D17)
                         ↓
                    Legatus posts response to Praetorium
                         ↓
                    Legatus returns response to Telegram handler
                         ↓
                    Telegram handler sends to Caesar

Why Legatus-mediated, not direct:

  • The Legatus controls what gets posted — it can filter, annotate, or suppress.
  • Consistent flow: all messages pass through one code path.
  • The Legatus can add metadata (e.g., "[Vorenus responds]:" header) before posting.
  • Future: the Legatus could route a Centurio's response to another Centurio for review.

Phase 4 — Legatus (SDK agent + orchestrator)

4.1 legio/legatus.py

Per D1: The Legatus is itself a Claude SDK agent with a persistent session (D27, mirroring the Centuriones in D2). It is not a Centurio; it is the built-in orchestrator, but it thinks via LLM, not if/else chains.

Persistent client with roster-aware rebuild (D27 — SDK constraint resolved): The SDK locks system_prompt at ClaudeSDKClient creation. The Legatus needs the Centurio roster in its system prompt. Solution: the full roster (names + specializations) is baked into the system prompt at client creation. The client is rebuilt only when the roster changes (centurio created or removed) — not on every call. Real-time status (idle/working/error) is injected as an XML block in the user message, since status flickers frequently but doesn't warrant a client rebuild. This preserves SDK conversation history for continuity — the Legatus remembers recent exchanges with Caesar.

Two hats:

  1. Orchestrator (code) — message parsing, @mention extraction, session management, Centurio dispatch, Telegram I/O. This is plumbing that prompts can't do.
  2. Default responder (SDK agent) — when no Centurio is mentioned (and auto-routing doesn't apply), the Legatus's own ClaudeSDKClient handles the conversation. Its prompt defines personality, routing heuristics, and Centurio management commands.
python
class Legatus:
    def __init__(self, config, praetorium, memoria):
        self._config = config
        self._praetorium = praetorium
        self._memoria = memoria
        self._client: ClaudeSDKClient | None = None  # lazy init, needs async
        self._roster_hash: str = ""    # track roster changes for client rebuild
        self._centurio_sessions: dict[str, ClaudeSDKClient] = {}
        self._centuriones: dict[str, Centurio] = {}  # registry

    async def handle_message(self, text: str, telegram_user_id: int) -> list[str]:
        """Main entry point. Returns list of response strings (one per responder).

        Flow:
        1. Parse @mentions from text (see @mention parsing rules below)
        2. Post Caesar's nuntius to Praetorium
        3. If roster or legatus prompt.md changed → rebuild Legatus client (new system prompt)
        4. If @mentions → dispatch to mentioned Centuriones (parallel, D16)
        5. If no @mentions → delegate to Legatus LLM (may auto-route per D25)
        6. Post all responses to Praetorium
        7. Return responses for Telegram delivery
        """
        ...

    async def dispatch_to_centurio(self, name: str, nuntius: Nuntius) -> str:
        """Send a nuntius to a specific Centurio. Returns response text."""
        ...

    def list_centuriones(self) -> list[Centurio]:
        """Scan workspace/centuriones/ directory. Returns registered Centuriones."""
        ...

    def _build_system_prompt(self) -> str:
        """Read workspace/legatus/prompt.md + inject Centurio roster (D26)."""
        ...

    def _should_rebuild_client(self) -> bool:
        """Rebuild Legatus client if roster changed OR legatus prompt.md edited.
        Checks: (a) hash of centurio names+specializations, (b) mtime of prompt.md."""
        ...

    async def shutdown(self) -> None:
        """Tear down all SDK client subprocesses (Legatus + all Centuriones).
        Called by __main__.py on SIGTERM/SIGINT (D30)."""
        ...

Legatus prompt lives at workspace/legatus/prompt.md — Caesar can edit it too.

Dynamic Centurio roster injection (D26): The full roster (names + specializations) is embedded in the system prompt at client creation. When the roster changes (centurio created/removed) or workspace/legatus/prompt.md is edited on disk, _should_rebuild_client() detects it and the client is torn down and rebuilt. Real-time status is injected per-call as a lightweight block in the user message:

xml
<centurio_status>
  <centurio name="vorenus" status="idle"/>
  <centurio name="pullo" status="working"/>
</centurio_status>

The full roster in the system prompt provides specialization descriptions for routing decisions. It carries no status attribute: status changes often and lives in the per-call block above, so it never forces a client rebuild (D27).

xml
<centuriones>
  <centurio name="vorenus">Code specialist. Handles implementation tasks.</centurio>
  <centurio name="pullo">Research specialist. Handles API analysis.</centurio>
</centuriones>

The roster is populated by scanning workspace/centuriones/*/prompt.md (first line or summary); the status block is filled from the in-memory status field (D10). Together they give the Legatus LLM full awareness of available agents, their specializations, and current state, enabling smart auto-routing (D25).

Why inject, not MCP tool (D26): The Centurio list is needed for every routing decision. Making the LLM call a tool before it can even think about routing adds latency and a wasted API round-trip. Inject it into the system prompt so it's always available. The list_centuriones MCP tool still exists for detailed queries (reading full prompts, checking tools.json).
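
The rebuild check (roster hash plus prompt.md mtime, D33) might be sketched as follows. `RebuildDetector` and `roster_hash` are hypothetical names; in the plan this logic lives in `_should_rebuild_client()`, and the roster is modelled here as a plain name-to-specialization dict.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def roster_hash(roster: dict) -> str:
    """Stable digest over sorted (name, specialization) pairs."""
    digest = hashlib.sha256()
    for name in sorted(roster):
        digest.update(f"{name}\x00{roster[name]}\x00".encode("utf-8"))
    return digest.hexdigest()


class RebuildDetector:
    def __init__(self, prompt_path: Path):
        self._prompt_path = prompt_path
        self._last_hash = ""
        self._last_mtime_ns = -1

    def should_rebuild(self, roster: dict) -> bool:
        """True if the roster or the prompt file changed since the last call."""
        current_hash = roster_hash(roster)
        current_mtime = self._prompt_path.stat().st_mtime_ns
        changed = (
            current_hash != self._last_hash
            or current_mtime != self._last_mtime_ns
        )
        self._last_hash = current_hash
        self._last_mtime_ns = current_mtime
        return changed


with tempfile.TemporaryDirectory() as tmp:
    prompt = Path(tmp) / "prompt.md"
    prompt.write_text("You are the Legatus.", encoding="utf-8")
    detector = RebuildDetector(prompt)
    roster = {"vorenus": "Code specialist."}

    first = detector.should_rebuild(roster)         # nothing seen yet
    unchanged = detector.should_rebuild(roster)     # same roster, same file
    roster["pullo"] = "Research specialist."
    after_create = detector.should_rebuild(roster)  # roster changed
    stat = prompt.stat()
    # Simulate Caesar editing prompt.md by moving its mtime forward one second.
    os.utime(prompt, ns=(stat.st_atime_ns, stat.st_mtime_ns + 1_000_000_000))
    after_edit = detector.should_rebuild(roster)
```

Status is deliberately left out of the hash, so a Centurio flipping between idle and working never triggers a rebuild.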

LLM-driven routing (D25): When Caesar sends a message without @mentions, the Legatus LLM decides whether to:

  • Handle it itself (general conversation, meta-questions)
  • Route to a specific Centurio based on message content and Centurio specializations
  • Route to multiple Centuriones if the task spans specializations

This is configured in the Legatus prompt, not hard-coded. Caesar can adjust routing behavior by editing workspace/legatus/prompt.md — "prompt over code."

@mention parsing rules:

  • Format: @name — case-insensitive, matched against registered Centurio names.
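  • Word-boundary match: @vorenus matches, but @vorenus.com does not. A bare @(\w+) is not enough for this, because \w+ stops at the "." and still captures vorenus from @vorenus.com or caesar@vorenus.com. Implementation: extract tokens with a guarded pattern such as (?<!\S)@(\w+)\b(?!\.\w), which requires start-of-text or whitespace before the @ and rejects a trailing .word, then filter against the self._centuriones registry. Unrecognized @mentions are ignored (passed through as text to the Legatus LLM).
  • Multiple mentions in one message: @Vorenus @Pullo do X → dispatches to both (D16).
  • @ in email addresses or URLs: not matched, because the lookbehind rejects an @ that directly follows non-whitespace and the lookahead rejects a domain-style suffix.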
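
A sketch of mention extraction under these rules. The exact pattern is an assumption: it adds a lookbehind and a lookahead around `@(\w+)` so that email addresses and domain-like tokens are skipped, since `@(\w+)` on its own would capture `vorenus` from `caesar@vorenus.com`. `parse_mentions` is a hypothetical helper name.

```python
import re

# (?<!\S)   the @ must start the text or follow whitespace (skips emails, URLs)
# \b        the name must be a complete word
# (?!\.\w)  reject a domain-style suffix such as ".com"
MENTION_RE = re.compile(r"(?<!\S)@(\w+)\b(?!\.\w)")


def parse_mentions(text: str, registered: set) -> list:
    """Return registered Centurio names mentioned in text, in order, deduplicated."""
    found = []
    for token in MENTION_RE.findall(text):
        name = token.lower()  # case-insensitive match against the registry
        if name in registered and name not in found:
            found.append(name)
    return found


roster = {"vorenus", "pullo"}

both = parse_mentions("@Vorenus @Pullo do X", roster)
email = parse_mentions("write to caesar@vorenus.com", roster)
domain = parse_mentions("see @vorenus.com for details", roster)
unknown = parse_mentions("ask @cicero and @PULLO.", roster)
```

A sentence-ending period after a mention is still accepted, because the lookahead only rejects a period that is followed by another word character.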

Legatus tools (via MCP):

  • Full Memoria access (read/write/revoke edicta, acta)
  • Centurio management: create_centurio, remove_centurio, list_centuriones
  • Praetorium: post_nuntius, get_history
  • dispatch_to_centurio — so the Legatus LLM can route messages to Centuriones (D25)

4.2 Centurio session management

Per D2: Each Centurio gets a persistent ClaudeSDKClient stored in self._centurio_sessions.

Lifecycle:

  1. First dispatch → create ClaudeSDKClient with prompt from prompt.md, tools from tools.json, Memoria MCP (restricted). Store in _centurio_sessions[name]. Set status = "working" (D10).
  2. Subsequent dispatches → reuse existing client (SDK maintains internal conversation history). Set status = "working". Update last_active timestamp.
  3. Prompt changed on disk → detect on next dispatch (compare file mtime or content hash against last-seen). If changed, tear down old client, create new one. Old conversation context is lost — this is intentional (new prompt = new agent).
  4. Remove Centurio → disconnect client, delete from _centurio_sessions, remove from _centuriones registry, optionally delete folder.
  5. Error → set status = "error", log the error, return error message to Legatus for relay.
  6. Completion → set status = "idle".
  7. Idle timeout → each ClaudeSDKClient is a subprocess. To avoid holding idle processes indefinitely, sessions inactive for 30 minutes are torn down. A periodic asyncio.Task checks last_active timestamps and disconnects stale clients. Next dispatch recreates the session (step 1). Timeout configurable in legio.toml as session_idle_timeout_minutes = 30.
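
The idle-timeout step could be sketched like this. `SessionReaper` is a hypothetical helper, and `FakeClient` stands in for a ClaudeSDKClient; the sketch assumes only that the client exposes an awaitable `disconnect()`. Timeouts are shortened to fractions of a second for the demonstration.

```python
import asyncio
import time


class SessionReaper:
    """Tracks last activity per session and disconnects idle ones."""

    def __init__(self, idle_timeout_s: float):
        self.idle_timeout_s = idle_timeout_s
        self.sessions = {}     # name -> client-like object
        self.last_active = {}  # name -> time.monotonic() of last dispatch

    def touch(self, name, client) -> None:
        """Record a dispatch: register the client and refresh its timestamp."""
        self.sessions[name] = client
        self.last_active[name] = time.monotonic()

    async def reap_once(self) -> list:
        """Disconnect every session idle for at least the timeout."""
        now = time.monotonic()
        stale = [
            name for name, seen in self.last_active.items()
            if now - seen >= self.idle_timeout_s
        ]
        for name in stale:
            client = self.sessions.pop(name)
            del self.last_active[name]
            await client.disconnect()
        return stale

    async def run(self, interval_s: float) -> None:
        """Periodic loop, meant to run as an asyncio.Task until cancelled."""
        while True:
            await self.reap_once()
            await asyncio.sleep(interval_s)


class FakeClient:
    def __init__(self):
        self.disconnected = False

    async def disconnect(self):
        self.disconnected = True


async def demo():
    reaper = SessionReaper(idle_timeout_s=0.05)
    old, fresh = FakeClient(), FakeClient()
    reaper.touch("vorenus", old)
    await asyncio.sleep(0.1)       # vorenus goes idle past the timeout
    reaper.touch("pullo", fresh)   # pullo was just active
    reaped = await reaper.reap_once()
    return reaped, old.disconnected, fresh.disconnected


result = asyncio.run(demo())
```

Using `time.monotonic()` rather than wall-clock time keeps the check immune to system clock adjustments.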

Multi-mention parallelism (D16): When Caesar @mentions multiple Centuriones (e.g., @Vorenus @Pullo investigate this), all are dispatched in parallel via asyncio.gather. Responses are collected, then posted to Praetorium and relayed to Telegram sequentially (one message per responder, D22).

python
async def _dispatch_parallel(self, names: list[str], nuntius: Nuntius) -> list[str]:
    """Dispatch to multiple Centuriones in parallel. Returns list of responses."""
    tasks = [self.dispatch_to_centurio(name, nuntius) for name in names]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    responses = []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            responses.append(f"[{name}] Error: {result}")
        else:
            responses.append(result)
    return responses

Phase 5 — Centurio Runtime (SDK integration)

5.1 Running a Centurio

Each Centurio has a persistent ClaudeSDKClient (D2). On dispatch:

  1. Legatus checks if session exists and prompt is unchanged. Creates/recreates if needed.
  2. Legatus builds visible history via praetorium.get_visible_nuntii(name, limit=config.history_window).
  3. Legatus formats history as XML context block (D18) and prepends to the new message.
  4. Legatus sends to the Centurio's persistent client:
python
client = self._centurio_sessions[name]
context = self._format_history(visible_nuntii)
full_message = f"{context}\n\n{nuntius.text}"

# SDK requires async generator input when MCP servers are attached.
# Plain string query() won't trigger MCP tool handling.
async def _prompt():
    yield full_message

await client.query(_prompt())
# collect response text via client.receive_response()

5.2 History format (D18)

Visible history is formatted as an XML context block — consistent with Memoria format (D5):

xml
<praetorium recent="true" viewer="vorenus">
  <nuntius id="abc-123" sender="caesar" timestamp="2026-02-14T10:00:00Z">
    @Vorenus investigate the payments API
  </nuntius>
  <nuntius id="def-456" sender="vorenus" timestamp="2026-02-14T10:05:00Z">
    I'll look into it. Starting with the OAuth docs.
  </nuntius>
  <nuntius id="ghi-789" sender="caesar" timestamp="2026-02-14T10:30:00Z">
    @Vorenus also check rate limits
  </nuntius>
</praetorium>

Why XML, not raw messages (D18):

  • Consistent with Memoria XML format (D5) — one convention everywhere
  • Claude parses XML tags natively and accurately
  • Metadata (sender, timestamp, id) in attributes keeps body clean
  • Structured format lets Claude distinguish messages from instructions
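
A sketch of what `_format_history` might produce. Nuntii are modelled as plain dicts here for brevity, and the function assumes the caller passes them oldest first (the plan's `get_visible_nuntii` returns most recent first, so a reversal would be needed). Escaping the message text is an assumption of this sketch; it stops user text that contains tags from being confused with the wrapper structure.

```python
from xml.sax.saxutils import escape, quoteattr


def format_history(nuntii: list, viewer: str) -> str:
    """Render nuntii (oldest first) as a <praetorium> context block."""
    lines = [f'<praetorium recent="true" viewer={quoteattr(viewer)}>']
    for n in nuntii:
        lines.append(
            f"  <nuntius id={quoteattr(n['id'])} sender={quoteattr(n['sender'])} "
            f"timestamp={quoteattr(n['timestamp'])}>"
        )
        lines.append(f"    {escape(n['text'])}")
        lines.append("  </nuntius>")
    lines.append("</praetorium>")
    return "\n".join(lines)


block = format_history(
    [
        {
            "id": "abc-123",
            "sender": "caesar",
            "timestamp": "2026-02-14T10:00:00Z",
            "text": "@Vorenus check <pre> tags & rate limits",
        }
    ],
    viewer="vorenus",
)
```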

5.3 Token budget management (D19)

The Claude Agent SDK does not provide automatic context truncation. We manage it ourselves:

Strategy: sliding window with monitoring.

  1. Default window is last 50 nuntii (D14, configurable). This is the context provided per dispatch.
  2. Token tracking: after each client.query(), the SDK returns usage data including input_tokens and output_tokens. Track cumulative tokens per session.
  3. Overflow threshold: when cumulative input tokens exceed 75% of the model's context window (about 150K tokens, assuming a 200K-token window for Sonnet), trigger a session reset: tear down the ClaudeSDKClient, create a new one. The new session starts fresh with just the sliding window of recent nuntii and no old conversation context.
  4. No summarization in v3: automatic summarization adds complexity and latency. The sliding window + session reset is sufficient for a single-user system. Caesar can always issue an edictum summarizing important context if needed.
python
class SessionTokenTracker:
    """Track cumulative token usage per Centurio session."""

    def __init__(self, max_input_tokens: int = 150_000):
        self.max_input_tokens = max_input_tokens
        self.cumulative_input: int = 0

    def update(self, result: ResultMessage) -> None:
        """Extract token count from SDK ResultMessage. Usage may be None."""
        if result.usage:
            self.cumulative_input += result.usage.get("input_tokens", 0)

    def should_reset(self) -> bool:
        return self.cumulative_input > self.max_input_tokens

What the Centurio sees as MCP tools:

  • mcp__memoria__list_acta / read_actum / publish_actum
  • mcp__memoria__list_commentarii / read_commentarium / write_commentarium (scoped to self)
  • mcp__memoria__list_edicta / read_edictum (read-only)

What the Centurio does NOT see:

  • Other Centuriones' commentarii
  • Edictum write access
  • Centurio management tools
  • dispatch_to_centurio (only Legatus can route)

5.4 Centurio creation (D20 — template + LLM)

Per D20: Both template and LLM.

Template at templates/centurio/:

templates/centurio/
  prompt.md.template     # Scaffold with placeholders
  tools.json.template    # Default tool permissions

prompt.md.template:

markdown
# {{name}}

You are {{name}}, a Centurio in the Legio system.

## Specialization

{{specialization}}

## Guidelines

- Address the human as Caesar.
- Write your findings to commentarii for persistence.
- Read edicta before starting any task — they contain standing orders.
- Publish important discoveries to acta for other Centuriones.

## Tools

You have access to the Memoria system via MCP tools:
- `list_edicta` / `read_edictum` — standing orders (read-only)
- `list_acta` / `read_actum` / `publish_actum` — shared knowledge
- `list_commentarii` / `read_commentarium` / `write_commentarium` — your private notes

tools.json.template:

json
{
  "allowed_tools": ["memoria_centurio"]
}

Creation flow:

  1. Caesar says: "Create a Centurio named Cicero who specializes in writing."
  2. Legatus LLM receives the request, calls create_centurio tool with name="cicero", specialization="writing".
  3. Tool implementation copies template, substitutes {{name}} and {{specialization}}, writes to workspace/centuriones/cicero/.
  4. Legatus LLM may further customize the prompt by calling a follow-up tool or editing the generated file.
  5. Legatus LLM responds to Caesar confirming creation.

Caesar can also just mkdir workspace/centuriones/cicero/ and write the files directly (D3). The Legatus discovers new Centuriones by scanning the centuriones/ directory.
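
Step 3 of the creation flow could be sketched as below. The template is inlined and shortened so the example is self-contained, the name validation rule is an assumption, and the real tool would read from templates/centurio/ and also enforce the D11 limit.

```python
import json
import re
import tempfile
from pathlib import Path

PROMPT_TEMPLATE = """# {{name}}

You are {{name}}, a Centurio in the Legio system.

## Specialization

{{specialization}}
"""


def render(template: str, values: dict) -> str:
    """Replace each {{key}} placeholder with its value."""
    for key, value in values.items():
        template = template.replace("{{" + key + "}}", value)
    return template


def create_centurio(workspace: Path, name: str, specialization: str) -> Path:
    """Create the Centuria folder with prompt.md, tools.json and commentarii/."""
    name = name.lower()
    if not re.fullmatch(r"[a-z0-9_]+", name):  # also blocks path separators
        raise ValueError(f"invalid centurio name: {name!r}")
    centuria = workspace / "centuriones" / name
    centuria.mkdir(parents=True)  # FileExistsError if the Centurio already exists
    (centuria / "commentarii").mkdir()
    prompt = render(PROMPT_TEMPLATE, {"name": name, "specialization": specialization})
    (centuria / "prompt.md").write_text(prompt, encoding="utf-8")
    tools = {"allowed_tools": ["memoria_centurio"]}
    (centuria / "tools.json").write_text(json.dumps(tools, indent=2), encoding="utf-8")
    return centuria


with tempfile.TemporaryDirectory() as tmp:
    folder = create_centurio(Path(tmp), "Cicero", "Writing and editing.")
    folder_name = folder.name
    prompt_text = (folder / "prompt.md").read_text(encoding="utf-8")
    created = sorted(p.name for p in folder.iterdir())
```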


Phase 6 — Telegram Integration

6.1 legio/telegram/bot.py

Single Telegram bot. One chat with Caesar.

Message formatting (D23): HTML parse mode.

python
import html

PARSE_MODE = "HTML"  # D23 — simpler escaping than MarkdownV2

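async def start(config: LegioConfig, legatus: Legatus) -> None:
    app = ApplicationBuilder().token(config.telegram_bot_token).build()
    app.add_handler(CommandHandler("status", handle_status))    # D24
    app.add_handler(CommandHandler("list", handle_list))        # D24
    app.add_handler(CommandHandler("create", handle_create))    # D24
    app.add_handler(CommandHandler("help", handle_help))        # D24
    # Plain text only, so unrecognized /commands are not treated as chat messages.
    app.add_handler(MessageHandler(filters.TEXT & ~filters.COMMAND, handle_message))
    # run_polling() is a blocking call that manages its own event loop and
    # cannot be awaited. Inside a coroutine, start the application step by step;
    # the caller keeps the loop alive and stops polling on shutdown (D30).
    await app.initialize()
    await app.start()
    await app.updater.start_polling()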

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    # SECURITY: verify Caesar identity
    if update.effective_user.id != config.caesar.telegram_id:
        return  # silent ignore

    text = update.message.text

    # D21: Send "working..." immediately for all messages
    status_msg = await update.message.reply_text("⏳")

    # Start typing indicator BEFORE calling Legatus — it runs concurrently.
    # This matters because dispatch_to_centurio (called as MCP tool by Legatus LLM)
    # can block for 30+ seconds while a Centurio works. The typing indicator
    # keeps Caesar informed during the entire chain: Legatus → MCP tool → Centurio.
    stop_typing = asyncio.Event()
    typing_task = asyncio.create_task(
        _keep_typing(update.effective_chat.id, stop_typing)
    )

    try:
        responses = await legatus.handle_message(text, update.effective_user.id)

        # D22: One message per responder
        # Edit the status message with the first response
        first = responses[0] if responses else "No response."
        await status_msg.edit_text(first, parse_mode=PARSE_MODE)

        # Send remaining responses as new messages
        for response in responses[1:]:
            await update.message.reply_text(response, parse_mode=PARSE_MODE)

    except Exception:
        await status_msg.edit_text("❌ An error occurred.")
        # Log the exception (never swallow, never leak secrets)
    finally:
        stop_typing.set()
        await typing_task

6.2 Long-running task UX (D21)

Per D21: Edit-in-place pattern.

  1. On receiving any message, immediately send a "⏳" status message.
  2. While Centuriones work, call sendChatAction("typing") every 5 seconds to keep the typing indicator alive (Telegram clears the status after about 5 seconds).
  3. When the first response arrives, edit the status message with the result.
  4. If multiple Centuriones respond (D22), subsequent responses are sent as new reply messages.

Why edit-in-place:

  • Prevents chat clutter — one evolving message instead of status + result.
  • Common pattern for Telegram bots (per research).
  • Clear visual feedback — Caesar sees the ⏳ immediately, then it transforms into the answer.

Typing indicator:

python
async def _keep_typing(self, chat_id: int, stop_event: asyncio.Event) -> None:
    """Send typing action every 5 seconds until stopped."""
    while not stop_event.is_set():
        await self._bot.send_chat_action(chat_id, "typing")
        try:
            await asyncio.wait_for(stop_event.wait(), timeout=5.0)
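        except asyncio.TimeoutError:
            # No stop signal yet: loop and refresh the indicator.
            # (asyncio.TimeoutError is the builtin TimeoutError only on Python 3.11+.)
            pass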

6.3 Multi-Centurio response format (D22)

Per D22: One message per Centurio, with an HTML header line identifying the responder.

html
<b>⚔️ Vorenus</b>

Here are the API rate limits I found:
- 100 requests per minute per client
- OAuth 2.0 with PKCE for authentication

Responses are delivered to Caesar sequentially (even though Centuriones work in parallel per D16). The first response edits the status message; subsequent responses are new messages.
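
The parallel-work, sequential-delivery flow can be sketched as follows. `fake_centurio`, `format_response`, and `dispatch_all` are hypothetical names, the sleep stands in for a real SDK call, and escaping the whole body assumes Centurio output is treated as plain text (if agents are allowed to emit their own HTML per D23, the body would need different handling).

```python
import asyncio
import html


async def fake_centurio(name: str, delay: float) -> str:
    """Stand-in for a real Centurio dispatch; sleeps instead of calling the SDK."""
    await asyncio.sleep(delay)
    return f"{name} reporting <done> & ready"


def format_response(name: str, body: str) -> str:
    """Bold header line (D22) plus the body, escaped for HTML parse mode (D23)."""
    return f"<b>⚔️ {html.escape(name)}</b>\n\n{html.escape(body)}"


async def dispatch_all(mentions: dict[str, float]) -> list[str]:
    # gather() runs the coroutines concurrently and returns results in
    # argument order, so the delivery order is deterministic (D16).
    bodies = await asyncio.gather(
        *(fake_centurio(name, delay) for name, delay in mentions.items())
    )
    return [format_response(name, body) for name, body in zip(mentions, bodies)]


messages = asyncio.run(dispatch_all({"Vorenus": 0.02, "Pullo": 0.01}))
for message in messages:
    print(message)
```

Pullo finishes first here, yet Vorenus is still listed first: the order follows the mention order rather than completion time.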

6.4 Telegram message limits

Telegram limits a text message to 4096 characters, counted after entity parsing. For longer responses:

python
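def split_message(text: str, max_len: int = 4000) -> list[str]:
    """Split at paragraph boundaries, keeping every chunk within max_len.

    The default leaves a 96-char buffer below Telegram's limit for HTML tags.
    """
    if len(text) <= max_len:
        return [text]
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        if len(para) > max_len:
            # A single oversized paragraph cannot stay whole: flush what we
            # have and hard-split it. NOTE: this may cut inside an HTML tag.
            if current:
                chunks.append(current)
            pieces = [para[i:i + max_len] for i in range(0, len(para), max_len)]
            chunks.extend(pieces[:-1])
            current = pieces[-1]
            continue
        if current and len(current) + len(para) + 2 > max_len:
            chunks.append(current)
            current = para
        else:
            current = f"{current}\n\n{para}" if current else para
    if current:
        chunks.append(current)
    return chunks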

6.5 Slash commands (D24)

Per D24: Hybrid — both /commands and @mentions.

| Command | Action | Implementation |
|---|---|---|
| /status | Show all Centuriones and their status (D10) | Direct (no LLM call) |
| /list | List available Centuriones with descriptions | Direct (scan filesystem) |
| /create <name> <description> | Create a new Centurio | Delegates to Legatus LLM |
| /help | Show available commands and usage | Direct (static text) |

Why both (D24):

  • Commands are instant, deterministic, no API cost. Perfect for frequent meta-operations.
  • @mentions engage the LLM for actual work. Natural language for complex requests.
  • Power users expect both. Commands are discoverable via Telegram's autocomplete.
  • "prompt over code" applies to agent behavior, not UX shortcuts. Slash commands are UX, not agent logic.
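
As an illustration of a "direct" command, the /status reply can be built from the in-memory status map (D10) without any LLM call. This is a sketch only: `format_status`, the icon choices, and the shape of the status map are assumptions, not part of the plan.

```python
import html

# Icons are illustrative; the statuses come from D10.
STATUS_ICONS = {"idle": "💤", "working": "⚙️", "error": "❌"}


def format_status(statuses: dict[str, str]) -> str:
    """Render a name -> status map as an HTML message (D23), no LLM involved."""
    if not statuses:
        return "No Centuriones defined."
    lines = ["<b>Centuriones</b>"]
    for name in sorted(statuses):
        status = statuses[name]
        icon = STATUS_ICONS.get(status, "?")
        lines.append(f"{icon} {html.escape(name)}: {html.escape(status)}")
    return "\n".join(lines)


status_text = format_status({"vorenus": "working", "pullo": "idle"})
print(status_text)
```

Because the function is pure and deterministic, the handler stays instant and free of API cost, which is the point of keeping meta-operations as commands.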

Phase 7 — Entry Point

7.1 legio/__main__.py

python
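import asyncio
import logging
import shutil
import signal
from pathlib import Path

# Project imports (load_config, MemoriaStore, Praetorium, Legatus,
# start_telegram_bot, MCP server builders) omitted for brevity.


async def main() -> None: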
    config = load_config()

    # Logging — structured, no print() (T20)
    logging.basicConfig(
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
        datefmt="%Y-%m-%dT%H:%M:%S",
    )

    # Ensure workspace directories exist (+ default Legatus prompt)
    ensure_workspace(config.workspace_dir)

    # Infrastructure
    memoria = MemoriaStore(config.workspace_dir)
    memoria_full = build_memoria_full_server(memoria)
    memoria_centurio = build_memoria_centurio_server(memoria)
    praetorium = Praetorium(config.workspace_dir / "praetorium.db")

    # Legatus (SDK agent + orchestrator)
    legatus = Legatus(config, praetorium, memoria, memoria_full, memoria_centurio)

    # Graceful shutdown on SIGTERM/SIGINT
    shutdown_event = asyncio.Event()

    def _signal_handler() -> None:
        logging.getLogger("legio").info("Shutdown signal received")
        shutdown_event.set()

    loop = asyncio.get_running_loop()
    for sig in (signal.SIGTERM, signal.SIGINT):
        loop.add_signal_handler(sig, _signal_handler)

    try:
        # Telegram bot (blocks until shutdown_event or external stop)
        await start_telegram_bot(config, legatus, shutdown_event)
    finally:
        # Tear down all SDK client subprocesses
        await legatus.shutdown()
        # Close SQLite connection
        await praetorium.close()
        logging.getLogger("legio").info("Legio shut down cleanly")


if __name__ == "__main__":
    asyncio.run(main())

python
def ensure_workspace(workspace_dir: Path) -> None:
    """Create workspace directory structure if it doesn't exist.

    Copies default Legatus prompt from templates/ on first run.
    """
    (workspace_dir / "edicta").mkdir(parents=True, exist_ok=True)
    (workspace_dir / "acta").mkdir(parents=True, exist_ok=True)
    (workspace_dir / "centuriones").mkdir(parents=True, exist_ok=True)
    (workspace_dir / "legatus").mkdir(parents=True, exist_ok=True)

    # First-run: copy default Legatus prompt if missing
    legatus_prompt = workspace_dir / "legatus" / "prompt.md"
    if not legatus_prompt.exists():
        template = Path("templates/legatus/prompt.md.template")
        if template.exists():
            shutil.copy(template, legatus_prompt)

Dependency Graph

legio/__main__.py
    └── telegram/bot.py
            └── legatus.py  (SDK agent + orchestrator)
                    ├── ClaudeSDKClient (Legatus own session)
                    ├── ClaudeSDKClient per Centurio (persistent sessions)
                    ├── praetorium.py → nuntius.py
                    ├── memoria/store.py
                    ├── memoria/tools.py → MCP servers
                    ├── centurio.py
                    └── config.py
    errors.py (used everywhere)

Filesystem Layout (runtime)

workspace/
  praetorium.db          # SQLite — conversation history (survives restarts)
  legatus/
    prompt.md            # Legatus system prompt (Caesar-editable)
  edicta/                # Standing orders (XML files)
    security-policy.xml
  acta/                  # Shared knowledge (XML files)
    api-research.xml
  centuriones/
    vorenus/
      prompt.md          # Centurio definition (Caesar-editable)
      tools.json         # Allowed tools (Caesar-editable)
      commentarii/       # Private notes (XML files, append-only)
        task-notes-001.xml
    pullo/
      prompt.md
      tools.json
      commentarii/

templates/               # Committed to repo (not in workspace/)
  legatus/
    prompt.md.template   # Default Legatus prompt (copied on first run)
  centurio/
    prompt.md.template   # Scaffold for new Centuriones
    tools.json.template  # Default tool permissions

Dependencies (additions to pyproject.toml)

toml
dependencies = [
    "claude-agent-sdk",
    "python-telegram-bot",
    "tomli",
    "aiosqlite",         # D6 — async SQLite for Praetorium
]

What Is NOT in Scope (v3)

  • Optio / Miles (future agent tiers)
  • Multi-Legio (multiple processes)
  • Web UI (Telegram only)
  • Persistent SDK sessions across process restarts (Praetorium history persists, but ClaudeSDKClient sessions are rebuilt from disk on restart)
  • Inter-Centurio direct messaging (must go through Praetorium)
  • Automatic context summarization (D19 — sliding window + session reset is sufficient)
  • Hard enforcement of Memoria size limits (D13 — soft, prompt-level guidelines)
  • File attachments / media in Telegram (text only in v3)

Open Questions Summary

All questions resolved.

| # | Question | Impact | Status |
|---|---|---|---|
| Q1 | Audience glob patterns or explicit names? | Nuntius model | D7: Explicit names only. ["all"] is the sole wildcard. |
| Q2 | Nuntius UUID for threading? | Nuntius model | D8: Yes, UUID4 id field on every Nuntius. |
| Q3 | Legatus-only messages from Caesar? | Visibility rules | D9: Yes. audience=["legatus"] excludes all Centuriones. |
| Q4 | Centurio carries AgentDefinition or builds at runtime? | Centurio model | D3: Filesystem is truth. Read from disk each dispatch. |
| Q5 | Centurio status field? | Centurio model | D10: Yes (idle/working/error). In-memory, not persisted. |
| Q6 | Max concurrent Centuriones? | Resource management | D11: Default 10, configurable in legio.toml. |
| Q7 | Memoria file format: plain md or YAML frontmatter? | MemoriaStore | D5: XML tags (Claude convention). Attributes for metadata, body for content. |
| Q8 | Memoria append-only or mutable? | MemoriaStore | D12: Edicta/Acta mutable (overwrite). Commentarii append-only. |
| Q9 | Size limits per file/Centurio? | MemoriaStore | D13: Soft limits only (50KB/file, 1MB/commentarii). Prompt-enforced. |
| Q10 | Praetorium in-memory only or persisted? | Crash recovery | D6: SQLite at workspace/praetorium.db. Survives restarts. |
| Q11 | History max length? | Token budget | D14: SQLite unbounded. Sliding window of last 50 nuntii for context. |
| Q12 | Centurio response flow: direct or Legatus-mediated? | Architecture | D15: Legatus-mediated. All responses flow through Legatus. |
| Q13 | Legatus as SDK agent or simple query()? | Legatus design | D1: SDK agent with persistent session. |
| Q14 | Multi-mention: parallel or sequential? | Concurrency | D16: Parallel via asyncio.gather. Responses sent sequentially. |
| Q15 | Legatus own tools? | Legatus capabilities | D4: Yes, via Memoria MCP + management tools. |
| Q16 | Response delivery mechanism? | Telegram integration | D17: Return str from dispatch_to_centurio(). Legatus posts + relays. |
| Q17 | Centurio persistent session or stateless? | SDK usage | D2: Persistent ClaudeSDKClient per Centurio. |
| Q18 | Memoria as MCP tools for Centuriones? | Tool design | D4: Yes, MCP tools for all agents. |
| Q19 | History format for Centurio context? | Prompt engineering | D18: XML <praetorium> block with <nuntius> tags. Consistent with D5. |
| Q20 | Token budget management? | Scalability | D19: Sliding window + session reset at 75% capacity. No summarization in v3. |
| Q21 | Template-based or LLM-generated Centurio creation? | Self-modification | D20: Both. Template scaffold + LLM customization. |
| Q22 | Post-creation prompt editing? | Centurio lifecycle | D3: Yes, Caesar edits files directly. |
| Q23 | Long-running task UX? | Telegram UX | D21: Edit-in-place. Send ⏳, edit with result. Typing indicator every 5s. |
| Q24 | Multi-Centurio response format? | Telegram UX | D22: One message per Centurio with bold header. Sent sequentially. |
| Q25 | Telegram message formatting? | Telegram UX | D23: HTML parse mode. <b>, <code>, <pre> for formatting. |
| Q26 | Slash commands in addition to @mentions? | Telegram UX | D24: Hybrid. /status, /list, /create, /help + @mentions. |
| Q27 | Legatus LLM-driven routing? | Architecture | D25: Yes. Auto-routing via LLM. Configured in Legatus prompt. |
| Q28 | Centurio list discovery? | Architecture | D26: Dynamic injection into system prompt. MCP tool as backup. |
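
To make D14 and D18 concrete, here is a rough sketch of rendering the sliding window as a <praetorium> block. The `Nuntius` fields other than `id` (D8), the `from` attribute name, and the `render_praetorium` function are assumptions made for illustration; the real schema is defined elsewhere in the plan.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Nuntius:
    id: str      # UUID per D8
    sender: str  # hypothetical field name
    text: str


def render_praetorium(history: list[Nuntius], window: int = 50) -> str:
    """Render the last `window` nuntii as a <praetorium> block (D14, D18)."""
    recent = history[-window:] if window > 0 else []
    lines = ["<praetorium>"]
    for nuntius in recent:
        lines.append(
            f"  <nuntius id={quoteattr(nuntius.id)} from={quoteattr(nuntius.sender)}>"
            f"{escape(nuntius.text)}</nuntius>"
        )
    lines.append("</praetorium>")
    return "\n".join(lines)


history = [
    Nuntius(id=f"id-{i}", sender="caesar", text=f"order {i} & more")
    for i in range(60)
]
block = render_praetorium(history, window=50)
print(block.splitlines()[1])
```

Escaping the body and quoting attributes keeps a message containing `<` or `&` from breaking the block structure the agent is asked to read.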
