Legio v3 — Implementation Plan
Status: DRAFT — all questions resolved, Codex review amendments applied (D27–D30), awaiting Caesar's final review
Goal
Build a single-process multi-agent system where Caesar commands a team of AI Centuriones through one Telegram chat. Built on Claude Agent SDK (Python).
Decisions Log
| # | Decision | Decided by | Date |
|---|---|---|---|
| D1 | Legatus is an SDK agent (ClaudeSDKClient with its own prompt), not mechanical code | Caesar | 2026-02-14 |
| D2 | Centuriones have persistent sessions (ClaudeSDKClient instances kept alive) | Caesar | 2026-02-14 |
| D3 | Centurio definition lives in prompt files on disk — Caesar can directly edit filesystem | Caesar | 2026-02-14 |
| D4 | Memoria exposed as MCP tools to all agents | Caesar | 2026-02-14 |
| D5 | Memoria file format follows Claude conventions: XML tags for structure, plain text body | Caesar | 2026-02-14 |
| D6 | Praetorium persisted to SQLite (not in-memory only) for crash recovery | Caesar | 2026-02-14 |
| D7 | Audience uses explicit names only — no glob patterns. ["all"] is the only wildcard. | Antony | 2026-02-14 |
| D8 | Every Nuntius carries a UUID (id field). Required for reply_to threading and SQLite PK. | Antony | 2026-02-14 |
| D9 | Yes, Caesar can send Legatus-only messages. audience=["legatus"] excludes all Centuriones. | Antony | 2026-02-14 |
| D10 | Centurio has a status field (idle / working / error) — tracked in-memory by Legatus, not persisted. | Antony | 2026-02-14 |
| D11 | Max 10 concurrent Centuriones (configurable in legio.toml). Soft limit enforced at creation time. | Antony | 2026-02-14 |
| D12 | Edicta and Acta are mutable (overwrite). Commentarii are append-only (immutable log). | Antony | 2026-02-14 |
| D13 | No hard size limits per file. Soft warning at 50KB per file, 1MB total per Centurio's commentarii. | Antony | 2026-02-14 |
| D14 | SQLite stores all history unbounded. Sliding window of last N nuntii provided as context to agents. Default N=50. | Antony | 2026-02-14 |
| D15 | Centurio responses are Legatus-mediated — returned to Legatus, which posts them to Praetorium and relays to Telegram. | Antony | 2026-02-14 |
| D16 | Multi-mention dispatches Centuriones in parallel via asyncio.gather. Responses collected, then sent sequentially. | Antony | 2026-02-14 |
| D17 | Centurio responses returned as str from dispatch_to_centurio(). Legatus posts to Praetorium and returns to Telegram handler. | Antony | 2026-02-14 |
| D18 | History formatted as XML context block with <nuntius> tags per message. Claude-native, consistent with D5. | Antony | 2026-02-14 |
| D19 | Token budget: sliding window (D14). When context exceeds 75% of model window, oldest nuntii dropped. No summarization in v3. | Antony | 2026-02-14 |
| D20 | Both: template in templates/centurio/ as scaffold, but Legatus LLM customizes the prompt. Caesar can also create manually (D3). | Antony | 2026-02-14 |
| D21 | Edit-in-place: send "⏳ working..." immediately, edit with result. Typing indicator every 5s for long tasks. | Antony | 2026-02-14 |
| D22 | One Telegram message per Centurio, with header line identifying the responder. Sent sequentially. | Antony | 2026-02-14 |
| D23 | HTML parse mode for Telegram messages. Simpler escaping than MarkdownV2, supports code blocks via <pre>. | Antony | 2026-02-14 |
| D24 | Hybrid: support both /commands and @mentions. Commands for quick actions, @mentions for conversations. | Antony | 2026-02-14 |
| D25 | Yes, Legatus LLM can auto-route without @mention — "prompt over code" applied to routing. Configurable in Legatus prompt. | Antony | 2026-02-14 |
| D26 | Centurio list injected dynamically into Legatus system prompt on each call. MCP tool as backup for detailed queries. | Antony | 2026-02-14 |
| D27 | Legatus uses persistent ClaudeSDKClient (same as Centuriones). System prompt includes full roster — rebuilt only when roster changes (centurio created/removed). Real-time status injected in user message. SDK conversation history preserved for continuity. | Antony | 2026-02-14 |
| D28 | @mention parsing: case-insensitive @(\w+) regex, matched against registered names only. Unrecognized @mentions pass through as text. Email/URL @ not matched. | Antony | 2026-02-14 |
| D29 | Idle session timeout: Centurio ClaudeSDKClient torn down after 30min inactivity (configurable). Periodic asyncio.Task reaps stale sessions. Recreated on next dispatch. | Antony | 2026-02-14 |
| D30 | Graceful shutdown: SIGTERM/SIGINT handler tears down all SDK clients, closes Praetorium DB, stops Telegram polling. | Antony | 2026-02-14 |
| D31 | SDK requires async generator input for client.query() when MCP servers are attached. All query calls use async def _prompt(): yield text wrapper. | Antony | 2026-02-14 |
| D32 | Per-Centurio MCP server: build_memoria_centurio_server(store, centurio_name) bakes identity into closure. No shared server with magic caller variable. | Antony | 2026-02-14 |
| D33 | Legatus client rebuild detects both roster changes AND legatus prompt.md edits (mtime check). Caesar can edit prompt without restarting. | Antony | 2026-02-14 |
Phase 1 — Data Models (no I/O, pure Python)
The foundation. Every other layer imports these. Zero dependencies on infrastructure.
1.1 legio/nuntius.py — Message
Per D7, D8, D9: explicit audience names, UUID for threading, Legatus-only messages supported.
@dataclass(frozen=True)
class Nuntius:
    id: str                    # UUID4 — unique identifier (D8)
    sender: str                # "caesar", "legatus", or centurio name
    text: str
    audience: tuple[str, ...]  # ("all",), ("legatus",), ("vorenus", "pullo") (D7, D9)
    timestamp: datetime        # UTC, timezone-aware
    reply_to: str | None       # UUID of prior nuntius, or None (D8)

Design notes:
- frozen=True because nuntii are immutable once posted.
- audience is tuple[str, ...], not list[str] — a mutable list inside a frozen dataclass is a Python footgun (the list can still be mutated). Tuple enforces true immutability.
- audience=("all",) is the only wildcard (D7). No glob patterns — explicit names keep visibility rules simple and auditable.
- audience=("legatus",) lets Caesar have private conversations with the Legatus (D9). Example: "Legatus, what do you think of Vorenus's work?" — no Centurio sees this.
- id is a UUID4 string (D8). Required for reply_to threading and as SQLite primary key.
1.2 legio/centurio.py — Agent identity (not behavior)
Per D3, D10: minimal registry entry. Status tracked in-memory by Legatus.
@dataclass
class Centurio:
    name: str             # "vorenus", "pullo" — lowercase
    centuria: Path        # workspace/centuriones/vorenus/
    status: str = "idle"  # "idle" | "working" | "error" (D10)

The Centuria folder is the agent definition:
workspace/centuriones/vorenus/
    prompt.md        # System prompt — the full AgentDefinition
    tools.json       # Allowed tools list
    commentarii/     # Private working memory

Status field (D10): Tracked in-memory by the Legatus, not persisted to disk or SQLite. Reason: status is ephemeral runtime state — after a restart, all Centuriones begin as idle. The Legatus sets working before dispatch, idle on completion, error on failure. Status is exposed via /status command (D24) and the list_centuriones MCP tool.
Max concurrent Centuriones (D11): Default limit of 10, configurable in legio.toml as max_centuriones = 10. Enforced at creation time — create_centurio raises CenturioLimitReached (a LegioError subclass, see 1.4) if the limit is reached. Rationale: each Centurio holds a persistent ClaudeSDKClient (D2) with its own context window. 10 is generous for a single-user system; the limit prevents accidental resource exhaustion.
1.3 legio/config.py — Configuration
@dataclass
class CaesarConfig:
    telegram_id: int

@dataclass
class LegioConfig:
    caesar: CaesarConfig
    model: str                               # "sonnet" — Claude model for all agents
    workspace_dir: Path                      # Path to workspace/
    max_centuriones: int = 10                # D11
    history_window: int = 50                 # D14 — nuntii visible per dispatch
    session_idle_timeout_minutes: int = 30   # D29 — tear down idle SDK sessions
    telegram_bot_token: str = ""             # From .env — TELEGRAM_BOT_TOKEN
    anthropic_api_key: str = ""              # From .env — ANTHROPIC_API_KEY

Secrets (telegram_bot_token, anthropic_api_key) loaded from .env via os.environ, never from legio.toml. Config loading merges both sources.
Loaded from legio.toml + .env. The legio.toml gains:
# Top-level keys come first: in TOML, keys after a table header belong to that table.
model = "sonnet"
workspace_dir = "workspace"
max_centuriones = 10
history_window = 50
session_idle_timeout_minutes = 30

[caesar]
telegram_id = 0

1.4 legio/errors.py — Exception hierarchy
class LegioError(Exception): ...
class ImperiumDenied(LegioError): ... # Auth / permission failures
class CenturioNotFound(LegioError): ... # Unknown centurio name
class CenturioLimitReached(LegioError): ... # D11 — max centuriones exceeded
class MemoriaError(LegioError): ... # Filesystem I/O failures
class PraetoriumError(LegioError): ...        # SQLite / message bus failures

Phase 2 — Memoria (filesystem + MCP tools)
Per D4: Memoria is exposed as MCP tools so both Legatus and Centuriones access knowledge through prompting, not Python imports.
2.1 legio/memoria/store.py — MemoriaStore (internal)
The Python implementation. Reads/writes the three layers:
workspace/
    edicta/                  # Legio-wide standing orders (XML files)
    acta/                    # Shared knowledge (XML files)
    centuriones/
        vorenus/
            prompt.md
            tools.json
            commentarii/     # Private notes (XML files)

Operations:
- list_edicta() -> list[str] — returns names (filename stems)
- read_edictum(name: str) -> str — returns raw XML content
- publish_edictum(name: str, content: str, author: str) -> None — overwrites (D12)
- revoke_edictum(name: str) -> None — deletes the file
- list_acta() -> list[str]
- read_actum(name: str) -> str
- publish_actum(name: str, content: str, author: str) -> None — overwrites (D12)
- list_commentarii(centurio_name: str) -> list[str]
- read_commentarium(centurio_name: str, name: str) -> str
- write_commentarium(centurio_name: str, name: str, content: str) -> None — append-only (D12)
Mutability rules (D12):
- Edicta — mutable. Caesar/Legatus can update standing orders as policy evolves. Overwrite semantics.
- Acta — mutable. Shared knowledge gets corrected and refined. Overwrite semantics.
- Commentarii — append-only. A Centurio's private journal is an immutable log. New entries create new files; existing files are never modified. This preserves audit trail and prevents accidental data loss.
Soft size warnings (D13): No hard limits enforced in code. The Legatus prompt instructs agents to keep files under 50KB each, and total commentarii under 1MB per Centurio. These are prompt-level guidelines, not code-level enforcement — consistent with "prompt over code" philosophy.
2.2 Memoria file format (D5)
Per D5: Follow Claude's own conventions. Claude works best with XML tags for structured content.
Edicta and Acta — XML-tagged plain text files:
<edictum name="security-policy" author="caesar" timestamp="2026-02-14T12:00:00Z">
All Centuriones must validate input before processing.
Never expose API keys in responses.
Rate-limit external API calls to 10/minute.
</edictum>

<actum name="api-research" author="vorenus" timestamp="2026-02-14T15:30:00Z">
Researched the payments API. Key findings:
- Authentication uses OAuth 2.0 with PKCE
- Rate limit: 100 req/min per client
- Sandbox endpoint: https://sandbox.api.example.com
</actum>

Commentarii — same XML pattern, per-Centurio, append-only (D12):
<commentarium name="task-notes-001" timestamp="2026-02-14T16:00:00Z">
Caesar asked me to investigate the payments API.
I found the docs at example.com/docs.
Next step: test the sandbox endpoint.
</commentarium>

Why XML, not YAML frontmatter:
- Claude's own documentation recommends XML tags for structured content
- XML tags are Claude-native — the model parses them more accurately
- No parsing library needed — just string formatting on write, Claude reads natively
- Metadata (author, timestamp, name) lives in tag attributes, body is free-form text
- Files on disk are human-readable and grep-able
File naming: <name>.xml in the respective directory. The name attribute in the tag matches the filename stem.
2.3 legio/memoria/tools.py — MCP tool wrappers
Wraps MemoriaStore operations as MCP tools using @tool decorator:
@tool("read_edictum", "Read a standing order by name", {"name": str})
async def read_edictum_tool(args):
    content = store.read_edictum(args["name"])
    return {"content": [{"type": "text", "text": content}]}

@tool("publish_actum", "Publish shared knowledge", {"name": str, "content": str})
async def publish_actum_tool(args):
    # author_name is bound by the enclosing server builder (D32), not a global
    store.publish_actum(args["name"], args["content"], author=author_name)
    return {"content": [{"type": "text", "text": f"Published: {args['name']}"}]}
# ... etc for all operations

Two MCP server patterns built from these tools:
- memoria_full — all operations (for Legatus: can write edicta, revoke edicta). Single shared instance.
- memoria_centurio — restricted (no edictum writes/revokes, commentarii scoped to caller). Built per-Centurio via build_memoria_centurio_server(store, centurio_name) — the centurio_name is captured in a closure so every write_commentarium / list_commentarii / read_commentarium call is automatically scoped to the owning Centurio. No magic caller_name variable — the identity is baked in at server creation time.
def build_memoria_centurio_server(store: MemoriaStore, centurio_name: str):
    """Build a per-Centurio MCP server with commentarii scoped to owner."""
    @tool("write_commentarium", "Write to your private journal", {"name": str, "content": str})
    async def write_commentarium_tool(args):
        store.write_commentarium(centurio_name, args["name"], args["content"])
        return {"content": [{"type": "text", "text": f"Written: {args['name']}"}]}
    # ... other scoped tools ...
    return create_sdk_mcp_server(
        name=f"memoria_{centurio_name}",
        tools=[write_commentarium_tool, ...],
    )

Phase 3 — Praetorium (message bus, SQLite-backed)
3.1 legio/praetorium.py — Session & message routing
Per D6: The Praetorium persists to SQLite for crash recovery. Conversations survive process restarts.
Database: workspace/praetorium.db — single SQLite file.
CREATE TABLE nuntii (
id TEXT PRIMARY KEY, -- UUID (D8)
sender TEXT NOT NULL, -- "caesar", "legatus", "vorenus"
text TEXT NOT NULL,
audience TEXT NOT NULL, -- JSON array: '["all"]', '["legatus"]', '["vorenus","pullo"]'
timestamp TEXT NOT NULL, -- ISO 8601 UTC
reply_to TEXT, -- FK to nuntii.id, nullable (D8)
FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);
CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX idx_nuntii_sender ON nuntii(sender);

class Praetorium:
    def __init__(self, db_path: Path): ...

    async def post(self, nuntius: Nuntius) -> None:
        """Insert a nuntius into the database."""

    async def get_visible_nuntii(self, viewer: str, limit: int = 50) -> list[Nuntius]:
        """Return nuntii visible to a given viewer, most recent first.

        Visibility rules:
        - "caesar" and "legatus" see everything.
        - A Centurio sees: (a) nuntii where its name appears in audience JSON,
          (b) nuntii with audience '["all"]'.
        """

    async def get_history(self, limit: int = 50) -> list[Nuntius]:
        """Return the last N nuntii, regardless of audience. Admin view."""

History management (D14):
- SQLite stores all nuntii unbounded. Disk is cheap; history is valuable.
- When providing context to agents, a sliding window of the last N nuntii is used (default N=50, configurable via legio.toml).
- The limit parameter on get_visible_nuntii controls this window.
- No automatic summarization or compaction in v3. If history grows very large, Caesar can issue an edictum summarizing past context and reset (manual, prompt-level).
Audience serialization: Nuntius stores audience as tuple[str, ...] (frozen dataclass). SQLite stores it as a JSON array string ('["all"]'). Praetorium handles the conversion:
- Write: json.dumps(list(nuntius.audience))
- Read: tuple(json.loads(row["audience"]))
Dependency: aiosqlite for async SQLite access (add to pyproject.toml).
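The audience round-trip and the visibility rules can be sketched as follows. To stay dependency-free, this example uses the synchronous stdlib `sqlite3` module rather than aiosqlite, works on plain tuples instead of Nuntius objects, and filters visibility in Python; all three are simplifications for illustration, not the planned implementation.

```python
import json
import sqlite3

SCHEMA = """
CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,
    sender TEXT NOT NULL,
    text TEXT NOT NULL,
    audience TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    reply_to TEXT REFERENCES nuntii(id)
);
"""


def post(db, id_, sender, text, audience, timestamp, reply_to=None):
    """Store the audience tuple as a JSON array string."""
    db.execute(
        "INSERT INTO nuntii VALUES (?, ?, ?, ?, ?, ?)",
        (id_, sender, text, json.dumps(list(audience)), timestamp, reply_to),
    )


def get_visible(db, viewer, limit=50):
    """Most recent first. caesar and legatus see everything."""
    rows = db.execute(
        "SELECT id, sender, text, audience FROM nuntii ORDER BY timestamp DESC"
    ).fetchall()
    visible = []
    for id_, sender, text, audience_json in rows:
        audience = tuple(json.loads(audience_json))  # back to an immutable tuple
        sees_all = viewer in ("caesar", "legatus")
        if sees_all or "all" in audience or viewer in audience:
            visible.append((id_, sender, text, audience))
            if len(visible) == limit:
                break
    return visible


db = sqlite3.connect(":memory:")
db.executescript(SCHEMA)
post(db, "1", "caesar", "hello everyone", ("all",), "2026-02-14T10:00:00Z")
post(db, "2", "caesar", "private note", ("legatus",), "2026-02-14T10:01:00Z")
post(db, "3", "caesar", "joint task", ("vorenus", "pullo"), "2026-02-14T10:02:00Z")
```

Filtering after the fetch keeps the sketch independent of SQLite's JSON functions; with an unbounded table, pushing the audience test into SQL would likely be preferable.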
3.2 Response flow (D15, D17)
All Centurio responses are Legatus-mediated (D15):
Caesar → Telegram → Legatus.handle_message()
    ↓
dispatch_to_centurio(name, nuntius)
    ↓
Centurio processes, returns str (D17)
    ↓
Legatus posts response to Praetorium
    ↓
Legatus returns response to Telegram handler
    ↓
Telegram handler sends to Caesar

Why Legatus-mediated, not direct:
- The Legatus controls what gets posted — it can filter, annotate, or suppress.
- Consistent flow: all messages pass through one code path.
- The Legatus can add metadata (e.g., "[Vorenus responds]:" header) before posting.
- Future: the Legatus could route a Centurio's response to another Centurio for review.
Phase 4 — Legatus (SDK agent + orchestrator)
4.1 legio/legatus.py
Per D1: The Legatus is itself a Claude SDK agent with a persistent session (D27, same mechanism as the Centuriones in D2). It is not a Centurio — it's the built-in orchestrator — but it thinks via LLM, not if/else chains.
Persistent client with roster-aware rebuild (D27 — SDK constraint resolved): The SDK locks system_prompt at ClaudeSDKClient creation. The Legatus needs the Centurio roster in its system prompt. Solution: the full roster (names + specializations) is baked into the system prompt at client creation. The client is rebuilt only when the roster changes (centurio created or removed) — not on every call. Real-time status (idle/working/error) is injected as an XML block in the user message, since status flickers frequently but doesn't warrant a client rebuild. This preserves SDK conversation history for continuity — the Legatus remembers recent exchanges with Caesar.
Two hats:
- Orchestrator (code) — message parsing, @mention extraction, session management, Centurio dispatch, Telegram I/O. This is plumbing that prompts can't do.
- Default responder (SDK agent) — when no Centurio is mentioned (and auto-routing doesn't apply), the Legatus's own ClaudeSDKClient handles the conversation. Its prompt defines personality, routing heuristics, and Centurio management commands.
class Legatus:
    def __init__(self, config, praetorium, memoria):
        self._config = config
        self._praetorium = praetorium
        self._memoria = memoria
        self._client: ClaudeSDKClient | None = None  # lazy init — needs async
        self._roster_hash: str = ""  # track roster changes for client rebuild
        self._centurio_sessions: dict[str, ClaudeSDKClient] = {}
        self._centuriones: dict[str, Centurio] = {}  # registry

    async def handle_message(self, text: str, telegram_user_id: int) -> list[str]:
        """Main entry point. Returns list of response strings (one per responder).

        Flow:
        1. Parse @mentions from text (see @mention parsing rules below)
        2. Post Caesar's nuntius to Praetorium
        3. If roster or legatus prompt.md changed → rebuild Legatus client (new system prompt)
        4. If @mentions → dispatch to mentioned Centuriones (parallel, D16)
        5. If no @mentions → delegate to Legatus LLM (may auto-route per D25)
        6. Post all responses to Praetorium
        7. Return responses for Telegram delivery
        """

    async def dispatch_to_centurio(self, name: str, nuntius: Nuntius) -> str:
        """Send a nuntius to a specific Centurio. Returns response text."""

    def list_centuriones(self) -> list[Centurio]:
        """Scan workspace/centuriones/ directory. Returns registered Centuriones."""

    def _build_system_prompt(self) -> str:
        """Read workspace/legatus/prompt.md + inject Centurio roster (D26)."""

    def _should_rebuild_client(self) -> bool:
        """Rebuild Legatus client if roster changed OR legatus prompt.md edited.

        Checks: (a) hash of centurio names+specializations, (b) mtime of prompt.md."""

    async def shutdown(self) -> None:
        """Tear down all SDK client subprocesses (Legatus + all Centuriones).

        Called by __main__.py on SIGTERM/SIGINT (D30)."""

Legatus prompt lives at workspace/legatus/prompt.md — Caesar can edit it too.
Dynamic Centurio roster injection (D26): The full roster (names + specializations) is embedded in the system prompt at client creation. When the roster changes (centurio created/removed) or workspace/legatus/prompt.md is edited on disk, _should_rebuild_client() detects it and the client is torn down and rebuilt. Real-time status is injected per-call as a lightweight block in the user message:
<centurio_status>
<centurio name="vorenus" status="idle"/>
<centurio name="pullo" status="working"/>
</centurio_status>

The full roster in the system prompt provides specialization descriptions for routing decisions:
<centuriones>
<centurio name="vorenus">Code specialist. Handles implementation tasks.</centurio>
<centurio name="pullo">Research specialist.</centurio>
</centuriones>

The roster is populated by scanning workspace/centuriones/*/prompt.md (first line or summary). It carries no status attribute: current state comes from the per-call status block, which reflects the in-memory status field (D10), so a status change never forces a client rebuild (D27). Together the two blocks give the Legatus LLM full awareness of available agents, their specializations, and current state — enabling smart auto-routing (D25).
Why inject, not MCP tool (D26): The Centurio list is needed for every routing decision. Making the LLM call a tool before it can even think about routing adds latency and a wasted API round-trip. Inject it into the system prompt so it's always available. The list_centuriones MCP tool still exists for detailed queries (reading full prompts, checking tools.json).
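The rebuild check and the per-call status block described in this section can be sketched as below. The function names and the shape of the inputs (a name-to-specialization dict, a raw mtime float) are assumptions for illustration; the plan only specifies that the check compares a roster hash and the prompt.md mtime.

```python
import hashlib
from xml.sax.saxutils import quoteattr


def roster_hash(roster: dict[str, str]) -> str:
    """Order-independent digest over centurio names and specializations."""
    h = hashlib.sha256()
    for name in sorted(roster):
        h.update(name.encode("utf-8") + b"\0")
        h.update(roster[name].encode("utf-8") + b"\0")
    return h.hexdigest()


def should_rebuild(roster, prompt_mtime, last_hash, last_mtime) -> bool:
    """True when the roster changed or the Legatus prompt.md was edited."""
    return roster_hash(roster) != last_hash or prompt_mtime != last_mtime


def status_block(statuses: dict[str, str]) -> str:
    """Lightweight XML block prepended to the user message on every call."""
    lines = ["<centurio_status>"]
    for name in sorted(statuses):
        lines.append(
            f"<centurio name={quoteattr(name)} status={quoteattr(statuses[name])}/>"
        )
    lines.append("</centurio_status>")
    return "\n".join(lines)


roster = {"vorenus": "Code specialist.", "pullo": "Research specialist."}
baseline = roster_hash(roster)
```

Since status lives only in the per-call block, a Centurio flipping between idle and working leaves the hash unchanged and the persistent client intact.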
LLM-driven routing (D25): When Caesar sends a message without @mentions, the Legatus LLM decides whether to:
- Handle it itself (general conversation, meta-questions)
- Route to a specific Centurio based on message content and Centurio specializations
- Route to multiple Centuriones if the task spans specializations
This is configured in the Legatus prompt, not hard-coded. Caesar can adjust routing behavior by editing workspace/legatus/prompt.md — "prompt over code."
@mention parsing rules:
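- Format: @name — case-insensitive, matched against registered Centurio names.
- Word-boundary match: @vorenus matches, but @vorenus.com does not. Implementation: extract @(\w+) tokens, reject any that directly follow a word character, "." or "/" or that are directly followed by "." plus a word character, then filter against the self._centuriones registry. Unrecognized @mentions are ignored (passed through as text to the Legatus LLM).
- Multiple mentions in one message: @Vorenus @Pullo do X → dispatches to both (D16).
- @ in email addresses or URLs: not matched, thanks to those guards. A bare @(\w+) would still capture "vorenus" from user@vorenus.com, because \w+ merely stops at ".", "/", etc.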
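A minimal sketch of the mention rules follows. Note that a bare `@(\w+)` would capture `vorenus` from `user@vorenus.com`, so this example adds a lookbehind and a lookahead guard; those guards, the `parse_mentions` name, and the de-duplication are assumptions beyond what D28 states.

```python
import re

# @name, not preceded by a word char, "." or "/" (emails, URLs),
# and not followed by ".<word>" (domains such as @vorenus.com).
MENTION_RE = re.compile(r"(?<![\w./])@(\w+)\b(?!\.\w)")


def parse_mentions(text: str, registered: set[str]) -> list[str]:
    """Return registered centurio names mentioned in text, in order, de-duplicated."""
    found: list[str] = []
    for token in MENTION_RE.findall(text):
        name = token.lower()  # case-insensitive match against lowercase registry
        if name in registered and name not in found:
            found.append(name)
    return found


registry = {"vorenus", "pullo"}
both = parse_mentions("@Vorenus @Pullo do X", registry)
```

The `\b` after the capture group prevents the regex engine from backtracking to a shorter name in order to slip past the lookahead.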
Legatus tools (via MCP):
- Full Memoria access (read/write/revoke edicta, acta)
- Centurio management: create_centurio, remove_centurio, list_centuriones
- Praetorium: post_nuntius, get_history
- dispatch_to_centurio — so the Legatus LLM can route messages to Centuriones (D25)
4.2 Centurio session management
Per D2: Each Centurio gets a persistent ClaudeSDKClient stored in self._centurio_sessions.
Lifecycle:
- First dispatch → create ClaudeSDKClient with prompt from prompt.md, tools from tools.json, Memoria MCP (restricted). Store in _centurio_sessions[name]. Set status = "working" (D10).
- Subsequent dispatches → reuse existing client (SDK maintains internal conversation history). Set status = "working". Update last_active timestamp.
- Prompt changed on disk → detect on next dispatch (compare file mtime or content hash against last-seen). If changed, tear down old client, create new one. Old conversation context is lost — this is intentional (new prompt = new agent).
- Remove Centurio → disconnect client, delete from _centurio_sessions, remove from _centuriones registry, optionally delete folder.
- Error → set status = "error", log the error, return error message to Legatus for relay.
- Completion → set status = "idle".
- Idle timeout (D29) → each ClaudeSDKClient is a subprocess. To avoid holding idle processes indefinitely, sessions inactive for 30 minutes are torn down. A periodic asyncio.Task checks last_active timestamps and disconnects stale clients. Next dispatch recreates the session (as in the first-dispatch case). Timeout configurable in legio.toml as session_idle_timeout_minutes = 30.
Multi-mention parallelism (D16): When Caesar @mentions multiple Centuriones (e.g., @Vorenus @Pullo investigate this), all are dispatched in parallel via asyncio.gather. Responses are collected, then posted to Praetorium and relayed to Telegram sequentially (one message per responder, D22).
async def _dispatch_parallel(self, names: list[str], nuntius: Nuntius) -> list[str]:
    """Dispatch to multiple Centuriones in parallel. Returns list of responses."""
    tasks = [self.dispatch_to_centurio(name, nuntius) for name in names]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    responses = []
    for name, result in zip(names, results):
        if isinstance(result, Exception):
            responses.append(f"[{name}] Error: {result}")
        else:
            responses.append(result)
    return responses

Phase 5 — Centurio Runtime (SDK integration)
5.1 Running a Centurio
Each Centurio has a persistent ClaudeSDKClient (D2). On dispatch:
- Legatus checks if session exists and prompt is unchanged. Creates/recreates if needed.
- Legatus builds visible history via praetorium.get_visible_nuntii(name, limit=config.history_window).
- Legatus formats history as XML context block (D18) and prepends to the new message.
- Legatus sends to the Centurio's persistent client:
client = self._centurio_sessions[name]
context = self._format_history(visible_nuntii)
full_message = f"{context}\n\n{nuntius.text}"
# SDK requires async generator input when MCP servers are attached.
# Plain string query() won't trigger MCP tool handling.
async def _prompt():
    yield full_message
await client.query(_prompt())
# collect response text via client.receive_response()

5.2 History format (D18)
Visible history is formatted as an XML context block — consistent with Memoria format (D5):
<praetorium recent="true" viewer="vorenus">
<nuntius id="abc-123" sender="caesar" timestamp="2026-02-14T10:00:00Z">
@Vorenus investigate the payments API
</nuntius>
<nuntius id="def-456" sender="vorenus" timestamp="2026-02-14T10:05:00Z">
I'll look into it. Starting with the OAuth docs.
</nuntius>
<nuntius id="ghi-789" sender="caesar" timestamp="2026-02-14T10:30:00Z">
@Vorenus also check rate limits
</nuntius>
</praetorium>

Why XML, not raw messages (D18):
- Consistent with Memoria XML format (D5) — one convention everywhere
- Claude parses XML tags natively and accurately
- Metadata (sender, timestamp, id) in attributes keeps body clean
- Structured format lets Claude distinguish messages from instructions
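A formatter for this block could be sketched as follows. The `format_history` name and the dict-based input are assumptions for the example (the plan's `_format_history` would take Nuntius objects), and escaping the message body is a choice made here so that a message containing `</nuntius>` cannot break the structure.

```python
from xml.sax.saxutils import escape, quoteattr


def format_history(viewer: str, nuntii: list[dict]) -> str:
    """Render nuntii (oldest first) as a <praetorium> context block."""
    lines = [f'<praetorium recent="true" viewer={quoteattr(viewer)}>']
    for n in nuntii:
        lines.append(
            f"<nuntius id={quoteattr(n['id'])} sender={quoteattr(n['sender'])} "
            f"timestamp={quoteattr(n['timestamp'])}>"
        )
        lines.append(escape(n["text"]))  # &, < and > become entities
        lines.append("</nuntius>")
    lines.append("</praetorium>")
    return "\n".join(lines)


block = format_history("vorenus", [
    {
        "id": "abc-123",
        "sender": "caesar",
        "timestamp": "2026-02-14T10:00:00Z",
        "text": "@Vorenus check </nuntius> & rate limits",
    },
])
```

Since get_visible_nuntii returns the most recent nuntii first, the caller would reverse the list before formatting so the block reads chronologically.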
5.3 Token budget management (D19)
The Claude Agent SDK does not provide automatic context truncation. We manage it ourselves:
Strategy: sliding window with monitoring.
- Default window is last 50 nuntii (D14, configurable). This is the context provided per dispatch.
- Token tracking: after each client.query(), the SDK returns usage data including input_tokens and output_tokens. Track cumulative tokens per session.
- Overflow threshold: when cumulative input tokens exceed 75% of the model's context window (about 150K tokens, assuming a 200K window for Sonnet), trigger a session reset: tear down the ClaudeSDKClient, create a new one. The new session starts fresh with just the sliding window of recent nuntii — no old conversation context.
- No summarization in v3: automatic summarization adds complexity and latency. The sliding window + session reset is sufficient for a single-user system. Caesar can always issue an edictum summarizing important context if needed.
class SessionTokenTracker:
    """Track cumulative token usage per Centurio session."""
    def __init__(self, max_input_tokens: int = 150_000):
        self.max_input_tokens = max_input_tokens
        self.cumulative_input: int = 0

    def update(self, result: ResultMessage) -> None:
        """Extract token count from SDK ResultMessage. Usage may be None."""
        if result.usage:
            self.cumulative_input += result.usage.get("input_tokens", 0)

    def should_reset(self) -> bool:
        return self.cumulative_input > self.max_input_tokens

What the Centurio sees as MCP tools:
- mcp__memoria__list_acta / read_actum / publish_actum
- mcp__memoria__list_commentarii / read_commentarium / write_commentarium (scoped to self)
- mcp__memoria__list_edicta / read_edictum (read-only)
What the Centurio does NOT see:
- Other Centuriones' commentarii
- Edictum write access
- Centurio management tools
- dispatch_to_centurio (only Legatus can route)
5.4 Centurio creation (D20 — template + LLM)
Per D20: Both template and LLM.
Template at templates/centurio/:
templates/centurio/
    prompt.md.template       # Scaffold with placeholders
    tools.json.template      # Default tool permissions

prompt.md.template:
# {{name}}
You are {{name}}, a Centurio in the Legio system.
## Specialization
{{specialization}}
## Guidelines
- Address the human as Caesar.
- Write your findings to commentarii for persistence.
- Read edicta before starting any task — they contain standing orders.
- Publish important discoveries to acta for other Centuriones.
## Tools
You have access to the Memoria system via MCP tools:
- `list_edicta` / `read_edictum` — standing orders (read-only)
- `list_acta` / `read_actum` / `publish_actum` — shared knowledge
- `list_commentarii` / `read_commentarium` / `write_commentarium` — your private notes

tools.json.template:
{
    "allowed_tools": ["memoria_centurio"]
}

Creation flow:
- Caesar says: "Create a Centurio named Cicero who specializes in writing."
- Legatus LLM receives the request, calls create_centurio tool with name="cicero", specialization="writing".
- Tool implementation copies template, substitutes {{name}} and {{specialization}}, writes to workspace/centuriones/cicero/.
- Legatus LLM may further customize the prompt by calling a follow-up tool or editing the generated file.
- Legatus LLM responds to Caesar confirming creation.
Caesar can also just mkdir workspace/centuriones/cicero/ and write the files directly (D3). The Legatus discovers new Centuriones by scanning the centuriones/ directory.
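The template step of the creation flow can be sketched as below. The signature of `create_centurio`, the name-validation regex, and the use of RuntimeError in place of the plan's CenturioLimitReached are all assumptions made to keep the example self-contained.

```python
import re
import tempfile
from pathlib import Path

NAME_RE = re.compile(r"^[a-z][a-z0-9_]*$")      # hypothetical naming rule
PLACEHOLDER_RE = re.compile(r"\{\{(\w+)\}\}")   # matches {{name}}, {{specialization}}


def render_template(template: str, values: dict[str, str]) -> str:
    """Replace {{key}} placeholders; an unknown key raises KeyError."""
    return PLACEHOLDER_RE.sub(lambda m: values[m.group(1)], template)


def create_centurio(templates: Path, workspace: Path, name: str,
                    specialization: str, max_centuriones: int = 10) -> Path:
    if not NAME_RE.match(name):
        raise ValueError(f"invalid centurio name: {name!r}")
    root = workspace / "centuriones"
    root.mkdir(parents=True, exist_ok=True)
    if sum(1 for p in root.iterdir() if p.is_dir()) >= max_centuriones:
        raise RuntimeError("centurio limit reached")  # CenturioLimitReached in the plan
    target = root / name
    target.mkdir()                    # FileExistsError if the name is taken
    (target / "commentarii").mkdir()
    values = {"name": name, "specialization": specialization}
    for src in templates.glob("*.template"):
        dest = target / src.name.removesuffix(".template")
        rendered = render_template(src.read_text(encoding="utf-8"), values)
        dest.write_text(rendered, encoding="utf-8")
    return target


templates = Path(tempfile.mkdtemp())
(templates / "prompt.md.template").write_text(
    "# {{name}}\n## Specialization\n{{specialization}}\n", encoding="utf-8")
(templates / "tools.json.template").write_text(
    '{"allowed_tools": ["memoria_centurio"]}\n', encoding="utf-8")
workspace = Path(tempfile.mkdtemp())
created = create_centurio(templates, workspace, "cicero", "writing")
```

Validating the name before touching the filesystem matters here because the name becomes a directory component.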
Phase 6 — Telegram Integration
6.1 legio/telegram/bot.py
Single Telegram bot. One chat with Caesar.
Message formatting (D23): HTML parse mode.
import html
import logging

logger = logging.getLogger(__name__)
PARSE_MODE = "HTML"  # D23 — simpler escaping than MarkdownV2

async def start(config: LegioConfig, legatus: Legatus) -> None:
    app = ApplicationBuilder().token(config.telegram_bot_token).build()
    # Share dependencies with the handlers through bot_data.
    app.bot_data["config"] = config
    app.bot_data["legatus"] = legatus
    app.add_handler(CommandHandler("status", handle_status))  # D24
    app.add_handler(CommandHandler("list", handle_list))      # D24
    app.add_handler(CommandHandler("create", handle_create))  # D24
    app.add_handler(CommandHandler("help", handle_help))      # D24
    app.add_handler(MessageHandler(filters.TEXT, handle_message))
    # run_polling() is blocking and manages its own event loop (python-telegram-bot v20+),
    # so from inside a coroutine the application is started step by step instead.
    await app.initialize()
    await app.start()
    await app.updater.start_polling()

async def handle_message(update: Update, context: ContextTypes.DEFAULT_TYPE) -> None:
    config = context.bot_data["config"]
    legatus = context.bot_data["legatus"]
    # SECURITY: verify Caesar identity
    if update.effective_user.id != config.caesar.telegram_id:
        return  # silent ignore
    text = update.message.text
    # D21: Send "working..." immediately for all messages
    status_msg = await update.message.reply_text("⏳")
    # Start typing indicator BEFORE calling Legatus — it runs concurrently.
    # This matters because dispatch_to_centurio (called as MCP tool by Legatus LLM)
    # can block for 30+ seconds while a Centurio works. The typing indicator
    # keeps Caesar informed during the entire chain: Legatus → MCP tool → Centurio.
    stop_typing = asyncio.Event()
    typing_task = asyncio.create_task(
        _keep_typing(update.effective_chat.id, stop_typing)
    )
    try:
        responses = await legatus.handle_message(text, update.effective_user.id)
        # D22: One message per responder
        # Edit the status message with the first response
        first = responses[0] if responses else "No response."
        await status_msg.edit_text(first, parse_mode=PARSE_MODE)
        # Send remaining responses as new messages
        for response in responses[1:]:
            await update.message.reply_text(response, parse_mode=PARSE_MODE)
    except Exception:
        # Log the exception (never swallow, never leak secrets)
        logger.exception("handle_message failed")
        await status_msg.edit_text("❌ An error occurred.")
    finally:
        stop_typing.set()
        await typing_task

6.2 Long-running task UX (D21)
Per D21: Edit-in-place pattern.
- On receiving any message, immediately send a "⏳" status message.
- While Centuriones work, send sendChatAction("typing") every 5 seconds to keep the typing indicator alive.
- When the first response arrives, edit the status message with the result.
- If multiple Centuriones respond (D22), subsequent responses are sent as new reply messages.
Why edit-in-place:
- Prevents chat clutter — one evolving message instead of status + result.
- Industry standard for Telegram bots (per research).
- Clear visual feedback — Caesar sees the ⏳ immediately, then it transforms into the answer.
Typing indicator:
async def _keep_typing(self, chat_id: int, stop_event: asyncio.Event) -> None:
    """Send typing action every 5 seconds until stopped."""
    while not stop_event.is_set():
        await self._bot.send_chat_action(chat_id, "typing")
        try:
            await asyncio.wait_for(stop_event.wait(), timeout=5.0)
        except asyncio.TimeoutError:  # same class as built-in TimeoutError on Python 3.11+
            pass

6.3 Multi-Centurio response format (D22)
Per D22: One message per Centurio, with an HTML header line identifying the responder.
    <b>⚔️ Vorenus</b>
    Here are the API rate limits I found:
    - 100 requests per minute per client
    - OAuth 2.0 with PKCE for authentication

Responses arrive sequentially (even though Centuriones work in parallel per D16). The first response edits the status message; subsequent responses are new messages.
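The parallel-work, sequential-delivery flow described above could be sketched roughly as follows. This is only an illustration under stated assumptions: `format_response`, `fake_centurio`, and `collect_responses` are hypothetical names that do not appear in the plan, the Centurio call is simulated with `asyncio.sleep`, and the body is treated as plain text that must be escaped for HTML parse mode.

```python
import asyncio
import html


def format_response(name: str, body: str) -> str:
    """Prefix a Centurio's reply with a bold HTML header line (D22)."""
    # Assumption: the body is plain text, so <, > and & are escaped
    # to keep Telegram's HTML parse mode from rejecting the message.
    return f"<b>⚔️ {html.escape(name, quote=False)}</b>\n{html.escape(body, quote=False)}"


async def fake_centurio(name: str, delay: float) -> tuple[str, str]:
    """Hypothetical stand-in for a real Centurio dispatch."""
    await asyncio.sleep(delay)
    return name, f"Report from {name}: rate limit is < 100 req/min"


async def collect_responses() -> list[str]:
    # gather() runs the dispatches concurrently but returns results in
    # argument order, so the delivery order stays deterministic.
    results = await asyncio.gather(
        fake_centurio("Vorenus", 0.02),
        fake_centurio("Pullo", 0.01),
    )
    return [format_response(name, body) for name, body in results]


messages = asyncio.run(collect_responses())
```

In this sketch the first element of `messages` would replace the status message and the rest would be sent as new replies, one per Centurio.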
6.4 Telegram message limits
Telegram allows up to 4096 characters per message, counted after entity parsing. For longer responses:
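    def split_message(text: str, max_len: int = 4000) -> list[str]:
        """Split at paragraph boundaries. Leave 96-char buffer for HTML tags."""
        if len(text) <= max_len:
            return [text]
        chunks: list[str] = []
        current = ""
        for para in text.split("\n\n"):
            # A single paragraph longer than max_len cannot stay whole:
            # flush what we have, then hard-split it (this may cut through an HTML tag).
            while len(para) > max_len:
                if current:
                    chunks.append(current)
                    current = ""
                chunks.append(para[:max_len])
                para = para[max_len:]
            if current and len(current) + len(para) + 2 > max_len:
                chunks.append(current)
                current = para
            else:
                current = f"{current}\n\n{para}" if current else para
        if current:
            chunks.append(current)
        return chunks

6.5 Slash commands (D24)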
Per D24: Hybrid — both /commands and @mentions.
| Command | Action | Implementation |
|---|---|---|
| /status | Show all Centuriones and their status (D10) | Direct — no LLM call |
| /list | List available Centuriones with descriptions | Direct — scan filesystem |
| /create <name> <description> | Create a new Centurio | Delegates to Legatus LLM |
| /help | Show available commands and usage | Direct — static text |
Why both (D24):
- Commands are instant, deterministic, no API cost. Perfect for frequent meta-operations.
- @mentions engage the LLM for actual work. Natural language for complex requests.
- Power users expect both. Commands are discoverable via Telegram's autocomplete.
- "prompt over code" applies to agent behavior, not UX shortcuts. Slash commands are UX, not agent logic.
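As a rough sketch of the hybrid split in the table above, incoming text could be routed as shown below. The `route` function, the two command sets, and the bot username `LegioBot` are hypothetical illustrations rather than part of the plan. The real bot would presumably wire this up through its Telegram library's handlers instead.

```python
DIRECT_COMMANDS = {"status", "list", "help"}  # answered without an LLM call
DELEGATED_COMMANDS = {"create"}               # forwarded to the Legatus LLM


def route(text: str) -> tuple[str, str, str]:
    """Return (handler, command, args) for an incoming chat message.

    handler is "direct", "legatus", or "unknown".
    """
    if not text.startswith("/"):
        # Plain text and @mentions always go to the Legatus.
        return "legatus", "", text
    head, _, args = text[1:].partition(" ")
    # Telegram may append the bot username, e.g. "/status@LegioBot".
    command = head.split("@", 1)[0].lower()
    if command in DIRECT_COMMANDS:
        return "direct", command, args.strip()
    if command in DELEGATED_COMMANDS:
        return "legatus", command, args.strip()
    return "unknown", command, args.strip()
```

For example, `route("/status")` resolves to the direct path, while `route("@vorenus check the limits")` is handed to the Legatus unchanged.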
Phase 7 — Entry Point
7.1 legio/__main__.py
    import asyncio
    import logging
    import shutil
    import signal
    from pathlib import Path


    async def main() -> None:
        config = load_config()
        # Logging: structured, no print() (T20)
        logging.basicConfig(
            level=logging.INFO,
            format="%(asctime)s %(levelname)s %(name)s: %(message)s",
            datefmt="%Y-%m-%dT%H:%M:%S",
        )
        # Ensure workspace directories exist (+ default Legatus prompt)
        ensure_workspace(config.workspace_dir)
        # Infrastructure
        memoria = MemoriaStore(config.workspace_dir)
        memoria_full = build_memoria_full_server(memoria)
        memoria_centurio = build_memoria_centurio_server(memoria)
        praetorium = Praetorium(config.workspace_dir / "praetorium.db")
        # Legatus (SDK agent + orchestrator)
        legatus = Legatus(config, praetorium, memoria, memoria_full, memoria_centurio)
        # Graceful shutdown on SIGTERM/SIGINT
        shutdown_event = asyncio.Event()

        def _signal_handler() -> None:
            logging.getLogger("legio").info("Shutdown signal received")
            shutdown_event.set()

        # loop.add_signal_handler is available on Unix only
        loop = asyncio.get_running_loop()
        for sig in (signal.SIGTERM, signal.SIGINT):
            loop.add_signal_handler(sig, _signal_handler)
        try:
            # Telegram bot (blocks until shutdown_event or external stop)
            await start_telegram_bot(config, legatus, shutdown_event)
        finally:
            # Tear down all SDK client subprocesses
            await legatus.shutdown()
            # Close SQLite connection
            await praetorium.close()
            logging.getLogger("legio").info("Legio shut down cleanly")


    def ensure_workspace(workspace_dir: Path) -> None:
        """Create workspace directory structure if it doesn't exist.

        Copies default Legatus prompt from templates/ on first run.
        """
        (workspace_dir / "edicta").mkdir(parents=True, exist_ok=True)
        (workspace_dir / "acta").mkdir(parents=True, exist_ok=True)
        (workspace_dir / "centuriones").mkdir(parents=True, exist_ok=True)
        (workspace_dir / "legatus").mkdir(parents=True, exist_ok=True)
        # First-run: copy default Legatus prompt if missing
        legatus_prompt = workspace_dir / "legatus" / "prompt.md"
        if not legatus_prompt.exists():
            # Relative path: resolved against the current working directory
            template = Path("templates/legatus/prompt.md.template")
            if template.exists():
                shutil.copy(template, legatus_prompt)


    if __name__ == "__main__":
        asyncio.run(main())

Dependency Graph
    legio/__main__.py
    └── telegram/bot.py
        └── legatus.py (SDK agent + orchestrator)
            ├── ClaudeSDKClient (Legatus own session)
            ├── ClaudeSDKClient per Centurio (persistent sessions)
            ├── praetorium.py → nuntius.py
            ├── memoria/store.py
            ├── memoria/tools.py → MCP servers
            ├── centurio.py
            └── config.py
    errors.py (used everywhere)

Filesystem Layout (runtime)
    workspace/
      praetorium.db              # SQLite: conversation history (survives restarts)
      legatus/
        prompt.md                # Legatus system prompt (Caesar-editable)
      edicta/                    # Standing orders (XML files)
        security-policy.xml
      acta/                      # Shared knowledge (XML files)
        api-research.xml
      centuriones/
        vorenus/
          prompt.md              # Centurio definition (Caesar-editable)
          tools.json             # Allowed tools (Caesar-editable)
          commentarii/           # Private notes (XML files, append-only)
            task-notes-001.xml
        pullo/
          prompt.md
          tools.json
          commentarii/
    templates/                   # Committed to repo (not in workspace/)
      legatus/
        prompt.md.template       # Default Legatus prompt (copied on first run)
      centurio/
        prompt.md.template       # Scaffold for new Centuriones
        tools.json.template      # Default tool permissions

Dependencies (additions to pyproject.toml)
    dependencies = [
        "claude-agent-sdk",
        "python-telegram-bot",
        "tomli",      # only needed on Python < 3.11; newer versions ship tomllib
        "aiosqlite",  # D6: async SQLite for Praetorium
    ]

What Is NOT in Scope (v3)
- Optio / Miles (future agent tiers)
- Multi-Legio (multiple processes)
- Web UI (Telegram only)
- Persistent SDK sessions across process restarts (Praetorium history persists, but ClaudeSDKClient sessions are rebuilt from disk on restart)
- Inter-Centurio direct messaging (must go through Praetorium)
- Automatic context summarization (D19 — sliding window + session reset is sufficient)
- Hard enforcement of Memoria size limits (D13 — soft, prompt-level guidelines)
- File attachments / media in Telegram (text only in v3)
Open Questions Summary
All questions resolved.
| Decision | Resolution |
|---|---|
| D7 | Explicit names only. ["all"] is the sole wildcard. |
| D8 | Yes, UUID4 id field on every Nuntius. |
| D9 | Yes. audience=["legatus"] excludes all Centuriones. |
| D3 | Filesystem is truth. Read from disk each dispatch. |
| D10 | Yes: idle/working/error. In-memory, not persisted. |
| D11 | Default 10, configurable in legio.toml. |
| D5 | XML tags (Claude convention). Attributes for metadata, body for content. |
| D12 | Edicta/Acta mutable (overwrite). Commentarii append-only. |
| D13 | Soft limits only (50KB/file, 1MB/commentarii). Prompt-enforced. |
| D6 | SQLite at workspace/praetorium.db. Survives restarts. |
| D14 | SQLite unbounded. Sliding window of last 50 nuntii for context. |
| D15 | Legatus-mediated. All responses flow through Legatus. |
| D1 | SDK agent with persistent session. |
| D16 | Parallel via asyncio.gather. Responses sent sequentially. |
| D4 | Yes, via Memoria MCP + management tools. |
| D17 | Return str from dispatch_to_centurio(). Legatus posts + relays. |
| D2 | Persistent ClaudeSDKClient per Centurio. |
| D4 | Yes, MCP tools for all agents. |
| D18 | XML <praetorium> block with <nuntius> tags. Consistent with D5. |
| D19 | Sliding window + session reset at 75% capacity. No summarization in v3. |
| D20 | Both. Template scaffold + LLM customization. |
| D3 | Yes, Caesar edits files directly. |
| D21 | Edit-in-place. Send ⏳, edit with result. Typing indicator every 5s. |
| D22 | One message per Centurio with bold header. Sent sequentially. |
| D23 | HTML parse mode. <b>, <code>, <pre> for formatting. |
| D24 | Hybrid. /status, /list, /create, /help + @mentions. |
| D25 | Yes. Auto-routing via LLM. Configured in Legatus prompt. |
| D26 | Dynamic injection into system prompt. MCP tool as backup. |
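To make the combination of D14 (sliding window of the last 50 nuntii) and D18 (XML praetorium block) more concrete, here is one possible sketch. The `render_window` helper and the `sender` and `text` field names are assumptions for illustration only. The plan itself confirms just the `id` field (D8) and the `<praetorium>` and `<nuntius>` tag names.

```python
from xml.sax.saxutils import escape, quoteattr


def render_window(nuntii: list[dict[str, str]], n: int = 50) -> str:
    """Render the last n nuntii as a <praetorium> XML block."""
    # Guard n <= 0 explicitly: nuntii[-0:] would return the whole list.
    recent = nuntii[-n:] if n > 0 else []
    lines = ["<praetorium>"]
    for nuntius in recent:
        # Hypothetical attribute layout: metadata as attributes, text as body (D5).
        lines.append(
            f"  <nuntius id={quoteattr(nuntius['id'])} "
            f"sender={quoteattr(nuntius['sender'])}>"
            f"{escape(nuntius['text'])}</nuntius>"
        )
    lines.append("</praetorium>")
    return "\n".join(lines)


# Sixty fake messages; only the newest fifty should survive the window.
history = [
    {"id": str(i), "sender": "caesar", "text": f"order {i} & more"}
    for i in range(60)
]
block = render_window(history)
```

Older nuntii stay in SQLite and are simply left out of the rendered context, which matches the unbounded-storage, bounded-context split in D14.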