Design Principles

This page describes the design philosophy behind Legio and the reasoning behind its key architectural decisions.

Prompt Over Code

If behavior can be achieved through agent prompting, do not hard-code it.

This is the foundational principle. Code handles three things:

Infrastructure — Telegram integration, SQLite, filesystem I/O
Data models — Nuntius, Centurio, config schemas
Orchestration plumbing — routing, session management, MCP tool wiring

Everything else belongs in prompts and edicta:

How the Legatus routes messages → prompt instructions
Response style and tone → prompt instructions
Domain-specific rules → edicta (standing orders)
Centurio specialization → per-centurio prompt.md

Why? Prompts can be changed at runtime without redeployment. A centurio's behavior is redefined by editing a markdown file, not by writing and testing new code.

Flat Core

One domain concept per module:

Module	Lines	Concept
`nuntius.py`	~50	Immutable message type
`centurio.py`	~80	Agent identity and validation
`config.py`	~120	Configuration loading
`errors.py`	~35	Exception hierarchy (7 classes)
`legatus.py`	~465	Orchestrator + SDK agent
`session.py`	~390	SDK session lifecycle
`praetorium.py`	~170	SQLite message bus
`rendering.py`	~115	XML/template rendering
`auctoritas.py`	~130	TOTP authorization store
`totp.py`	~60	TOTP verification

Sub-packages exist only for infrastructure that needs multiple files:

telegram/ — bot.py, commands.py, utils.py, markdown_render.py
memoria/ — store.py, tools.py

Why? Flat structure minimizes import depth and makes finding code trivial. You never wonder which package a concept lives in.

File Size Discipline

Every production module must have ≤350 pure code lines (blanks, comments, docstrings excluded). Split proactively at 300 lines. Test files are exempt.

The limit is enforced by scripts/check_file_length.py, which counts only executable lines.

Why? Large files signal that a module handles too many concerns. The 350-line limit forces decomposition before complexity accumulates.
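
The counting rule above (blanks, comments, and docstrings excluded) could be approximated as shown below. This is a minimal sketch using only the standard library, not the contents of scripts/check_file_length.py; the function name `count_code_lines` and the constant `MAX_CODE_LINES` are assumptions made for illustration.

```python
import ast
import io
import tokenize

MAX_CODE_LINES = 350  # limit stated above; the constant name is hypothetical

# Token types that never make a line count as code.
_NON_CODE = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}


def count_code_lines(source: str) -> int:
    """Count lines holding at least one code token, excluding docstrings."""
    code_lines = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in _NON_CODE:
            # Multi-line tokens (e.g. triple-quoted strings) span several lines.
            code_lines.update(range(tok.start[0], tok.end[0] + 1))

    # Remove the lines occupied by module, class, and function docstrings.
    owners = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, owners) and node.body:
            first = node.body[0]
            if (isinstance(first, ast.Expr)
                    and isinstance(first.value, ast.Constant)
                    and isinstance(first.value.value, str)):
                code_lines.difference_update(
                    range(first.lineno, first.end_lineno + 1))
    return len(code_lines)


def within_limit(source: str) -> bool:
    return count_code_lines(source) <= MAX_CODE_LINES
```

A line that carries both code and a trailing comment still counts once, since it contains at least one code token.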

Dependencies Flow Downward

No circular imports. Infrastructure never leaks into domain code. The domain layer has zero knowledge of Telegram.

Why? Directional dependencies make testing simple (mock downward, never upward) and ensure the core logic is portable — you could swap Telegram for Slack without touching domain code.

Domain Exceptions

All errors inherit from a single base class defined in errors.py.

Rules:

Use domain exceptions (CenturioNotFound), never stdlib (KeyError)
Catch at boundaries (Telegram handler), propagate in domain code
Never swallow exceptions, never leak internal details in error messages
Generic error messages at the Telegram boundary prevent information disclosure
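
The rules above can be sketched as follows. Only CenturioNotFound is named in this document; the base class name `LegioError`, the `REGISTRY` dict, and both functions are hypothetical stand-ins, so treat this as an illustration of the pattern rather than the actual errors.py.

```python
import logging

log = logging.getLogger("legio.sketch")

GENERIC_REPLY = "Something went wrong. Please try again."


class LegioError(Exception):
    """Hypothetical single base class for all domain errors."""


class CenturioNotFound(LegioError):
    def __init__(self, name: str) -> None:
        super().__init__(f"no centurio named {name!r}")
        self.name = name


REGISTRY = {"scriba": "prompt text"}  # hypothetical in-memory registry


def get_centurio(name: str) -> str:
    """Domain code: translate the stdlib error and let it propagate."""
    try:
        return REGISTRY[name]
    except KeyError as exc:
        raise CenturioNotFound(name) from exc


def handle_message(name: str) -> str:
    """Boundary code: log the details, reply with a generic message."""
    try:
        return get_centurio(name)
    except LegioError:
        log.exception("dispatch failed")  # full detail stays in the logs
        return GENERIC_REPLY
```

Catching the base class at the boundary means a newly added domain exception is handled there without further changes.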

Immutable Messages

Nuntius is a frozen dataclass — once created, it cannot be modified. Every message gets a UUID4, a UTC timestamp, and explicit audience routing.

Why? Immutability prevents a class of bugs where message state is accidentally mutated after being posted to the praetorium. It also simplifies concurrent dispatch — multiple centuriones can safely reference the same nuntius.
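
A frozen dataclass with these properties might look like the sketch below. The field names (`sender`, `audience`, `body`, `id`, `timestamp`) are assumptions; the real Nuntius in nuntius.py may define different ones.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Tuple
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Nuntius:
    sender: str
    audience: Tuple[str, ...]  # a tuple, so the routing is immutable too
    body: str
    id: UUID = field(default_factory=uuid4)
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```

Assigning to any field after construction raises `dataclasses.FrozenInstanceError`, which is what allows several centuriones to hold a reference to the same instance safely.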

Security in Depth

Security is not a layer — it's woven into every boundary:

Boundary	Control	Module
Telegram entry	User ID + private chat filter	`bot.py`
Destructive actions	TOTP with timing-safe comparison	`totp.py`, `auctoritas.py`
Centurio names	Regex `^[a-z][a-z0-9_-]*$` + reserved names	`centurio.py`
Memoria names	Regex `^[a-z0-9][a-z0-9_-]*$`	`memoria/store.py`
XML injection	`escape()` + `quoteattr()` on all context	`rendering.py`
HTML injection	`HTMLRenderer(escape=True)` on all output	`markdown_render.py`
Symlink attacks	`path.is_symlink()` rejection	`memoria/store.py`
Raw HTML in LLM output	Escaped, never passed through	`markdown_render.py`
Error messages	Generic at boundary, detailed in logs	`telegram/utils.py`
OTP replay	TTL=120s, max 3 attempts, auto-cleanup	`auctoritas.py`

Every defensive measure carries an inline comment prefixed with # SECURITY: that explains which attack it prevents.
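
Two of the controls in the table can be illustrated with standard-library calls. This is a sketch under assumptions: the regex is the one listed above, but the contents of `RESERVED_NAMES` and both function names are hypothetical, and the real totp.py may compare codes differently.

```python
import hmac
import re

CENTURIO_NAME_RE = re.compile(r"^[a-z][a-z0-9_-]*$")
RESERVED_NAMES = frozenset({"legatus", "caesar"})  # hypothetical entries


def is_valid_centurio_name(name: str) -> bool:
    # SECURITY: names become filesystem paths, so reject anything that
    # could traverse directories or collide with a reserved identity.
    # fullmatch also rejects a trailing newline that "$" alone would allow.
    if CENTURIO_NAME_RE.fullmatch(name) is None:
        return False
    return name not in RESERVED_NAMES


def codes_match(submitted: str, expected: str) -> bool:
    # SECURITY: constant-time comparison, so response timing does not
    # reveal how many leading characters of the code were correct.
    return hmac.compare_digest(submitted.encode(), expected.encode())
```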

Dual-Session Architecture

The system maintains two layers of conversation state:

SDK Conversation Thread (volatile) — each ClaudeSDKClient holds an in-memory conversation. Lost on restart.
Praetorium (persistent) — SQLite stores all nuntii with visibility rules. Survives restarts.

On a fresh SDK session (startup, rebuild, idle timeout), the Legatus bootstraps context by injecting praetorium history as XML into the first query. This gives the LLM conversational continuity without relying on the volatile SDK state.

Why? The Claude Agent SDK doesn't persist conversation state. Rather than fighting this, Legio embraces it — the praetorium is the source of truth, and SDK sessions are ephemeral workers that receive context on demand.
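
The bootstrap step can be pictured as rendering stored history into an XML block that is prepended to the first query. The sketch below assumes a hypothetical `render_history` function and hypothetical element and attribute names; only the use of `escape()` and `quoteattr()` is taken from the security table above.

```python
from typing import Iterable, Tuple
from xml.sax.saxutils import escape, quoteattr


def render_history(nuntii: Iterable[Tuple[str, str, str]]) -> str:
    """Render (sender, iso_timestamp, body) rows as an XML context block."""
    lines = ["<history>"]
    for sender, timestamp, body in nuntii:
        # quoteattr() adds the surrounding quotes and escapes the value;
        # escape() neutralizes markup inside the message body.
        lines.append(
            f"  <nuntius sender={quoteattr(sender)} "
            f"timestamp={quoteattr(timestamp)}>{escape(body)}</nuntius>"
        )
    lines.append("</history>")
    return "\n".join(lines)
```

Because every value passes through `escape()` or `quoteattr()`, a message body that contains tags cannot break out of its element.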

Token Conservation

Three mechanisms keep token usage efficient:

History Deduplication

Each centurio session tracks last_injected_ts — the timestamp of the most recent nuntius injected. Subsequent dispatches only inject nuntii newer than this timestamp.

First dispatch gets full history. Second dispatch gets only what's new.
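
As a minimal sketch of this behavior: the attribute name last_injected_ts comes from the description above, while the `CenturioSession` class, its `pending` method, and the simplified `Nuntius` are assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import List, Optional, Sequence


@dataclass(frozen=True)
class Nuntius:  # simplified stand-in for the real message type
    timestamp: datetime
    body: str


@dataclass
class CenturioSession:
    last_injected_ts: Optional[datetime] = None

    def pending(self, history: Sequence[Nuntius]) -> List[Nuntius]:
        """Return only nuntii not yet injected, then advance the marker."""
        if self.last_injected_ts is None:
            fresh = list(history)  # first dispatch: full history
        else:
            fresh = [n for n in history
                     if n.timestamp > self.last_injected_ts]
        if fresh:
            self.last_injected_ts = max(n.timestamp for n in fresh)
        return fresh
```

A strict `>` comparison assumes timestamps are unique per session; if two nuntii could share a timestamp, a sequence number would be a safer marker.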

Session Lifecycle Management

An idle reaper task runs every 60 seconds and disconnects centurio sessions that have been inactive beyond a configurable timeout (default: 30 minutes). This frees resources without losing context — the praetorium always has the full history for re-injection.
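
One way to structure such a reaper is to keep the idle check as a plain method and wrap it in a small asyncio loop. The `SessionPool` class and its method names are hypothetical; a real implementation would also disconnect the SDK client where this sketch only forgets the session.

```python
import asyncio
import time
from typing import Dict, List, Optional


class SessionPool:
    def __init__(self, idle_timeout: float = 30 * 60) -> None:
        self.idle_timeout = idle_timeout          # seconds; default 30 minutes
        self.last_active: Dict[str, float] = {}   # centurio name -> last use

    def touch(self, name: str, now: Optional[float] = None) -> None:
        self.last_active[name] = time.monotonic() if now is None else now

    def reap(self, now: Optional[float] = None) -> List[str]:
        """Drop sessions idle longer than the timeout; return their names."""
        now = time.monotonic() if now is None else now
        idle = [name for name, ts in self.last_active.items()
                if now - ts > self.idle_timeout]
        for name in idle:
            del self.last_active[name]  # real code would disconnect here
        return idle


async def reaper_loop(pool: SessionPool, interval: float = 60.0) -> None:
    """Run the idle check every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        pool.reap()
```

Passing `now` explicitly keeps the idle check testable without waiting on a real clock.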

Token Threshold Rebuild

SessionTokenTracker accumulates input tokens from each ResultMessage. When cumulative input exceeds ~150K tokens (75% of the 200K context window), the session is flagged for rebuild rather than letting performance degrade from an overfull context.
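
The threshold arithmetic is 0.75 × 200,000 = 150,000 tokens. A tracker along these lines might look like the sketch below; the class name comes from the text above, but the method and attribute names are assumptions, and how the token count is read from a ResultMessage is not shown here.

```python
class SessionTokenTracker:
    def __init__(self, context_window: int = 200_000,
                 threshold_ratio: float = 0.75) -> None:
        # 200_000 * 0.75 = 150_000 input tokens
        self.limit = int(context_window * threshold_ratio)
        self.input_tokens = 0

    def record(self, input_tokens: int) -> None:
        """Add the input tokens reported for one completed query."""
        self.input_tokens += input_tokens

    @property
    def needs_rebuild(self) -> bool:
        return self.input_tokens > self.limit

    def reset(self) -> None:
        """Call after the session has been rebuilt."""
        self.input_tokens = 0
```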

MCP Tool Scoping

The Memoria system uses two different MCP server configurations.

Centurio tools are scoped via Python closures — the centurio_name is captured at server creation time, making it impossible for a centurio to access another's private data even if it crafts the right tool arguments.
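
The closure technique can be shown without any MCP machinery. In this sketch the factory `make_memoria_tools`, the tool names, and the dict-backed store are all hypothetical; the point is only that the tools take no name argument, so there is nothing a caller can pass to reach another centurio's data.

```python
from typing import Callable, Dict, Optional

Store = Dict[str, Dict[str, str]]  # centurio name -> private key/value notes


def make_memoria_tools(centurio_name: str,
                       store: Store) -> Dict[str, Callable]:
    """Build tool functions bound to a single centurio."""

    def read_note(key: str) -> Optional[str]:
        # centurio_name is captured from the enclosing scope, not supplied
        # by the caller, so it cannot be overridden through tool arguments.
        return store.get(centurio_name, {}).get(key)

    def write_note(key: str, value: str) -> None:
        store.setdefault(centurio_name, {})[key] = value

    return {"read_note": read_note, "write_note": write_note}
```

Each centurio receives its own pair of functions, created once when its server is set up.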

Configuration Separation

Source	Contains	Committed?
`legio.toml`	Settings (model, timeouts, caesar ID)	✅ Yes
`.env`	Secrets (API keys, TOTP secret)	❌ No
`blueprints/`	Templates for centurio creation	✅ Yes
`castra/`	Runtime state (prompts, data, db)	Partial

Config is loaded once into a frozen LegioConfig dataclass and passed through constructors. No global state, no mutable singletons.
