Design Principles
The design philosophy behind Legio, explaining the reasoning behind key architectural decisions.
Prompt Over Code
If behavior can be achieved through agent prompting, do not hard-code it.
This is the foundational principle. Code handles three things:
- Infrastructure — Telegram integration, SQLite, filesystem I/O
- Data models — Nuntius, Centurio, config schemas
- Orchestration plumbing — routing, session management, MCP tool wiring
Everything else belongs in prompts and edicta:
- How the Legatus routes messages → prompt instructions
- Response style and tone → prompt instructions
- Domain-specific rules → edicta (standing orders)
- Centurio specialization → per-centurio prompt.md
Why? Prompts can be changed at runtime without redeployment. A centurio's behavior is redefined by editing a markdown file, not by writing and testing new code.
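As a minimal sketch of why this works, assume each centurio's prompt lives at `<castra_dir>/<name>/prompt.md` and is read from disk whenever a session is built. Both the directory layout and the `load_prompt` helper are hypothetical illustrations, not Legio's actual API.

```python
from pathlib import Path


def load_prompt(castra_dir: Path, centurio_name: str) -> str:
    """Read a centurio's prompt from disk each time a session is built.

    Hypothetical layout: <castra_dir>/<centurio_name>/prompt.md
    """
    prompt_path = castra_dir / centurio_name / "prompt.md"
    return prompt_path.read_text(encoding="utf-8")
```

Because nothing is cached, editing the markdown file changes the behavior of the next session that loads it.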
Flat Core
One domain concept per module:
| Module | Lines | Concept |
|---|---|---|
| nuntius.py | ~50 | Immutable message type |
| centurio.py | ~80 | Agent identity and validation |
| config.py | ~120 | Configuration loading |
| errors.py | ~35 | Exception hierarchy (7 classes) |
| legatus.py | ~465 | Orchestrator + SDK agent |
| session.py | ~390 | SDK session lifecycle |
| praetorium.py | ~170 | SQLite message bus |
| rendering.py | ~115 | XML/template rendering |
| auctoritas.py | ~130 | TOTP authorization store |
| totp.py | ~60 | TOTP verification |
Sub-packages exist only for infrastructure that needs multiple files:
- telegram/: bot.py, commands.py, utils.py, markdown_render.py
- memoria/: store.py, tools.py
Why? Flat structure minimizes import depth and makes finding code trivial. You never wonder which package a concept lives in.
File Size Discipline
Every production module must have ≤350 pure code lines (blanks, comments, docstrings excluded). Split proactively at 300 lines. Test files are exempt.
This is enforced by scripts/check_file_length.py, which counts only executable lines.
Why? Large files signal that a module handles too many concerns. The 350-line limit forces decomposition before complexity accumulates.
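One way such a counter could work is sketched below. This is not the contents of scripts/check_file_length.py; the function names are hypothetical, and the approach (tokenize to find lines carrying code, ast to find docstring lines) is an assumption about how "pure code lines" might be measured.

```python
import ast
import io
import tokenize

# Token types that do not make a line count as code.
_NON_CODE = {
    tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE,
    tokenize.INDENT, tokenize.DEDENT, tokenize.ENDMARKER,
}
_DOC_OWNERS = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def docstring_lines(source: str) -> set[int]:
    """Line numbers occupied by module, class, and function docstrings."""
    lines: set[int] = set()
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, _DOC_OWNERS) or not node.body:
            continue
        first = node.body[0]
        if (isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)):
            lines.update(range(first.lineno, first.end_lineno + 1))
    return lines


def count_code_lines(source: str) -> int:
    """Count lines holding at least one code token, excluding docstrings."""
    code_lines: set[int] = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in _NON_CODE:
            code_lines.update(range(tok.start[0], tok.end[0] + 1))
    return len(code_lines - docstring_lines(source))
```

A check script built on this could compare `count_code_lines(path.read_text())` against 350 for each production module and exit non-zero on a violation.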
Dependencies Flow Downward
No circular imports. Infrastructure never leaks into domain code. The domain layer has zero knowledge of Telegram.
Why? Directional dependencies make testing simple (mock downward, never upward) and ensure the core logic is portable — you could swap Telegram for Slack without touching domain code.
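The portability claim can be illustrated with a small sketch in which domain code depends only on an abstract outbound interface. The `Outbound` protocol, `announce` function, and `FakeTransport` class are hypothetical names chosen for illustration and do not appear in Legio.

```python
from typing import Protocol


class Outbound(Protocol):
    """What the domain layer needs from any chat transport (hypothetical)."""

    def send(self, text: str) -> None: ...


def announce(outbound: Outbound, centurio_name: str) -> None:
    # Domain code: knows the protocol, not Telegram or Slack.
    outbound.send(f"{centurio_name} reporting")


class FakeTransport:
    """Test double standing in for the infrastructure layer."""

    def __init__(self) -> None:
        self.sent: list[str] = []

    def send(self, text: str) -> None:
        self.sent.append(text)
```

A Telegram or Slack adapter would implement `send` in the infrastructure layer, and tests would pass `FakeTransport` downward without any upward mocking.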
Domain Exceptions
All errors inherit from a single base exception class, defined in errors.py.
Rules:
- Use domain exceptions (CenturioNotFound), never stdlib (KeyError)
- Catch at boundaries (Telegram handler), propagate in domain code
- Never swallow exceptions, never leak internal details in error messages
- Generic error messages at the Telegram boundary prevent information disclosure
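The rules above can be sketched as follows. Only `CenturioNotFound` is named in this document; the base class name `LegioError`, the registry shape, and both functions are assumptions made for illustration.

```python
import logging

logger = logging.getLogger("legio.sketch")


class LegioError(Exception):
    """Hypothetical single base class for all domain errors."""


class CenturioNotFound(LegioError):
    """Raised when a centurio name is not registered."""


def get_centurio(registry: dict[str, str], name: str) -> str:
    # Domain code: translate the stdlib KeyError into a domain exception
    # and let it propagate.
    try:
        return registry[name]
    except KeyError:
        raise CenturioNotFound(f"unknown centurio: {name!r}") from None


def handle_command(registry: dict[str, str], name: str) -> str:
    # Boundary: log the details, reply with a generic message only.
    try:
        return get_centurio(registry, name)
    except LegioError:
        logger.exception("command failed")
        return "Something went wrong."
```

The exception is neither swallowed (it is logged with its traceback) nor leaked (the user sees only the generic text).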
Immutable Messages
Nuntius is a frozen dataclass — once created, it cannot be modified. Every message gets a UUID4, a UTC timestamp, and explicit audience routing.
Why? Immutability prevents a class of bugs where message state is accidentally mutated after being posted to the praetorium. It also simplifies concurrent dispatch — multiple centuriones can safely reference the same nuntius.
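A minimal sketch of such a message type is shown below. The field names (`sender`, `text`, `audience`, `id`, `timestamp`) are assumptions; the document only states that a nuntius is frozen and carries a UUID4, a UTC timestamp, and explicit audience routing.

```python
import uuid
from dataclasses import FrozenInstanceError, dataclass, field
from datetime import datetime, timezone


@dataclass(frozen=True)
class Nuntius:
    sender: str
    text: str
    # Explicit routing; a tuple keeps the audience itself immutable.
    audience: tuple[str, ...]
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    timestamp: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


def try_mutate(nuntius: Nuntius) -> bool:
    """Return True if mutation was rejected, as expected for a frozen type."""
    try:
        nuntius.text = "edited"  # type: ignore[misc]
    except FrozenInstanceError:
        return True
    return False
```

Using a tuple rather than a list for `audience` matters: `frozen=True` blocks attribute reassignment but would not stop in-place mutation of a list field.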
Security in Depth
Security is not a layer — it's woven into every boundary:
| Boundary | Control | Module |
|---|---|---|
| Telegram entry | User ID + private chat filter | bot.py |
| Destructive actions | TOTP with timing-safe comparison | totp.py, auctoritas.py |
| Centurio names | Regex ^[a-z][a-z0-9_-]*$ + reserved names | centurio.py |
| Memoria names | Regex ^[a-z0-9][a-z0-9_-]*$ | memoria/store.py |
| XML injection | escape() + quoteattr() on all context | rendering.py |
| HTML injection | HTMLRenderer(escape=True) on all output | markdown_render.py |
| Symlink attacks | path.is_symlink() rejection | memoria/store.py |
| Raw HTML in LLM output | Escaped, never passed through | markdown_render.py |
| Error messages | Generic at boundary, detailed in logs | telegram/utils.py |
| OTP replay | TTL=120s, max 3 attempts, auto-cleanup | auctoritas.py |
Every defensive measure carries an inline comment prefixed with # SECURITY: that explains which attack it prevents.
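Three of the controls in the table can be sketched with the standard library alone. The regex is taken from the table; the reserved-name set, the function names, and the XML element shape are hypothetical.

```python
import hmac
import re
from xml.sax.saxutils import escape, quoteattr

CENTURIO_NAME_RE = re.compile(r"^[a-z][a-z0-9_-]*$")
RESERVED_NAMES = frozenset({"legatus", "caesar"})  # hypothetical list


def is_valid_centurio_name(name: str) -> bool:
    # SECURITY: names become filesystem paths; fullmatch also rejects a
    # trailing newline, which `$` alone would tolerate under re.match.
    return bool(CENTURIO_NAME_RE.fullmatch(name)) and name not in RESERVED_NAMES


def render_nuntius(sender: str, text: str) -> str:
    # SECURITY: escape element text and quote attribute values so message
    # content cannot inject XML structure into the LLM context.
    return f"<nuntius sender={quoteattr(sender)}>{escape(text)}</nuntius>"


def codes_match(submitted: str, expected: str) -> bool:
    # SECURITY: constant-time comparison prevents timing side channels
    # from revealing how many leading characters of a code were correct.
    return hmac.compare_digest(submitted.encode(), expected.encode())
```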
Dual-Session Architecture
The system maintains two layers of conversation state:
- SDK Conversation Thread (volatile): each ClaudeSDKClient holds an in-memory conversation. Lost on restart.
- Praetorium (persistent): SQLite stores all nuntii with visibility rules. Survives restarts.
On a fresh SDK session (startup, rebuild, idle timeout), the Legatus bootstraps context by injecting praetorium history as XML into the first query. This gives the LLM conversational continuity without relying on the volatile SDK state.
Why? The Claude Agent SDK doesn't persist conversation state. Rather than fighting this, Legio embraces it — the praetorium is the source of truth, and SDK sessions are ephemeral workers that receive context on demand.
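The bootstrap step might look roughly like the sketch below, which treats a SQLite table as the source of truth and renders it as XML for the first query of a fresh session. The schema, function names, and XML shape are assumptions, and visibility rules are omitted for brevity.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def open_praetorium(path: str = ":memory:") -> sqlite3.Connection:
    """Open (and if needed create) a hypothetical nuntii table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS nuntii ("
        "id INTEGER PRIMARY KEY, ts REAL NOT NULL, "
        "sender TEXT NOT NULL, body TEXT NOT NULL)"
    )
    return conn


def post(conn: sqlite3.Connection, ts: float, sender: str, body: str) -> None:
    conn.execute(
        "INSERT INTO nuntii (ts, sender, body) VALUES (?, ?, ?)",
        (ts, sender, body),
    )
    conn.commit()


def bootstrap_context(conn: sqlite3.Connection) -> str:
    """Render persisted history as XML for a fresh session's first query."""
    rows = conn.execute(
        "SELECT sender, body FROM nuntii ORDER BY ts, id"
    ).fetchall()
    items = "".join(
        f"<nuntius sender={quoteattr(sender)}>{escape(body)}</nuntius>"
        for sender, body in rows
    )
    return f"<history>{items}</history>"
```

With a file path instead of `:memory:`, the history outlives the process, so a restarted session can be handed the same context.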
Token Conservation
Three mechanisms keep token usage efficient:
History Deduplication
Each centurio session tracks last_injected_ts — the timestamp of the most recent nuntius injected. Subsequent dispatches only inject nuntii newer than this timestamp.
First dispatch gets full history. Second dispatch gets only what's new.
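A minimal sketch of this cursor logic, assuming history is available as `(timestamp, body)` pairs. The `CenturioSession` class and `select_new` method are hypothetical; only the `last_injected_ts` name comes from the text above.

```python
from dataclasses import dataclass


@dataclass
class CenturioSession:
    # Nothing injected yet, so every nuntius counts as new.
    last_injected_ts: float = float("-inf")

    def select_new(self, history: list[tuple[float, str]]) -> list[str]:
        """Return bodies newer than the cursor, then advance the cursor."""
        fresh = [(ts, body) for ts, body in history
                 if ts > self.last_injected_ts]
        if fresh:
            self.last_injected_ts = max(ts for ts, _ in fresh)
        return [body for _, body in fresh]
```

One caveat of the strict `>` comparison in this sketch: a nuntius that arrives later but shares a timestamp with the last injected one would be skipped, so a real implementation needs sufficiently fine-grained timestamps or a tie-breaking ID.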
Session Lifecycle Management
An idle reaper task runs every 60 seconds and disconnects centurio sessions that have been inactive beyond a configurable timeout (default: 30 minutes). This frees resources without losing context — the praetorium always has the full history for re-injection.
Token Threshold Rebuild
SessionTokenTracker accumulates input tokens from each ResultMessage. When cumulative input exceeds ~150K tokens (75% of the 200K context window), the session is flagged for rebuild rather than letting performance degrade from an overfull context.
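The threshold arithmetic can be sketched as below. The class name comes from the text, but its fields and methods are assumptions, and how the input token count is read from a ResultMessage is deliberately left out.

```python
from dataclasses import dataclass

CONTEXT_WINDOW = 200_000   # tokens
REBUILD_FRACTION = 0.75    # flag for rebuild at 75% of the window


@dataclass
class SessionTokenTracker:
    threshold: int = int(CONTEXT_WINDOW * REBUILD_FRACTION)  # 150_000
    cumulative_input: int = 0

    def record(self, input_tokens: int) -> None:
        """Add the input tokens reported for one completed query."""
        self.cumulative_input += input_tokens

    @property
    def needs_rebuild(self) -> bool:
        return self.cumulative_input > self.threshold
```

A caller would check `needs_rebuild` after each result and, when it is set, start a fresh session whose context is re-injected from the praetorium.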
MCP Tool Scoping
The Memoria system uses two different MCP server configurations:
Centurio tools are scoped via Python closures — the centurio_name is captured at server creation time, making it impossible for a centurio to access another's private data even if it crafts the right tool arguments.
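The closure technique can be shown without any MCP machinery. In this sketch the store is a plain dictionary and the tool names are hypothetical; the point is that the tools accept no owner argument at all.

```python
from typing import Callable, Optional


def make_memoria_tools(
    store: dict[str, dict[str, str]], centurio_name: str
) -> dict[str, Callable[..., object]]:
    """Build tools bound to one centurio; the name is captured by closure."""

    def read_memory(key: str) -> Optional[str]:
        # No owner parameter: the caller cannot name another centurio.
        return store.get(centurio_name, {}).get(key)

    def write_memory(key: str, value: str) -> None:
        store.setdefault(centurio_name, {})[key] = value

    return {"read_memory": read_memory, "write_memory": write_memory}
```

Because the owner is fixed when the tools are created, no combination of tool arguments can redirect a read or write to another centurio's namespace.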
Configuration Separation
| Source | Contains | Committed? |
|---|---|---|
| legio.toml | Settings (model, timeouts, caesar ID) | ✅ Yes |
| .env | Secrets (API keys, TOTP secret) | ❌ No |
| blueprints/ | Templates for centurio creation | ✅ Yes |
| castra/ | Runtime state (prompts, data, db) | Partial |
Config is loaded once into a frozen LegioConfig dataclass and passed through constructors. No global state, no mutable singletons.