
Design Principles

The design philosophy behind Legio, explaining the reasoning behind key architectural decisions.

Prompt Over Code

If behavior can be achieved through agent prompting, do not hard-code it.

This is the foundational principle. Code handles three things:

  1. Infrastructure — Telegram integration, SQLite, filesystem I/O
  2. Data models — Nuntius, Centurio, config schemas
  3. Orchestration plumbing — routing, session management, MCP tool wiring

Everything else belongs in prompts and edicta:

  • How the Legatus routes messages → prompt instructions
  • Response style and tone → prompt instructions
  • Domain-specific rules → edicta (standing orders)
  • Centurio specialization → per-centurio prompt.md

Why? Prompts can be changed at runtime without redeployment. A centurio's behavior is redefined by editing a markdown file, not by writing and testing new code.
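As a rough illustration of why this works, a dispatcher can re-read a centurio's prompt.md on every dispatch, so an edit takes effect on the next message. The directory layout and the function name `load_prompt` below are assumptions made for the example, not Legio's actual code.

```python
import tempfile
from pathlib import Path


def load_prompt(castra_dir: Path, centurio_name: str) -> str:
    """Read the prompt fresh from disk; no caching, so edits apply immediately."""
    # Hypothetical layout: <castra_dir>/<centurio_name>/prompt.md
    return (castra_dir / centurio_name / "prompt.md").read_text(encoding="utf-8")


# Usage: editing the markdown file changes behavior without touching code.
with tempfile.TemporaryDirectory() as tmp:
    prompt_file = Path(tmp) / "scriba" / "prompt.md"
    prompt_file.parent.mkdir(parents=True)
    prompt_file.write_text("You summarize documents.", encoding="utf-8")
    first = load_prompt(Path(tmp), "scriba")
    prompt_file.write_text("You translate documents.", encoding="utf-8")
    second = load_prompt(Path(tmp), "scriba")
```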

Flat Core

One domain concept per module:

| Module | Lines | Concept |
|---|---|---|
| nuntius.py | ~50 | Immutable message type |
| centurio.py | ~80 | Agent identity and validation |
| config.py | ~120 | Configuration loading |
| errors.py | ~35 | Exception hierarchy (7 classes) |
| legatus.py | ~465 | Orchestrator + SDK agent |
| session.py | ~390 | SDK session lifecycle |
| praetorium.py | ~170 | SQLite message bus |
| rendering.py | ~115 | XML/template rendering |
| auctoritas.py | ~130 | TOTP authorization store |
| totp.py | ~60 | TOTP verification |

Sub-packages exist only for infrastructure that needs multiple files:

  • telegram/bot.py, commands.py, utils.py, markdown_render.py
  • memoria/store.py, tools.py

Why? Flat structure minimizes import depth and makes finding code trivial. You never wonder which package a concept lives in.

File Size Discipline

Every production module must have ≤350 pure code lines (blanks, comments, docstrings excluded). Split proactively at 300 lines. Test files are exempt.

The limit is enforced by scripts/check_file_length.py, which counts only executable lines.

Why? Large files signal that a module handles too many concerns. The 350-line limit forces decomposition before complexity accumulates.
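One way to count "pure code lines" is sketched below. It is not the contents of scripts/check_file_length.py; the function name `count_code_lines` and the approach (tokenize to find lines with real tokens, ast to subtract docstrings) are assumptions chosen for illustration.

```python
import ast
import io
import tokenize

# Token types that do not make a line count as code.
_NON_CODE = {
    tokenize.COMMENT,
    tokenize.NL,
    tokenize.NEWLINE,
    tokenize.INDENT,
    tokenize.DEDENT,
    tokenize.ENDMARKER,
}


def count_code_lines(source: str) -> int:
    """Count lines holding code, excluding blanks, comments, and docstrings."""
    code_lines: set[int] = set()
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type not in _NON_CODE:
            # Multi-line tokens (e.g. triple-quoted strings) span several lines.
            code_lines.update(range(tok.start[0], tok.end[0] + 1))

    # Docstrings are string tokens, so remove the lines they occupy.
    scopes = (ast.Module, ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, scopes) and node.body:
            first = node.body[0]
            if (
                isinstance(first, ast.Expr)
                and isinstance(first.value, ast.Constant)
                and isinstance(first.value.value, str)
            ):
                code_lines.difference_update(range(first.lineno, first.end_lineno + 1))
    return len(code_lines)


SAMPLE = '''"""Module docstring."""

# a comment
import os


def f(x):
    """Doc.

    More doc.
    """
    # inline comment
    return x + 1  # trailing
'''

sample_count = count_code_lines(SAMPLE)  # import, def, and return lines
```

A check script built on this idea would compare the count against 350 for each production module and exit non-zero on a violation.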

Dependencies Flow Downward

No circular imports. Infrastructure never leaks into domain code. The domain layer has zero knowledge of Telegram.

Why? Directional dependencies make testing simple (mock downward, never upward) and ensure the core logic is portable — you could swap Telegram for Slack without touching domain code.
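A rule like this can be checked mechanically. The sketch below scans a module's source for imports from forbidden packages; the function name `forbidden_imports` and the package name `legio` are assumptions for the example, and nothing here is confirmed to exist in the project.

```python
import ast


def forbidden_imports(source: str, forbidden_prefixes: tuple[str, ...]) -> list[str]:
    """Return imported module names that fall under any forbidden prefix."""
    found: list[str] = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom):
            # For relative imports, node.module holds the part after the dots.
            names = [node.module or ""]
        else:
            continue
        for name in names:
            if any(name == p or name.startswith(p + ".") for p in forbidden_prefixes):
                found.append(name)
    return found


# Hypothetical domain module that wrongly reaches into the Telegram layer.
DOMAIN_SOURCE = "import os\nfrom legio.telegram import bot\nimport legio.telegram.utils\n"
violations = forbidden_imports(DOMAIN_SOURCE, ("legio.telegram", "telegram"))
```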

Domain Exceptions

All errors inherit from a single base class defined in errors.py.

Rules:

  • Use domain exceptions (CenturioNotFound), never stdlib (KeyError)
  • Catch at boundaries (Telegram handler), propagate in domain code
  • Never swallow exceptions, never leak internal details in error messages
  • Generic error messages at the Telegram boundary prevent information disclosure
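The rules above can be sketched as follows. Only CenturioNotFound is named in this document; the base class name `LegioError`, the helper functions, and the wording of the generic message are assumptions for illustration.

```python
import logging

logger = logging.getLogger("legio")


class LegioError(Exception):
    """Hypothetical base class for all domain errors."""


class CenturioNotFound(LegioError):
    """Raised when a centurio name is not registered."""


def get_prompt(registry: dict[str, str], name: str) -> str:
    # Domain code raises a domain exception instead of leaking KeyError.
    try:
        return registry[name]
    except KeyError:
        raise CenturioNotFound(f"unknown centurio: {name!r}") from None


def handle_request(registry: dict[str, str], name: str) -> str:
    # Boundary: keep the detail in the logs, show the user a generic message.
    try:
        return get_prompt(registry, name)
    except LegioError:
        logger.exception("request failed")
        return "Something went wrong."
```

Because every domain error shares one base, the boundary needs a single `except` clause, and unexpected non-domain exceptions are not silently swallowed by it.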

Immutable Messages

Nuntius is a frozen dataclass — once created, it cannot be modified. Every message gets a UUID4, a UTC timestamp, and explicit audience routing.

Why? Immutability prevents a class of bugs where message state is accidentally mutated after being posted to the praetorium. It also simplifies concurrent dispatch — multiple centuriones can safely reference the same nuntius.
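A minimal sketch of such a message type, assuming hypothetical field names (`sender`, `audience`, `text`, `id`, `timestamp`); the real Nuntius fields may differ.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from uuid import UUID, uuid4


@dataclass(frozen=True)
class Nuntius:
    sender: str
    audience: tuple[str, ...]  # explicit routing; a tuple keeps it immutable too
    text: str
    id: UUID = field(default_factory=uuid4)
    timestamp: datetime = field(default_factory=lambda: datetime.now(timezone.utc))


message = Nuntius(sender="caesar", audience=("legatus",), text="Report status.")

try:
    message.text = "tampered"  # type: ignore[misc]
    mutated = True
except AttributeError:  # dataclasses.FrozenInstanceError subclasses AttributeError
    mutated = False
```

Using a tuple rather than a list for the audience matters: `frozen=True` only blocks attribute assignment, so a mutable field could still be changed in place.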

Security in Depth

Security is not a layer — it's woven into every boundary:

| Boundary | Control | Module |
|---|---|---|
| Telegram entry | User ID + private chat filter | bot.py |
| Destructive actions | TOTP with timing-safe comparison | totp.py, auctoritas.py |
| Centurio names | Regex ^[a-z][a-z0-9_-]*$ + reserved names | centurio.py |
| Memoria names | Regex ^[a-z0-9][a-z0-9_-]*$ | memoria/store.py |
| XML injection | escape() + quoteattr() on all context | rendering.py |
| HTML injection | HTMLRenderer(escape=True) on all output | markdown_render.py |
| Symlink attacks | path.is_symlink() rejection | memoria/store.py |
| Raw HTML in LLM output | Escaped, never passed through | markdown_render.py |
| Error messages | Generic at boundary, detailed in logs | telegram/utils.py |
| OTP replay | TTL=120s, max 3 attempts, auto-cleanup | auctoritas.py |

Every defensive measure has a # SECURITY: prefixed inline comment explaining what attack it prevents.
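Three of these controls are sketched below using only the standard library. The regex is the one listed for centurio names; the reserved-name set, the function names, and the `<nuntius>` element are assumptions, and `escape()`/`quoteattr()` are taken here to be the ones from `xml.sax.saxutils`.

```python
import hmac
import re
from xml.sax.saxutils import escape, quoteattr

CENTURIO_NAME = re.compile(r"^[a-z][a-z0-9_-]*$")
RESERVED_NAMES = frozenset({"legatus", "caesar"})  # hypothetical reserved set


def is_valid_centurio_name(name: str) -> bool:
    # SECURITY: fullmatch rejects a trailing newline that "$" alone would allow,
    # which blocks path traversal and injection through crafted names.
    return CENTURIO_NAME.fullmatch(name) is not None and name not in RESERVED_NAMES


def codes_match(submitted: str, expected: str) -> bool:
    # SECURITY: constant-time comparison prevents timing side channels on OTPs.
    return hmac.compare_digest(submitted.encode(), expected.encode())


def render_nuntius(sender: str, text: str) -> str:
    # SECURITY: escape text and quote attributes so message content cannot
    # break out of the XML structure injected into the prompt.
    return f"<nuntius sender={quoteattr(sender)}>{escape(text)}</nuntius>"


rendered = render_nuntius('a"b', "<x>&")
```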

Dual-Session Architecture

The system maintains two layers of conversation state:

  1. SDK Conversation Thread (volatile) — each ClaudeSDKClient holds an in-memory conversation. Lost on restart.

  2. Praetorium (persistent) — SQLite stores all nuntii with visibility rules. Survives restarts.

On a fresh SDK session (startup, rebuild, idle timeout), the Legatus bootstraps context by injecting praetorium history as XML into the first query. This gives the LLM conversational continuity without relying on the volatile SDK state.

Why? The Claude Agent SDK doesn't persist conversation state. Rather than fighting this, Legio embraces it — the praetorium is the source of truth, and SDK sessions are ephemeral workers that receive context on demand.
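The bootstrap step can be sketched with an in-memory SQLite database standing in for the praetorium. The table schema, the function names, and the XML element names are assumptions for illustration, and visibility rules are omitted.

```python
import sqlite3
from xml.sax.saxutils import escape, quoteattr


def open_praetorium(path: str = ":memory:") -> sqlite3.Connection:
    db = sqlite3.connect(path)
    # Hypothetical schema: one row per nuntius.
    db.execute("CREATE TABLE IF NOT EXISTS nuntii (ts REAL, sender TEXT, body TEXT)")
    return db


def post(db: sqlite3.Connection, ts: float, sender: str, body: str) -> None:
    db.execute("INSERT INTO nuntii VALUES (?, ?, ?)", (ts, sender, body))
    db.commit()


def bootstrap_query(db: sqlite3.Connection, new_message: str) -> str:
    """Build the first query of a fresh session: stored history, then the message."""
    rows = db.execute("SELECT sender, body FROM nuntii ORDER BY ts").fetchall()
    history = "\n".join(
        f"  <nuntius sender={quoteattr(sender)}>{escape(body)}</nuntius>"
        for sender, body in rows
    )
    return f"<history>\n{history}\n</history>\n{new_message}"


db = open_praetorium()
post(db, 1.0, "caesar", "Draft the report.")
post(db, 2.0, "scriba", "Draft ready <v1>.")
first_query = bootstrap_query(db, "Any updates?")
```

Pointing `open_praetorium` at a file path instead of `:memory:` is what would let the history outlive a restart.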

Token Conservation

Three mechanisms keep token usage efficient:

History Deduplication

Each centurio session tracks last_injected_ts — the timestamp of the most recent nuntius injected. Subsequent dispatches only inject nuntii newer than this timestamp.

First dispatch gets full history. Second dispatch gets only what's new.
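A minimal sketch of this bookkeeping. The field name `last_injected_ts` comes from the description above; the `Entry` and `CenturioSession` classes and the function name are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    """Stand-in for a nuntius: only the fields this sketch needs."""
    timestamp: float
    text: str


@dataclass
class CenturioSession:
    last_injected_ts: float = float("-inf")  # nothing injected yet


def nuntii_to_inject(history: list[Entry], session: CenturioSession) -> list[Entry]:
    """Return only entries newer than the last injection and advance the marker."""
    fresh = [e for e in history if e.timestamp > session.last_injected_ts]
    if fresh:
        session.last_injected_ts = max(e.timestamp for e in fresh)
    return fresh


history = [Entry(1.0, "a"), Entry(2.0, "b")]
session = CenturioSession()
first_batch = nuntii_to_inject(history, session)   # full history
history.append(Entry(3.0, "c"))
second_batch = nuntii_to_inject(history, session)  # only the new entry
third_batch = nuntii_to_inject(history, session)   # nothing new
```

The strict comparison means two nuntii with an identical timestamp could be split across the marker and one of them skipped; a real implementation would need timestamps fine enough, or a tie-breaker, to rule that out.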

Session Lifecycle Management

An idle reaper task runs every 60 seconds and disconnects centurio sessions that have been inactive beyond a configurable timeout (default: 30 minutes). This frees resources without losing context — the praetorium always has the full history for re-injection.
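A reaper along these lines might look like the sketch below. The 60-second interval and 30-minute default come from the description above; the function names and the dictionary of last-activity times are assumptions, and a real reaper would also disconnect the underlying SDK client.

```python
import asyncio
import time

REAP_INTERVAL_S = 60
IDLE_TIMEOUT_S = 30 * 60


def reap_idle(
    last_active: dict[str, float], now: float, timeout: float = IDLE_TIMEOUT_S
) -> list[str]:
    """Remove sessions idle for longer than `timeout`; return their names."""
    expired = [name for name, ts in last_active.items() if now - ts > timeout]
    for name in expired:
        del last_active[name]  # context survives in the praetorium
    return expired


async def reaper_loop(
    last_active: dict[str, float], interval: float = REAP_INTERVAL_S
) -> None:
    """Background task: wake up every `interval` seconds and reap idle sessions."""
    while True:
        await asyncio.sleep(interval)
        reap_idle(last_active, time.monotonic())


sessions = {"scriba": 0.0, "medicus": 1000.0}
reaped = reap_idle(sessions, now=1900.0)
```

Keeping the reaping decision in a plain function makes it testable without running the event loop; the loop itself would be started with `asyncio.create_task(reaper_loop(...))`.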

Token Threshold Rebuild

SessionTokenTracker accumulates input tokens from each ResultMessage. When cumulative input exceeds ~150K tokens (75% of the 200K context window), the session is flagged for rebuild rather than letting performance degrade from an overfull context.
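The threshold arithmetic is 0.75 × 200,000 = 150,000 tokens. A sketch of a tracker with that behavior follows; the class name matches the one mentioned above, but its methods and attributes here are assumptions, not the real interface.

```python
class SessionTokenTracker:
    """Accumulates input tokens and flags the session once a threshold is passed."""

    def __init__(self, context_window: int = 200_000, threshold_ratio: float = 0.75):
        self.limit = int(context_window * threshold_ratio)  # 150_000 by default
        self.input_tokens = 0

    def record(self, input_tokens: int) -> None:
        # Called once per result with that turn's input token count.
        self.input_tokens += input_tokens

    @property
    def needs_rebuild(self) -> bool:
        return self.input_tokens > self.limit


tracker = SessionTokenTracker()
tracker.record(100_000)
before = tracker.needs_rebuild
tracker.record(60_000)
after = tracker.needs_rebuild
```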

MCP Tool Scoping

The Memoria system uses two different MCP server configurations.

Centurio tools are scoped via Python closures — the centurio_name is captured at server creation time, making it impossible for a centurio to access another's private data even if it crafts the right tool arguments.
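The closure technique can be shown without any MCP machinery. In this sketch the factory name, the tool names, and the dictionary-backed store are all assumptions; the point is only that the tool signatures have no parameter through which another centurio's name could be passed.

```python
from typing import Callable, Optional


def make_memoria_tools(
    centurio_name: str, store: dict[tuple[str, str], str]
) -> dict[str, Callable]:
    """Build tools bound to one centurio; the name is captured, not an argument."""

    def read_note(key: str) -> Optional[str]:
        # centurio_name comes from the enclosing scope, fixed at creation time.
        return store.get((centurio_name, key))

    def write_note(key: str, value: str) -> None:
        store[(centurio_name, key)] = value

    return {"read_note": read_note, "write_note": write_note}


shared_store: dict[tuple[str, str], str] = {}
scriba_tools = make_memoria_tools("scriba", shared_store)
medicus_tools = make_memoria_tools("medicus", shared_store)

scriba_tools["write_note"]("plan", "private to scriba")
own_view = scriba_tools["read_note"]("plan")
other_view = medicus_tools["read_note"]("plan")
```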

Configuration Separation

| Source | Contains | Committed? |
|---|---|---|
| legio.toml | Settings (model, timeouts, caesar ID) | ✅ Yes |
| .env | Secrets (API keys, TOTP secret) | ❌ No |
| blueprints/ | Templates for centurio creation | ✅ Yes |
| castra/ | Runtime state (prompts, data, db) | Partial |

Config is loaded once into a frozen LegioConfig dataclass and passed through constructors. No global state, no mutable singletons.
