Sessions & Memory
How Legio manages agent sessions, persistent context, and the shared memory system.
Dual-Session Architecture
Legio maintains two layers of conversation state:
- SDK Conversation Thread (volatile): each ClaudeSDKClient holds an in-memory conversation. Lost on restart, idle timeout, or session rebuild.
- Praetorium (persistent): SQLite stores all nuntii with audience-based visibility. Survives restarts indefinitely.
The praetorium is the source of truth. SDK sessions are ephemeral workers that receive context on demand via history injection.
SDK Session Lifecycle
Each agent (Legatus and every centurio) runs as a ClaudeSDKClient — a subprocess communicating with the Claude API.
Session States
Session Creation
When a centurio is first dispatched to, _ensure_session():
- Reads the centurio's prompt.md and computes its SHA-256 hash
- Builds a scoped MCP server (memoria_<name>) via build_memoria_centurio_server()
- Creates ClaudeAgentOptions with model, system prompt, and MCP config
- Creates and connects a ClaudeSDKClient with permission_mode="bypassPermissions"
- Stores the session with prompt hash, token tracker, and last_injected_ts=None

    from dataclasses import dataclass

    @dataclass
    class _CenturioSession:
        client: ClaudeSDKClient
        tracker: SessionTokenTracker
        prompt_hash: str               # SHA-256 of prompt.md at creation
        last_active: float             # time.monotonic()
        last_injected_ts: str | None   # ISO timestamp for dedup

Session Dispatch
On each dispatch to a centurio:
- Check if session needs rebuilding (prompt hash mismatch or tokens exceeded)
- Build visible history from praetorium, deduplicated against last_injected_ts
- Format history as XML via format_history_xml()
- Send history_xml + "\n\n" + nuntius.text via client.query()
- Collect response by iterating client.receive_messages() (handles MessageParseError gracefully)
- Update token tracker and last_active timestamp
- Set centurio status to "idle" (or "error" on exception)
Parallel Dispatch
When Caesar @mentions multiple centuriones:

    tasks = [session_mgr.dispatch(name, ...) for name in names]
    results = await asyncio.gather(*tasks, return_exceptions=True)

Each failure returns "[name] An error occurred during processing.", so one centurio's error doesn't block others.
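The failure isolation described above can be reproduced in a few lines. This is a minimal sketch, not Legio's actual code: the dispatch() coroutine is a hypothetical stand-in for session_mgr.dispatch(), and converting exceptions into the error string after gather() returns is an assumption about where that mapping happens.

```python
import asyncio
from typing import List


async def dispatch(name: str, text: str) -> str:
    # Hypothetical stand-in for session_mgr.dispatch(); "brutus" always fails.
    if name == "brutus":
        raise RuntimeError("SDK connection lost")
    return f"[{name}] handled: {text}"


async def dispatch_all(names: List[str], text: str) -> List[str]:
    tasks = [dispatch(name, text) for name in names]
    # return_exceptions=True delivers exceptions as results instead of raising,
    # and results come back in the same order as the input tasks.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    replies = []
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            replies.append(f"[{name}] An error occurred during processing.")
        else:
            replies.append(result)
    return replies


replies = asyncio.run(dispatch_all(["vorenus", "brutus"], "review this"))
```

Because gather() preserves input order, pairing results with names via zip() is safe even when the coroutines finish out of order.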
Reset Triggers
A session is automatically rebuilt when:
| Trigger | Detection | Effect |
|---|---|---|
| Prompt changed | SHA-256 hash mismatch | Disconnect old client, create new one |
| Token threshold | cumulative_input > 150,000 | Disconnect old client, create new one |
| Idle timeout | time.monotonic() - last_active > timeout_secs | Background reaper disconnects |
| Manual reset | /reset <name> command | Immediate disconnect |
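The three automatic triggers in the table can be expressed as one check. The sketch below is illustrative only: SessionState, prompt_sha256(), and rebuild_reason() are hypothetical names, and folding the idle check into the same function is a simplification (in Legio the idle timeout is handled by the background reaper). The current time is passed in as a parameter so the logic is easy to test.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass
class SessionState:
    # Hypothetical subset of the fields a session tracks.
    prompt_hash: str
    cumulative_input: int
    last_active: float  # time.monotonic() value at last use


def prompt_sha256(prompt_text: str) -> str:
    return hashlib.sha256(prompt_text.encode("utf-8")).hexdigest()


def rebuild_reason(
    state: SessionState,
    current_prompt: str,
    now: float,
    max_input_tokens: int = 150_000,
    timeout_secs: float = 30 * 60,
) -> Optional[str]:
    """Return the first trigger that applies, or None if the session is healthy."""
    if prompt_sha256(current_prompt) != state.prompt_hash:
        return "prompt_changed"
    if state.cumulative_input > max_input_tokens:
        return "token_threshold"
    if now - state.last_active > timeout_secs:
        return "idle_timeout"
    return None
```

Note that the token comparison is strict: a session at exactly 150,000 cumulative input tokens is not yet rebuilt.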
Idle Reaping
A background asyncio.Task runs every 60 seconds:

    async def _reap_loop() -> None:
        while True:
            await asyncio.sleep(60)
            cleaned = await self.cleanup_idle()
            if cleaned:
                logger.info("Reaped %d idle sessions", cleaned)

Default idle timeout: 30 minutes (configurable via session_idle_timeout_minutes).
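The body of cleanup_idle() is not shown in this document. A plausible synchronous sketch, assuming sessions are kept in a dict keyed by centurio name and that the function returns the number of sessions removed (which matches the %d in the log line above), might look like this:

```python
import time
from typing import Dict, Optional


def cleanup_idle(
    last_active: Dict[str, float],
    timeout_secs: float,
    now: Optional[float] = None,
) -> int:
    """Drop sessions idle for longer than timeout_secs; return how many were dropped."""
    now = time.monotonic() if now is None else now
    expired = [name for name, ts in last_active.items() if now - ts > timeout_secs]
    for name in expired:
        # The real manager would also disconnect the SDK client here.
        del last_active[name]
    return len(expired)


sessions = {"vorenus": 0.0, "brutus": 1500.0}
reaped = cleanup_idle(sessions, timeout_secs=30 * 60, now=1900.0)
```

Using time.monotonic() rather than wall-clock time keeps the idle calculation immune to system clock adjustments.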
Shutdown
On SIGTERM or SIGINT:
- Cancel idle reaper task
- Disconnect all centurio SDK clients
- Disconnect Legatus SDK client
- Close praetorium database
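The ordering above can be sketched with stand-ins. FakeClient, the close_db callback, and the shutdown() signature are all hypothetical; the sketch only demonstrates the sequence (reaper first, database last), not how Legio wires it to signals.

```python
import asyncio
import contextlib
from typing import Callable, List


class FakeClient:
    # Hypothetical stand-in for ClaudeSDKClient; records the call order.
    def __init__(self, name: str, log: List[str]) -> None:
        self.name = name
        self.log = log

    async def disconnect(self) -> None:
        self.log.append(f"disconnect:{self.name}")


async def shutdown(
    reaper: asyncio.Task,
    centurio_clients: List[FakeClient],
    legatus_client: FakeClient,
    close_db: Callable[[], None],
    log: List[str],
) -> None:
    # 1. Cancel the idle reaper and wait for the cancellation to land.
    reaper.cancel()
    with contextlib.suppress(asyncio.CancelledError):
        await reaper
    log.append("reaper_cancelled")
    # 2-3. Disconnect centurio clients, then the Legatus client.
    for client in centurio_clients:
        await client.disconnect()
    await legatus_client.disconnect()
    # 4. Close the praetorium database last, after nothing can write to it.
    close_db()


async def main() -> List[str]:
    log: List[str] = []
    reaper = asyncio.create_task(asyncio.sleep(3600))
    await shutdown(
        reaper,
        [FakeClient("vorenus", log)],
        FakeClient("legatus", log),
        lambda: log.append("db_closed"),
        log,
    )
    return log


shutdown_log = asyncio.run(main())
```

On Unix, a coroutine like this is commonly scheduled from loop.add_signal_handler() callbacks for SIGTERM and SIGINT; whether Legio does so is not stated here.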
Legatus Session
The Legatus has its own ClaudeSDKClient with additional rebuild triggers:
| Trigger | Detection |
|---|---|
| Roster hash change | SHA-256 of name:description pairs changes (centurio added/removed/edited) |
| Prompt mtime change | castra/legatus/prompt.md file modification time changes |
On rebuild, _client_is_fresh = True triggers history bootstrap on the next query.
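A roster hash over name:description pairs could be computed as follows. This is a sketch under assumptions: the function name, the newline separator, and sorting by name (so that ordering alone never forces a rebuild) are illustrative choices, not confirmed details of Legio.

```python
import hashlib
from typing import Dict


def roster_hash(roster: Dict[str, str]) -> str:
    # Sort by name so that dict ordering does not affect the hash (an assumption).
    pairs = [f"{name}:{description}" for name, description in sorted(roster.items())]
    return hashlib.sha256("\n".join(pairs).encode("utf-8")).hexdigest()


before = roster_hash({
    "vorenus": "Technology analysis specialist",
    "brutus": "Code review expert",
})
reordered = roster_hash({
    "brutus": "Code review expert",
    "vorenus": "Technology analysis specialist",
})
edited = roster_hash({
    "vorenus": "Technology analysis specialist",
    "brutus": "Senior code review expert",
})
```

Here before and reordered are equal, while edited differs, so only a real change to the roster would trigger a Legatus rebuild.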
System Prompt Construction
The Legatus system prompt is built from:
- Base prompt: castra/legatus/prompt.md
- Roster XML, listing all centurio names and descriptions (no status):

    <centuriones>
      <centurio name="vorenus">Technology analysis specialist</centurio>
      <centurio name="brutus">Code review expert</centurio>
    </centuriones>

Status is injected per-call in the user message (not the system prompt) to prevent staleness:

    <centurio_status>
      <centurio name="vorenus" status="idle"/>
      <centurio name="brutus" status="working"/>
    </centurio_status>

Legatus MCP Configuration

    legatus_mcp = create_sdk_mcp_server(
        name="legatus_tools",
        version="1.0.0",
        tools=legatus_tools + memoria_full.tools,  # 6 + 11 = 17 tools
    )
    options = ClaudeAgentOptions(
        system_prompt=system_prompt,
        mcp_servers={"legatus_tools": legatus_mcp},
        model=config.model,
        permission_mode="bypassPermissions",
    )

History Bootstrap
When a fresh SDK session is created (startup, rebuild, idle timeout), the first query is enriched with praetorium history:

    <praetorium recent="true" viewer="legatus">
      <nuntius id="abc-123" sender="caesar" timestamp="2026-02-15T12:00:00+00:00">
        Previous message text
      </nuntius>
      <nuntius id="def-456" sender="legatus" timestamp="2026-02-15T12:01:00+00:00">
        Previous response text
      </nuntius>
    </praetorium>
    <context_notice>Session restored from praetorium.
    Ask Caesar for clarification if context is unclear.</context_notice>
    <centurio_status>
      <centurio name="vorenus" status="idle"/>
    </centurio_status>
    The actual user message here...

The <context_notice> signals to the LLM that this is a restored session, not a fresh conversation.
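To make the shape of a bootstrapped query concrete, here is a rough sketch of how such a message could be assembled. The function build_bootstrap_query() and its dict-based inputs are hypothetical; the real format_history_xml() may differ. The sketch escapes message text and attribute values so that user content cannot break the XML structure.

```python
from typing import Dict, List
from xml.sax.saxutils import escape, quoteattr

CONTEXT_NOTICE = (
    "<context_notice>Session restored from praetorium.\n"
    "Ask Caesar for clarification if context is unclear.</context_notice>"
)


def build_bootstrap_query(
    viewer: str,
    history: List[Dict[str, str]],
    statuses: Dict[str, str],
    user_text: str,
) -> str:
    lines = [f'<praetorium recent="true" viewer={quoteattr(viewer)}>']
    for n in history:
        lines.append(
            f"  <nuntius id={quoteattr(n['id'])} sender={quoteattr(n['sender'])} "
            f"timestamp={quoteattr(n['timestamp'])}>"
        )
        lines.append("    " + escape(n["text"]))  # escape &, <, > in message bodies
        lines.append("  </nuntius>")
    lines.append("</praetorium>")
    lines.append(CONTEXT_NOTICE)
    lines.append("<centurio_status>")
    for name, status in statuses.items():
        lines.append(f"  <centurio name={quoteattr(name)} status={quoteattr(status)}/>")
    lines.append("</centurio_status>")
    lines.append(user_text)  # the live message goes last, unmodified
    return "\n".join(lines)


query = build_bootstrap_query(
    viewer="legatus",
    history=[{
        "id": "abc-123",
        "sender": "caesar",
        "timestamp": "2026-02-15T12:00:00+00:00",
        "text": "Is a < b & c?",
    }],
    statuses={"vorenus": "idle"},
    user_text="The actual user message here...",
)
```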
History Deduplication
Each centurio session tracks last_injected_ts — the ISO timestamp of the most recent nuntius injected on the previous dispatch.
First dispatch: full praetorium history (up to history_window nuntii).
Subsequent dispatches: only nuntii newer than last_injected_ts.

    visible = await praetorium.get_visible_nuntii(name, limit=config.history_window)
    if session.last_injected_ts is not None:
        visible = [n for n in visible
                   if n.timestamp.isoformat() > session.last_injected_ts]

Token Tracking
SessionTokenTracker accumulates input tokens from each SDK ResultMessage:

    class SessionTokenTracker:
        max_input_tokens: int = 150_000  # ~75% of 200K context window
        cumulative_input: int = 0

        def update(self, result: ResultMessage) -> None:
            if result.usage:
                self.cumulative_input += result.usage.get("input_tokens", 0)

        def should_reset(self) -> bool:
            return self.cumulative_input > self.max_input_tokens

When cumulative input exceeds 150K tokens, the session is flagged for rebuild. This prevents degraded performance from overly long conversation threads.
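A short worked example shows when the flag flips. FakeResult is a hypothetical stand-in for the SDK's ResultMessage (only the usage dict is modelled), and the 40,000-token turns are made-up numbers chosen to cross the threshold on the fourth turn.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class FakeResult:
    # Hypothetical stand-in for ResultMessage; only `usage` is modelled.
    usage: Optional[Dict[str, Any]] = None


class SessionTokenTracker:
    def __init__(self, max_input_tokens: int = 150_000) -> None:
        self.max_input_tokens = max_input_tokens
        self.cumulative_input = 0

    def update(self, result: FakeResult) -> None:
        if result.usage:
            self.cumulative_input += result.usage.get("input_tokens", 0)

    def should_reset(self) -> bool:
        return self.cumulative_input > self.max_input_tokens


tracker = SessionTokenTracker()
flags: List[bool] = []
for _ in range(4):
    tracker.update(FakeResult(usage={"input_tokens": 40_000}))
    flags.append(tracker.should_reset())

# A result without usage data leaves the counter unchanged.
tracker.update(FakeResult())
```

After three turns the total is 120,000 (no reset); the fourth turn brings it to 160,000, which exceeds the 150,000 limit and flags the session.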
Response Collection
collect_response() handles SDK message iteration with graceful error recovery.
Status messages are extracted from SDK intermediate messages:
| Block Type | Status Display |
|---|---|
| ToolUseBlock | ⏳ Reading edicta... or ⏳ Dispatching to vorenus... |
| ThinkingBlock | ⏳ Thinking... |
| TextBlock | ⏳ First 80 chars of preview... |
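The mapping in the table could be implemented along these lines. The three block classes below are simplified local stand-ins for the SDK types of the same name, the "dispatch" tool name and its "name" argument are hypothetical, and appending "..." after the 80-character preview is one reading of the table.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional


@dataclass
class ToolUseBlock:
    name: str
    input: Dict[str, Any] = field(default_factory=dict)


@dataclass
class ThinkingBlock:
    thinking: str = ""


@dataclass
class TextBlock:
    text: str = ""


def status_for(block: object) -> Optional[str]:
    """Translate one intermediate content block into a status line."""
    if isinstance(block, ToolUseBlock):
        if block.name == "dispatch":  # hypothetical tool name
            return f"⏳ Dispatching to {block.input.get('name', '?')}..."
        if block.name in ("list_edicta", "read_edictum"):
            return "⏳ Reading edicta..."
        return f"⏳ Using {block.name}..."
    if isinstance(block, ThinkingBlock):
        return "⏳ Thinking..."
    if isinstance(block, TextBlock):
        return f"⏳ {block.text[:80]}..."
    return None  # unknown block types produce no status update
```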
Praetorium (Message Bus)
The praetorium is the persistent backbone — a SQLite database that stores every nuntius exchanged in the system.
Schema

    CREATE TABLE nuntii (
        id        TEXT PRIMARY KEY,  -- UUID4
        sender    TEXT NOT NULL,     -- "caesar", "legatus", or centurio name
        text      TEXT NOT NULL,     -- message body
        audience  TEXT NOT NULL,     -- JSON array of recipient names
        timestamp TEXT NOT NULL,     -- ISO 8601 UTC
        reply_to  TEXT,              -- UUID of parent nuntius
        FOREIGN KEY (reply_to) REFERENCES nuntii(id)
    );
    CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
    CREATE INDEX idx_nuntii_sender ON nuntii(sender);

PRAGMA Settings
| Setting | Value | Purpose |
|---|---|---|
| journal_mode | WAL | Concurrent read/write performance |
| foreign_keys | ON | Enforce reply_to referential integrity |
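The effect of both settings can be checked with Python's standard sqlite3 module. This is a sketch, not Legio's connection code: open_praetorium() is a hypothetical helper and the database lives in a temporary directory. Two details are worth knowing: WAL applies only to file-backed databases, and foreign_keys must be enabled on every new connection.

```python
import os
import sqlite3
import tempfile

SCHEMA = """
CREATE TABLE IF NOT EXISTS nuntii (
    id        TEXT PRIMARY KEY,
    sender    TEXT NOT NULL,
    text      TEXT NOT NULL,
    audience  TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    reply_to  TEXT,
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);
"""


def open_praetorium(path: str) -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")   # persists in the database file
    conn.execute("PRAGMA foreign_keys=ON")    # per-connection, off by default
    conn.executescript(SCHEMA)
    return conn


db_path = os.path.join(tempfile.mkdtemp(), "praetorium.db")
conn = open_praetorium(db_path)
journal_mode = conn.execute("PRAGMA journal_mode").fetchone()[0]

# A reply_to that points at a non-existent nuntius should be rejected.
try:
    conn.execute(
        "INSERT INTO nuntii VALUES (?, ?, ?, ?, ?, ?)",
        ("n1", "caesar", "hi", '["all"]', "2026-02-15T12:00:00+00:00", "missing-parent"),
    )
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
```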
Visibility Rules
- Caesar and Legatus see all nuntii (god-view)
- Centuriones see only nuntii where their name appears in audience or audience is ["all"]
- Visibility is enforced at the application layer with exact-match JSON parsing (no SQL LIKE, which avoids substring false positives)
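These rules reduce to a small predicate. The sketch below assumes the audience column holds a JSON array of strings, as the schema comments state; is_visible() and GOD_VIEW are hypothetical names.

```python
import json

GOD_VIEW = {"caesar", "legatus"}  # viewers that see every nuntius


def is_visible(viewer: str, audience_json: str) -> bool:
    if viewer in GOD_VIEW:
        return True
    audience = json.loads(audience_json)
    # Exact element match: "brut" does not match "brutus".
    return audience == ["all"] or viewer in audience


# A substring test, which is roughly what SQL LIKE '%brut%' would do,
# gives a false positive for a centurio named "brut".
substring_match = "brut" in '["brutus"]'
exact_match = is_visible("brut", '["brutus"]')
```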
Over-Fetch Heuristic
When filtering for centuriones, the praetorium fetches limit * 5 rows from the database, then filters in Python. This compensates for interleaved audiences where many rows may not be visible to the requesting centurio.
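The heuristic can be sketched against an in-memory table. The reduced three-column schema, the function name get_visible_ids(), and the newest-first fetch with oldest-first return order are assumptions made for illustration.

```python
import json
import sqlite3
from typing import List

OVERFETCH_FACTOR = 5  # the multiplier described above


def get_visible_ids(conn: sqlite3.Connection, viewer: str, limit: int) -> List[str]:
    # Fetch limit * 5 of the newest rows, then filter by audience in Python.
    rows = conn.execute(
        "SELECT id, audience FROM nuntii ORDER BY timestamp DESC LIMIT ?",
        (limit * OVERFETCH_FACTOR,),
    ).fetchall()
    visible = []
    for row_id, audience_json in rows:
        audience = json.loads(audience_json)
        if audience == ["all"] or viewer in audience:
            visible.append(row_id)
    # Keep the newest `limit` matches and return them oldest-first.
    return list(reversed(visible[:limit]))


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nuntii (id TEXT PRIMARY KEY, audience TEXT NOT NULL, timestamp TEXT NOT NULL)"
)
conn.executemany(
    "INSERT INTO nuntii VALUES (?, ?, ?)",
    [
        (
            f"n{i}",
            '["brutus"]' if i % 3 == 0 else '["vorenus"]',
            f"2026-02-15T12:{i:02d}:00+00:00",
        )
        for i in range(10)
    ],
)
```

The trade-off is that a centurio whose nuntii are sparser than one in five among recent rows can receive fewer than limit results, since rows beyond the over-fetched window are never examined.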
Memoria (Persistent Storage)
Three filesystem-based storage layers, all using XML format:
Layer Comparison
| Layer | Scope | Mutability | Location | Access |
|---|---|---|---|---|
| Edicta | Global | Read/Write/Revoke | castra/edicta/*.xml | Legatus: full, Centurio: read-only |
| Acta | Global | Read/Write | castra/acta/*.xml | All agents: full |
| Commentarii | Per-centurio | Append-only | castra/centuriones/<name>/commentarii/*.xml | Owner: read/write, Legatus: read-all |
MCP Tool Access
| Tool | Centurio Server | Legatus Server |
|---|---|---|
| list_edicta | ✅ read | ✅ read |
| read_edictum | ✅ read | ✅ read |
| publish_edictum | ❌ | ✅ write |
| revoke_edictum | ❌ | ✅ write |
| list_acta | ✅ read | ✅ read |
| read_actum | ✅ read | ✅ read |
| publish_actum | ✅ write (author auto-set) | ✅ write |
| list_commentarii | 🔒 own only | ✅ all (requires centurio_name arg) |
| read_commentarium | 🔒 own only | ✅ all (requires centurio_name arg) |
| write_commentarium | 🔒 own only | ✅ all (requires centurio_name arg) |
Scoping Mechanism
Centurio MCP tools are scoped via Python closures:

    def build_memoria_centurio_server(store: MemoriaStore, centurio_name: str) -> MemoriaServer:
        # centurio_name captured in closures, so commentarii are auto-scoped
        # publish_actum author auto-set to centurio_name
        # No access to publish_edictum or revoke_edictum
        ...

This means a centurio cannot access another centurio's commentarii, even if it knows the name: the MCP tool signature doesn't accept a centurio_name parameter.
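The closure technique itself is easy to demonstrate in isolation. In this sketch MemoriaStore is a hypothetical in-memory stand-in for the filesystem-backed store, and build_centurio_tools() returns plain functions rather than a real MCP server; only the scoping idea carries over.

```python
from typing import Callable, Dict, List


class MemoriaStore:
    # Hypothetical in-memory stand-in for the filesystem-backed store.
    def __init__(self) -> None:
        self._commentarii: Dict[str, List[str]] = {}

    def append(self, owner: str, text: str) -> None:
        self._commentarii.setdefault(owner, []).append(text)

    def list_for(self, owner: str) -> List[str]:
        return list(self._commentarii.get(owner, []))


def build_centurio_tools(store: MemoriaStore, centurio_name: str) -> Dict[str, Callable]:
    # centurio_name is captured by the closures below; callers cannot override it.
    def write_commentarium(text: str) -> None:
        store.append(centurio_name, text)

    def list_commentarii() -> List[str]:
        return store.list_for(centurio_name)

    return {
        "write_commentarium": write_commentarium,
        "list_commentarii": list_commentarii,
    }


store = MemoriaStore()
vorenus_tools = build_centurio_tools(store, "vorenus")
brutus_tools = build_centurio_tools(store, "brutus")
vorenus_tools["write_commentarium"]("note A")

# Attempting to pass another centurio's name fails at the signature level.
try:
    brutus_tools["list_commentarii"](centurio_name="vorenus")
    override_rejected = False
except TypeError:
    override_rejected = True
```

The design choice is that scoping is enforced by what the tool signature can express rather than by a runtime permission check.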