
Sessions & Memory

How Legio manages agent sessions, persistent context, and the shared memory system.

Dual-Session Architecture

Legio maintains two layers of conversation state:

  1. SDK Conversation Thread (volatile) — each ClaudeSDKClient holds an in-memory conversation. Lost on restart, idle timeout, or session rebuild.
  2. Praetorium (persistent) — SQLite stores all nuntii with audience-based visibility. Survives restarts indefinitely.

The praetorium is the source of truth. SDK sessions are ephemeral workers that receive context on demand via history injection.

SDK Session Lifecycle

Each agent (Legatus and every centurio) runs as a ClaudeSDKClient — a subprocess communicating with the Claude API.

Session States

Session Creation

When a centurio receives its first dispatch, _ensure_session():

  1. Reads the centurio's prompt.md and computes its SHA-256 hash
  2. Builds a scoped MCP server (memoria_<name>) via build_memoria_centurio_server()
  3. Creates ClaudeAgentOptions with model, system prompt, and MCP config
  4. Creates and connects a ClaudeSDKClient with permission_mode="bypassPermissions"
  5. Stores the session with prompt hash, token tracker, and last_injected_ts=None
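python
from __future__ import annotations  # lets SessionTokenTracker be defined later

from dataclasses import dataclass

from claude_agent_sdk import ClaudeSDKClient

@dataclass
class _CenturioSession:
    client: ClaudeSDKClient
    tracker: SessionTokenTracker       # defined under Token Tracking
    prompt_hash: str                   # SHA-256 of prompt.md at creation
    last_active: float                 # time.monotonic()
    last_injected_ts: str | None       # ISO timestamp for dedup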
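Step 1 and the later prompt-change check both depend on hashing prompt.md. The following is a minimal sketch that assumes the hash is taken over the file's raw bytes and stored as a hex digest. `prompt_hash` and `prompt_changed` are hypothetical helper names, not Legio's actual functions.

```python
import hashlib
from pathlib import Path


def prompt_hash(prompt_path: Path) -> str:
    """Return the SHA-256 hex digest of a prompt file (hypothetical helper)."""
    return hashlib.sha256(prompt_path.read_bytes()).hexdigest()


def prompt_changed(stored_hash: str, prompt_path: Path) -> bool:
    """True when the file on disk no longer matches the hash stored at creation."""
    return prompt_hash(prompt_path) != stored_hash
```

Because the check hashes file contents instead of comparing modification times, touching or re-saving an unedited prompt.md does not force a rebuild.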

Session Dispatch

On each dispatch to a centurio:

  1. Check if session needs rebuilding (prompt hash mismatch or tokens exceeded)
  2. Build visible history from praetorium, deduplicated against last_injected_ts
  3. Format history as XML via format_history_xml()
  4. Send history_xml + "\n\n" + nuntius.text via client.query()
  5. Collect response by iterating client.receive_messages() (handles MessageParseError gracefully)
  6. Update token tracker and last_active timestamp
  7. Set centurio status to "idle" (or "error" on exception)
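Steps 4 through 7 wrap a status transition around the SDK round trip. The sketch below is hypothetical in several ways. An injected `run_query` coroutine replaces the SDK call. Statuses live in a plain dict. Token tracking (step 6) is omitted. It also assumes the "working" status is set while the query is in flight: that value appears in the status XML further down, but the docs do not say exactly when Legio sets it.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def dispatch_with_status(
    name: str,
    statuses: dict[str, str],
    run_query: Callable[[], Awaitable[str]],
) -> str:
    """Run one centurio query and record idle/error afterwards (sketch)."""
    statuses[name] = "working"  # assumption: set while the query is in flight
    try:
        reply = await run_query()
    except Exception:
        statuses[name] = "error"  # step 7: "error" on exception
        raise
    statuses[name] = "idle"  # step 7: "idle" on success
    return reply
```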

Parallel Dispatch

When Caesar @mentions multiple centuriones:

python
import asyncio

# one dispatch coroutine per @mentioned centurio; failures come back as values
tasks = [session_mgr.dispatch(name, ...) for name in names]
results = await asyncio.gather(*tasks, return_exceptions=True)

A failed centurio yields "[name] An error occurred during processing." in place of its reply, so one centurio's error does not block the others.
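Here is a self-contained version of the fan-out. A hypothetical `dispatch` callable stands in for `session_mgr.dispatch`, and the error text comes from the paragraph above. `asyncio.gather` returns results in the same order as its arguments, so zipping them back onto `names` is safe.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def dispatch_all(
    names: list[str],
    dispatch: Callable[[str], Awaitable[str]],
) -> dict[str, str]:
    """Fan out to every @mentioned centurio; one failure never cancels the rest."""
    results = await asyncio.gather(
        *(dispatch(name) for name in names), return_exceptions=True
    )
    replies: dict[str, str] = {}
    for name, result in zip(names, results):
        if isinstance(result, BaseException):
            # Fallback text the docs describe for a failed centurio.
            replies[name] = f"[{name}] An error occurred during processing."
        else:
            replies[name] = result
    return replies
```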

Reset Triggers

A session is reset when any of the following triggers fires:

Trigger | Detection | Effect
--- | --- | ---
Prompt changed | SHA-256 hash mismatch | Disconnect old client, create new one
Token threshold | cumulative_input > 150,000 | Disconnect old client, create new one
Idle timeout | time.monotonic() - last_active > timeout_secs | Background reaper disconnects
Manual reset | /reset <name> command | Immediate disconnect

Idle Reaping

A background asyncio.Task runs every 60 seconds:

python
async def _reap_loop() -> None:
    while True:
        await asyncio.sleep(60)
        cleaned = await self.cleanup_idle()
        if cleaned:
            logger.info("Reaped %d idle sessions", cleaned)

Default idle timeout: 30 minutes (configurable via session_idle_timeout_minutes).

Shutdown

On SIGTERM or SIGINT:

  1. Cancel idle reaper task
  2. Disconnect all centurio SDK clients
  3. Disconnect Legatus SDK client
  4. Close praetorium database
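The following sketch expresses that order as a coroutine. Injected disconnect callables stand in for the SDK clients and the database, and all names are hypothetical. Cancelling the reaper first prevents it from racing the teardown by disconnecting a session that is already being closed. Closing the praetorium last presumably ensures that no client is still writing when the database goes away.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def shutdown(
    reaper: asyncio.Task,
    centurio_disconnects: list[Callable[[], Awaitable[None]]],
    legatus_disconnect: Callable[[], Awaitable[None]],
    close_db: Callable[[], Awaitable[None]],
) -> None:
    """Tear down in the documented order (hypothetical helper)."""
    reaper.cancel()  # 1. stop the idle reaper
    try:
        await reaper
    except asyncio.CancelledError:
        pass
    for disconnect in centurio_disconnects:  # 2. centurio SDK clients
        await disconnect()
    await legatus_disconnect()  # 3. Legatus SDK client
    await close_db()  # 4. praetorium database
```

On Unix, one way to schedule this coroutine is a handler registered with `loop.add_signal_handler(signal.SIGTERM, ...)`. The docs do not say whether Legio does this.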

Legatus Session

The Legatus has its own ClaudeSDKClient with additional rebuild triggers:

Trigger | Detection
--- | ---
Roster hash change | SHA-256 of name:description pairs changes (centurio added/removed/edited)
Prompt mtime change | castra/legatus/prompt.md file modification time changes

On rebuild, _client_is_fresh is set to True, so the next query performs a history bootstrap.
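The roster trigger can be sketched as a digest over the pairs. The newline separator and the sort by name are assumptions. The docs only say that the hash covers `name:description` pairs. Sorting makes the hash independent of the order in which centurio directories are listed.

```python
import hashlib


def roster_hash(roster: dict[str, str]) -> str:
    """SHA-256 over name:description pairs (sketch; sorting is an assumption)."""
    lines = [f"{name}:{desc}" for name, desc in sorted(roster.items())]
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()
```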

System Prompt Construction

The Legatus system prompt is built from:

  1. Base prompt: castra/legatus/prompt.md
  2. Roster XML: all centurio names and descriptions (no status):
xml
<centuriones>
  <centurio name="vorenus">Technology analysis specialist</centurio>
  <centurio name="brutus">Code review expert</centurio>
</centuriones>

Status is injected per-call in the user message (not the system prompt) to prevent staleness:

xml
<centurio_status>
  <centurio name="vorenus" status="idle"/>
  <centurio name="brutus" status="working"/>
</centurio_status>
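Both blocks are straightforward to generate. The following sketch uses the standard library's `xml.sax.saxutils` for escaping. The function names are hypothetical, and whether Legio escapes descriptions this way is an assumption. Escaping matters because a description containing `<` or `&` would otherwise break the markup the model sees.

```python
from xml.sax.saxutils import escape, quoteattr


def roster_xml(roster: dict[str, str]) -> str:
    """<centuriones> block for the system prompt: names and descriptions, no status."""
    rows = [
        f"  <centurio name={quoteattr(name)}>{escape(desc)}</centurio>"
        for name, desc in roster.items()
    ]
    return "\n".join(["<centuriones>", *rows, "</centuriones>"])


def status_xml(statuses: dict[str, str]) -> str:
    """<centurio_status> block, rebuilt for every Legatus call."""
    rows = [
        f"  <centurio name={quoteattr(name)} status={quoteattr(status)}/>"
        for name, status in statuses.items()
    ]
    return "\n".join(["<centurio_status>", *rows, "</centurio_status>"])
```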

Legatus MCP Configuration

python
from claude_agent_sdk import ClaudeAgentOptions, create_sdk_mcp_server

legatus_mcp = create_sdk_mcp_server(
    name="legatus_tools",
    version="1.0.0",
    tools=legatus_tools + memoria_full.tools,  # 6 + 11 = 17 tools
)
options = ClaudeAgentOptions(
    system_prompt=system_prompt,
    mcp_servers={"legatus_tools": legatus_mcp},
    model=config.model,
    permission_mode="bypassPermissions",
)

History Bootstrap

When a fresh SDK session is created (startup, rebuild, idle timeout), the first query is enriched with praetorium history:

xml
<praetorium recent="true" viewer="legatus">
  <nuntius id="abc-123" sender="caesar" timestamp="2026-02-15T12:00:00+00:00">
    Previous message text
  </nuntius>
  <nuntius id="def-456" sender="legatus" timestamp="2026-02-15T12:01:00+00:00">
    Previous response text
  </nuntius>
</praetorium>

<context_notice>Session restored from praetorium.
Ask Caesar for clarification if context is unclear.</context_notice>

<centurio_status>
  <centurio name="vorenus" status="idle"/>
</centurio_status>

The actual user message here...

The <context_notice> signals to the LLM that this is a restored session, not a fresh conversation.
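`format_history_xml()` is named under Session Dispatch but its code is not shown. The following minimal sketch reproduces the `<praetorium>` layout above, assuming nuntii carry a timezone-aware UTC `datetime`. The `Nuntius` dataclass here is a hypothetical stand-in with only the fields the XML needs. Returning an empty string for an empty history is a guess.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Nuntius:
    """Minimal stand-in for a praetorium row (hypothetical)."""
    id: str
    sender: str
    text: str
    timestamp: datetime  # timezone-aware, UTC


def format_history_xml(nuntii: list[Nuntius], viewer: str) -> str:
    """Render nuntii in the <praetorium> layout (sketch)."""
    if not nuntii:
        return ""
    parts = [f'<praetorium recent="true" viewer={quoteattr(viewer)}>']
    for n in nuntii:
        parts.append(
            f"  <nuntius id={quoteattr(n.id)} sender={quoteattr(n.sender)} "
            f"timestamp={quoteattr(n.timestamp.isoformat())}>"
        )
        parts.append(f"    {escape(n.text)}")
        parts.append("  </nuntius>")
    parts.append("</praetorium>")
    return "\n".join(parts)
```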

History Deduplication

Each centurio session tracks last_injected_ts — the ISO timestamp of the most recent nuntius injected on the previous dispatch.

First dispatch: full praetorium history (up to history_window nuntii).

Subsequent dispatches: only nuntii newer than last_injected_ts.

python
visible = await praetorium.get_visible_nuntii(name, limit=config.history_window)
if session.last_injected_ts is not None:
    visible = [n for n in visible
               if n.timestamp.isoformat() > session.last_injected_ts]

Token Tracking

SessionTokenTracker accumulates input tokens from each SDK ResultMessage:

python
from dataclasses import dataclass

from claude_agent_sdk import ResultMessage

@dataclass
class SessionTokenTracker:
    max_input_tokens: int = 150_000   # ~75% of 200K context window
    cumulative_input: int = 0

    def update(self, result: ResultMessage) -> None:
        # usage is an optional dict on ResultMessage
        if result.usage:
            self.cumulative_input += result.usage.get("input_tokens", 0)

    def should_reset(self) -> bool:
        return self.cumulative_input > self.max_input_tokens

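When cumulative input exceeds 150K tokens, should_reset() returns True and the session is rebuilt on its next dispatch (step 1 of Session Dispatch). The fresh thread receives its context from the praetorium instead of carrying an ever-longer conversation that crowds the context window and degrades response quality.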

Response Collection

collect_response() iterates over the SDK messages and recovers gracefully when a message fails to parse (MessageParseError).

Status messages are extracted from SDK intermediate messages:

Block Type | Status Display
--- | ---
ToolUseBlock | ⏳ Reading edicta... or ⏳ Dispatching to vorenus...
ThinkingBlock | ⏳ Thinking...
TextBlock | ⏳ First 80 chars of preview...

Praetorium (Message Bus)

The praetorium is the persistent backbone — a SQLite database that stores every nuntius exchanged in the system.

Schema

sql
CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,          -- UUID4
    sender TEXT NOT NULL,         -- "caesar", "legatus", or centurio name
    text TEXT NOT NULL,           -- message body
    audience TEXT NOT NULL,       -- JSON array of recipient names
    timestamp TEXT NOT NULL,      -- ISO 8601 UTC
    reply_to TEXT,                -- UUID of parent nuntius
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);

CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX idx_nuntii_sender ON nuntii(sender);

PRAGMA Settings

Setting | Value | Purpose
--- | --- | ---
journal_mode | WAL | Concurrent read/write performance
foreign_keys | ON | Enforce reply_to referential integrity

Visibility Rules

  • Caesar and Legatus see all nuntii (god-view)
  • Centuriones see only nuntii where their name appears in audience or audience is ["all"]
  • Visibility is enforced at the application layer with exact-match JSON parsing (no SQL LIKE — prevents substring false positives)
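The rule fits in a few lines of Python. A SQL filter such as `audience LIKE '%brutus%'` would also match a hypothetical centurio named brutus2. Parsing the JSON array and testing list membership avoids that. The function name is hypothetical.

```python
import json


def visible_to(viewer: str, audience_json: str) -> bool:
    """Application-layer visibility check (sketch)."""
    if viewer in ("caesar", "legatus"):  # god-view
        return True
    audience = json.loads(audience_json)
    # Exact list membership, so "brutus" never matches "brutus2".
    return viewer in audience or audience == ["all"]
```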

Over-Fetch Heuristic

When filtering for centuriones, the praetorium fetches limit * 5 rows from the database, then filters in Python. This compensates for interleaved audiences where many rows may not be visible to the requesting centurio.
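The following sketch applies the heuristic to the schema above using `sqlite3`. The function name, the newest-first query, and the oldest-first return order are assumptions. Because the 5x window is fixed, a centurio whose visible nuntii are rarer than one row in five can receive fewer than `limit` results even when older visible rows exist.

```python
import json
import sqlite3


def visible_for_centurio(conn: sqlite3.Connection, name: str, limit: int) -> list[tuple]:
    """Over-fetch limit * 5 recent rows, then keep up to `limit` visible ones (sketch)."""
    rows = conn.execute(
        "SELECT id, sender, text, audience, timestamp FROM nuntii "
        "ORDER BY timestamp DESC LIMIT ?",
        (limit * 5,),
    ).fetchall()
    visible: list[tuple] = []
    for row in rows:
        audience = json.loads(row[3])
        if name in audience or audience == ["all"]:
            visible.append(row)
            if len(visible) == limit:
                break
    # Rows arrive newest first; flip to oldest first for history injection.
    return visible[::-1]
```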

Memoria (Persistent Storage)

Three filesystem-based storage layers, all using XML format:

Layer Comparison

Layer | Scope | Mutability | Location | Access
--- | --- | --- | --- | ---
Edicta | Global | Read/Write/Revoke | castra/edicta/*.xml | Legatus: full, Centurio: read-only
Acta | Global | Read/Write | castra/acta/*.xml | All agents: full
Commentarii | Per-centurio | Append-only | castra/centuriones/<name>/commentarii/*.xml | Owner: read/write, Legatus: read-all

MCP Tool Access

Tool | Centurio Server | Legatus Server
--- | --- | ---
list_edicta | ✅ read | ✅ read
read_edictum | ✅ read | ✅ read
publish_edictum | ❌ | ✅ write
revoke_edictum | ❌ | ✅ write
list_acta | ✅ read | ✅ read
read_actum | ✅ read | ✅ read
publish_actum | ✅ write (author auto-set) | ✅ write
list_commentarii | 🔒 own only | ✅ all (requires centurio_name arg)
read_commentarium | 🔒 own only | ✅ all (requires centurio_name arg)
write_commentarium | 🔒 own only | ✅ all (requires centurio_name arg)

Scoping Mechanism

Centurio MCP tools are scoped via Python closures:

python
def build_memoria_centurio_server(store: MemoriaStore, centurio_name: str) -> MemoriaServer:
    # centurio_name is captured in closures, so commentarii are auto-scoped
    # publish_actum author is auto-set to centurio_name
    # No access to publish_edictum or revoke_edictum
    ...  # body elided

This means a centurio cannot access another centurio's commentarii, even if it knows the name — the MCP tool signature doesn't accept a centurio_name parameter.
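The closure pattern can be shown without the MCP layer. In this sketch a plain dict replaces `MemoriaStore`, and the tools are ordinary functions instead of MCP tools. Only the scoping mechanism is meant to match, and `build_commentarii_tools` is a hypothetical name.

```python
from collections.abc import Callable


def build_commentarii_tools(
    store: dict[str, list[str]], centurio_name: str
) -> dict[str, Callable[..., object]]:
    """Return commentarii tools bound to one centurio via closures (sketch)."""

    def list_commentarii() -> list[str]:
        # No centurio_name parameter: the captured value is the only scope.
        return list(store.get(centurio_name, []))

    def write_commentarium(text: str) -> None:
        store.setdefault(centurio_name, []).append(text)  # append-only

    return {"list_commentarii": list_commentarii, "write_commentarium": write_commentarium}
```

A caller that tries to pass another centurio's name gets a `TypeError`, because the signature simply has no place for it.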
