Sessions & Memory

How Legio manages agent sessions, persistent context, and the shared memory system.

Dual-Session Architecture

Legio maintains two layers of conversation state:

SDK Conversation Thread (volatile) — each ClaudeSDKClient holds an in-memory conversation. Lost on restart, idle timeout, or session rebuild.
Praetorium (persistent) — SQLite stores all nuntii with audience-based visibility. Survives restarts indefinitely.

The praetorium is the source of truth. SDK sessions are ephemeral workers that receive context on demand via history injection.

SDK Session Lifecycle

Each agent (Legatus and every centurio) runs as a ClaudeSDKClient — a subprocess communicating with the Claude API.

Session States

Session Creation

The first time a nuntius is dispatched to a centurio, _ensure_session():

Reads the centurio's prompt.md and computes its SHA-256 hash
Builds a scoped MCP server (memoria_<name>) via build_memoria_centurio_server()
Creates ClaudeAgentOptions with model, system prompt, and MCP config
Creates and connects a ClaudeSDKClient with permission_mode="bypassPermissions"
Stores the session with prompt hash, token tracker, and last_injected_ts=None

python

@dataclass
class _CenturioSession:
    client: ClaudeSDKClient
    tracker: SessionTokenTracker
    prompt_hash: str                   # SHA-256 of prompt.md at creation
    last_active: float                 # time.monotonic()
    last_injected_ts: str | None       # ISO timestamp for dedup
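The prompt hash stored at creation is what later makes a prompt edit detectable. A minimal sketch of that idea, assuming the hash is taken over the raw bytes of prompt.md; the helper names compute_prompt_hash and prompt_changed are hypothetical and do not come from the Legio codebase:

```python
import hashlib
from pathlib import Path


def compute_prompt_hash(prompt_path: Path) -> str:
    """Return the SHA-256 hex digest of the prompt file's raw bytes."""
    return hashlib.sha256(prompt_path.read_bytes()).hexdigest()


def prompt_changed(stored_hash: str, prompt_path: Path) -> bool:
    """True when the file on disk no longer matches the hash stored at creation."""
    return compute_prompt_hash(prompt_path) != stored_hash
```

Hashing the content rather than checking the modification time means that touching the file without changing it would not force a rebuild.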

Session Dispatch

On each dispatch to a centurio:

Check if session needs rebuilding (prompt hash mismatch or tokens exceeded)
Build visible history from praetorium, deduplicated against last_injected_ts
Format history as XML via format_history_xml()
Send history_xml + "\n\n" + nuntius.text via client.query()
Collect response by iterating client.receive_messages() (handles MessageParseError gracefully)
Update token tracker and last_active timestamp
Set centurio status to "idle" (or "error" on exception)

Parallel Dispatch

When Caesar @mentions multiple centuriones:

python

tasks = [session_mgr.dispatch(name, ...) for name in names]
results = await asyncio.gather(*tasks, return_exceptions=True)

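With return_exceptions=True, an exception raised by one dispatch is returned in the results list rather than propagated. Each failure is reported as "[name] An error occurred during processing.", so one centurio's error doesn't block the others.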
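The failure isolation can be sketched as follows. This is an illustration under assumptions: dispatch here is a hypothetical stand-in for session_mgr.dispatch(), and the mapping from exception to error string is one plausible way to produce the message quoted above.

```python
import asyncio


async def dispatch(name: str) -> str:
    """Hypothetical stand-in for session_mgr.dispatch()."""
    if name == "brutus":
        raise RuntimeError("simulated SDK failure")
    return f"[{name}] done"


async def dispatch_all(names: list[str]) -> list[str]:
    results = await asyncio.gather(
        *(dispatch(n) for n in names), return_exceptions=True
    )
    # gather() preserves input order, so zip pairs each name with its result.
    return [
        f"[{name}] An error occurred during processing."
        if isinstance(result, BaseException)
        else result
        for name, result in zip(names, results)
    ]


replies = asyncio.run(dispatch_all(["vorenus", "brutus"]))
```

Here vorenus still returns its normal reply even though brutus raised.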

Reset Triggers

A session is automatically rebuilt when:

Trigger	Detection	Effect
Prompt changed	SHA-256 hash mismatch	Disconnect old client, create new one
Token threshold	`cumulative_input > 150,000`	Disconnect old client, create new one
Idle timeout	`time.monotonic() - last_active > timeout_secs`	Background reaper disconnects
Manual reset	`/reset <name>` command	Immediate disconnect
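The two dispatch-time triggers in the table can be expressed as a single check. A rough sketch, assuming a simplified session view; SessionSnapshot and rebuild_reason are hypothetical names used only for illustration:

```python
from __future__ import annotations

from dataclasses import dataclass

MAX_INPUT_TOKENS = 150_000  # threshold from the table above


@dataclass
class SessionSnapshot:
    """Hypothetical, simplified view of the state that drives a rebuild."""
    prompt_hash: str
    cumulative_input: int


def rebuild_reason(session: SessionSnapshot, current_prompt_hash: str) -> str | None:
    """Return why the session should be rebuilt, or None to keep it."""
    if session.prompt_hash != current_prompt_hash:
        return "prompt_changed"
    if session.cumulative_input > MAX_INPUT_TOKENS:  # strict, as in the table
        return "token_threshold"
    return None
```

Because the comparison is strict, a session sitting at exactly 150,000 cumulative input tokens is kept.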

Idle Reaping

A background asyncio.Task runs every 60 seconds:

python

async def _reap_loop(self) -> None:
    while True:
        await asyncio.sleep(60)
        cleaned = await self.cleanup_idle()
        if cleaned:
            logger.info("Reaped %d idle sessions", cleaned)

Default idle timeout: 30 minutes (configurable via session_idle_timeout_minutes).
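The idle test itself is a comparison against a monotonic clock. A small sketch of how the set of sessions to reap might be selected; idle_sessions is a hypothetical helper, and the now parameter exists only to make the example deterministic:

```python
from __future__ import annotations

import time


def idle_sessions(
    last_active: dict[str, float],
    idle_timeout_minutes: int = 30,
    now: float | None = None,
) -> list[str]:
    """Return names whose last activity is older than the idle timeout."""
    timeout_secs = idle_timeout_minutes * 60
    now = time.monotonic() if now is None else now
    return [name for name, ts in last_active.items() if now - ts > timeout_secs]
```

time.monotonic() is unaffected by wall-clock adjustments, which is why it suits timeout arithmetic better than time.time().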

Shutdown

On SIGTERM or SIGINT:

Cancel idle reaper task
Disconnect all centurio SDK clients
Disconnect Legatus SDK client
Close praetorium database

Legatus Session

The Legatus has its own ClaudeSDKClient with additional rebuild triggers:

Trigger	Detection
Roster hash change	SHA-256 of `name:description` pairs changes (centurio added/removed/edited)
Prompt mtime change	`castra/legatus/prompt.md` file modification time changes

On rebuild, _client_is_fresh = True triggers history bootstrap on the next query.
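A roster hash of this kind could be computed as below. The sketch assumes the name:description pairs are sorted by name and joined with newlines before hashing; the source does not specify the ordering or separator, and roster_hash is a hypothetical name.

```python
import hashlib


def roster_hash(roster: dict[str, str]) -> str:
    """SHA-256 over sorted name:description pairs (ordering is an assumption)."""
    # Sorting by name keeps the digest independent of dict insertion order.
    joined = "\n".join(f"{name}:{desc}" for name, desc in sorted(roster.items()))
    return hashlib.sha256(joined.encode("utf-8")).hexdigest()
```

Any added, removed, or edited centurio changes the joined string and therefore the digest.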

System Prompt Construction

The Legatus system prompt is built from:

Base prompt — castra/legatus/prompt.md
Roster XML — all centurio names + descriptions (no status):

xml

<centuriones>
  <centurio name="vorenus">Technology analysis specialist</centurio>
  <centurio name="brutus">Code review expert</centurio>
</centuriones>

Status is injected per-call in the user message (not the system prompt) to prevent staleness:

xml

<centurio_status>
  <centurio name="vorenus" status="idle"/>
  <centurio name="brutus" status="working"/>
</centurio_status>
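Both fragments are simple enough to build with the standard library. A sketch using xml.etree.ElementTree; the function names are hypothetical, and the serialized whitespace may differ slightly from the examples above (ElementTree writes empty elements with a space before the closing slash):

```python
import xml.etree.ElementTree as ET


def build_roster_xml(roster: dict[str, str]) -> str:
    """Static roster for the system prompt: names and descriptions only."""
    root = ET.Element("centuriones")
    for name, description in roster.items():
        ET.SubElement(root, "centurio", {"name": name}).text = description
    ET.indent(root)  # requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


def build_status_xml(statuses: dict[str, str]) -> str:
    """Per-call status block for the user message."""
    root = ET.Element("centurio_status")
    for name, status in statuses.items():
        ET.SubElement(root, "centurio", {"name": name, "status": status})
    ET.indent(root)
    return ET.tostring(root, encoding="unicode")
```

Building the XML through an element tree rather than string formatting means characters such as & in a description are escaped automatically.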

Legatus MCP Configuration

python

legatus_mcp = create_sdk_mcp_server(
    name="legatus_tools",
    version="1.0.0",
    tools=legatus_tools + memoria_full.tools,  # 6 + 11 = 17 tools
)
options = ClaudeAgentOptions(
    system_prompt=system_prompt,
    mcp_servers={"legatus_tools": legatus_mcp},
    model=config.model,
    permission_mode="bypassPermissions",
)

History Bootstrap

When a fresh SDK session is created (startup, rebuild, idle timeout), the first query is enriched with praetorium history:

xml

<praetorium recent="true" viewer="legatus">
  <nuntius id="abc-123" sender="caesar" timestamp="2026-02-15T12:00:00+00:00">
    Previous message text
  </nuntius>
  <nuntius id="def-456" sender="legatus" timestamp="2026-02-15T12:01:00+00:00">
    Previous response text
  </nuntius>
</praetorium>

<context_notice>Session restored from praetorium.
Ask Caesar for clarification if context is unclear.</context_notice>

<centurio_status>
  <centurio name="vorenus" status="idle"/>
</centurio_status>

The actual user message here...

The <context_notice> signals to the LLM that this is a restored session, not a fresh conversation.

History Deduplication

Each centurio session tracks last_injected_ts — the ISO timestamp of the most recent nuntius injected on the previous dispatch.

First dispatch: full praetorium history (up to history_window nuntii).

Subsequent dispatches: only nuntii newer than last_injected_ts.

python

visible = await praetorium.get_visible_nuntii(name, limit=config.history_window)
if session.last_injected_ts is not None:
    visible = [n for n in visible
               if n.timestamp.isoformat() > session.last_injected_ts]
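The filter above only works if last_injected_ts is advanced after each dispatch. A sketch of the full round trip, assuming timezone-aware timestamps; Nuntius here is a reduced stand-in for the real record, and select_new is a hypothetical helper. It compares parsed datetimes rather than strings, which stays correct even if two timestamps were serialized with different UTC offsets.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass
class Nuntius:
    """Reduced stand-in: only the fields the dedup step needs."""
    id: str
    timestamp: datetime


def select_new(
    visible: list[Nuntius], last_injected_ts: str | None
) -> tuple[list[Nuntius], str | None]:
    """Return nuntii newer than the cutoff plus the updated cutoff."""
    if last_injected_ts is None:
        fresh = list(visible)  # first dispatch: inject everything
    else:
        cutoff = datetime.fromisoformat(last_injected_ts)
        fresh = [n for n in visible if n.timestamp > cutoff]
    newest = max((n.timestamp for n in fresh), default=None)
    # Keep the old cutoff when nothing new was injected.
    new_ts = newest.isoformat() if newest is not None else last_injected_ts
    return fresh, new_ts
```

The strict comparison assumes no two nuntii share a timestamp; with identical timestamps, a later nuntius could be skipped.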

Token Tracking

SessionTokenTracker accumulates input tokens from each SDK ResultMessage:

python

class SessionTokenTracker:
    max_input_tokens: int = 150_000   # ~75% of 200K context window
    cumulative_input: int = 0

    def update(self, result: ResultMessage) -> None:
        if result.usage:
            self.cumulative_input += result.usage.get("input_tokens", 0)

    def should_reset(self) -> bool:
        return self.cumulative_input > self.max_input_tokens

When cumulative input exceeds 150K tokens, the session is flagged for rebuild. This prevents degraded performance from overly long conversation threads.
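To make the threshold behaviour concrete, here is a self-contained variant that takes a plain usage dict instead of an SDK ResultMessage. TrackerSketch is a hypothetical stand-in for illustration, not the class used by Legio.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class TrackerSketch:
    max_input_tokens: int = 150_000
    cumulative_input: int = 0

    def update(self, usage: dict | None) -> None:
        # A missing usage payload leaves the counter unchanged.
        if usage:
            self.cumulative_input += usage.get("input_tokens", 0)

    def should_reset(self) -> bool:
        return self.cumulative_input > self.max_input_tokens


tracker = TrackerSketch()
for usage in ({"input_tokens": 60_000}, None, {"input_tokens": 90_000}):
    tracker.update(usage)
at_threshold = tracker.should_reset()    # exactly 150_000: not over the limit
tracker.update({"input_tokens": 1})
over_threshold = tracker.should_reset()  # 150_001: flagged for rebuild
```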

Response Collection

collect_response() iterates over the SDK message stream and recovers gracefully from errors such as MessageParseError.

While a response is streaming, status messages are extracted from the SDK's intermediate messages:

Block Type	Status Display
`ToolUseBlock`	`⏳ Reading edicta...` or `⏳ Dispatching to vorenus...`
`ThinkingBlock`	`⏳ Thinking...`
`TextBlock`	`⏳ First 80 chars of preview...`

Praetorium (Message Bus)

The praetorium is the persistent backbone — a SQLite database that stores every nuntius exchanged in the system.

Schema

sql

CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,          -- UUID4
    sender TEXT NOT NULL,         -- "caesar", "legatus", or centurio name
    text TEXT NOT NULL,           -- message body
    audience TEXT NOT NULL,       -- JSON array of recipient names
    timestamp TEXT NOT NULL,      -- ISO 8601 UTC
    reply_to TEXT,                -- UUID of parent nuntius
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);

CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX idx_nuntii_sender ON nuntii(sender);

PRAGMA Settings

Setting	Value	Purpose
`journal_mode`	`WAL`	Concurrent read/write performance
`foreign_keys`	`ON`	Enforce `reply_to` referential integrity
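Applying these settings with Python's sqlite3 module could look like the sketch below; open_praetorium is a hypothetical helper name. Note that journal_mode=WAL is stored in the database file, whereas foreign_keys is a per-connection setting and has to be enabled on every new connection.

```python
import sqlite3


def open_praetorium(path: str) -> sqlite3.Connection:
    """Open the database and apply the two PRAGMA settings from the table."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")  # persists in the database file
    conn.execute("PRAGMA foreign_keys=ON")   # must be set per connection
    return conn
```

WAL mode requires a file-backed database; an in-memory database reports its journal mode as memory instead.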

Visibility Rules

Caesar and Legatus see all nuntii (god-view)
Centuriones see only nuntii where their name appears in audience or audience is ["all"]
Visibility is enforced at the application layer with exact-match JSON parsing (no SQL LIKE — prevents substring false positives)
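The rules above can be sketched as a single predicate. This is an illustration under assumptions: is_visible and GOD_VIEW are hypothetical names, and audience is taken to be the JSON array string stored in the audience column.

```python
import json

GOD_VIEW = {"caesar", "legatus"}  # viewers that see every nuntius


def is_visible(viewer: str, audience_json: str) -> bool:
    """Exact-match visibility check on a JSON-encoded audience list."""
    if viewer in GOD_VIEW:
        return True
    audience = json.loads(audience_json)
    # List membership compares whole names, never substrings.
    return audience == ["all"] or viewer in audience
```

A SQL LIKE '%brutus%' filter would also match an audience containing "brutus_minor"; parsing the JSON and testing list membership avoids that.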

Over-Fetch Heuristic

When filtering for centuriones, the praetorium fetches limit * 5 rows from the database, then filters in Python. This compensates for interleaved audiences where many rows may not be visible to the requesting centurio.
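A sketch of the over-fetch pattern, with the database query abstracted behind a callable. The names get_visible_texts and fetch_recent are hypothetical, and rows are assumed to arrive already ordered the way the caller wants them.

```python
import json
from typing import Callable

Row = tuple[str, str]  # (text, audience as a JSON array string)


def get_visible_texts(
    fetch_recent: Callable[[int], list[Row]], viewer: str, limit: int
) -> list[str]:
    """Fetch limit * 5 rows, filter by audience in Python, then truncate."""
    rows = fetch_recent(limit * 5)
    visible = []
    for text, audience_json in rows:
        audience = json.loads(audience_json)
        if audience == ["all"] or viewer in audience:
            visible.append(text)
    return visible[:limit]
```

The factor of 5 is a heuristic, not a guarantee: if fewer than limit of the fetched rows are visible to the viewer, the result is simply shorter than limit.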

Memoria (Persistent Storage)

Three filesystem-based storage layers, all using XML format:

Layer Comparison

Layer	Scope	Mutability	Location	Access
Edicta	Global	Read/Write/Revoke	`castra/edicta/*.xml`	Legatus: full, Centurio: read-only
Acta	Global	Read/Write	`castra/acta/*.xml`	All agents: full
Commentarii	Per-centurio	Append-only	`castra/centuriones/<name>/commentarii/*.xml`	Owner: read/write, Legatus: read-all

MCP Tool Access

Tool	Centurio Server	Legatus Server
`list_edicta`	✅ read	✅ read
`read_edictum`	✅ read	✅ read
`publish_edictum`	❌	✅ write
`revoke_edictum`	❌	✅ write
`list_acta`	✅ read	✅ read
`read_actum`	✅ read	✅ read
`publish_actum`	✅ write (author auto-set)	✅ write
`list_commentarii`	🔒 own only	✅ all (requires centurio_name arg)
`read_commentarium`	🔒 own only	✅ all (requires centurio_name arg)
`write_commentarium`	🔒 own only	✅ all (requires centurio_name arg)

Scoping Mechanism

Centurio MCP tools are scoped via Python closures:

python

def build_memoria_centurio_server(store: MemoriaStore, centurio_name: str) -> MemoriaServer:
    # centurio_name is captured in closures, so commentarii tools are auto-scoped
    # publish_actum sets author to centurio_name automatically
    # publish_edictum and revoke_edictum are not exposed on this server
    ...  # body elided in this excerpt

This means a centurio cannot access another centurio's commentarii, even if it knows the name — the MCP tool signature doesn't accept a centurio_name parameter.
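The closure technique can be shown in miniature. This sketch uses a plain nested dict in place of MemoriaStore, and build_commentarii_reader is a hypothetical name; the point is only that the owner is fixed when the tool is built and is not an argument the caller can supply.

```python
from typing import Callable


def build_commentarii_reader(
    store: dict[str, dict[str, str]], centurio_name: str
) -> Callable[[str], str]:
    """Return a reader bound to one centurio's commentarii."""

    def read_commentarium(title: str) -> str:
        # centurio_name comes from the enclosing scope, not from the caller.
        return store[centurio_name][title]

    return read_commentarium
```

A reader built for vorenus exposes only a title parameter, so there is no argument through which another centurio's name could be passed.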
