Chat History Management

Investigation of how chat history works in Legio: current implementation, gaps, design decisions, and recommendations for single-session and multi-agent scenarios.

Current Architecture

Legio has two independent history systems that serve different purposes:

                        Caesar (Telegram)
                             │
                    ┌────────┴────────┐
                    ▼                 ▼
           SDK Conversation      Praetorium
           (implicit, per        (explicit, SQLite
            client session)       nuntii table)

1. SDK Conversation History (Implicit)

Each ClaudeSDKClient instance maintains an internal conversation thread with Claude. This is the standard LLM multi-turn behavior — every client.query() builds on prior turns.

| Property | Legatus | Centurio |
| --- | --- | --- |
| Client instance | `Legatus._client` | `_CenturioSession.client` per centurio |
| Persistence | In-memory (SDK process) | In-memory (SDK process) |
| Lifetime | Until roster/prompt change → rebuild | Until idle timeout, prompt change, or token limit |
| Token tracking | None | `SessionTokenTracker` (150k input token ceiling) |
| Reset trigger | `_should_rebuild_client()` → disconnect + reconnect | `tracker.should_reset()`, prompt hash change, `/reset` command |
| Survives restart | No | No |

How it works: When Caesar sends "do X" and then "now do Y", the legatus SDK client has both messages in its conversation. Claude naturally remembers the context. This is the primary mechanism for conversational continuity — it's not code we wrote, it's the SDK's built-in behavior.

When it breaks:

System restart — all SDK sessions are lost, conversation starts fresh
Legatus client rebuild — roster or prompt file change triggers _should_rebuild_client() → disconnect + new client → history gone
Centurio idle reap — cleanup_idle() disconnects sessions after 30 min idle → history gone
Centurio token ceiling — SessionTokenTracker triggers reset at 150k input tokens → history gone
Manual reset — /reset <name> command disconnects a centurio session
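
The idle reap is the easiest of these triggers to picture in code. The following is a minimal sketch, not the actual `cleanup_idle()` implementation: `SessionSketch`, `cleanup_idle_sketch`, and the `last_used` field are hypothetical names, and only the 30-minute timeout comes from the configuration described in this document.

```python
import time
from dataclasses import dataclass, field

IDLE_TIMEOUT_SECONDS = 30 * 60  # session_idle_timeout_minutes = 30


@dataclass
class SessionSketch:
    """Hypothetical stand-in for a centurio session record."""
    name: str
    last_used: float = field(default_factory=time.monotonic)


def cleanup_idle_sketch(sessions, now=None, timeout=IDLE_TIMEOUT_SECONDS):
    """Remove sessions idle for at least `timeout` seconds and return their names."""
    now = time.monotonic() if now is None else now
    reaped = [name for name, s in sessions.items() if now - s.last_used >= timeout]
    for name in reaped:
        # The real code would also disconnect the SDK client here,
        # which is the point at which the conversation thread is lost.
        del sessions[name]
    return reaped


sessions = {
    "vorenus": SessionSketch("vorenus", last_used=0.0),
    "pullo": SessionSketch("pullo", last_used=1700.0),
}
reaped = cleanup_idle_sketch(sessions, now=1800.0)
```

With these inputs only `vorenus` has been idle for the full 30 minutes, so `pullo` keeps its session and its SDK thread.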

2. Praetorium (Explicit Nuntii Store)

The praetorium is a SQLite database (castra/praetorium.db) that persists every message exchanged in the system.

Schema:

sql

CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,           -- UUID4
    sender TEXT NOT NULL,          -- "caesar", "legatus", or centurio name
    text TEXT NOT NULL,            -- message body
    audience TEXT NOT NULL,        -- JSON array: ["legatus"], ["vorenus"], ["all"]
    timestamp TEXT NOT NULL,       -- ISO 8601 UTC
    reply_to TEXT,                 -- UUID of parent nuntius (threading)
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);
CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX idx_nuntii_sender ON nuntii(sender);
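
To make the schema concrete, here is a small sketch that creates the table in an in-memory database and stores a request plus a threaded reply. The helper name `insert_nuntius` is hypothetical and is not the praetorium's real API. The sketch only assumes what the schema comments state: UUID4 ids, a JSON-encoded audience, and ISO 8601 UTC timestamps.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE IF NOT EXISTS nuntii (
    id TEXT PRIMARY KEY,
    sender TEXT NOT NULL,
    text TEXT NOT NULL,
    audience TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    reply_to TEXT,
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);
CREATE INDEX IF NOT EXISTS idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX IF NOT EXISTS idx_nuntii_sender ON nuntii(sender);
"""


def insert_nuntius(conn, sender, text, audience, reply_to=None):
    """Insert one nuntius and return its generated id (hypothetical helper)."""
    nuntius_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO nuntii (id, sender, text, audience, timestamp, reply_to) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            nuntius_id,
            sender,
            text,
            json.dumps(list(audience)),  # tuple in code, JSON array on disk
            datetime.now(timezone.utc).isoformat(),
            reply_to,
        ),
    )
    conn.commit()
    return nuntius_id


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
question_id = insert_nuntius(conn, "caesar", "@vorenus review this code", ("vorenus",))
answer_id = insert_nuntius(conn, "vorenus", "Found 3 issues.", ("caesar",), reply_to=question_id)
```

Note that SQLite only enforces the `reply_to` foreign key when `PRAGMA foreign_keys = ON` is set on the connection.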

What gets stored:

| Event | sender | audience | reply_to |
| --- | --- | --- | --- |
| Caesar types a message (no @mention) | `"caesar"` | `("legatus",)` | None |
| Caesar types `@vorenus do X` | `"caesar"` | `("vorenus",)` | None |
| Caesar types `@vorenus @pullo compare` | `"caesar"` | `("vorenus", "pullo")` | None |
| Legatus responds | `"legatus"` | `("caesar",)` | caesar_nuntius.id |
| Centurio responds | `"vorenus"` | `("caesar",)` | caesar_nuntius.id |

How it's queried:

| Caller | Method | What it sees |
| --- | --- | --- |
| `/history` command | `get_history(limit)` | All nuntii, newest first (admin view) |
| Centurio dispatch context | `get_visible_nuntii(name, limit)` | Only nuntii where centurio is in audience, or audience is `["all"]` |
| Legatus `get_history` tool | `get_history(limit)` | All nuntii (legatus has full visibility) |
| Caesar and legatus viewers | `get_visible_nuntii("caesar")` | Everything (privileged viewers) |
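
The visibility rule in the table can be sketched as a filter over the JSON audience column. This is an illustration under assumptions, not the real `get_visible_nuntii`: the function name `visible_nuntii_sketch`, the synchronous `sqlite3` connection, the in-Python filtering, and the oldest-first return order are all choices made for the example.

```python
import json
import sqlite3

PRIVILEGED = {"caesar", "legatus"}  # viewers that see everything


def visible_nuntii_sketch(conn, viewer, limit=50):
    """Return up to `limit` of the newest nuntii visible to `viewer`, oldest first."""
    rows = conn.execute(
        "SELECT id, sender, text, audience, timestamp "
        "FROM nuntii ORDER BY timestamp DESC"
    )
    visible = []
    for row in rows:
        audience = json.loads(row[3])
        if viewer in PRIVILEGED or viewer in audience or "all" in audience:
            visible.append(row)
            if len(visible) == limit:
                break
    visible.reverse()  # assumption: callers want chronological order
    return visible


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE nuntii (id TEXT PRIMARY KEY, sender TEXT, text TEXT, "
    "audience TEXT, timestamp TEXT)"
)
conn.executemany(
    "INSERT INTO nuntii VALUES (?, ?, ?, ?, ?)",
    [
        ("1", "caesar", "status?", '["legatus"]', "2026-02-15T12:00:00+00:00"),
        ("2", "caesar", "@vorenus review", '["vorenus"]', "2026-02-15T12:01:00+00:00"),
        ("3", "caesar", "@all stand down", '["all"]', "2026-02-15T12:02:00+00:00"),
    ],
)
vorenus_ids = [row[0] for row in visible_nuntii_sketch(conn, "vorenus")]
```

Filtering in Python keeps the sketch independent of SQLite's JSON functions. A production query would more likely push the audience test into SQL.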

Where it's injected:

Centurio dispatch (session.py:110-113): Before each centurio dispatch, the last N visible nuntii are fetched, formatted as XML via format_history_xml(), and prepended to the user message:
```python
visible = await praetorium.get_visible_nuntii(name, limit=config.history_window)
context = format_history_xml(visible, viewer=name)
full_message = f"{context}\n\n{nuntius.text}"
```
Legatus query (legatus.py:390-391): The legatus gets centurio status XML prepended, but NOT praetorium history. The legatus relies on its SDK conversation thread for context, and has a get_history MCP tool for explicit history retrieval.

XML format injected into centurio context:

xml

<praetorium recent="true" viewer="vorenus">
  <nuntius id="uuid-1" sender="caesar" timestamp="2026-02-15T12:00:00+00:00">
    @vorenus review this code
  </nuntius>
  <nuntius id="uuid-2" sender="vorenus" timestamp="2026-02-15T12:01:00+00:00">
    I reviewed the code and found 3 issues...
  </nuntius>
</praetorium>
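
A formatter producing this shape could look roughly like the sketch below. It is not the project's `format_history_xml`. The `Nuntius` dataclass and the function name `format_history_xml_sketch` are hypothetical, and the escaping of message text and attribute values is an assumption about what a robust implementation would do, since message bodies can contain `<` or `&`.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass(frozen=True)
class Nuntius:
    """Hypothetical minimal record; the real nuntius has more fields."""
    id: str
    sender: str
    text: str
    timestamp: str


def format_history_xml_sketch(nuntii, viewer):
    """Render nuntii in the <praetorium> layout shown above."""
    lines = [f'<praetorium recent="true" viewer={quoteattr(viewer)}>']
    for n in nuntii:
        lines.append(
            f"  <nuntius id={quoteattr(n.id)} sender={quoteattr(n.sender)} "
            f"timestamp={quoteattr(n.timestamp)}>"
        )
        lines.append(f"    {escape(n.text)}")  # escape &, <, > in the body
        lines.append("  </nuntius>")
    lines.append("</praetorium>")
    return "\n".join(lines)


history = [
    Nuntius("uuid-1", "caesar", "@vorenus check if a < b & c", "2026-02-15T12:00:00+00:00"),
    Nuntius("uuid-2", "vorenus", "I reviewed the code.", "2026-02-15T12:01:00+00:00"),
]
xml_block = format_history_xml_sketch(history, viewer="vorenus")
```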

3. Config Parameters

From legio.toml:

toml

history_window = 50                    # nuntii per dispatch context
session_idle_timeout_minutes = 30      # centurio idle reap

From session.py:

python

SessionTokenTracker(max_input_tokens=150_000)   # ~75% of context window
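
The tracker's role can be illustrated with a small stand-in. Only the constructor argument `max_input_tokens` and the `should_reset()` method are named in this document. The class name `SessionTokenTrackerSketch`, the `record()` method, and the choice to track the largest input size reported for any single turn (rather than a running sum) are assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class SessionTokenTrackerSketch:
    """Hypothetical stand-in for SessionTokenTracker."""
    max_input_tokens: int = 150_000
    input_tokens: int = 0

    def record(self, input_tokens: int) -> None:
        # Assumption: each turn reports the size of the whole prompt,
        # including prior turns, so the peak value tracks context usage.
        self.input_tokens = max(self.input_tokens, input_tokens)

    def should_reset(self) -> bool:
        """True once the session has reached the input token ceiling."""
        return self.input_tokens >= self.max_input_tokens


tracker = SessionTokenTrackerSketch(max_input_tokens=150_000)
tracker.record(100_000)
below_ceiling = tracker.should_reset()
tracker.record(150_000)
at_ceiling = tracker.should_reset()
```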

Message Flow Diagrams

Flow A: Caesar → Legatus (no @mention)

Caesar: "What's the status?"
  ↓
bot.py: _handle_message()
  → _handle_request()
    → text = update.message.text                    # "What's the status?"
    → run_legatus_request(update, bot, handle_message, text)
      ↓
legatus.py: handle_message(text)
  → mentions = []                                   # no @mention
  → Post Nuntius(sender="caesar", audience=("legatus",))  →  PRAETORIUM
  → _query_legatus(text)
    → status_xml = _build_status_xml()              # <centurio_status>...</centurio_status>
    → full_message = status_xml + "\n\n" + text     # status XML + "What's the status?"
    → client.query(full_message)                    # SDK conversation — has prior turns
    → collect_response(client)
  → Post Nuntius(sender="legatus", audience=("caesar",), reply_to=...) → PRAETORIUM
  → return [response]

Key observation: The legatus does NOT inject praetorium history into its query. It relies entirely on the SDK conversation thread. The praetorium is a write-through log, not a context source for legatus.

Flow B: Caesar → Centurio (explicit @mention)

Caesar: "@vorenus review this module"
  ↓
legatus.py: handle_message(text)
  → mentions = ["vorenus"]
  → Post Nuntius(sender="caesar", audience=("vorenus",))  → PRAETORIUM
  → session_mgr.dispatch("vorenus", nuntius, ...)
    ↓
session.py: dispatch()
  → ensure_session("vorenus", ...)                  # create or reuse SDK client
  → visible = praetorium.get_visible_nuntii("vorenus", limit=50)  ← PRAETORIUM READ
  → context = format_history_xml(visible, viewer="vorenus")
  → full_message = context + "\n\n" + nuntius.text  # history XML + "review this module"
  → session.client.query(full_message)              # SDK conversation — has prior turns
  → collect_response(client, tracker=...)
  ↓
legatus.py:
  → Post Nuntius(sender="vorenus", audience=("caesar",), reply_to=...) → PRAETORIUM
  → Prepend attribution header: "⚔️ vorenus — Code Specialist\n━━━━━━━━━━━━"
  → return [header + response]

Key observation: Centuriones get both praetorium history (injected) AND SDK conversation history (implicit). This is potentially redundant — the same messages appear twice if the centurio has been active recently.

Gaps and Issues

Gap 1: Legatus Has No Praetorium Context After Restart

When the system restarts, the legatus SDK client is recreated from scratch. The SDK conversation is empty. But the praetorium still has all historical nuntii. The legatus has no mechanism to bootstrap from praetorium — it starts as if no prior conversation happened.

Impact: After restart, Caesar says "continue where we left off" — legatus has no idea what that means. The get_history MCP tool exists but the LLM doesn't know to call it proactively.

Severity: High — breaks continuity across restarts.

Gap 2: Legatus Never Receives Praetorium History Inline

Unlike centuriones (which get history XML prepended), the legatus receives only centurio_status_xml + text. This is intentional (legatus has SDK conversation context), but creates an asymmetry:

Centuriones see the nuntii addressed to them (or to "all") via the injected praetorium XML
Legatus only sees messages from its own conversation thread

Impact: If Caesar asks the legatus "what did @vorenus say about X?", the legatus doesn't have that information in context unless it explicitly calls get_history.

Severity: Medium — legatus has the tool, just doesn't auto-use it.

Gap 3: Double History for Centuriones

When a centurio has an active SDK session and receives a new dispatch, the full_message contains:

Praetorium XML history (last 50 nuntii visible to this centurio)
The new message text

But the SDK client already has the conversation from prior turns. So messages from the last few turns appear twice — once in the XML block and once in the SDK's internal context.

Impact: Wastes tokens. At 50 nuntii, the XML block can be 5-10k tokens. If most of those are already in the SDK conversation, that's pure waste.

Severity: Medium — token cost, not correctness.

Gap 4: No Session Survival Across Restarts

Both legatus and centurio SDK sessions are in-memory. A system restart (crash, deploy, manual stop) loses all conversation state. The praetorium persists nuntii, but there's no mechanism to:

Detect that a prior session existed
Rebuild SDK context from praetorium history
Inform the LLM that context was restored (not original)

Impact: Every restart is a clean slate for the LLM, even though the praetorium has full history.

Severity: High — the most impactful gap.

Gap 5: No Cross-Centurio Conversation Visibility

By design (D7), centuriones only see nuntii addressed to them or to "all". If @vorenus produces findings and Caesar then asks @pullo to build on them, pullo has no visibility into vorenus's conversation.

Current workaround: Centuriones can publish to acta (shared knowledge). But this requires the centurio to proactively publish.

Impact: Caesar must manually copy context between centuriones or use explicit @all broadcasts.

Severity: Low — this is by design, but worth noting.

Gap 6: History Window is Static

history_window = 50 is fixed for all centuriones regardless of context window size or task complexity. A simple Q&A centurio wastes tokens on 50 nuntii; a deep analysis centurio might need more context.

Severity: Low — configurable but not per-centurio.

Gap 7: No History Pruning or Summarization

The praetorium grows indefinitely. No mechanism to:

Archive old nuntii
Summarize long conversations into condensed context
Limit database size

Severity: Low in the short term — SQLite handles millions of rows fine. Long-term maintenance concern.

Design Decisions to Make

D-H1: Should Legatus Receive Praetorium History After Restart?

Option A — Bootstrap from praetorium (recommended): On first query after client creation, prepend the last N nuntii (formatted as XML) to the user message. The legatus gets context even after restart.

Option B — Prompt instructs legatus to call get_history: Add a line to the legatus prompt: "When you don't have context for a request, call get_history first." Relies on LLM initiative.

Option C — No change: Accept that restarts clear context. Caesar uses /history manually.

Recommendation: Option A. Simple to implement: check a _client_is_fresh flag on the legatus and prepend history XML on the first query after the client is created.

D-H2: Should Centurio History Injection Be Deduplicated?

Option A — Inject only new nuntii: Track the last injected nuntius ID per centurio session. On subsequent dispatches, only inject nuntii newer than the last injection.

Option B — Reduce window on active sessions: If the session already has conversation context, use a smaller history window (e.g., 5 instead of 50).

Option C — No change: Accept the duplication. The LLM handles redundant context gracefully.

Recommendation: Option A, for token efficiency rather than correctness (Gap 3 is a cost issue). Track the last injected nuntius on _CenturioSession. Cheap to implement, meaningful token savings.

D-H3: Should Cross-Centurio Visibility Be Expanded?

Option A — Broadcast responses to all centuriones: Change response audience from ("caesar",) to ("all",). Every centurio sees every response.

Option B — Selective broadcast via legatus: Legatus decides which responses to broadcast based on relevance. Requires prompt engineering.

Option C — No change (current): Centuriones publish to acta explicitly.

Recommendation: Option C for now. Cross-centurio context belongs in acta (persistent shared knowledge), not in the nuntii stream (transient conversation). The acta mechanism works — it just needs Caesar or centuriones to use it.

D-H4: Should History Be Summarized?

Option A — LLM-based summarization: Periodically summarize old nuntii into a condensed context block. Store summaries as special nuntii or in acta.

Option B — Sliding window with overflow to commentarii: After N nuntii, auto-archive older ones to a centurio's commentarii.

Option C — No change: Keep all nuntii, rely on LIMIT queries.

Recommendation: Option C for now. Premature optimization. The praetorium LIMIT queries are already bounded. Consider Option A when conversations routinely exceed 200+ nuntii.

D-H5: Per-Centurio History Window?

Option A — Config per centurio in prompt.md frontmatter:

markdown

---
history_window: 20
---
# Vorenus

Option B — Model-based heuristic: Larger models get larger windows.

Option C — No change: Global history_window = 50.

Recommendation: Option C for now. Not a bottleneck yet. When per-centurio model routing is implemented (from claude-flow research), revisit this.
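
If Option A is revisited later, reading the override could be as small as the sketch below. The function name `history_window_for` is hypothetical, and the parser handles only a flat `key: value` frontmatter block as in the example above. A real implementation would more likely use a YAML library.

```python
def history_window_for(prompt_text: str, default: int = 50) -> int:
    """Return the history_window from prompt.md frontmatter, or the global default."""
    lines = prompt_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return default  # no frontmatter block at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "history_window":
            try:
                return int(value.strip())
            except ValueError:
                return default  # malformed value falls back to the global setting
    return default


vorenus_window = history_window_for("---\nhistory_window: 20\n---\n# Vorenus\n")
pullo_window = history_window_for("# Pullo\nNo frontmatter here.\n")
```

Falling back to the global value keeps existing prompt files working unchanged.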

Multi-Agent History: How It All Fits Together

Current Mental Model

  PRAETORIUM (shared SQLite)
  ┌─────────────────────────────────────────────────┐
  │  nuntius: caesar → ("legatus",) "do X"          │
  │  nuntius: legatus → ("caesar",) "done X"        │
  │  nuntius: caesar → ("vorenus",) "@vorenus Y"    │
  │  nuntius: vorenus → ("caesar",) "Y result"      │
  │  nuntius: caesar → ("vorenus","pullo") "debate" │
  │  nuntius: vorenus → ("caesar",) "my position"   │
  │  nuntius: pullo → ("caesar",) "my position"     │
  └─────────────────────────────────────────────────┘
        │               │              │
    visible to       visible to     visible to
    legatus+caesar   vorenus+caesar  pullo+caesar
        │               │              │
  ┌─────┴─────┐  ┌──────┴──────┐ ┌────┴──────┐
  │ Legatus   │  │  Vorenus    │ │   Pullo   │
  │ SDK client│  │  SDK client │ │ SDK client│
  │ (has own  │  │ (gets XML   │ │(gets XML  │
  │  thread)  │  │  + thread)  │ │ + thread) │
  └───────────┘  └─────────────┘ └───────────┘

What Each Agent Knows

| Agent | SDK conversation (implicit) | Praetorium history (injected) | MCP tools |
| --- | --- | --- | --- |
| Legatus | Full thread with Caesar since last rebuild | NOT injected (only status XML) | `get_history`, `post_nuntius` |
| Centurio | Full thread since session creation | Last 50 visible nuntii as XML | `list_commentarii`, `read_commentarium`, `write_commentarium`, `list_edicta`, etc. |

History Lifetime Summary

| Event | SDK thread | Praetorium |
| --- | --- | --- |
| Normal message exchange | Grows per turn | Grows per nuntius (persisted) |
| System restart | Lost | Survives (SQLite) |
| Centurio idle timeout | Lost (session reaped) | Survives |
| Centurio token ceiling | Lost (session reset) | Survives |
| Legatus roster change | Lost (client rebuild) | Survives |
| `/reset <name>` | Lost (explicit) | Survives |
| Manual `/history` | N/A (display only) | Queried |

Recommendations — Priority Order

1. Bootstrap legatus from praetorium after rebuild (D-H1, Option A)

When the legatus SDK client is freshly created (_ensure_legatus_client creates a new client), prepend the last history_window nuntii to the first query. Subsequent queries use the SDK thread naturally.

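Implementation: Add a _client_is_fresh: bool = True flag. In _query_legatus, if the flag is set, fetch praetorium.get_history(limit=history_window) (awaited, assuming it is async like get_visible_nuntii) and prepend format_history_xml(..., viewer="legatus") to the message. Since get_history returns nuntii newest first, reverse the list if the XML should read chronologically. Clear the flag after the first query and set it again whenever the client is rebuilt.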

Effort: ~10 lines code + tests.
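
The flag logic can be sketched in isolation from the SDK. In the example below, `LegatusMessageBuilder`, `mark_client_rebuilt`, and the `fetch_history_xml` callable are hypothetical names introduced for illustration. In the real code this logic would live inside `_query_legatus`, and the fetch would be an awaited praetorium call.

```python
class LegatusMessageBuilder:
    """Hypothetical helper showing the one-shot bootstrap flag."""

    def __init__(self):
        self._client_is_fresh = True  # a new client has no SDK thread yet

    def mark_client_rebuilt(self):
        """Call whenever the SDK client is recreated (restart, roster change)."""
        self._client_is_fresh = True

    def build(self, status_xml, text, fetch_history_xml):
        """Assemble the message; history is fetched only on the first query."""
        parts = []
        if self._client_is_fresh:
            history_xml = fetch_history_xml()
            if history_xml:
                parts.append(history_xml)
            self._client_is_fresh = False  # later queries rely on the SDK thread
        parts.extend([status_xml, text])
        return "\n\n".join(parts)


fetch_calls = []


def fake_fetch():
    fetch_calls.append(1)
    return "<praetorium/>"


builder = LegatusMessageBuilder()
first = builder.build("<centurio_status/>", "continue where we left off", fake_fetch)
second = builder.build("<centurio_status/>", "and then?", fake_fetch)
```

Passing the fetch as a callable means the praetorium is read only when the bootstrap actually happens.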

2. Deduplicate centurio history injection (D-H2, Option A)

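Track the most recently injected nuntius on _CenturioSession. On dispatch, only inject nuntii newer than the last injection. Because nuntius ids are UUID4 and carry no ordering, the comparison should use the nuntius timestamp rather than the id.

Implementation: Add last_injected_ts: str | None = None to _CenturioSession. In dispatch(), filter visible nuntii by timestamp. Update the field after injection. A new session starts at None and therefore receives the full window.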

Effort: ~15 lines code + tests.
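
A sketch of the filter follows. `CenturioSessionSketch` and `select_new_nuntii` are hypothetical names, and the `Nuntius` record is reduced to the fields the filter needs. The string comparison assumes every timestamp is written in the same ISO 8601 format with the same UTC offset, as the schema comment suggests.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Nuntius:
    """Hypothetical minimal record for the example."""
    id: str
    timestamp: str
    text: str


@dataclass
class CenturioSessionSketch:
    """Stand-in for _CenturioSession with the proposed field."""
    last_injected_ts: Optional[str] = None


def select_new_nuntii(session: CenturioSessionSketch, visible: List[Nuntius]) -> List[Nuntius]:
    """Return only nuntii not yet injected into this session, then advance the marker."""
    if session.last_injected_ts is None:
        fresh = list(visible)  # new session: inject the full window
    else:
        fresh = [n for n in visible if n.timestamp > session.last_injected_ts]
    if fresh:
        session.last_injected_ts = max(n.timestamp for n in fresh)
    return fresh


session = CenturioSessionSketch()
window = [
    Nuntius("uuid-1", "2026-02-15T12:00:00+00:00", "@vorenus review this code"),
    Nuntius("uuid-2", "2026-02-15T12:01:00+00:00", "I reviewed the code..."),
]
first_injection = select_new_nuntii(session, window)
window.append(Nuntius("uuid-3", "2026-02-15T12:05:00+00:00", "@vorenus now fix it"))
second_injection = select_new_nuntii(session, window)
```

One caveat of the strict `>` comparison: two nuntii with identical timestamps could be split across dispatches and the later one dropped, so an implementation may want a tie-breaker.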

3. Inform LLM of context restoration

When bootstrapping from praetorium (recommendation 1), add a system note:

xml

<context_notice>
This conversation context was restored from the praetorium after a session restart.
You may not have full conversational context from the previous session.
If Caesar's request depends on context you don't have, ask for clarification.
</context_notice>

Effort: ~5 lines.

4. Document the dual-history model

The implicit SDK thread + explicit praetorium pattern is powerful but non-obvious. Add a section to the architecture docs explaining:

SDK thread = short-term conversational memory (volatile)
Praetorium = long-term message log (persistent)
Centuriones get both; legatus gets only SDK thread (plus tools)

Effort: Documentation only.
