Chat History Management
An investigation of how chat history works in Legio: current implementation, gaps, design decisions, and recommendations for single-session and multi-agent scenarios.
Current Architecture
Legio has two independent history systems that serve different purposes:
         Caesar (Telegram)
                 │
        ┌────────┴────────┐
        ▼                 ▼
SDK Conversation     Praetorium
(implicit, per       (explicit, SQLite
client session)      nuntii table)
1. SDK Conversation History (Implicit)
Each ClaudeSDKClient instance maintains an internal conversation thread with Claude. This is the standard LLM multi-turn behavior — every client.query() builds on prior turns.
| Property | Legatus | Centurio |
|---|---|---|
| Client instance | Legatus._client | _CenturioSession.client per centurio |
| Persistence | In-memory (SDK process) | In-memory (SDK process) |
| Lifetime | Until roster/prompt change → rebuild | Until idle timeout, prompt change, or token limit |
| Token tracking | None | SessionTokenTracker (150k input token ceiling) |
| Reset trigger | _should_rebuild_client() → disconnect + reconnect | tracker.should_reset(), prompt hash change, /reset command |
| Survives restart | No | No |
How it works: When Caesar sends "do X" and then "now do Y", the legatus SDK client has both messages in its conversation. Claude naturally remembers the context. This is the primary mechanism for conversational continuity — it's not code we wrote, it's the SDK's built-in behavior.
When it breaks:
- System restart — all SDK sessions are lost, conversation starts fresh
- Legatus client rebuild — roster or prompt file change triggers _should_rebuild_client() → disconnect + new client → history gone
- Centurio idle reap — cleanup_idle() disconnects sessions after 30 min idle → history gone
- Centurio token ceiling — SessionTokenTracker triggers reset at 150k input tokens → history gone
- Manual reset — /reset <name> command disconnects a centurio session
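The centurio-side triggers in this list can be combined into a single check, sketched below. This is a minimal illustration, not the actual SessionTokenTracker or cleanup_idle() code: SessionState, its fields, and should_drop_session are hypothetical names, and the sketch assumes input tokens accumulate across turns. Only the 150k ceiling and the 30-minute idle timeout come from this document.

```python
from dataclasses import dataclass
from typing import Optional

MAX_INPUT_TOKENS = 150_000      # ceiling quoted in this document
IDLE_TIMEOUT_SECONDS = 30 * 60  # 30 min idle reap quoted in this document


@dataclass
class SessionState:
    """Hypothetical per-centurio bookkeeping (not the real _CenturioSession)."""
    prompt_hash: str
    last_used: float       # seconds, e.g. from time.monotonic()
    input_tokens: int = 0  # assumed to accumulate over the session

    def record_usage(self, input_tokens: int) -> None:
        # Add the input tokens reported for one turn.
        self.input_tokens += input_tokens


def should_drop_session(
    state: SessionState, now: float, current_prompt_hash: str
) -> Optional[str]:
    """Return why the SDK thread would be discarded, or None to keep it."""
    if state.prompt_hash != current_prompt_hash:
        return "prompt_changed"
    if state.input_tokens >= MAX_INPUT_TOKENS:
        return "token_ceiling"
    if now - state.last_used >= IDLE_TIMEOUT_SECONDS:
        return "idle_timeout"
    return None
```

Whatever the reason returned, the effect is the same: the SDK thread is gone and only the praetorium still holds the exchange.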
2. Praetorium (Explicit Nuntii Store)
The praetorium is a SQLite database (castra/praetorium.db) that persists every message exchanged in the system.
Schema:
CREATE TABLE nuntii (
id TEXT PRIMARY KEY, -- UUID4
sender TEXT NOT NULL, -- "caesar", "legatus", or centurio name
text TEXT NOT NULL, -- message body
audience TEXT NOT NULL, -- JSON array: ["legatus"], ["vorenus"], ["all"]
timestamp TEXT NOT NULL, -- ISO 8601 UTC
reply_to TEXT, -- UUID of parent nuntius (threading)
FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);
CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX idx_nuntii_sender ON nuntii(sender);
What gets stored:
| Event | sender | audience | reply_to |
|---|---|---|---|
| Caesar types a message (no @mention) | "caesar" | ("legatus",) | None |
| Caesar types @vorenus do X | "caesar" | ("vorenus",) | None |
| Caesar types @vorenus @pullo compare | "caesar" | ("vorenus", "pullo") | None |
| Legatus responds | "legatus" | ("caesar",) | caesar_nuntius.id |
| Centurio responds | "vorenus" | ("caesar",) | caesar_nuntius.id |
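To make the storage rules concrete, here is a small sketch that writes a request and its reply into an in-memory SQLite database using the schema above. The helper post_nuntius is a hypothetical stand-in, not the praetorium's real write path; it only shows how the audience tuple becomes a JSON array and how reply_to links a response to the nuntius that prompted it.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

# Same columns as the schema in this section (indexes omitted for brevity).
SCHEMA = """
CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,
    sender TEXT NOT NULL,
    text TEXT NOT NULL,
    audience TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    reply_to TEXT,
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
)
"""


def post_nuntius(db, sender, text, audience, reply_to=None):
    """Insert one nuntius and return its id (hypothetical helper)."""
    nuntius_id = str(uuid.uuid4())
    db.execute(
        "INSERT INTO nuntii (id, sender, text, audience, timestamp, reply_to) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            nuntius_id,
            sender,
            text,
            json.dumps(list(audience)),              # tuple -> JSON array
            datetime.now(timezone.utc).isoformat(),  # ISO 8601 UTC
            reply_to,
        ),
    )
    return nuntius_id


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
question = post_nuntius(db, "caesar", "@vorenus do X", ("vorenus",))
answer = post_nuntius(db, "vorenus", "Done.", ("caesar",), reply_to=question)
```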
How it's queried:
| Caller | Method | What it sees |
|---|---|---|
| /history command | get_history(limit) | All nuntii, newest first (admin view) |
| Centurio dispatch context | get_visible_nuntii(name, limit) | Only nuntii where centurio is in audience, or audience is ["all"] |
| Legatus get_history tool | get_history(limit) | All nuntii (legatus has full visibility) |
| Caesar and legatus viewers | get_visible_nuntii("caesar") | Everything (privileged viewers) |
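The visibility rule in this table can be expressed as a short filter. The sketch below is an in-memory approximation, not the real get_visible_nuntii query: is_visible and visible_nuntii are hypothetical names, nuntii are plain dicts, and the result order (oldest first) is an assumption, since this document does not state the order the real method returns.

```python
PRIVILEGED_VIEWERS = {"caesar", "legatus"}  # see everything, per the table


def is_visible(audience: list[str], viewer: str) -> bool:
    """Apply the visibility rule to one nuntius audience."""
    if viewer in PRIVILEGED_VIEWERS:
        return True
    return viewer in audience or "all" in audience


def visible_nuntii(nuntii: list[dict], viewer: str, limit: int) -> list[dict]:
    """Return the most recent `limit` visible nuntii, oldest first.

    Assumes `nuntii` is already sorted by timestamp ascending.
    """
    visible = [n for n in nuntii if is_visible(n["audience"], viewer)]
    return visible[-limit:] if limit > 0 else []
```

Note that under this rule a centurio does not see its own replies, because they are addressed to ("caesar",); those turns reach it only through its SDK thread.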
Where it's injected:
- Centurio dispatch (session.py:110-113): Before each centurio dispatch, the last N visible nuntii are fetched, formatted as XML via format_history_xml(), and prepended to the user message:

      visible = await praetorium.get_visible_nuntii(name, limit=config.history_window)
      context = format_history_xml(visible, viewer=name)
      full_message = f"{context}\n\n{nuntius.text}"

- Legatus query (legatus.py:390-391): The legatus gets centurio status XML prepended, but NOT praetorium history. The legatus relies on its SDK conversation thread for context, and has a get_history MCP tool for explicit history retrieval.
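For readers without the source at hand, the following is an illustrative re-implementation of the formatting step, written to produce the XML envelope documented in this section. It is a sketch under assumptions: nuntii are plain dicts, and escaping of message bodies and attributes is added here as a precaution; whether the real format_history_xml escapes anything is not stated in this document. build_dispatch_message is a hypothetical helper name.

```python
from xml.sax.saxutils import escape, quoteattr


def format_history_xml(nuntii: list[dict], viewer: str) -> str:
    """Render nuntii inside a <praetorium> envelope (illustrative only)."""
    lines = [f'<praetorium recent="true" viewer={quoteattr(viewer)}>']
    for n in nuntii:
        lines.append(
            f"<nuntius id={quoteattr(n['id'])} sender={quoteattr(n['sender'])} "
            f"timestamp={quoteattr(n['timestamp'])}>"
        )
        lines.append(escape(n["text"]))  # escape &, <, > in message bodies
        lines.append("</nuntius>")
    lines.append("</praetorium>")
    return "\n".join(lines)


def build_dispatch_message(history: list[dict], viewer: str, text: str) -> str:
    # Same composition as the dispatch code: context, blank line, new text.
    return f"{format_history_xml(history, viewer)}\n\n{text}"
```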
XML format injected into centurio context:
<praetorium recent="true" viewer="vorenus">
<nuntius id="uuid-1" sender="caesar" timestamp="2026-02-15T12:00:00+00:00">
@vorenus review this code
</nuntius>
<nuntius id="uuid-2" sender="vorenus" timestamp="2026-02-15T12:01:00+00:00">
I reviewed the code and found 3 issues...
</nuntius>
</praetorium>
3. Config Parameters
From legio.toml:
history_window = 50 # nuntii per dispatch context
session_idle_timeout_minutes = 30 # centurio idle reap
From session.py:
SessionTokenTracker(max_input_tokens=150_000) # ~75% of context window
Message Flow Diagrams
Flow A: Caesar → Legatus (no @mention)
Caesar: "What's the status?"
↓
bot.py: _handle_message()
→ _handle_request()
→ text = update.message.text # "What's the status?"
→ run_legatus_request(update, bot, handle_message, text)
↓
legatus.py: handle_message(text)
→ mentions = [] # no @mention
→ Post Nuntius(sender="caesar", audience=("legatus",)) → PRAETORIUM
→ _query_legatus(text)
→ status_xml = _build_status_xml() # <centurio_status>...</centurio_status>
→ full_message = status_xml + "\n\n" + text # status XML + "What's the status?"
→ client.query(full_message) # SDK conversation — has prior turns
→ collect_response(client)
→ Post Nuntius(sender="legatus", audience=("caesar",), reply_to=...) → PRAETORIUM
→ return [response]
Key observation: The legatus does NOT inject praetorium history into its query. It relies entirely on the SDK conversation thread. The praetorium is a write-through log, not a context source for legatus.
Flow B: Caesar → Centurio (explicit @mention)
Caesar: "@vorenus review this module"
↓
legatus.py: handle_message(text)
→ mentions = ["vorenus"]
→ Post Nuntius(sender="caesar", audience=("vorenus",)) → PRAETORIUM
→ session_mgr.dispatch("vorenus", nuntius, ...)
↓
session.py: dispatch()
→ ensure_session("vorenus", ...) # create or reuse SDK client
→ visible = praetorium.get_visible_nuntii("vorenus", limit=50) ← PRAETORIUM READ
→ context = format_history_xml(visible, viewer="vorenus")
→ full_message = context + "\n\n" + nuntius.text # history XML + "review this module"
→ session.client.query(full_message) # SDK conversation — has prior turns
→ collect_response(client, tracker=...)
↓
legatus.py:
→ Post Nuntius(sender="vorenus", audience=("caesar",), reply_to=...) → PRAETORIUM
→ Prepend attribution header: "⚔️ vorenus — Code Specialist\n━━━━━━━━━━━━"
→ return [header + response]
Key observation: Centuriones get both praetorium history (injected) AND SDK conversation history (implicit). This is potentially redundant — the same messages appear twice if the centurio has been active recently.
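Both flows begin with the same decision: which audience a Caesar message is addressed to. A minimal sketch of that routing step follows. The regular expression, the lowercasing, the roster check, and the name route_audience are all assumptions for illustration; the real mention parser in legatus.py is not shown in this document, and the sketch does not cover @all broadcasts.

```python
import re

MENTION_RE = re.compile(r"@(\w+)")


def route_audience(text: str, known_centuriones: set[str]) -> tuple[str, ...]:
    """Map a Caesar message to its audience tuple.

    No recognized @mention -> ("legatus",); otherwise the mentioned
    centuriones, in order of first appearance.
    """
    mentions: list[str] = []
    for name in MENTION_RE.findall(text):
        name = name.lower()
        if name in known_centuriones and name not in mentions:
            mentions.append(name)
    return tuple(mentions) if mentions else ("legatus",)
```

With a roster of vorenus and pullo, "What's the status?" routes to ("legatus",) as in Flow A, and "@vorenus review this module" routes to ("vorenus",) as in Flow B.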
Gaps and Issues
Gap 1: Legatus Has No Praetorium Context After Restart
When the system restarts, the legatus SDK client is recreated from scratch. The SDK conversation is empty. But the praetorium still has all historical nuntii. The legatus has no mechanism to bootstrap from praetorium — it starts as if no prior conversation happened.
Impact: After restart, Caesar says "continue where we left off" — legatus has no idea what that means. The get_history MCP tool exists but the LLM doesn't know to call it proactively.
Severity: High — breaks continuity across restarts.
Gap 2: Legatus Never Receives Praetorium History Inline
Unlike centuriones (which get history XML prepended), the legatus receives only centurio_status_xml + text. This is intentional (legatus has SDK conversation context), but creates an asymmetry:
- Centuriones see messages from other centuriones and Caesar via praetorium
- Legatus only sees messages from its own conversation thread
Impact: If Caesar asks the legatus "what did @vorenus say about X?", the legatus doesn't have that information in context unless it explicitly calls get_history.
Severity: Medium — legatus has the tool, just doesn't auto-use it.
Gap 3: Double History for Centuriones
When a centurio has an active SDK session and receives a new dispatch, the full_message contains:
- Praetorium XML history (last 50 nuntii visible to this centurio)
- The new message text
But the SDK client already has the conversation from prior turns. So messages from the last few turns appear twice — once in the XML block and once in the SDK's internal context.
Impact: Wastes tokens. At 50 nuntii, the XML block can be 5-10k tokens. If most of those are already in the SDK conversation, that's pure waste.
Severity: Medium — token cost, not correctness.
Gap 4: No Session Survival Across Restarts
Both legatus and centurio SDK sessions are in-memory. A system restart (crash, deploy, manual stop) loses all conversation state. The praetorium persists nuntii, but there's no mechanism to:
- Detect that a prior session existed
- Rebuild SDK context from praetorium history
- Inform the LLM that context was restored (not original)
Impact: Every restart is a clean slate for the LLM, even though the praetorium has full history.
Severity: High — the most impactful gap.
Gap 5: No Cross-Centurio Conversation Visibility
By design (D7), centuriones only see nuntii addressed to them or to "all". If @vorenus produces findings and Caesar then asks @pullo to build on them, pullo has no visibility into vorenus's conversation.
Current workaround: Centuriones can publish to acta (shared knowledge). But this requires the centurio to proactively publish.
Impact: Caesar must manually copy context between centuriones or use explicit @all broadcasts.
Severity: Low — this is by design, but worth noting.
Gap 6: History Window is Static
history_window = 50 is fixed for all centuriones regardless of context window size or task complexity. A simple Q&A centurio wastes tokens on 50 nuntii; a deep analysis centurio might need more context.
Severity: Low — configurable but not per-centurio.
Gap 7: No History Pruning or Summarization
The praetorium grows indefinitely. No mechanism to:
- Archive old nuntii
- Summarize long conversations into condensed context
- Limit database size
Severity: Low in the short term — SQLite handles millions of rows fine. Long-term maintenance concern.
Design Decisions to Make
D-H1: Should Legatus Receive Praetorium History After Restart?
Option A — Bootstrap from praetorium (recommended): On first query after client creation, prepend the last N nuntii (formatted as XML) to the user message. The legatus gets context even after restart.
Option B — Prompt instructs legatus to call get_history: Add a line to the legatus prompt: "When you don't have context for a request, call get_history first." Relies on LLM initiative.
Option C — No change: Accept that restarts clear context. Caesar uses /history manually.
Recommendation: Option A. Simple to implement: check a self._client_is_fresh flag and prepend history XML on the first query after client creation.
D-H2: Should Centurio History Injection Be Deduplicated?
Option A — Inject only new nuntii: Track the last injected nuntius ID per centurio session. On subsequent dispatches, only inject nuntii newer than the last injection.
Option B — Reduce window on active sessions: If the session already has conversation context, use a smaller history window (e.g., 5 instead of 50).
Option C — No change: Accept the duplication. The LLM handles redundant context gracefully.
Recommendation: Option A. Track last_injected_nuntius_id (or that nuntius's timestamp) on _CenturioSession. Cheap to implement, with meaningful token savings; per Gap 3 this is a cost fix, not a correctness fix.
D-H3: Should Cross-Centurio Visibility Be Expanded?
Option A — Broadcast responses to all centuriones: Change response audience from ("caesar",) to ("all",). Every centurio sees every response.
Option B — Selective broadcast via legatus: Legatus decides which responses to broadcast based on relevance. Requires prompt engineering.
Option C — No change (current): Centuriones publish to acta explicitly.
Recommendation: Option C for now. Cross-centurio context belongs in acta (persistent shared knowledge), not in the nuntii stream (transient conversation). The acta mechanism works — it just needs Caesar or centuriones to use it.
D-H4: Should History Be Summarized?
Option A — LLM-based summarization: Periodically summarize old nuntii into a condensed context block. Store summaries as special nuntii or in acta.
Option B — Sliding window with overflow to commentarii: After N nuntii, auto-archive older ones to a centurio's commentarii.
Option C — No change: Keep all nuntii, rely on LIMIT queries.
Recommendation: Option C for now. Premature optimization. The praetorium LIMIT queries are already bounded. Consider Option A when conversations routinely exceed 200+ nuntii.
D-H5: Per-Centurio History Window?
Option A — Config per centurio in prompt.md frontmatter:
---
history_window: 20
---
# Vorenus
Option B — Model-based heuristic: Larger models get larger windows.
Option C — No change: Global history_window = 50.
Recommendation: Option C for now. Not a bottleneck yet. When per-centurio model routing is implemented (from claude-flow research), revisit this.
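If Option A is revisited later, reading the override could look like the sketch below. It is a deliberately small parser that avoids a YAML dependency and understands only flat key: value lines; history_window_for is a hypothetical function name, and falling back to the global value on missing or invalid input is an assumption about the desired behavior.

```python
DEFAULT_HISTORY_WINDOW = 50  # global value from legio.toml


def history_window_for(prompt_md: str, default: int = DEFAULT_HISTORY_WINDOW) -> int:
    """Read an optional history_window from prompt.md frontmatter."""
    lines = prompt_md.splitlines()
    if not lines or lines[0].strip() != "---":
        return default  # no frontmatter block at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter; ignore the prompt body
        key, sep, value = line.partition(":")
        if sep and key.strip() == "history_window":
            try:
                parsed = int(value.strip())
            except ValueError:
                return default  # non-numeric value: keep the global setting
            return parsed if parsed > 0 else default
    return default
```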
Multi-Agent History: How It All Fits Together
Current Mental Model
PRAETORIUM (shared SQLite)
┌─────────────────────────────────────────────────┐
│ nuntius: caesar → ("legatus",) "do X" │
│ nuntius: legatus → ("caesar",) "done X" │
│ nuntius: caesar → ("vorenus",) "@vorenus Y" │
│ nuntius: vorenus → ("caesar",) "Y result" │
│ nuntius: caesar → ("vorenus","pullo") "debate" │
│ nuntius: vorenus → ("caesar",) "my position" │
│ nuntius: pullo → ("caesar",) "my position" │
└─────────────────────────────────────────────────┘
      │                 │                 │
 visible to        visible to        visible to
legatus+caesar   vorenus+caesar     pullo+caesar
      │                 │                 │
┌─────┴─────┐    ┌──────┴──────┐    ┌─────┴─────┐
│  Legatus  │    │   Vorenus   │    │   Pullo   │
│ SDK client│    │ SDK client  │    │ SDK client│
│ (has own  │    │ (gets XML   │    │ (gets XML │
│  thread)  │    │  + thread)  │    │  + thread)│
└───────────┘    └─────────────┘    └───────────┘
What Each Agent Knows
| Agent | SDK conversation (implicit) | Praetorium history (injected) | MCP tools |
|---|---|---|---|
| Legatus | Full thread with Caesar since last rebuild | NOT injected (only status XML) | get_history, post_nuntius |
| Centurio | Full thread since session creation | Last 50 visible nuntii as XML | list_commentarii, read_commentarium, write_commentarium, list_edicta, etc. |
History Lifetime Summary
| Event | SDK thread | Praetorium |
|---|---|---|
| Normal message exchange | Grows per turn | Grows per nuntius (persisted) |
| System restart | Lost | Survives (SQLite) |
| Centurio idle timeout | Lost (session reaped) | Survives |
| Centurio token ceiling | Lost (session reset) | Survives |
| Legatus roster change | Lost (client rebuild) | Survives |
| /reset <name> | Lost (explicit) | Survives |
| Manual /history | N/A (display only) | Queried |
Recommendations — Priority Order
1. Bootstrap legatus from praetorium after rebuild (D-H1, Option A)
When the legatus SDK client is freshly created (_ensure_legatus_client creates a new client), prepend the last history_window nuntii to the first query. Subsequent queries use the SDK thread naturally.
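Implementation: Add a _client_is_fresh: bool = True flag. In _query_legatus, if the flag is set, prepend format_history_xml(praetorium.get_history(limit=history_window), viewer="legatus") and clear the flag after the first query. Because get_history returns newest first, reverse the list before formatting unless format_history_xml already sorts chronologically.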
Effort: ~10 lines code + tests.
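The flag logic can be sketched in isolation as follows. LegatusSketch is a toy stand-in, not the real Legatus class: the three injected callables replace the praetorium, the XML formatter, and the SDK client, and placing the history block before the status XML is an assumption. Only the _client_is_fresh flag name comes from this recommendation.

```python
import asyncio


class LegatusSketch:
    """Toy stand-in for the legatus; only the bootstrap logic matters here."""

    def __init__(self, fetch_history, render_history, send):
        self._fetch_history = fetch_history    # async () -> list of nuntii
        self._render_history = render_history  # (nuntii) -> XML string
        self._send = send                      # async (message) -> response
        self._client_is_fresh = True           # no SDK thread exists yet

    def mark_client_rebuilt(self) -> None:
        # Would be called wherever a new SDK client is created.
        self._client_is_fresh = True

    async def query(self, status_xml: str, text: str) -> str:
        parts = [status_xml, text]
        if self._client_is_fresh:
            history = await self._fetch_history()
            if history:
                parts.insert(0, self._render_history(history))
        response = await self._send("\n\n".join(parts))
        # Clear only after a successful send, so a failed first query
        # still gets the history on retry.
        self._client_is_fresh = False
        return response
```

Subsequent queries skip the praetorium read entirely and rely on the SDK thread, which matches the intent of this recommendation.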
2. Deduplicate centurio history injection (D-H2, Option A)
Track the last injected nuntius on _CenturioSession. Because nuntius ids are UUID4 and carry no ordering, store that nuntius's timestamp rather than its id, and on dispatch inject only nuntii with newer timestamps.
Implementation: Add last_injected_ts: str | None = None to _CenturioSession. In dispatch(), filter visible nuntii by timestamp. Update the field after each injection.
Effort: ~15 lines code + tests.
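A minimal sketch of the timestamp cursor follows. CenturioSessionSketch and select_new_nuntii are hypothetical names; the sketch assumes the visible list is sorted oldest first and that every timestamp uses the same ISO 8601 UTC format, so that plain string comparison matches chronological order.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CenturioSessionSketch:
    """Hypothetical slice of _CenturioSession: only the dedup cursor."""
    last_injected_ts: Optional[str] = None


def select_new_nuntii(session: CenturioSessionSketch, visible: list[dict]) -> list[dict]:
    """Return nuntii newer than the cursor and advance the cursor.

    Assumes `visible` is sorted oldest first and timestamps share one
    ISO 8601 UTC format.
    """
    cursor = session.last_injected_ts
    fresh = [n for n in visible if cursor is None or n["timestamp"] > cursor]
    if fresh:
        session.last_injected_ts = fresh[-1]["timestamp"]
    return fresh
```

Two caveats apply to this design. A nuntius whose timestamp equals the cursor is skipped, so identical timestamps could drop a message. The cursor also has to start at None for every new session, since a reset or reaped session has lost its SDK thread and needs the full window again.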
3. Inform LLM of context restoration
When bootstrapping from praetorium (recommendation 1), add a system note:
<context_notice>
This conversation context was restored from the praetorium after a session restart.
You may not have full conversational context from the previous session.
If Caesar's request depends on context you don't have, ask for clarification.
</context_notice>
Effort: ~5 lines.
4. Document the dual-history model
The implicit SDK thread + explicit praetorium pattern is powerful but non-obvious. Add a section to the architecture docs explaining:
- SDK thread = short-term conversational memory (volatile)
- Praetorium = long-term message log (persistent)
- Centuriones get both; legatus gets only SDK thread (plus tools)
Effort: Documentation only.