
Chat History Management

An investigation of how chat history works in Legio: current implementation, gaps, open design decisions, and recommendations for single-session and multi-agent scenarios.


Current Architecture

Legio has two independent history systems that serve different purposes:

                        Caesar (Telegram)
                             │
                    ┌────────┴────────┐
                    ▼                 ▼
           SDK Conversation      Praetorium
           (implicit, per        (explicit, SQLite
            client session)       nuntii table)

1. SDK Conversation History (Implicit)

Each ClaudeSDKClient instance maintains an internal conversation thread with Claude. This is the standard LLM multi-turn behavior — every client.query() builds on prior turns.

| Property | Legatus | Centurio |
|---|---|---|
| Client instance | Legatus._client | _CenturioSession.client per centurio |
| Persistence | In-memory (SDK process) | In-memory (SDK process) |
| Lifetime | Until roster/prompt change → rebuild | Until idle timeout, prompt change, or token limit |
| Token tracking | None | SessionTokenTracker (150k input token ceiling) |
| Reset trigger | _should_rebuild_client() → disconnect + reconnect | tracker.should_reset(), prompt hash change, /reset command |
| Survives restart | No | No |

How it works: When Caesar sends "do X" and then "now do Y", the legatus SDK client has both messages in its conversation. Claude naturally remembers the context. This is the primary mechanism for conversational continuity — it's not code we wrote, it's the SDK's built-in behavior.

When it breaks:

  • System restart: all SDK sessions are lost and the conversation starts fresh
  • Legatus client rebuild: a roster or prompt file change triggers _should_rebuild_client() → disconnect + new client → history gone
  • Centurio idle reap: cleanup_idle() disconnects sessions after 30 min idle → history gone
  • Centurio token ceiling: SessionTokenTracker triggers a reset at 150k input tokens → history gone
  • Manual reset: the /reset <name> command disconnects a centurio session
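The two automatic centurio reset conditions above (token ceiling and idle timeout) can be sketched roughly as follows. This is an illustration under stated assumptions, not Legio's actual code: only the names SessionTokenTracker, max_input_tokens, and should_reset() appear in this document. The record() method, the is_idle() helper, and the choice to keep the latest reported input-token count are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class SessionTokenTracker:
    """Illustrative tracker. record() is a hypothetical helper."""

    max_input_tokens: int = 150_000
    input_tokens: int = 0

    def record(self, input_tokens: int) -> None:
        # Assumption: each turn reports an input-token count that already
        # covers the whole conversation so far, so we keep the latest value.
        self.input_tokens = input_tokens

    def should_reset(self) -> bool:
        # Reset once the conversation has reached the configured ceiling.
        return self.input_tokens >= self.max_input_tokens


def is_idle(last_used: datetime, now: datetime, timeout_minutes: int = 30) -> bool:
    """Hypothetical idle check mirroring session_idle_timeout_minutes."""
    return now - last_used >= timedelta(minutes=timeout_minutes)
```

Either condition returning True would lead to the session being disconnected, at which point the SDK thread (and its history) is gone.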

2. Praetorium (Explicit Nuntii Store)

The praetorium is a SQLite database (castra/praetorium.db) that persists every message exchanged in the system.

Schema:

sql
CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,           -- UUID4
    sender TEXT NOT NULL,          -- "caesar", "legatus", or centurio name
    text TEXT NOT NULL,            -- message body
    audience TEXT NOT NULL,        -- JSON array: ["legatus"], ["vorenus"], ["all"]
    timestamp TEXT NOT NULL,       -- ISO 8601 UTC
    reply_to TEXT,                 -- UUID of parent nuntius (threading)
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);
CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX idx_nuntii_sender ON nuntii(sender);
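As a rough illustration of how rows might be written under this schema, the sketch below uses Python's standard sqlite3 module against an in-memory database. The post_nuntius helper and its signature are hypothetical (this document names a post_nuntius MCP tool but does not show its code); only the table layout is taken from the schema above.

```python
import json
import sqlite3
import uuid
from datetime import datetime, timezone

SCHEMA = """
CREATE TABLE nuntii (
    id TEXT PRIMARY KEY,
    sender TEXT NOT NULL,
    text TEXT NOT NULL,
    audience TEXT NOT NULL,
    timestamp TEXT NOT NULL,
    reply_to TEXT,
    FOREIGN KEY (reply_to) REFERENCES nuntii(id)
);
CREATE INDEX idx_nuntii_timestamp ON nuntii(timestamp);
CREATE INDEX idx_nuntii_sender ON nuntii(sender);
"""


def post_nuntius(conn, sender, text, audience, reply_to=None):
    """Hypothetical helper: insert one nuntius and return its UUID4 id."""
    nuntius_id = str(uuid.uuid4())
    conn.execute(
        "INSERT INTO nuntii (id, sender, text, audience, timestamp, reply_to) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        (
            nuntius_id,
            sender,
            text,
            json.dumps(list(audience)),  # audience is stored as a JSON array
            datetime.now(timezone.utc).isoformat(),  # ISO 8601 UTC
            reply_to,
        ),
    )
    conn.commit()
    return nuntius_id


conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce FKs by default
conn.executescript(SCHEMA)

question = post_nuntius(conn, "caesar", "@vorenus review this module", ("vorenus",))
answer = post_nuntius(conn, "vorenus", "Found 3 issues.", ("caesar",), reply_to=question)
```

Note that the reply_to foreign key is only enforced when the foreign_keys pragma is enabled on the connection. Whether Legio enables it is not stated here.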

What gets stored:

| Event | sender | audience | reply_to |
|---|---|---|---|
| Caesar types a message (no @mention) | "caesar" | ("legatus",) | None |
| Caesar types @vorenus do X | "caesar" | ("vorenus",) | None |
| Caesar types @vorenus @pullo compare | "caesar" | ("vorenus", "pullo") | None |
| Legatus responds | "legatus" | ("caesar",) | caesar_nuntius.id |
| Centurio responds | "vorenus" | ("caesar",) | caesar_nuntius.id |

How it's queried:

| Caller | Method | What it sees |
|---|---|---|
| /history command | get_history(limit) | All nuntii, newest first (admin view) |
| Centurio dispatch context | get_visible_nuntii(name, limit) | Only nuntii where centurio is in audience, or audience is ["all"] |
| Legatus get_history tool | get_history(limit) | All nuntii (legatus has full visibility) |
| Caesar and legatus viewers | get_visible_nuntii("caesar") | Everything (privileged viewers) |
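The visibility rule in the table can be expressed as a small pure function. This is a sketch only: the real get_visible_nuntii is awaited and reads from SQLite, and its implementation is not shown in this document. The function names, the PRIVILEGED_VIEWERS set, and the rule that a sender can see its own nuntii are assumptions. The last one is inferred from the XML example in this document, which shows vorenus's own reply in vorenus's view.

```python
import json

PRIVILEGED_VIEWERS = {"caesar", "legatus"}  # assumption based on the table above


def is_visible(nuntius: dict, viewer: str) -> bool:
    """Return True if `viewer` may see this nuntius (a dict with JSON 'audience')."""
    if viewer in PRIVILEGED_VIEWERS:
        return True
    if nuntius["sender"] == viewer:
        return True  # assumption: a sender sees its own nuntii
    audience = json.loads(nuntius["audience"])
    return viewer in audience or "all" in audience


def visible_nuntii(rows, viewer, limit):
    """Return the newest `limit` visible rows, in chronological order."""
    if limit <= 0:
        return []
    seen = [r for r in rows if is_visible(r, viewer)]
    seen.sort(key=lambda r: r["timestamp"])  # ISO 8601 UTC strings sort by time
    return seen[-limit:]
```

A production version would more likely push the filter and the LIMIT into the SQL query rather than loading every row.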

Where it's injected:

  • Centurio dispatch (session.py:110-113): Before each centurio dispatch, the last N visible nuntii are fetched, formatted as XML via format_history_xml(), and prepended to the user message:

    python
    visible = await praetorium.get_visible_nuntii(name, limit=config.history_window)
    context = format_history_xml(visible, viewer=name)
    full_message = f"{context}\n\n{nuntius.text}"
  • Legatus query (legatus.py:390-391): The legatus gets centurio status XML prepended, but NOT praetorium history. The legatus relies on its SDK conversation thread for context, and has a get_history MCP tool for explicit history retrieval.

XML format injected into centurio context:

xml
<praetorium recent="true" viewer="vorenus">
  <nuntius id="uuid-1" sender="caesar" timestamp="2026-02-15T12:00:00+00:00">
    @vorenus review this code
  </nuntius>
  <nuntius id="uuid-2" sender="vorenus" timestamp="2026-02-15T12:01:00+00:00">
    I reviewed the code and found 3 issues...
  </nuntius>
</praetorium>
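A formatter producing this shape might look like the sketch below. The name format_history_xml and its viewer parameter come from this document; the body is an assumption, including the choice to represent each nuntius as a dict and to escape text and attribute values with the standard xml.sax.saxutils helpers.

```python
from xml.sax.saxutils import escape, quoteattr


def format_history_xml(nuntii, viewer):
    """Render nuntii (dicts with id, sender, timestamp, text) as a <praetorium> block."""
    lines = [f'<praetorium recent="true" viewer={quoteattr(viewer)}>']
    for n in nuntii:
        # quoteattr() adds the surrounding quotes and escapes the value.
        lines.append(
            f"  <nuntius id={quoteattr(n['id'])} sender={quoteattr(n['sender'])} "
            f"timestamp={quoteattr(n['timestamp'])}>"
        )
        # escape() protects against '<', '>' and '&' in message bodies.
        lines.append(f"    {escape(n['text'])}")
        lines.append("  </nuntius>")
    lines.append("</praetorium>")
    return "\n".join(lines)
```

Escaping matters here because message bodies are free text: an unescaped `<` in a nuntius could otherwise make the injected block ambiguous to the model.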

3. Config Parameters

From legio.toml:

toml
history_window = 50                    # nuntii per dispatch context
session_idle_timeout_minutes = 30      # centurio idle reap

From session.py:

python
SessionTokenTracker(max_input_tokens=150_000)   # ~75% of context window

Message Flow Diagrams

Flow A: Caesar → Legatus (no @mention)

Caesar: "What's the status?"

bot.py: _handle_message()
  → _handle_request()
    → text = update.message.text                    # "What's the status?"
    → run_legatus_request(update, bot, handle_message, text)

legatus.py: handle_message(text)
  → mentions = []                                   # no @mention
  → Post Nuntius(sender="caesar", audience=("legatus",))  →  PRAETORIUM
  → _query_legatus(text)
    → status_xml = _build_status_xml()              # <centurio_status>...</centurio_status>
    → full_message = status_xml + "\n\n" + text     # status XML + "What's the status?"
    → client.query(full_message)                    # SDK conversation — has prior turns
    → collect_response(client)
  → Post Nuntius(sender="legatus", audience=("caesar",), reply_to=...) → PRAETORIUM
  → return [response]

Key observation: The legatus does NOT inject praetorium history into its query. It relies entirely on the SDK conversation thread. The praetorium is a write-through log, not a context source for legatus.

Flow B: Caesar → Centurio (explicit @mention)

Caesar: "@vorenus review this module"

legatus.py: handle_message(text)
  → mentions = ["vorenus"]
  → Post Nuntius(sender="caesar", audience=("vorenus",))  → PRAETORIUM
  → session_mgr.dispatch("vorenus", nuntius, ...)

session.py: dispatch()
  → ensure_session("vorenus", ...)                  # create or reuse SDK client
  → visible = praetorium.get_visible_nuntii("vorenus", limit=50)  ← PRAETORIUM READ
  → context = format_history_xml(visible, viewer="vorenus")
  → full_message = context + "\n\n" + nuntius.text  # history XML + "review this module"
  → session.client.query(full_message)              # SDK conversation — has prior turns
  → collect_response(client, tracker=...)

legatus.py:
  → Post Nuntius(sender="vorenus", audience=("caesar",), reply_to=...) → PRAETORIUM
  → Prepend attribution header: "⚔️ vorenus — Code Specialist\n━━━━━━━━━━━━"
  → return [header + response]

Key observation: Centuriones get both praetorium history (injected) AND SDK conversation history (implicit). This is potentially redundant — the same messages appear twice if the centurio has been active recently.

Gaps and Issues

Gap 1: Legatus Has No Praetorium Context After Restart

When the system restarts, the legatus SDK client is recreated from scratch. The SDK conversation is empty. But the praetorium still has all historical nuntii. The legatus has no mechanism to bootstrap from praetorium — it starts as if no prior conversation happened.

Impact: After restart, Caesar says "continue where we left off" — legatus has no idea what that means. The get_history MCP tool exists but the LLM doesn't know to call it proactively.

Severity: High — breaks continuity across restarts.

Gap 2: Legatus Never Receives Praetorium History Inline

Unlike centuriones (which get history XML prepended), the legatus receives only centurio_status_xml + text. This is intentional (legatus has SDK conversation context), but creates an asymmetry:

  • Centuriones see messages from other centuriones and Caesar via praetorium
  • Legatus only sees messages from its own conversation thread

Impact: If Caesar asks the legatus "what did @vorenus say about X?", the legatus doesn't have that information in context unless it explicitly calls get_history.

Severity: Medium — legatus has the tool, just doesn't auto-use it.

Gap 3: Double History for Centuriones

When a centurio has an active SDK session and receives a new dispatch, the full_message contains:

  1. Praetorium XML history (last 50 nuntii visible to this centurio)
  2. The new message text

But the SDK client already has the conversation from prior turns. So messages from the last few turns appear twice — once in the XML block and once in the SDK's internal context.

Impact: Wastes tokens. At 50 nuntii, the XML block can be 5-10k tokens. If most of those are already in the SDK conversation, that's pure waste.

Severity: Medium — token cost, not correctness.

Gap 4: No Session Survival Across Restarts

Both legatus and centurio SDK sessions are in-memory. A system restart (crash, deploy, manual stop) loses all conversation state. The praetorium persists nuntii, but there's no mechanism to:

  1. Detect that a prior session existed
  2. Rebuild SDK context from praetorium history
  3. Inform the LLM that context was restored (not original)

Impact: Every restart is a clean slate for the LLM, even though the praetorium has full history.

Severity: High — the most impactful gap.

Gap 5: No Cross-Centurio Conversation Visibility

By design (D7), centuriones only see nuntii addressed to them or to "all". If @vorenus produces findings and Caesar then asks @pullo to build on them, pullo has no visibility into vorenus's conversation.

Current workaround: Centuriones can publish to acta (shared knowledge). But this requires the centurio to proactively publish.

Impact: Caesar must manually copy context between centuriones or use explicit @all broadcasts.

Severity: Low — this is by design, but worth noting.

Gap 6: History Window is Static

history_window = 50 is fixed for all centuriones regardless of context window size or task complexity. A simple Q&A centurio wastes tokens on 50 nuntii; a deep analysis centurio might need more context.

Severity: Low — configurable but not per-centurio.

Gap 7: No History Pruning or Summarization

The praetorium grows indefinitely. No mechanism to:

  • Archive old nuntii
  • Summarize long conversations into condensed context
  • Limit database size

Severity: Low in the short term — SQLite handles millions of rows fine. Long-term maintenance concern.

Design Decisions to Make

D-H1: Should Legatus Receive Praetorium History After Restart?

Option A — Bootstrap from praetorium (recommended): On first query after client creation, prepend the last N nuntii (formatted as XML) to the user message. The legatus gets context even after restart.

Option B — Prompt instructs legatus to call get_history: Add a line to the legatus prompt: "When you don't have context for a request, call get_history first." Relies on LLM initiative.

Option C — No change: Accept that restarts clear context. Caesar uses /history manually.

Recommendation: Option A. Simple to implement: check a _client_is_fresh flag on the legatus and prepend the history XML to the first query after the client is created.

D-H2: Should Centurio History Injection Be Deduplicated?

Option A — Inject only new nuntii: Track the last injected nuntius ID per centurio session. On subsequent dispatches, only inject nuntii newer than the last injection.

Option B — Reduce window on active sessions: If the session already has conversation context, use a smaller history window (e.g., 5 instead of 50).

Option C — No change: Accept the duplication. The LLM handles redundant context gracefully.

Recommendation: Option A for correctness. Track last_injected_nuntius_id on _CenturioSession. Cheap to implement, meaningful token savings.

D-H3: Should Cross-Centurio Visibility Be Expanded?

Option A — Broadcast responses to all centuriones: Change response audience from ("caesar",) to ("all",). Every centurio sees every response.

Option B — Selective broadcast via legatus: Legatus decides which responses to broadcast based on relevance. Requires prompt engineering.

Option C — No change (current): Centuriones publish to acta explicitly.

Recommendation: Option C for now. Cross-centurio context belongs in acta (persistent shared knowledge), not in the nuntii stream (transient conversation). The acta mechanism works — it just needs Caesar or centuriones to use it.

D-H4: Should History Be Summarized?

Option A — LLM-based summarization: Periodically summarize old nuntii into a condensed context block. Store summaries as special nuntii or in acta.

Option B — Sliding window with overflow to commentarii: After N nuntii, auto-archive older ones to a centurio's commentarii.

Option C — No change: Keep all nuntii, rely on LIMIT queries.

Recommendation: Option C for now. Premature optimization. The praetorium LIMIT queries are already bounded. Consider Option A when conversations routinely exceed 200+ nuntii.

D-H5: Per-Centurio History Window?

Option A — Config per centurio in prompt.md frontmatter:

markdown
---
history_window: 20
---
# Vorenus

Option B — Model-based heuristic: Larger models get larger windows.

Option C — No change: Global history_window = 50.

Recommendation: Option C for now. Not a bottleneck yet. When per-centurio model routing is implemented (from claude-flow research), revisit this.
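Should Option A be adopted later, reading the override could be as small as the sketch below. It is hypothetical: read_history_window is not an existing Legio function, and the parser handles only a simple `key: value` line inside a leading `---` block rather than full YAML.

```python
def read_history_window(prompt_text: str, default: int = 50) -> int:
    """Return history_window from a leading '---' frontmatter block, else default."""
    lines = prompt_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return default  # no frontmatter at all
    for line in lines[1:]:
        if line.strip() == "---":
            break  # end of frontmatter
        key, sep, value = line.partition(":")
        if sep and key.strip() == "history_window":
            try:
                return int(value.strip())
            except ValueError:
                return default  # malformed value: fall back to the global setting
    return default
```

Falling back to the global default keeps prompt files without frontmatter working unchanged.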

Multi-Agent History: How It All Fits Together

Current Mental Model

  PRAETORIUM (shared SQLite)
  ┌─────────────────────────────────────────────────┐
  │  nuntius: caesar → ("legatus",) "do X"          │
  │  nuntius: legatus → ("caesar",) "done X"        │
  │  nuntius: caesar → ("vorenus",) "@vorenus Y"    │
  │  nuntius: vorenus → ("caesar",) "Y result"      │
  │  nuntius: caesar → ("vorenus","pullo") "debate"  │
  │  nuntius: vorenus → ("caesar",) "my position"   │
  │  nuntius: pullo → ("caesar",) "my position"     │
  └─────────────────────────────────────────────────┘
        │               │              │
    visible to       visible to     visible to
    legatus+caesar   vorenus+caesar  pullo+caesar
        │               │              │
  ┌─────┴─────┐  ┌──────┴──────┐ ┌────┴──────┐
  │ Legatus   │  │  Vorenus    │ │   Pullo   │
  │ SDK client│  │  SDK client │ │ SDK client│
  │ (has own  │  │ (gets XML   │ │(gets XML  │
  │  thread)  │  │  + thread)  │ │ + thread) │
  └───────────┘  └─────────────┘ └───────────┘

What Each Agent Knows

| Agent | SDK conversation (implicit) | Praetorium history (injected) | MCP tools |
|---|---|---|---|
| Legatus | Full thread with Caesar since last rebuild | NOT injected (only status XML) | get_history, post_nuntius |
| Centurio | Full thread since session creation | Last 50 visible nuntii as XML | list_commentarii, read_commentarium, write_commentarium, list_edicta, etc. |

History Lifetime Summary

| Event | SDK thread | Praetorium |
|---|---|---|
| Normal message exchange | Grows per turn | Grows per nuntius (persisted) |
| System restart | Lost | Survives (SQLite) |
| Centurio idle timeout | Lost (session reaped) | Survives |
| Centurio token ceiling | Lost (session reset) | Survives |
| Legatus roster change | Lost (client rebuild) | Survives |
| /reset <name> | Lost (explicit) | Survives |
| Manual /history | N/A (display only) | Queried |

Recommendations — Priority Order

1. Bootstrap legatus from praetorium after rebuild (D-H1, Option A)

When the legatus SDK client is freshly created (_ensure_legatus_client creates a new client), prepend the last history_window nuntii to the first query. Subsequent queries use the SDK thread naturally.

Implementation: Add _client_is_fresh: bool = True flag. In _query_legatus, if fresh, prepend format_history_xml(praetorium.get_history(limit=history_window), viewer="legatus"). Clear flag after first query.

Effort: ~10 lines code + tests.
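The fresh-client flag could work along these lines. This is a sketch under assumptions, not the actual _query_legatus: the LegatusMessageBuilder class is hypothetical, the history fetcher and formatter are passed in as plain callables so the example stays self-contained, and the reversal assumes get_history returns newest first, as the query table states.

```python
from dataclasses import dataclass


@dataclass
class LegatusMessageBuilder:
    """Hypothetical helper; in Legio this logic would live in _query_legatus."""

    history_window: int = 50
    _client_is_fresh: bool = True  # set back to True whenever the client is rebuilt

    def build(self, text, status_xml, fetch_history, format_history_xml):
        parts = []
        if self._client_is_fresh:
            # Assumption: fetch_history returns newest first, so reverse it
            # to give the model the conversation in chronological order.
            nuntii = list(reversed(fetch_history(self.history_window)))
            if nuntii:
                parts.append(format_history_xml(nuntii, viewer="legatus"))
            self._client_is_fresh = False  # later queries rely on the SDK thread
        parts.append(status_xml)
        parts.append(text)
        return "\n\n".join(parts)
```

One detail to watch: per Flow A, handle_message posts Caesar's nuntius before calling _query_legatus, so the fetched history would already contain the current message. A real implementation may want to exclude it to avoid sending it twice.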

2. Deduplicate centurio history injection (D-H2, Option A)

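Track the last injected nuntius on _CenturioSession. D-H2 describes this as a last_injected_nuntius_id, but UUID4 ids carry no ordering, so "newer than" has to be decided by timestamp. On dispatch, inject only nuntii with timestamps newer than the last injection.

Implementation: Add last_injected_ts: str | None = None to _CenturioSession. In dispatch(), filter visible nuntii by timestamp. Update the field after each injection.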

Effort: ~15 lines code + tests.
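The timestamp filter might be sketched as follows. The function name select_new_nuntii and the dict representation are assumptions. Only last_injected_ts and the ISO 8601 timestamp format come from this document.

```python
from datetime import datetime


def select_new_nuntii(visible, last_injected_ts):
    """Return (nuntii newer than last_injected_ts, updated last_injected_ts).

    `visible` is a list of dicts carrying an ISO 8601 'timestamp' string.
    """
    if last_injected_ts is None:
        fresh = list(visible)  # first dispatch of the session: inject everything
    else:
        cutoff = datetime.fromisoformat(last_injected_ts)
        fresh = [
            n for n in visible
            if datetime.fromisoformat(n["timestamp"]) > cutoff
        ]
    if fresh:
        newest = max(fresh, key=lambda n: datetime.fromisoformat(n["timestamp"]))
        last_injected_ts = newest["timestamp"]
    return fresh, last_injected_ts
```

Two caveats for a real implementation. The strict `>` comparison would skip a nuntius that shares its timestamp with the last injected one. And the centurio's own reply is posted after the dispatch, so it would look new on the next dispatch even though the SDK thread already holds it; also filtering out nuntii sent by the centurio itself would avoid that.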

3. Inform LLM of context restoration

When bootstrapping from praetorium (recommendation 1), add a system note:

xml
<context_notice>
This conversation context was restored from the praetorium after a session restart.
You may not have full conversational context from the previous session.
If Caesar's request depends on context you don't have, ask for clarification.
</context_notice>

Effort: ~5 lines.

4. Document the dual-history model

The implicit SDK thread + explicit praetorium pattern is powerful but non-obvious. Add a section to the architecture docs explaining:

  • SDK thread = short-term conversational memory (volatile)
  • Praetorium = long-term message log (persistent)
  • Centuriones get both; legatus gets only SDK thread (plus tools)

Effort: Documentation only.
