Telegram UX Implementation Plan
Refined from Caesar's full plan (20260215-0905) and Antony's UX research (20260215-1600).
Guiding principle: ship the highest-value items first, defer complexity until scale demands it.
Scope: What We Build Now vs. Later
Phase 1 — Now (this plan)
| WI | Item | Type | Size |
|---|---|---|---|
| A | Attribution headers with role context | Code | S |
| B | Humanized status messages | Code | S |
| C | Reply-to-original threading | Code | S |
| D | New commands: /remove, /edict, /edicta, /revoke, /acta, /history, /reset | Code | M |
| E | TOTP authorization gate for destructive actions | Code | L |
| F | Dangerous action confirmation (non-TOTP tier) | Prompt | S |
| G | Routing narration and consult/debate protocols | Prompt | S |
Phase 2 — Deferred (build when needed)
| Item | Trigger to Build |
|---|---|
| Mission context + topic support | Caesar reports friction with flat chat at 10+ centuriones |
| Mission persistence + `/assign` | Same trigger as above |
| Inline keyboards for selection | 15+ centuriones; @mentions become unwieldy |
| Legatus status panel (edited-in-place) | Caesar requests a persistent dashboard |
| Centurio grouping in `/status` | 10+ centuriones |
Rationale: The mission system (WI-001/002 in Caesar's plan) is the largest piece of work and solves a scalability problem that doesn't exist yet at ≤10 centuriones. The current @mention pattern is clean and sufficient. TOTP and attribution deliver immediate value.
Current Behavior Inventory
```text
Caesar (Telegram) ──→ TelegramBot._handle_message()
    ├── reply_text("⏳") → status_msg
    ├── _keep_typing() → typing indicator
    ├── _make_status_callback(status_msg)
    └── Legatus.handle_message(text, user_id, on_status=...)
            ├── @mentions → dispatch_parallel() → attributed responses
            └── no mentions → _query_legatus() → legatus response

Responses: status_msg.edit_text(first_chunk), reply_text(remaining_chunks)
Attribution: "⚔️ {name}\n\n{text}" (hardcoded in legatus.py:361)
```

Files Touched
| File | Current LOC | Role |
|---|---|---|
| `legio/telegram/bot.py` | 157 | Telegram handlers, commands |
| `legio/legatus.py` | 229 | Orchestrator, tools, attribution |
| `legio/session.py` | 152 | SDK sessions, `format_status` |
| `legio/rendering.py` | 37 | XML/template rendering |
| `legio/config.py` | 29 | Config loading |
| `legio/centurio.py` | 37 | Centurio data model |
| `legio/errors.py` | 6 | Domain exceptions |
WI-A: Attribution Headers with Role Context
Goal
Replace the minimal ⚔️ vorenus header with a richer format that shows specialization.
Current
```text
⚔️ vorenus

Response text here...
```

Target

```text
⚔️ vorenus — Code Specialist
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
Response text here...
```

- Name + first non-heading line from `prompt.md` (already extracted by `get_centurio_description()`).
- A box-drawing separator line (`━`, U+2501 heavy horizontal, repeated).
- Legatus responses have no header; the Legatus is the default voice.
Implementation
- Move attribution formatting from `legatus.py:361` to `rendering.py` as `render_attribution_header(name, description)`.
- Call it from `legatus.handle_message()` when assembling attributed responses.
- HTML-escape both name and description (the name is already escaped via `_safe_html` in `bot.py`).
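A minimal sketch of the helper, assuming messages go out with Telegram's HTML parse mode (which is why escaping matters) and a 30-character separator to match the target example. The width and the `ICON`/`SEPARATOR` constant names are illustrative assumptions. The empty-description fallback is the case the rendering tests are meant to cover.

```python
import html

# Assumption: 30 box-drawing characters (U+2501), matching the target example's width.
SEPARATOR = "\u2501" * 30
ICON = "\u2694\ufe0f"  # ⚔️ (crossed swords + emoji presentation selector)


def render_attribution_header(name: str, description: str = "") -> str:
    """Return the two-line attribution header placed above a centurio response.

    Both fields are HTML-escaped because the text is assumed to be sent with
    parse_mode=HTML. An empty description falls back to the bare name.
    """
    safe_name = html.escape(name)
    safe_description = html.escape(description.strip())
    title = f"{ICON} {safe_name}"
    if safe_description:
        title += f" \u2014 {safe_description}"  # em dash, as in the target format
    return f"{title}\n{SEPARATOR}"
```

The caller would then assemble `f"{render_attribution_header(name, description)}\n{text}"`. Keeping the helper free of Telegram imports keeps `test_rendering.py` a pure string test.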
Tests
- `tests/test_rendering.py`: header format, HTML escaping, empty-description fallback.
- `tests/test_legatus.py`: attributed responses include the separator line.
Estimate: S
WI-B: Humanized Status Messages
Goal
Replace raw tool identifiers in status updates with natural English.
Current
```text
🔧 Using search_web...
🔧 Dispatching to vorenus...
```

Target

```text
⏳ Searching the web...
⏳ Dispatching to vorenus...
⏳ Reading edicta...
⏳ Writing to commentarii...
⏳ Thinking...
```

Implementation
- Update `format_status()` in `session.py`:
  - Map known tool names to human descriptions via a dict: `{"search_web": "Searching the web", "list_edicta": "Reading edicta", "write_commentarium": "Writing to commentarii", ...}`
  - Unknown tools fall back to `"Using {name}..."`.
  - Use ⏳ consistently (not 🔧) to match the initial `status_msg` emoji.
- Update the `_make_prefixed_callback` prefix format from `[name]` to `⏳ [name]` for parallel dispatch.
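The plan does not show the current `format_status()` signature. This sketch assumes it receives only the tool name, or `None` while the model is reasoning without a tool, and returns the complete status line. The dispatch case ("Dispatching to vorenus...") needs the tool's arguments and is left out here.

```python
from __future__ import annotations

STATUS_EMOJI = "\u23f3"  # ⏳, same as the initial status_msg

# Known tool names from the plan; extend as new tools are added.
TOOL_LABELS: dict[str, str] = {
    "search_web": "Searching the web",
    "list_edicta": "Reading edicta",
    "write_commentarium": "Writing to commentarii",
}


def format_status(tool_name: str | None) -> str:
    """Humanize a tool-call status update.

    None means no tool is running. Unknown tools fall back to "Using {name}..."
    so a newly added tool still produces a readable status.
    """
    if tool_name is None:
        return f"{STATUS_EMOJI} Thinking..."
    label = TOOL_LABELS.get(tool_name, f"Using {tool_name}")
    return f"{STATUS_EMOJI} {label}..."
```

A dict lookup with a fallback keeps the mapping declarative. Adding a tool label is a one-line change, and `TestFormatStatus` can iterate over `TOOL_LABELS` directly.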
Tests
- Update `TestFormatStatus` with the new expected strings.
- Add a test for the unknown-tool fallback.
Estimate: S
WI-C: Reply-to-Original Threading
Goal
Centurio responses reply to Caesar's original message, creating visual context chains in Telegram.
Current
All responses edit the status_msg or send as new messages — no reply-to linking.
Target
```text
Caesar: @vorenus check the auth module      ← message_id: 42
  └── ⚔️ vorenus — Code Specialist          ← reply_to: 42
      Found 3 issues...
```

Implementation
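- In `bot.py:_handle_message()`, pass `update.message.message_id` through to response sending.
- When sending continuation responses (after the first `status_msg` edit), use `reply_to_message_id=original_message_id`.
- The first response still edits `status_msg` (this is the ⏳ → result transition). Send the initial ⏳ `status_msg` as a reply to the original message too: editing a message cannot add a reply link afterwards, so this is what threads the first response.
- Second and later responses (multi-centurio, long splits) reply to the original.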
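One way to keep the threading rule testable without a Telegram client is to plan the outgoing operations in a pure function. `plan_responses` and `Outgoing` are hypothetical names. `_handle_message()` would turn each `"edit"` into `status_msg.edit_text(...)` and each `"send"` into a message sent with `reply_to_message_id` set.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Outgoing:
    """One planned Telegram operation (hypothetical helper type)."""

    kind: str  # "edit": replace the ⏳ status_msg; "send": post a new message
    text: str
    reply_to_message_id: int | None = None


def plan_responses(chunks: list[str], original_message_id: int) -> list[Outgoing]:
    """First chunk edits status_msg; every later chunk replies to Caesar's message."""
    if not chunks:
        return []
    first = Outgoing("edit", chunks[0])  # an edit keeps status_msg's existing reply link
    rest = [
        Outgoing("send", chunk, reply_to_message_id=original_message_id)
        for chunk in chunks[1:]
    ]
    return [first, *rest]
```

The planned `test_telegram_bot.py` check then reduces to asserting on this list, with a separate thin test that the bot passes `reply_to_message_id` through.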
Tests
`tests/test_telegram_bot.py`: verify `reply_to_message_id` is set on continuation messages.
Estimate: S
WI-D: New Telegram Commands
Goal
Expose memoria and management operations as Telegram commands.
New Commands
| Command | Action | Authorization |
|---|---|---|
| `/remove <name>` | Remove a centurio | TOTP (WI-E) |
| `/edict <name> <text>` | Publish standing order | Prompt confirmation (WI-F) |
| `/edicta` | List all standing orders | None |
| `/revoke <name>` | Revoke an edictum | TOTP (WI-E) |
| `/acta` | Show recent shared knowledge | None |
| `/history [n]` | Show last N praetorium nuntii | None |
| `/reset <name>` | Reset a centurio's session | None |
Implementation
- Add 7 command handlers in `bot.py`.
- Read-only commands (`/edicta`, `/acta`, `/history`) call memoria/praetorium directly and format the output.
- `/reset` calls `session_mgr.disconnect(name)`. This is a new method that disconnects and removes the session so the next dispatch creates a fresh one.
- `/remove` and `/revoke` delegate to the TOTP flow (WI-E).
- `/edict` delegates to `handle_message` with prompt-level confirmation (WI-F).
- Register all commands via `setMyCommands` at startup for autocomplete.
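The internals of `session.py` are not shown in this plan. This sketch of `disconnect()` therefore assumes a dict-backed cache with a per-name factory and an async `close()` on each session. `Session`, `get` and the factory argument are hypothetical names.

```python
from __future__ import annotations

from collections.abc import Callable
from typing import Protocol


class Session(Protocol):
    async def close(self) -> None: ...


class SessionManager:
    """Sketch of the per-centurio session cache (hypothetical internals)."""

    def __init__(self, factory: Callable[[str], Session]) -> None:
        self._factory = factory
        self._sessions: dict[str, Session] = {}

    def get(self, name: str) -> Session:
        # Lazily create, so the first dispatch after disconnect() gets a fresh session.
        if name not in self._sessions:
            self._sessions[name] = self._factory(name)
        return self._sessions[name]

    async def disconnect(self, name: str) -> bool:
        """Close and forget the named session; return False if none existed."""
        # Pop before awaiting close() so a concurrent dispatch cannot reuse it.
        session = self._sessions.pop(name, None)
        if session is None:
            return False
        await session.close()
        return True
```

Returning a boolean lets `/reset` tell Caesar whether there was an active session to reset, without raising on an unknown name.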
File size concern
bot.py is at 157 LOC. Adding 7 handlers will push it toward 300+. Plan to split: extract command handlers into legio/telegram/commands.py if the file crosses 300 LOC during implementation.
Tests
- One test class per command in `tests/test_telegram_bot.py`.
- A Caesar-only gate test for each.
Estimate: M
WI-E: TOTP Authorization Gate
Goal
Require a Google Authenticator OTP for destructive actions. This is the one place where code enforcement takes precedence over prompt instructions, because irreversible actions need a hard guarantee.
Design
New vocabulary term: auctoritas (Latin: authorization, authority). Plural: auctoritates.
New data model: An auctoritas represents a pending authorization request bound to a specific action.
```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Auctoritas:
    id: str  # UUID
    action: str  # e.g., "remove_centurio"
    payload: dict[str, str]  # e.g., {"name": "vorenus"}
    chat_id: int
    message_id: int  # OTP request message (for deletion)
    expires_at: datetime
    attempts: int = 0
```

Flow:

```text
1. Caesar: /remove vorenus
2. Bot creates Auctoritas, sends OTP request message:
   "🔐 Removing centurio vorenus requires authorization.
   Enter your 6-digit OTP:"
3. Caesar: 123456
4. Bot verifies TOTP → if valid, executes action, deletes OTP messages (best-effort)
5. Bot: "✅ Removed centurio: vorenus"
   (or "❌ Invalid OTP. 2 attempts remaining.")
```

State machine:

```text
pending → approved → executed
        → expired   (TTL)
        → rejected  (max attempts)
```

New Files
| File | Purpose |
|---|---|
| `legio/totp.py` | TOTP verifier wrapping pyotp |
| `legio/auctoritas.py` | Auctoritas data model + in-memory store |
Config Changes
```python
# config.py: new fields on LegioConfig
totp_secret: str = ""  # from env: LEGIO_TOTP_SECRET (base32)
totp_required_actions: tuple[str, ...] = ("remove_centurio", "revoke_edictum")
```

- `LEGIO_TOTP_SECRET` comes from `.env` and is required for TOTP. If it is absent, the TOTP-gated actions are disabled (refused, not ungated) and a warning is logged.
- `totp_required_actions` comes from the `[security]` section in `legio.toml`. Hardcode the defaults; only override if Caesar wants a custom policy.
- Hardcode `totp_ttl_seconds=120`, `totp_max_attempts=3`, `totp_drift_steps=1`. No config knobs for these: they rarely change, and adding them now is premature.
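A minimal sketch of loading the secret, assuming the loader receives the environment as a mapping (`.env` already merged). `load_totp_secret` is a hypothetical name. An invalid base32 value is treated like a missing one, and the log line never includes the secret itself.

```python
from __future__ import annotations

import base64
import logging
import os
from collections.abc import Mapping

logger = logging.getLogger(__name__)


def load_totp_secret(env: Mapping[str, str] | None = None) -> str:
    """Return the normalized base32 secret, or "" (TOTP-gated actions disabled)."""
    env = os.environ if env is None else env
    # Tolerate secrets pasted with spaces or in lowercase.
    secret = env.get("LEGIO_TOTP_SECRET", "").replace(" ", "").upper()
    if not secret:
        logger.warning("LEGIO_TOTP_SECRET not set; TOTP-gated actions are disabled")
        return ""
    try:
        # binascii.Error (bad alphabet/padding) is a subclass of ValueError.
        base64.b32decode(secret + "=" * (-len(secret) % 8))
    except ValueError:
        # SECURITY: report the problem, never the secret value itself
        logger.warning("LEGIO_TOTP_SECRET is not valid base32; TOTP-gated actions are disabled")
        return ""
    return secret
```

Validating at startup surfaces a mistyped secret immediately, instead of as a string of "Invalid OTP" replies later.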
New Dependency
`pyotp`: standard TOTP/HOTP library, Google Authenticator compatible.
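The plan wraps pyotp, where the equivalent check is `pyotp.TOTP(secret).verify(code, valid_window=1)`. The stdlib sketch below is not what `legio/totp.py` should ship. It only shows what that call computes: RFC 6238 with Google Authenticator defaults (HMAC-SHA1, 30-second step, 6 digits), a ±1-step drift window (`totp_drift_steps=1`), and a timing-safe comparison. `verify_totp` and its `now` parameter are hypothetical names chosen for testability.

```python
from __future__ import annotations

import base64
import hashlib
import hmac
import struct
import time

PERIOD_SECONDS = 30  # RFC 6238 default time step, used by Google Authenticator
DIGITS = 6


def _hotp(key: bytes, counter: int) -> str:
    """RFC 4226 HOTP: HMAC-SHA1 over the counter, then dynamic truncation."""
    mac = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = mac[-1] & 0x0F
    value = struct.unpack(">I", mac[offset : offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**DIGITS).zfill(DIGITS)


def verify_totp(
    secret_b32: str, code: str, *, drift_steps: int = 1, now: float | None = None
) -> bool:
    """Accept `code` if it matches any time step within ±drift_steps of now."""
    if len(code) != DIGITS or not (code.isascii() and code.isdigit()):
        return False
    key = base64.b32decode(secret_b32 + "=" * (-len(secret_b32) % 8), casefold=True)
    counter = int((time.time() if now is None else now) // PERIOD_SECONDS)
    matched = False
    for step in range(-drift_steps, drift_steps + 1):
        if counter + step < 0:
            continue
        # SECURITY: constant-time compare, and no early exit inside the window
        matched |= hmac.compare_digest(_hotp(key, counter + step), code)
    return matched
```

The RFC 6238 test secret (`"12345678901234567890"` as ASCII) gives `287082` at `now=59`. That value is handy for the drift-tolerance cases in `tests/test_totp.py`.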
In-Memory vs. Database
Store auctoritates in-memory (dict keyed by chat_id). Rationale:
- Single-user system — only one pending auctoritas at a time is realistic.
- Expiry is 2 minutes — no persistence needed across restarts.
- Avoids schema migration complexity.
- If persistence becomes needed later, migrate to praetorium table.
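A minimal sketch of the in-memory store and its state machine, assuming the OTP request message is sent before the auctoritas is created (so `message_id` is known). It also assumes the caller passes in whether the OTP was valid, which keeps the store independent of `totp.py`. `AuctoritasStore`, `create` and `submit` are hypothetical names. "executed" stays the caller's job after an "approved" result.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass
from datetime import datetime, timedelta

TTL = timedelta(seconds=120)  # totp_ttl_seconds (hardcoded per D8)
MAX_ATTEMPTS = 3  # totp_max_attempts (hardcoded per D8)


@dataclass
class Auctoritas:
    id: str  # UUID
    action: str  # e.g., "remove_centurio"
    payload: dict[str, str]  # e.g., {"name": "vorenus"}
    chat_id: int
    message_id: int  # OTP request message (for deletion)
    expires_at: datetime
    attempts: int = 0


class AuctoritasStore:
    """In-memory store keyed by chat_id: at most one pending request per chat."""

    def __init__(self) -> None:
        self._pending: dict[int, Auctoritas] = {}

    def create(
        self, action: str, payload: dict[str, str], chat_id: int, message_id: int, now: datetime
    ) -> Auctoritas:
        auctoritas = Auctoritas(
            id=str(uuid.uuid4()),
            action=action,
            payload=payload,
            chat_id=chat_id,
            message_id=message_id,
            expires_at=now + TTL,
        )
        self._pending[chat_id] = auctoritas  # a new request supersedes any older one
        return auctoritas

    def submit(
        self, chat_id: int, otp_valid: bool, now: datetime
    ) -> tuple[str, Auctoritas | None]:
        """Record one OTP attempt; return (state, auctoritas). The store never executes."""
        auctoritas = self._pending.get(chat_id)
        if auctoritas is None:
            return "none", None
        if now >= auctoritas.expires_at:  # expiry wins even over a correct code
            del self._pending[chat_id]
            return "expired", auctoritas
        if otp_valid:
            del self._pending[chat_id]
            return "approved", auctoritas
        auctoritas.attempts += 1
        if auctoritas.attempts >= MAX_ATTEMPTS:
            del self._pending[chat_id]
            return "rejected", auctoritas
        return "pending", auctoritas
```

After a "pending" result, `MAX_ATTEMPTS - auctoritas.attempts` gives the count for the "❌ Invalid OTP. 2 attempts remaining." reply.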
Security Controls
```python
# SECURITY: TOTP secret loaded from env only, never logged or persisted to disk
# SECURITY: OTP messages deleted after verification (best-effort)
# SECURITY: auctoritas expires after TTL regardless of attempts
# SECURITY: timing-safe comparison for TOTP codes
```

- OTP reply messages (Caesar's typed OTP code) are also deleted.
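"Best-effort" deletion means a failed delete must never block or undo the authorized action. A minimal sketch, assuming the bot passes in a delete callable. With python-telegram-bot, that callable could be `lambda mid: context.bot.delete_message(chat_id, mid)`. `delete_best_effort` is a hypothetical name.

```python
from __future__ import annotations

import logging
from collections.abc import Awaitable, Callable, Iterable

logger = logging.getLogger(__name__)


async def delete_best_effort(
    delete: Callable[[int], Awaitable[object]], message_ids: Iterable[int]
) -> list[int]:
    """Try to delete every message without raising; return the ids that failed."""
    failed: list[int] = []
    for message_id in message_ids:
        try:
            await delete(message_id)
        except Exception:  # best-effort: Telegram may refuse (e.g. missing rights in a group)
            # SECURITY: log only the id, never the message text (it may contain the OTP)
            logger.warning("Could not delete message %s", message_id)
            failed.append(message_id)
    return failed
```

Returning the failed ids lets the caller decide whether to tell Caesar to delete the OTP message by hand.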
Tests
| File | Tests |
|---|---|
| `tests/test_totp.py` (new) | Valid OTP accepted, invalid rejected, drift tolerance, timing-safe comparison |
| `tests/test_auctoritas.py` (new) | Create, expire, max attempts, approve, execute |
| `tests/test_config.py` | `LEGIO_TOTP_SECRET` loaded, missing secret disables TOTP |
| `tests/test_telegram_bot.py` | Full flow: request → OTP → execution; invalid OTP; expired request |
Estimate: L
WI-F: Prompt-Level Confirmation for Medium-Risk Actions
Goal
Actions that are revocable but impactful require Caesar to reply "Confirmed" — enforced via Legatus prompt, not code.
Actions
| Action | Why Not TOTP |
|---|---|
| `publish_edictum` | Revocable (can `/revoke`) |
Implementation
Add to castra/legatus/prompt.md:
```markdown
## Action Authorization

### TOTP-Required (Hard Gate)
These actions are blocked until Caesar provides a valid OTP.
The bot will ask for it automatically — you cannot bypass this:
- Removing a centurio
- Revoking an edictum

### Confirmation-Required (Soft Gate)
Before these actions, explain the impact and ask Caesar to reply "Confirmed":
- Publishing an edictum (standing order for all centuriones)

### No Confirmation Needed
- Creating a centurio (reversible)
- Dispatching to a centurio (read-only)
- Reading edicta, acta, history (read-only)
```

Estimate: S (prompt edit only, no code)
WI-G: Routing Narration and Consult/Debate Protocols
Goal
Add behavioral protocols to Legatus for visible routing, one-shot consult, and structured debate. All via prompt — no orchestration code.
Implementation
Add to castra/legatus/prompt.md:
```markdown
## Routing Narration
When you auto-route (no @mentions from Caesar):
1. Briefly explain which centurio you chose and why (one sentence).
2. Then dispatch.
Caesar should never be surprised about who is working on their request.

## Consult Protocol
When Caesar uses `/consult @name <question>`:
1. Dispatch the question to the named centurio.
2. Return their answer with attribution.
3. Add a one-sentence Legatus summary if the answer needs interpretation.

## Debate Protocol
When Caesar uses `/debate @a @b <question>`:
1. Dispatch the question to both centuriones.
2. If their answers conflict, dispatch one round of rebuttal (each sees the other's answer).
3. Produce a final synthesis: options, recommendation, and risks.
4. Maximum 3 rounds total. Stop even if no consensus.
5. Caesar sees the synthesis. Raw transcripts are saved to your commentarii.
```

Why Prompt, Not Code
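The debate "rounds" are a behavioral pattern, not a state machine. The Legatus SDK session maintains conversation context, so it can count rounds itself. Existing code-level limits (the SDK token limit and the session idle timeout) act as the hard backstop against runaway debates. One assumption to confirm during implementation: `/consult` and `/debate` are not among the registered commands. They therefore reach Legatus only if the bot's message handler passes unrecognized slash commands through as ordinary text.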
Estimate: S (prompt edits only)
New Domain Vocabulary
| Concept | Singular | Plural | Python Class | Python Var |
|---|---|---|---|---|
| Authorization request | Auctoritas | Auctoritates | Auctoritas | auctoritas / auctoritates |
Add to 00-domain-vocabulary.md and dev-docs/memos/20260215-1700-domain-vocabulary.md during WI-E.
Implementation Order
```text
WI-A (S) → Attribution headers      Zero dependencies, immediate visual improvement
WI-B (S) → Humanized status         Zero dependencies, pairs with WI-A
WI-C (S) → Reply-to threading       Zero dependencies, context improvement
WI-F (S) → Prompt confirmation      Zero dependencies, prompt-only
WI-G (S) → Routing/consult/debate   Zero dependencies, prompt-only
WI-E (L) → TOTP gate                Needs pyotp, new files, new tests
WI-D (M) → New commands             Needs WI-E for /remove and /revoke
```

- Parallel: WI-A + WI-B + WI-C can be done in a single pass (between them they touch `bot.py`, `legatus.py`, `rendering.py` and `session.py`).
- Parallel: WI-F + WI-G can be done together (both are prompt edits to `castra/legatus/prompt.md`).
- Then: WI-D + WI-E together (the commands and the TOTP gate are tightly coupled for `/remove` and `/revoke`).
Estimated Total
| Size | Count | Time |
|---|---|---|
| S | 5 | ~2 hours |
| M | 1 | ~2 hours |
| L | 1 | ~4 hours |
| Total | 7 | ~8 hours |
Decision Log
| # | Decision | Rationale |
|---|---|---|
| D1 | Keep single bot | Operational simplicity; identity via in-message headers |
| D2 | Flat commands (no subcommands) | Telegram autocomplete; conversational UX |
| D3 | TOTP in code for destructive; prompt for revocable | Hard guarantee where it matters; flexibility where it doesn't |
| D4 | Defer mission/topic system | Solves a scale problem that doesn't exist yet (≤10 centuriones) |
| D5 | Defer inline keyboards | @mentions work at current scale; keyboards add code complexity |
| D6 | Debate protocol in prompt, not code | "Prompt over code"; SDK session tracks rounds naturally |
| D7 | In-memory auctoritas store | Single user, 2-min TTL; no persistence needed |
| D8 | Hardcode TOTP tuning params | ttl=120, max_attempts=3, drift=1 — rarely change; avoid config bloat |
Testing Procedures
- After each WI: `ruff check . && ruff format --check . && python scripts/check_file_length.py && pytest`
- Full gate: `pytest --cov --cov-fail-under=100`
- Security tests: `pytest -m security` (TOTP verification, Caesar-only gates, input validation)
Manual Test Checklist
- [ ] `/help` lists all new commands
- [ ] Send `@vorenus do this` → response has rich attribution header with role
- [ ] Send a message → status updates show humanized text (not raw tool names)
- [ ] Multi-centurio response → each reply threads to the original message
- [ ] `/remove vorenus` → OTP prompt appears → valid OTP executes → messages deleted
- [ ] `/remove vorenus` → wrong OTP three times → request rejected
- [ ] `/edict test-rule Always test first` → Legatus asks "Confirmed" → publishes
- [ ] `/edicta` → lists standing orders
- [ ] `/history 5` → shows last 5 nuntii