Telegram UX Implementation Plan

Refined from Caesar's full plan (20260215-0905) and Antony's UX research (20260215-1600).

Guiding principle: ship the highest-value items first, defer complexity until scale demands it.

Scope: What We Build Now vs. Later

Phase 1 — Now (this plan)

WI	Item	Type	Size
A	Attribution headers with role context	Code	S
B	Humanized status messages	Code	S
C	Reply-to-original threading	Code	S
D	New commands: `/remove`, `/edict`, `/edicta`, `/revoke`, `/acta`, `/history`, `/reset`	Code	M
E	TOTP authorization gate for destructive actions	Code	L
F	Dangerous action confirmation (non-TOTP tier)	Prompt	S
G	Routing narration and consult/debate protocols	Prompt	S

Phase 2 — Deferred (build when needed)

Item	Trigger to Build
Mission context + topic support	Caesar reports friction with flat chat at 10+ centuriones
Mission persistence + `/assign`	Same trigger as above
Inline keyboards for selection	15+ centuriones; @mentions become unwieldy
Legatus status panel (edited-in-place)	Caesar requests a persistent dashboard
Centurio grouping in `/status`	10+ centuriones

Rationale: The mission system (WI-001/002 in Caesar's plan) is the largest piece of work and solves a scalability problem that doesn't exist yet at ≤10 centuriones. The current @mention pattern is clean and sufficient. TOTP and attribution deliver immediate value.

Current Behavior Inventory

Caesar (Telegram) ──→ TelegramBot._handle_message()
                        ├── reply_text("⏳")  →  status_msg
                        ├── _keep_typing()     →  typing indicator
                        ├── _make_status_callback(status_msg)
                        └── Legatus.handle_message(text, user_id, on_status=...)
                              ├── @mentions → dispatch_parallel() → attributed responses
                              └── no mentions → _query_legatus() → legatus response

Responses: status_msg.edit_text(first_chunk), reply_text(remaining_chunks)
Attribution: "⚔️ {name}\n\n{text}" (hardcoded in legatus.py:361)

Files Touched

File	Current LOC	Role
`legio/telegram/bot.py`	157	Telegram handlers, commands
`legio/legatus.py`	229	Orchestrator, tools, attribution
`legio/session.py`	152	SDK sessions, format_status
`legio/rendering.py`	37	XML/template rendering
`legio/config.py`	29	Config loading
`legio/centurio.py`	37	Centurio data model
`legio/errors.py`	6	Domain exceptions

WI-A: Attribution Headers with Role Context

Goal

Replace the minimal ⚔️ vorenus header with a richer format that shows specialization.

Current

⚔️ vorenus

Response text here...

Target

⚔️ vorenus — Code Specialist
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

Response text here...

Name + first non-heading line from prompt.md (already extracted by get_centurio_description())
Unicode heavy horizontal line separator (━, U+2501, repeated)
Legatus responses have no header — the default voice

Implementation

Move attribution formatting from legatus.py:361 to rendering.py as render_attribution_header(name, description).
Call from legatus.handle_message() when assembling attributed responses.
HTML-escape both name and description (already done for name via _safe_html in bot.py).
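
The header format described above could be sketched roughly as follows. This is a minimal illustration, not the final `rendering.py` code: the separator width of 30, the trailing blank line, and the empty-description fallback (name only) are assumptions taken from the Target mock-up and the test list.

```python
from html import escape

# Assumption: 30 characters wide, matching the Target mock-up.
SEPARATOR = "━" * 30


def render_attribution_header(name: str, description: str = "") -> str:
    """Return the HTML-escaped header placed above a centurio response."""
    title = f"⚔️ {escape(name)}"
    description = description.strip()
    if description:
        # Role context is appended only when a description is available.
        title += f" — {escape(description)}"
    return f"{title}\n{SEPARATOR}\n\n"


if __name__ == "__main__":
    print(render_attribution_header("vorenus", "Code Specialist") + "Response text here...")
```

Escaping happens inside the renderer so callers cannot forget it; if `bot.py` keeps applying `_safe_html` to the name as well, one of the two escapes would need to go to avoid double-escaping.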

Tests

tests/test_rendering.py: header format, HTML escaping, empty description fallback.
tests/test_legatus.py: attributed responses include separator line.

Estimate: S

WI-B: Humanized Status Messages

Goal

Replace raw tool identifiers in status updates with natural English.

Current

🔧 Using search_web...
🔧 Dispatching to vorenus...

Target

⏳ Searching the web...
⏳ Dispatching to vorenus...
⏳ Reading edicta...
⏳ Writing to commentarii...
⏳ Thinking...

Implementation

Update format_status() in session.py:
- Map known tool names to human descriptions via a dict: {"search_web": "Searching the web", "list_edicta": "Reading edicta", "write_commentarium": "Writing to commentarii", ...}
- Unknown tools: "Using {name}..." (fallback)
- Use ⏳ consistently (not 🔧) to match the status_msg initial emoji
Update _make_prefixed_callback prefix format from [name] to ⏳ [name] for parallel dispatch.
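
A possible shape for the mapping and fallback is sketched below. The real `format_status()` signature in `session.py` is not shown in this plan, so the single `tool_name` argument and the "no tool means Thinking" rule are assumptions; only the three dictionary entries come from the list above.

```python
# Known tool identifiers mapped to human-readable descriptions.
TOOL_DESCRIPTIONS: dict[str, str] = {
    "search_web": "Searching the web",
    "list_edicta": "Reading edicta",
    "write_commentarium": "Writing to commentarii",
}


def format_status(tool_name: str = "") -> str:
    """Return a humanized status line for a tool call (hypothetical signature)."""
    if not tool_name:
        # Assumption: no active tool is rendered as the generic "Thinking" status.
        return "⏳ Thinking..."
    # Unknown tools fall back to the raw identifier so nothing is hidden.
    description = TOOL_DESCRIPTIONS.get(tool_name, f"Using {tool_name}")
    return f"⏳ {description}..."


if __name__ == "__main__":
    for tool in ("search_web", "list_edicta", "frobnicate", ""):
        print(format_status(tool))
```

Keeping the mapping in one module-level dict means a new tool only needs a new entry, and the fallback keeps unmapped tools visible rather than silent.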

Tests

Update TestFormatStatus with new expected strings.
Add test for unknown tool fallback.

Estimate: S

WI-C: Reply-to-Original Threading

Goal

Centurio responses reply to Caesar's original message, creating visual context chains in Telegram.

Current

All responses edit the status_msg or send as new messages — no reply-to linking.

Target

Caesar: @vorenus check the auth module       ← message_id: 42
    └── ⚔️ vorenus — Code Specialist         ← reply_to: 42
        Found 3 issues...

Implementation

In bot.py:_handle_message(), pass update.message.message_id through to response sending.
When sending continuation responses (after the first status_msg edit), use reply_to_message_id=original_message_id.
The first response still edits status_msg (this is the ⏳ → result transition).
Second+ responses (multi-centurio, long splits) reply to the original.
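
The edit-first, reply-after rule can be expressed as a small planning step that is independent of the Telegram client. The names `OutgoingMessage` and `plan_responses` are hypothetical helpers introduced only for this sketch.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OutgoingMessage:
    text: str
    edit_status: bool                   # True: edit the ⏳ status message in place
    reply_to_message_id: Optional[int]  # set only on continuation messages


def plan_responses(chunks: list[str], original_message_id: int) -> list[OutgoingMessage]:
    """Decide, per chunk, whether to edit status_msg or reply to the original."""
    plan: list[OutgoingMessage] = []
    for index, chunk in enumerate(chunks):
        if index == 0:
            # First response keeps the ⏳ → result transition.
            plan.append(OutgoingMessage(chunk, True, None))
        else:
            # Every later response threads back to Caesar's message.
            plan.append(OutgoingMessage(chunk, False, original_message_id))
    return plan


if __name__ == "__main__":
    for message in plan_responses(["first", "second", "third"], original_message_id=42):
        print(message)
```

In the bot, entries with `edit_status=True` would map to `status_msg.edit_text(...)` and the rest to a send call carrying `reply_to_message_id`; separating the decision from the I/O keeps the rule testable without a Telegram mock.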

Tests

tests/test_telegram_bot.py: verify reply_to_message_id is set on continuation messages.

Estimate: S

WI-D: New Telegram Commands

Goal

Expose memoria and management operations as Telegram commands.

New Commands

Command	Action	Authorization
`/remove <name>`	Remove a centurio	TOTP (WI-E)
`/edict <name> <text>`	Publish standing order	Prompt confirmation (WI-F)
`/edicta`	List all standing orders	None
`/revoke <name>`	Revoke an edictum	TOTP (WI-E)
`/acta`	Show recent shared knowledge	None
`/history [n]`	Show last N praetorium nuntii	None
`/reset <name>`	Reset a centurio's session	None

Implementation

Add 7 command handlers in bot.py.
Read-only commands (/edicta, /acta, /history) call memoria/praetorium directly and format output.
/reset calls session_mgr.disconnect(name) (new method — disconnect and remove session so next dispatch creates a fresh one).
/remove and /revoke delegate to the TOTP flow (WI-E).
/edict delegates to handle_message with prompt-level confirmation (WI-F).
Register all commands via setMyCommands at startup for autocomplete.
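
One way to keep the seven handlers, their authorization tier, and the autocomplete registration in sync is a single declarative table. The sketch below is an assumption about structure, not part of the existing code: `Gate`, `CommandSpec`, `autocomplete_entries`, and `requires_totp` are hypothetical names; the commands and tiers are copied from the table above.

```python
from dataclasses import dataclass
from enum import Enum


class Gate(Enum):
    NONE = "none"
    PROMPT = "prompt confirmation (WI-F)"
    TOTP = "totp (WI-E)"


@dataclass(frozen=True)
class CommandSpec:
    name: str
    description: str
    gate: Gate


COMMANDS: tuple[CommandSpec, ...] = (
    CommandSpec("remove", "Remove a centurio", Gate.TOTP),
    CommandSpec("edict", "Publish standing order", Gate.PROMPT),
    CommandSpec("edicta", "List all standing orders", Gate.NONE),
    CommandSpec("revoke", "Revoke an edictum", Gate.TOTP),
    CommandSpec("acta", "Show recent shared knowledge", Gate.NONE),
    CommandSpec("history", "Show last N praetorium nuntii", Gate.NONE),
    CommandSpec("reset", "Reset a centurio's session", Gate.NONE),
)


def autocomplete_entries() -> list[tuple[str, str]]:
    """(command, description) pairs to register for autocomplete at startup."""
    return [(spec.name, spec.description) for spec in COMMANDS]


def requires_totp(command: str) -> bool:
    """True when the command must go through the TOTP flow."""
    return any(spec.name == command and spec.gate is Gate.TOTP for spec in COMMANDS)
```

The `(command, description)` pairs match the two fields the Bot API's `setMyCommands` expects for each command, so the same table can drive both registration and the authorization check.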

File size concern

bot.py is at 157 LOC. Adding 7 handlers will push it toward 300+. Plan to split: extract command handlers into legio/telegram/commands.py if the file crosses 300 LOC during implementation.

Tests

One test class per command in tests/test_telegram_bot.py.
Caesar-only gate test for each.

Estimate: M

WI-E: TOTP Authorization Gate

Goal

Require a Google Authenticator OTP for destructive actions. This is the one place where code enforcement overrides prompt, because irreversible actions need a hard guarantee.

Design

New vocabulary term: auctoritas (Latin: authorization, authority). Plural: auctoritates.

New data model: An auctoritas represents a pending authorization request bound to a specific action.

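python

from dataclasses import dataclass
from datetime import datetime


@dataclass
class Auctoritas:
    """A pending authorization request bound to one specific action."""

    id: str                    # UUID
    action: str                # e.g., "remove_centurio"
    payload: dict[str, str]    # e.g., {"name": "vorenus"}
    chat_id: int
    message_id: int            # OTP request message (for deletion)
    expires_at: datetime       # creation time + TTL
    attempts: int = 0          # failed OTP attempts so far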

Flow:

1. Caesar: /remove vorenus
2. Bot creates Auctoritas, sends OTP request message:
   "🔐 Removing centurio vorenus requires authorization.
    Enter your 6-digit OTP:"
3. Caesar: 123456
4. Bot verifies TOTP → if valid, executes action, deletes OTP messages (best-effort)
5. Bot: "✅ Removed centurio: vorenus"
   (or "❌ Invalid OTP. 2 attempts remaining.")

State machine:

pending → approved → executed
       → expired (TTL)
       → rejected (max attempts)

New Files

File	Purpose
`legio/totp.py`	TOTP verifier wrapping `pyotp`
`legio/auctoritas.py`	Auctoritas data model + in-memory store

Config Changes

python

# config.py — new fields on LegioConfig
totp_secret: str = ""   # from env: LEGIO_TOTP_SECRET (base32)
totp_required_actions: tuple[str, ...] = ("remove_centurio", "revoke_edictum")

LEGIO_TOTP_SECRET from .env (required for TOTP; if absent, TOTP actions are disabled with warning).
totp_required_actions from [security] section in legio.toml. Hardcode defaults; only override if Caesar wants custom policy.
Hardcode: totp_ttl_seconds=120, totp_max_attempts=3, totp_drift_steps=1. No config knobs for these — they rarely change and adding them now is premature.
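
The loading rules above might look roughly like this. `TotpSettings` and `load_totp_settings` are hypothetical stand-ins for the new `LegioConfig` fields and loader, and the `[security]` override from legio.toml is left out of the sketch.

```python
import logging
import os
from dataclasses import dataclass
from typing import Mapping

logger = logging.getLogger(__name__)

# Hardcoded tuning parameters (no config knobs, per the plan).
TOTP_TTL_SECONDS = 120
TOTP_MAX_ATTEMPTS = 3
TOTP_DRIFT_STEPS = 1


@dataclass(frozen=True)
class TotpSettings:
    secret: str = ""
    required_actions: tuple[str, ...] = ("remove_centurio", "revoke_edictum")

    @property
    def enabled(self) -> bool:
        return bool(self.secret)


def load_totp_settings(env: Mapping[str, str] = os.environ) -> TotpSettings:
    """Read the base32 secret from the environment only."""
    secret = env.get("LEGIO_TOTP_SECRET", "").strip()
    if not secret:
        # SECURITY: log only the absence of the secret, never its value.
        logger.warning("LEGIO_TOTP_SECRET is not set; TOTP actions are disabled")
    return TotpSettings(secret=secret)
```

Passing the environment as a mapping keeps the loader testable without patching `os.environ`.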

New Dependency

pyotp — standard TOTP/HOTP library, Google Authenticator compatible.
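
To make the verification step concrete, here is a standard-library sketch of what a TOTP check with one step of drift involves (RFC 6238, HMAC-SHA1, 30-second period, 6 digits). It is a stand-in for illustration only; the plan's `legio/totp.py` is expected to wrap `pyotp` instead, and the function names here are hypothetical.

```python
import base64
import hashlib
import hmac
import struct

TOTP_PERIOD_SECONDS = 30
TOTP_DIGITS = 6
TOTP_DRIFT_STEPS = 1  # accept one period before and after the current one


def totp_at(secret_b32: str, unix_time: int) -> str:
    """Compute the TOTP code for a given Unix time (RFC 6238, SHA-1)."""
    key = base64.b32decode(secret_b32, casefold=True)
    counter = unix_time // TOTP_PERIOD_SECONDS
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation (RFC 4226)
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**TOTP_DIGITS).zfill(TOTP_DIGITS)


def verify_totp(secret_b32: str, code: str, unix_time: int) -> bool:
    """Check a code against the current period and its drift window."""
    matched = False
    for step in range(-TOTP_DRIFT_STEPS, TOTP_DRIFT_STEPS + 1):
        expected = totp_at(secret_b32, unix_time + step * TOTP_PERIOD_SECONDS)
        # SECURITY: timing-safe comparison; every window is always checked.
        matched |= hmac.compare_digest(expected.encode(), code.encode())
    return matched
```

With `pyotp`, the equivalent check is `pyotp.TOTP(secret).verify(code, valid_window=1)`; the sketch only shows what the drift window and the timing-safe comparison mean in practice.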

In-Memory vs. Database

Store auctoritates in-memory (dict keyed by chat_id). Rationale:

Single-user system — only one pending auctoritas at a time is realistic.
Expiry is 2 minutes — no persistence needed across restarts.
Avoids schema migration complexity.
If persistence becomes needed later, migrate to praetorium table.
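
A minimal sketch of such a store, including the pending → approved / expired / rejected transitions, is shown below. The `AuctoritasStore` and `Outcome` names, the `submit` method, and the choice to let a new request replace an older one for the same chat are assumptions made for illustration; OTP verification itself is passed in as a boolean so the store stays independent of the TOTP library.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional
from uuid import uuid4

TTL = timedelta(seconds=120)
MAX_ATTEMPTS = 3


class Outcome(Enum):
    APPROVED = "approved"
    RETRY = "retry"
    REJECTED = "rejected"
    EXPIRED = "expired"
    NONE_PENDING = "none_pending"


@dataclass
class Auctoritas:
    id: str
    action: str
    payload: dict[str, str]
    chat_id: int
    message_id: int
    expires_at: datetime
    attempts: int = 0


class AuctoritasStore:
    """In-memory store: at most one pending auctoritas per chat."""

    def __init__(self) -> None:
        self._pending: dict[int, Auctoritas] = {}

    def create(self, action: str, payload: dict[str, str], chat_id: int,
               message_id: int, now: datetime) -> Auctoritas:
        auctoritas = Auctoritas(str(uuid4()), action, payload, chat_id,
                                message_id, now + TTL)
        self._pending[chat_id] = auctoritas  # assumption: newest request wins
        return auctoritas

    def submit(self, chat_id: int, code_is_valid: bool,
               now: datetime) -> tuple[Outcome, Optional[Auctoritas]]:
        auctoritas = self._pending.get(chat_id)
        if auctoritas is None:
            return Outcome.NONE_PENDING, None
        if now >= auctoritas.expires_at:
            # SECURITY: expiry applies regardless of remaining attempts.
            del self._pending[chat_id]
            return Outcome.EXPIRED, auctoritas
        if code_is_valid:
            del self._pending[chat_id]
            return Outcome.APPROVED, auctoritas
        auctoritas.attempts += 1
        if auctoritas.attempts >= MAX_ATTEMPTS:
            del self._pending[chat_id]
            return Outcome.REJECTED, auctoritas
        return Outcome.RETRY, auctoritas
```

The caller executes the bound action only on `Outcome.APPROVED`, and `MAX_ATTEMPTS - auctoritas.attempts` gives the "attempts remaining" figure for the retry message.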

Security Controls

# SECURITY: TOTP secret loaded from env only, never logged or persisted to disk
# SECURITY: OTP messages deleted after verification (best-effort)
# SECURITY: auctoritas expires after TTL regardless of attempts
# SECURITY: timing-safe comparison for TOTP codes
The message containing Caesar's typed OTP code is deleted as well (best-effort).

Tests

File	Tests
`tests/test_totp.py` (new)	Valid OTP accepted, invalid rejected, drift tolerance, timing-safe comparison
`tests/test_auctoritas.py` (new)	Create, expire, max attempts, approve, execute
`tests/test_config.py`	`LEGIO_TOTP_SECRET` loaded, missing secret disables TOTP
`tests/test_telegram_bot.py`	Full flow: request → OTP → execution; invalid OTP; expired request

Estimate: L

WI-F: Prompt-Level Confirmation for Medium-Risk Actions

Goal

Actions that are revocable but impactful require Caesar to reply "Confirmed" — enforced via Legatus prompt, not code.

Actions

Action	Why Not TOTP
`publish_edictum`	Revocable (can `/revoke`)

Implementation

Add to castra/legatus/prompt.md:

markdown

## Action Authorization

### TOTP-Required (Hard Gate)
These actions are blocked until Caesar provides a valid OTP.
The bot will ask for it automatically — you cannot bypass this:
- Removing a centurio
- Revoking an edictum

### Confirmation-Required (Soft Gate)
Before these actions, explain the impact and ask Caesar to reply "Confirmed":
- Publishing an edictum (standing order for all centuriones)

### No Confirmation Needed
- Creating a centurio (reversible)
- Dispatching to a centurio (read-only)
- Reading edicta, acta, history (read-only)

Estimate: S (prompt edit only, no code)

WI-G: Routing Narration and Consult/Debate Protocols

Goal

Add behavioral protocols to Legatus for visible routing, one-shot consult, and structured debate. All via prompt — no orchestration code.

Implementation

Add to castra/legatus/prompt.md:

markdown

## Routing Narration

When you auto-route (no @mentions from Caesar):
1. Briefly explain which centurio you chose and why (one sentence).
2. Then dispatch.
Caesar should never be surprised about who is working on their request.

## Consult Protocol

When Caesar uses `/consult @name <question>`:
1. Dispatch the question to the named centurio.
2. Return their answer with attribution.
3. Add a one-sentence Legatus summary if the answer needs interpretation.

## Debate Protocol

When Caesar uses `/debate @a @b <question>`:
1. Dispatch the question to both centuriones.
2. If their answers conflict, dispatch one round of rebuttal (each sees the other's answer).
3. Produce a final synthesis: options, recommendation, and risks.
4. Maximum 3 rounds total. Stop even if no consensus.
5. Caesar sees the synthesis. Raw transcripts are saved to your commentarii.

Why Prompt, Not Code

The debate "rounds" are a behavioral pattern, not a state machine. The Legatus SDK session maintains conversation context, so it can count rounds itself. Hard limits that already exist in code (the SDK token limit and the session idle timeout) prevent runaway debates.

Estimate: S (prompt edits only)

New Domain Vocabulary

Concept	Singular	Plural	Python Class	Python Var
Authorization request	Auctoritas	Auctoritates	`Auctoritas`	`auctoritas` / `auctoritates`

Add to 00-domain-vocabulary.md and dev-docs/memos/20260215-1700-domain-vocabulary.md during WI-E.

Implementation Order

WI-A (S) → Attribution headers         Zero dependencies, immediate visual improvement
WI-B (S) → Humanized status            Zero dependencies, pairs with WI-A
WI-C (S) → Reply-to threading          Zero dependencies, context improvement
WI-F (S) → Prompt confirmation          Zero dependencies, prompt-only
WI-G (S) → Routing/consult/debate       Zero dependencies, prompt-only
WI-D (M) → New commands                 Needs WI-E for /remove and /revoke
WI-E (L) → TOTP gate                    Needs pyotp, new files, new tests

Parallel: WI-A + WI-B + WI-C can be done in a single pass (all touch bot.py/rendering.py/session.py).
Parallel: WI-F + WI-G can be done together (both are prompt edits to legatus/prompt.md).
Then: WI-D + WI-E together (commands + TOTP gate are tightly coupled for /remove and /revoke).

Estimated Total

Size	Count	Time
S	5	~2 hours
M	1	~2 hours
L	1	~4 hours
Total	7	~8 hours

Decision Log

#	Decision	Rationale
D1	Keep single bot	Operational simplicity; identity via in-message headers
D2	Flat commands (no subcommands)	Telegram autocomplete; conversational UX
D3	TOTP in code for destructive; prompt for revocable	Hard guarantee where it matters; flexibility where it doesn't
D4	Defer mission/topic system	Solves a scale problem that doesn't exist yet (≤10 centuriones)
D5	Defer inline keyboards	@mentions work at current scale; keyboards add code complexity
D6	Debate protocol in prompt, not code	"Prompt over code"; SDK session tracks rounds naturally
D7	In-memory auctoritas store	Single user, 2-min TTL; no persistence needed
D8	Hardcode TOTP tuning params	`ttl=120`, `max_attempts=3`, `drift=1` — rarely change; avoid config bloat

Testing Procedures

After each WI: ruff check . && ruff format --check . && python scripts/check_file_length.py && pytest
Full gate: pytest --cov --cov-fail-under=100
Security tests: pytest -m security (TOTP verification, Caesar-only gates, input validation)

Manual Test Checklist

[ ] /help lists all new commands
[ ] Send @vorenus do this → response has rich attribution header with role
[ ] Send message → status updates show humanized text (not raw tool names)
[ ] Multi-centurio response → each reply threads to original message
[ ] /remove vorenus → OTP prompt appears → valid OTP executes → messages deleted
[ ] /remove vorenus → wrong OTP three times → request rejected
[ ] /edict test-rule Always test first → Legatus explains the impact and asks Caesar to reply "Confirmed" → publishes after confirmation
[ ] /edicta → lists standing orders
[ ] /history 5 → shows last 5 nuntii
