Telegram Markdown Rendering
Research into rendering LLM markdown output as formatted Telegram messages.
Problem
Claude returns markdown-formatted text (bold, code blocks, lists, etc.). Currently, run_legatus_request() in legio/telegram/utils.py passes all LLM output through _safe_html(), which calls html.escape(), so the text is delivered with no formatting applied. Users see raw markdown syntax instead of formatted messages.
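The effect is easy to reproduce in isolation. The sketch below uses a hypothetical stand-in for _safe_html() (all that is known here is that the real function calls html.escape()): HTML metacharacters are neutralised, but markdown markers pass through untouched.

```python
import html


def safe_html(text: str) -> str:
    """Hypothetical stand-in for _safe_html(): escape everything, format nothing."""
    return html.escape(text)


sample = "**bold** `code` <b>not a tag</b>"
escaped = safe_html(sample)

# Angle brackets are escaped, so no tag from the LLM output is ever rendered,
# but the markdown markers survive verbatim and reach the user as-is.
print(escaped)
```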
Current Architecture
Claude response (markdown)
→ _safe_html() escapes everything
→ edit_html() sends with parse_mode="HTML"
→ User sees: **bold** `code` - list item
Key Functions
| Function | File | Role |
|---|---|---|
| _safe_html(text) | telegram/utils.py:36 | html.escape() on LLM output |
| reply_html(msg, text) | telegram/utils.py:21 | Send with parse_mode="HTML" |
| edit_html(msg, text) | telegram/utils.py:31 | Edit with parse_mode="HTML" |
| run_legatus_request() | telegram/utils.py:119 | Orchestrates response lifecycle |
| split_message(text) | telegram/utils.py:48 | Splits at \n\n, max 4000 chars |
| render_attribution_header() | rendering.py:85 | Centurio name/role header |
Security Controls
- Line 155-156: _safe_html(first) escapes LLM output before sending
- Line 101-106: Attribution header escapes name and description separately
- Generic error message on exception (line 168) prevents leaking internals
Telegram Parse Modes
HTML (current)
Supported tags: <b>, <i>, <u>, <s>, <code>, <pre>, <pre><code class="language-X">, <a href="">, <blockquote>, <tg-spoiler>.
Pros:
- Already in use throughout the codebase
- Predictable escaping (just <, >, &)
- Well-tested infrastructure (reply_html, edit_html)
- Code blocks support language attribute
Cons:
- LLM output is markdown, not HTML — requires conversion
MarkdownV2
Syntax: *bold*, _italic_, `code`, ```pre```, ~strike~, >blockquote, ||spoiler||.
Pros:
- Closer to Claude's natural output format
Cons:
- 18 characters must be escaped outside formatting: _ * [ ] ( ) ~ ` > # + - = | { } . !
- Claude's markdown is not MarkdownV2-compliant (different escaping rules)
- One unescaped character = entire message fails to send
- Nested formatting has strict ordering rules
- Code-block language is given only by a tag after the opening fence (```lang), and ` and \ must still be escaped inside code spans and blocks
- Would require replacing all existing HTML infrastructure
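To make the escaping burden concrete, here is a minimal sketch of a blanket MarkdownV2 escaper. The helper name escape_markdown_v2 is hypothetical; the character set is the 18-character list above.

```python
import re

# The 18 characters Telegram reserves in MarkdownV2 (from the list above).
MDV2_SPECIALS = r"_*[]()~`>#+-=|{}.!"
_MDV2_RE = re.compile("[" + re.escape(MDV2_SPECIALS) + "]")


def escape_markdown_v2(text: str) -> str:
    """Backslash-escape every reserved character (hypothetical helper)."""
    return _MDV2_RE.sub(lambda m: "\\" + m.group(0), text)


# Plain text becomes safe to send...
plain = escape_markdown_v2("Version 1.5 - done!")
# ...but blanket escaping also neutralises the formatting we wanted to keep.
formatted = escape_markdown_v2("**bold**")
print(plain)
print(formatted)
```

Escaping everything yields a deliverable message with no formatting, and escaping nothing risks a rejected message. Doing it selectively means knowing which characters sit inside a formatting span, which already requires parsing the markdown.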
Markdown (legacy)
Deprecated (kept by Telegram for backward compatibility). Supports only *bold*, _italic_, inline links, `code`, and ```pre```, with no nesting. Do not use.
Implementation Strategies
Strategy A: Markdown → Telegram HTML (recommended)
Convert Claude's markdown to Telegram-compatible HTML before sending.
Claude response (markdown)
→ markdown_to_telegram_html() converts formatting
→ split_message() chunks the result
→ edit_html() sends with parse_mode="HTML"
→ User sees: **bold** code list item (formatted)
Conversion mapping:
| Markdown | Telegram HTML |
|---|---|
| **bold** / __bold__ | <b>bold</b> |
| *italic* / _italic_ | <i>italic</i> |
| `inline code` | <code>inline code</code> |
| ```lang\nblock\n``` | <pre><code class="language-lang">block</code></pre> |
| > quote | <blockquote>quote</blockquote> |
| ~~strike~~ | <s>strike</s> |
| [text](url) | <a href="url">text</a> |
| - item / * item | • item (Unicode bullet, no HTML tag) |
| 1. item | 1. item (plain text, Telegram has no <ol>) |
| # Heading | <b>Heading</b> (no heading tags in Telegram) |
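The block-level rows of the table (headings, bullets, quotes, numbered items) can be illustrated with a small line-by-line sketch. The function convert_block_line is hypothetical and ignores inline formatting and multi-line constructs. It only shows the intended output shape for each row.

```python
import html
import re

HEADING_RE = re.compile(r"^#{1,6}\s+(.*)$")
BULLET_RE = re.compile(r"^[-*]\s+(.*)$")
QUOTE_RE = re.compile(r"^>\s?(.*)$")


def convert_block_line(line: str) -> str:
    """Map one markdown line to Telegram HTML (block-level rows only)."""
    if m := HEADING_RE.match(line):
        # Telegram has no heading tags, so headings become bold text.
        return f"<b>{html.escape(m.group(1), quote=False)}</b>"
    if m := BULLET_RE.match(line):
        # No <ul>/<li> in Telegram: use a Unicode bullet instead.
        return f"• {html.escape(m.group(1), quote=False)}"
    if m := QUOTE_RE.match(line):
        return f"<blockquote>{html.escape(m.group(1), quote=False)}</blockquote>"
    # Numbered items and ordinary text pass through, escaped.
    return html.escape(line, quote=False)


for sample in ["# Heading", "- a < b", "1. item", "> quote"]:
    print(convert_block_line(sample))
```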
Approach options:
- Library (mistune): Fast, pure Python markdown parser. Write a custom Telegram HTML renderer. Handles edge cases (nested formatting, escaping). ~150 lines for renderer.
- Library (markdown-it-py): Port of markdown-it. Token-based, highly configurable. More complex API but more accurate parsing.
- Regex-based: Simple regex replacements (\*\*(.+?)\*\* → <b>\1</b>). Fast but fragile: fails on nested formatting, code blocks containing markdown syntax, and similar edge cases. Not recommended for production.
- Custom parser: Hand-rolled state machine. Full control, zero dependencies. ~200-300 lines. Risk of bugs on edge cases.
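The fragility of the regex-based option shows up with a single input. The sketch below (naive_bold is a hypothetical name) applies the bold replacement after escaping and then feeds it an inline code span containing the ** operator.

```python
import html
import re

BOLD_RE = re.compile(r"\*\*(.+?)\*\*")


def naive_bold(text: str) -> str:
    """Regex-only conversion: escape, then turn **x** into <b>x</b>."""
    return BOLD_RE.sub(r"<b>\1</b>", html.escape(text, quote=False))


ok = naive_bold("this is **bold**")
# The regex has no idea it is inside a code span, so the operator is rewritten.
broken = naive_bold("use `2 ** 3 ** 2` for powers")
print(ok)
print(broken)
```

The second result wraps part of the expression in <b> tags, corrupting the code the user is supposed to copy. A real parser avoids this because code spans are tokenised before emphasis is considered.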
Recommendation: mistune with custom renderer. Lightweight (single dependency), well-maintained, and a real parser, so nested formatting and markdown syntax inside code blocks are handled correctly. Custom renderer is ~100-150 lines.
Security: All text nodes in the renderer must call html.escape(). Only recognized markdown constructs produce HTML tags. Unknown input passes through escaped. This preserves the existing security posture.
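The escape-by-default rule can be sketched without any dependency. The code below is a stdlib-only illustration, not the proposed mistune renderer. It borrows the name markdown_to_telegram_html from the flow above and handles only fenced code, inline code, and bold, which is enough to show that every piece of text is escaped and tags come only from recognised constructs.

```python
import html
import re

FENCE_RE = re.compile(r"```(\w*)\n(.*?)\n?```", re.DOTALL)
INLINE_CODE_RE = re.compile(r"`([^`\n]+)`")
BOLD_RE = re.compile(r"\*\*(.+?)\*\*")


def _plain(text: str) -> str:
    # Escape first; only then let a recognised construct introduce a tag.
    return BOLD_RE.sub(r"<b>\1</b>", html.escape(text, quote=False))


def _inline(text: str) -> str:
    out, pos = [], 0
    for m in INLINE_CODE_RE.finditer(text):
        out.append(_plain(text[pos:m.start()]))
        # Code content is escaped but never scanned for markdown.
        out.append(f"<code>{html.escape(m.group(1), quote=False)}</code>")
        pos = m.end()
    out.append(_plain(text[pos:]))
    return "".join(out)


def _code_block(lang: str, code: str) -> str:
    body = html.escape(code, quote=False)
    if lang:
        return f'<pre><code class="language-{html.escape(lang)}">{body}</code></pre>'
    return f"<pre>{body}</pre>"


def markdown_to_telegram_html(text: str) -> str:
    """Illustrative stand-in: fenced code, inline code, and bold only."""
    out, pos = [], 0
    for m in FENCE_RE.finditer(text):
        out.append(_inline(text[pos:m.start()]))
        out.append(_code_block(m.group(1), m.group(2)))
        pos = m.end()
    out.append(_inline(text[pos:]))
    return "".join(out)


print(markdown_to_telegram_html("**hi** <script> and `a<b`"))
```

A mistune renderer would apply the same rule per node type: escape in the text and code handlers, and emit tags only from the handlers for recognised constructs.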
Strategy B: Switch to MarkdownV2
Send Claude's output with parse_mode="MarkdownV2" after escaping special characters.
Problems:
- Must escape 18 characters outside formatting spans — requires parsing markdown structure anyway
- Claude may produce markdown that doesn't conform to Telegram's MarkdownV2 spec
- Existing reply_html/edit_html infrastructure must be replaced or duplicated
- Attribution headers, command responses, and error messages all use HTML; mixed modes add complexity
- One escaping mistake = message delivery failure
Verdict: More work, more fragile, no benefit over Strategy A.
Strategy C: Plain Text + Selective Formatting
Keep _safe_html() but post-process to add formatting for obvious patterns (code blocks, bullet lists).
Problems:
- Half-measure — some formatting works, some doesn't
- Hard to handle code blocks reliably without a real parser
- Still looks bad for most responses
Verdict: Not recommended.
Interaction with Existing Code
split_message() Compatibility
split_message() splits at \n\n boundaries with a 4000-char limit (buffer for HTML tags). After markdown→HTML conversion, the text will contain HTML tags that increase length. Options:
- Convert first, then split — HTML tags counted toward limit. Simple, correct.
- Split first, then convert — Risk of splitting mid-markdown construct (e.g., code block). Dangerous.
Decision: Convert first, then split. May need to reduce max_len slightly if HTML overhead is significant (4000 already provides 96-char buffer).
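A minimal sketch of the convert-then-split order follows. The real split_message() is only known here to split at \n\n with a 4000-character limit, so the function below is a hypothetical reimplementation, and the hard-split fallback for oversized paragraphs is an assumption.

```python
def split_message(text: str, max_len: int = 4000) -> list[str]:
    """Hypothetical stand-in: pack \\n\\n-separated paragraphs into chunks."""
    chunks: list[str] = []
    current = ""
    for para in text.split("\n\n"):
        candidate = para if not current else current + "\n\n" + para
        if len(candidate) <= max_len:
            current = candidate
            continue
        if current:
            chunks.append(current)
        # Assumed fallback: hard-split a paragraph that alone exceeds the limit.
        # On HTML input this can cut through a tag, so a real version needs care.
        while len(para) > max_len:
            chunks.append(para[:max_len])
            para = para[max_len:]
        current = para
    if current:
        chunks.append(current)
    return chunks


# The input is already-converted HTML, so tag characters count toward max_len.
converted = "<b>" + "x" * 10 + "</b>" + "\n\n" + "y" * 5
chunks = split_message(converted, max_len=20)
print(chunks)
```

Telegram documents the 4096-character limit as applying after entity parsing, so measuring the converted HTML string, as this sketch does, errs on the safe side.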
render_attribution_header() Compatibility
Attribution headers already produce HTML (html.escape on name/description). The rendered LLM response is appended after the header. Both produce HTML — fully compatible.
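As a rough illustration of why the two pieces compose, the sketch below builds a header and appends an already-converted body. The header layout and both function names are hypothetical. Only the separate escaping of name and description comes from the description above.

```python
import html


def attribution_header_sketch(name: str, description: str) -> str:
    """Hypothetical layout; name and description are escaped separately."""
    return f"<b>{html.escape(name)}</b> ({html.escape(description)})"


def compose_message(name: str, description: str, body_html: str) -> str:
    # body_html is assumed to be converter output, i.e. already valid HTML,
    # so it is appended as-is rather than escaped a second time.
    return attribution_header_sketch(name, description) + "\n\n" + body_html


message = compose_message("Marcus <I>", "R&D", "<b>ok</b>")
print(message)
```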
Message Editing
run_legatus_request() sends an initial ⏳ status via edit_html(), then edits it with the response. The converted HTML response replaces the status cleanly.
Dependency Considerations
Adding mistune to pyproject.toml dependencies:
- Pure Python, no C extensions
- Well-maintained (active development)
- Small footprint (~30KB)
- Compatible with most CommonMark rules out of the box
- No conflict with existing dependencies
- MIT licensed
Risks
- Telegram tag limit: Telegram may reject messages with deeply nested or malformed HTML. Mitigation: the renderer produces flat, non-nested tags.
- Message length inflation: HTML tags add bytes. A 4000-char markdown response might exceed 4096 after conversion. Mitigation: convert before splitting.
- Code blocks with HTML: Code blocks may contain <, >, &. Mitigation: escape code block content, only wrap with <pre><code>.
- Claude output variation: Claude doesn't always produce clean markdown. Mitigation: the parser handles partial/broken markdown gracefully (passes through as escaped text).
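One possible safety net for the risks above, offered as an assumption rather than part of the plan: wrap the converter so that any failure falls back to fully escaped text, which is the current behaviour. The name render_or_escape is hypothetical.

```python
import html
from typing import Callable


def render_or_escape(text: str, convert: Callable[[str], str]) -> str:
    """Try the converter; on any error fall back to escaped plain text."""
    try:
        return convert(text)
    except Exception:
        # Worst case the user sees raw markdown, never a failed delivery
        # caused by the converter itself.
        return html.escape(text)


def failing_converter(text: str) -> str:
    raise ValueError("simulated renderer bug")


print(render_or_escape("<x> **y**", failing_converter))
```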
Recommendation
Strategy A with mistune. Implementation plan:
- Add mistune to pyproject.toml dependencies
- Create legio/telegram/markdown_render.py: custom mistune renderer producing Telegram HTML
- Replace _safe_html(first) in run_legatus_request() with markdown_to_telegram_html(first)
- Keep _safe_html() for non-LLM text (command output, error messages, attribution headers)
- Write comprehensive tests (nested formatting, code blocks, edge cases)
- Maintain 100% test coverage