Headless Browser Solutions for Legio
Date: 2026-02-15
Status: Research
Author: Antony
Legio deploys to remote Linux hosts or containers where no visual browser exists. Centuriones need web interaction capabilities (research, form filling, data extraction, monitoring) without a display server.
The Problem
Centuriones run as Claude Agent SDK clients inside a Python process on a headless Linux host (bare metal, VM, or container). There is no X11/Wayland display. Any browser automation must:
- Run fully headless (no GPU, no display)
- Integrate with the Claude Agent SDK via MCP (stdio or in-process)
- Work inside Docker/Podman containers
- Keep resource footprint reasonable (single-user system)
- Be Python-native or callable from Python
Solution Landscape (2026)
Tier 1 — MCP-Native Browser Servers
These expose browser capabilities as MCP tools. The SDK client calls them like any other tool, so no custom integration code is needed.
1. Playwright MCP (Microsoft Official)
- Repo: https://github.com/microsoft/playwright-mcp
- Transport: stdio (spawn via npx @playwright/mcp@latest --headless)
- Approach: Accessibility-tree snapshots, not screenshots. The LLM receives structured text representing interactive elements, not pixels.
- Headless: --headless flag. Docker supports headless Chromium only.
- Tools exposed: browser_navigate, browser_snapshot, browser_click, browser_type, browser_select_option, browser_take_screenshot, etc.
- Language: Node.js server, but stdio transport makes it language-agnostic. Python SDK connects via StdioServerParameters.
- Container: Works in Docker. Playwright auto-installs browser binaries on first run. Official Docker images available.
- Cost: Free, open-source (Apache 2.0).
- New (2026): Microsoft is adding a token-efficient CLI mode optimized for coding agents, intended to reduce token consumption for common browser tasks.
Legio integration:
# In centurio session creation (session.py)
from claude_agent_sdk import ClaudeAgentOptions

# The SDK takes stdio MCP servers as plain dicts keyed by server name.
browser_server = {
    "type": "stdio",
    "command": "npx",
    "args": ["@playwright/mcp@latest", "--headless"],
}
options = ClaudeAgentOptions(
    mcp_servers={"browser": browser_server},
    # ... existing config
)

Verdict: Best fit for Legio. It is MCP-native, its accessibility-tree approach is token-efficient, it is headless by design and free, and the SDK already supports stdio MCP servers.
2. Browser MCP (ByteDance)
- Repo: https://github.com/bytedance/browser-mcp
- Transport: stdio
- Approach: Puppeteer-based, structured accessibility data + optional vision mode for complex visual pages.
- Headless: Yes, Puppeteer default.
- Container: Works in Docker with headless Chrome.
- Cost: Free, open-source.
Verdict: Viable alternative if Playwright MCP has issues. Less mature than Microsoft's official server.
3. Browserbase Skills (Claude Agent SDK + Browse)
- Repo: https://github.com/browserbase/agent-browse
- Approach: Claude Agent SDK with a web browsing tool, built on Browserbase's managed browser infrastructure.
- Headless: Cloud-hosted browsers (no local browser needed).
- Cost: Browserbase pricing (managed service, not self-hosted).
Verdict: Overkill for Legio's single-user model. Adds cloud dependency and recurring cost. Better suited for multi-tenant production systems.
Tier 2 — Python-Native Agent Frameworks
These are full agent-browser frameworks with their own orchestration. They overlap with what Legatus already does but provide browser-specific agentic capabilities.
4. browser-use
- Repo: https://github.com/browser-use/browser-use
- Install: pip install browser-use
- Approach: Python-native. Gives AI agents browser control via Playwright under the hood. Combines accessibility tree + visual understanding + HTML extraction.
- LLM support: Anthropic, OpenAI, Google, Ollama, DeepSeek.
- Headless: Yes, Playwright headless mode.
- Container: uvx browser-use install installs Chromium.
- Cost: Free, open-source (MIT).
- Stars: 55k+ GitHub stars, active development.
Legio integration options:
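Option A: Wrap as in-process MCP tool:
# Create a custom MCP tool that delegates to browser-use
from browser_use import Agent
from claude_agent_sdk import tool

@tool("browse_web", "Execute a browser task using AI-driven navigation.", {"task": str})
async def browse_web(args: dict) -> dict:
    # claude_client: assumed to be a browser-use-compatible chat model configured elsewhere
    agent = Agent(task=args["task"], llm=claude_client)
    history = await agent.run()
    # final_result() is a method on the returned history and may return None
    return {"content": [{"type": "text", "text": history.final_result() or ""}]}

Option B: Expose as stdio MCP server (write thin wrapper).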
Verdict: Most Pythonic option. Eliminates Node.js dependency. Risk: overlapping agent loops (browser-use has its own agentic loop vs. centurio's SDK loop). Best used as a tool, not as an agent.
5. Stagehand (Browserbase)
- Repo: https://github.com/browserbase/stagehand
- Approach: TypeScript SDK. v3 drops Playwright dependency, talks directly to Chrome via CDP. 44% faster than v2.
- Headless: Yes, CDP-native.
- Language: TypeScript only (no Python bindings).
- Cost: Free open-source, optional Browserbase cloud.
Verdict: Not suitable — TypeScript-only, no Python bindings. Would require a Node.js subprocess bridge.
Tier 3 — Managed Browser Infrastructure
Remote browser services where you send API calls and they run browsers in their cloud. Zero local browser installation.
6. Browserless (Self-Hosted Docker)
- Repo: https://github.com/browserless/browserless
- Docker: docker run -p 3000:3000 browserless/chrome
- API: REST + WebSocket. Supports Puppeteer, Playwright, Selenium connections. Clients connect to ws://localhost:3000.
- Headless: Always headless in Docker.
- Cost: Free for non-commercial. Commercial license required for production/CI. Cloud plans from free tier (1k units) up.
- Features: Screenshot API, PDF generation, content extraction, function execution, anti-detection.
Legio integration:
# Playwright connects to the remote browserless instance over CDP
from playwright.async_api import async_playwright

async def open_remote_browser():
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp(
            "ws://browserless:3000"
        )
        # ... drive the browser here, while the Playwright context is open

Verdict: Good for container deployments where you want browser isolation. Run browserless as a sidecar container. Adds operational complexity but provides clean separation.
7. Browserbase (Cloud)
- URL: https://www.browserbase.com/
- Approach: Cloud-hosted browser sessions. API-driven, no local browser. Designed for AI agent workloads.
- Scale: 1M+ concurrent sessions.
- Cost: Paid service (per-session pricing).
Verdict: Unnecessary for single-user Legio. Cloud dependency contradicts self-contained deployment goal.
Comparison Matrix
| Solution | MCP Native | Python | Headless | Container | Cost | Complexity |
|---|---|---|---|---|---|---|
| Playwright MCP | Yes (stdio) | Via stdio | Yes | Yes | Free | Low |
| Browser MCP (ByteDance) | Yes (stdio) | Via stdio | Yes | Yes | Free | Low |
| browser-use | No (wrap as tool) | Native | Yes | Yes | Free | Medium |
| Stagehand | No | No | Yes | Yes | Free | High |
| Browserless Docker | No (wrap as tool) | Via CDP | Yes | Sidecar | License | Medium |
| Browserbase Cloud | No | Via API | Cloud | N/A | Paid | Low |
Recommendation for Legio
Primary: Playwright MCP (Microsoft)
Why:
- MCP-native: zero custom integration code. Add to the mcp_servers dict in session config and centuriones immediately gain browser tools.
- Accessibility-tree approach is token-efficient (structured text, not screenshots). Fits within centurio context window budgets.
- Headless by default in containers. --headless flag for dev.
- Microsoft-backed, active development, 2026 CLI mode optimization.
- Already proven with Claude Agent SDK (stdio transport).
Integration path:
- Add @playwright/mcp to container image (npm install -g @playwright/mcp)
- Install Chromium in container (npx playwright install chromium)
- Configure as MCP server in centurio session creation
- Gate behind edictum: only centuriones with browser edictum get the tool
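The gating step in the integration path above could look roughly like this. It is a minimal sketch, assuming an edictum can be reduced to a set of capability names and that the browser capability is named "browser"; build_mcp_servers and BROWSER_CAPABILITY are hypothetical names, not existing Legio code, and the dict shape is assumed to follow the Claude Agent SDK's stdio server config.

```python
from typing import Any, Dict, Optional, Set

# Hypothetical capability name; the real edictum schema may differ.
BROWSER_CAPABILITY = "browser"

# Launch config for Playwright MCP over stdio (same command the document uses).
PLAYWRIGHT_MCP: Dict[str, Any] = {
    "type": "stdio",
    "command": "npx",
    "args": ["@playwright/mcp@latest", "--headless"],
}


def build_mcp_servers(
    edictum: Set[str], existing: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return the mcp_servers mapping, adding the browser only when permitted."""
    servers = dict(existing or {})
    if BROWSER_CAPABILITY in edictum:
        # Copy the template so callers cannot mutate the shared args list.
        servers["browser"] = {**PLAYWRIGHT_MCP, "args": list(PLAYWRIGHT_MCP["args"])}
    return servers
```

A centurio without the capability simply never sees the browser tools, which keeps the decision in session setup instead of in the prompt.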
Container Dockerfile addition:
RUN npm install -g @playwright/mcp@latest \
    && npx playwright install --with-deps chromium

Secondary: browser-use (Python fallback)
Why:
- Eliminates Node.js dependency entirely (pure Python + Playwright).
- Richer agentic capabilities (visual understanding, multi-step navigation).
- Useful when accessibility-tree approach fails on complex SPAs.
When to use:
- If Playwright MCP's text-only approach can't handle a specific site
- If Node.js dependency becomes a deployment burden
- For tasks requiring visual page understanding (charts, images, layouts)
Integration path:
- pip install browser-use
- Wrap as in-process MCP tool (not a separate agent loop)
- Expose as browse_web(task: str) -> str tool to centuriones
Sidecar option: Browserless Docker
When to use:
- Multi-container orchestration (docker-compose / k8s)
- Need browser process isolation from agent process
- Want connection pooling for multiple concurrent centuriones
Integration:
# docker-compose.yml
services:
  legio:
    build: .
    depends_on: [browserless]
  browserless:
    image: browserless/chrome
    ports: ["3000:3000"]

Architecture Decision
Caesar (Telegram)
│
▼
Legatus
│
├─ dispatch_to_centurio("vorenus", "research competitor pricing")
│ │
│ ▼
│ Centurio "vorenus" (ClaudeSDKClient)
│ ├─ MCP: memoria_vorenus (in-process, existing)
│ ├─ MCP: browser (stdio, Playwright MCP, NEW)
│ │ ├─ browser_navigate("https://example.com")
│ │ ├─ browser_snapshot() ← accessibility tree
│ │ ├─ browser_click(ref="link-pricing")
│ │ └─ browser_snapshot() ← updated tree
│ └─ Returns structured result to Legatus
│
▼
Caesar sees: "⚔️ Vorenus — Code Specialist\n━━━━━━━\nCompetitor X charges..."

Open Questions
Selective enablement: Should all centuriones get browser tools, or only those with a specific edictum/armarium? Recommend: opt-in via tools.json per centurio.
Resource budget: Headless Chromium uses ~100-300MB RAM per instance. With max_centuriones=10, worst case is 3GB for browsers alone. Consider: shared browser instance via Browserless sidecar.
Token cost: Each browser_snapshot returns the full accessibility tree (~2-5k tokens for complex pages). Multiple snapshots per task add up. Consider: token budget per browser session.
Security: Centuriones browsing arbitrary URLs could hit malicious content. Consider: URL allowlist in edictum, or sandboxed browser network policy in container.
Node.js dependency: Playwright MCP requires Node.js in the container. If pure-Python is a hard requirement, browser-use becomes the primary choice instead.
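The resource and token questions above are both simple budgets, and a sketch makes the arithmetic concrete. The names worst_case_browser_ram_mb and SnapshotBudget are hypothetical, and the four-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
from dataclasses import dataclass


def worst_case_browser_ram_mb(max_centuriones: int, per_instance_mb: int = 300) -> int:
    """Upper bound on RAM if every centurio runs its own Chromium."""
    return max_centuriones * per_instance_mb


@dataclass
class SnapshotBudget:
    """Tracks estimated snapshot tokens spent in one browser session."""

    limit_tokens: int
    spent_tokens: int = 0

    def charge(self, snapshot_text: str) -> bool:
        """Record a snapshot; return False once the budget is exceeded."""
        # Assumption: roughly 4 characters per token.
        self.spent_tokens += max(1, len(snapshot_text) // 4)
        return self.spent_tokens <= self.limit_tokens


print(worst_case_browser_ram_mb(10))  # 3000
```

A caller could stop taking snapshots, or summarize and reset, once charge returns False.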
Security: Authentication, 2FA, and Bot Detection
Three hard problems arise when centuriones browse the real web: credentials, multi-factor auth, and sites that actively block automation.
Problem 1 — Credential Management
Centuriones must never see raw passwords. If a password appears in the LLM's context window, it becomes part of the conversation that could be logged, cached in token trackers, or leaked in error messages.
Approach: Session State Files (not passwords)
Playwright supports saving and restoring full browser state — cookies, localStorage, IndexedDB — to a JSON file via storageState. The workflow:
- Caesar logs in once manually (or via a bootstrap script) on a real browser and exports storageState.json.
- The file is stored encrypted in castra/browser/sessions/ with filesystem permissions 0600.
- On centurio browser launch, Playwright MCP loads the storage state, skipping the login flow entirely.
- Centuriones never see credentials; they receive an already-authenticated browser context.
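The file-handling part of this workflow can be sketched with the standard library alone. This is an illustration under assumptions: save_storage_state and load_storage_state are hypothetical helpers, and encryption at rest (age or SOPS) is deliberately left out. The loaded dict has the shape Playwright's Python API accepts in browser.new_context(storage_state=...).

```python
import json
import os
import stat
from pathlib import Path


def save_storage_state(state: dict, path: Path) -> None:
    """Write a storage-state dict so that only the owner can read it."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Creating with mode 0o600 avoids a window where the file is world-readable.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w", encoding="utf-8") as fh:
        json.dump(state, fh)
    os.chmod(path, 0o600)  # also tightens a file that already existed


def load_storage_state(path: Path) -> dict:
    """Refuse to load a session file that group or others can access."""
    mode = stat.S_IMODE(path.stat().st_mode)
    if mode & 0o077:
        raise PermissionError(f"{path} has mode {oct(mode)}, expected 0o600")
    return json.loads(path.read_text(encoding="utf-8"))
```

Failing closed on loose permissions means a misconfigured deployment surfaces as an error at launch, not as a silently exposed session.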
# Conceptual flow
Caesar (one-time) ──► manual login ──► export storageState.json
│
▼
castra/browser/sessions/
github.json (encrypted)
google.json (encrypted)
│
Centurio ─► Playwright MCP │
├─ load storageState ◄────┘
├─ browser_navigate (already logged in)
└─ no password ever touches the LLM

Session refresh: Cookies expire. Two strategies:
| Strategy | How | When |
|---|---|---|
| Proactive refresh | Background cron re-exports storage state before expiry | Long-lived sessions (GitHub, internal tools) |
| On-demand re-auth | Centurio detects 401/redirect-to-login, asks Caesar via Telegram to re-authenticate | Short-lived sessions, strict security sites |
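Both strategies in the table reduce to a small check. The sketch below assumes the storage state follows Playwright's format, where each cookie carries an expires field in Unix seconds and session cookies use -1. The function names and the "/login" URL heuristic are illustrative assumptions. Using the earliest expiry across all cookies is deliberately pessimistic, since short-lived tracking cookies will trigger refreshes too.

```python
import time
from typing import Optional


def earliest_cookie_expiry(state: dict) -> Optional[float]:
    """Earliest expiry (Unix seconds) among persistent cookies, or None."""
    # Session cookies are marked with expires == -1 and are skipped.
    expiries = [
        c["expires"] for c in state.get("cookies", []) if c.get("expires", -1) > 0
    ]
    return min(expiries) if expiries else None


def needs_proactive_refresh(
    state: dict, margin_s: float = 24 * 3600, now: Optional[float] = None
) -> bool:
    """Proactive strategy: refresh when the earliest expiry is within the margin."""
    expiry = earliest_cookie_expiry(state)
    if expiry is None:
        return False
    current = time.time() if now is None else now
    return expiry - current <= margin_s


def looks_like_auth_failure(status: int, final_url: str) -> bool:
    """On-demand strategy: a 401, or a redirect that landed on a login page."""
    # Assumption: login pages contain "/login" in the URL; real sites vary.
    return status == 401 or "/login" in final_url.lower()
```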
Secrets that must exist (for automated bootstrap, optional):
| Secret | Storage | Access |
|---|---|---|
| Site passwords | Vault (HashiCorp, SOPS, age-encrypted file) | Bootstrap script only, never LLM |
| TOTP seeds | Same vault | Bootstrap script only |
| OAuth client secrets | .env | Refresh token flow only |
| Storage state files | castra/browser/sessions/ (0600) | Playwright MCP at launch |
Rule: no credential ever enters a centurio's prompt, system message, or tool call arguments. Credentials flow through infrastructure only.
Problem 2 — Two-Factor Authentication (2FA / MFA)
Three 2FA patterns exist in the wild, each with different headless feasibility:
A. TOTP (Google Authenticator, Authy)
Automatable. The TOTP seed is a shared secret (base32 string). Given the seed, any code can generate the 6-digit code:
import pyotp
code = pyotp.TOTP(seed).now()  # same as Google Authenticator

Legio already has pyotp as a dependency (for the Auctoritas gate). A bootstrap script can:
- Read the TOTP seed from vault
- Generate the current code
- Fill it into the 2FA form via Playwright
- Export the authenticated storageState.json
Security constraint: The TOTP seed must never be passed to the LLM. The bootstrap script runs outside the SDK agent loop.
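To show why the seed alone is enough to produce valid codes, here is a standard-library sketch of the TOTP computation from RFC 6238 (HMAC-SHA1, 30-second steps). It is for illustration only, since pyotp already does this in Legio; the totp function name is hypothetical, and the seed in the usage line is the public RFC test key, not a real secret.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(seed_b32: str, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute a TOTP code from a base32 seed (RFC 6238, SHA-1)."""
    key = base64.b32decode(seed_b32.upper())
    # The moving factor is the number of whole time steps since the epoch.
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    # Dynamic truncation (RFC 4226): the low nibble of the last byte picks the offset.
    offset = digest[-1] & 0x0F
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10 ** digits).zfill(digits)


# RFC 6238 test key, encoded to base32 the way authenticator apps store it.
rfc_seed = base64.b32encode(b"12345678901234567890").decode()
print(totp(rfc_seed, at=59))  # 287082
```

Because the computation needs nothing but the seed and a clock, anything that can read the seed can pass 2FA, which is why it belongs in the vault and the bootstrap script only.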
B. Push Notification (Microsoft Authenticator, Duo)
Not automatable. These require Caesar to tap "Approve" on a phone.
Legio workaround: When a centurio encounters a push-MFA page:
- Centurio detects the MFA prompt (accessibility tree shows "approve this sign-in" or similar).
- Centurio pauses and sends Caesar a Telegram message: "🔐 Waiting for MFA approval — check your Authenticator app."
- Caesar approves on phone.
- Centurio detects page change (poll browser_snapshot every 3s), continues.
- Immediately export storageState.json so this doesn't repeat.
This is the human-in-the-loop pattern — Caesar stays in control.
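The notify-then-poll loop from the steps above might be sketched as follows. It assumes the snapshot and Telegram calls are available as async callables; take_snapshot, notify_caesar, and the marker text are placeholders, not real Legio or Playwright MCP APIs.

```python
import asyncio
from typing import Awaitable, Callable


async def wait_for_mfa_approval(
    take_snapshot: Callable[[], Awaitable[str]],
    notify_caesar: Callable[[str], Awaitable[None]],
    message: str,
    interval_s: float = 3.0,
    timeout_s: float = 120.0,
    marker: str = "approve this sign-in",
) -> bool:
    """Notify once, then poll until the MFA prompt disappears or time runs out."""
    await notify_caesar(message)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while loop.time() < deadline:
        snapshot = await take_snapshot()
        # The prompt text no longer appearing is treated as "page changed".
        if marker not in snapshot.lower():
            return True
        await asyncio.sleep(interval_s)
    return False
```

A timeout matters here: if Caesar never approves, the centurio should report back instead of polling forever.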
C. SMS / Email OTP
Partially automatable with infrastructure:
- Email OTP: If centurio has access to Caesar's email via IMAP MCP tool, it can read the code. But this creates a risky dependency chain.
- SMS OTP: Requires SMS gateway integration (Twilio, etc.), not worth the complexity for single-user.
Recommended approach: Treat like push notification — ask Caesar to read the code and reply in Telegram. Centurio types it into the form.
Summary Matrix
| 2FA Type | Automatable | Legio Strategy |
|---|---|---|
| TOTP | Yes | Bootstrap script with pyotp, seed from vault |
| Push (Duo, MS Auth) | No | Human-in-the-loop, Telegram notification |
| SMS OTP | Partially | Caesar reads code, replies in Telegram |
| Email OTP | Partially | IMAP tool or Caesar reads, replies in Telegram |
| WebAuthn / FIDO2 | No | Unsupported — use alternative auth method |
| Magic Link | Partially | Caesar clicks link on phone, centurio polls for session |
Problem 3 — Bot Detection and Blocking
Modern websites use multi-layered detection. Here's what centuriones will face and how to handle each layer:
Detection Layers
| Layer | What It Checks | Default Playwright Status |
|---|---|---|
| User-Agent string | "HeadlessChrome" in UA | Exposed by default |
| navigator.webdriver | true for automation | Exposed by default |
| Browser fingerprint | Canvas, WebGL, fonts, TLS | Partially detectable |
| Behavioral analysis | Click timing, scroll patterns, instant navigation | Easily detectable |
| IP reputation | Known datacenter IP ranges | Flagged if running in cloud |
| Cloudflare Turnstile | JavaScript challenge + behavior score | Blocks vanilla Playwright |
| CAPTCHAs | Visual puzzles | Cannot be solved by LLM |
Mitigation Strategy (Layered)
Layer 1 — Stealth plugins (easy, free):
playwright-stealth (Python: pip install playwright-stealth) patches the most common detection vectors:
- Removes navigator.webdriver flag
- Strips "HeadlessChrome" from User-Agent
- Spoofs WebGL renderer, canvas, fonts
- Fixes Chrome DevTools Protocol leaks
Effectiveness: defeats basic checks (simple bot walls, naive Cloudflare rules). Fails against advanced systems (DataDome, PerimeterX, Akamai Bot Manager).
Layer 2 — Behavioral mimicry (medium, free):
- Random delays between actions (200-800ms)
- Mouse movement simulation before clicks
- Scroll gradually instead of jumping
- Don't navigate faster than a human could read
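The first of these rules can also be enforced in the tool layer, so pacing does not depend on the model following instructions. A minimal sketch, assuming browser actions are async callables; pick_delay_s and paced are hypothetical names.

```python
import asyncio
import random
from typing import Any, Awaitable, Callable, Optional


def pick_delay_s(
    min_ms: int = 200, max_ms: int = 800, rng: Optional[random.Random] = None
) -> float:
    """Pick a random delay in seconds within the given millisecond range."""
    source = rng if rng is not None else random
    return source.uniform(min_ms, max_ms) / 1000.0


async def paced(
    action: Callable[..., Awaitable[Any]],
    *args: Any,
    min_ms: int = 200,
    max_ms: int = 800,
    **kwargs: Any,
) -> Any:
    """Sleep for a randomized interval, then await the wrapped browser action."""
    await asyncio.sleep(pick_delay_s(min_ms, max_ms))
    return await action(*args, **kwargs)
```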
These can be injected at the centurio prompt level:
"When browsing, wait 1-3 seconds between page actions. Scroll before clicking elements below the fold."
Layer 3 — Anti-detect browsers (harder, free):
Camoufox — Firefox-based, modifies fingerprinting at the C++ level. Achieved 0% detection scores on major test suites. Python-native. However, maintenance gaps reported in 2026 — evaluate stability before depending on it.
Layer 4 — Residential proxies (paid):
For sites that block datacenter IPs. Services like Bright Data, Oxylabs provide residential IP rotation. Adds cost ($1-15/GB). Only needed for high-security targets.
Layer 5 — CAPTCHA solvers (paid, last resort):
Services like 2Captcha, CapSolver use human workers or specialized AI to solve CAPTCHAs. Adds latency (5-30s) and cost (~$1-3/1000 solves). Ethically questionable for some use cases.
Recommended Posture for Legio
Do not try to defeat every bot detection system. Instead:
- Use stealth plugins by default — handles 80% of sites.
- Maintain a site classification in edicta:
- Green sites: No detection (internal tools, APIs, docs sites). Use vanilla Playwright MCP.
- Yellow sites: Basic detection (login-gated SaaS, social media). Use stealth + storage state + behavioral delays.
- Red sites: Aggressive detection (Cloudflare Enterprise, DataDome). Fall back to alternative: use the site's API instead, or ask Caesar to perform the action manually.
- Never circumvent security controls on sites Caesar doesn't own. This is both ethical and practical — escalating the arms race wastes engineering time.
- Prefer APIs over scraping wherever available. Most services Legio would interact with (GitHub, Jira, Slack, Google Workspace) have REST/GraphQL APIs that are more reliable than browser automation.
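The green/yellow/red classification could be a plain lookup keyed by domain. The sketch below uses placeholder domains and hypothetical names (SiteTier, SITE_TIERS, tier_for). Defaulting unknown sites to yellow is an assumption, not something the posture above specifies.

```python
from enum import Enum
from urllib.parse import urlsplit


class SiteTier(Enum):
    GREEN = "green"
    YELLOW = "yellow"
    RED = "red"


# Placeholder entries; real classifications would live in edicta.
SITE_TIERS = {
    "docs.example.com": SiteTier.GREEN,
    "app.example.com": SiteTier.YELLOW,
    "shop.example.net": SiteTier.RED,
}

STRATEGY = {
    SiteTier.GREEN: "vanilla Playwright MCP",
    SiteTier.YELLOW: "stealth + storage state + behavioral delays",
    SiteTier.RED: "use the site's API or ask Caesar",
}


def tier_for(url: str, default: SiteTier = SiteTier.YELLOW) -> SiteTier:
    """Look up the host, then each parent domain, falling back to the default."""
    host = (urlsplit(url).hostname or "").lower()
    labels = host.split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in SITE_TIERS:
            return SITE_TIERS[candidate]
    return default
```

Matching parent domains lets a single entry cover subdomains such as regional login hosts.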
Playwright MCP Specific Limitations
The Playwright MCP server has a known issue with certain 2FA flows: when Entra ID (Azure AD) detects a Playwright-controlled browser, it may request a security key instead of showing the Authenticator QR/push prompt. This is because the site fingerprints the browser as automated and downgrades to a stricter auth method.
Workaround: Use storage state from a manual login session. Avoid triggering 2FA in the automated browser entirely.
Architecture: Security Layers
┌─────────────────────────────────────┐
│ Caesar's Vault │
│ (HashiCorp / SOPS / age-encrypted) │
│ │
│ • site passwords │
│ • TOTP seeds │
│ • OAuth client secrets │
└──────────────┬──────────────────────┘
│
▼
┌─────────────────────────────────────┐
│ Bootstrap Script (one-time) │
│ Runs outside SDK agent loop │
│ │
│ 1. Read creds from vault │
│ 2. Launch headed/headless browser │
│ 3. Login + solve 2FA (pyotp) │
│ 4. Export storageState.json │
│ 5. Encrypt → castra/browser/sessions│
└──────────────┬──────────────────────┘
│
▼
┌──────────────────────────────────────────────────────────┐
│ Centurio Runtime │
│ │
│ Playwright MCP (headless, stealth) │
│ ├─ Load storageState (pre-authenticated) │
│ ├─ browser_navigate (already logged in) │
│ ├─ browser_snapshot (accessibility tree, no secrets) │
│ └─ browser_click / browser_type (form data only) │
│ │
│ 🚫 NO passwords in prompt │
│ 🚫 NO TOTP seeds in tool args │
│ 🚫 NO raw credentials in context window │
│ │
│ If MFA prompt detected: │
│ └─ Notify Caesar via Telegram │
│ └─ Wait for human approval │
│ └─ Re-export storageState after success │
└──────────────────────────────────────────────────────────┘

Legio-Specific Security Rules
Credential isolation. Centuriones never receive passwords, tokens, or TOTP seeds. Infrastructure handles auth; agents handle tasks.
Storage state encryption. Session files in castra/browser/sessions/ are encrypted at rest (age or SOPS). Decrypted only at Playwright launch, held in memory, never logged.
URL governance. Edictum per centurio defines allowed domains. A centurio attempting to navigate outside its allowed list triggers a warning to Caesar, not a silent block (visibility over obstruction).
No CAPTCHA circumvention. If a site presents a CAPTCHA, the centurio stops and reports. Caesar decides whether to solve it manually or abandon the task.
Session expiry monitoring. Centurio detects auth failure (401, redirect to login) and notifies Caesar immediately rather than retrying with stale credentials.
Audit trail. All browser tool calls are logged in praetorium as nuntii: browser_navigate(url) becomes a traceable record of which sites centuriones visited and when.
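The URL governance rule can be sketched as a check that records a warning and leaves the decision to the caller. UrlGovernor is a hypothetical helper. In a real deployment the warning would be sent to Caesar and logged as a nuntius; here it is only collected in a list.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List
from urllib.parse import urlsplit


@dataclass
class UrlGovernor:
    """Checks navigation targets against a centurio's allowed domains."""

    allowed_domains: FrozenSet[str]
    warnings: List[str] = field(default_factory=list)

    def is_allowed(self, url: str) -> bool:
        host = (urlsplit(url).hostname or "").lower()
        # Exact match or true subdomain; "evilgithub.com" must not match "github.com".
        return any(host == d or host.endswith("." + d) for d in self.allowed_domains)

    def check(self, centurio: str, url: str) -> bool:
        """Record a warning for out-of-list navigation; blocking is up to the caller."""
        if self.is_allowed(url):
            return True
        self.warnings.append(f"{centurio} navigated outside allowlist: {url}")
        return False
```

Returning a flag instead of raising keeps the "visibility over obstruction" choice in one place: the caller can warn and continue, or stop.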