
Headless Browser Solutions for Legio

Date: 2026-02-15 Status: Research Author: Antony

Legio deploys to remote Linux hosts or containers where no graphical browser is available. Centuriones need web interaction capabilities (research, form filling, data extraction, monitoring) without a display server.


The Problem

Centuriones run as Claude Agent SDK clients inside a Python process on a headless Linux host (bare metal, VM, or container). There is no X11/Wayland display. Any browser automation must:

  1. Run fully headless (no GPU, no display)
  2. Integrate with the Claude Agent SDK via MCP (stdio or in-process)
  3. Work inside Docker/Podman containers
  4. Keep resource footprint reasonable (single-user system)
  5. Be Python-native or callable from Python

Solution Landscape (2026)

Tier 1 — MCP-Native Browser Servers

These expose browser capabilities as MCP tools. The SDK client calls them like any other tool, with no custom integration code needed.

1. Playwright MCP (Microsoft Official)

  • Repo: https://github.com/microsoft/playwright-mcp
  • Transport: stdio (spawn via npx @playwright/mcp@latest --headless)
  • Approach: Accessibility-tree snapshots, not screenshots. The LLM receives structured text representing interactive elements, not pixels.
  • Headless: enabled with the --headless flag; the Docker image supports headless Chromium only.
  • Tools exposed: browser_navigate, browser_snapshot, browser_click, browser_type, browser_select_option, browser_take_screenshot, etc.
  • Language: Node.js server, but the stdio transport makes it language-agnostic. The Python MCP SDK connects via StdioServerParameters; the Claude Agent SDK accepts an equivalent stdio entry in mcp_servers.
  • Container: Works in Docker. Playwright auto-installs browser binaries on first run. Official Docker images available.
  • Cost: Free, open-source (Apache 2.0).
  • New (2026): Microsoft is adding a token-efficient CLI mode optimized for coding agents, which reduces token consumption for common browser tasks.

Legio integration:

```python
# In centurio session creation (session.py)
from claude_agent_sdk import ClaudeAgentOptions

options = ClaudeAgentOptions(
    mcp_servers={
        # Stdio MCP server: the SDK spawns this command as a subprocess
        "browser": {
            "type": "stdio",
            "command": "npx",
            "args": ["@playwright/mcp@latest", "--headless"],
        },
    },
    # ... existing config
)
```

Verdict: Best fit for Legio. MCP-native, accessibility-tree approach is token-efficient, headless by design, free, and the SDK already supports stdio MCP servers.

2. Browser MCP (ByteDance)

  • Repo: https://github.com/bytedance/browser-mcp
  • Transport: stdio
  • Approach: Puppeteer-based, structured accessibility data + optional vision mode for complex visual pages.
  • Headless: Yes (Puppeteer's default).
  • Container: Works in Docker with headless Chrome.
  • Cost: Free, open-source.

Verdict: Viable alternative if Playwright MCP has issues. Less mature than Microsoft's official server.

3. Browserbase Skills (Claude Agent SDK + Browse)

  • Repo: https://github.com/browserbase/agent-browse
  • Approach: Claude Agent SDK with a web browsing tool, built on Browserbase's managed browser infrastructure.
  • Headless: Cloud-hosted browsers (no local browser needed).
  • Cost: Browserbase pricing (managed service, not self-hosted).

Verdict: Overkill for Legio's single-user model. Adds cloud dependency and recurring cost. Better suited for multi-tenant production systems.


Tier 2 — Python-Native Agent Frameworks

These are full agent-browser frameworks with their own orchestration. They overlap with what Legatus already does but provide browser-specific agentic capabilities.

4. browser-use

  • Repo: https://github.com/browser-use/browser-use
  • Install: pip install browser-use
  • Approach: Python-native. Gives AI agents browser control via Playwright under the hood. Combines accessibility tree + visual understanding + HTML extraction.
  • LLM support: Anthropic, OpenAI, Google, Ollama, DeepSeek.
  • Headless: Yes, Playwright headless mode.
  • Container: uvx browser-use install installs Chromium.
  • Cost: Free, open-source (MIT).
  • Popularity: 55k+ GitHub stars, active development.

Legio integration options:

Option A — Wrap as in-process MCP tool:

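```python
# Create a custom in-process MCP tool that delegates to browser-use
from browser_use import Agent
from claude_agent_sdk import create_sdk_mcp_server, tool

@tool("browse_web", "Execute a browser task using AI-driven navigation.", {"task": str})
async def browse_web(args: dict) -> dict:
    # claude_client: a browser-use-compatible chat model, configured elsewhere
    agent = Agent(task=args["task"], llm=claude_client)
    history = await agent.run()
    # final_result() returns the agent's final answer, or None if there is none
    return {"content": [{"type": "text", "text": history.final_result() or ""}]}

browser_server = create_sdk_mcp_server(name="browser_use", tools=[browse_web])
```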

Option B — Expose as stdio MCP server (write thin wrapper).

Verdict: Most Pythonic option. Eliminates the Node.js dependency. Risk: overlapping agent loops, since browser-use runs its own agentic loop inside the centurio's SDK loop. Best used as a tool, not as an agent.

5. Stagehand (Browserbase)

  • Repo: https://github.com/browserbase/stagehand
  • Approach: TypeScript SDK. v3 drops the Playwright dependency and talks to Chrome directly via CDP (44% faster than v2).
  • Headless: Yes, CDP-native.
  • Language: TypeScript only (no Python bindings).
  • Cost: Free open-source, optional Browserbase cloud.

Verdict: Not suitable — TypeScript-only, no Python bindings. Would require a Node.js subprocess bridge.


Tier 3 — Managed Browser Infrastructure

Remote browser services: clients send API calls and the service runs the browsers, either in a vendor cloud or in a self-hosted container. No browser installation is needed in the agent process itself.

6. Browserless (Self-Hosted Docker)

  • Repo: https://github.com/browserless/browserless
  • Docker: docker run -p 3000:3000 browserless/chrome
  • API: REST + WebSocket. Supports Puppeteer, Playwright, Selenium connections. Clients connect to ws://localhost:3000.
  • Headless: Always headless in Docker.
  • Cost: Free for non-commercial use; a commercial license is required for production/CI. Cloud plans start with a free tier (1k units).
  • Features: Screenshot API, PDF generation, content extraction, function execution, anti-detection.

Legio integration:

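```python
# Playwright connects to a remote browserless instance over CDP
import asyncio
from playwright.async_api import async_playwright

async def main() -> None:
    async with async_playwright() as p:
        # "browserless" is the sidecar's hostname (e.g. its docker-compose service)
        browser = await p.chromium.connect_over_cdp("ws://browserless:3000")
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.title())
        await browser.close()

asyncio.run(main())
```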

Verdict: Good for container deployments where you want browser isolation. Run browserless as a sidecar container. Adds operational complexity but provides clean separation.

7. Browserbase (Cloud)

  • URL: https://www.browserbase.com/
  • Approach: Cloud-hosted browser sessions. API-driven, no local browser. Designed for AI agent workloads.
  • Scale: 1M+ concurrent sessions.
  • Cost: Paid service (per-session pricing).

Verdict: Unnecessary for single-user Legio. Cloud dependency contradicts self-contained deployment goal.


Comparison Matrix

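| Solution | MCP Native | Python | Headless | Container | Cost | Complexity |
|---|---|---|---|---|---|---|
| Playwright MCP | Yes (stdio) | Via stdio | Yes | Yes | Free | Low |
| Browser MCP (ByteDance) | Yes (stdio) | Via stdio | Yes | Yes | Free | Low |
| browser-use | No (wrap as tool) | Native | Yes | Yes | Free | Medium |
| Stagehand | No | No | Yes | Yes | Free | High |
| Browserless Docker | No (wrap as tool) | Via CDP | Yes | Sidecar | License | Medium |
| Browserbase Cloud | No | Via API | Cloud | N/A | Paid | Low |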

Recommendation for Legio

Primary: Playwright MCP (Microsoft)

Why:

  • MCP-native: zero custom integration code. Add it to the mcp_servers dict in the session config and centuriones immediately gain browser tools.
  • Accessibility-tree approach is token-efficient (structured text, not screenshots). Fits within centurio context window budgets.
  • Headless by default in the Docker image; outside containers, pass the --headless flag.
  • Microsoft-backed and actively developed, with the token-efficient CLI mode arriving in 2026.
  • Already proven with Claude Agent SDK (stdio transport).

Integration path:

  1. Add @playwright/mcp to the container image (npm install -g @playwright/mcp)
  2. Install Chromium in the container (npx playwright install chromium)
  3. Configure it as an MCP server in centurio session creation
  4. Gate it behind an edictum: only centuriones with the browser edictum get the tool

Container Dockerfile addition:

```dockerfile
RUN npm install -g @playwright/mcp@latest \
    && npx playwright install --with-deps chromium
```
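
Step 4 of the integration path (gating behind an edictum) could be wired up roughly as follows. This is a minimal sketch assuming a hypothetical per-centurio tools.json with an `edicta` list; the schema is illustrative, not Legio's actual format. It returns the `mcp_servers` mapping for `ClaudeAgentOptions` and adds the Playwright MCP stdio entry only when the `browser` edictum is present.

```python
import json
from pathlib import Path
from typing import Any

# Stdio config for Playwright MCP, in the dict shape the Claude Agent SDK
# accepts for mcp_servers entries.
PLAYWRIGHT_MCP: dict[str, Any] = {
    "type": "stdio",
    "command": "npx",
    "args": ["@playwright/mcp@latest", "--headless"],
}


def load_edicta(tools_json: Path) -> set[str]:
    """Read the (hypothetical) edicta list from a centurio's tools.json."""
    data = json.loads(tools_json.read_text())
    return set(data.get("edicta", []))


def build_mcp_servers(edicta: set[str], base: dict[str, Any]) -> dict[str, Any]:
    """Return the mcp_servers mapping, adding the browser only when granted."""
    servers = dict(base)  # copy so the shared base config is never mutated
    if "browser" in edicta:
        servers["browser"] = PLAYWRIGHT_MCP
    return servers
```

Leaving the server out entirely, rather than filtering its tools later, means an ungated centurio never sees the browser tool schemas in its context at all.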

Secondary: browser-use (Python fallback)

Why:

  • Eliminates Node.js dependency entirely (pure Python + Playwright).
  • Richer agentic capabilities (visual understanding, multi-step navigation).
  • Useful when accessibility-tree approach fails on complex SPAs.

When to use:

  • If Playwright MCP's text-only approach can't handle a specific site
  • If Node.js dependency becomes a deployment burden
  • For tasks requiring visual page understanding (charts, images, layouts)

Integration path:

  1. pip install browser-use
  2. Wrap as in-process MCP tool (not a separate agent loop)
  3. Expose it to centuriones as a browse_web(task: str) -> str tool

Sidecar option: Browserless Docker

When to use:

  • Multi-container orchestration (docker-compose / k8s)
  • Need browser process isolation from agent process
  • Want connection pooling for multiple concurrent centuriones

Integration:

```yaml
# docker-compose.yml
services:
  legio:
    build: .
    depends_on: [browserless]
  browserless:
    image: browserless/chrome
    ports: ["3000:3000"]
```

Architecture Decision

Caesar (Telegram)
  │
  ▼
Legatus

  ├─ dispatch_to_centurio("vorenus", "research competitor pricing")
  │    │
  │    ▼
  │  Centurio "vorenus" (ClaudeSDKClient)
  │    ├─ MCP: memoria_vorenus (in-process, existing)
  │    ├─ MCP: browser (stdio, Playwright MCP, NEW)
  │    │    ├─ browser_navigate("https://example.com")
  │    │    ├─ browser_snapshot()     ← accessibility tree
  │    │    ├─ browser_click(ref="link-pricing")
  │    │    └─ browser_snapshot()     ← updated tree
  │    └─ Returns structured result to Legatus
  │
  ▼
Caesar sees: "⚔️ Vorenus — Code Specialist\n━━━━━━━\nCompetitor X charges..."

Open Questions

  1. Selective enablement: Should all centuriones get browser tools, or only those with a specific edictum/armarium? Recommendation: opt in per centurio via tools.json.

  2. Resource budget: Headless Chromium uses ~100-300 MB of RAM per instance. With max_centuriones=10, the worst case is 10 × 300 MB = 3 GB for browsers alone. Consider a shared browser instance via the Browserless sidecar.

  3. Token cost: Each browser_snapshot returns the full accessibility tree (~2-5k tokens for complex pages), and multiple snapshots per task add up. Consider a token budget per browser session.

  4. Security: Centuriones browsing arbitrary URLs could hit malicious content. Consider a URL allowlist in the edictum, or a sandboxed network policy for the browser in the container.

  5. Node.js dependency: Playwright MCP requires Node.js in the container. If pure-Python is a hard requirement, browser-use becomes the primary choice instead.
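
For the token-cost question, a per-session budget could be tracked outside the model. A minimal sketch: it estimates tokens with a rough 4-characters-per-token heuristic (an assumption; a real tokenizer count would be more accurate), uses a hypothetical default of 20k tokens per browser session, and leaves the integration point (for example a post-tool hook or wrapper that sees each snapshot) as an assumption.

```python
from dataclasses import dataclass


@dataclass
class SnapshotBudget:
    """Track estimated tokens spent on browser_snapshot results in one session."""

    limit_tokens: int = 20_000  # hypothetical default budget per session
    used_tokens: int = 0

    @staticmethod
    def estimate_tokens(text: str) -> int:
        # Rough heuristic: about 4 characters per token for English-heavy text
        return max(1, len(text) // 4)

    def charge(self, snapshot_text: str) -> bool:
        """Record a snapshot; return False once the budget is exceeded."""
        self.used_tokens += self.estimate_tokens(snapshot_text)
        return self.used_tokens <= self.limit_tokens

    @property
    def remaining(self) -> int:
        return max(0, self.limit_tokens - self.used_tokens)
```

At ~2-5k tokens per complex snapshot, a 20k budget allows roughly 4-10 snapshots before the centurio should summarize what it has and stop.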


Security: Authentication, 2FA, and Bot Detection

Three hard problems arise when centuriones browse the real web: credentials, multi-factor auth, and sites that actively block automation.

Problem 1 — Credential Management

Centuriones must never see raw passwords. If a password appears in the LLM's context window, it becomes part of the conversation that could be logged, cached in token trackers, or leaked in error messages.

Approach: Session State Files (not passwords)

Playwright supports saving and restoring full browser state — cookies, localStorage, IndexedDB — to a JSON file via storageState. The workflow:

  1. Caesar logs in once manually (or via a bootstrap script) on a real browser and exports storageState.json.
  2. The file is stored encrypted in castra/browser/sessions/ with filesystem permissions 0600.
  3. On centurio browser launch, Playwright MCP loads the storage state, skipping the login flow entirely.
  4. Centuriones never see credentials: they receive an already-authenticated browser context.
# Conceptual flow
Caesar (one-time) ──► manual login ──► export storageState.json
                                        │
                                        ▼
                                  castra/browser/sessions/
                                  github.json (encrypted)
                                  google.json (encrypted)
                                        │
Centurio ─► Playwright MCP              │
              ├─ load storageState ◄────┘
              ├─ browser_navigate (already logged in)
              └─ no password ever touches the LLM
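
Step 1 of the workflow (exporting storageState.json after a manual login) could look like this sketch, which uses Playwright's Python API (`BrowserContext.storage_state()` returns the state as a dict). The login URL and output path are illustrative, and the file is written as plain JSON with 0600 permissions; encryption with age or SOPS would be layered on top and is not shown. Recent Playwright MCP versions document a storage-state option for loading such a file into an isolated session; confirm the exact flag with `npx @playwright/mcp@latest --help`.

```python
import json
import os
from pathlib import Path
from typing import Any


def save_storage_state(state: dict[str, Any], path: Path) -> None:
    """Write Playwright storage state to disk, readable by the owner only."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the file with mode 0600 so it is never briefly world-readable
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
    os.chmod(path, 0o600)  # tighten perms if the file already existed


async def export_session(login_url: str, out_path: Path) -> None:
    """One-time bootstrap: Caesar logs in by hand, then the session is exported."""
    # Imported lazily so save_storage_state works without Playwright installed
    from playwright.async_api import async_playwright

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)  # needs a display
        context = await browser.new_context()
        page = await context.new_page()
        await page.goto(login_url)
        # Blocking prompt is acceptable in a one-off interactive script
        input("Finish logging in (including 2FA), then press Enter... ")
        save_storage_state(await context.storage_state(), out_path)
        await browser.close()
```

Run it once on a machine with a display, e.g. `asyncio.run(export_session("https://github.com/login", Path("castra/browser/sessions/github.json")))`.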

Session refresh: Cookies expire. Two strategies:

| Strategy | How | When |
|---|---|---|
| Proactive refresh | Background cron re-exports storage state before expiry | Long-lived sessions (GitHub, internal tools) |
| On-demand re-auth | Centurio detects 401/redirect-to-login, asks Caesar via Telegram to re-authenticate | Short-lived sessions, strict security sites |
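
The on-demand strategy depends on recognizing an expired session. A sketch for code paths where the status code and final URL are available (for example a Python wrapper or the refresh script; Playwright MCP itself exposes snapshots, not status codes): treat a 401, or a redirect that lands on a known login path, as expired. The login-path list is a hypothetical per-site setting.

```python
from urllib.parse import urlparse

# Hypothetical defaults; real values would be configured per site
DEFAULT_LOGIN_PATHS = ("/login", "/signin", "/sign_in", "/session")


def session_expired(status: int, requested_url: str, final_url: str,
                    login_paths: tuple[str, ...] = DEFAULT_LOGIN_PATHS) -> bool:
    """Return True if a response suggests the stored session is no longer valid."""
    if status == 401:
        return True
    requested = urlparse(requested_url)
    final = urlparse(final_url)
    # Ending on a login page we did not ask for means the cookies went stale
    return (final.path.rstrip("/").lower() in login_paths
            and requested.path != final.path)
```

On True, the centurio stops and notifies Caesar instead of retrying with stale credentials.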

Secrets that must exist (for automated bootstrap, optional):

| Secret | Storage | Access |
|---|---|---|
| Site passwords | Vault (HashiCorp, SOPS, age-encrypted file) | Bootstrap script only, never LLM |
| TOTP seeds | Same vault | Bootstrap script only |
| OAuth client secrets | .env | Refresh token flow only |
| Storage state files | castra/browser/sessions/ (0600) | Playwright MCP at launch |

Rule: no credential ever enters a centurio's prompt, system message, or tool call arguments. Credentials flow through infrastructure only.

Problem 2 — Two-Factor Authentication (2FA / MFA)

Three 2FA patterns exist in the wild, each with different headless feasibility:

A. TOTP (Google Authenticator, Authy)

Automatable. The TOTP seed is a shared secret (a base32 string). Given the seed, any program can generate the current 6-digit code:

```python
import pyotp

# seed: the base32 TOTP secret, read from the vault by the bootstrap script
code = pyotp.TOTP(seed).now()  # same code Google Authenticator shows
```

Legio already has pyotp as a dependency (for the Auctoritas gate). A bootstrap script can:

  1. Read the TOTP seed from vault
  2. Generate the current code
  3. Fill it into the 2FA form via Playwright
  4. Export the authenticated storageState.json

Security constraint: The TOTP seed must never be passed to the LLM. The bootstrap script runs outside the SDK agent loop.

B. Push Notification (Microsoft Authenticator, Duo)

Not automatable. These require Caesar to tap "Approve" on a phone.

Legio workaround: When a centurio encounters a push-MFA page:

  1. Centurio detects the MFA prompt (accessibility tree shows "approve this sign-in" or similar).
  2. Centurio pauses and sends Caesar a Telegram message: 🔐 Waiting for MFA approval — check your Authenticator app.
  3. Caesar approves on phone.
  4. Centurio detects the page change (polling browser_snapshot every 3s) and continues.
  5. Immediately export storageState.json so this doesn't repeat.

This is the human-in-the-loop pattern — Caesar stays in control.
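
Steps 2 to 4 amount to notify-then-poll. A sketch, with `take_snapshot` and `notify_caesar` as hypothetical async callables (in practice a browser_snapshot call and a Telegram send). The 3-second interval comes from the steps above; the 5-minute timeout is an assumption.

```python
import asyncio
from collections.abc import Awaitable, Callable


async def wait_for_mfa_approval(
    take_snapshot: Callable[[], Awaitable[str]],
    notify_caesar: Callable[[str], Awaitable[None]],
    message: str,
    interval_s: float = 3.0,
    timeout_s: float = 300.0,
) -> bool:
    """Notify Caesar, then poll until the page changes or the timeout expires."""
    before = await take_snapshot()
    await notify_caesar(message)
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while loop.time() < deadline:
        await asyncio.sleep(interval_s)
        if await take_snapshot() != before:
            return True  # page changed: Caesar acted on the prompt
    return False  # timed out: report back instead of retrying
```

Comparing whole snapshots is crude, since a page with a live timer changes on every poll; a sturdier check would look for the MFA prompt text disappearing from the tree. On True, step 5 (exporting storageState.json) follows.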

C. SMS / Email OTP

Partially automatable with infrastructure:

  • Email OTP: If the centurio has access to Caesar's email via an IMAP MCP tool, it can read the code. But this creates a risky dependency chain.
  • SMS OTP: Requires an SMS gateway integration (Twilio, etc.), which is not worth the complexity for a single-user system.

Recommended approach: Treat like push notification — ask Caesar to read the code and reply in Telegram. Centurio types it into the form.

Summary Matrix

| 2FA Type | Automatable | Legio Strategy |
|---|---|---|
| TOTP | Yes | Bootstrap script with pyotp, seed from vault |
| Push (Duo, MS Auth) | No | Human-in-the-loop, Telegram notification |
| SMS OTP | Partially | Caesar reads code, replies in Telegram |
| Email OTP | Partially | IMAP tool or Caesar reads, replies in Telegram |
| WebAuthn / FIDO2 | No | Unsupported; use an alternative auth method |
| Magic Link | Partially | Caesar clicks link on phone, centurio polls for session |

Problem 3 — Bot Detection and Blocking

Modern websites use multi-layered detection. Here's what centuriones will face and how to handle each layer:

Detection Layers

| Layer | What It Checks | Default Playwright Status |
|---|---|---|
| User-Agent string | "HeadlessChrome" in UA | Exposed by default |
| navigator.webdriver | true under automation | Exposed by default |
| Browser fingerprint | Canvas, WebGL, fonts, TLS | Partially detectable |
| Behavioral analysis | Click timing, scroll patterns, instant navigation | Easily detectable |
| IP reputation | Known datacenter IP ranges | Flagged if running in cloud |
| Cloudflare Turnstile | JavaScript challenge + behavior score | Blocks vanilla Playwright |
| CAPTCHAs | Visual puzzles | Cannot be solved by the LLM |

Mitigation Strategy (Layered)

Layer 1 — Stealth plugins (easy, free):

playwright-stealth (Python: pip install playwright-stealth) patches the most common detection vectors:

  • Removes navigator.webdriver flag
  • Strips "HeadlessChrome" from User-Agent
  • Spoofs WebGL renderer, canvas, fonts
  • Fixes Chrome DevTools Protocol leaks

Effectiveness: defeats basic checks (simple bot walls, naive Cloudflare rules). Fails against advanced systems (DataDome, PerimeterX, Akamai Bot Manager).

Layer 2 — Behavioral mimicry (medium, free):

  • Random delays between actions (200-800ms)
  • Mouse movement simulation before clicks
  • Scroll gradually instead of jumping
  • Don't navigate faster than a human could read

These can be injected at the centurio prompt level:

"When browsing, wait 1-3 seconds between page actions. Scroll before clicking elements below the fold."

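The prompt-level instruction relies on the model complying. Where Legio drives Playwright from its own Python code (for example the browser-use wrapper or the bootstrap script), delays can be enforced in code instead. A minimal sketch using the 200-800 ms range from the list; `page` is assumed to be a Playwright `Page` (its `hover` and `click` methods take a selector), and the RNG is injectable so runs are reproducible.

```python
import asyncio
import random


async def human_pause(rng: random.Random, low_ms: int = 200, high_ms: int = 800) -> float:
    """Sleep for a random, human-like interval and return it in seconds."""
    delay = rng.uniform(low_ms, high_ms) / 1000
    await asyncio.sleep(delay)
    return delay


async def human_click(page, selector: str, rng: random.Random) -> None:
    """Pause, move the mouse over the target, pause briefly, then click."""
    await human_pause(rng)
    await page.hover(selector)  # produces mouse movement before the click
    await human_pause(rng, 50, 250)
    await page.click(selector)
```

Randomized pauses only address the timing signals in the behavioral layer; fingerprint and IP checks need the other layers.
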
Layer 3 — Anti-detect browsers (harder, free):

Camoufox: a Firefox-based browser that modifies fingerprinting at the C++ level and is reported to reach 0% detection scores on major test suites. It has a Python API. However, maintenance gaps were reported in 2026, so evaluate its stability before depending on it.

Layer 4 — Residential proxies (paid):

For sites that block datacenter IPs, services such as Bright Data and Oxylabs provide residential IP rotation, at added cost ($1-15/GB). Only needed for high-security targets.

Layer 5 — CAPTCHA solvers (paid, last resort):

Services such as 2Captcha and CapSolver use human workers or specialized AI to solve CAPTCHAs. They add latency (5-30s) and cost (~$1-3 per 1,000 solves), and are ethically questionable for some use cases.

Do not try to defeat every bot detection system. Instead:

  1. Use stealth plugins by default — handles 80% of sites.
  2. Maintain a site classification in edicta:
    • Green sites: No detection (internal tools, APIs, docs sites). Use vanilla Playwright MCP.
    • Yellow sites: Basic detection (login-gated SaaS, social media). Use stealth + storage state + behavioral delays.
    • Red sites: Aggressive detection (Cloudflare Enterprise, DataDome). Fall back to alternative: use the site's API instead, or ask Caesar to perform the action manually.
  3. Never circumvent security controls on sites Caesar doesn't own. This is both ethical and practical — escalating the arms race wastes engineering time.
  4. Prefer APIs over scraping wherever available. Most services Legio would interact with (GitHub, Jira, Slack, Google Workspace) have REST/GraphQL APIs that are more reliable than browser automation.
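
The green/yellow/red classification could be expressed as a lookup from domain to browsing profile. A sketch: the domains, the profile fields, and the choice to treat unknown sites as yellow are assumptions, not Legio's edicta schema.

```python
from dataclasses import dataclass
from enum import Enum
from urllib.parse import urlparse


class Tier(Enum):
    GREEN = "green"    # no detection: vanilla Playwright MCP
    YELLOW = "yellow"  # basic detection: stealth + storage state + delays
    RED = "red"        # aggressive detection: use the API or ask Caesar


@dataclass(frozen=True)
class BrowsingProfile:
    automate: bool
    stealth: bool
    storage_state: bool
    human_delays: bool


PROFILES = {
    Tier.GREEN: BrowsingProfile(True, False, False, False),
    Tier.YELLOW: BrowsingProfile(True, True, True, True),
    Tier.RED: BrowsingProfile(False, False, False, False),
}

# Hypothetical classification; in Legio this would live in edicta
SITE_TIERS = {
    "wiki.internal.example": Tier.GREEN,
    "github.com": Tier.YELLOW,
    "protected.example.com": Tier.RED,
}


def classify(url: str, default: Tier = Tier.YELLOW) -> Tier:
    """Return the tier of the URL's host, falling back to parent domains."""
    parts = (urlparse(url).hostname or "").lower().split(".")
    # Try "a.b.example.com", then "b.example.com", then "example.com"
    for i in range(len(parts) - 1):
        tier = SITE_TIERS.get(".".join(parts[i:]))
        if tier is not None:
            return tier
    return default
```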

Playwright MCP Specific Limitations

The Playwright MCP server has a known issue with certain 2FA flows: when Entra ID (Azure AD) detects a Playwright-controlled browser, it may request a security key instead of showing the Authenticator QR/push prompt. This is because the site fingerprints the browser as automated and downgrades to a stricter auth method.

Workaround: Use storage state from a manual login session. Avoid triggering 2FA in the automated browser entirely.

Architecture: Security Layers

                    ┌─────────────────────────────────────┐
                    │          Caesar's Vault              │
                    │  (HashiCorp / SOPS / age-encrypted)  │
                    │                                     │
                    │  • site passwords                    │
                    │  • TOTP seeds                        │
                    │  • OAuth client secrets               │
                    └──────────────┬──────────────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────────────┐
                    │     Bootstrap Script (one-time)      │
                    │  Runs outside SDK agent loop         │
                    │                                     │
                    │  1. Read creds from vault            │
                    │  2. Launch headed/headless browser   │
                    │  3. Login + solve 2FA (pyotp)        │
                    │  4. Export storageState.json          │
                    │  5. Encrypt → castra/browser/sessions│
                    └──────────────┬──────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────┐
│                  Centurio Runtime                         │
│                                                          │
│  Playwright MCP (headless, stealth)                      │
│    ├─ Load storageState (pre-authenticated)               │
│    ├─ browser_navigate (already logged in)                │
│    ├─ browser_snapshot (accessibility tree, no secrets)    │
│    └─ browser_click / browser_type (form data only)       │
│                                                          │
│  🚫 NO passwords in prompt                               │
│  🚫 NO TOTP seeds in tool args                           │
│  🚫 NO raw credentials in context window                 │
│                                                          │
│  If MFA prompt detected:                                 │
│    └─ Notify Caesar via Telegram                         │
│    └─ Wait for human approval                            │
│    └─ Re-export storageState after success                │
└──────────────────────────────────────────────────────────┘

Legio-Specific Security Rules

  1. Credential isolation. Centuriones never receive passwords, tokens, or TOTP seeds. Infrastructure handles auth; agents handle tasks.

  2. Storage state encryption. Session files in castra/browser/sessions/ are encrypted at rest (age or SOPS). Decrypted only at Playwright launch, held in memory, never logged.

  3. URL governance. Edictum per centurio defines allowed domains. Centurio attempting to navigate outside its allowed list triggers a warning to Caesar, not a silent block (visibility over obstruction).

  4. No CAPTCHA circumvention. If a site presents a CAPTCHA, the centurio stops and reports. Caesar decides whether to solve it manually or abandon the task.

  5. Session expiry monitoring. Centurio detects auth failure (401, redirect to login) and notifies Caesar immediately rather than retrying with stale credentials.

  6. Audit trail. All browser tool calls logged in praetorium as nuntii — browser_navigate(url) becomes a traceable record of what sites centuriones visited and when.
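
Rules 3 and 6 could share one navigation hook: check the target host against the centurio's allowed domains, always append an audit record, and warn Caesar (rather than block) on a miss. A sketch; `warn_caesar` and the record fields are hypothetical, the hook point (e.g. intercepting browser_navigate before it runs) is an assumption, and persisting records to praetorium as nuntii is left out.

```python
import time
from collections.abc import Callable
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class NavigationRecord:
    centurio: str
    url: str
    allowed: bool
    timestamp: float = field(default_factory=time.time)


def host_allowed(url: str, allowed_domains: set[str]) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)


def on_navigate(centurio: str, url: str, allowed_domains: set[str],
                audit_log: list[NavigationRecord],
                warn_caesar: Callable[[str], None]) -> NavigationRecord:
    """Audit every navigation; warn on out-of-list hosts but do not block."""
    record = NavigationRecord(centurio, url, host_allowed(url, allowed_domains))
    audit_log.append(record)
    if not record.allowed:
        warn_caesar(f"{centurio} navigated outside its allowlist: {url}")
    return record
```

The `"." + d` suffix check keeps a host like `evilgithub.com` from matching an allowlist entry of `github.com`.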

