Headless Browser Solutions for Legio

Date: 2026-02-15 Status: Research Author: Antony

Legio deploys to remote Linux hosts or containers where no visual browser exists. Centuriones need web interaction capabilities (research, form filling, data extraction, monitoring) without a display server.

The Problem

Centuriones run as Claude Agent SDK clients inside a Python process on a headless Linux host (bare metal, VM, or container). There is no X11/Wayland display. Any browser automation must:

Run fully headless (no GPU, no display)
Integrate with the Claude Agent SDK via MCP (stdio or in-process)
Work inside Docker/Podman containers
Keep resource footprint reasonable (single-user system)
Be Python-native or callable from Python

Solution Landscape (2026)

Tier 1 — MCP-Native Browser Servers

These expose browser capabilities as MCP tools. The SDK client calls them like any other tool, so no custom integration code is needed.

1. Playwright MCP (Microsoft Official)

Repo: https://github.com/microsoft/playwright-mcp
Transport: stdio (spawn via npx @playwright/mcp@latest --headless)
Approach: Accessibility-tree snapshots, not screenshots. The LLM receives structured text representing interactive elements, not pixels.
Headless: --headless flag. The Docker image supports headless Chromium only.
Tools exposed: browser_navigate, browser_snapshot, browser_click, browser_type, browser_select_option, browser_take_screenshot, etc.
Language: Node.js server, but the stdio transport makes it language-agnostic. The MCP Python SDK connects via StdioServerParameters; the Claude Agent SDK takes a stdio server entry in mcp_servers.
Container: Works in Docker. Browser binaries must be installed in the image (npx playwright install chromium). Official Docker images available.
Cost: Free, open-source (Apache 2.0).
New (2026): Microsoft is adding a token-efficient CLI mode optimized for coding agents, which reduces token consumption for common browser tasks.

Legio integration:

python

# In centurio session creation (session.py)
from claude_agent_sdk import ClaudeAgentOptions

# stdio MCP servers are configured as plain dicts keyed by server name
browser_mcp = {
    "type": "stdio",
    "command": "npx",
    "args": ["@playwright/mcp@latest", "--headless"],
}
options = ClaudeAgentOptions(
    mcp_servers={"browser": browser_mcp},
    # ... existing config
)

Verdict: Best fit for Legio. MCP-native, accessibility-tree approach is token-efficient, headless by design, free, and the SDK already supports stdio MCP servers.

2. Browser MCP (ByteDance)

Repo: https://github.com/bytedance/browser-mcp
Transport: stdio
Approach: Puppeteer-based, structured accessibility data + optional vision mode for complex visual pages.
Headless: Yes, Puppeteer default.
Container: Works in Docker with headless Chrome.
Cost: Free, open-source.

Verdict: Viable alternative if Playwright MCP has issues. Less mature than Microsoft's official server.

3. Browserbase Skills (Claude Agent SDK + Browse)

Repo: https://github.com/browserbase/agent-browse
Approach: Claude Agent SDK with a web browsing tool, built on Browserbase's managed browser infrastructure.
Headless: Cloud-hosted browsers (no local browser needed).
Cost: Browserbase pricing (managed service, not self-hosted).

Verdict: Overkill for Legio's single-user model. Adds cloud dependency and recurring cost. Better suited for multi-tenant production systems.

Tier 2 — Python-Native Agent Frameworks

These are full agent-browser frameworks with their own orchestration. They overlap with what Legatus already does but provide browser-specific agentic capabilities.

4. browser-use

Repo: https://github.com/browser-use/browser-use
Install: pip install browser-use
Approach: Python-native. Gives AI agents browser control via Playwright under the hood. Combines accessibility tree + visual understanding + HTML extraction.
LLM support: Anthropic, OpenAI, Google, Ollama, DeepSeek.
Headless: Yes, Playwright headless mode.
Container: uvx browser-use install installs Chromium.
Cost: Free, open-source (MIT).
Stars: 55k+ GitHub stars, active development.

Legio integration options:

Option A — Wrap as in-process MCP tool:

python

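# Create a custom in-process MCP tool that delegates to browser-use
from browser_use import Agent
from claude_agent_sdk import tool

@tool("browse_web", "Execute a browser task using AI-driven navigation.", {"task": str})
async def browse_web(args: dict) -> dict:
    # claude_client: the LLM wrapper browser-use expects, configured elsewhere
    agent = Agent(task=args["task"], llm=claude_client)
    history = await agent.run()
    text = history.final_result() or ""  # final_result() is a method on the run history
    return {"content": [{"type": "text", "text": text}]}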

Option B — Expose as stdio MCP server (write thin wrapper).

Verdict: Most Pythonic option. Eliminates Node.js dependency. Risk: overlapping agent loops (browser-use has its own agentic loop vs. centurio's SDK loop). Best used as a tool, not as an agent.

5. Stagehand (Browserbase)

Repo: https://github.com/browserbase/stagehand
Approach: TypeScript SDK. v3 drops Playwright dependency, talks directly to Chrome via CDP. 44% faster than v2.
Headless: Yes, CDP-native.
Language: TypeScript only (no Python bindings).
Cost: Free open-source, optional Browserbase cloud.

Verdict: Not suitable — TypeScript-only, no Python bindings. Would require a Node.js subprocess bridge.

Tier 3 — Managed Browser Infrastructure

Browser services reached over an API instead of launched inside the agent process: either a self-hosted container or a vendor's cloud. No browser is installed alongside the centurio.

6. Browserless (Self-Hosted Docker)

Repo: https://github.com/browserless/browserless
Docker: docker run -p 3000:3000 browserless/chrome
API: REST + WebSocket. Supports Puppeteer, Playwright, Selenium connections. Clients connect to ws://localhost:3000.
Headless: Always headless in Docker.
Cost: Free for non-commercial. Commercial license required for production/CI. Cloud plans from free tier (1k units) up.
Features: Screenshot API, PDF generation, content extraction, function execution, anti-detection.

Legio integration:

python

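# Playwright connects to a remote Browserless instance over CDP
import asyncio
from playwright.async_api import async_playwright

async def main() -> None:
    async with async_playwright() as p:
        browser = await p.chromium.connect_over_cdp("ws://browserless:3000")
        page = await browser.new_page()
        await page.goto("https://example.com")
        await browser.close()

asyncio.run(main())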

Verdict: Good for container deployments where you want browser isolation. Run browserless as a sidecar container. Adds operational complexity but provides clean separation.

7. Browserbase (Cloud)

URL: https://www.browserbase.com/
Approach: Cloud-hosted browser sessions. API-driven, no local browser. Designed for AI agent workloads.
Scale: 1M+ concurrent sessions.
Cost: Paid service (per-session pricing).

Verdict: Unnecessary for single-user Legio. Cloud dependency contradicts self-contained deployment goal.

Comparison Matrix

Solution	MCP Native	Python	Headless	Container	Cost	Complexity
Playwright MCP	Yes (stdio)	Via stdio	Yes	Yes	Free	Low
Browser MCP (ByteDance)	Yes (stdio)	Via stdio	Yes	Yes	Free	Low
browser-use	No (wrap as tool)	Native	Yes	Yes	Free	Medium
Stagehand	No	No	Yes	Yes	Free	High
Browserless Docker	No (wrap as tool)	Via CDP	Yes	Sidecar	License	Medium
Browserbase Cloud	No	Via API	Cloud	N/A	Paid	Low

Recommendation for Legio

Primary: Playwright MCP (Microsoft)

Why:

MCP-native — zero custom integration code. Add to mcp_servers dict in session config and centuriones immediately gain browser tools.
Accessibility-tree approach is token-efficient (structured text, not screenshots). Fits within centurio context window budgets.
Headless by default in containers. --headless flag for dev.
Microsoft-backed, active development, 2026 CLI mode optimization.
Already proven with Claude Agent SDK (stdio transport).

Integration path:

Add @playwright/mcp to container image (npm install -g @playwright/mcp)
Install Chromium in container (npx playwright install chromium)
Configure as MCP server in centurio session creation
Gate behind edictum — only centuriones with browser edictum get the tool
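
The gating step could look roughly like the sketch below. It assumes each centurio carries a set of edictum names; `BROWSER_EDICTUM`, `PLAYWRIGHT_MCP`, and `build_mcp_servers` are hypothetical names for illustration, not Legio's actual API. Only the npx command and arguments come from the configuration above.

```python
from __future__ import annotations

from typing import Any

BROWSER_EDICTUM = "browser"  # hypothetical edictum name that grants browser access

# stdio entry for Playwright MCP, same command and args as in the session config
PLAYWRIGHT_MCP: dict[str, Any] = {
    "type": "stdio",
    "command": "npx",
    "args": ["@playwright/mcp@latest", "--headless"],
}


def build_mcp_servers(
    edicta: set[str], base: dict[str, Any] | None = None
) -> dict[str, Any]:
    """Return the MCP server map for one centurio, adding the browser only on opt-in."""
    servers = dict(base or {})  # copy so the shared base map is never mutated
    if BROWSER_EDICTUM in edicta:
        servers["browser"] = dict(PLAYWRIGHT_MCP)
    return servers
```

A centurio without the browser edictum gets the base map unchanged, so the browser tools never appear in its tool list.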

Container Dockerfile addition:

dockerfile

RUN npm install -g @playwright/mcp@latest \
    && npx playwright install --with-deps chromium

Secondary: browser-use (Python fallback)

Why:

Eliminates Node.js dependency entirely (pure Python + Playwright).
Richer agentic capabilities (visual understanding, multi-step navigation).
Useful when accessibility-tree approach fails on complex SPAs.

When to use:

If Playwright MCP's text-only approach can't handle a specific site
If Node.js dependency becomes a deployment burden
For tasks requiring visual page understanding (charts, images, layouts)

Integration path:

pip install browser-use
Wrap as in-process MCP tool (not a separate agent loop)
Expose as browse_web(task: str) -> str tool to centuriones

Sidecar option: Browserless Docker

When to use:

Multi-container orchestration (docker-compose / k8s)
Need browser process isolation from agent process
Want connection pooling for multiple concurrent centuriones

Integration:

yaml

# docker-compose.yml
services:
  legio:
    build: .
    depends_on: [browserless]
  browserless:
    image: browserless/chrome
    ports: ["3000:3000"]

Architecture Decision

Caesar (Telegram)
  │
  ▼
Legatus
  │
  ├─ dispatch_to_centurio("vorenus", "research competitor pricing")
  │    │
  │    ▼
  │  Centurio "vorenus" (ClaudeSDKClient)
  │    ├─ MCP: memoria_vorenus (in-process, existing)
  │    ├─ MCP: browser (stdio, Playwright MCP, NEW)
  │    │    ├─ browser_navigate("https://example.com")
  │    │    ├─ browser_snapshot()     ← accessibility tree
  │    │    ├─ browser_click(ref="link-pricing")
  │    │    └─ browser_snapshot()     ← updated tree
  │    └─ Returns structured result to Legatus
  │
  ▼
Caesar sees: "⚔️ Vorenus — Code Specialist\n━━━━━━━\nCompetitor X charges..."

Open Questions

Selective enablement: Should all centuriones get browser tools, or only those with a specific edictum/armarium? Recommend: opt-in via tools.json per centurio.
Resource budget: Headless Chromium uses ~100-300MB RAM per instance. With max_centuriones=10, worst case is 3GB for browsers alone. Consider: shared browser instance via Browserless sidecar.
Token cost: Each browser_snapshot returns the full accessibility tree (~2-5k tokens for complex pages). Multiple snapshots per task add up. Consider: token budget per browser session.
Security: Centuriones browsing arbitrary URLs could hit malicious content. Consider: URL allowlist in edictum, or sandboxed browser network policy in container.
Node.js dependency: Playwright MCP requires Node.js in the container. If pure-Python is a hard requirement, browser-use becomes the primary choice instead.
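
The resource and token questions above can be made concrete with a small sketch. It assumes the per-instance RAM figure quoted above and a rough heuristic of about four characters per token; `worst_case_browser_ram_mb` and `SnapshotBudget` are hypothetical helpers, and a real budget would use the SDK's reported token usage rather than a character count.

```python
from dataclasses import dataclass


def worst_case_browser_ram_mb(max_centuriones: int, per_instance_mb: int = 300) -> int:
    """Upper bound when every centurio holds its own Chromium instance."""
    return max_centuriones * per_instance_mb


@dataclass
class SnapshotBudget:
    """Tracks an approximate token spend for one browser session."""

    limit_tokens: int
    used_tokens: int = 0

    def charge(self, snapshot_text: str) -> bool:
        """Record a snapshot; return False if it would exceed the budget."""
        cost = max(1, len(snapshot_text) // 4)  # heuristic: ~4 characters per token
        if self.used_tokens + cost > self.limit_tokens:
            return False
        self.used_tokens += cost
        return True
```

With `max_centuriones=10` the first helper returns 3000 MB, matching the 3GB worst case. A session loop could stop taking snapshots, or ask Caesar for more budget, once `charge` returns False.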

Security: Authentication, 2FA, and Bot Detection

Three hard problems arise when centuriones browse the real web: credentials, multi-factor auth, and sites that actively block automation.

Problem 1 — Credential Management

Centuriones must never see raw passwords. If a password appears in the LLM's context window, it becomes part of the conversation that could be logged, cached in token trackers, or leaked in error messages.

Approach: Session State Files (not passwords)

Playwright supports saving and restoring full browser state — cookies, localStorage, IndexedDB — to a JSON file via storageState. The workflow:

Caesar logs in once manually (or via a bootstrap script) on a real browser and exports storageState.json.
The file is stored encrypted in castra/browser/sessions/ with filesystem permissions 0600.
On centurio browser launch, Playwright MCP loads the storage state, skipping the login flow entirely.
Centuriones never see credentials; they receive an already-authenticated browser context.

# Conceptual flow
Caesar (one-time) ──► manual login ──► export storageState.json
                                         │
                                         ▼
                                  castra/browser/sessions/
                                  github.json (encrypted)
                                  google.json (encrypted)
                                         │
Centurio ─► Playwright MCP              │
              ├─ load storageState ◄────┘
              ├─ browser_navigate (already logged in)
              └─ no password ever touches the LLM
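
The 0600 requirement from the workflow can be checked before a session file is handed to the browser. The sketch below covers only the permission check on a POSIX host; encryption at rest is out of scope here, and `assert_private` is a hypothetical helper name.

```python
import os
import stat
from pathlib import Path


def assert_private(path: Path) -> None:
    """Raise if the session file is accessible to group or others."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & 0o077:  # any group/other permission bit is set
        raise PermissionError(f"{path} has mode {mode:04o}; expected 0600")
```

Calling this at launch turns a misconfigured session directory into a loud failure instead of a silently exposed cookie jar.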

Session refresh: Cookies expire. Two strategies:

Strategy	How	When
Proactive refresh	Background cron re-exports storage state before expiry	Long-lived sessions (GitHub, internal tools)
On-demand re-auth	Centurio detects 401/redirect-to-login, asks Caesar via Telegram to re-authenticate	Short-lived sessions, strict security sites
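
For the proactive strategy, the refresh job needs to know when a saved session is about to expire. The sketch below reads the `cookies` array of a Playwright storage state, where `expires` is a Unix timestamp in seconds and -1 marks a session cookie. It treats every persistent cookie as relevant, which is conservative; a real job would probably filter to the site's auth cookie names. `earliest_expiry` and `needs_refresh` are hypothetical helpers.

```python
from __future__ import annotations

import time


def earliest_expiry(storage_state: dict) -> float | None:
    """Return the soonest cookie expiry (Unix seconds), ignoring session cookies."""
    expiries = [
        cookie["expires"]
        for cookie in storage_state.get("cookies", [])
        if cookie.get("expires", -1) > 0
    ]
    return min(expiries) if expiries else None


def needs_refresh(
    storage_state: dict, margin_s: float = 24 * 3600, now: float | None = None
) -> bool:
    """True when any persistent cookie expires within the safety margin."""
    soonest = earliest_expiry(storage_state)
    if soonest is None:
        return False
    current = time.time() if now is None else now
    return soonest - current <= margin_s
```

A cron job could load each decrypted state file, call `needs_refresh`, and trigger the bootstrap script or a Telegram prompt to Caesar when it returns True.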

Secrets that must exist (for automated bootstrap, optional):

Secret	Storage	Access
Site passwords	Vault (HashiCorp, SOPS, age-encrypted file)	Bootstrap script only, never LLM
TOTP seeds	Same vault	Bootstrap script only
OAuth client secrets	`.env`	Refresh token flow only
Storage state files	`castra/browser/sessions/` (`0600`)	Playwright MCP at launch

Rule: no credential ever enters a centurio's prompt, system message, or tool call arguments. Credentials flow through infrastructure only.

Problem 2 — Two-Factor Authentication (2FA / MFA)

Three 2FA patterns exist in the wild, each with different headless feasibility:

A. TOTP (Google Authenticator, Authy)

Automatable. The TOTP seed is a shared secret (base32 string). Given the seed, any program can generate the current 6-digit code:

python

import pyotp

# seed: base32 TOTP secret, read from the vault by the bootstrap script
code = pyotp.TOTP(seed).now()  # same code Google Authenticator would show
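
To show why the seed alone is sufficient, here is a standard-library sketch of the RFC 6238 computation (HMAC-SHA1, 30-second step). It is for illustration only: the function name `totp` is hypothetical, and the bootstrap script would keep using pyotp.

```python
import base64
import hashlib
import hmac
import struct
import time
from typing import Optional


def totp(seed_b32: str, at: Optional[float] = None, digits: int = 6, step: int = 30) -> str:
    """Compute a time-based one-time password from a base32 seed."""
    padded = seed_b32.upper() + "=" * (-len(seed_b32) % 8)  # restore stripped padding
    key = base64.b32decode(padded)
    counter = int((time.time() if at is None else at) // step)
    digest = hmac.new(key, struct.pack(">Q", counter), hashlib.sha1).digest()
    offset = digest[-1] & 0x0F  # dynamic truncation offset from the last nibble
    value = struct.unpack(">I", digest[offset:offset + 4])[0] & 0x7FFFFFFF
    return str(value % 10**digits).zfill(digits)
```

Because the code depends only on the seed and the clock, anything that can read the seed can pass the second factor. That is the reason the seed stays in the vault and out of the agent loop.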

Legio already has pyotp as a dependency (for the Auctoritas gate). A bootstrap script can:

Read the TOTP seed from vault
Generate the current code
Fill it into the 2FA form via Playwright
Export the authenticated storageState.json

Security constraint: The TOTP seed must never be passed to the LLM. The bootstrap script runs outside the SDK agent loop.

B. Push Notification (Microsoft Authenticator, Duo)

Not automatable. These require Caesar to tap "Approve" on a phone.

Legio workaround: When a centurio encounters a push-MFA page:

Centurio detects the MFA prompt (accessibility tree shows "approve this sign-in" or similar).
Centurio pauses and sends Caesar a Telegram message: 🔐 Waiting for MFA approval — check your Authenticator app.
Caesar approves on phone.
Centurio detects page change (poll browser_snapshot every 3s), continues.
Immediately export storageState.json so this doesn't repeat.

This is the human-in-the-loop pattern — Caesar stays in control.
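
The polling step in this flow can be sketched as a small loop with a timeout. The sketch assumes a `take_snapshot` coroutine that returns the accessibility tree as text (for example a thin wrapper around the browser_snapshot tool); that wrapper, the marker list, and `wait_for_mfa_approval` are hypothetical names, and real MFA wording varies per site.

```python
import asyncio
from typing import Awaitable, Callable

MFA_MARKERS = ("approve this sign-in",)  # illustrative marker text only


async def wait_for_mfa_approval(
    take_snapshot: Callable[[], Awaitable[str]],
    interval_s: float = 3.0,
    timeout_s: float = 120.0,
) -> bool:
    """Poll until the MFA prompt disappears; return False on timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout_s
    while loop.time() < deadline:
        snapshot = (await take_snapshot()).lower()
        if not any(marker in snapshot for marker in MFA_MARKERS):
            return True  # page moved on, so the sign-in was approved
        await asyncio.sleep(interval_s)
    return False
```

On a False result the centurio would report the timeout to Caesar rather than keep polling indefinitely.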

C. SMS / Email OTP

Partially automatable with infrastructure:

Email OTP: If centurio has access to Caesar's email via IMAP MCP tool, it can read the code. But this creates a risky dependency chain.
SMS OTP: Requires SMS gateway integration (Twilio, etc.), not worth the complexity for single-user.

Recommended approach: Treat like push notification — ask Caesar to read the code and reply in Telegram. Centurio types it into the form.

Summary Matrix

2FA Type	Automatable	Legio Strategy
TOTP	Yes	Bootstrap script with pyotp, seed from vault
Push (Duo, MS Auth)	No	Human-in-the-loop, Telegram notification
SMS OTP	Partially	Caesar reads code, replies in Telegram
Email OTP	Partially	IMAP tool or Caesar reads, replies in Telegram
WebAuthn / FIDO2	No	Unsupported — use alternative auth method
Magic Link	Partially	Caesar clicks link on phone, centurio polls for session

Problem 3 — Bot Detection and Blocking

Modern websites use multi-layered detection. Here's what centuriones will face and how to handle each layer:

Detection Layers

Layer	What It Checks	Default Playwright Status
User-Agent string	"HeadlessChrome" in UA	Exposed by default
`navigator.webdriver`	`true` for automation	Exposed by default
Browser fingerprint	Canvas, WebGL, fonts, TLS	Partially detectable
Behavioral analysis	Click timing, scroll patterns, instant navigation	Easily detectable
IP reputation	Known datacenter IP ranges	Flagged if running in cloud
Cloudflare Turnstile	JavaScript challenge + behavior score	Blocks vanilla Playwright
CAPTCHAs	Visual puzzles	Cannot be solved by LLM

Mitigation Strategy (Layered)

Layer 1 — Stealth plugins (easy, free):

playwright-stealth (Python: pip install playwright-stealth) patches the most common detection vectors:

Removes navigator.webdriver flag
Strips "HeadlessChrome" from User-Agent
Spoofs WebGL renderer, canvas, fonts
Fixes Chrome DevTools Protocol leaks

Effectiveness: defeats basic checks (simple bot walls, naive Cloudflare rules). Fails against advanced systems (DataDome, PerimeterX, Akamai Bot Manager).

Layer 2 — Behavioral mimicry (medium, free):

Random delays between actions (200-800ms)
Mouse movement simulation before clicks
Scroll gradually instead of jumping
Don't navigate faster than a human could read

These can be injected at the centurio prompt level:

"When browsing, wait 1-3 seconds between page actions. Scroll before clicking elements below the fold."

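If the timing should not depend on the model following a prompt, the random 200-800 ms delay from the list above could also be enforced in code, for example in a wrapper around each browser tool call. This is a minimal sketch; `human_pause` is a hypothetical helper, and the injectable `sleep` parameter exists only to make it testable.

```python
import asyncio
import random
from typing import Awaitable, Callable


async def human_pause(
    min_ms: int = 200,
    max_ms: int = 800,
    sleep: Callable[[float], Awaitable[None]] = asyncio.sleep,
) -> float:
    """Sleep for a random interval in [min_ms, max_ms]; return the delay in seconds."""
    delay = random.uniform(min_ms, max_ms) / 1000.0
    await sleep(delay)
    return delay
```

Calling `await human_pause()` before each click or navigation gives irregular spacing between actions without spending prompt tokens on timing instructions.
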
Layer 3 — Anti-detect browsers (harder, free):

Camoufox — Firefox-based, modifies fingerprinting at the C++ level. Achieved 0% detection scores on major test suites. Python-native. However, maintenance gaps reported in 2026 — evaluate stability before depending on it.

Layer 4 — Residential proxies (paid):

For sites that block datacenter IPs. Services like Bright Data, Oxylabs provide residential IP rotation. Adds cost ($1-15/GB). Only needed for high-security targets.

Layer 5 — CAPTCHA solvers (paid, last resort):

Services like 2Captcha, CapSolver use human workers or specialized AI to solve CAPTCHAs. Adds latency (5-30s) and cost (~$1-3/1000 solves). Ethically questionable for some use cases.

Playwright MCP Specific Limitations

The Playwright MCP server has a known issue with certain 2FA flows: when Entra ID (Azure AD) detects a Playwright-controlled browser, it may request a security key instead of showing the Authenticator QR/push prompt. This is because the site fingerprints the browser as automated and downgrades to a stricter auth method.

Workaround: Use storage state from a manual login session. Avoid triggering 2FA in the automated browser entirely.

Architecture: Security Layers

                    ┌─────────────────────────────────────┐
                    │          Caesar's Vault              │
                    │  (HashiCorp / SOPS / age-encrypted)  │
                    │                                     │
                    │  • site passwords                    │
                    │  • TOTP seeds                        │
                    │  • OAuth client secrets               │
                    └──────────────┬──────────────────────┘
                                   │
                                   ▼
                    ┌─────────────────────────────────────┐
                    │     Bootstrap Script (one-time)      │
                    │  Runs outside SDK agent loop         │
                    │                                     │
                    │  1. Read creds from vault            │
                    │  2. Launch headed/headless browser   │
                    │  3. Login + solve 2FA (pyotp)        │
                    │  4. Export storageState.json          │
                    │  5. Encrypt → castra/browser/sessions│
                    └──────────────┬──────────────────────┘
                                   │
                                   ▼
┌──────────────────────────────────────────────────────────┐
│                  Centurio Runtime                         │
│                                                          │
│  Playwright MCP (headless, stealth)                      │
│    ├─ Load storageState (pre-authenticated)               │
│    ├─ browser_navigate (already logged in)                │
│    ├─ browser_snapshot (accessibility tree, no secrets)    │
│    └─ browser_click / browser_type (form data only)       │
│                                                          │
│  🚫 NO passwords in prompt                               │
│  🚫 NO TOTP seeds in tool args                           │
│  🚫 NO raw credentials in context window                 │
│                                                          │
│  If MFA prompt detected:                                 │
│    └─ Notify Caesar via Telegram                         │
│    └─ Wait for human approval                            │
│    └─ Re-export storageState after success                │
└──────────────────────────────────────────────────────────┘

Legio-Specific Security Rules

Credential isolation. Centuriones never receive passwords, tokens, or TOTP seeds. Infrastructure handles auth; agents handle tasks.
Storage state encryption. Session files in castra/browser/sessions/ are encrypted at rest (age or SOPS). Decrypted only at Playwright launch, held in memory, never logged.
URL governance. Edictum per centurio defines allowed domains. Centurio attempting to navigate outside its allowed list triggers a warning to Caesar, not a silent block (visibility over obstruction).
No CAPTCHA circumvention. If a site presents a CAPTCHA, the centurio stops and reports. Caesar decides whether to solve it manually or abandon the task.
Session expiry monitoring. Centurio detects auth failure (401, redirect to login) and notifies Caesar immediately rather than retrying with stale credentials.
Audit trail. All browser tool calls logged in praetorium as nuntii — browser_navigate(url) becomes a traceable record of what sites centuriones visited and when.
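
The URL governance rule can be sketched as a host check against the centurio's allowed domains. This assumes the edictum exposes those domains as a set of bare hostnames; `is_allowed` is a hypothetical helper, and per the rule above a False result would trigger a warning to Caesar rather than a silent block.

```python
from __future__ import annotations

from urllib.parse import urlsplit


def is_allowed(url: str, allowed_domains: set[str]) -> bool:
    """True if the URL's host is an allowed domain or a subdomain of one."""
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        return False  # rejects file:, data:, javascript: and similar schemes
    host = (parts.hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in allowed_domains)
```

Matching on the parsed hostname with a leading dot avoids the common mistake of a substring test, which would accept a host such as example.com.evil.test.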
