Filesystem Architecture: Project vs. Deployed Instance
Date: 2026-02-16 Status: Research Author: Antony
The whole legion is a bunch of markdown files and scripts in a file tree. Caesar can version-control the entire system through git. How should we organize the project repo vs. the running instance's state?
The Insight
A centurio is not primarily code. A centurio is:
- castra/centuriones/vorenus/prompt.md: personality, skills, rules
- castra/centuriones/vorenus/tools.json: tool configuration
- castra/centuriones/vorenus/commentarii/*.md: private memory
An edictum is castra/edicta/security-review.md. An actum is castra/acta/api-patterns.md. The legatus itself is castra/legatus/prompt.md.
The Python code (legio/) is infrastructure — plumbing that reads these files and routes them to the Claude API. The actual "intelligence" of the legion lives in the filesystem. This means:
- The legion is git-native. Every centurio creation, edictum publish, and prompt edit is a file change that can be committed, branched, diffed, and reverted.
- Caesar can fork a legion. Clone the repo, tweak prompts, deploy a second instance with different personality/rules.
- Prompts are code. They deserve the same review, versioning, and CI discipline as Python modules.
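
To make the "a centurio is files" point concrete, here is a minimal sketch of how infrastructure code might read one centurio from the file tree. The `CenturioFiles` class and `load_centurio` function are hypothetical names for illustration, not the actual contents of legio/centurio.py, and treating tools.json as optional is an assumption.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path


@dataclass
class CenturioFiles:
    """Hypothetical container for the files that define one centurio."""
    name: str
    prompt: str
    tools: dict
    commentarii: dict[str, str] = field(default_factory=dict)


def load_centurio(castra: Path, name: str) -> CenturioFiles:
    """Read one centurio's definition from the castra file tree."""
    root = castra / "centuriones" / name
    prompt = (root / "prompt.md").read_text(encoding="utf-8")

    # Assumption: a missing tools.json means "no tool configuration".
    tools_path = root / "tools.json"
    tools = {}
    if tools_path.exists():
        tools = json.loads(tools_path.read_text(encoding="utf-8"))

    # Private notes, keyed by file name; the directory may not exist yet.
    notes_dir = root / "commentarii"
    commentarii = {}
    if notes_dir.is_dir():
        for note in sorted(notes_dir.glob("*.md")):
            commentarii[note.name] = note.read_text(encoding="utf-8")

    return CenturioFiles(name, prompt, tools, commentarii)
```

Nothing here touches the Claude API: under this sketch, everything that distinguishes one centurio from another comes from the three paths listed above.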
Current State: What's Where
What IS git-tracked (the project)
legio/ # Python infrastructure
├── __main__.py # startup orchestration
├── legatus.py # orchestrator agent
├── session.py # SDK session lifecycle
├── centurio.py # centurio data model
├── nuntius.py # message model
├── praetorium.py # SQLite message bus
├── config.py # config loading
├── errors.py # domain exceptions
├── totp.py # OTP verification
├── auctoritas.py # authorization state machine
├── rendering.py # formatting, templates
├── memoria/ # knowledge store
│   ├── store.py
│   └── tools.py
└── telegram/ # Telegram bot
    ├── bot.py
    ├── commands.py
    └── utils.py
templates/ # blueprints for new centuriones
├── centurio/
│   ├── prompt.md.template
│   └── tools.json.template
└── legatus/
    └── prompt.md.template
tests/ # 100% coverage test suite
scripts/ # quality enforcement
dev-docs/ # frozen documentation
.claude/rules/ # coding standards
legio.toml # non-secret configuration
pyproject.toml # Python project config
AGENTS.md # shared agent instructions

What is NOT git-tracked (runtime state)
castra/ # .gitignored living state
├── centuriones/ # centurio prompts + memory
│   └── vorenus/
│       ├── prompt.md # ← the centurio's soul
│       ├── tools.json # ← tool configuration
│       └── commentarii/ # ← private notes
│           └── api-design.md
├── edicta/ # standing orders
│   └── code-review.md
├── acta/ # shared knowledge
│   └── database-patterns.md
├── legatus/
│   └── prompt.md # ← the legatus's brain
└── praetorium.db # SQLite message history
.env # secrets (API keys, TOTP)

The Problem
castra/ is half-ignored. The .gitignore excludes centurio data, edicta, and acta (correct — these are instance state), but tracks castra/legatus/prompt.md and the directory structure. This creates ambiguity:
- Legatus prompt is tracked, but centurio prompts are not
- Templates exist but aren't clearly separated from instance state
- No way to ship a "starter legion" with pre-configured centuriones
- Database is obviously not tracked (binary), but its schema is embedded in Python code, not versioned independently
Proposed Structure: Three Layers
Layer 1: Code (always git-tracked)
The infrastructure. Python modules, tests, scripts, project config. Changes here require code review, tests, linting.
legio/ # Python code
tests/ # test suite
scripts/ # quality tools
pyproject.toml # project metadata

Layer 2: Blueprints (always git-tracked)
The design of the legion. Templates, default prompts, example configurations. These define what a fresh deployment looks like.
blueprints/ # was: templates/
├── legatus/
│   └── prompt.md # default legatus prompt
├── centurio/
│   ├── prompt.md.template # template for new centuriones
│   └── tools.json.template # default tool config
├── edicta/ # starter edicta (optional)
│   └── code-review.md # example standing order
└── legio.toml.example # example config with all options

Why rename templates/ to blueprints/? Two reasons:
- templates/ is a Python convention (Jinja, Django), which is confusing here.
- Roman vocabulary: a blueprint would be "exemplar" in Latin, but blueprints/ is clearer in English, and the directory is infrastructure.
Layer 3: Castra (instance state, git-optional)
The living deployment. Created from blueprints at first launch, then evolves through operation. This is where Caesar's decision matters:
Option A — Castra fully gitignored (current default)
castra/ # .gitignored
├── centuriones/*/ # created at runtime
├── edicta/*/ # created via /edict
├── acta/*/ # created by centuriones
├── legatus/prompt.md # copied from blueprint at init
└── praetorium.db # SQLite, never tracked

Best for: deployments where castra is disposable or backed up separately. Clean separation between code and state.
Option B — Castra as a tracked subtree (Caesar's legion-as-repo)
castra/ # tracked (except db + secrets)
├── centuriones/
│   ├── vorenus/
│   │   ├── prompt.md # ← version-controlled personality
│   │   ├── tools.json
│   │   └── commentarii/ # ← .gitignored (ephemeral)
│   └── pullo/
│       ├── prompt.md
│       └── tools.json
├── edicta/
│   └── code-review.md # ← version-controlled policy
├── acta/
│   └── api-patterns.md # ← version-controlled knowledge
├── legatus/
│   └── prompt.md
└── .gitignore # ignores praetorium.db, commentarii

Best for: a Caesar who treats the legion itself as a product. Every centurio creation, prompt tweak, and edictum is a git commit. Caesar can branch (git checkout -b experiment/aggressive-reviewer), test, and merge back.
Option C — Castra as a separate git repo (multi-instance)
legio/ # code repo (github.com/xiaolai/legio)
castra-production/ # state repo (github.com/xiaolai/legio-castra-prod)
castra-staging/ # state repo (github.com/xiaolai/legio-castra-staging)

Best for: running multiple legion instances from the same codebase with different centurio rosters and edicta. Each castra is its own repo with its own commit history.
Gitignore Redesign by Option
Option A (fully ignored — current, cleaned up)
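# Castra: runtime state, not source
# Ignore directory contents (dir/*) rather than the directories themselves:
# git cannot re-include a .gitkeep whose parent directory is excluded.
castra/centuriones/*
castra/edicta/*
castra/acta/*
castra/legatus/prompt.md
castra/praetorium.db*
!castra/**/.gitkeep

Option B (tracked subtree, recommended)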
# Castra: track prompts + edicta + acta, ignore runtime data
castra/praetorium.db*
castra/centuriones/*/commentarii/
# encrypted auth state (future); .gitignore does not support trailing comments
castra/browser/sessions/
!castra/**/.gitkeep

Option C (separate repo)
The code repo's .gitignore ignores castra/ entirely. The castra repo has its own .gitignore:
# In the castra repo
praetorium.db*
centuriones/*/commentarii/
browser/sessions/

Deployment: Init Flow
Regardless of option, first launch needs a bootstrap that populates castra from blueprints:
legio --init # or handled in __main__.py
1. Create castra/ directory structure
2. Copy blueprints/legatus/prompt.md → castra/legatus/prompt.md
3. Ensure edicta/, acta/, centuriones/ dirs exist
4. Initialize praetorium.db with schema
5. (Option B) git add castra/ && git commit -m "Initialize castra"

This already partially exists in __main__.py (castra dir creation). Extending it to copy the legatus prompt from blueprints is trivial.
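
Steps 1 through 4 can be sketched as a single idempotent function. This is an illustration under stated assumptions, not the existing __main__.py logic: `init_castra` is a hypothetical name, and `PLACEHOLDER_SCHEMA` stands in for the real schema, which lives in the project's Python code and is not reproduced here.

```python
import shutil
import sqlite3
from pathlib import Path

# Placeholder only: the real schema is defined elsewhere in the project.
PLACEHOLDER_SCHEMA = "CREATE TABLE IF NOT EXISTS nuntii (id INTEGER PRIMARY KEY)"


def init_castra(blueprints: Path, castra: Path) -> None:
    """Populate castra from blueprints; safe to run more than once."""
    # Steps 1 and 3: create the directory structure.
    for sub in ("legatus", "centuriones", "edicta", "acta"):
        (castra / sub).mkdir(parents=True, exist_ok=True)

    # Step 2: copy the default legatus prompt, but never overwrite an
    # active prompt that may already have diverged from the blueprint.
    target = castra / "legatus" / "prompt.md"
    if not target.exists():
        shutil.copyfile(blueprints / "legatus" / "prompt.md", target)

    # Step 4: create the database file and apply the schema.
    conn = sqlite3.connect(castra / "praetorium.db")
    try:
        conn.execute(PLACEHOLDER_SCHEMA)
        conn.commit()
    finally:
        conn.close()
```

The existence check in step 2 is the design choice that matters: re-running init after an upgrade must not clobber a prompt Caesar has edited. Step 5 (the git commit) is left to the operator in this sketch.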
Container Deployment
Docker image (immutable)
├── legio/ ← baked in
├── blueprints/ ← baked in
├── scripts/ ← baked in
└── pyproject.toml ← baked in
Volume mount (persistent, per-instance)
├── castra/ ← mounted from host or named volume
├── .env ← mounted or injected via env vars
└── legio.toml ← mounted (or baked in with defaults)

FROM python:3.14-slim
WORKDIR /app
COPY legio/ legio/
COPY blueprints/ blueprints/
COPY scripts/ scripts/
COPY pyproject.toml .
RUN pip install --no-cache-dir .
VOLUME ["/app/castra"]
ENV CASTRA_DIR=/app/castra
CMD ["python", "-m", "legio"]

# docker-compose.yml
services:
  legio:
    build: .
    volumes:
      - ./castra:/app/castra # or named volume
      - ./.env:/app/.env:ro
      - ./legio.toml:/app/legio.toml:ro
    environment:
      - TELEGRAM_BOT_TOKEN
      - ANTHROPIC_API_KEY
      - LEGIO_TOTP_SECRET

For Option C (multi-instance), each instance gets its own compose file pointing to a different castra volume.
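
The Dockerfile sets CASTRA_DIR, but this memo does not say how the Python side consumes it. One plausible reading, sketched here with a hypothetical `resolve_castra_dir` helper, is that config loading prefers the environment variable and falls back to a castra/ directory next to the working directory.

```python
import os
from pathlib import Path


def resolve_castra_dir(default: str = "castra") -> Path:
    """Return the castra directory, preferring the CASTRA_DIR variable."""
    # An unset or empty variable falls back to the default (an assumption).
    raw = os.environ.get("CASTRA_DIR") or default
    return Path(raw).expanduser().resolve()
```

With this shape, the same image serves every instance: each compose file only changes which volume is mounted at the path CASTRA_DIR names.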
What Should Be in Blueprints vs. Castra
| Content | Location | Why |
|---|---|---|
| Legatus default prompt | blueprints/legatus/prompt.md | Source of truth, reviewed |
| Centurio prompt template | blueprints/centurio/prompt.md.template | Used by create_centurio() |
| Centurio tool template | blueprints/centurio/tools.json.template | Default MCP tool config |
| Example edicta | blueprints/edicta/*.md | Optional starter policies |
| Example legio.toml | blueprints/legio.toml.example | Documents all options |
| Active legatus prompt | castra/legatus/prompt.md | May diverge from blueprint |
| Centurio instances | castra/centuriones/*/ | Created at runtime |
| Active edicta | castra/edicta/*.md | Published via /edict |
| Active acta | castra/acta/*.md | Published by centuriones |
| Commentarii | castra/centuriones/*/commentarii/ | Ephemeral per-session |
| Message history | castra/praetorium.db | Binary, never tracked |
| Auth sessions | castra/browser/sessions/ | Encrypted, never tracked |
Database Location and Runtime Data
Current state
All runtime data lives in one place:
castra/praetorium.db # SQLite — conversation history (nuntii)
castra/praetorium.db-shm # SQLite shared memory (WAL mode)
castra/praetorium.db-wal # SQLite write-ahead log

The Aerarium plan (cost tracking) would add a stipendia table to this same database. So praetorium.db becomes the single source of truth for all operational data.
What belongs in the database vs. filesystem
| Data | Storage | Why |
|---|---|---|
| Conversation history (nuntii) | praetorium.db | Relational, needs queries (by sender, by time, visibility filtering) |
| Cost records (stipendia, future) | praetorium.db | Relational, needs aggregation (SUM by sender, by day, by month) |
| Centurio prompts | castra/centuriones/*/prompt.md | Markdown files — human-readable, git-diffable, editable with any text editor |
| Edicta | castra/edicta/*.md | Same — markdown, version-controlled, readable |
| Acta | castra/acta/*.md | Same |
| Commentarii | castra/centuriones/*/commentarii/*.md | Ephemeral notes, per-session |
| Browser sessions (future) | castra/browser/sessions/*.json | Encrypted auth state, not queryable |
Principle: structured data that needs queries → SQLite. Human-readable content that benefits from git diffs → markdown files.
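
As a small illustration of the "needs queries" side of that principle, the sketch below runs the kind of per-sender aggregation mentioned in the table. The stipendia table is only planned, so the column names here (`sender`, `day`, `cost_cents`) are invented for the example and should not be read as the real schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical columns: the real stipendia table does not exist yet.
conn.execute("CREATE TABLE stipendia (sender TEXT, day TEXT, cost_cents INTEGER)")
conn.executemany(
    "INSERT INTO stipendia VALUES (?, ?, ?)",
    [
        ("vorenus", "2026-02-15", 120),
        ("vorenus", "2026-02-16", 80),
        ("pullo", "2026-02-16", 45),
    ],
)

# Aggregation of this kind is awkward over loose files and trivial in SQL.
totals = conn.execute(
    "SELECT sender, SUM(cost_cents) FROM stipendia GROUP BY sender ORDER BY sender"
).fetchall()
conn.close()

print(totals)  # [('pullo', 45), ('vorenus', 200)]
```

Costs are stored as integer cents in this sketch to keep sums exact; that is a choice made for the example, not something the Aerarium plan specifies here.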
Database in deployment
The database is never git-tracked (binary file, grows with every message). It needs special handling:
Bare metal / VM:
castra/praetorium.db # lives on local disk
# backed up via cron + sqlite3 .backup

Docker container:
volumes:
  - legio-data:/app/castra # named volume, persists across restarts
  # OR
  - ./castra:/app/castra # bind mount, Caesar can access directly

Important: SQLite in WAL mode works fine with Docker bind mounts on Linux (same filesystem). On macOS Docker, filesystem notification delays may cause stale reads; this is not a concern for single-process Legio.
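
For reference, WAL mode is something the application requests when it opens the database. How praetorium.py does this is not shown in this memo; the hypothetical `open_praetorium` helper below is only a sketch of the standard approach with Python's sqlite3 module, including a check that the mode was actually granted.

```python
import sqlite3
from pathlib import Path


def open_praetorium(db_path: Path) -> sqlite3.Connection:
    """Open the database and switch it to write-ahead logging."""
    conn = sqlite3.connect(db_path)
    # The PRAGMA returns the journal mode that is actually in effect.
    mode = conn.execute("PRAGMA journal_mode=WAL").fetchone()[0]
    if mode.lower() != "wal":
        conn.close()
        raise RuntimeError(f"WAL not available, journal mode is {mode!r}")
    return conn
```

Failing loudly at startup is a deliberate choice in this sketch: if a mount cannot support WAL, it is better to learn that on first launch than through stale reads later.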
Backup strategy:
SQLite's .backup command creates a consistent snapshot while the database is in use. A daily cron job:
sqlite3 castra/praetorium.db ".backup castra/backups/praetorium-$(date +%Y%m%d).db"

Or for Option B/C (git-tracked castra), export key data as SQL:
sqlite3 castra/praetorium.db ".dump nuntii" > castra/exports/nuntii.sql
sqlite3 castra/praetorium.db ".dump stipendia" > castra/exports/stipendia.sql
git add castra/exports/ && git commit -m "Daily data export"

This gives git history for data without tracking the binary file.
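
If the snapshot should run from inside Legio rather than from cron, Python's sqlite3 module exposes SQLite's online backup API as `Connection.backup`. The sketch below assumes a hypothetical `backup_praetorium` helper and the same file naming as the cron example above.

```python
import sqlite3
from datetime import date
from pathlib import Path


def backup_praetorium(castra: Path) -> Path:
    """Write a consistent snapshot of praetorium.db into castra/backups/."""
    backups = castra / "backups"
    backups.mkdir(parents=True, exist_ok=True)
    target = backups / f"praetorium-{date.today():%Y%m%d}.db"

    src = sqlite3.connect(castra / "praetorium.db")
    dst = sqlite3.connect(target)
    try:
        # Copies the whole database, even while src is in use elsewhere.
        src.backup(dst)
    finally:
        dst.close()
        src.close()
    return target
```

Running it twice on the same day overwrites that day's snapshot, which matches what the cron one-liner would do.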
Full deployed instance layout
/app/ # or ~/legio/ on bare metal
├── legio/ # Python code (immutable in container)
├── blueprints/ # templates (immutable in container)
├── scripts/ # tools (immutable in container)
├── pyproject.toml
├── legio.toml # config (mounted or baked in)
├── .env # secrets (mounted, never baked in)
│
└── castra/ # ALL mutable state lives here
    ├── legatus/
    │   └── prompt.md # the legatus brain
    ├── centuriones/
    │   ├── vorenus/
    │   │   ├── prompt.md # personality + skills
    │   │   ├── tools.json # MCP tool config
    │   │   └── commentarii/ # ephemeral session notes
    │   └── pullo/
    │       ├── prompt.md
    │       └── tools.json
    ├── edicta/ # standing orders
    │   └── code-review.md
    ├── acta/ # shared knowledge
    │   └── api-patterns.md
    ├── browser/ # future: headless browser state
    │   └── sessions/
    │       └── github.json # encrypted storageState
    ├── backups/ # database snapshots
    │   └── praetorium-20260216.db
    ├── exports/ # optional: SQL dumps for git
    │   ├── nuntii.sql
    │   └── stipendia.sql
    ├── praetorium.db # SQLite: nuntii + stipendia
    ├── praetorium.db-shm
    └── praetorium.db-wal

One directory to back up, one volume to mount, one path to protect. Everything mutable lives under castra/. If Caesar needs to migrate to a new server: copy castra/ + .env + legio.toml and redeploy.
Recommendation
Option B (tracked subtree) for Caesar's use case.
The legion IS the product. Every prompt, edictum, and knowledge file deserves version history. Caesar should be able to git log castra/centuriones/vorenus/prompt.md and see how a centurio evolved. Branching lets Caesar experiment safely. The .gitignore excludes only truly ephemeral data (database, commentarii, browser sessions).
The rename from templates/ to blueprints/ clarifies intent and avoids collision with Python template conventions.
Action items:
- Rename templates/ → blueprints/
- Update code that references the templates/ path
- Restructure .gitignore for Option B
- Move legatus prompt source-of-truth to blueprints/legatus/prompt.md
- Add init logic to copy blueprint → castra on first launch
- Add Dockerfile + docker-compose.yml to repo root
- Document the three-layer model in AGENTS.md or a memo