Filesystem Architecture: Project vs. Deployed Instance

Date: 2026-02-16
Status: Research
Author: Antony

The whole legion is a bunch of markdown files and scripts in a file tree. Caesar can version-control the entire system through git. How should we organize the project repo vs. the running instance's state?

The Insight

A centurio is not primarily code. A centurio is:

castra/centuriones/vorenus/prompt.md — personality, skills, rules
castra/centuriones/vorenus/tools.json — tool configuration
castra/centuriones/vorenus/commentarii/*.md — private memory

An edictum is castra/edicta/security-review.md. An actum is castra/acta/api-patterns.md. The legatus itself is castra/legatus/prompt.md.

The Python code (legio/) is infrastructure — plumbing that reads these files and routes them to the Claude API. The actual "intelligence" of the legion lives in the filesystem. This means:

The legion is git-native. Every centurio creation, edictum publish, and prompt edit is a file change that can be committed, branched, diffed, and reverted.
Caesar can fork a legion. Clone the repo, tweak prompts, deploy a second instance with different personality/rules.
Prompts are code. They deserve the same review, versioning, and CI discipline as Python modules.
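To make the "a centurio is files" idea concrete, here is a minimal sketch of reading one centurio back from the tree. The `CenturioFiles` dataclass and `load_centurio()` function are hypothetical names for illustration only; the real data model lives in legio/centurio.py and may look different, and the shape of tools.json is not specified here, so it is loaded as opaque JSON.

```python
import json
from dataclasses import dataclass, field
from pathlib import Path
from typing import Any


@dataclass
class CenturioFiles:
    """Everything that defines a centurio, as read from disk (hypothetical model)."""
    name: str
    prompt: str
    tools: Any
    commentarii: dict[str, str] = field(default_factory=dict)


def load_centurio(castra: Path, name: str) -> CenturioFiles:
    """Read prompt.md, tools.json and commentarii/*.md for one centurio."""
    base = castra / "centuriones" / name
    prompt = (base / "prompt.md").read_text(encoding="utf-8")

    # tools.json is treated as optional and opaque in this sketch.
    tools_path = base / "tools.json"
    tools = json.loads(tools_path.read_text(encoding="utf-8")) if tools_path.exists() else {}

    # Private notes are keyed by file name; a missing directory means no notes.
    notes_dir = base / "commentarii"
    commentarii = {}
    if notes_dir.is_dir():
        for note in sorted(notes_dir.glob("*.md")):
            commentarii[note.name] = note.read_text(encoding="utf-8")

    return CenturioFiles(name=name, prompt=prompt, tools=tools, commentarii=commentarii)
```

Nothing in the sketch goes beyond file reads, which is the point: anything git can diff, this loader can see.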

Current State: What's Where

What IS git-tracked (the project)

legio/                          # Python infrastructure
├── __main__.py                 # startup orchestration
├── legatus.py                  # orchestrator agent
├── session.py                  # SDK session lifecycle
├── centurio.py                 # centurio data model
├── nuntius.py                  # message model
├── praetorium.py               # SQLite message bus
├── config.py                   # config loading
├── errors.py                   # domain exceptions
├── totp.py                     # OTP verification
├── auctoritas.py               # authorization state machine
├── rendering.py                # formatting, templates
├── memoria/                    # knowledge store
│   ├── store.py
│   └── tools.py
└── telegram/                   # Telegram bot
    ├── bot.py
    ├── commands.py
    └── utils.py

templates/                      # blueprints for new centuriones
├── centurio/
│   ├── prompt.md.template
│   └── tools.json.template
└── legatus/
    └── prompt.md.template

tests/                          # 100% coverage test suite
scripts/                        # quality enforcement
dev-docs/                       # frozen documentation
.claude/rules/                  # coding standards

legio.toml                      # non-secret configuration
pyproject.toml                  # Python project config
AGENTS.md                       # shared agent instructions

What is NOT git-tracked (runtime state)

castra/                         # .gitignored living state
├── centuriones/                # centurio prompts + memory
│   └── vorenus/
│       ├── prompt.md           # ← the centurio's soul
│       ├── tools.json          # ← tool configuration
│       └── commentarii/        # ← private notes
│           └── api-design.md
├── edicta/                     # standing orders
│   └── code-review.md
├── acta/                       # shared knowledge
│   └── database-patterns.md
├── legatus/
│   └── prompt.md               # ← the legatus's brain
└── praetorium.db               # SQLite message history

.env                            # secrets (API keys, TOTP)

The Problem

castra/ is half-ignored. The .gitignore excludes centurio data, edicta, and acta (correct — these are instance state), but tracks castra/legatus/prompt.md and the directory structure. This creates ambiguity:

Legatus prompt is tracked, but centurio prompts are not
Templates exist but aren't clearly separated from instance state
No way to ship a "starter legion" with pre-configured centuriones
Database is obviously not tracked (binary), but its schema is embedded in Python code, not versioned independently

Proposed Structure: Three Layers

Layer 1: Code (always git-tracked)

The infrastructure. Python modules, tests, scripts, project config. Changes here require code review, tests, linting.

legio/                          # Python code
tests/                          # test suite
scripts/                        # quality tools
pyproject.toml                  # project metadata

Layer 2: Blueprints (always git-tracked)

The design of the legion. Templates, default prompts, example configurations. These define what a fresh deployment looks like.

blueprints/                     # was: templates/
├── legatus/
│   └── prompt.md               # default legatus prompt
├── centurio/
│   ├── prompt.md.template      # template for new centuriones
│   └── tools.json.template     # default tool config
├── edicta/                     # starter edicta (optional)
│   └── code-review.md          # example standing order
└── legio.toml.example          # example config with all options

Why rename templates/ to blueprints/? Two reasons:

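templates/ is already the conventional directory name for Jinja and Django templates, so it suggests HTML rendering rather than agent scaffolding.
The Latin equivalent would be "exemplar", but blueprints/ reads more clearly in English, and the directory is infrastructure rather than part of the legion's Roman vocabulary.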

Layer 3: Castra (instance state, git-optional)

The living deployment. Created from blueprints at first launch, then evolves through operation. This is where Caesar's decision matters:

Option A — Castra fully gitignored (current default)

castra/                         # .gitignored
├── centuriones/*/              # created at runtime
├── edicta/*/                   # created via /edict
├── acta/*/                     # created by centuriones
├── legatus/prompt.md           # copied from blueprint at init
└── praetorium.db               # SQLite, never tracked

Best for: deployments where castra is disposable or backed up separately. Clean separation between code and state.

Option B — Castra as a tracked subtree (Caesar's legion-as-repo)

castra/                         # tracked (except db + secrets)
├── centuriones/
│   ├── vorenus/
│   │   ├── prompt.md           # ← version-controlled personality
│   │   ├── tools.json
│   │   └── commentarii/        # ← .gitignored (ephemeral)
│   └── pullo/
│       ├── prompt.md
│       └── tools.json
├── edicta/
│   └── code-review.md          # ← version-controlled policy
├── acta/
│   └── api-patterns.md         # ← version-controlled knowledge
├── legatus/
│   └── prompt.md
└── .gitignore                  # ignores praetorium.db, commentarii

Best for: Caesar who treats the legion itself as a product. Every centurio creation, prompt tweak, and edictum is a git commit. Can branch (git checkout -b experiment/aggressive-reviewer), test, and merge back.

Option C — Castra as a separate git repo (multi-instance)

legio/                          # code repo (github.com/xiaolai/legio)
castra-production/              # state repo (github.com/xiaolai/legio-castra-prod)
castra-staging/                 # state repo (github.com/xiaolai/legio-castra-staging)

Best for: running multiple legion instances from the same codebase with different centurio rosters and edicta. Each castra is its own repo with its own commit history.

Gitignore Redesign by Option

Option A (fully ignored — current, cleaned up)

gitignore

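# Castra: runtime state, not source.
# Ignore directory contents (dir/*) rather than the directory itself (dir/):
# git cannot re-include a file whose parent directory is excluded.
castra/centuriones/*
castra/edicta/*
castra/acta/*
castra/legatus/prompt.md
castra/praetorium.db*
!castra/**/.gitkeep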

Option B (tracked subtree — recommended)

gitignore

# Castra: track prompts + edicta + acta, ignore runtime data
castra/praetorium.db*
# Database snapshots are binary too
castra/backups/
castra/centuriones/*/commentarii/
# Encrypted auth state (future). Comments need their own line in .gitignore.
castra/browser/sessions/
!castra/**/.gitkeep

Option C (separate repo)

The code repo's .gitignore ignores castra/ entirely. The castra repo has its own .gitignore:

gitignore

# In the castra repo
praetorium.db*
backups/
centuriones/*/commentarii/
browser/sessions/

Deployment: Init Flow

Regardless of option, first launch needs a bootstrap that populates castra from blueprints:

legio --init                    # or handled in __main__.py

1. Create castra/ directory structure
2. Copy blueprints/legatus/prompt.md → castra/legatus/prompt.md
3. Ensure edicta/, acta/, centuriones/ dirs exist
4. Initialize praetorium.db with schema
5. (Option B) git add castra/ && git commit -m "Initialize castra"

This already partially exists in __main__.py (castra dir creation). Extending it to copy the legatus prompt from blueprints is trivial.
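The bootstrap steps above can be sketched in a few lines of Python. This is an illustration under assumptions, not the code in `__main__.py`: the function name `init_castra()` and the `PLACEHOLDER_SCHEMA` constant are hypothetical, and the real schema is whatever the legio package defines. The optional git commit in step 5 is left to the operator.

```python
import shutil
import sqlite3
from pathlib import Path

# Hypothetical placeholder; the real schema is defined inside the legio package.
PLACEHOLDER_SCHEMA = "CREATE TABLE IF NOT EXISTS nuntii (id INTEGER PRIMARY KEY, body TEXT);"


def init_castra(blueprints: Path, castra: Path, schema_sql: str = PLACEHOLDER_SCHEMA) -> None:
    """Populate castra from blueprints. Safe to run more than once."""
    # Steps 1 and 3: make sure the directory structure exists.
    for sub in ("legatus", "centuriones", "edicta", "acta"):
        (castra / sub).mkdir(parents=True, exist_ok=True)

    # Step 2: copy the legatus prompt only when absent, so later edits survive restarts.
    target = castra / "legatus" / "prompt.md"
    if not target.exists():
        shutil.copyfile(blueprints / "legatus" / "prompt.md", target)

    # Step 4: create the database file and apply the schema.
    conn = sqlite3.connect(castra / "praetorium.db")
    try:
        conn.executescript(schema_sql)
        conn.commit()
    finally:
        conn.close()
```

Copying only when the target is missing is a deliberate choice in this sketch: the active prompt is allowed to diverge from the blueprint, so a restart should never overwrite it.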

Container Deployment

Docker image (immutable)
├── legio/          ← baked in
├── blueprints/     ← baked in
├── scripts/        ← baked in
└── pyproject.toml  ← baked in

Volume mount (persistent, per-instance)
├── castra/         ← mounted from host or named volume
├── .env            ← mounted or injected via env vars
└── legio.toml      ← mounted (or baked in with defaults)

dockerfile

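FROM python:3.14-slim
WORKDIR /app
# Bake in the immutable layers: code, blueprints, scripts, project metadata.
COPY legio/ legio/
COPY blueprints/ blueprints/
COPY scripts/ scripts/
COPY pyproject.toml .
RUN pip install --no-cache-dir .
# Castra holds all mutable state and is supplied as a volume at runtime.
VOLUME ["/app/castra"]
ENV CASTRA_DIR=/app/castra
CMD ["python", "-m", "legio"]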

yaml

# docker-compose.yml
services:
  legio:
    build: .
    volumes:
      - ./castra:/app/castra          # or named volume
      - ./.env:/app/.env:ro
      - ./legio.toml:/app/legio.toml:ro
    environment:
      - TELEGRAM_BOT_TOKEN
      - ANTHROPIC_API_KEY
      - LEGIO_TOTP_SECRET

For Option C (multi-instance), each instance gets its own compose file pointing to a different castra volume.
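The Dockerfile sets `CASTRA_DIR`, which implies the code resolves the castra location from the environment. A minimal sketch of that lookup follows; `resolve_castra_dir()` is a hypothetical helper, and whether legio/config.py actually reads this variable is an assumption.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_castra_dir(env: Optional[Mapping[str, str]] = None, default: str = "castra") -> Path:
    """Return the castra path from CASTRA_DIR, falling back to ./castra."""
    env = os.environ if env is None else env
    # An unset or empty variable falls back to the default relative path.
    return Path(env.get("CASTRA_DIR") or default)
```

With this shape, each Option C instance only needs a different `CASTRA_DIR` (or a different volume mounted at the same path) to run against its own state.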

What Should Be in Blueprints vs. Castra

Content	Location	Why
Legatus default prompt	`blueprints/legatus/prompt.md`	Source of truth, reviewed
Centurio prompt template	`blueprints/centurio/prompt.md.template`	Used by `create_centurio()`
Centurio tool template	`blueprints/centurio/tools.json.template`	Default MCP tool config
Example edicta	`blueprints/edicta/*.md`	Optional starter policies
Example `legio.toml`	`blueprints/legio.toml.example`	Documents all options
Active legatus prompt	`castra/legatus/prompt.md`	May diverge from blueprint
Centurio instances	`castra/centuriones/*/`	Created at runtime
Active edicta	`castra/edicta/*.md`	Published via `/edict`
Active acta	`castra/acta/*.md`	Published by centuriones
Commentarii	`castra/centuriones/*/commentarii/`	Ephemeral per-session
Message history	`castra/praetorium.db`	Binary, never tracked
Auth sessions	`castra/browser/sessions/`	Encrypted, never tracked
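Because the active legatus prompt may diverge from its blueprint, it can help to see the drift before deciding whether to port a change back. The sketch below uses the standard library's `difflib`; the function name `prompt_drift()` is hypothetical and not part of legio.

```python
import difflib
from pathlib import Path


def prompt_drift(blueprint: Path, active: Path) -> list[str]:
    """Return unified-diff lines between blueprint and active prompt (empty if identical)."""
    before = blueprint.read_text(encoding="utf-8").splitlines()
    after = active.read_text(encoding="utf-8").splitlines()
    return list(
        difflib.unified_diff(
            before,
            after,
            fromfile=str(blueprint),
            tofile=str(active),
            lineterm="",
        )
    )
```

An empty list means the instance still runs the reviewed default; anything else is a candidate for either a blueprint update or a deliberate per-instance override.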

Database Location and Runtime Data

Current state

All runtime data lives in one place:

castra/praetorium.db            # SQLite — conversation history (nuntii)
castra/praetorium.db-shm        # SQLite shared memory (WAL mode)
castra/praetorium.db-wal        # SQLite write-ahead log

The Aerarium plan (cost tracking) would add a stipendia table to this same database. So praetorium.db becomes the single source of truth for all operational data.

What belongs in the database vs. filesystem

Data	Storage	Why
Conversation history (nuntii)	`praetorium.db`	Relational, needs queries (by sender, by time, visibility filtering)
Cost records (stipendia, future)	`praetorium.db`	Relational, needs aggregation (SUM by sender, by day, by month)
Centurio prompts	`castra/centuriones/*/prompt.md`	Markdown files — human-readable, git-diffable, editable with any text editor
Edicta	`castra/edicta/*.md`	Same — markdown, version-controlled, readable
Acta	`castra/acta/*.md`	Same
Commentarii	`castra/centuriones/*/commentarii/*.md`	Ephemeral notes, per-session
Browser sessions (future)	`castra/browser/sessions/*.json`	Encrypted auth state, not queryable

Principle: structured data that needs queries → SQLite. Human-readable content that benefits from git diffs → markdown files.
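As an illustration of the "needs queries" side of that principle, here is the kind of aggregation the future stipendia table would serve. The column names (`sender`, `day`, `cost_usd`) and the sample rows are assumptions made up for this sketch; the Aerarium plan may define a different schema.

```python
import sqlite3

# In-memory database with a hypothetical stipendia schema and sample rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE stipendia (sender TEXT, day TEXT, cost_usd REAL)")
conn.executemany(
    "INSERT INTO stipendia VALUES (?, ?, ?)",
    [
        ("vorenus", "2026-02-15", 0.25),
        ("vorenus", "2026-02-16", 0.50),
        ("pullo", "2026-02-16", 0.125),
    ],
)


def cost_by_sender(db: sqlite3.Connection) -> dict[str, float]:
    """Total cost per sender, the SUM-by-sender query mentioned in the table above."""
    rows = db.execute(
        "SELECT sender, SUM(cost_usd) FROM stipendia GROUP BY sender ORDER BY sender"
    )
    return dict(rows.fetchall())
```

The same question asked of markdown files would mean parsing every file on every query, which is why cost and message records belong in SQLite while prompts stay as files.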

Database in deployment

The database is never git-tracked (binary file, grows with every message). It needs special handling:

Bare metal / VM:

castra/praetorium.db            # lives on local disk
                                # backed up via cron + sqlite3 .backup

Docker container:

yaml

volumes:
  - legio-data:/app/castra      # named volume, persists across restarts
  # OR
  - ./castra:/app/castra        # bind mount, Caesar can access directly

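Important: WAL mode coordinates readers and writers through the shared-memory -shm file, so every process that opens the database must run on the same host. A Docker bind mount on Linux satisfies this because the container uses the host's filesystem directly. On Docker Desktop for macOS the bind mount passes through a file-sharing layer, so opening the database from the host while the container is running may produce stale reads or locking problems. This is not a concern as long as Legio is the only process touching the file.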

Backup strategy:

SQLite's .backup command creates a consistent snapshot while the database is in use. A daily cron job:

bash

sqlite3 castra/praetorium.db ".backup castra/backups/praetorium-$(date +%Y%m%d).db"
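The same snapshot can be taken from Python with the standard library's `sqlite3.Connection.backup()`, which uses SQLite's online backup API. This is a sketch, not existing legio code: `backup_database()` is a hypothetical helper, and the file naming simply mirrors the cron command above.

```python
import sqlite3
from datetime import date
from pathlib import Path
from typing import Optional


def backup_database(db_path: Path, backup_dir: Path, today: Optional[date] = None) -> Path:
    """Write a consistent snapshot to backup_dir/praetorium-YYYYMMDD.db and return its path."""
    today = today or date.today()
    backup_dir.mkdir(parents=True, exist_ok=True)
    dest_path = backup_dir / f"praetorium-{today:%Y%m%d}.db"

    src = sqlite3.connect(db_path)
    dest = sqlite3.connect(dest_path)
    try:
        # Copies all pages; safe while other connections are using the source.
        src.backup(dest)
    finally:
        dest.close()
        src.close()
    return dest_path
```

Running it from inside the Legio process avoids depending on the `sqlite3` command-line tool being installed in a slim container image.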

Or for Option B/C (git-tracked castra), export key data as SQL:

bash

sqlite3 castra/praetorium.db ".dump nuntii" > castra/exports/nuntii.sql
sqlite3 castra/praetorium.db ".dump stipendia" > castra/exports/stipendia.sql
git add castra/exports/ && git commit -m "Daily data export"

This gives git history for data without tracking the binary file.

Full deployed instance layout

/app/                           # or ~/legio/ on bare metal
├── legio/                      # Python code (immutable in container)
├── blueprints/                 # templates (immutable in container)
├── scripts/                    # tools (immutable in container)
├── pyproject.toml
├── legio.toml                  # config (mounted or baked in)
├── .env                        # secrets (mounted, never baked in)
│
└── castra/                     # ALL mutable state lives here
    ├── legatus/
    │   └── prompt.md           # the legatus brain
    ├── centuriones/
    │   ├── vorenus/
    │   │   ├── prompt.md       # personality + skills
    │   │   ├── tools.json      # MCP tool config
    │   │   └── commentarii/    # ephemeral session notes
    │   └── pullo/
    │       ├── prompt.md
    │       └── tools.json
    ├── edicta/                 # standing orders
    │   └── code-review.md
    ├── acta/                   # shared knowledge
    │   └── api-patterns.md
    ├── browser/                # future: headless browser state
    │   └── sessions/
    │       └── github.json     # encrypted storageState
    ├── backups/                # database snapshots
    │   └── praetorium-20260216.db
    ├── exports/                # optional: SQL dumps for git
    │   ├── nuntii.sql
    │   └── stipendia.sql
    ├── praetorium.db           # SQLite: nuntii + stipendia
    ├── praetorium.db-shm
    └── praetorium.db-wal

One directory to back up, one volume to mount, one path to protect. Everything mutable lives under castra/. If Caesar needs to migrate to a new server: copy castra/ + .env + legio.toml and redeploy.

Recommendation

Option B (tracked subtree) for Caesar's use case.

The legion IS the product. Every prompt, edictum, and knowledge file deserves version history. Caesar should be able to git log castra/centuriones/vorenus/prompt.md and see how a centurio evolved. Branching lets Caesar experiment safely. The .gitignore excludes only truly ephemeral data (database, commentarii, browser sessions).

The rename from templates/ to blueprints/ clarifies intent and avoids collision with Python template conventions.

Action items:

Rename templates/ → blueprints/
Update code that references templates/ path
Restructure .gitignore for Option B
Move legatus prompt source-of-truth to blueprints/legatus/prompt.md
Add init logic to copy blueprint → castra on first launch
Add Dockerfile + docker-compose.yml to repo root
Document the three-layer model in AGENTS.md or a memo
