Core concepts

First time here? Read the guided tour (chapters 1–12) for a slower introduction. This page is a consolidated reference for bucket, worker, job, and allocation.

Maand is an agentless workload orchestrator. You run the maand CLI on a host machine (laptop, CI runner, or bastion). That host holds the bucket — local files and SQLite state. Workers are ordinary Linux hosts reached over SSH; nothing is installed on them except what you deploy (job files, runner.py, and runtime directories).

┌──────────────────── maand CLI host ────────────────────┐
│  bucket/                                               │
│    maand.db + KV    workspace/    secrets/    tmp/     │
│         │                │                             │
│         │    build reads │                             │
│         ▼                ▼                             │
│    catalog (workers, jobs, allocations)                │
└────────────────────────┬───────────────────────────────┘
                         │ ssh + rsync (deploy, job control)
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
    Worker A         Worker B         Worker C
    /opt/worker/<bucket_id>/

Official site: maand.sh


Bucket

A bucket is one maand project directory. Run all maand commands from that directory (or with bucket.Location set to it in tests).

Path            Role
maand.conf      SSH user/key, sudo, cert TTL, job_config_selector, log_format
data/maand.db   SQLite catalog: workers, jobs, allocations, hashes, KV history
workspace/      Source of truth you edit: workers.json, jobs/<name>/, optional disabled.json
secrets/        CA, worker SSH private key, KV encryption key
tmp/            Staging for deploy and hook workspaces
logs/           Structured command logs from deploy, rsync, SSH, hooks (logs/runs/<run_id>/ per invocation)

Each bucket has a stable bucket_id (UUID) and an update_seq incremented on every deploy. Workers store both in worker.json so manual job control can detect drift.
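
The drift check that these two values enable can be pictured roughly as follows. This is a minimal sketch, not maand's actual code: the function name detect_drift and its return strings are hypothetical, and it assumes worker.json is a flat JSON object with bucket_id and update_seq keys.

```python
import json

def detect_drift(worker_json_text: str, bucket_id: str, update_seq: int) -> str:
    """Compare the state recorded on a worker with the bucket's current state.

    Hypothetical helper for illustration; maand's real check may differ.
    """
    state = json.loads(worker_json_text)
    if state.get("bucket_id") != bucket_id:
        return "wrong-bucket"   # worker directory belongs to another bucket
    if state.get("update_seq") != update_seq:
        return "out-of-sync"    # worker and bucket disagree on the last deploy
    return "in-sync"
```

A manual job-control command could refuse to act, or warn, whenever the result is anything other than "in-sync".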

Build reads workspace/ → updates maand.db and KV. Deploy reads the DB → rsyncs to workers. Workers are never the source of truth for catalog data.


Worker

A worker is a cluster node — an SSH target identified by IP or hostname.

Definition

Workers are declared in workspace/workers.json:

[
  {
    "host": "10.0.0.1",
    "labels": ["worker", "gpu"],
    "memory": "8192 mb",
    "cpu": "4000 mhz",
    "tags": { "zone": "a", "rack": "1" }
  }
]

Field          Meaning
host           SSH address (unique)
labels         Used for job placement; label worker is added automatically
memory / cpu   Capacity for resource validation when jobs declare limits
tags           Arbitrary metadata → KV namespace maand/worker/<ip>/tags/<key>
position       Order in the array (assigned on read)
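
To make the field rules concrete, here is a small sketch of how workers.json could be normalized on read. It is an illustration under assumptions, not maand's implementation: load_workers and tag_kv_keys are hypothetical names, position is assumed to be a 0-based index, and host is assumed to be the IP used in the KV path.

```python
import json

def load_workers(text: str) -> list[dict]:
    """Parse workers.json text into normalized worker records (hypothetical helper)."""
    workers, seen = [], set()
    for position, entry in enumerate(json.loads(text)):
        host = entry["host"]
        if host in seen:                       # host must be unique
            raise ValueError(f"duplicate host: {host}")
        seen.add(host)
        labels = list(entry.get("labels", []))
        if "worker" not in labels:             # implicit label on every worker
            labels.append("worker")
        workers.append({
            "host": host,
            "labels": labels,
            "memory": entry.get("memory"),
            "cpu": entry.get("cpu"),
            "tags": dict(entry.get("tags", {})),
            "position": position,              # assumption: 0-based array index
        })
    return workers

def tag_kv_keys(worker: dict) -> dict[str, str]:
    """Map a worker's tags to keys under maand/worker/<ip>/tags/<key>."""
    return {
        f"maand/worker/{worker['host']}/tags/{key}": value
        for key, value in worker["tags"].items()
    }
```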

On the worker after deploy

/opt/worker/<bucket_id>/
├── worker.json          # bucket_id, worker_id, update_seq, labels
├── jobs.json            # list of jobs on this worker (+ disabled flag)
├── bin/runner.py        # runs Makefile targets via SSH job control
└── jobs/<job>/          # staged job tree (Makefile, configs, data/, logs/, bin/)

Worker lifecycle

Event                                   What happens
Add host to workers.json + build        New row in worker table
maand collect facts                     Probe host over SSH; merge memory / cpu into workers.json (then maand build to sync catalog)
Deploy                                  Creates /opt/worker/<bucket_id>/, syncs jobs
Remove host from workers.json + build   Allocations marked removed; worker row dropped from catalog
Deploy after removal                    Stop job; remove deployed files; keep data/ and logs/; clear allocation hash on deploy (redeploy starts fresh, reuses data/logs)
GC                                      Delete worker data/, logs/, and bin/; purge removed allocation rows and KV

Workers do not run a maand agent. Deploy and maand job invoke runner.py over SSH; maand collect facts probes capacity into workers.json; maand run_command runs arbitrary shell commands over SSH.


Job

A job is a deployable unit — a directory under workspace/jobs/<name>/.

Required layout

workspace/jobs/api/
├── manifest.json       # version, selectors, resources, commands, certs
├── Makefile            # start / stop / restart (default deploy lifecycle)
├── _hooks/             # optional: hook_<name>.py | .ts | .js
├── config.tpl          # optional: Go templates → rendered at deploy
└── …                   # other files copied into maand.db on build

Manifest highlights

Field                     Purpose
version                   Semver-like release id; required when the job participates in the dependency graph; drives deploy new_version per allocation
selectors                 Worker labels for placement; when omitted, the job name is used
resources.memory / cpu    Min/max bounds in the manifest; actual memory/CPU for the current environment come from bucket.jobs.conf or bucket.jobs.<env>.conf (selected by job_config_selector in maand.conf); see resources-and-placement.md
resources.ports           Named ports: {} (maand assigns from bucket.conf pool) or an integer (fixed in manifest)
max_concurrent_upgrades   Rolling batch size for restart / reload during deploy
restart_policy            always / reload / never: how upgrades apply after rsync (deploy)
restart_globs             Optional; with reload, paths that force restart when changed
max_concurrent_starts     Rolling batch size for start on first deploy (0 = all at once)
hooks                     Named hooks (hook_*) with executed_on events
health_check              Optional built-in probes (tcp/http/ssh) on active allocations and/or a health_check command on non-removed allocations (probes first)
certs                     TLS definitions → generated at build, deployed under jobs/<job>/certs/ (see certs.md)

Manifest reference: manifest.md. Configuration: configuration.md. Command scripts: cli/hooks.md.
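
The rolling batch sizes (max_concurrent_starts, max_concurrent_upgrades) can be read as a simple chunking rule. The sketch below is an assumption-laden illustration: rolling_batches is a hypothetical helper, and treating 0 as "all at once" is documented only for max_concurrent_starts, so applying it to upgrades as well is a guess.

```python
def rolling_batches(allocations: list[str], batch_size: int) -> list[list[str]]:
    """Split allocations into consecutive batches of at most batch_size.

    batch_size <= 0 is treated as "all at once" (documented for
    max_concurrent_starts; assumed here for other batch settings).
    """
    if not allocations:
        return []
    if batch_size <= 0:
        return [list(allocations)]
    return [
        allocations[i:i + batch_size]
        for i in range(0, len(allocations), batch_size)
    ]
```

With three allocations and a batch size of 2, the first two would be processed together and the third in a second batch.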

Job lifecycle on workers

Each deploy wave rsyncs the job tree, then optionally runs Makefile targets on the worker:

  1. New allocation → make start
  2. Upgrade → depends on restart_policy:
    • always → make restart
    • reload → make reload, or make restart when a changed file matches restart_globs
    • never → no lifecycle (files only)

The Makefile receives CURRENT_VERSION (running) and NEW_VERSION (target). Custom rollouts use a job_control command instead of the default targets.
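
The choice of default Makefile target described above can be sketched as a small decision function. This is illustrative only: lifecycle_target is a hypothetical name, and matching restart_globs with Python's fnmatch (where * also matches /) is an assumption about glob semantics, not something the manifest reference confirms.

```python
from fnmatch import fnmatch
from typing import Optional, Sequence

def lifecycle_target(
    is_new: bool,
    restart_policy: str,
    changed_files: Sequence[str] = (),
    restart_globs: Sequence[str] = (),
) -> Optional[str]:
    """Return the make target to run after rsync, or None for files only."""
    if is_new:
        return "start"                       # new allocation
    if restart_policy == "always":
        return "restart"
    if restart_policy == "reload":
        forced = any(
            fnmatch(path, glob)
            for path in changed_files
            for glob in restart_globs
        )
        return "restart" if forced else "reload"
    if restart_policy == "never":
        return None                          # no lifecycle target
    raise ValueError(f"unknown restart_policy: {restart_policy}")
```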

Runtime state lives in data/, logs/, and bin/ on the worker — not in the workspace (build rejects those dirs in git).


Allocation

An allocation is the binding (job × worker): “run job api on worker 10.0.0.1.”

Maand creates allocations automatically during build by label matching:

  1. For each worker, collect its labels (including worker).
  2. For each job, use manifest selectors when set; otherwise use the job name as the selector.
  3. Require every selector to appear on the worker.
  4. Insert or update a row in allocations for each match.

workers.json              jobs/api/manifest.json
  labels: [worker,         selectors: [worker, prod]
           prod]                   │
       │                           │
       └──── all match? ───────────┘
                 │
                 ▼
        allocation (api @ 10.0.0.1)
        alloc_id = hash("api|10.0.0.1")

Dedicated jobs can omit manifest selectors when the worker carries the job name (for example job prometheus on a worker labeled prometheus).
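
The matching steps above can be sketched as follows. This is a simplified model, not maand's build code: build_allocations is a hypothetical name, and deriving alloc_id with uuid5 over "job|ip" is only one plausible way to get a stable UUID from the pair.

```python
import uuid

def build_allocations(workers: list[dict], jobs: list[dict]) -> list[dict]:
    """Create one allocation per (job, worker) pair whose selectors all match."""
    allocations = []
    for job in jobs:
        # Manifest selectors when set; otherwise the job name is the selector.
        selectors = job.get("selectors") or [job["name"]]
        for worker in workers:
            labels = set(worker.get("labels", [])) | {"worker"}
            if all(selector in labels for selector in selectors):
                key = f"{job['name']}|{worker['host']}"
                allocations.append({
                    "job": job["name"],
                    "worker_ip": worker["host"],
                    # Assumption: uuid5 stands in for maand's actual hash.
                    "alloc_id": str(uuid.uuid5(uuid.NAMESPACE_URL, key)),
                })
    return allocations
```

Because the id depends only on the job name and worker address, rebuilding the same workspace yields the same alloc_id values.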

Allocation fields

Column            Meaning
alloc_id          Stable UUID derived from job + worker IP
worker_ip / job   The pair
disabled          1 when excluded via disabled.json or resource rules
removed           1 when worker or job left the workspace (soft delete until deploy/GC)
deployment_seq    Wave order during deploy (from command demands)

Each allocation tracks catalog current_version (last promoted, in hash) and new_version (build target, in allocations). Job-level KV maand/job/<job>/version holds the manifest target. Templates and hooks expose running vs target via .CurrentVersion / .NewVersion and CURRENT_VERSION / NEW_VERSION — see deploy.md.

Active vs inactive

An allocation is active when removed = 0 and disabled = 0. A disabled allocation (removed = 0, disabled = 1) still gets build KV (certs, per-allocation metadata, deploy staging) and hook fan-out (post_build, pre_deploy, post_deploy, health_check commands, cli). Manifest health probes skip disabled allocations. Deploy never starts disabled allocations (no start/restart/reload/rsync). Content and version changes are still staged and promoted; after re-enable, deploy starts the allocation.

KV nuance: maand/job/<job>/workers, maand/job/<job>/rollout_order, and maand/worker/<ip>/jobs list active allocations only. Per-allocation keys such as peer_workers use non-removed peers (disabled peers may still appear). See KV namespaces.
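
The active/disabled/removed distinction reduces to two flags. The sketch below uses hypothetical helper names (allocation_state, job_workers) and plain dicts in place of catalog rows; it only models the rule that maand/job/<job>/workers lists active allocations.

```python
def allocation_state(alloc: dict) -> str:
    """Classify an allocation row by its removed and disabled flags."""
    if alloc.get("removed", 0):
        return "removed"
    if alloc.get("disabled", 0):
        return "disabled"
    return "active"

def job_workers(allocations: list[dict], job: str) -> list[str]:
    """Workers listed for a job: active allocations only."""
    return [
        a["worker_ip"]
        for a in allocations
        if a["job"] == job and allocation_state(a) == "active"
    ]
```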

Only active allocations receive rsync, lifecycle targets (start / restart / reload), and manifest health probes.

Inspect allocations:

maand cat allocations
maand cat allocations --jobs api --workers 10.0.0.1

Disabling without removing

workspace/disabled.json marks allocations disabled without deleting workspace files:

{
  "jobs": {
    "api": { "allocations": ["10.0.0.2"] }
  },
  "workers": ["10.0.0.3"]
}

Run maand build after editing disabled.json.
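
As a rough model of what build does with this file, the sketch below sets the disabled flag on allocation rows. apply_disabled is a hypothetical helper; it covers only the two shapes shown in the example (per-job allocations and whole workers) and ignores disabling an entire job and resource rules.

```python
import json

def apply_disabled(allocations: list[dict], disabled_text: str) -> list[dict]:
    """Set disabled = 1 on allocations named in disabled.json, else 0."""
    config = json.loads(disabled_text)
    disabled_workers = set(config.get("workers", []))
    per_job = {
        job: set(spec.get("allocations", []))
        for job, spec in config.get("jobs", {}).items()
    }
    for alloc in allocations:
        ip = alloc["worker_ip"]
        off = ip in disabled_workers or ip in per_job.get(alloc["job"], set())
        alloc["disabled"] = 1 if off else 0
    return allocations
```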

Full how-to (disable one allocation, entire job, entire worker, re-enable): disable and drain.


How the three relate

Concept      Question it answers          Example
Worker       Where can work run?          10.0.0.1 with labels worker, gpu
Job          What workload?               api: manifest, Makefile, files
Allocation   Which job on which worker?   api on 10.0.0.1, alloc_id=…

One job typically has many allocations (one per matching worker). One worker typically hosts many jobs (one allocation per job).


Deployment sequence

Jobs that depend on each other (via demands in manifest hooks) get a deployment_seq during build. Deploy processes sequence 0, then 1, and so on — so depended-on jobs reach workers before dependents.
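
One way to picture how sequence numbers fall out of dependencies is a level assignment over the dependency graph. This is a sketch under assumptions: deployment_seq here is a hypothetical function, and demands is modeled as a plain mapping from each job to the jobs it depends on, which may not match the manifest's actual shape.

```python
def deployment_seq(demands: dict[str, list[str]]) -> dict[str, int]:
    """Assign each job a sequence: 0 with no dependencies, else 1 + the max of its dependencies."""
    seq: dict[str, int] = {}
    visiting: set[str] = set()

    def visit(job: str) -> int:
        if job in seq:
            return seq[job]
        if job in visiting:                  # back edge: dependency cycle
            raise ValueError(f"dependency cycle at {job}")
        visiting.add(job)
        level = 0
        for dep in demands.get(job, []):
            level = max(level, visit(dep) + 1)
        visiting.discard(job)
        seq[job] = level
        return level

    for job in demands:
        visit(job)
    return seq
```

For example, if api depends on db and web depends on both, db lands in sequence 0, api in 1, and web in 2.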

Full reference: deployment-sequence.md.

Details in cli/build.md and cli/deploy.md.


Hooks vs Makefile vs run_command

Mechanism                              Runs on              Trigger                         Use case
Makefile (start/stop/restart/reload)   Worker               Deploy, maand job               Process lifecycle
Hooks (hook_*)                         CLI host             build/deploy/CLI/health_check   Migrations, KV, hooks
maand run_command                      Worker (raw shell)   Manual                          Ops, debugging
maand collect facts                    Worker (probe)       Manual                          Fill memory / cpu in workers.json

Hooks talk to maand’s runtime HTTP API and KV store on the CLI host. See cli/hooks.md and hook-api.md.


Typical state flow

edit workspace/workers.json, workspace/jobs/*
        │
        ▼
   maand build          ← catalog + KV + certs; no SSH to workers
        │
        ▼
   maand deploy         ← rsync + lifecycle (start/restart/reload) + hooks
        │
        ├── maand health_check
        ├── maand job restart <job>
        ├── maand hooks <cmd> [job]
        ├── maand collect facts --generate-workers > workspace/workers.json   # optional: refresh worker capacity
        ├── maand run_command "…"
        └── maand gc     ← after removals

Further reading