Core concepts
First time here? Read the guided tour (chapters 1–12) for a slower introduction. This page is a consolidated reference for bucket, worker, job, and allocation.
Maand is an agentless workload orchestrator. You run the maand CLI on a host machine (laptop, CI runner, or bastion). That host holds the bucket — local files and SQLite state. Workers are ordinary Linux hosts reached over SSH; nothing is installed on them except what you deploy (job files, runner.py, and runtime directories).
┌──────────────────── maand CLI host ────────────────────┐
│  bucket/                                               │
│    maand.db + KV     workspace/   secrets/   tmp/      │
│         │                │                             │
│         │  build reads   │                             │
│         ▼                ▼                             │
│    catalog (workers, jobs, allocations)                │
└────────────────────────┬───────────────────────────────┘
                         │  ssh + rsync (deploy, job control)
         ┌───────────────┼───────────────┐
         ▼               ▼               ▼
     Worker A        Worker B        Worker C
     /opt/worker/<bucket_id>/
Official site: maand.sh
Bucket
A bucket is one maand project directory. Run all maand commands from that directory (or with bucket.Location set to it in tests).
| Path | Role |
|---|---|
| maand.conf | SSH user/key, sudo, cert TTL, job_config_selector, log_format |
| data/maand.db | SQLite catalog: workers, jobs, allocations, hashes, KV history |
| workspace/ | Source of truth you edit: workers.json, jobs/<name>/, optional disabled.json |
| secrets/ | CA, worker SSH private key, KV encryption key |
| tmp/ | Staging for deploy and hook workspaces |
| logs/ | Structured command logs from deploy, rsync, SSH, hooks (logs/runs/<run_id>/ per invocation) |
Each bucket has a stable bucket_id (UUID) and an update_seq incremented on every deploy. Workers store both in worker.json so manual job control can detect drift.
Build reads workspace/ → updates maand.db and KV. Deploy reads the DB → rsyncs to workers. Workers are never the source of truth for catalog data.
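The drift check mentioned above can be sketched as follows. This is a minimal illustration, not maand's implementation: the function name `detect_drift` is hypothetical, and the only fields used are the ones the worker layout lists for worker.json (bucket_id, update_seq). How maand actually compares them is an assumption.

```python
import json


def detect_drift(worker_json_text: str, bucket_id: str, update_seq: int) -> list[str]:
    """Return human-readable drift reasons; an empty list means the worker is in sync.

    Hypothetical helper: compares the two values a worker stores in worker.json
    against the values held by the bucket on the CLI host.
    """
    state = json.loads(worker_json_text)
    reasons = []
    if state.get("bucket_id") != bucket_id:
        reasons.append("bucket_id mismatch: worker was deployed from another bucket")
    if state.get("update_seq") != update_seq:
        reasons.append(
            f"update_seq mismatch: worker has {state.get('update_seq')}, "
            f"bucket has {update_seq}"
        )
    return reasons


# Example: a worker that missed the most recent deploy (sample values only).
worker_json = json.dumps(
    {"bucket_id": "b-123", "worker_id": "w-1", "update_seq": 6, "labels": ["worker"]}
)
print(detect_drift(worker_json, bucket_id="b-123", update_seq=7))
```

A non-empty result would be a reason to redeploy before running manual job control against that worker.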
Worker
A worker is a cluster node — an SSH target identified by IP or hostname.
Definition
Workers are declared in workspace/workers.json:
[
  {
    "host": "10.0.0.1",
    "labels": ["worker", "gpu"],
    "memory": "8192 mb",
    "cpu": "4000 mhz",
    "tags": { "zone": "a", "rack": "1" }
  }
]
| Field | Meaning |
|---|---|
| host | SSH address (unique) |
| labels | Used for job placement; label worker is added automatically |
| memory / cpu | Capacity for resource validation when jobs declare limits |
| tags | Arbitrary metadata → KV namespace maand/worker/<ip>/tags/<key> |
| position | Order in the array (assigned on read) |
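The field rules in the table can be sketched in a few lines of Python. This is an illustrative reading, not maand's code: `load_workers` and `tag_keys` are hypothetical names, and the 0-based `position` is an assumption (the source only says it follows array order).

```python
import json


def load_workers(text: str) -> list[dict]:
    """Parse workers.json content and apply the documented field rules."""
    normalized = []
    seen_hosts = set()
    for position, raw in enumerate(json.loads(text)):
        host = raw["host"]
        if host in seen_hosts:
            # host must be unique across the file
            raise ValueError(f"duplicate host: {host}")
        seen_hosts.add(host)
        labels = list(raw.get("labels", []))
        if "worker" not in labels:
            labels.append("worker")  # the worker label is added automatically
        # position follows array order; 0-based here is an assumption
        normalized.append({**raw, "labels": labels, "position": position})
    return normalized


def tag_keys(worker: dict) -> dict[str, str]:
    """Map worker tags onto the maand/worker/<ip>/tags/<key> KV namespace."""
    host = worker["host"]
    return {f"maand/worker/{host}/tags/{k}": v for k, v in worker.get("tags", {}).items()}


sample = '[{"host": "10.0.0.1", "labels": ["gpu"], "tags": {"zone": "a", "rack": "1"}}]'
for w in load_workers(sample):
    print(w["labels"], w["position"], tag_keys(w))
```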
On the worker after deploy
/opt/worker/<bucket_id>/
├── worker.json # bucket_id, worker_id, update_seq, labels
├── jobs.json # list of jobs on this worker (+ disabled flag)
├── bin/runner.py # runs Makefile targets via SSH job control
└── jobs/<job>/ # staged job tree (Makefile, configs, data/, logs/, bin/)
Worker lifecycle
| Event | What happens |
|---|---|
| Add host to workers.json + build | New row in worker table |
| maand collect facts | Probe host over SSH; merge memory / cpu into workers.json (then maand build to sync catalog) |
| Deploy | Creates /opt/worker/<bucket_id>/, syncs jobs |
| Remove host from workers.json + build | Allocations marked removed; worker row dropped from catalog |
| Deploy after removal | Stop job; remove deployed files; keep data/ and logs/; clear allocation hash on deploy (redeploy starts fresh, reuses data/logs) |
| GC | Delete worker data/, logs/, and bin/; purge removed allocation rows and KV |
Workers do not run a maand agent. Deploy and maand job invoke runner.py over SSH; maand collect facts probes capacity into workers.json; maand run_command runs arbitrary shell commands over SSH.
Job
A job is a deployable unit — a directory under workspace/jobs/<name>/.
Required layout
workspace/jobs/api/
├── manifest.json # version, selectors, resources, commands, certs
├── Makefile # start / stop / restart (default deploy lifecycle)
├── _hooks/ # optional: hook_<name>.py | .ts | .js
├── config.tpl # optional: Go templates → rendered at deploy
└── … # other files copied into maand.db on build
Manifest highlights
| Field | Purpose |
|---|---|
| version | Semver-like release id; required when the job participates in the dependency graph; drives deploy new_version per allocation |
| selectors | Worker labels for placement; when omitted, the job name is used |
| resources.memory / cpu | Min/max bounds in the manifest; actual memory/CPU for the current environment comes from bucket.jobs.conf or bucket.jobs.<env>.conf (selected by job_config_selector in maand.conf); see resources-and-placement.md |
| resources.ports | Named ports: {} (maand assigns from bucket.conf pool) or an integer (fixed in manifest) |
| max_concurrent_upgrades | Rolling batch size for restart / reload during deploy |
| restart_policy | always / reload / never: how upgrades apply after rsync (deploy) |
| restart_globs | Optional; with reload, paths that force restart when changed |
| max_concurrent_starts | Rolling batch size for start on first deploy (0 = all at once) |
| hooks | Named hooks (hook_*) with executed_on events |
| health_check | Optional built-in probes (tcp/http/ssh) on active allocations and/or a health_check command on non-removed allocations (probes first) |
| certs | TLS definitions → generated at build, deployed under jobs/<job>/certs/; see certs.md |
Manifest reference: manifest.md. Configuration: configuration.md. Command scripts: cli/hooks.md.
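To make the rolling batch size concrete, here is a small sketch of how a list of target workers might be split into batches. The helper name `rolling_batches` is hypothetical. The "0 = all at once" rule is documented for max_concurrent_starts; applying the same rule to max_concurrent_upgrades, and treating negative values like 0, are assumptions.

```python
def rolling_batches(workers: list[str], max_concurrent: int) -> list[list[str]]:
    """Split workers into consecutive batches of at most max_concurrent hosts.

    A value of 0 means a single batch containing every worker.
    Negative values are treated like 0 (assumption, not documented).
    """
    if not workers:
        return []
    if max_concurrent <= 0:
        return [list(workers)]
    return [
        workers[i : i + max_concurrent]
        for i in range(0, len(workers), max_concurrent)
    ]


hosts = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
print(rolling_batches(hosts, 2))  # two batches: first two hosts, then the last one
print(rolling_batches(hosts, 0))  # one batch with all hosts
```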
Job lifecycle on workers
Each deploy wave rsyncs the job tree, then optionally runs Makefile targets on the worker:
- New allocation → make start
- Upgrade → depends on restart_policy:
  - always → make restart
  - reload → make reload, or make restart when a changed file matches restart_globs
  - never → no lifecycle (files only)
The Makefile receives CURRENT_VERSION (running) and NEW_VERSION (target). Custom rollouts use a job_control command instead of the default targets.
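The target selection described above can be written as a small decision function. This is a sketch under stated assumptions: `lifecycle_target` is a hypothetical name, and glob matching uses Python's `fnmatch`, which may differ from the pattern semantics maand applies to restart_globs.

```python
from fnmatch import fnmatch
from typing import Iterable, Optional


def lifecycle_target(
    is_new_allocation: bool,
    restart_policy: str,
    changed_files: Iterable[str] = (),
    restart_globs: Iterable[str] = (),
) -> Optional[str]:
    """Return the Makefile target to run after rsync, or None for files only."""
    if is_new_allocation:
        return "start"
    if restart_policy == "always":
        return "restart"
    if restart_policy == "reload":
        globs = list(restart_globs)
        # Any changed path matching a restart glob upgrades reload to restart.
        if any(fnmatch(path, glob) for path in changed_files for glob in globs):
            return "restart"
        return "reload"
    if restart_policy == "never":
        return None
    raise ValueError(f"unknown restart_policy: {restart_policy!r}")


print(lifecycle_target(True, "never"))
print(lifecycle_target(False, "reload", ["config/app.yaml"], ["config/*"]))
print(lifecycle_target(False, "reload", ["static/logo.png"], ["config/*"]))
```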
Runtime state lives in data/, logs/, and bin/ on the worker — not in the workspace (build rejects those dirs in git).
Allocation
An allocation is the binding (job × worker): “run job api on worker 10.0.0.1.”
Maand creates allocations automatically during build by label matching:
- For each worker, collect its labels (including worker).
- For each job, use manifest selectors when set; otherwise use the job name as the selector.
- Require every selector to appear on the worker.
- Insert or update a row in allocations for each match.
workers.json                  jobs/api/manifest.json
labels: [worker, prod]        selectors: [worker, prod]
    │                           │
    └──── all match? ───────────┘
                  │
                  ▼
      allocation (api @ 10.0.0.1)
      alloc_id = hash("api|10.0.0.1")
Dedicated jobs can omit manifest selectors when the worker carries the job name (for example job prometheus on a worker labeled prometheus).
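The label-matching steps can be sketched as below. This is an illustration rather than maand's build code: `build_allocations` is a hypothetical name, jobs are represented as plain dicts with `name` and optional `selectors`, and `uuid.uuid5` over "job|ip" stands in for whatever hash maand uses to derive a stable alloc_id.

```python
import uuid


def build_allocations(workers: list[dict], jobs: list[dict]) -> list[dict]:
    """Create one allocation per (job, worker) pair whose selectors all match."""
    allocations = []
    for job in jobs:
        # Manifest selectors when set; otherwise the job name is the selector.
        selectors = job.get("selectors") or [job["name"]]
        for worker in workers:
            # The worker label is always present on every worker.
            labels = set(worker.get("labels", [])) | {"worker"}
            if all(selector in labels for selector in selectors):
                key = f"{job['name']}|{worker['host']}"
                allocations.append(
                    {
                        # Assumption: uuid5 gives a stable id for the same pair.
                        "alloc_id": str(uuid.uuid5(uuid.NAMESPACE_URL, key)),
                        "job": job["name"],
                        "worker_ip": worker["host"],
                    }
                )
    return allocations


workers = [
    {"host": "10.0.0.1", "labels": ["prod"]},
    {"host": "10.0.0.2", "labels": ["prometheus"]},
]
jobs = [
    {"name": "api", "selectors": ["worker", "prod"]},
    {"name": "prometheus"},  # no selectors: matches workers labeled prometheus
]
for alloc in build_allocations(workers, jobs):
    print(alloc["job"], "@", alloc["worker_ip"])
```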
Allocation fields
| Column | Meaning |
|---|---|
| alloc_id | Stable UUID derived from job + worker IP |
| worker_ip / job | The pair |
| disabled | 1 when excluded via disabled.json or resource rules |
| removed | 1 when worker or job left the workspace (soft delete until deploy/GC) |
| deployment_seq | Wave order during deploy (from command demands) |
Each allocation tracks catalog current_version (last promoted, in hash) and new_version (build target, in allocations). Job-level KV maand/job/<job>/version holds the manifest target. Templates and hooks expose running vs target via .CurrentVersion / .NewVersion and CURRENT_VERSION / NEW_VERSION — see deploy.md.
Active vs inactive
An allocation is active when removed = 0 and disabled = 0. A disabled allocation (removed = 0, disabled = 1) still gets build KV (certs, per-allocation metadata, deploy staging) and hook fan-out (post_build, pre_deploy, post_deploy, health_check commands, cli). Manifest health probes skip disabled allocations. Deploy never starts disabled allocations (no start/restart/reload/rsync). Content and version changes are still staged and promoted; after re-enable, deploy starts the allocation.
KV nuance: maand/job/<job>/workers, maand/job/<job>/rollout_order, and maand/worker/<ip>/jobs list active allocations only. Per-allocation keys such as peer_workers use non-removed peers (disabled peers may still appear). See KV namespaces.
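The difference between "active" and "non-removed" can be shown with two filters over allocation rows. This is a sketch only: the row shape mirrors the allocation columns (job, worker_ip, disabled, removed), the function names are hypothetical, and excluding the allocation's own worker from its peer list is an assumption.

```python
def is_active(alloc: dict) -> bool:
    """Active means neither removed nor disabled."""
    return alloc["removed"] == 0 and alloc["disabled"] == 0


def job_workers(allocations: list[dict], job: str) -> list[str]:
    """Workers for a job-level list: active allocations only."""
    return [a["worker_ip"] for a in allocations if a["job"] == job and is_active(a)]


def peer_workers(allocations: list[dict], job: str, worker_ip: str) -> list[str]:
    """Per-allocation peers: non-removed allocations, so disabled peers still appear."""
    return [
        a["worker_ip"]
        for a in allocations
        if a["job"] == job and a["removed"] == 0 and a["worker_ip"] != worker_ip
    ]


rows = [
    {"job": "api", "worker_ip": "10.0.0.1", "disabled": 0, "removed": 0},
    {"job": "api", "worker_ip": "10.0.0.2", "disabled": 1, "removed": 0},
    {"job": "api", "worker_ip": "10.0.0.3", "disabled": 0, "removed": 1},
]
print(job_workers(rows, "api"))
print(peer_workers(rows, "api", "10.0.0.1"))
```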
Only active allocations receive:
- Deploy rsync and lifecycle (start/restart/reload) plus rollout hooks that run on workers
- maand hooks, maand health_check
- Default targets for maand job start|stop|restart|run --target reload
Inspect allocations:
maand cat allocations
maand cat allocations --jobs api --workers 10.0.0.1
Disabling without removing
workspace/disabled.json marks allocations disabled without deleting workspace files:
{
  "jobs": {
    "api": { "allocations": ["10.0.0.2"] }
  },
  "workers": ["10.0.0.3"]
}
- Disable all instances of a job: "api": {}
- Disable one worker for a job: "allocations": ["10.0.0.2"]
- Disable every job on a worker: "workers": ["10.0.0.3"]
Run maand build after editing disabled.json.
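The three disabled.json forms can be read as a single lookup. The sketch below is one plausible interpretation, not maand's implementation: `is_disabled` is a hypothetical name, and treating an explicitly empty "allocations" list as "disable nothing" is an assumption the source does not cover.

```python
import json


def is_disabled(disabled: dict, job: str, worker_ip: str) -> bool:
    """Decide whether the (job, worker) allocation is disabled by disabled.json."""
    # A worker listed under "workers" disables every job on it.
    if worker_ip in disabled.get("workers", []):
        return True
    entry = disabled.get("jobs", {}).get(job)
    if entry is None:
        return False
    # "api": {} (no "allocations" key) disables all instances of the job.
    if "allocations" not in entry:
        return True
    # Otherwise only the listed workers are disabled for this job.
    return worker_ip in entry["allocations"]


config = json.loads(
    '{"jobs": {"api": {"allocations": ["10.0.0.2"]}}, "workers": ["10.0.0.3"]}'
)
for ip in ("10.0.0.1", "10.0.0.2", "10.0.0.3"):
    print("api", ip, is_disabled(config, "api", ip))
```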
Full how-to (disable one allocation, entire job, entire worker, re-enable): disable and drain.
How the three relate
| Concept | Question it answers | Example |
|---|---|---|
| Worker | Where can work run? | 10.0.0.1 with labels worker, gpu |
| Job | What workload? | api — manifest, Makefile, files |
| Allocation | Which job on which worker? | api on 10.0.0.1, alloc_id=… |
One job typically has many allocations (one per matching worker). One worker typically hosts many jobs (one allocation per job).
Deployment sequence
Jobs that depend on each other (via demands in manifest hooks) get a deployment_seq during build. Deploy processes sequence 0, then 1, and so on — so depended-on jobs reach workers before dependents.
Full reference: deployment-sequence.md.
Details in cli/build.md and cli/deploy.md.
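One way to picture deployment_seq is as the depth of each job in the dependency graph: jobs with no demands get 0, and every other job gets one more than the highest sequence among the jobs it demands. The sketch below implements that reading. The function name and the `demands` input shape (job name mapped to the set of jobs it depends on) are assumptions, and maand's actual algorithm may differ.

```python
def deployment_sequence(demands: dict[str, set[str]]) -> dict[str, int]:
    """Assign each job a wave number so depended-on jobs always come first."""
    seq: dict[str, int] = {}
    in_progress: set[str] = set()

    def visit(job: str) -> int:
        if job in seq:
            return seq[job]
        if job in in_progress:
            raise ValueError(f"dependency cycle involving {job!r}")
        in_progress.add(job)
        level = 0
        for dependency in demands.get(job, ()):
            # A job deploys one wave after its latest dependency.
            level = max(level, visit(dependency) + 1)
        in_progress.discard(job)
        seq[job] = level
        return level

    for job in demands:
        visit(job)
    return seq


graph = {"db": set(), "api": {"db"}, "web": {"api", "db"}}
print(deployment_sequence(graph))
```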
Hooks vs Makefile vs run_command
| Mechanism | Runs on | Trigger | Use case |
|---|---|---|---|
| Makefile (start/stop/restart/reload) | Worker | Deploy, maand job | Process lifecycle |
| Hooks (hook_*) | CLI host | build/deploy/CLI/health_check | Migrations, KV, hooks |
| maand run_command | Worker (raw shell) | Manual | Ops, debugging |
| maand collect facts | Worker (probe) | Manual | Fill memory / cpu in workers.json |
Hooks talk to maand’s runtime HTTP API and KV store on the CLI host. See cli/hooks.md and hook-api.md.
Typical state flow
edit workspace/workers.json, workspace/jobs/*
        │
        ▼
   maand build      ← catalog + KV + certs; no SSH to workers
        │
        ▼
   maand deploy     ← rsync + lifecycle (start/restart/reload) + hooks
        │
        ├── maand health_check
        ├── maand job restart <job>
        ├── maand hooks <cmd> [job]
        ├── maand collect facts --generate-workers > workspace/workers.json   # optional: refresh worker capacity
        ├── maand run_command "…"
        └── maand gc     ← after removals
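For CI runners, the first two steps of this flow could be wrapped in a short script. The sketch below only uses the commands named above (maand build, maand deploy) and runs them from the bucket directory; the function name `build_and_deploy`, the injectable `run` parameter, and the example path are illustrative assumptions.

```python
import subprocess
from typing import Callable


def build_and_deploy(bucket_dir: str, run: Callable = subprocess.run) -> None:
    """Run maand build, then maand deploy, from the bucket directory.

    check=True stops the flow if build fails, so deploy never runs
    against a catalog that did not build cleanly.
    """
    for command in (["maand", "build"], ["maand", "deploy"]):
        run(command, cwd=bucket_dir, check=True)


# Usage (requires the maand CLI on PATH; the path is a placeholder):
# build_and_deploy("/path/to/bucket")
```

Passing a fake `run` callable makes the ordering easy to verify in tests without touching any worker.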
Further reading
- overview.md — capabilities and limits
- quickstart.md — hands-on first deploy
- reference/README.md — configuration, manifest, CLI, KV, logging
- ../README.md — full doc index