Hook runtime API

Python and Bun hook scripts reach maand through an in-process HTTP API on the CLI host. For event names and guidance on when to use hooks, see cli/hooks.md.

Execution model

Open DB + kv.Initialize
StartRuntimeAPI (HTTP on localhost:8080 in maand process)
Resolve worker set for the event (active, allocated, rollout batch, or filtered)
Resolve rollout order (rollout_order KV when valid, else catalog order)
Split workers into batches (max_concurrent_upgrades, or max_concurrent_starts for first-start hooks)
For each batch (sequential):
  For each allocation in the batch (parallel within batch):
    Stage tmp/workers/<ip>/jobs/<job>/ from job_files + certs from KV
    Run script on CLI host via bash (python3 or bun)
    Script reaches API at HOOK_API_HOST=127.0.0.1
Commit (CLI path) or return error to caller (deploy / health_check)

Important details:

Where code runs: bash on the CLI host, with the bucket root as the working directory. Output is written as structured lines to logs/<worker_ip>.log (or logs/maand.log for bucket-local output). Each line includes a timestamp, run id, command phase, and payload. Per-run copies live under logs/runs/<run_id>/.
Per-allocation context: one process per allocation; env vars identify worker IP and allocation UUID even though the process is local.
Worker access: use Python run_ssh, run_runner_target, or run_make_target to execute on ALLOCATION_IP. Bun scripts must shell out to ssh themselves or call a Python helper.
Staging: tmp/workers/<ip>/jobs/<job>/ mirrors deploy layout (files + certs) so scripts can read local copies if needed.
Batching: every hook event uses the same rollout order and batch width rules (see below). Batches run one after another; allocations inside a batch run in parallel.

Environment variables (hook scripts)

Per allocation

Variable	Meaning
`ALLOCATION_ID`	Stable allocation UUID
`ALLOCATION_IP`	Worker host for this invocation
`ALLOCATION_INDEX`	Zero-based index among non-removed peers (same as `<job>_allocation_index` in per-allocation KV)
`JOB`	Job name
`EVENT`	Event name (`pre_deploy`, `cli`, …)
`COMMAND`	Command name
`DISABLED`	`0` or `1` — allocation marked disabled in catalog
`CURRENT_VERSION`	Running version on this allocation (`0.0.0` before first promote)
`NEW_VERSION`	Target version from the current build/deploy plan
`HOOK_API_HOST`	Host to reach runtime API (`127.0.0.1`)
`BUCKET_ROOT`	Absolute path to the maand bucket on the CLI host

The target version is available as the job-level KV key maand/job/<job>/version and as the env var NEW_VERSION. For rollout logic, the running and target versions live in the catalog (hash.current_version, allocations.new_version) and in the template fields .CurrentVersion / .NewVersion.

Per batch

Every hook invocation also receives batch context for the current wave:

Variable	Meaning
`BATCH_ALLOCATIONS`	Comma-separated worker IPs in this batch
`BATCH_INDEX`	Zero-based batch index
`BATCH_COUNT`	Total batches in this run
`DEPLOY_PHASE`	Phase label (`new`, `update`, `stop`, `health_check`, `pre_deploy`, `job_control`, `cli`, …)
`ROLLOUT_ORDER`	Full comma-separated order list for this worker set
`ROLLOUT_ORDER_SOURCE`	`kv` or `default`

JOB is set on every script; batch hooks set it again alongside the batch vars above.
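As a rough illustration of how a script might consume the batch variables above, the sketch below parses them into one object using only the standard library. The `BatchContext` class, the `read_batch_context` name, and the fallback defaults for missing variables are assumptions made for this example; they are not part of maand.py.

```python
import os
from dataclasses import dataclass
from typing import List, Mapping


@dataclass(frozen=True)
class BatchContext:
    allocations: List[str]    # worker IPs in the current batch
    index: int                # zero-based batch index
    count: int                # total batches in this run
    phase: str                # DEPLOY_PHASE label
    rollout_order: List[str]  # full order for this worker set
    order_source: str         # "kv" or "default"


def _split_csv(value: str) -> List[str]:
    # An empty or missing value yields [] rather than [""].
    return [item.strip() for item in value.split(",") if item.strip()]


def read_batch_context(env: Mapping[str, str] = os.environ) -> BatchContext:
    # Hypothetical helper: the defaults below are assumptions for running
    # the script outside a hook session, not documented maand behavior.
    return BatchContext(
        allocations=_split_csv(env.get("BATCH_ALLOCATIONS", "")),
        index=int(env.get("BATCH_INDEX", "0")),
        count=int(env.get("BATCH_COUNT", "1")),
        phase=env.get("DEPLOY_PHASE", ""),
        rollout_order=_split_csv(env.get("ROLLOUT_ORDER", "")),
        order_source=env.get("ROLLOUT_ORDER_SOURCE", "default"),
    )
```

Passing a plain dict instead of `os.environ` makes the parsing easy to exercise outside a hook run.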

Event-specific

Event	Extra variables
Deploy `job_control`	`NEW_ALLOCATIONS`, `UPDATED_ALLOCATIONS`
`maand job` / `jobcontrol` `job_control`	`TARGET` (`start`, `stop`, `restart`, `reload`, or a Makefile target)

Batching

Worker order comes from rollout_order in KV when the list matches the current worker set; otherwise maand uses catalog order. Override order in pre_deploy or cli with put_rollout_order() — see kv/namespaces.md.

Batch width comes from the job manifest:

Phase	Manifest field
First deploy starts and `after_allocation_started` on start	`max_concurrent_starts` (`0` = one batch of all new allocations)
Everything else (upgrades, hooks, health commands, `cli`, `pre_deploy`, …)	`max_concurrent_upgrades` (minimum 1)
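The order and width rules above can be sketched as two small functions. This is an approximation for reasoning about batches, not maand's implementation: the function names are hypothetical, and "matches the current worker set" is assumed to mean the KV list names exactly the same IPs as the catalog.

```python
from typing import List, Optional, Sequence, Tuple


def resolve_rollout_order(
    catalog_order: Sequence[str], kv_order: Optional[Sequence[str]]
) -> Tuple[List[str], str]:
    """Return (order, source), where source is "kv" or "default"."""
    # Assumption: a KV list is valid only if it names exactly the
    # current workers (same IPs, no extras, no omissions).
    if kv_order and sorted(kv_order) == sorted(catalog_order):
        return list(kv_order), "kv"
    return list(catalog_order), "default"


def split_batches(
    order: Sequence[str], width: int, first_start: bool = False
) -> List[List[str]]:
    """Split an ordered worker list into sequential batches."""
    if not order:
        return []
    # max_concurrent_starts = 0 means one batch holding every new allocation.
    if first_start and width == 0:
        return [list(order)]
    # max_concurrent_upgrades has a minimum of 1.
    width = max(1, width)
    return [list(order[i:i + width]) for i in range(0, len(order), width)]
```

For three workers and a width of 2, `split_batches` yields one batch of two followed by one batch of one.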

Event	Workers included
`pre_deploy`, `post_deploy`, `post_build`, `cli`	All non-removed allocations (includes disabled)
`health_check` (commands)	All non-removed allocations (includes disabled)
`after_allocation_started`	Workers in the current Makefile rollout batch
`after_allocation_stopped`	Stopped allocations for the job (grouped and batched per job during reconcile)
Deploy `job_control`	All allocated workers (including disabled)
`jobcontrol` `job_control`	Active workers matching `--allocations` filter

Manifest health_check probes (tcp / http / ssh) run on active allocations only, in rollout order with max_concurrent_upgrades batch width. Probes are internal maand checks — they do not run user scripts and do not receive batch env vars. health_check commands use non-removed allocations (includes disabled) with the same order and batch width.

Within a batch, all allocations (and all probe checks × workers in that batch) run in parallel. The next batch starts only after the current batch succeeds.

Use acquire_semaphore when parallel allocations must serialize a critical section (migrations, rate-limited APIs). Failures aggregate into a run error listing per-worker errors.
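The scheduling behavior described above (sequential batches, parallel allocations inside a batch, per-worker errors collected into one run error) can be modelled roughly as follows. `RunError`, `run_batches`, and the `hook` callable are hypothetical names for this sketch; maand itself runs scripts as separate processes rather than threads.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


class RunError(Exception):
    """Aggregates per-worker failures from one batch."""

    def __init__(self, errors: Dict[str, str]):
        self.errors = dict(errors)
        detail = "; ".join(f"{ip}: {msg}" for ip, msg in sorted(self.errors.items()))
        super().__init__(detail)


def run_batches(batches: List[List[str]], hook: Callable[[str], None]) -> None:
    for batch in batches:
        if not batch:
            continue
        # Every allocation in the batch runs in parallel.
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = {ip: pool.submit(hook, ip) for ip in batch}
        # Leaving the "with" block waits for all futures to finish.
        errors = {}
        for ip, future in futures.items():
            exc = future.exception()
            if exc is not None:
                errors[ip] = str(exc)
        # The next batch starts only if the current one succeeded.
        if errors:
            raise RunError(errors)
```

A failure in the first batch means later batches never start, which is why a critical section shared by parallel allocations still needs a semaphore rather than relying on batch order.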

For a single batch run (BATCH_INDEX=0, BATCH_COUNT=1), see guides/hook-one-shot.md. For run-once logic inside a hook script, use is_one_shot() / isOneShot() — see hook-one-shot.md#guard-in-the-script.

Runtime HTTP API

hooks.StartRuntimeAPI(tx) starts the API, which stays up for the lifetime of build, deploy, health_check, and hook sessions.

Property	Value
Listen address	`localhost:8080` (not exposed outside the host)
Request body	JSON, `Content-Type: application/json`
Allocation scope	Header `X-ALLOCATION-ID` (required on every route)
Event scope	Header `EVENT` (required; must match the running hook)
Command scope	Header `COMMAND` (required; used by `/demands` and semaphore scoping)

Embedded maand.py / maand.ts set these headers automatically from env vars.
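For scripts that cannot use the embedded helpers, a raw request needs the same headers. The sketch below builds (but does not send) a GET /kv request with the standard library. The function names are hypothetical, and port 8080 is taken from the listen address above; the embedded helpers may construct their requests differently.

```python
import json
import os
import urllib.request
from typing import Dict, Mapping


def runtime_headers(env: Mapping[str, str] = os.environ) -> Dict[str, str]:
    # All three scope headers are required on every route.
    return {
        "Content-Type": "application/json",
        "X-ALLOCATION-ID": env["ALLOCATION_ID"],
        "EVENT": env["EVENT"],
        "COMMAND": env["COMMAND"],
    }


def build_kv_get(
    namespace: str,
    key: str,
    env: Mapping[str, str] = os.environ,
    port: int = 8080,  # assumption: matches the documented listen address
) -> urllib.request.Request:
    host = env.get("HOOK_API_HOST", "127.0.0.1")
    body = json.dumps({"namespace": namespace, "key": key}).encode("utf-8")
    # GET /kv carries its arguments in a JSON body, so method is set explicitly.
    return urllib.request.Request(
        f"http://{host}:{port}/kv",
        data=body,
        headers=runtime_headers(env),
        method="GET",
    )
```

Inside a running hook session the request could then be sent with `urllib.request.urlopen(build_kv_get("vars/job/api", "db_url"))`.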

Endpoint summary

Method	Path	Purpose
GET	`/kv`	Read a key from an allowed namespace
PUT	`/kv`	Write a non-secret key under `vars/job/<current job>` only
DELETE	`/kv`	Delete a key under `vars/job/<current job>` only
PUT	`/kv/secret`	Write encrypted secret under `secrets/job/<current job>`
DELETE	`/kv/secret`	Delete secret under `secrets/job/<current job>`
GET	`/kv/keys`	List keys under job-level namespaces
GET	`/demands`	List downstream hooks that depend on this job+hook
POST	`/semaphore/acquire`	Block until this allocation holds a slot
POST	`/semaphore/release`	Release a held slot
GET	`/semaphore/status?name=...`	Inspect holders and waiters

KV read vs write

Namespace pattern	GET `/kv`	PUT/DELETE `/kv`	PUT/DELETE `/kv/secret`
`maand/bucket`, `maand/worker`, `maand/worker/<ip>`, tags	✓	✗	✗
`vars/bucket`, `vars/bucket/job/<job>`	✓	✗	✗
`maand/job/<job>/worker/<ip>`	✓	✗	✗
`maand/job/<job>` key `rollout_order` only	✓	✓ on `pre_deploy` or `cli`	✗
`vars/job/<job>` (current job)	✓	✓	✗
`secrets/job/<job>` (current job)	✓ (decrypted)	✗	✓
Upstream demand jobs (`maand/job/*`, `vars/job/*`, `secrets/job/*`)	✓ if declared in manifest `demands`	✗	✗

Writes to other maand/* keys, vars/bucket/*, and upstream jobs are rejected. Use put_rollout_order / putRolloutOrder (or PUT /kv on maand/job/<job> + key rollout_order) to override rollout order for one deploy; build resets it on the next maand build.
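The write columns of the table can be restated as a client-side pre-check, which may help a script fail early with a clearer message. This is only a sketch of the documented rules (including the health_check restriction); the function names are hypothetical and the runtime API remains the authority on what is accepted.

```python
def plain_write_allowed(namespace: str, key: str, job: str, event: str) -> bool:
    """Mirror the PUT/DELETE /kv column for the current job."""
    # KV writes are rejected during health_check.
    if event == "health_check":
        return False
    # Non-secret job variables are writable for the current job only.
    if namespace == f"vars/job/{job}":
        return True
    # rollout_order is the single writable key under maand/job/<job>,
    # and only from pre_deploy or cli.
    if namespace == f"maand/job/{job}" and key == "rollout_order":
        return event in ("pre_deploy", "cli")
    return False


def secret_write_allowed(namespace: str, job: str, event: str) -> bool:
    """Mirror the PUT/DELETE /kv/secret column for the current job."""
    return event != "health_check" and namespace == f"secrets/job/{job}"
```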

PUT rollout_order body:

{
  "namespace": "maand/job/api",
  "key": "rollout_order",
  "value": "10.0.0.2,10.0.0.1"
}

from maand import put_rollout_order

put_rollout_order(["10.0.0.2", "10.0.0.1"]).raise_for_status()

GET /kv body:

{ "namespace": "vars/job/api", "key": "db_url" }

Response (200):

{ "namespace": "vars/job/api", "key": "db_url", "value": "postgres://..." }

PUT /kv body (value required; optional ttl in seconds, 0 = no expiry):

{ "namespace": "vars/job/api", "key": "db_url", "value": "postgres://...", "ttl": 3600 }

PUT /kv/secret body (optional ttl in seconds):

{ "namespace": "secrets/job/api", "key": "db_password", "value": "plain-text-secret", "ttl": 86400 }

Values are encrypted with AES-256-GCM using secrets/kv.key before storage in maand.db. Expired keys are tombstoned on maand build; maand gc purges tombstones per --retain-days — see kv/persistence.md.

DELETE /kv and /kv/secret use the same JSON body as GET (namespace + key; no value).

GET /kv/keys — optional body { "namespace": "vars/job/api" }. Omit namespace to list both vars/job/<job> and secrets/job/<job> (secret listing returns key names only, never values).

KV writes during `health_check`

PUT, DELETE, and /kv/secret writes are rejected when the EVENT header is health_check. Health check scripts must be read-only with respect to KV — use pre_deploy or post_deploy to create or update vars and secrets.

`GET /demands`

Returns hooks whose manifest demands point at this job and this command name (reverse dependency lookup).

Response (200) — array of:

{
  "job": "api",
  "hook": "hook_migrate",
  "demand_config": { "min_version": "2.0.0" }
}

When to use: a shared upstream command (e.g. hook_schema on database) can inspect who depends on it and tailor behavior using demand_config (feature flags, schema versions, etc.).
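As one possible way to tailor behavior, an upstream command could pick the strictest requirement across its dependents. The sketch below assumes every dependent uses a `min_version` key in `demand_config`, as in the sample response; that key and the helper names are illustrative only, since `demand_config` contents are defined by each manifest.

```python
import json
from typing import Any, Dict, List, Optional, Tuple

# Sample payload shaped like the documented GET /demands response.
SAMPLE = json.loads(
    '[{"job": "api", "hook": "hook_migrate",'
    ' "demand_config": {"min_version": "2.0.0"}}]'
)


def parse_version(text: str) -> Tuple[int, ...]:
    # "2.10.0" -> (2, 10, 0), so comparison is numeric, not lexical.
    return tuple(int(part) for part in text.split("."))


def highest_min_version(demands: List[Dict[str, Any]]) -> Optional[str]:
    """Return the largest min_version requested by any dependent hook."""
    versions = []
    for demand in demands:
        config = demand.get("demand_config") or {}
        if "min_version" in config:
            versions.append(config["min_version"])
    return max(versions, key=parse_version) if versions else None
```

In a hook, the list would come from the parsed body of `list_hook_demands()` instead of `SAMPLE`.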

Semaphores

Semaphores coordinate cross-allocation locks inside one hook session. They are scoped by job + EVENT + name, so the same name under pre_deploy and post_deploy refers to two independent semaphores.

Field	Default	Limit
`capacity`	1	1–64
`timeout_seconds`	600	max 3600

POST /semaphore/acquire body:

{ "name": "migration", "capacity": 1, "timeout_seconds": 600 }

Response (200):

{
  "name": "migration",
  "allocation_id": "<uuid>",
  "capacity": 1,
  "acquired": true
}

POST /semaphore/release body: { "name": "migration" }

GET /semaphore/status?name=migration response:

{
  "name": "migration",
  "capacity": 1,
  "holders": ["<allocation-uuid>"],
  "waiting": 0,
  "available": 0
}

When to use semaphores:

Pattern	`capacity`	Example
Single-writer migration	1	Only one allocation runs DDL at a time
Rolling batch	N	Allow N concurrent restarts against an external API
Leader bootstrap	1	First acquirer writes shared KV, others read

Always call release_semaphore in a finally block so a failed script does not hold the lock for the rest of the deploy session.

Example (Python):

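from maand import acquire_semaphore, release_semaphore, allocation_ip

def run_migration_for(worker_ip):
    """Placeholder: replace with the job's own migration logic."""

# Block until this allocation holds the only slot (up to 15 minutes).
acquire_semaphore("migrate", capacity=1, timeout_seconds=900).raise_for_status()
try:
    run_migration_for(allocation_ip())
finally:
    # Release even if the migration raises.
    release_semaphore("migrate")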

Semaphores exist only in memory for the current maand process session — they do not survive CLI restart.
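The acquire / try / finally pattern can be wrapped in a context manager so the release cannot be forgotten. In this sketch the acquire and release callables are passed in, which keeps it runnable without maand.py; `held_semaphore` is a hypothetical name, and the only assumption about the acquire result is that it offers `raise_for_status()`, as in the example above.

```python
from contextlib import contextmanager
from typing import Any, Callable, Iterator


@contextmanager
def held_semaphore(
    name: str,
    acquire: Callable[..., Any],
    release: Callable[[str], Any],
    **options: Any,
) -> Iterator[None]:
    """Hold a named semaphore for the duration of a with-block."""
    # Fail before entering the block if the slot was not granted.
    acquire(name, **options).raise_for_status()
    try:
        yield
    finally:
        # Runs on success and on exceptions raised inside the block.
        release(name)
```

In a hook script this might be used as `with held_semaphore("migrate", acquire_semaphore, release_semaphore, capacity=1):` after importing both helpers from maand.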

Python and Bun helpers

Embedded maand.py / maand.ts wrap the HTTP API. Prefer these over raw HTTP.

Context (env)

Python	Bun	Returns
`allocation_id()`	`allocationId()`	`ALLOCATION_ID`
`allocation_ip()`	`allocationIp()`	`ALLOCATION_IP`
`allocation_index()`	`allocationIndex()`	`ALLOCATION_INDEX`
`job_name()`	`jobName()`	`JOB`
`hook_event()`	`hookEvent()`	`EVENT`
`hook_name()`	`hookName()`	`HOOK`
`is_allocation_disabled()`	`isAllocationDisabled()`	`DISABLED == "1"`
`is_one_shot()`	`isOneShot()`	`BATCH_INDEX == "0" && ALLOCATION_INDEX == "0"` (leader; run once per invocation)

Aliases: get_allocation_id, get_job, kv_get, etc. (both runtimes).
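Going by the Returns column, the one-shot check reduces to a comparison of two env vars. The sketch below reproduces that condition with the standard library so it can be reasoned about outside a hook session; `is_leader_invocation` is a hypothetical name, and the embedded `is_one_shot()` may differ in details.

```python
import os
from typing import Mapping


def is_leader_invocation(env: Mapping[str, str] = os.environ) -> bool:
    """True only for allocation index 0 in the first batch."""
    # Both values are strings in the environment, so compare against "0".
    return env.get("BATCH_INDEX") == "0" and env.get("ALLOCATION_INDEX") == "0"
```

A script would typically wrap run-once work in `if is_one_shot():` and let every other allocation skip it.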

KV

Python	Bun	API
`get_store_value(ns, key)`	`getStoreValue(ns, key)`	GET `/kv` → `Response`
`get_kv_value(ns, key)`	(parse JSON yourself)	GET `/kv` → plaintext `value`
`put_rollout_order(order)`	`putRolloutOrder(order)`	PUT `/kv` → `maand/job/<job>/rollout_order`
`get_rollout_order()`	`getRolloutOrder()`	GET `/kv` → `rollout_order`
`put_job_variable(key, val)`	`putJobVariable(key, val)`	PUT `/kv`
`put_job_secret(key, val)`	`putJobSecret(key, val)`	PUT `/kv/secret`
`delete_job_variable(key)`	`deleteJobVariable(key)`	DELETE `/kv`
`delete_job_secret(key)`	`deleteJobSecret(key)`	DELETE `/kv/secret`
`list_job_keys(ns=None)`	`listJobKeys(ns?)`	GET `/kv/keys`

Demands and semaphores

Python	Bun	API
`list_hook_demands()`	`listHookDemands()`	GET `/demands`
`acquire_semaphore(name, capacity=1, timeout_seconds=600)`	`acquireSemaphore(...)`	POST `/semaphore/acquire`
`release_semaphore(name)`	`releaseSemaphore(name)`	POST `/semaphore/release`
`semaphore_status(name)`	`semaphoreStatus(name)`	GET `/semaphore/status`

Worker SSH (Python only)

Function	Purpose
`load_ssh()`	Parse `maand.conf` → `(user, key_path, use_sudo)`
`run_ssh(worker_ip, remote_cmd, ...)`	Arbitrary remote command over SSH
`run_runner_target(target, ...)`	`runner.py <target> --jobs <job>` on worker (same as deploy)
`run_make_target(target, ...)`	`make -C /opt/worker/<bucket>/jobs/<job> <target>`

Bun scripts that need SSH should invoke ssh directly or call a thin Python wrapper script.

Python virtualenv (recommended)

Create a venv per job under _hooks (not copied into tmp/ during runs; maand calls the workspace interpreter):

cd workspace/jobs/<job>/_hooks
python3 -m venv .venv
source .venv/bin/activate   # optional for manual work
pip install -r requirements.txt
pip install requests        # required if scripts use maand.py

maand uses the first of these interpreters that exists:

workspace/jobs/<job>/_hooks/.venv/bin/python3
workspace/jobs/<job>/_hooks/venv/bin/python3
python3 on your PATH
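The lookup order above can be approximated as follows, which may be handy for checking which interpreter a job would get. `resolve_hook_python` and its `workspace` parameter are hypothetical; treating "exists" as "is a regular file" is also an assumption of this sketch.

```python
import shutil
from pathlib import Path
from typing import Optional


def resolve_hook_python(job: str, workspace: str = "workspace") -> Optional[str]:
    """Return the interpreter path a job's Python hooks would likely use."""
    hooks = Path(workspace) / "jobs" / job / "_hooks"
    candidates = (
        hooks / ".venv" / "bin" / "python3",
        hooks / "venv" / "bin" / "python3",
    )
    for candidate in candidates:
        if candidate.is_file():
            return str(candidate)
    # Fall back to whatever python3 is first on PATH (None if absent).
    return shutil.which("python3")
```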

.venv, venv, node_modules, and __pycache__ are skipped during maand build file indexing.

Bun

Install Bun on the CLI host. Per job:

cd workspace/jobs/<job>/_hooks
bun install

KV persistence by context

Context	When KV writes persist to `maand.db`
`maand build`	End of main build transaction (`PersistToTransaction`).
`post_build` hooks	Separate session transaction at end of hook pass (failures fail build).
`maand hooks`	Successful CLI commit.
`maand deploy`	`kv.PersistSession()` after each job's `pre_deploy` and after each `deployJob`.
`maand health_check`	KV writes rejected (read-only).

Use pre_deploy to write secrets consumed by .tpl on the same deploy. Full persistence and purge rules: kv/persistence.md. Namespace keys: kv/namespaces.md.

HTTP API errors

HTTP	Message	Typical cause
400	`X-ALLOCATION-ID header is missing`	Raw HTTP call without header
404	`Invalid allocation ID`	Stale or wrong allocation UUID
400	`Both namespace and key are required`	Incomplete JSON body
400	`Invalid or unauthorized namespace`	Write to read-only namespace, wrong job, or upstream not in demands
404	`KV get operation failed`	Key does not exist
400	`KV writes are not allowed during health_check`	PUT/DELETE during health_check event
408	`Timed out waiting for semaphore`	`timeout_seconds` elapsed
409	`Semaphore acquire or release failed`	Release without hold, or internal conflict
415	`Content-Type must be application/json`	Missing or wrong content type
400	`Invalid JSON format`	Malformed request body

To diagnose a failure, check logs/<worker_ip>.log and, when --verbose is set, the CLI output.
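Because some status codes are shared by several messages (404 covers both a missing key and an invalid allocation ID), a script that handles errors itself may want to branch on the message as well. The sketch below is one way to do that; the category labels and the `classify_api_error` name are invented for this example, and matching on message text is fragile if the wording changes.

```python
def classify_api_error(status: int, message: str) -> str:
    """Map a runtime API error to a coarse category label."""
    if status == 404:
        # 404 is used for both a missing key and a bad allocation UUID.
        return "missing_key" if "KV get" in message else "bad_allocation"
    if status == 408:
        return "semaphore_timeout"
    if status == 409:
        return "semaphore_conflict"
    if status == 400 and "health_check" in message:
        return "read_only_event"
    if status == 400 and "unauthorized" in message:
        return "forbidden_namespace"
    if status in (400, 415):
        return "bad_request"
    return "unknown"
```

A missing key is often recoverable (fall back to a default), while the other categories usually indicate a script or manifest bug.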

See also

cli/hooks.md (events, patterns, checklist)
kv/namespaces.md · kv/persistence.md
cli/build.md · cli/deploy.md