Hook runtime API

Python/Bun scripts reach maand through an in-process HTTP API on the CLI host. For when to use hooks and event names, see cli/hooks.md.


Execution model

Open DB + kv.Initialize
StartRuntimeAPI (HTTP on localhost:8080 in maand process)
Resolve worker set for the event (active, allocated, rollout batch, or filtered)
Resolve rollout order (rollout_order KV when valid, else catalog order)
Split workers into batches (max_concurrent_upgrades, or max_concurrent_starts for first-start hooks)
For each batch (sequential):
  For each allocation in the batch (parallel within batch):
    Stage tmp/workers/<ip>/jobs/<job>/ from job_files + certs from KV
    Run script on CLI host via bash (python3 or bun)
    Script reaches API at HOOK_API_HOST=127.0.0.1
Commit (CLI path) or return error to caller (deploy / health_check)

Environment variables (hook scripts)

Per allocation

Variable Meaning
ALLOCATION_ID Stable allocation UUID
ALLOCATION_IP Worker host for this invocation
ALLOCATION_INDEX Zero-based index among non-removed peers (same as <job>_allocation_index in per-allocation KV)
JOB Job name
EVENT Event name (pre_deploy, cli, …)
COMMAND Command name
DISABLED 0 or 1 — allocation marked disabled in catalog
CURRENT_VERSION Running version on this allocation (0.0.0 before first promote)
NEW_VERSION Target version from the current build/deploy plan
HOOK_API_HOST Host to reach runtime API (127.0.0.1)
BUCKET_ROOT Absolute path to the maand bucket on the CLI host

The target version is available as the job-level KV key maand/job/<job>/version and as the env var NEW_VERSION. For rollout logic, the running and target versions live in the catalog (hash.current_version, allocations.new_version) and are exposed to templates as .CurrentVersion / .NewVersion.
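A hook script often wants these variables as typed values rather than raw strings. The following is a minimal sketch, not part of maand.py: the AllocationContext class and the read_allocation_context and needs_upgrade helpers are hypothetical names, and the defaults used for missing variables are assumptions.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AllocationContext:
    """Typed view of the per-allocation environment (hypothetical helper)."""
    allocation_id: str
    ip: str
    index: int
    job: str
    event: str
    disabled: bool
    current_version: str
    new_version: str


def read_allocation_context(env=os.environ) -> AllocationContext:
    """Collect the per-allocation variables from a mapping such as os.environ."""
    return AllocationContext(
        allocation_id=env["ALLOCATION_ID"],
        ip=env["ALLOCATION_IP"],
        index=int(env["ALLOCATION_INDEX"]),
        job=env["JOB"],
        event=env["EVENT"],
        # DISABLED is documented as "0" or "1"; treating a missing value as "0" is an assumption.
        disabled=env.get("DISABLED", "0") == "1",
        # 0.0.0 is the documented value before the first promote.
        current_version=env.get("CURRENT_VERSION", "0.0.0"),
        new_version=env.get("NEW_VERSION", ""),
    )


def needs_upgrade(ctx: AllocationContext) -> bool:
    """True when the running version differs from the target version."""
    return ctx.current_version != ctx.new_version
```

Passing a plain dict instead of os.environ keeps the helper easy to exercise outside a hook run.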

Per batch

Every hook invocation also receives batch context for the current wave:

Variable Meaning
BATCH_ALLOCATIONS Comma-separated worker IPs in this batch
BATCH_INDEX Zero-based batch index
BATCH_COUNT Total batches in this run
DEPLOY_PHASE Phase label (new, update, stop, health_check, pre_deploy, job_control, cli, …)
ROLLOUT_ORDER Full comma-separated order list for this worker set
ROLLOUT_ORDER_SOURCE kv or default

JOB is set on every script; batch hooks set it again alongside the batch vars above.
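The batch variables arrive as strings, so a script usually parses them once at startup. This is a small sketch under the assumption that the lists are plain comma-separated IPs with no surrounding whitespace; read_batch_context is a hypothetical name, not a maand.py helper.

```python
import os


def read_batch_context(env=os.environ) -> dict:
    """Parse the per-batch variables into Python values."""
    def split_ips(raw: str) -> list:
        # An empty string yields an empty list rather than [""].
        return [ip for ip in raw.split(",") if ip]

    index = int(env["BATCH_INDEX"])
    count = int(env["BATCH_COUNT"])
    return {
        "batch_ips": split_ips(env.get("BATCH_ALLOCATIONS", "")),
        "rollout_order": split_ips(env.get("ROLLOUT_ORDER", "")),
        "order_source": env.get("ROLLOUT_ORDER_SOURCE", ""),
        "phase": env.get("DEPLOY_PHASE", ""),
        "index": index,
        "count": count,
        "is_first_batch": index == 0,
        "is_last_batch": index == count - 1,
    }
```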

Event-specific

Event Extra variables
Deploy job_control NEW_ALLOCATIONS, UPDATED_ALLOCATIONS
maand job / jobcontrol job_control TARGET (start, stop, restart, reload, or a Makefile target)

Batching

Worker order comes from rollout_order in KV when the list matches the current worker set; otherwise maand uses catalog order. Override order in pre_deploy or cli with put_rollout_order() — see kv/namespaces.md.
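The fallback rule can be illustrated with a short sketch. It assumes that "matches the current worker set" means the KV list contains exactly the current workers with no duplicates; maand's actual validity check may differ, and resolve_rollout_order is a hypothetical name.

```python
def resolve_rollout_order(kv_order, catalog_order):
    """Return (order, source), where source mirrors ROLLOUT_ORDER_SOURCE ("kv" or "default")."""
    kv_order = list(kv_order or [])
    has_duplicates = len(kv_order) != len(set(kv_order))
    # Assumption: the KV order is valid only when it names exactly the current workers.
    if kv_order and not has_duplicates and set(kv_order) == set(catalog_order):
        return kv_order, "kv"
    return list(catalog_order), "default"
```

With this rule, a stale rollout_order that still names a removed worker falls back to catalog order instead of failing the run.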

Batch width comes from the job manifest:

Phase Manifest field
First deploy starts and after_allocation_started on start max_concurrent_starts (0 = one batch of all new allocations)
Everything else (upgrades, hooks, health commands, cli, pre_deploy, …) max_concurrent_upgrades (minimum 1)

The worker set depends on the event:

Event Workers included
pre_deploy, post_deploy, post_build, cli All non-removed allocations (includes disabled)
health_check (commands) All non-removed allocations (includes disabled)
after_allocation_started Workers in the current Makefile rollout batch
after_allocation_stopped Stopped allocations for the job (grouped and batched per job during reconcile)
Deploy job_control All allocated workers (including disabled)
jobcontrol job_control Active workers matching --allocations filter
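The batch-width rules above can be sketched as a small function. This is an illustration of the documented rules, not maand's implementation; split_batches and its first_start flag are hypothetical names.

```python
def split_batches(order, width, first_start=False):
    """Split an ordered worker list into batches of at most `width` workers.

    first_start=True models max_concurrent_starts, where 0 means one batch
    holding every new allocation. Otherwise the width models
    max_concurrent_upgrades, which has a minimum of 1.
    """
    order = list(order)
    if first_start and width == 0:
        return [order] if order else []
    width = max(1, width)
    return [order[i:i + width] for i in range(0, len(order), width)]
```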

Manifest health_check probes (tcp / http / ssh) run on active allocations only, in rollout order with max_concurrent_upgrades batch width. Probes are internal maand checks — they do not run user scripts and do not receive batch env vars. health_check commands use non-removed allocations (includes disabled) with the same order and batch width.

Within a batch, all allocations (and all probe checks × workers in that batch) run in parallel. The next batch starts only after the current batch succeeds.

Use acquire_semaphore when parallel allocations must serialize a critical section (migrations, rate-limited APIs). Failures aggregate into a run error listing per-worker errors.
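The batch semantics described above (sequential batches, parallel work inside a batch, per-worker errors aggregated into one run error) can be modeled with the standard library. This is a sketch of the behavior, not maand's code: RunError, run_batches, and the run_one callback are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


class RunError(Exception):
    """Aggregates per-worker failures from one batch."""

    def __init__(self, errors):
        self.errors = dict(errors)
        detail = "; ".join(f"{ip}: {err}" for ip, err in sorted(self.errors.items()))
        super().__init__(f"{len(self.errors)} worker(s) failed: {detail}")


def run_batches(batches, run_one):
    """Run batches in sequence; run every worker of a batch in parallel."""
    completed = []
    for batch in batches:
        if not batch:
            continue
        with ThreadPoolExecutor(max_workers=len(batch)) as pool:
            futures = {ip: pool.submit(run_one, ip) for ip in batch}
        # Leaving the with-block waits for every worker in the batch to finish.
        errors = {}
        for ip, future in futures.items():
            exc = future.exception()
            if exc is not None:
                errors[ip] = exc
        if errors:
            # Later batches never start after a failed batch.
            raise RunError(errors)
        completed.extend(batch)
    return completed
```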

For a single batch run (BATCH_INDEX=0, BATCH_COUNT=1), see guides/hook-one-shot.md. For run-once logic inside a hook script, use is_one_shot() / isOneShot() — see hook-one-shot.md#guard-in-the-script.


Runtime HTTP API

Started by hooks.StartRuntimeAPI(tx) for the lifetime of build/deploy/health_check/hook sessions.

Property Value
Listen address localhost:8080 (not exposed outside the host)
Request body JSON, Content-Type: application/json
Allocation scope Header X-ALLOCATION-ID (required on every route)
Event scope Header EVENT (required; must match the running hook)
Command scope Header COMMAND (required; used by /demands and semaphore scoping)

Embedded maand.py / maand.ts set these headers automatically from env vars.
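For scripts that cannot use the embedded helpers, a raw request needs the same three headers. The sketch below only builds the request object with the standard library; build_api_request is a hypothetical name, and the port default is taken from the listen address above.

```python
import json
import os
import urllib.request


def build_api_request(method, path, body=None, env=os.environ, port=8080):
    """Build (but do not send) a runtime API request with the required headers."""
    headers = {
        "Content-Type": "application/json",
        "X-ALLOCATION-ID": env["ALLOCATION_ID"],
        "EVENT": env["EVENT"],
        "COMMAND": env["COMMAND"],
    }
    host = env.get("HOOK_API_HOST", "127.0.0.1")
    data = json.dumps(body).encode("utf-8") if body is not None else None
    # urllib stores header names capitalized (for example "X-allocation-id");
    # HTTP header names are case-insensitive, so this is harmless on the wire.
    return urllib.request.Request(
        f"http://{host}:{port}{path}", data=data, headers=headers, method=method
    )
```

Sending the request is then a matter of passing it to urllib.request.urlopen inside a running hook session.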

Endpoint summary

Method Path Purpose
GET /kv Read a key from an allowed namespace
PUT /kv Write a non-secret key under vars/job/<current job> only
DELETE /kv Delete a key under vars/job/<current job> only
PUT /kv/secret Write encrypted secret under secrets/job/<current job>
DELETE /kv/secret Delete secret under secrets/job/<current job>
GET /kv/keys List keys under job-level namespaces
GET /demands List downstream hooks that depend on this job+hook
POST /semaphore/acquire Block until this allocation holds a slot
POST /semaphore/release Release a held slot
GET /semaphore/status?name=... Inspect holders and waiters

KV read vs write

Namespace pattern | GET /kv | PUT/DELETE /kv | PUT/DELETE /kv/secret
maand/bucket, maand/worker, maand/worker/<ip>, tags | ✓ | - | -
vars/bucket, vars/bucket/job/<job> | ✓ | - | -
maand/job/<job>/worker/<ip> | ✓ | - | -
maand/job/<job> | ✓ | ✓ key rollout_order only, on pre_deploy or cli | -
vars/job/<job> (current job) | ✓ | ✓ | -
secrets/job/<job> (current job) | ✓ (decrypted) | - | ✓
Upstream demand jobs (maand/job/*, vars/job/*, secrets/job/*) | ✓ if declared in manifest demands | - | -

Writes to other maand/* keys, vars/bucket/*, and upstream jobs are rejected. Use put_rollout_order / putRolloutOrder (or PUT /kv on maand/job/<job> + key rollout_order) to override rollout order for one deploy; build resets it on the next maand build.
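A script can check a write against these rules before calling the API, which gives a clearer failure than a 400 response. The function below is a client-side sketch of the documented rules only; the server remains authoritative, and write_allowed is a hypothetical name.

```python
def write_allowed(namespace, key, job, event, secret=False):
    """Client-side approximation of the runtime API write rules."""
    # KV writes are rejected for every namespace during health_check.
    if event == "health_check":
        return False
    if secret:
        # /kv/secret only accepts the current job's secrets namespace.
        return namespace == f"secrets/job/{job}"
    if namespace == f"vars/job/{job}":
        return True
    # The one writable maand/* key: rollout_order, on pre_deploy or cli only.
    if namespace == f"maand/job/{job}" and key == "rollout_order":
        return event in ("pre_deploy", "cli")
    return False
```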

PUT rollout_order body:

{
  "namespace": "maand/job/api",
  "key": "rollout_order",
  "value": "10.0.0.2,10.0.0.1"
}

The same override with the Python helper:

from maand import put_rollout_order

put_rollout_order(["10.0.0.2", "10.0.0.1"]).raise_for_status()

GET /kv body:

{ "namespace": "vars/job/api", "key": "db_url" }

Response (200):

{ "namespace": "vars/job/api", "key": "db_url", "value": "postgres://..." }

PUT /kv body (value required; optional ttl in seconds, 0 = no expiry):

{ "namespace": "vars/job/api", "key": "db_url", "value": "postgres://...", "ttl": 3600 }

PUT /kv/secret body (optional ttl in seconds):

{ "namespace": "secrets/job/api", "key": "db_password", "value": "plain-text-secret", "ttl": 86400 }

Values are encrypted with AES-256-GCM using secrets/kv.key before storage in maand.db. Expired keys are tombstoned on maand build; maand gc purges tombstones per --retain-days — see kv/persistence.md.

DELETE /kv and /kv/secret use the same JSON body as GET (namespace + key; no value).

GET /kv/keys — optional body { "namespace": "vars/job/api" }. Omit namespace to list both vars/job/<job> and secrets/job/<job> (secret listing returns key names only, never values).

KV writes during health_check

PUT, DELETE, and /kv/secret writes are rejected when the EVENT header is health_check. Health check scripts must be read-only with respect to KV — use pre_deploy or post_deploy to create or update vars and secrets.

GET /demands

Returns hooks whose manifest demands point at this job and this command name (reverse dependency lookup).

Response (200) — array of:

{
  "job": "api",
  "hook": "hook_migrate",
  "demand_config": { "min_version": "2.0.0" }
}

When to use: a shared upstream command (e.g. hook_schema on database) can inspect who depends on it and tailor behavior using demand_config (feature flags, schema versions, etc.).
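As an illustration of tailoring behavior to dependents, the sketch below scans a /demands response for entries whose demand_config asks for a newer version than the upstream job provides. The min_version key comes from the sample response and is only an example of what a demand_config might carry; unmet_demands and parse_version are hypothetical names.

```python
def parse_version(text):
    """Turn "2.0.0" into (2, 0, 0); assumes purely numeric dotted versions."""
    return tuple(int(part) for part in text.split("."))


def unmet_demands(demands, available_version):
    """Return (job, hook) pairs whose min_version exceeds available_version."""
    unmet = []
    for demand in demands:
        config = demand.get("demand_config") or {}
        minimum = config.get("min_version")
        if minimum and parse_version(minimum) > parse_version(available_version):
            unmet.append((demand["job"], demand["hook"]))
    return unmet
```

The demands list would come from list_hook_demands() or a raw GET /demands call, decoded from JSON.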

Semaphores

Semaphores coordinate cross-allocation locks inside one hook session. They are scoped by job + EVENT + name, so the same name used under pre_deploy and under post_deploy refers to two independent semaphores.

Field Default Limit
capacity 1 1–64
timeout_seconds 600 max 3600

POST /semaphore/acquire body:

{ "name": "migration", "capacity": 1, "timeout_seconds": 600 }

Response (200):

{
  "name": "migration",
  "allocation_id": "<uuid>",
  "capacity": 1,
  "acquired": true
}
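Validating the acquire parameters locally against the documented limits avoids a round trip for obviously bad input. This is a sketch: acquire_body is a hypothetical name, and requiring a positive timeout is an assumption, since only the maximum is documented.

```python
def acquire_body(name, capacity=1, timeout_seconds=600):
    """Build a POST /semaphore/acquire body, enforcing the documented limits."""
    if not name:
        raise ValueError("semaphore name is required")
    if not 1 <= capacity <= 64:
        raise ValueError("capacity must be between 1 and 64")
    # Assumption: a timeout must be positive; the documented maximum is 3600.
    if not 0 < timeout_seconds <= 3600:
        raise ValueError("timeout_seconds must be between 1 and 3600")
    return {"name": name, "capacity": capacity, "timeout_seconds": timeout_seconds}
```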

POST /semaphore/release body: { "name": "migration" }

GET /semaphore/status?name=migration response:

{
  "name": "migration",
  "capacity": 1,
  "holders": ["<allocation-uuid>"],
  "waiting": 0,
  "available": 0
}

When to use semaphores:

Pattern capacity Example
Single-writer migration 1 Only one allocation runs DDL at a time
Rolling batch N Allow N concurrent restarts against an external API
Leader bootstrap 1 First acquirer writes shared KV, others read

Always call release_semaphore in a finally block so that a failed script does not hold the lock for the rest of the deploy session.

Example (Python):

from maand import acquire_semaphore, release_semaphore, allocation_ip

acquire_semaphore("migrate", capacity=1, timeout_seconds=900).raise_for_status()
try:
    run_migration_for(allocation_ip())
finally:
    release_semaphore("migrate")

Semaphores exist only in memory for the current maand process session — they do not survive CLI restart.
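The acquire/try/finally pattern can be wrapped in a context manager so that release is never forgotten. The sketch below takes the acquire and release functions as arguments, which keeps it independent of maand.py; held_semaphore is a hypothetical name, and the acquire callable is assumed to raise on failure.

```python
from contextlib import contextmanager


@contextmanager
def held_semaphore(name, acquire, release, **options):
    """Hold a named semaphore for the duration of a with-block."""
    # If acquire raises, the body never runs and there is nothing to release.
    acquire(name, **options)
    try:
        yield
    finally:
        # Runs on success and on exceptions raised inside the with-block.
        release(name)
```

With the embedded helpers, one possible wiring is to pass release_semaphore as release and, as acquire, a small wrapper that calls acquire_semaphore(...).raise_for_status().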


Python and Bun helpers

Embedded maand.py / maand.ts wrap the HTTP API. Prefer these over raw HTTP.

Context (env)

Python Bun Returns
allocation_id() allocationId() ALLOCATION_ID
allocation_ip() allocationIp() ALLOCATION_IP
allocation_index() allocationIndex() ALLOCATION_INDEX
job_name() jobName() JOB
hook_event() hookEvent() EVENT
hook_name() hookName() HOOK
is_allocation_disabled() isAllocationDisabled() DISABLED == "1"
is_one_shot() isOneShot() BATCH_INDEX == "0" && ALLOCATION_INDEX == "0" (leader; run once per invocation)

Aliases: get_allocation_id, get_job, kv_get, etc. (both runtimes).
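The one-shot condition from the table is simple enough to restate in a few lines, which can help when testing a script outside a hook run. This sketch mirrors the documented condition only; is_leader_invocation is a hypothetical name chosen to avoid clashing with the real helper.

```python
import os


def is_leader_invocation(env=os.environ) -> bool:
    """True only for the first allocation of the first batch."""
    return env.get("BATCH_INDEX") == "0" and env.get("ALLOCATION_INDEX") == "0"
```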

KV

Python Bun API
get_store_value(ns, key) getStoreValue(ns, key) GET /kv → Response
get_kv_value(ns, key) (parse JSON yourself) GET /kv → plaintext value
put_rollout_order(order) putRolloutOrder(order) PUT /kv → maand/job/<job>/rollout_order
get_rollout_order() getRolloutOrder() GET /kv → rollout_order
put_job_variable(key, val) putJobVariable(key, val) PUT /kv
put_job_secret(key, val) putJobSecret(key, val) PUT /kv/secret
delete_job_variable(key) deleteJobVariable(key) DELETE /kv
delete_job_secret(key) deleteJobSecret(key) DELETE /kv/secret
list_job_keys(ns=None) listJobKeys(ns?) GET /kv/keys

Demands and semaphores

Python Bun API
list_hook_demands() listHookDemands() GET /demands
acquire_semaphore(name, capacity=1, timeout_seconds=600) acquireSemaphore(...) POST /semaphore/acquire
release_semaphore(name) releaseSemaphore(name) POST /semaphore/release
semaphore_status(name) semaphoreStatus(name) GET /semaphore/status

Worker SSH (Python only)

Function Purpose
load_ssh() Parse maand.conf (user, key_path, use_sudo)
run_ssh(worker_ip, remote_cmd, ...) Arbitrary remote command over SSH
run_runner_target(target, ...) runner.py <target> --jobs <job> on worker (same as deploy)
run_make_target(target, ...) make -C /opt/worker/<bucket>/jobs/<job> <target>

Bun scripts that need SSH should invoke ssh directly or call a thin Python wrapper script.

Python

Create a venv per job under _hooks. The venv is not copied into tmp/ during runs; maand calls the workspace interpreter directly:

cd workspace/jobs/<job>/_hooks
python3 -m venv .venv
source .venv/bin/activate   # optional for manual work
pip install -r requirements.txt
pip install requests        # required if scripts use maand.py

Maand uses, in order:

  1. workspace/jobs/<job>/_hooks/.venv/bin/python3
  2. workspace/jobs/<job>/_hooks/venv/bin/python3
  3. python3 on your PATH
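The lookup order above can be reproduced in a few lines, for example to check which interpreter a job would get. This is a sketch of the documented order, not maand's code: resolve_python is a hypothetical name, and testing only that the file exists (rather than that it is executable) is an assumption.

```python
import shutil
from pathlib import Path


def resolve_python(hooks_dir):
    """Return the interpreter path for a job's _hooks directory, or None."""
    for candidate in (".venv/bin/python3", "venv/bin/python3"):
        path = Path(hooks_dir) / candidate
        if path.is_file():
            return str(path)
    # Fall back to whatever python3 is on PATH (None when there is none).
    return shutil.which("python3")
```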

.venv, venv, node_modules, and __pycache__ are skipped during maand build file indexing.

Bun

Install Bun on the CLI host. Per job:

cd workspace/jobs/<job>/_hooks
bun install

KV persistence by context

Context When KV writes persist to maand.db
maand build End of main build transaction (PersistToTransaction).
post_build hooks Separate session transaction at end of hook pass (failures fail build).
maand hooks Successful CLI commit.
maand deploy kv.PersistSession() after each job's pre_deploy and after each deployJob.
maand health_check KV writes rejected (read-only).

Use pre_deploy to write secrets consumed by .tpl on the same deploy. Full persistence and purge rules: kv/persistence.md. Namespace keys: kv/namespaces.md.


HTTP API errors

HTTP Message Typical cause
400 X-ALLOCATION-ID header is missing Raw HTTP call without header
404 Invalid allocation ID Stale or wrong allocation UUID
400 Both namespace and key are required Incomplete JSON body
400 Invalid or unauthorized namespace Write to read-only namespace, wrong job, or upstream not in demands
404 KV get operation failed Key does not exist
400 KV writes are not allowed during health_check PUT/DELETE during health_check event
408 Timed out waiting for semaphore timeout_seconds elapsed
409 Semaphore acquire or release failed Release without hold, or internal conflict
415 Content-Type must be application/json Missing or wrong content type
400 Invalid JSON format Malformed request body

Check logs/<worker_ip>.log and CLI output when --verbose is set.