Tutorial: Day-2 operations
After the guided tour or quickstart, use these patterns for everyday cluster operations. They assume a working bucket with deployed jobs.
Inspect catalog state
Quick summary:
maand info
Detailed tables:
maand cat workers
maand cat jobs
maand cat allocations
maand cat hooks
maand cat ports
maand cat certs
maand cat kv
Filter allocations:
maand cat allocations --jobs api,worker
maand cat allocations --workers 10.0.0.1
Check TLS expiry for the CA and the job leaf certificates. An expired CA blocks maand build; one that is close to expiry prints a warning on stderr:
maand cat certs
maand cat certs --jobs api,postgres
See certs.md.
Read one KV key:
maand cat kv get maand/job/api job_name
Manual job control
maand job runs Makefile targets (or job_control commands) on workers. It first verifies that each worker's worker.json matches the database; if you see sync errors, run maand deploy first.
maand job restart api
maand job run api --target reload
maand job stop api --allocations 10.0.0.2
maand job start api --health_check
maand job run api --target migrate
maand job status api
Use maand job run --target reload after maand deploy --sync-only when you have pushed config to disk and want to tell the process to pick it up yourself.
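The sync-then-reload pattern can be wrapped in a small shell function. This is only a sketch: push_and_reload is a hypothetical helper name, it assumes the job's Makefile defines a reload target, and it assumes maand exits non-zero when a step fails.

```shell
#!/usr/bin/env bash
# Sketch only: push_and_reload is a hypothetical helper, not part of maand.
# Assumes the job defines a "reload" target and that maand exits non-zero on failure.
push_and_reload() {
  local job="$1"
  # Push files to the workers without running lifecycle steps.
  maand deploy --sync-only --jobs "$job" || return 1
  # Trigger the reload target so the running process picks up the new files.
  maand job run "$job" --target reload
}

# Usage: push_and_reload api
```

If the sync step fails, the function returns before the reload is attempted, so a process is never reloaded against files that were not pushed.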
| vs deploy | maand deploy | maand job |
|---|---|---|
| When | Catalog or job files changed | Ops / one-off lifecycle |
| Sync check | No | Yes (update_seq) |
| Hash skip | Yes | No |
See job.md.
Health checks
Each job may use manifest probes, a custom command, or both (probes run first):
Option A: manifest probes (tcp/http/ssh) in manifest.json; see health-check.md.
Option B: a custom command, registered as a hook that runs on health_check:
"hooks": {
  "hook_health": {
    "executed_on": ["health_check"]
  }
}
Add workspace/jobs/api/_hooks/hook_health.py (see hook-api.md).
Run checks:
maand health_check
maand health_check --jobs api --wait --verbose
Redeploy after fixing health:
maand deploy --jobs api
Force a full redeploy without workspace changes:
maand deploy --force --jobs api
Deploy runs health checks automatically after restart when health is configured.
See health-check.md.
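To check several jobs one at a time and collect the ones that fail, a loop along these lines may help. It is a sketch under one loud assumption: that maand health_check exits non-zero when a check fails. check_jobs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: check_jobs is a hypothetical helper, not part of maand.
# ASSUMPTION: maand health_check exits non-zero when a job is unhealthy.
check_jobs() {
  local failed=""
  local job
  for job in "$@"; do
    # Check one job at a time so a failure can be attributed to it.
    if ! maand health_check --jobs "$job" --wait; then
      failed="$failed $job"
    fi
  done
  if [ -n "$failed" ]; then
    echo "unhealthy:$failed" >&2
    return 1
  fi
}

# Usage: check_jobs api worker
```

Checking jobs individually is slower than a single maand health_check --jobs api,worker call, but the summary line names exactly which jobs need attention.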
Prometheus monitoring
Add _prometheus/ under each job that exposes metrics (see prometheus.md):
workspace/jobs/api/_prometheus/
├── scrape.yaml # optional
├── alerts/ # optional
├── runbooks/ # optional
└── dashboards/ # optional
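The layout above can be scaffolded with a few mkdir calls. In this sketch, scaffold_prometheus_dir is a hypothetical helper name and the second argument (workspace root) is an illustrative convenience. It deliberately does not create scrape.yaml, because that file's contents are specific to each job (see prometheus.md).

```shell
#!/usr/bin/env bash
# Sketch only: scaffold_prometheus_dir is a hypothetical helper, not part of maand.
# Creates the optional subdirectories shown in the tree above; scrape.yaml is left to you.
scaffold_prometheus_dir() {
  local job="$1"
  local root="${2:-workspace}"   # workspace root; defaults to ./workspace
  local base="$root/jobs/$job/_prometheus"
  mkdir -p "$base/alerts" "$base/runbooks" "$base/dashboards" || return 1
  # Print the created path so callers can chain further steps.
  echo "$base"
}

# Usage: scaffold_prometheus_dir api
```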
After adding or changing _prometheus/ content:
maand build
maand deploy --jobs api,... # app jobs first
maand deploy --jobs prometheus # assemble rules, runbooks, dashboards, scrape config
Console pages: runbooks at /consoles/runbooks/..., dashboards at /consoles/dashboards/... — see prometheus.md.
Ad-hoc commands on workers
maand collect facts probes host memory and CPU. Pass --generate-workers and redirect the output to update workspace/workers.json:
maand collect facts
maand collect facts --generate-workers > workspace/workers.json
maand build
See collect.md.
maand run_command runs shell commands on workers (not in job workspaces):
maand run_command "uptime"
maand run_command "df -h /opt/worker" --workers 10.0.0.1,10.0.0.2
maand run_command "hostname" --labels worker --concurrency 4
maand run_command "systemctl status myservice" --health_check
Host needs bash and ssh; workers need bash and timeout.
See run-command.md.
Disable an allocation temporarily
See disable and drain for the full guide (per-allocation, per-job, per-worker, re-enable).
Create or edit workspace/disabled.json:
{
  "jobs": {
    "api": {
      "allocations": ["10.0.0.2"]
    }
  }
}
Disable every job on a worker:
{
  "workers": ["10.0.0.3"]
}
Disable an entire job everywhere:
{
  "jobs": {
    "api": {}
  }
}
Then:
maand build
maand deploy
Disabled allocations are skipped for start/restart/reload/rsync; deploy stops them if they are running and keeps their artifacts and KV. To re-enable, clear disabled.json, then run maand build and maand deploy.
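For the single-allocation case, the file can be generated from a job name and a worker IP. This is a minimal sketch: disable_allocation is a hypothetical helper name, and it overwrites the target file rather than merging, so it only suits a workspace where nothing else is currently disabled.

```shell
#!/usr/bin/env bash
# Sketch only: disable_allocation is a hypothetical helper, not part of maand.
# WARNING: overwrites the target file; merge by hand if other entries are already disabled.
disable_allocation() {
  local job="$1"
  local ip="$2"
  local file="${3:-workspace/disabled.json}"
  # Write the per-allocation form of disabled.json shown above.
  cat > "$file" <<EOF
{
  "jobs": {
    "$job": {
      "allocations": ["$ip"]
    }
  }
}
EOF
}

# Usage: disable_allocation api 10.0.0.2 && maand build && maand deploy
```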
Remove a worker or job
- Remove the host from workers.json or delete workspace/jobs/<name>/
- maand build: marks related allocations removed = 1
- maand deploy: stops jobs, removes deployed job files (keeps data/ and logs/ on workers)
- maand gc: deletes worker data/, logs/, and bin/, allocation rows, and KV references
# after editing workspace
maand build
maand deploy
maand gc
maand gc --retain-days 7 # keep deleted KV history longer
See gc.md.
Partial deploy and dry-run
Check whether deploy would change anything:
maand deploy --dry-run
Deploy only specific jobs (still ordered by deployment_seq):
maand deploy --jobs api,worker
If deploy fails partway, fix the issue and re-run it. Hash tracking skips allocations that are unchanged since they were last deployed, so the re-run picks up where the failed one stopped.
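A dry run and a real deploy can be chained behind a confirmation prompt. The sketch below makes two assumptions that are not confirmed here: that --dry-run can be combined with --jobs on its own, and that a successful dry run exits with status zero. confirm_deploy is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch only: confirm_deploy is a hypothetical helper, not part of maand.
# ASSUMPTIONS: --dry-run combines with --jobs, and a successful dry run exits 0.
confirm_deploy() {
  local jobs="$1"   # comma-separated job list, e.g. "api,worker"
  local answer
  # Show what deploy would do before touching any worker.
  maand deploy --dry-run --jobs "$jobs" || return 1
  printf 'Apply these changes? [y/N] '
  read -r answer
  case "$answer" in
    y|Y) maand deploy --jobs "$jobs" ;;
    *) echo "aborted, nothing deployed" >&2; return 1 ;;
  esac
}

# Usage: confirm_deploy api,worker
```

Anything other than y or Y aborts, so an accidental Enter never triggers a deploy.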
Force redeploy when content is already promoted:
maand deploy --force --jobs api
maand deploy --dry-run --force # preview
Push files without lifecycle (rsync + promote only; fails when any allocation still needs start):
maand deploy --sync-only --jobs api
maand deploy --dry-run --sync-only --jobs api # preview sync actions
For ongoing config-only rollouts, prefer restart_policy: reload in the manifest — see Applying changes on workers.
See deploy.md.
Per-job config overrides
Optional workspace/bucket.jobs.conf:
[api]
memory = "512 mb"
If maand.conf sets job_config_selector = "prod", use bucket.jobs.prod.conf instead.
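The file-name rule can be written as a small function. Treat it as a sketch: jobs_conf_path is a hypothetical helper name, the pattern bucket.jobs.<selector>.conf is generalized from the single "prod" example, and it assumes the selector-specific file sits in the same workspace/ directory as the default one.

```shell
#!/usr/bin/env bash
# Sketch only: jobs_conf_path is a hypothetical helper, not part of maand.
# ASSUMPTIONS: the name pattern generalizes from the "prod" example, and the
# selector-specific file lives next to the default one under workspace/.
jobs_conf_path() {
  local selector="${1:-}"        # value of job_config_selector, empty if unset
  local root="${2:-workspace}"
  if [ -n "$selector" ]; then
    echo "$root/bucket.jobs.$selector.conf"
  else
    echo "$root/bucket.jobs.conf"
  fi
}

# Usage: "$EDITOR" "$(jobs_conf_path prod)"
```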
After editing:
maand build
maand deploy -b
Upgrade maand schema
When upgrading the maand binary:
maand init # applies DB migrations, keeps bucket_id and CA
maand build
maand deploy
Rolling upgrades
See rolling-deploy for max_concurrent_upgrades, version-only deploys, and rolling worker reboots.
Troubleshooting checklist
See debugging-deploy.md for a full deploy troubleshooting guide.
| Symptom | Likely fix |
|---|---|
| worker.json / update_seq mismatch | maand deploy |
| Host prerequisite error | Install ssh/rsync/python3/bun on CLI host |
| Worker prerequisite error | Install make/python3/rsync on worker; fix sudo |
| No allocations for job | Check selectors vs worker labels; run maand build |
| Build resource error | Add memory/cpu to workers or lower job limits |
| Port collision | Remove duplicate port names; maand assigns unique numbers from the pool |
| ErrPortRangeExhausted | Widen port_range in bucket.conf or remove unused jobs/ports |
| ErrInvalidJobVersion | Add or fix version on jobs in the dependency graph |
| ErrHookDemandVersionMismatch | Bump upstream job version or relax min_version/max_version |
| Upgrade script needs old/new release | Read CURRENT_VERSION / NEW_VERSION in Makefile or hook env (see deploy.md) |
Concept reference: concepts.md