Guides

Task-oriented documentation. For schemas and flags, follow links into reference/.

| Guide | When to read |
| --- | --- |
| day-2-ops.md | Job control, health, GC, upgrades |
| rolling-deploy.md | Batch sizes, rollout_order, rolling upgrades |
| disable-and-drain.md | Pause workers, jobs, or single allocations |
| debugging-deploy.md | Skipped deploys, dry-run, partial rollouts, logs |
| hooks-tutorial.md | Python/Bun hooks walkthrough |
| hook-one-shot.md | Single-batch CLI runs (BATCH_COUNT=1) |
| prometheus.md | _prometheus/ scrape, alerts, runbooks, dashboards |
| worker-reboot.md | Host OS reboot without losing catalog state |