TLS certificates

Maand generates a bucket certificate authority (CA) at init and per-job TLS certificates at build time. Material is stored in the KV store, then copied to workers during deploy under each job’s certs/ directory.

Related: configuration.md · build.md · deploy.md · KV persistence


Overview

maand init
  └── secrets/ca.crt, ca.key          (bucket CA, long-lived)

maand build
  └── BuildCerts
        ├── signs leaf certs with CA
        └── KV: maand/job/<job>/worker/<ip>/certs/<name>.{crt,key}
        └── KV: maand/worker/certs/ca.crt   (CA PEM for all workers)

maand deploy
  └── tmp/workers/<ip>/jobs/<job>/certs/
        ├── <name>.crt, <name>.key
        └── ca.crt
  └── rsync → /opt/worker/<bucket_id>/jobs/<job>/certs/
Layer Location Purpose
Bucket CA <bucket>/secrets/ca.crt, ca.key Signs all job leaf certificates
Build / KV maand/job/<job>/worker/<ip>/certs/* Canonical PEM storage between build and deploy
Worker disk /opt/worker/<bucket_id>/jobs/<job>/certs/ What your app/Makefile reads at runtime

Certificates are not checked into the workspace. Declare them in manifest.json; maand generates keys and PEMs.


Bucket CA (maand init)

On first maand init, maand creates:

File Description
secrets/ca.crt Self-signed CA certificate
secrets/ca.key CA private key (PKCS#1 RSA)

Defaults:

If only one of ca.crt / ca.key exists, init fails — both must be present or both absent.

The CA PEM is copied into KV at maand/worker/certs/ca.crt on every build so deploy can place it on each worker.

CA expiry: warning vs error

During maand build, maand reads secrets/ca.crt and evaluates expiry before BuildCerts runs. maand cat certs uses the same certs_renewal_buffer from maand.conf for the CA status column.

CA state maand cat certs status maand build
now > NotAfter expired Failsbucket CA certificate (secrets/ca.crt) expired on <RFC3339>
Within certs_renewal_buffer days of expiry (and not yet expired) expiring Warning on stderr — build continues
Invalid or unreadable PEM invalid Warning on stderr — build continues
Outside the renewal window ok No message

Expiring warning (stderr):

warning: bucket CA certificate (secrets/ca.crt) expires in 5 days on 2026-07-09T12:00:00Z (certs_renewal_buffer=10 days)

Expired error (build aborts before leaf cert generation):

bucket CA certificate (secrets/ca.crt) expired on 2026-06-01T12:00:00Z

Invalid CA warning:

warning: bucket CA certificate (secrets/ca.crt) is invalid or unreadable

Renewal window (same as leaf certs):

expiring when:  now >= NotAfter - (certs_renewal_buffer days)
expired when:   now > NotAfter

When certs_renewal_buffer is 0, there is no expiring phase — the CA stays ok until NotAfter, then maand build fails with the expired error. maand cat certs shows the same transition.

Unlike leaf certificates, the bucket CA is never auto-regenerated at build time. A warning means plan manual rotation — replace secrets/ca.crt and ca.key, then maand build and maand deploy.

Inspect first:

maand cat certs    # scope=ca row shows not_after, days_left, status

Rotating the CA

Replacing secrets/ca.crt or secrets/ca.key changes the CA file hash. The next maand build regenerates all job leaf certificates signed by the new CA (after the CA expiry check passes).

If the old CA is expired, maand build fails until you install a new CA pair. If it is only expiring, build prints a warning but still runs — rotate before NotAfter to avoid downtime.

# Backup old CA, install new pair under secrets/, then:
maand build
maand deploy    # push new leaf certs + ca.crt to workers

Plan a rolling maand deploy so every allocation picks up the new trust chain. Applications must trust the new ca.crt.


Job manifest (certs)

Add a certs map to workspace/jobs/<job>/manifest.json. Each key is a cert name (used for filenames and KV keys).

{
  "selectors": ["worker"],
  "certs": {
    "tls": {
      "pkcs8": false,
      "one": false,
      "subject": { "common_name": "api.internal" }
    },
    "client": {
      "pkcs8": true,
      "one": true,
      "subject": { "common_name": "api-cluster" }
    }
  }
}
Field Default Meaning
subject.common_name (required) X.509 subject CN
pkcs8 false true → private key PEM type PRIVATE KEY (PKCS#8); falseRSA PRIVATE KEY (PKCS#1)
one false trueone shared cert/key pair copied to every allocation; falseper-worker cert (SANs include that worker’s IP)

Per-worker vs shared (one)

one Behavior
false (default) Each allocation gets its own key pair. Leaf cert IP SANs: 127.0.0.1 + that worker’s IP.
true One cert is generated (using the first worker in the allocation set) and the same .crt/.key PEMs are written to every worker namespace. SANs include 127.0.0.1 and all allocated worker IPs. Use for cluster-wide identities (e.g. mutual TLS where all nodes present the same cert).

Certs are generated for all non-removed allocations (active and disabled). Disabled allocations still receive KV material on build; deploy stages it without starting the process.

Removing certs

Delete the certs block from the manifest (or remove a name). The next maand build purges matching certs/* keys from affected allocation namespaces.


Bucket configuration (maand.conf)

Leaf certificate lifetime and renewal are controlled in maand.conf at the bucket root:

ssh_user = "agent"
ssh_key = "worker.key"
certs_ttl = 60
certs_renewal_buffer = 10
Field Default in code Purpose
certs_ttl 60 (days) if unset or 0 Validity period for newly generated leaf certificates
certs_renewal_buffer 0 if unset Regenerate a stored cert when within this many days of NotAfter (0 = only after expiry)

Auto-rotation at build time

Rotation is not a background daemon. maand build checks each stored leaf cert and regenerates when:

  1. Missing — no certs/<name>.crt or .key in KV
  2. CA changedsecrets/ca.crt hash differs from last promoted build
  3. Manifest changedcerts section hash differs (build_certs namespace)
  4. Expiring soonnow >= NotAfter - certs_renewal_buffer (see below)

If none apply, existing PEMs are reused (repeat builds without config changes do not churn certificates).

Renewal buffer math

regenerate when:  now >= cert.NotAfter - (certs_renewal_buffer days)
                 (or now >= NotAfter when buffer is 0)
certs_renewal_buffer Effect
10 Regenerate when within 10 days of expiry (recommended for scheduled builds)
0 Regenerate only after NotAfter has passed

If certs_renewal_buffercerts_ttl, every leaf cert is always in the renewal window (regenerated on each maand build).

Set both values explicitly in maand.conf. Example from configuration.md:

certs_ttl = 60
certs_renewal_buffer = 10

Operational rotation workflow

Schedule regular maand build (cron/CI). When the renewal buffer triggers, build writes new PEMs to KV. maand deploy then rsyncs them to workers.

maand build
maand deploy --dry-run   # optional: see if rollout needed
maand deploy

After deploy, apps read updated files from:

/opt/worker/<bucket_id>/jobs/<job>/certs/<name>.crt
/opt/worker/<bucket_id>/jobs/<job>/certs/<name>.key
/opt/worker/<bucket_id>/jobs/<job>/certs/ca.crt

Restart or reload the job if it caches TLS material (e.g. make restart on the next deploy wave, or your own hook).

Prometheus metrics (optional)

When a prometheus job ships prometheus.yml or prometheus.yml.tpl, maand deploy (after commit) pushes certificate expiry gauges to Prometheus remote write (/api/v1/write). Push runs only at deploy, not at maand build. Failures are best-effort — logged, deploy still succeeds. Transient failures (503, other 5xx, network errors) are retried up to 5 times with backoff (2s–15s) because Prometheus may not accept remote write immediately after rollout.

Metric Meaning
maand_cert_not_after_seconds Unix timestamp of certificate NotAfter
maand_cert_expiring 1 when within certs_renewal_buffer of expiry (same window as build renewal)
maand_cert_expired 1 when past NotAfter

Labels: scope (ca or job), job, worker, cert, common_name, status.

Remote write URL — auto-discovered from the prometheus job in the workspace: first non-removed allocation IP and prometheus_port_httphttp://<worker>:<port>/api/v1/write. No maand.conf setting; if there is no prometheus job with server config, push is skipped.

Authentication — when secrets/job/prometheus contains admin_username and admin_password (same credentials used for the Prometheus web UI), maand sends HTTP Basic auth on remote write. If neither key exists, push is unauthenticated (legacy). If only one is set, push fails with a log message. Set secrets via pre_deploy / hooks or inspect with:

maand cat kv get secrets/job/prometheus admin_username --reveal
maand cat kv get secrets/job/prometheus admin_password --reveal

Prometheus must run with --web.enable-remote-write-receiver. Deploy also writes embedded cert alert rules to rules/maand/certs.yaml on the prometheus worker. See prometheus.md.


Where certificates live

KV (after build)

Namespace Keys
maand/worker certs/ca.crt
maand/job/<job>/worker/<ip> certs/<name>.crt, certs/<name>.key

Inspect:

maand cat certs --jobs api
maand cat certs --workers 10.0.0.1
maand cat kv get maand/job/api/worker/10.0.0.1 certs/tls.crt   # raw PEM (truncated in list)

Hooks and templates may read PEMs from the allocation namespace (prefer file paths on the worker for large certs in production).

Worker filesystem (after deploy)

During deploy staging, updateCerts writes:

tmp/workers/<ip>/jobs/<job>/certs/
├── <name>.crt    # mode 0644
├── <name>.key    # mode 0600
└── ca.crt

The tree is rsynced with the rest of the job. BuildJobAllocationVariables runs after BuildCerts so cert sync does not delete allocation metadata keys (*_allocation_index, peer_workers).


Using certs in your job

Makefile / application

Point your service at the deploy path (values also available as template fields .JobPath and .BucketPath):

TLS_CERT := $(CURDIR)/certs/tls.crt
TLS_KEY  := $(CURDIR)/certs/tls.key
TLS_CA   := $(CURDIR)/certs/ca.crt

CURDIR is the job directory on the worker when make start runs via runner.py.

Templates

Prefer filesystem paths in rendered config:

{
  "tls_cert": "{{ .JobPath }}/certs/tls.crt",
  "tls_key": "{{ .JobPath }}/certs/tls.key",
  "tls_ca": "{{ .JobPath }}/certs/ca.crt"
}

See templates.md.

Hooks

Read from the runtime KV API or use paths after deploy. Namespace: maand/job/<job>/worker/<worker_ip>.


Examples

Single service TLS (per worker)

{
  "selectors": ["worker"],
  "certs": {
    "tls": {
      "subject": { "common_name": "api.example.com" }
    }
  }
}

Each worker gets a distinct key; cert includes that host’s IP and 127.0.0.1.

Shared cluster cert

{
  "certs": {
    "internal": {
      "one": true,
      "pkcs8": true,
      "subject": { "common_name": "mycluster" }
    }
  }
}

All allocations share identical PEMs in KV and on disk.

Stricter renewal

# maand.conf
certs_ttl = 90
certs_renewal_buffer = 30

Build regenerates leaf certs in the last 30 days of their 90-day life. Pair with weekly maand build && maand deploy.


Inspecting certificates (maand cat certs)

List the bucket CA and every job leaf cert stored in KV with expiration and renewal status:

maand cat certs
maand cat certs --jobs api,postgres
maand cat certs --workers 10.0.0.1
Column Meaning
scope ca (bucket CA in secrets/) or job (leaf cert in KV)
job / worker Allocation (blank for CA)
cert Cert name from manifest (tls, client, …) or ca
common_name X.509 subject CN
not_after Certificate expiry (UTC)
days_left Whole days until expiry (negative if expired)
status ok, expiring (within certs_renewal_buffer of NotAfter), expired, or invalid

For the bucket CA (scope=ca), expiring triggers a maand build stderr warning; expired fails build. See CA expiry: warning vs error.


Troubleshooting

Symptom Likely cause
maand build fails: bucket CA certificate … expired on Replace secrets/ca.crt / ca.key, then build and deploy — CA expiry
maand build stderr: warning: bucket CA certificate … expires in CA within certs_renewal_buffer — plan rotation before NotAfter
No certs/ on worker Job not deployed, or manifest has no certs block
Cert unchanged across builds Still valid and outside renewal buffer; expected
All certs regenerated CA file changed, manifest certs edited, or cert entered renewal window
Build OK, worker has old cert Run maand deploy after build
ErrInvalidManifest on cert Invalid subject JSON
App rejects peer cert Worker still using old ca.crt — redeploy after CA rotation; check maand cat certs
Disabled allocation missing certs in KV Run maand build — disabled rows still get cert material