Cabalmail

Host your own email and enhance your privacy

View the Project on GitHub cabalmail/cabal-infra

Cabalmail alert runbooks

Short on-call runbooks for every alert in docs/0.7.0/monitoring-plan.md Phases 1-3. Each file follows the same shape:

  1. What this means — what condition fired the alert.
  2. Who/what is impacted — user-visible effect.
  3. First three things to check — start here on a page.
  4. Escalation — what to do if the first three don’t resolve.
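Because every runbook is meant to share this four-part shape, a small check can catch a file that drifts from it. The sketch below is illustrative only. It assumes each section's title appears in the file's text using the wording above. The helper names `missing_or_out_of_order` and `main` are hypothetical and not part of the repository.

```python
from pathlib import Path

# Section titles every runbook is expected to contain, in this order.
# Assumption: the files use exactly this wording for their section titles.
REQUIRED_SECTIONS = [
    "What this means",
    "Who/what is impacted",
    "First three things to check",
    "Escalation",
]


def missing_or_out_of_order(text: str) -> list[str]:
    """Return the shape problems found in one runbook (an empty list means OK)."""
    problems = []
    last_pos = -1
    for title in REQUIRED_SECTIONS:
        pos = text.find(title)  # first occurrence anywhere in the file
        if pos == -1:
            problems.append(f"missing section: {title}")
        elif pos < last_pos:
            problems.append(f"out of order: {title}")
        else:
            last_pos = pos
    return problems


def main(paths: list[str]) -> int:
    """Hypothetical CLI body: print problems per file, return 1 if any were found."""
    failed = False
    for p in paths:
        for problem in missing_or_out_of_order(Path(p).read_text(encoding="utf-8")):
            print(f"{p}: {problem}")
            failed = True
    return 1 if failed else 0
```

Wired to an entry point such as `sys.exit(main(sys.argv[1:]))`, a check like this could run in CI against the runbook files. Plain substring matching keeps it simple. It can be fooled if a title also appears in body prose earlier in the file.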

When a Pushover or ntfy push includes a Runbook: link, that link points to one of these files on the main branch.
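A sketch of how a push body might carry that link, assuming the files are served as GitHub blob URLs from the cabalmail/cabal-infra repository. The `RUNBOOK_DIR` path, the one-file-per-alert naming, and the `runbook_url`/`push_body` helpers are hypothetical illustrations, not the alert_sink Lambda's actual code.

```python
from typing import Optional
from urllib.parse import quote

REPO = "cabalmail/cabal-infra"
# Hypothetical directory holding the runbook files; adjust to the real layout.
RUNBOOK_DIR = "docs/0.7.0/runbooks"


def runbook_url(runbook_file: str, branch: str = "main") -> str:
    """Build a link to a runbook file on the given branch (main by default)."""
    return f"https://github.com/{REPO}/blob/{branch}/{RUNBOOK_DIR}/{quote(runbook_file)}"


def push_body(summary: str, runbook_file: Optional[str]) -> str:
    """Append a 'Runbook:' line only when the alert maps to a runbook file."""
    lines = [summary]
    if runbook_file:
        lines.append(f"Runbook: {runbook_url(runbook_file)}")
    return "\n".join(lines)
```

Pinning the link to `main` means a page always opens the latest runbook text. The trade-off is that a runbook edited after the alert fired may not match the alert's original wording.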

Sources of alerts

| Source | Routing | Phase |
|---|---|---|
| Uptime Kuma monitors | Webhook → alert_sink Lambda | 1 |
| Self-hosted Healthchecks | Webhook → alert_sink Lambda | 2 |
| Prometheus rules → Alertmanager | Webhook → alert_sink Lambda | 3 |
| alert_sink Lambda | Pushover (critical) + ntfy (critical + warning) | 1 |
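The last row of the routing table amounts to a small severity-to-channel mapping. The sketch below applies it to an Alertmanager webhook body, which carries an `alerts` list whose entries have `status`, `labels`, and `annotations`. Several details are assumptions and not taken from the alert_sink source. The severity is read from a `severity` label, resolved alerts are skipped, and severities other than critical and warning are not pushed. The function names are hypothetical, and Python is used only for illustration.

```python
def destinations(severity: str) -> list[str]:
    """Map a severity to push channels, per the routing table's alert_sink row."""
    if severity == "critical":
        return ["pushover", "ntfy"]
    if severity == "warning":
        return ["ntfy"]
    return []  # assumption: any other severity is not pushed


def route_alertmanager(payload: dict) -> list[tuple[str, str]]:
    """Return (alertname, channel) pairs for each firing alert in a webhook body."""
    routed = []
    for alert in payload.get("alerts", []):
        if alert.get("status") != "firing":
            continue  # assumption: resolved notifications are not re-pushed
        labels = alert.get("labels", {})
        for channel in destinations(labels.get("severity", "")):
            routed.append((labels.get("alertname", "unknown"), channel))
    return routed
```

With this shape, a critical alert fans out to both channels and a warning reaches only ntfy. Pushover is therefore reserved for pages that should interrupt someone.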

Index

Probe failures (Kuma TCP/HTTP, Prometheus blackbox)

AWS service alerts (Prometheus rules over cloudwatch_exporter)

Host alerts (Prometheus rules over node_exporter)

Log-derived alerts (Prometheus rules over CloudWatch metric filters)

Heartbeats (missed Healthchecks pings)

After the alert resolves

The plan’s tuning discipline applies: after every page, record on the corresponding GitHub issue (or open one) whether the threshold was right, too sensitive, or too loose. Thresholds live in code:

Aim for zero false pages in a typical week (see monitoring-plan.md § “Tuning discipline”). If a runbook’s “first three things” never apply, fix the runbook in the same PR that fixes the alert.
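To check the zero-false-pages target, pages recorded during tuning can be tallied per ISO week. This is a minimal sketch assuming each page has been reduced to a `(date, was_false)` pair, for example from the GitHub issue notes. That record format and the `false_pages_per_week` name are hypothetical.

```python
from collections import Counter
from datetime import date


def false_pages_per_week(pages: list[tuple[date, bool]]) -> dict[tuple[int, int], int]:
    """Count false pages per (ISO year, ISO week); weeks with none are omitted."""
    counts: Counter = Counter()
    for day, was_false in pages:
        if was_false:
            iso = day.isocalendar()
            counts[(iso[0], iso[1])] += 1
    return dict(counts)
```

Because the target is zero, every key in the result marks a week that missed it, and the matching alerts are the candidates for retuning.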