Cabalmail

Host your own email and enhance your privacy

View the Project on GitHub cabalmail/cabal-infra

Runbook: EFSBurstCreditsLow

Fired by Prometheus rule EFSBurstCreditsLow: BurstCreditBalance below 20% of baseline for 1 h.

What this means

The EFS file system is in bursting throughput mode and has spent down its accumulated burst credits. Once credits hit zero, EFS throttles I/O to the file system’s baseline rate. That rate scales with stored size, so small file systems get very little baseline throughput.
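The arithmetic behind the throttling fits in a few lines of shell. This is a rough estimate under a few assumptions. It uses the bursting-mode rates this runbook relies on: credits are earned at a baseline of 50 KiB/s per GiB stored and spent at the metered throughput rate. It also assumes a constant load. The function names and example figures are hypothetical and are not part of Cabalmail.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helpers): estimate how long EFS burst credits last.
# Assumes bursting mode earns credits at 50 KiB/s per GiB stored and spends
# them at the metered throughput rate, under a constant load.

# Baseline throughput in KiB/s for a stored size given in whole GiB.
baseline_kibps() {
  local size_gib=$1
  echo $(( size_gib * 50 ))
}

# Whole hours until BurstCreditBalance (bytes) reaches zero when the file
# system is driven at a constant metered throughput (KiB/s).
# Prints "never" if the load does not exceed the baseline.
hours_to_empty() {
  local balance_bytes=$1 size_gib=$2 load_kibps=$3
  local baseline net_bytes_per_s
  baseline=$(baseline_kibps "$size_gib")
  if (( load_kibps <= baseline )); then
    echo never
    return
  fi
  # Net drain: what is spent above baseline, converted from KiB/s to bytes/s.
  net_bytes_per_s=$(( (load_kibps - baseline) * 1024 ))
  echo $(( balance_bytes / net_bytes_per_s / 3600 ))
}
```

For example, hours_to_empty 2764800000 5 1000 prints 1. A 5 GiB file system driven at 1000 KiB/s has a 250 KiB/s baseline, so it drains a net 750 KiB/s and uses up about 2.76 GB of credits in an hour.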

For Cabalmail, EFS holds:

Who/what is impacted

When credits run out:

First three things to check

  1. Are we genuinely burning through credits, or is this a slow drain?
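    # -v-24H is BSD/macOS date syntax; with GNU date use: date -u -d '24 hours ago' +%FT%TZ
    # Datapoints come back unordered, so sort them by timestamp before reading the trend.
    aws cloudwatch get-metric-statistics --namespace AWS/EFS --metric-name BurstCreditBalance \
      --dimensions Name=FileSystemId,Value=<fs-id> \
      --start-time $(date -u -v-24H +%FT%TZ) --end-time $(date -u +%FT%TZ) \
      --period 300 --statistics Average \
      --query 'sort_by(Datapoints,&Timestamp)[].[Timestamp,Average]' --output text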

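    A steep drop in the last hour points to a runaway process. A slow, linear decline over several days means the file system is consuming throughput faster than it earns credits at its baseline rate. In that case it needs to grow or move to Elastic throughput mode. To see a multi-day trend, widen --start-time and raise --period (for example to 3600), because a single call returns at most 1,440 datapoints.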

  2. Who’s driving I/O? Check MeteredIOBytes in the EFS console and work out which access point’s workload is responsible. If /uptime-kuma or /prometheus dominates, the monitoring stack is the culprit. Often Kuma is writing 1-second probe results to SQLite, so increase the monitor interval. If /home dominates, look for a stuck procmail process or a brute-force attack on IMAP that is generating a flood of failed-auth log writes.
  3. Is the file system very small? In bursting mode, baseline throughput is 50 KiB/s per GiB stored, so a 5 GiB file system gets only 250 KiB/s, which is easy to overrun. Either store dummy ballast (padding files that raise the stored size, and with it the baseline) or migrate to Elastic throughput.
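Check 3 can be scripted by reading the throughput mode and stored size from the EFS API. This is a minimal sketch that assumes the aws efs describe-file-systems output shape shown in the comment. The check_baseline helper is hypothetical. It treats the total metered size (SizeInBytes.Value), truncated to whole GiB, as the size that sets the baseline.

```bash
#!/usr/bin/env bash
# Sketch (hypothetical helper): report the bursting-mode baseline for EFS.
# Expects one line "<ThroughputMode> <SizeInBytes>" on stdin, e.g. from:
#   aws efs describe-file-systems --file-system-id <fs-id> \
#     --query 'FileSystems[0].[ThroughputMode,SizeInBytes.Value]' --output text
check_baseline() {
  local mode size_bytes size_gib
  # Default IFS splits on the tab that --output text emits.
  read -r mode size_bytes
  if [[ $mode != bursting ]]; then
    echo "mode=$mode (not bursting; baseline check does not apply)"
    return
  fi
  # Truncate to whole GiB; assumed baseline is 50 KiB/s per GiB stored.
  size_gib=$(( size_bytes / 1024 / 1024 / 1024 ))
  echo "mode=bursting size=${size_gib}GiB baseline=$(( size_gib * 50 ))KiB/s"
}
```

To run it against the live file system, pipe the describe-file-systems command from the comment into check_baseline. The parsing reads stdin, so it can be tested without AWS credentials. SizeInBytes is refreshed periodically rather than in real time, so treat the result as approximate.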

Escalation