Fired by Prometheus rule NodeHighMemory — host memory used (1 - MemAvailable/MemTotal) above 85% for 15 min, sourced from the node_exporter daemon.
A specific EC2 instance in the ECS cluster is under memory pressure. MemAvailable already accounts for reclaimable page cache, so this is “real” pressure: once usage crosses 85% the kernel is starting to evict, and OOM-kills follow.
The label instance identifies the host.
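To confirm the figure directly on the host, the same ratio can be read from /proc/meminfo. This is a minimal sketch, not part of the alert rule itself: the function name mem_used_pct and its optional file argument are illustrative assumptions.

```shell
# Print host memory utilisation as a percentage, using the same ratio as the
# alert: (1 - MemAvailable/MemTotal) * 100.
# The optional argument (default /proc/meminfo) is a hypothetical convenience
# so the function can be pointed at a saved copy of the file.
mem_used_pct() {
  awk '
    $1 == "MemTotal:"     { total = $2 }
    $1 == "MemAvailable:" { avail = $2 }
    END {
      if (total <= 0) { print "MemTotal not found" > "/dev/stderr"; exit 1 }
      printf "%.1f\n", (1 - avail / total) * 100
    }
  ' "${1:-/proc/meminfo}"
}
```

Running mem_used_pct with no argument in an SSM session prints the current value; anything at or above 85.0 matches the alert condition.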
Memory-bound services on that host get OOM-killed first (the kernel terminates the process, which then reports a Killed status). For Cabalmail the memory-hungry services are:
Mail tiers (Dovecot, Sendmail) are I/O- and CPU-bound, not memory-bound, and rarely contribute.
INSTANCE_ID=$(aws ec2 describe-instances --filters Name=private-ip-address,Values=<host-ip> --query 'Reservations[0].Instances[0].InstanceId' --output text)
aws ssm start-session --target "$INSTANCE_ID"
docker stats --no-stream --format "table {{.Name}}\t{{.MemUsage}}\t{{.MemPerc}}"
journalctl -k --since '1 hour ago' | grep -i 'killed process\|oom'
If yes, ECS will already be restarting the task; cross-reference with container-restart-loop.md.
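When the kernel log shows several kills, a per-process tally makes the main offender obvious. The sketch below assumes the usual kernel wording "Killed process <pid> (<name>)"; the function name oom_victims is hypothetical.

```shell
# Tally OOM-kill victims by process name from kernel log lines on stdin.
# Assumes each kill is logged as "Killed process <pid> (<name>)".
# Output: one line per process name, prefixed by its kill count, most first.
oom_victims() {
  sed -n 's/.*[Kk]illed process [0-9][0-9]* (\([^)]*\)).*/\1/p' \
    | sort | uniq -c | sort -rn
}
```

Usage on the host: journalctl -k --since '1 hour ago' | oom_victims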
memory and memoryReservation. A task without a memory hard limit can starve neighbors:
aws ecs describe-task-definition --task-definition <family> --query 'taskDefinition.containerDefinitions[].{name:name,reservation:memoryReservation,limit:memory}'
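To pick out only the containers that lack a hard limit, the query output can be filtered. This is a sketch under two assumptions: the query is rewritten as a list projection so the column order is fixed, and the AWS CLI text output renders a missing value as None. The function name containers_without_limit is hypothetical.

```shell
# List containers that have no hard memory limit.
# Reads tab-separated rows on stdin: name, memoryReservation, memory.
# Assumption: a missing "memory" value appears as "None" (or is empty).
containers_without_limit() {
  awk -F '\t' '$3 == "None" || $3 == "" { print $1 }'
}

# Usage (the <family> placeholder is left as in the command above):
# aws ecs describe-task-definition --task-definition <family> \
#   --query 'taskDefinition.containerDefinitions[].[name,memoryReservation,memory]' \
#   --output text | containers_without_limit
```

Any name printed is a candidate for starving its neighbors, since nothing caps its memory use on the host.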
t3.small; the monitoring stack does not at scale.
warning severity. Sustained pressure escalates to critical via container restart loops once OOM-killing kicks in.