Host your own email and enhance your privacy
ecs-reconfigure
Fired by Healthchecks when the ecs-reconfigure check has been silent past its 30-minute grace beyond the 30-minute expected cadence.
The reconfigure.sh loop in one or more mail-tier containers stopped pinging. This script regenerates sendmail maps, virtusertable, DKIM tables, and OS users from DynamoDB and Cognito; it runs continuously inside each mail-tier container, polling SQS and falling back to a 15-minute timer.
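The trigger logic described above (reconcile as soon as a queue message arrives, otherwise once the 15-minute fallback has elapsed) can be sketched as a small decision function. This is an illustration only and not the actual reconfigure.sh; the name `should_reconcile` and its arguments are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the trigger decision; not the real reconfigure.sh.
FALLBACK_SECONDS=900   # the 15-minute fallback timer described above

# should_reconcile MESSAGES_RECEIVED LAST_RUN_EPOCH NOW_EPOCH
# Exits 0 when a reconcile pass is due, non-zero otherwise.
should_reconcile() {
  local received=$1 last_run=$2 now=$3
  # A received SQS message means configuration changed: reconcile now.
  if [ "$received" -gt 0 ]; then
    return 0
  fi
  # No message: reconcile only once the fallback interval has elapsed.
  [ $(( now - last_run )) -ge "$FALLBACK_SECONDS" ]
}
```

Under this model a healthy container reconciles at least every 15 minutes even when the queue is idle, which is why a silent check points at the loop itself rather than at a lack of configuration changes.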
A missed ping means at least one mail-tier container is no longer reconciling configuration. New addresses created via the admin app may not be receiving mail; revoked addresses may still be deliverable.
The label embedded in the alert summary identifies the missed check, but Cabalmail registers a single ecs-reconfigure check and the loop runs in all three mail tiers (imap, smtp-in, smtp-out), so the alert alone does not tell you which tier stopped pinging. Address-management correctness is the user-visible impact.
for tier in imap smtp-in smtp-out; do
  echo "=== $tier ==="
  TASK=$(aws ecs list-tasks --cluster <cluster> --service-name "cabal-$tier" --query 'taskArns[0]' --output text)
  aws ecs execute-command --cluster <cluster> --task "$TASK" --container "$tier" --interactive \
    --command "/bin/sh -c 'pgrep -af reconfigure || echo no-reconfigure-process'"
done
If a tier shows “no-reconfigure-process”, supervisord stopped restarting the loop — examine the container’s supervisord log.
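When a service has no running task, `list-tasks` with `--output text` prints the literal string `None`, and the subsequent `execute-command` fails with a less obvious error. A minimal guard, assuming the same `cabal-<tier>` service naming as the loop above; the helper name `first_task` is hypothetical:

```shell
# first_task CLUSTER TIER
# Prints the first task ARN of service cabal-TIER; fails if none is running.
first_task() {
  local cluster=$1 tier=$2 task
  task=$(aws ecs list-tasks --cluster "$cluster" --service-name "cabal-$tier" \
    --query 'taskArns[0]' --output text) || return 1
  # With --output text, a null query result is printed as "None".
  if [ -z "$task" ] || [ "$task" = "None" ]; then
    echo "no running task for cabal-$tier" >&2
    return 1
  fi
  printf '%s\n' "$task"
}
```

A tier with no running task cannot be pinging at all, so a failure here is a finding in its own right and worth checking before the supervisord log.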
aws sqs get-queue-attributes --queue-url <reconfigure-queue-url> \
  --attribute-names ApproximateNumberOfMessages,ApproximateNumberOfMessagesNotVisible
A growing visible count or stuck-invisible count means the loop is consuming but failing partway through (likely DynamoDB throttling — see dynamodb-throttling.md).
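Both symptoms only show up when two samples are compared a few minutes apart. The sketch below classifies a pair of samples along the lines described above; the function name, the labels it prints, and the simple "did not shrink" rule for in-flight messages are assumptions for illustration, not part of Cabalmail.

```shell
# classify_queue VISIBLE_BEFORE VISIBLE_AFTER INFLIGHT_BEFORE INFLIGHT_AFTER
# INFLIGHT_* are ApproximateNumberOfMessagesNotVisible samples.
classify_queue() {
  local v1=$1 v2=$2 n1=$3 n2=$4
  if [ "$v2" -gt "$v1" ]; then
    # Visible backlog is growing between samples.
    echo "backlog-growing"
  elif [ "$n1" -gt 0 ] && [ "$n2" -ge "$n1" ]; then
    # Messages were in flight and none completed between samples.
    echo "in-flight-stuck"
  else
    echo "ok"
  fi
}
```

Feed it the two attribute values from consecutive runs of the command above. Because SQS reports these counts as approximations, treat a single borderline result with caution and sample again.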
If /cabal/healthcheck_ping_ecs_reconfigure was rotated but only one task was force-redeployed, the other tasks are still using the old value they cached at task start. Force-redeploy the rest:
for tier in imap smtp-in smtp-out; do
  aws ecs update-service --cluster <cluster> --service "cabal-$tier" --force-new-deployment
done
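A forced deployment returns immediately, so it helps to block until the replacement tasks are running before concluding that the check has recovered. A minimal sketch using the `aws ecs wait services-stable` waiter, assuming the three `cabal-<tier>` service names used above; the wrapper name `wait_for_tiers` is hypothetical:

```shell
# wait_for_tiers CLUSTER
# Blocks until all three mail-tier services report a stable deployment.
# Returns non-zero if the waiter gives up before the services settle.
wait_for_tiers() {
  local cluster=$1
  aws ecs wait services-stable --cluster "$cluster" \
    --services cabal-imap cabal-smtp-in cabal-smtp-out
}
```

Once it returns successfully, every tier is running tasks started after the rotation and should be using the current ping value.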
aws logs tail /ecs/cabal-imap --since 1h --filter-pattern reconfigure
aws_cognito_*_throttles in Prometheus.
critical: address-management correctness is core functionality. Page-level urgency, but mail itself continues to flow on the previous map state.