Today, Cabalmail’s Terraform state lives in the cabal-tf-backend S3 bucket. The bucket has SSE-S3 enabled at the bucket level (AWS default since 2023), so the state file is encrypted at rest — but any IAM principal with s3:GetObject on the bucket can read fully-decrypted state. This is the standard backend posture, and it has been adequate while the only secrets in state were resource ARNs and IDs.
The 0.7.0 monitoring work surfaced a concrete case where this posture starts to chafe: the alert_sink Lambda needs Pushover credentials and an ntfy publisher token. The Phase 1 implementation works around the issue by writing placeholder values via Terraform and using ignore_changes = [value] so the operator can aws ssm put-parameter the real values out-of-band. That keeps secrets out of state, but at the cost of:
- terraform.tfstate claims a value the operator immediately overwrote.

Terraform 1.10 (Nov 2024) added first-class state and plan file encryption via a top-level encryption block. With KMS-backed encryption enabled, S3 read access alone is no longer sufficient to read secret values from state — the reader also needs kms:Decrypt on the configured key. That changes the calculus enough that we can comfortably manage secrets through Terraform.
This plan migrates both Terraform stacks (terraform/dns, terraform/infra) to encrypted state, then folds the Phase 1 monitoring secrets into the standard pattern.
Goals:

- State and plan files encrypted with per-environment KMS keys, so that reading secret values requires kms:Decrypt on those keys.
- Monitoring secrets managed through Terraform, with no out-of-band aws ssm put-parameter.

Not in scope: terraform.tfvars files generated by CI. They are written to a runner's working directory, not persisted; the protection comes from masking the underlying GitHub secrets, not from file-level encryption.

Current state:

- State storage: the cabal-tf-backend S3 bucket. make-terraform.sh writes a backend block with bucket, key, region only — no encrypt, no kms_key_id, no dynamodb_table (state locking).
- State keys: dev-bootstrap, stage-bootstrap, prod-bootstrap (DNS stack); dev, stage, prod (infra stack).
- Version floor: >= 1.1.2 in module versions.tf files; CI uses hashicorp/setup-terraform@v2 without a version, which resolves to latest stable. Both are compatible with the 1.10 encryption block once we bump the floor.

One CMK and one alias per environment:
| Environment | Key alias | Purpose |
|---|---|---|
| dev | alias/cabal-tf-state-dev | Encrypts dev infra + DNS state |
| stage | alias/cabal-tf-state-stage | Encrypts stage state |
| prod | alias/cabal-tf-state-prod | Encrypts prod state |
One key per environment (not per stack) keeps the surface small. infra and dns for the same environment share a key; cross-environment isolation is preserved.
Key policy: deploy principal gets Encrypt, Decrypt, GenerateDataKey, DescribeKey. Account root keeps full admin (per AWS best practice — never lock yourself out of your own key). Deletion window: 30 days (max). Automatic rotation: on (annual). Multi-region: false (state lives in one region).
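A minimal sketch of one environment's key, assuming it lives in a small dedicated stack and that var.deploy_role_arn is a hypothetical input for the CI deploy principal; dev shown, stage and prod repeat the pattern:

```hcl
# Sketch only: per-environment state-encryption CMK (dev).
# var.deploy_role_arn is an assumed input, not an existing name.
data "aws_caller_identity" "current" {}

data "aws_iam_policy_document" "state_key" {
  # Account root keeps full control so the key can never be orphaned.
  statement {
    sid       = "RootAdmin"
    actions   = ["kms:*"]
    resources = ["*"]
    principals {
      type        = "AWS"
      identifiers = ["arn:aws:iam::${data.aws_caller_identity.current.account_id}:root"]
    }
  }

  # Deploy principal gets only what state encryption needs.
  statement {
    sid = "DeployUse"
    actions = [
      "kms:Encrypt",
      "kms:Decrypt",
      "kms:GenerateDataKey",
      "kms:DescribeKey",
    ]
    resources = ["*"]
    principals {
      type        = "AWS"
      identifiers = [var.deploy_role_arn]
    }
  }
}

resource "aws_kms_key" "tf_state_dev" {
  description             = "Cabalmail Terraform state encryption (dev)"
  deletion_window_in_days = 30
  enable_key_rotation     = true
  multi_region            = false
  policy                  = data.aws_iam_policy_document.state_key.json
}

resource "aws_kms_alias" "tf_state_dev" {
  name          = "alias/cabal-tf-state-dev"
  target_key_id = aws_kms_key.tf_state_dev.key_id
}
```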
Update make-terraform.sh to emit:
```hcl
terraform {
  backend "s3" {
    bucket       = "cabal-tf-backend"
    key          = "<env>"
    region       = "<region>"
    encrypt      = true
    kms_key_id   = "alias/cabal-tf-state-<env>"
    use_lockfile = true # S3-native locking, GA in TF 1.10; no DynamoDB needed.
  }
}
```
The backend's encrypt = true and kms_key_id settings are independent of the client-side encryption block: they protect different layers (server-side encryption of the object in the S3 store vs. the state payload itself). Both should be on.
A top-level encryption block per stack root (terraform/dns/main.tf and terraform/infra/main.tf):
```hcl
terraform {
  encryption {
    key_provider "aws_kms" "state" {
      kms_key_id = "alias/cabal-tf-state-<env>"
      region     = "<region>"
      key_spec   = "AES_256"
    }

    method "aes_gcm" "state" {
      keys = key_provider.aws_kms.state
    }

    state {
      method   = method.aes_gcm.state
      enforced = true
    }

    plan {
      method   = method.aes_gcm.state
      enforced = true
    }
  }
}
```
enforced = true is a one-way door: once set, every operator who runs terraform plan or terraform apply against this stack must have kms:Decrypt on the key. For our CI-only apply model with a single deploy principal, that is fine — and is the whole point. The migration step below uses enforced = false exactly once per stack/env, then flips it on.
The <env> and <region> placeholders mean the encryption block has to be templated like the backend block. Extend make-terraform.sh to write it alongside backend.tf, or inline both into a single generated _generated.tf file.
Bump every versions.tf’s required_version to >= 1.10. Pin setup-terraform to terraform_version: "~1.10" in the workflows so a future TF 2.x release doesn’t surprise us.
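As a sketch, the floor bump in each module's versions.tf looks like this (the provider pin is shown only for shape; keep whatever each module already declares):

```hcl
# versions.tf — raise the floor so the encryption and use_lockfile
# features are available on every runner.
terraform {
  required_version = ">= 1.10"

  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = ">= 5.0" # illustrative; keep the module's existing pin
    }
  }
}
```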
Per stack (dns, then infra) per environment (dev → stage → prod):
1. KMS bootstrap: a small terraform/state-keys/ stack (or a one-shot aws kms create-key via the console — fine for a one-time bootstrap) creates the three CMKs and aliases. Output the key ARNs to a non-secret file (docs/0.9.0/key-arns.md) for reference.
2. TF version bump: raise the versions.tf floors and the setup-terraform version in CI. Confirm terraform plan still no-ops on every environment.
3. Backend encrypt + kms_key_id: add both settings to the backend block emitted by make-terraform.sh. The next terraform init migrates the state file (S3 PutObject with SSE-KMS). Test on dev first; this is reversible by stripping the lines and re-init'ing with -migrate-state.
4. Client-side encryption (migration apply): add the encryption block with enforced = false and a one-shot migration block:
```hcl
terraform {
  encryption {
    # ... key_provider, method as above ...

    state {
      method = method.aes_gcm.state
      # No enforced flag yet.
    }

    state {
      unencrypted = true
    }
  }
}
```
On the next apply, TF reads the still-unencrypted state and writes encrypted. One apply per stack per environment. Confirm by downloading the state file from S3 and observing it is now an opaque blob with "encryption": {...} metadata at the top.
5. enforced = true: remove the state { unencrypted = true } block and flip enforced = true. From this point forward, anyone without kms:Decrypt cannot read state. Same for plan files.
6. Secret-management switch:
   - Remove ignore_changes = [value] and the placeholder strings on aws_ssm_parameter.pushover_user_key, aws_ssm_parameter.pushover_app_token, aws_ssm_parameter.ntfy_publisher_token.
   - Add variable "pushover_user_key", variable "pushover_app_token", variable "ntfy_publisher_token" to the monitoring module and the root, all sensitive = true (sketched below).
   - Feed the values from GitHub environment secrets (secrets.PUSHOVER_USER_KEY etc.), not vars. Set them as TF_VAR_* env on the apply step rather than writing them to terraform.tfvars on disk.
   - Rotating a credential becomes a secret update plus a workflow re-run: no aws ssm put-parameter, no terraform taint.

Roll out on dev first, end-to-end, with the migration steps spaced out by at least one CI run each so that any breakage shows up cheaply. Then stage, then prod. The whole sequence per stack should fit in one PR per environment if we want clean rollback boundaries; bundling all three is also acceptable once we've done dev.
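A minimal sketch of the switched-over pattern for one of the three parameters; the variable and resource names come from the list above, but the parameter path and other attributes are assumptions about the existing module:

```hcl
variable "pushover_app_token" {
  description = "Pushover application token for the alert_sink Lambda"
  type        = string
  sensitive   = true
}

resource "aws_ssm_parameter" "pushover_app_token" {
  name  = "/cabalmail/monitoring/pushover_app_token" # assumed path
  type  = "SecureString"
  value = var.pushover_app_token

  # No ignore_changes and no placeholder: Terraform owns the real value,
  # which is acceptable now that state is KMS-encrypted.
}
```

The sensitive flag keeps the value out of CLI output; the encrypted state and plan files protect it at rest.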
| Step | Rollback |
|---|---|
| KMS bootstrap | Disable + schedule deletion (30-day window). |
| TF version bump | Revert the workflow + versions.tf change; no state implications. |
| Backend encrypt + kms_key_id | Remove the lines, run terraform init -migrate-state. New state writes drop SSE-KMS. |
| Client-side encryption (migration apply) | Restore the prior state version from S3 versioning + remove the encryption block. |
| enforced = true | Revert to enforced = false and re-add the state { unencrypted = true } migration block; one apply restores readability with the old toolchain. |
| Secret-management switch | Revert the ignore_changes removal; re-add placeholders. The real values are still in SSM; nothing breaks at runtime. |
Workflows (terraform.yml, bootstrap.yml):
- The deploy principal needs kms:Encrypt, kms:Decrypt, kms:GenerateDataKey, kms:DescribeKey on the per-environment CMK. Either inline that into the existing deploy policy or add a kms policy attachment (a sketch follows the workflow snippet below).
- New GitHub environment secrets: PUSHOVER_USER_KEY, PUSHOVER_APP_TOKEN, NTFY_PUBLISHER_TOKEN (per environment).
- In the apply job (and plan if we want secret-aware diffs there), pass them through as env vars rather than writing to terraform.tfvars:
```yaml
- name: apply-terraform
  env:
    # TF_VAR_ names are case-sensitive and must match the lowercase variable names.
    TF_VAR_pushover_user_key: ${{ secrets.PUSHOVER_USER_KEY }}
    TF_VAR_pushover_app_token: ${{ secrets.PUSHOVER_APP_TOKEN }}
    TF_VAR_ntfy_publisher_token: ${{ secrets.NTFY_PUBLISHER_TOKEN }}
  run: terraform apply ...
```
GitHub auto-masks secret values in logs. They never touch the runner’s filesystem.
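For the kms policy attachment mentioned above, a minimal sketch; var.deploy_role_name and var.state_kms_key_arn are assumed inputs for illustration, not names that exist in the repo today:

```hcl
# Sketch: grant the CI deploy role use of the per-environment state CMK.
variable "deploy_role_name" {
  type = string
}

variable "state_kms_key_arn" {
  type = string
}

resource "aws_iam_role_policy" "deploy_state_kms" {
  name = "tf-state-kms"
  role = var.deploy_role_name

  policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Sid    = "UseStateKey"
      Effect = "Allow"
      Action = [
        "kms:Encrypt",
        "kms:Decrypt",
        "kms:GenerateDataKey",
        "kms:DescribeKey",
      ]
      Resource = var.state_kms_key_arn
    }]
  })
}
```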
Verification:

- Run terraform plan to confirm it still reflects reality.
- The state object shows SSE-KMS (aws:kms ServerSideEncryption in the S3 console) and is opaque when downloaded directly (no readable JSON, no plaintext secrets).
- A principal with s3:GetObject but without kms:Decrypt gets access-denied on the underlying object.
- terraform plan still produces no diff in steady state.
- Rotate the PUSHOVER_APP_TOKEN secret in the prod GitHub environment, re-run the terraform workflow, observe the SSM SecureString updated; trigger a Kuma test alert and confirm Pushover delivery still works.
- Manual aws ssm put-parameter instructions are reduced to the ntfy first-boot bootstrap only.

Open items:

- plan-terraform.sh produces a plan output; with plan { enforced = true }, the artifact uploaded between the plan and apply jobs is encrypted at rest (already true in GitHub Actions) and additionally encrypted client-side. The apply job needs kms:Decrypt to consume it — confirm the existing apply principal has it.
- The encryption block syntax is identical, but the key provider names differ slightly (aws_kms is the same). One-day spike to confirm.