Cabalmail

Host your own email and enhance your privacy

View the Project on GitHub cabalmail/cabal-infra

Application Surface Hardening Plan

Context

The Lambda API surface (lambda/api/*/function.py + lambda/api/_shared/helper.py) has grown organically from the 0.2.x admin-app split. Each endpoint was added in isolation and inherits a thin slice of validation from upstream callers (the React admin app, the Apple client) rather than enforcing its own. That worked while the only client was the in-house React app and the only writer was the project owner; it does not generalise to “Cabalmail is now someone’s primary mailbox” and “anyone with a Cognito account can issue raw IMAP-shaped requests.”

This plan is the application-layer counterpart to iac-quality-gates-plan.md: scanners will catch the IaC posture, but Python code that calls IMAPClient.search(raw_query) with attacker-controlled input never lights up Checkov. The findings here are the result of an audit pass across every handler under lambda/api/. They cluster naturally into five themes, addressed in five phases. Each phase is a candidate PR or small PR set, independently shippable.

The themes:

  1. Inbound XML safety on /process_dmarc. The DMARC report ingestor parses attacker-controlled XML/zip/gzip from arbitrary external senders with the stdlib xml.etree.ElementTree and no decompression cap. This is the single highest-leverage finding in the audit.
  2. Outbound message integrity on /send. Header values are written straight into EmailMessage and the resulting object — BCC field and all — is append()-ed to the user’s Outbox before SMTP submission, so every BCC recipient is permanently visible in Sent. Header injection via subject/from/in-reply-to/references is also poorly bounded.
  3. Input validation on the IMAP-shaped endpoints (/search, /list_messages, /set_flag, /move_messages, /list_envelopes, /fetch_inline_image). Folder names, flag tokens, UIDs, sort criteria, search expressions, and S3-keyed indices flow from query strings and bodies into IMAP commands and S3 keys with no whitelist. Most are not exploitable today because the IMAP master-user model scopes operations to the caller’s mailbox, but they are footguns one shape change away from real bugs.
  4. DNS-touching endpoints (/new, /revoke, /new_address_admin, /repair_dns_record, /check_dns_record, /fetch_bimi). Subdomain and apex names from the request body flow into Route 53 ChangeResourceRecordSets calls and dns.resolver queries with neither shape validation nor a server-side check that the supplied zone ID actually corresponds to the requested domain.
  5. Per-endpoint abuse limits. API Gateway’s global throttle (100/50 rps) is the only rate limit. Admin-only mutations (/delete_user, /disable_user, /enable_user, /set_user_domain_access) have no per-caller ceiling. /process_dmarc and /list do unbounded DynamoDB scans. The pre-signed /upload_url window is 10 minutes — generous for the attacker if the URL leaks.
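For theme 1, a stdlib-only sketch of the two missing guards: a hard cap on decompressed size for gzip and zip attachments, and a front-end that refuses any DTD before ElementTree sees the bytes. defusedxml is the more common choice and may be what the plan settles on; this sketch simply avoids the third-party dependency. The 10 MiB cap, the ReportRejected exception, and all function names are hypothetical, and requiring UTF-8 input is an assumption worth checking against real reports before enforcing it.

```python
import io
import zipfile
import zlib
import xml.etree.ElementTree as ET

# Hypothetical cap: aggregate reports are small, so 10 MiB is generous.
MAX_DECOMPRESSED_BYTES = 10 * 1024 * 1024


class ReportRejected(ValueError):
    """An inbound DMARC attachment failed a safety check (hypothetical name)."""


def bounded_gunzip(data: bytes, limit: int = MAX_DECOMPRESSED_BYTES) -> bytes:
    # 16 + MAX_WBITS makes zlib expect a gzip header and trailer.
    decomp = zlib.decompressobj(wbits=16 + zlib.MAX_WBITS)
    # Ask for at most limit + 1 bytes so an oversized stream is detected
    # without ever inflating the whole bomb into memory.
    out = decomp.decompress(data, limit + 1)
    if len(out) > limit:
        raise ReportRejected("gzip payload exceeds decompression cap")
    if not decomp.eof:
        raise ReportRejected("gzip payload is truncated")
    return out


def bounded_unzip(data: bytes, limit: int = MAX_DECOMPRESSED_BYTES) -> bytes:
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        members = [info for info in zf.infolist() if not info.is_dir()]
        if len(members) != 1:
            raise ReportRejected("expected exactly one file in the zip")
        # file_size is declared by the sender, so check it and also cap the read.
        if members[0].file_size > limit:
            raise ReportRejected("zip member exceeds decompression cap")
        with zf.open(members[0]) as fh:
            out = fh.read(limit + 1)
    if len(out) > limit:
        raise ReportRejected("zip member exceeds decompression cap")
    return out


def parse_report_xml(raw: bytes) -> ET.Element:
    try:
        raw.decode("utf-8")  # validation only; expat still parses the bytes
    except UnicodeDecodeError as exc:
        raise ReportRejected("report is not UTF-8") from exc
    # Aggregate reports never need a DTD; refusing one blocks entity-expansion
    # and external-entity tricks before expat runs.
    if b"<!DOCTYPE" in raw or b"<!ENTITY" in raw:
        raise ReportRejected("report declares a DTD or entity")
    try:
        root = ET.fromstring(raw)
    except ET.ParseError as exc:
        raise ReportRejected(f"malformed XML: {exc}") from exc
    # Accept both <feedback> and a namespaced {ns}feedback root.
    if root.tag.rsplit("}", 1)[-1] != "feedback":
        raise ReportRejected(f"unexpected root element: {root.tag}")
    return root


def extract_report(data: bytes) -> ET.Element:
    try:
        if data[:2] == b"\x1f\x8b":  # gzip magic number
            raw = bounded_gunzip(data)
        elif data[:4] == b"PK\x03\x04":  # zip local-file-header signature
            raw = bounded_unzip(data)
        elif len(data) <= MAX_DECOMPRESSED_BYTES:
            raw = data
        else:
            raise ReportRejected("uncompressed report exceeds size cap")
    except (zlib.error, zipfile.BadZipFile, EOFError) as exc:
        raise ReportRejected(f"corrupt archive: {exc}") from exc
    return parse_report_xml(raw)
```

Because ReportRejected subclasses ValueError, the ingestor can treat every rejection the same way (log and drop) without a separate except clause per archive format.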

Goals

Non-goals

Current state (audit)

Inbound XML — /process_dmarc

Outbound message integrity — /send

IMAP-shaped handlers

DNS-touching endpoints

Per-endpoint abuse limits

Target state

Phase 1 — DMARC XML safety

Phase 2 — Outbound message integrity (/send)

Phase 3 — IMAP-shaped endpoint validation

A small shared validator in _shared/helper.py, used by every IMAP-shaped handler:
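A minimal sketch of what such validators could look like, assuming folder names, UID lists, flag tokens, and sort criteria arrive as plain JSON values. Every limit and whitelist below (MAX_UIDS_PER_REQUEST, the allowed sort keys, the forbidden folder characters) is hypothetical and should be tuned against real mailboxes, since an over-strict rule shows up as user-visible 400s. Each function raises ValueError, which is the contract the handlers rely on.

```python
import re

# Hypothetical limits and whitelists.
MAX_FOLDER_LEN = 255
MAX_UIDS_PER_REQUEST = 500
# IMAP system flags a client may set (\Recent is server-managed, so excluded).
_SYSTEM_FLAGS = {f.lower(): f for f in ("\\Seen", "\\Answered", "\\Flagged", "\\Deleted", "\\Draft")}
_KEYWORD_RE = re.compile(r"\$?[A-Za-z0-9_.-]{1,64}")
_UID_RE = re.compile(r"[0-9]{1,10}")
_SORT_KEYS = {"ARRIVAL", "CC", "DATE", "FROM", "SIZE", "SUBJECT", "TO"}
# Control characters, IMAP LIST wildcards, and quoting characters.
_BAD_FOLDER_CHARS = re.compile(r'[\x00-\x1f\x7f*%"\\]')


def validate_folder(name):
    if not isinstance(name, str) or not name.strip():
        raise ValueError("folder name is required")
    if len(name) > MAX_FOLDER_LEN:
        raise ValueError("folder name is too long")
    if _BAD_FOLDER_CHARS.search(name):
        raise ValueError("folder name contains a forbidden character")
    return name


def validate_uids(values):
    if not isinstance(values, list) or not values:
        raise ValueError("uids must be a non-empty list")
    if len(values) > MAX_UIDS_PER_REQUEST:
        raise ValueError("too many uids in one request")
    uids = []
    for value in values:
        if isinstance(value, str) and _UID_RE.fullmatch(value):
            value = int(value)
        # bool is a subclass of int, so exclude it explicitly; UIDs are 32-bit, non-zero.
        if isinstance(value, bool) or not isinstance(value, int) or not 0 < value < 2**32:
            raise ValueError(f"invalid uid: {value!r}")
        uids.append(value)
    return uids


def validate_flag(flag):
    if not isinstance(flag, str):
        raise ValueError("flag must be a string")
    if flag.lower() in _SYSTEM_FLAGS:
        return _SYSTEM_FLAGS[flag.lower()]
    if _KEYWORD_RE.fullmatch(flag):
        return flag
    raise ValueError(f"unsupported flag: {flag!r}")


def validate_sort(criterion):
    if not isinstance(criterion, str):
        raise ValueError("sort must be a string")
    parts = criterion.upper().split()
    reverse = parts[:1] == ["REVERSE"]
    keys = parts[1:] if reverse else parts
    if len(keys) != 1 or keys[0] not in _SORT_KEYS:
        raise ValueError(f"unsupported sort: {criterion!r}")
    return ("REVERSE " if reverse else "") + keys[0]
```

The validators return normalised values (integers for UIDs, canonical casing for system flags) so handlers pass only the cleaned form on to IMAPClient, never the raw request value.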

Every handler catches ValueError from these validators and returns a 400 with {"status": "Invalid input: <message>"}. Every handler also catches json.JSONDecodeError around json.loads(event['body']) and returns 400.
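One way to keep that error handling identical across all six handlers is a decorator. This is a sketch under assumptions: the name rejects_bad_input, the bad_request response shape, the (event, context, body) handler signature, and treating a missing body as {} are all hypothetical, and the existing helper.py may already define its own response builder.

```python
import functools
import json


def bad_request(message):
    # Hypothetical API Gateway proxy response; match helper.py's real shape.
    return {"statusCode": 400, "body": json.dumps({"status": f"Invalid input: {message}"})}


def rejects_bad_input(handler):
    @functools.wraps(handler)
    def wrapper(event, context):
        try:
            # GET handlers may have no body at all; treat that as an empty object.
            body = json.loads(event.get("body") or "{}")
        except json.JSONDecodeError:
            return bad_request("request body is not valid JSON")
        if not isinstance(body, dict):
            return bad_request("request body must be a JSON object")
        try:
            return handler(event, context, body)
        except ValueError as exc:
            # Raised by the shared validators; surfaced to the caller as a 400.
            return bad_request(str(exc))
    return wrapper
```

JSONDecodeError is itself a ValueError subclass; catching it separately only buys a clearer message. The broader trade-off is that wrapping the whole handler also turns any unrelated ValueError (a genuine bug) into a 400, so a narrower try around just the validator calls is the more conservative option.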

Phase 4 — DNS-touching endpoint hardening

Phase 5 — Per-endpoint abuse limits

Migration sequence

Each phase is one PR (or a small PR set) and is independently reversible.

Phase 1 — DMARC XML safety

Single PR. Touches only lambda/api/process_dmarc/. New env var DMARC_REPORT_SENDERS plumbed through Terraform (terraform/infra/modules/app/dmarc.tf) with a sensible default (“google.com,microsoft.com,yahoo-inc.com,fastmail.com,protonmail.com,mailchimp.com,emarsys.net” — extend as we observe legit senders in CloudWatch over the first week).
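A sketch of how the handler might consume DMARC_REPORT_SENDERS, assuming the variable is a comma-separated domain list and the check runs against the report email's From header. The function names are hypothetical. A bare From domain is trivially spoofed, so this only works as a cheap pre-filter in front of, not instead of, the SPF/DKIM verdicts from the receiving MTA.

```python
import os
from email.utils import parseaddr

# Default copied from the plan; the environment variable overrides it.
DEFAULT_DMARC_REPORT_SENDERS = (
    "google.com,microsoft.com,yahoo-inc.com,fastmail.com,"
    "protonmail.com,mailchimp.com,emarsys.net"
)


def allowed_sender_domains(raw=None):
    if raw is None:
        raw = os.environ.get("DMARC_REPORT_SENDERS", DEFAULT_DMARC_REPORT_SENDERS)
    # Normalise case and trailing dots; skip empty entries from stray commas.
    return frozenset(d.strip().lower().rstrip(".") for d in raw.split(",") if d.strip())


def sender_is_allowed(from_header, allowed):
    _, addr = parseaddr(from_header)
    if "@" not in addr:
        return False
    domain = addr.rsplit("@", 1)[1].lower().rstrip(".")
    # Exact match or a true subdomain: "evilgoogle.com" must not match "google.com".
    return any(domain == d or domain.endswith("." + d) for d in allowed)
```

Logging rejected sender domains (rather than silently dropping them) is what makes the "extend as we observe legit senders in CloudWatch" loop workable.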

Rollback: revert the PR. Pre-existing reports already in cabal-dmarc-reports are unaffected — the parser change is forward-only.

Phase 2 — /send BCC removal + header validation

Single PR. Touches lambda/api/send/function.py and lambda/api/upload_url/function.py. No env var changes. No data migration.

Verification: send a test email with To+Cc+Bcc to a sinkhole address; confirm the resulting Sent-folder copy has no Bcc: header but the SMTP recipients list includes the BCC entry.
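The behaviour that verification checks for can be produced by never writing Bcc into the message at all and carrying those addresses only in the SMTP envelope. A minimal sketch, assuming the handler receives To/Cc/Bcc as lists of address strings; build_outbound, clean_header, and clean_msgids are hypothetical names, and the 998-character cap and the msg-id pattern are deliberately blunt choices rather than full RFC 5322 parsing.

```python
import re
from email.message import EmailMessage
from email.utils import getaddresses

# RFC 5322 caps a line at 998 characters; used here as a simple per-header cap.
MAX_HEADER_LEN = 998
# One msg-id such as <abc123@example.com>; stricter than RFC 5322 on purpose.
_MSGID_RE = re.compile(r"<[^<>\s@]+@[^<>\s@]+>")


def clean_header(name, value):
    if not isinstance(value, str):
        raise ValueError(f"{name} must be a string")
    # CR/LF would let a caller inject extra headers (e.g. a hidden Bcc).
    if any(ch in value for ch in "\r\n\x00"):
        raise ValueError(f"{name} contains a control character")
    if len(value) > MAX_HEADER_LEN:
        raise ValueError(f"{name} is too long")
    return value


def clean_msgids(name, value):
    ids = clean_header(name, value).split()
    if not ids or not all(_MSGID_RE.fullmatch(i) for i in ids):
        raise ValueError(f"{name} must be one or more <id@host> values")
    return " ".join(ids)


def build_outbound(sender, to, cc, bcc, subject, body, in_reply_to=None, references=None):
    """Return (message, envelope_recipients); the message never carries Bcc."""
    msg = EmailMessage()
    msg["From"] = clean_header("From", sender)
    if to:
        msg["To"] = ", ".join(clean_header("To", a) for a in to)
    if cc:
        msg["Cc"] = ", ".join(clean_header("Cc", a) for a in cc)
    msg["Subject"] = clean_header("Subject", subject)
    if in_reply_to:
        msg["In-Reply-To"] = clean_msgids("In-Reply-To", in_reply_to)
    if references:
        msg["References"] = clean_msgids("References", references)
    msg.set_content(body)
    for addr in bcc:
        clean_header("Bcc", addr)
    # Bcc addresses live only in the envelope; dedupe while keeping order.
    addrs = [addr for _, addr in getaddresses([*to, *cc, *bcc]) if addr]
    return msg, list(dict.fromkeys(addrs))
```

The same msg object can then be both submitted (smtplib's send_message accepts an explicit to_addrs list, here the envelope) and appended to the user's folder via IMAPClient.append, and neither copy ever contained the Bcc recipients.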

Rollback: revert the PR. No state to undo.

Phase 3 — IMAP-shaped endpoint validation

One PR for the shared validator in _shared/helper.py. Then one PR per affected handler (six handlers, six PRs) so each is independently revertable and the rollout can pause mid-stream if a validator turns out to be too strict.

Rollback per handler: revert the handler PR; the validator stays in helper.py unused. Revert the validator PR last only if all six handler PRs are reverted.

Phase 4 — DNS-touching endpoint hardening

Single PR. Touches _shared/helper.py (validators) and the five DNS-touching handlers. Adds a _zone_cache module-level dict in helper.py for the runtime verification cache.
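A sketch of the name validator and the zone-ID verification backed by a module-level _zone_cache, assuming handlers pass a boto3 Route 53 client (whose get_hosted_zone response carries the zone apex, with a trailing dot, under HostedZone.Name). The function names are hypothetical, and permitting underscores in labels is an assumption about which records (for example _dmarc or a DKIM selector) these handlers write.

```python
import re

# Letters, digits, hyphen, plus underscore for names like _dmarc; no leading or
# trailing hyphen, at most 63 characters per label.
_LABEL_RE = re.compile(r"(?!-)[a-z0-9_-]{1,63}(?<!-)")

# Zone ID -> apex name; a module-level dict survives warm Lambda invocations.
_zone_cache = {}


def validate_dns_name(name):
    if not isinstance(name, str):
        raise ValueError("DNS name must be a string")
    name = name.strip().lower().rstrip(".")
    if not name or len(name) > 253:
        raise ValueError("DNS name is empty or too long")
    if not all(_LABEL_RE.fullmatch(label) for label in name.split(".")):
        raise ValueError(f"invalid DNS name: {name!r}")
    return name


def verify_zone(route53, zone_id, name):
    """Raise ValueError unless name lies inside the hosted zone zone_id."""
    apex = _zone_cache.get(zone_id)
    if apex is None:
        zone = route53.get_hosted_zone(Id=zone_id)["HostedZone"]
        apex = zone["Name"].rstrip(".").lower()
        _zone_cache[zone_id] = apex
    name = validate_dns_name(name)
    # Exact apex or a true subdomain; "evilexample.com" is not in "example.com".
    if name != apex and not name.endswith("." + apex):
        raise ValueError(f"{name} is not in hosted zone {zone_id}")
    return name
```

Caching per zone ID keeps the extra Route 53 call to one per cold start, which is why the cache can stay a plain dict with no expiry: hosted-zone apex names do not change for a given ID.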

Rollback: revert the PR. The zone-name verification is purely additive — pre-existing zones are not modified.

Phase 5 — Per-endpoint abuse limits

Smaller PR sequence:

  1. Audit-log structured emission (no enforcement, only logging). One PR.
  2. Rate-limit table + helper. One PR; adds cabal-rate-limits DynamoDB table in Terraform.
  3. Per-admin-mutation rate-limit gating. One PR per handler family (admin user-mgmt, admin domain-access). Two PRs.
  4. /list migration to query-against-GSI. Requires a one-time backfill (scan the table, then write the missing GSI keys); ship the GSI addition as a separate Terraform apply before the handler PR.
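Steps 2 and 3 could share a helper along these lines: a fixed-window counter in cabal-rate-limits, built on a single conditional DynamoDB UpdateItem so the limit check and the increment happen atomically. This is a sketch under assumptions: the key schema (partition key pk), the count and expires_at attributes (the latter as the table's TTL attribute), and the allow_request name are all hypothetical; table is assumed to be a boto3 DynamoDB Table resource.

```python
import time


def allow_request(table, caller, action, limit, window_seconds=60, now=None):
    """Fixed-window counter: True if this call fits within the current window."""
    now = int(time.time() if now is None else now)
    window_start = now - now % window_seconds
    try:
        table.update_item(
            Key={"pk": f"{caller}#{action}#{window_start}"},
            # ADD creates the counter on first use; the condition rejects the
            # increment once the window is full, in the same round trip.
            UpdateExpression="SET #exp = if_not_exists(#exp, :exp) ADD #c :one",
            ConditionExpression="attribute_not_exists(#c) OR #c < :limit",
            # "count" is a DynamoDB reserved word, hence the #c alias.
            ExpressionAttributeNames={"#c": "count", "#exp": "expires_at"},
            ExpressionAttributeValues={
                ":one": 1,
                ":limit": limit,
                ":exp": window_start + 2 * window_seconds,
            },
        )
    except Exception as exc:
        # botocore's ClientError exposes the service error code on .response.
        code = (getattr(exc, "response", None) or {}).get("Error", {}).get("Code")
        if code == "ConditionalCheckFailedException":
            return False
        raise
    return True
```

A fixed window is the simplest choice here but permits up to twice the limit across a window boundary; a token bucket is tighter at the cost of a read-modify-write per call. For low-volume admin mutations the fixed window is probably adequate.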

Rollback per PR: revert. The rate-limit table can stay (cheap, on-demand billing); the handlers stop reading from it.

Risks and trade-offs

CI changes

Acceptance

Open questions

Out of scope for 0.10.x