Notes — Edward Clark

2026-04-12 · ~7 min read

Why I self-host mail (and why you probably shouldn’t)

I run Postfix + Dovecot at home. It’s been running two years. I would not recommend it to anyone who values their weekend.

Three weekends was my original estimate. What it turned out to be:

Weekend 1: install Postfix + Dovecot, set up IMAP, route local mail. Works. Everything works.
Weekend 2: nothing works. Gmail marks everything as spam. DKIM signatures not aligned with the From domain. SPF record missing. Reverse DNS for my residential IP not pointing at the mail hostname. One ISP support ticket to set the PTR.
Weekend 3: DMARC policy, ARC signature headers for the couple of auto-forwarders, TLS via Let’s Encrypt, then iterating against mail-tester.com until the score stops being embarrassing.

The “why you probably shouldn’t” part is not technical complexity. It’s that email deliverability is governed by ad-hoc reputation scores maintained by inbox providers, and your residential IP can drift into “looks like a botnet” territory without warning. SES at AWS costs ~$0 for the volume most users generate, and the deliverability is someone else’s problem.

I still self-host because I wanted muscle memory for SMTP. The muscle has paid off twice: once diagnosing why a client’s transactional mail was landing in spam, once reading headers on a misrouted alert. Both would have taken a paid consultant if I hadn’t done this first.

Checklist I should have written first:

Static residential IP. Ask ISP for PTR record up front.
DKIM key pair + selector, publish in DNS, verify with mail-tester.com.
DMARC starting at p=none. Bump to quarantine after you see alignment in reports.
SPF: exactly one SPF TXT record for the domain, no more than 10 DNS lookups when it is evaluated.
ARC signer if you have any auto-forwarders (mailman, vacation responders).
TLS via Let’s Encrypt with auto-renew — mine has not expired once.
If you have another primary inbox, set it up as “send mail as,” not forwarder. Forwarders break SPF, and ARC only helps where the receiver trusts the forwarder.
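The 10-lookup ceiling on SPF is easy to blow past once a few `include:` terms pile up. Here is a minimal sketch of a checker, assuming the record string is already in hand (no DNS queries are made). The function name and the example record are made up for illustration. It counts only the terms in the top-level record; every `include:` or `redirect=` target adds its own lookups on top, so treat the result as a lower bound.

```python
import re

# Terms that each cost one DNS lookup during SPF evaluation (RFC 7208, section 4.6.4).
LOOKUP_MECHANISMS = {"include", "a", "mx", "ptr", "exists"}
LOOKUP_MODIFIERS = {"redirect"}
SPF_LOOKUP_LIMIT = 10


def count_spf_lookups(record: str) -> int:
    """Count lookup-costing terms in one SPF record (top level only)."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        raise ValueError("not an SPF record")
    count = 0
    for raw in terms[1:]:
        term = raw.lower().lstrip("+-~?")  # drop the qualifier, if any
        if "=" in term and term.split("=", 1)[0] in LOOKUP_MODIFIERS:
            count += 1
            continue
        name = re.split(r"[:/]", term, maxsplit=1)[0]
        if name in LOOKUP_MECHANISMS:
            count += 1
    return count


# Hypothetical record: mx, a and include cost a lookup each; ip4 and all cost none.
example = "v=spf1 mx a:mail.example.com include:_spf.example.net ip4:203.0.113.7 -all"
lookups = count_spf_lookups(example)
print(lookups, "of", SPF_LOOKUP_LIMIT, "lookups used at the top level")
```

A full check would resolve each `include:` target and recurse, which needs a DNS library and is left out here.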

2026-05-22 · ~5 min read

Making contact-form Lambda feel snappy

The contact-form Lambda’s latency histogram, before any tuning, was bimodal: median ~180ms (warm), p99 ~1800ms (cold). The shape is the giveaway. One cliff at the warm-start plateau, one at the cold-start spike.

What I changed, in order of how much each moved p99:

1. Bundle size

Removed reportlab, which was no longer used once resume.pdf generation moved server-side; I had kept the dependency by accident. An audit found two other heavy deps that were imported but never called. Deploy artifact went from ~12 MB to ~3 MB. Cold start dropped from ~1800ms to ~1100ms.

2. Lazy imports

Top-of-module imports are a habit. I moved the heavy libraries' imports into the handler, so they load only when a request actually needs them. Another ~150ms off the cold start.
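A minimal sketch of the lazy-import pattern, with assumptions stated up front: the `export` field and the CSV branch are hypothetical, and the stdlib `csv` module stands in for whatever heavy library the real function uses. The point is only the placement of the import.

```python
import json  # cheap and needed on every request, so it stays at module top


def handler(event, context):
    """Hypothetical contact-form handler with one rarely used branch."""
    body = json.loads(event.get("body") or "{}")

    if body.get("export") == "csv":
        # Deferred import: only requests that reach this branch pay for it.
        # Python caches the module after the first import, so later calls are cheap.
        import csv
        import io

        buf = io.StringIO()
        csv.writer(buf).writerow([body.get("name", ""), body.get("message", "")])
        return {"statusCode": 200, "body": buf.getvalue()}

    return {"statusCode": 200, "body": json.dumps({"ok": True})}
```

The trade-off is that the first request hitting the deferred branch pays the import cost instead of the init phase, so this helps most when the heavy path is uncommon.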

3. Provisioned concurrency

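Set reserved concurrency to 2 and provisioned concurrency to 1, so one instance stays warm. (Reserved concurrency only caps how many instances can run; provisioned concurrency is what keeps one initialized.) Cold-start cliff disappears from the histogram entirely. Cost increase: ~$7/mo at low traffic. For my contact form (~2–4 messages/day, peaks during hiring season) it pencils out.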
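For reference, the two settings are separate API calls on the Lambda client. The sketch below assumes boto3, a function named `contact-form`, and an alias named `live`; all three are assumptions, not details from the setup above. Provisioned concurrency has to target a published version or alias rather than `$LATEST`, and it cannot exceed the function's reserved concurrency.

```python
def concurrency_requests(function_name, alias, reserved=2, provisioned=1):
    """Build keyword arguments for the two Lambda concurrency calls."""
    if provisioned > reserved:
        raise ValueError("provisioned concurrency cannot exceed reserved concurrency")
    reserve = {
        "FunctionName": function_name,
        "ReservedConcurrentExecutions": reserved,
    }
    provision = {
        "FunctionName": function_name,
        "Qualifier": alias,  # a published version or alias, not $LATEST
        "ProvisionedConcurrentExecutions": provisioned,
    }
    return reserve, provision


def apply_concurrency(client, function_name, alias, reserved=2, provisioned=1):
    """Apply both settings; `client` is expected to be boto3.client("lambda")."""
    reserve, provision = concurrency_requests(function_name, alias, reserved, provisioned)
    client.put_function_concurrency(**reserve)
    client.put_provisioned_concurrency_config(**provision)
    return reserve, provision
```

Usage would look like `apply_concurrency(boto3.client("lambda"), "contact-form", "live")`, run from a deploy script with credentials that allow both calls.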

Always more levers than there is time. What I have not done yet: SnapStart for harder cold-start workloads (should, eventually). Step Functions for multi-step warm-up. Warming multiple Lambda aliases on deploy.

Lesson, if there is one: measure the histogram, not the average. Average latency on this Lambda looked fine. The p99 was a recruiter refreshing a contact form and waiting two seconds before assuming something was broken.
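To make the average-versus-tail point concrete, here is a small sketch on synthetic data. The 180ms and 1800ms values come from the numbers above; the 97/3 split between warm and cold requests is invented for illustration. The p99 uses the nearest-rank method.

```python
import statistics


def summarize(latencies_ms):
    """Return mean, median and nearest-rank p99 for a list of latencies."""
    ordered = sorted(latencies_ms)
    n = len(ordered)
    rank = (99 * n + 99) // 100  # ceil(0.99 * n) in integer arithmetic
    return {
        "mean": statistics.mean(ordered),
        "median": statistics.median(ordered),
        "p99": ordered[rank - 1],
    }


# Synthetic sample: 97 warm requests and 3 cold starts (the split is an assumption).
sample = [180] * 97 + [1800] * 3
stats = summarize(sample)
print(stats)
```

On this sample the mean lands at 228.6ms, close enough to the warm median to look healthy, while the p99 sits at the full 1800ms cold start.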

2026-06-30 · ~4 min read

An on-call runbook I actually wrote

The gas-station POS goes down once every few months. The owner calls me, because I happen to be the person nearby who understands payment networks and Linux at the same time.

The first outage took 90 minutes and required a drive to the station. The fact that I had fixed it before did not help — different root cause, different fix.

So I wrote a runbook. Two pages, plain Markdown, indexed by symptom:

“Card reader won’t connect” → check WiFi AP / restart radio / power-cycle reader / verify static IP / SOS mode if all else fails.
“Payment processor slow” → ping gateway / ISP speedtest / fall back to backup terminal / notify customers / queue for batch at end of shift.
“Receipt printer out of paper” → refill. But FIRST check that what fell out isn’t actually the cutter housing. It happens.

Each step lists a completion-time estimate and “if this didn’t work, try the next thing.” Time-to-diagnose with runbook in hand: 4 minutes. Without: still 90.
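The same structure, symptom first, then ordered steps with a time estimate each, can be sketched as data. The steps below are the card-reader ones listed above; the minute values are placeholders rather than the real estimates, and the class and function names are made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str
    est_minutes: int  # placeholder values, not the real runbook's estimates


RUNBOOK = {
    "Card reader won't connect": [
        Step("check WiFi AP", 1),
        Step("restart radio", 2),
        Step("power-cycle reader", 2),
        Step("verify static IP", 3),
        Step("SOS mode", 5),
    ],
}


def next_step(symptom, steps_already_tried):
    """Return the next step for a symptom, or None when the list is exhausted."""
    steps = RUNBOOK[symptom]
    if steps_already_tried < len(steps):
        return steps[steps_already_tried]
    return None


def worst_case_minutes(symptom):
    """Total time if every step has to be tried."""
    return sum(step.est_minutes for step in RUNBOOK[symptom])
```

Keeping the steps in a fixed order is what gives “if this didn’t work, try the next thing” a definite meaning.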

The lesson I keep relearning: runbooks are not for the person who just wrote them. They’re for that same person six months later, who has forgotten what they did. Or the person who has never seen this system before and has forty-five seconds to decide what to try next.

A runbook is the cheapest documentation investment a working engineer makes — twenty minutes to write, lifetime of saved time. The reason most teams don’t have them is the same reason most teams have bad testing: writing the runbook is coding-adjacent work, and adjacent work is what gets cut first when the calendar gets tight.