Scheduled: PostgreSQL minor version upgrade
Brief send-tier read-only window for ~3 minutes during the rolling restart. Read API unaffected.
Tue · 13 May 03:00–03:15 CET
Resolved
DE-FAL send pool brief throttle (45 min)
23 Mar 2026 · 14:18–15:03 CET
A receiver-side rate-limit cascade originating at a single ISP affected ~3% of DE-FAL outbound for 45 minutes. Throughput automatically rerouted to DE-NUE-A1 within 90 seconds; no sends were lost, only delayed by an average of 2 minutes. Root cause: the ISP's MX TLS handshake blocked our 2 oldest IPs in the FAL pool after an SPF-record edit on the ISP's side. We added pre-emptive monitoring for receiver-side TLS regressions.
14:18 CET · Page: 8% of DE-FAL outbound holding in queue
14:21 CET · Identified: single receiver ISP rejecting 2 IPs in the pool
14:24 CET · Auto-reroute to DE-NUE-A1 enabled, queue draining
15:03 CET · Resolved: queue drained, DE-FAL pool back to 100% throughput
Scheduled
Storage migration to EU-hosted StorageBox v2
17 Feb 2026 · 02:00–04:00 CET
Pre-announced 14 days in advance. Object store migrated from legacy NFS to EU-hosted StorageBox v2 + S3-compatible Garage. Zero customer-visible downtime; admin uploads paused for ~6 minutes during the cutover. Reads were served from the legacy store throughout. No data loss, no integrity warnings.
Resolved
BLUN AI gateway · subject-line endpoint slow (12 min)
9 Feb 2026 · 11:32–11:44 CET
Subject-line suggestion latency (p50) rose from 800 ms to 6.4 s for ~12 minutes due to a cache eviction storm on our embedding cache after a model deployment. No campaigns were affected: the block editor fell back gracefully to "no suggestions", and users were able to send normally. We added a pre-deploy cache warmer to prevent recurrence.
Resolved
Audit log · delayed write to off-site replica (3h 22m)
28 Jan 2026 · 09:14–12:36 CET
Audit-log entries continued to write to the primary in DE-NUE during the incident, but the FAL replica fell behind by up to 3 hours due to a partitioned BGP path. No customer access went unlogged; the replica caught up at 12:36 with full continuity. Incident classified as a data-durability-only degradation; we count it against the SLA, but it did not affect availability.