Zhivko Todorov

CASE 152 · ALMANAC · 2024

SNS · DLQ · DELIVERY RETRY · OBSERVABILITY

SNS deliveries that don’t silently vanish.

A notification platform fanned messages out through SNS to dozens of downstream subscribers. When a subscriber endpoint failed, SNS would retry briefly and then drop the message. The platform’s customers were quietly losing notifications. We added dead-letter queues (DLQs) and proper delivery monitoring across every topic.

INDUSTRY

Notification platform

DOMAIN

RELIABILITY

DELIVERED

2024

STACK

AMAZON SNS · SNS DLQ · SQS · CLOUDWATCH METRICS · EVENTBRIDGE

RESULTS

What changed, by the numbers.

SILENT DROPS

−100%

DLQ-CAUGHT NOW

TOPICS WITH DLQ

47 / 47

FULL COVERAGE

AVG. RETRY EXHAUSTION

CAUGHT

WAS SILENT

CUSTOMER COMPLAINTS

−84%

MISSING-NOTIFICATION TICKETS

HOW IT WENT

The retry-and-drop pattern was SNS’s default. The team had been operating it for years without realising that "drop" really meant gone — no log, no alert, no record. Customer support tickets about missing notifications were the only signal, and they were assumed to be customer-side issues.

We attached SQS DLQs to every SNS subscription, set delivery-retry policies appropriate to each subscriber type, and routed DLQ depth alarms through EventBridge. Failed deliveries now land in a queue that can be queried and can trigger alerts, instead of disappearing.
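As a rough illustration of that step, the sketch below attaches a DLQ and a retry policy to one subscription through the SNS `set_subscription_attributes` call, using the `RedrivePolicy` and `DeliveryPolicy` attributes. The subscriber types, retry numbers, and function names are hypothetical and are not the values used in this engagement. Note that a custom `DeliveryPolicy` only applies to HTTP/S subscriptions.

```python
import json

# Hypothetical retry profiles per subscriber type. The numbers are
# illustrative assumptions, not the policies chosen in the case study.
RETRY_PROFILES = {
    "webhook": {"numRetries": 10, "minDelayTarget": 5, "maxDelayTarget": 300},
    "internal": {"numRetries": 5, "minDelayTarget": 1, "maxDelayTarget": 60},
}


def build_delivery_policy(subscriber_type):
    """Return an SNS DeliveryPolicy JSON string (HTTP/S subscriptions only)."""
    profile = RETRY_PROFILES[subscriber_type]
    policy = {"healthyRetryPolicy": {**profile, "backoffFunction": "exponential"}}
    return json.dumps(policy)


def attach_dlq(sns_client, subscription_arn, dlq_arn, subscriber_type=None):
    """Point a subscription at an SQS DLQ and optionally set its retry policy.

    sns_client is expected to behave like boto3.client("sns").
    """
    # RedrivePolicy tells SNS where to send messages once retries are exhausted.
    sns_client.set_subscription_attributes(
        SubscriptionArn=subscription_arn,
        AttributeName="RedrivePolicy",
        AttributeValue=json.dumps({"deadLetterTargetArn": dlq_arn}),
    )
    if subscriber_type is not None:
        sns_client.set_subscription_attributes(
            SubscriptionArn=subscription_arn,
            AttributeName="DeliveryPolicy",
            AttributeValue=build_delivery_policy(subscriber_type),
        )
```

With boto3 installed and credentials configured, this could be called as `attach_dlq(boto3.client("sns"), subscription_arn, dlq_arn, "webhook")`. The DLQ's queue policy must also allow the SNS service principal to send messages to it, otherwise failed deliveries still cannot be written to the queue.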

Silent drops fell to zero: every failed delivery is now caught by a DLQ and either retried programmatically or escalated to a page. Customer complaints about missing notifications dropped 84% once the team could reliably reproduce and resolve the remaining edge cases.
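The paging path can be sketched as a CloudWatch alarm on DLQ depth plus an EventBridge rule that matches the alarm's state change. The helper names, alarm naming scheme, threshold, and period below are assumptions for illustration. The returned values are meant to be passed to CloudWatch `put_metric_alarm` and EventBridge `put_rule` respectively.

```python
import json


def dlq_depth_alarm(queue_name, threshold=0):
    """Keyword arguments for CloudWatch put_metric_alarm on a DLQ's depth."""
    return {
        "AlarmName": f"{queue_name}-depth",  # hypothetical naming scheme
        "Namespace": "AWS/SQS",
        "MetricName": "ApproximateNumberOfMessagesVisible",
        "Dimensions": [{"Name": "QueueName", "Value": queue_name}],
        "Statistic": "Maximum",
        "Period": 60,
        "EvaluationPeriods": 1,
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        # An empty DLQ may report no datapoints; treat that as healthy.
        "TreatMissingData": "notBreaching",
    }


def alarm_event_pattern(alarm_names):
    """EventBridge event pattern matching the given alarms entering ALARM."""
    return json.dumps({
        "source": ["aws.cloudwatch"],
        "detail-type": ["CloudWatch Alarm State Change"],
        "detail": {
            "alarmName": list(alarm_names),
            "state": {"value": ["ALARM"]},
        },
    })
```

A threshold of zero means any message in the DLQ raises the alarm, which suits a setup where a single dropped notification matters. The EventBridge rule's target (a pager integration, a redrive function, or both) is left out here because the case study does not specify it.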
