Zhivko Todorov

CASE 148 · VORTEX · 2025

EVENTBRIDGE·EVENT REPLAY·ARCHIVE·INCIDENT RECOVERY

Events you can replay, even from a Tuesday last quarter.

An e-commerce platform had a Lambda consumer behind EventBridge that had been silently throwing errors on a malformed event for six hours during a deploy. The downstream order-update emails for that window were never sent. We turned on EventBridge Archive and Replay so that the next such incident would be recoverable.

INDUSTRY

E-commerce

DOMAIN

RELIABILITY

DELIVERED

2025

STACK

EVENTBRIDGE·EVENTBRIDGE ARCHIVE·EVENTBRIDGE REPLAY·CLOUDWATCH METRICS·DLQ

RESULTS

What changed, by the numbers.

EVENT REPLAY WINDOW

90d

ARCHIVED

INCIDENT RECOVERY TIME

< 30m

PER REPLAY

CUSTOMER NOTIFICATION

POSSIBLE

WAS NOT

ARCHIVE COST

$140 / MO

AT VOLUME

HOW IT WENT

The incident itself was minor: the Lambda fix was a one-line config change. The painful part was the six hours of order updates customers didn’t receive. The team had no archive of the events, no way to replay them, and no way to make customers whole beyond a generic apology.

An EventBridge Archive on the event bus captured a copy of every event arriving on the bus, regardless of whether any consumer processed it, with 90 days of retention. Replay let us re-send any archived time window back to that bus, optionally limited to specific rules: useful for re-processing after a fix, or for back-filling a newly added consumer with historical data.

The next similar incident took 30 minutes to recover from end-to-end: detect the issue, fix the bug, redeploy the consumer, replay the affected time window. Customers got their delayed emails. The archive cost is $140/month, which the CEO described as the best insurance the company buys.
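The final "replay the affected time window" step can be sketched as below. This is an illustration under stated assumptions, not the procedure used on this project: the 5-minute padding and the replay-name format are hypothetical choices, and `wait_for_replay` takes the EventBridge client as a parameter so it can be exercised with a stand-in object. It relies only on `describe_replay`, whose `State` field ends in `COMPLETED`, `CANCELLED`, or `FAILED`.

```python
import re
import time
from datetime import datetime, timedelta

# Replay states after which describe_replay will not change any more.
TERMINAL_STATES = {"COMPLETED", "CANCELLED", "FAILED"}


def replay_window(first_error: datetime, fixed_at: datetime,
                  pad: timedelta = timedelta(minutes=5)) -> tuple[datetime, datetime]:
    """Widen the incident window slightly so events at the edges are not missed.

    The 5-minute pad is an assumption; it re-sends a few events that were
    already handled, which is only safe if the consumer is idempotent.
    """
    return first_error - pad, fixed_at + pad


def replay_name(prefix: str, start: datetime) -> str:
    """Build a replay name using only letters, digits, '.', '-' and '_' (max 64 chars)."""
    raw = f"{prefix}-{start:%Y%m%dT%H%M}"
    return re.sub(r"[^A-Za-z0-9._-]", "-", raw)[:64]


def wait_for_replay(client, name: str, poll_seconds: float = 30,
                    sleep=time.sleep) -> str:
    """Poll describe_replay until the replay reaches a terminal state."""
    while True:
        state = client.describe_replay(ReplayName=name)["State"]
        if state in TERMINAL_STATES:
            return state
        sleep(poll_seconds)
```

In a real run, the window from `replay_window` would feed `events.start_replay(...)` under the name from `replay_name`, followed by `wait_for_replay(events, name)`. Anything other than `COMPLETED` means the window still needs attention.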

READY WHEN YOU ARE

Let's get your AWS bill (and architecture) in order.

The discovery call is free. You walk away with at least one concrete idea — even if we never work together.
