Zhivko Todorov

CASE 147 · UMBER · 2023

SQS · VISIBILITY TIMEOUT · DLQ · TUNING

SQS messages processed once, not three times.

An identity verification platform had an SQS queue feeding a Lambda consumer, with the queue’s visibility timeout set equal to the Lambda’s 30-second timeout. About 3% of messages were processed two or three times: when processing occasionally outlasted the 30-second visibility window, SQS made the message visible again before it had been deleted, and it was delivered a second time. We tuned the visibility timeout and added idempotency together.

INDUSTRY

Identity verification

DOMAIN

RELIABILITY

DELIVERED

2023

STACK

AWS SQS · LAMBDA · DYNAMODB (IDEMPOTENCY) · CLOUDWATCH METRICS · SQS DLQ

RESULTS

What changed, by the numbers.

DUPLICATE PROCESSING

−99%

IDEMPOTENT NOW

VISIBILITY TIMEOUT

6× p99

TUNED FROM 1×

DLQ ATTEMPTS

5

BEFORE PERMANENT FAILURE

COMPLIANCE INCIDENTS

CLEARED

KYC DUPLICATE-CHARGE

HOW IT WENT

The compliance trigger was the worst kind: a customer was charged twice for a single identity verification because the underlying SQS message had been processed twice and the consumer was not idempotent. The root cause was a routine slow path in the Lambda that ran past the 30-second visibility timeout.

We set the visibility timeout to six times the Lambda’s p99 duration (a common rule of thumb) and added idempotency via DynamoDB conditional writes keyed on the message ID. We also raised the DLQ retry policy from "fail after three" to "fail after five": most of the messages that had previously failed after three attempts succeeded on the fourth.
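The queue-side settings described above can be sketched as a small helper that builds the SQS queue attributes. This is an illustration under stated assumptions, not the configuration actually deployed: the function name `queue_attributes`, the 12.5-second p99 and the DLQ ARN are all hypothetical. `VisibilityTimeout` and `RedrivePolicy` (with `deadLetterTargetArn` and `maxReceiveCount`) are the standard SQS attribute names.

```python
import json
import math

# SQS does not accept a visibility timeout above 12 hours.
SQS_MAX_VISIBILITY_TIMEOUT = 43_200


def queue_attributes(p99_seconds, dlq_arn, multiplier=6, max_receive_count=5):
    """Build SQS queue attributes from a measured p99 handler duration.

    Hypothetical helper: the 6x multiplier and the receive count of 5
    mirror the figures in the case study; everything else is assumed.
    """
    if p99_seconds <= 0:
        raise ValueError("p99_seconds must be positive")
    # Round up to whole seconds and clamp to the SQS maximum.
    timeout = min(math.ceil(p99_seconds * multiplier), SQS_MAX_VISIBILITY_TIMEOUT)
    return {
        "VisibilityTimeout": str(timeout),
        "RedrivePolicy": json.dumps(
            {"deadLetterTargetArn": dlq_arn, "maxReceiveCount": max_receive_count}
        ),
    }


# Example with an assumed p99 of 12.5 s and a placeholder DLQ ARN.
attrs = queue_attributes(12.5, "arn:aws:sqs:eu-west-1:123456789012:example-dlq")
print(attrs["VisibilityTimeout"])  # 75
```

A dictionary in this shape could be passed as `Attributes` to the SQS `SetQueueAttributes` call. The p99 itself would come from the function's duration metric in CloudWatch.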

Duplicate processing dropped 99%. The rare redeliveries that remain hit the idempotency check and short-circuit before any work is repeated. Compliance cleared the KYC duplicate-charge finding. The team runs the same Lambda code, just with better queue semantics around it.
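The short-circuit can be illustrated with a minimal sketch of the claim-then-process pattern. It makes several assumptions: an in-memory class stands in for the DynamoDB table so the example runs anywhere, and the names `InMemoryIdempotencyStore`, `DuplicateMessage` and `handle_record` are hypothetical. With a real table, the claim would be a `put_item` carrying a condition expression such as `attribute_not_exists(message_id)`, which fails with `ConditionalCheckFailedException` when the key already exists.

```python
class DuplicateMessage(Exception):
    """Raised when a message ID has already been claimed."""


class InMemoryIdempotencyStore:
    """Stand-in for the DynamoDB table (assumption, for illustration only)."""

    def __init__(self):
        self._claimed = set()

    def claim(self, message_id):
        # Mirrors a conditional write: succeed only if the key is absent.
        if message_id in self._claimed:
            raise DuplicateMessage(message_id)
        self._claimed.add(message_id)

    def release(self, message_id):
        # Undo a claim so a later retry is allowed to process the message.
        self._claimed.discard(message_id)


def handle_record(record, store, process):
    """Process one SQS record at most once per message ID."""
    message_id = record["messageId"]
    try:
        store.claim(message_id)
    except DuplicateMessage:
        return "skipped"  # redelivery: short-circuit, no side effects
    try:
        process(record["body"])
    except Exception:
        store.release(message_id)  # let the retry run the work again
        raise
    return "processed"


store = InMemoryIdempotencyStore()
charges = []
record = {"messageId": "m-1", "body": "verify:42"}
print(handle_record(record, store, charges.append))  # processed
print(handle_record(record, store, charges.append))  # skipped
```

Releasing the claim on failure keeps retries working when the handler raises. It does not cover an invocation that is killed mid-flight, so a production table would probably also need an expiry or status field on each claim.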
