Zhivko Todorov
CASE 156 · EDDY · 2025

WEBSOCKET · API GATEWAY · DYNAMODB · SCALING

Half a million open connections, no surprises.

A social platform’s chat feature ran on a self-managed WebSocket gateway that fell over once it crossed 80k concurrent connections. The team had been scaling vertically (bigger instances) and praying. We rebuilt it on API Gateway WebSocket API with DynamoDB-backed connection state.

INDUSTRY

Social platform

DOMAIN

RELIABILITY

DELIVERED

2025

STACK

API GATEWAY WEBSOCKET · AWS LAMBDA · DYNAMODB · SQS (BROADCAST) · CLOUDWATCH METRICS

RESULTS

What changed, by the numbers.

CONCURRENT CONNECTIONS

520K

PEAK MEASURED

CONNECTION-DROP RATE

< 0.05%

PER HOUR

OPERATIONAL HOURS

−94%

AWS-MANAGED

p99 MESSAGE LATENCY

< 90ms

STEADY STATE

HOW IT WENT

The self-managed gateway had been a clever piece of engineering when the platform was small. As the platform grew, its operational model did not keep up: connection state lived in-process, broadcasts relied on cross-instance messaging the team had built by hand, and fault tolerance amounted to "if the box dies, all clients reconnect at once."

API Gateway WebSocket API moved connection management to a managed service. Connection state moved to DynamoDB so any Lambda invocation could look up the connection IDs it needed to reach. Broadcasts fanned out through SQS, so the broadcaster’s latency did not grow with subscriber count.
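The pattern described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that shipped: the table layout (a `connection_id` key with `room_id` and `expires_at` attributes), the function names, the 500-ID chunk size, and the two-hour TTL are all hypothetical. The table, queue, and API client are passed in as arguments, so in a real deployment they would be the boto3 DynamoDB `Table`, SQS `Queue`, and `apigatewaymanagementapi` client objects, while tests can substitute simple fakes.

```python
import json
import time

SQS_BATCH_LIMIT = 10   # SQS accepts at most 10 entries per batch send
IDS_PER_MESSAGE = 500  # hypothetical: connection IDs handled per queued message


def on_connect(event, table, ttl_seconds=7200):
    """$connect route: record the connection so any later invocation can find it."""
    connection_id = event["requestContext"]["connectionId"]
    params = event.get("queryStringParameters") or {}
    table.put_item(Item={
        "connection_id": connection_id,
        "room_id": params.get("room", "lobby"),         # hypothetical attribute
        "expires_at": int(time.time()) + ttl_seconds,   # assumed TTL attribute
    })
    return {"statusCode": 200}


def on_disconnect(event, table):
    """$disconnect route: drop the stored connection record."""
    connection_id = event["requestContext"]["connectionId"]
    table.delete_item(Key={"connection_id": connection_id})
    return {"statusCode": 200}


def chunked(items, size):
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def enqueue_broadcast(queue, connection_ids, payload):
    """Split one broadcast into queued messages; return how many were queued.

    The caller only pays for enqueueing, so its latency depends on the number
    of chunks rather than on how long delivery to every subscriber takes.
    """
    entries = [
        {"Id": str(i),
         "MessageBody": json.dumps({"connection_ids": ids, "payload": payload})}
        for i, ids in enumerate(chunked(list(connection_ids), IDS_PER_MESSAGE))
    ]
    for batch in chunked(entries, SQS_BATCH_LIMIT):
        queue.send_messages(Entries=batch)  # a real caller should inspect "Failed"
    return len(entries)


def deliver(message_body, api_client, table):
    """Queue consumer: push the payload to each connection, pruning stale ones."""
    message = json.loads(message_body)
    data = json.dumps(message["payload"]).encode("utf-8")
    delivered = 0
    for connection_id in message["connection_ids"]:
        try:
            api_client.post_to_connection(ConnectionId=connection_id, Data=data)
            delivered += 1
        except api_client.exceptions.GoneException:
            # The client has already disconnected; remove the stale record.
            table.delete_item(Key={"connection_id": connection_id})
    return delivered
```

With these assumed sizes, a broadcast to 5,200 connections becomes 11 queued messages sent in two batch calls, and each consumer invocation handles at most 500 deliveries. A production version would also need to handle partial batch failures and retries, which this sketch leaves out.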

Peak concurrent connections hit 520k during a viral event without triggering a single alert. Connection-drop rate stayed under 0.05% per hour. Operational hours dropped 94%, and there is no longer a "WebSocket gateway on-call" rotation. p99 message latency stays under 90ms in steady state.
