CASE 56 · TEMPO · 2025
Self-hosted Kafka, retired without losing a partition.
A real-time bidding platform ran 28 self-hosted Kafka brokers across three AZs, with a team that had become reluctant ZooKeeper experts. We migrated to MSK Serverless for the variable workloads and MSK Provisioned for the steady ones, retired the EC2 cluster, and reclaimed the team’s attention.
Real-time bidding
MIGRATION
2025
RESULTS
What changed, by the numbers.
OPERATIONAL HOURS / WEEK
−92%
BROKER FAILURES
0
COST
−18%
CUTOVER WINDOW
ROLLING
HOW IT WENT
The team had been operating Kafka well — five 9s of partition availability over the previous year — but at significant cost. Brokers needed patching, ZooKeeper needed coddling, and the runbook was 47 pages long. Two engineers had become full-time Kafka operators by accident.
We provisioned MSK alongside the existing cluster and used MirrorMaker 2 to replicate every topic. After two weeks of mirroring with verified consumer-group offsets, we cut producers over topic-by-topic. Consumers followed when their offset position was confirmed in MSK.
No partitions lost. No data lost. The two reluctant Kafka operators went back to platform engineering work — one of them now leads the team. The 47-page runbook got archived. The new runbook is two pages.
RELATED · SAME DOMAIN
Other engagements in this space.
READY WHEN YOU ARE
Let's get your AWS bill (and architecture) in order.
The discovery call is free. You walk away with at least one concrete idea — even if we never work together.