CASE 155 · DAYBREAK · 2024
Bad deploys that roll themselves back.
A SaaS commerce platform had a deploy procedure where a human watched dashboards for ten minutes after each deploy to decide if it had gone well. About once a month they missed something subtle and customers noticed. We enabled the ECS deployment circuit breaker with appropriate health alarms.
SaaS commerce
RELIABILITY
2024
RESULTS
What changed, by the numbers.
BAD-DEPLOY DETECTION
< 2m
AUTO-ROLLBACKS / MONTH
~3
HUMAN POST-DEPLOY WATCH
OPTIONAL
INCIDENT-CAUSED-BY-DEPLOY
−81%
HOW IT WENT
The ten-minute watch was a real cost — engineers stopped what they were doing to babysit the dashboards. It was also unreliable; subtle regressions snuck past tired eyes. Some deploys got rolled back at the next customer report instead of the next minute.
The deployment circuit breaker rolls a deploy back automatically if specified CloudWatch alarms breach during the deploy window. We wired up alarms on error rate, p99 latency, and synthetic-canary health. A breach during deploy triggers immediate rollback to the previous task definition.
Bad-deploy detection landed inside two minutes. Auto-rollbacks happened roughly three times per month — each one a preventable incident the team didn’t have to investigate. The human watch became optional; most deploys go in unwatched, with the breaker as the safety net.
RELATED · SAME DOMAIN
Other engagements in this space.
READY WHEN YOU ARE
Let's get your AWS bill (and architecture) in order.
The discovery call is free. You walk away with at least one concrete idea — even if we never work together.