Zhivko Todorov
ALL CASE STUDIES

CASE 155 · DAYBREAK · 2024

ECSDEPLOYMENT CIRCUIT BREAKERCANARYAUTO-ROLLBACK

Bad deploys that roll themselves back.

A SaaS commerce platform had a deploy procedure where a human watched dashboards for ten minutes after each deploy to decide if it had gone well. About once a month they missed something subtle and customers noticed. We enabled the ECS deployment circuit breaker with appropriate health alarms.

INDUSTRY

SaaS commerce

DOMAIN

RELIABILITY

DELIVERED

2024

STACK

ECS FARGATE·ECS DEPLOYMENT CIRCUIT BREAKER·CLOUDWATCH ALARMS·CODEDEPLOY·X-RAY

RESULTS

What changed, by the numbers.

BAD-DEPLOY DETECTION

< 2m

WAS UP TO 10

AUTO-ROLLBACKS / MONTH

~3

PREVENTED INCIDENTS

HUMAN POST-DEPLOY WATCH

OPTIONAL

WAS REQUIRED

INCIDENT-CAUSED-BY-DEPLOY

−81%

YEAR-OVER-YEAR

HOW IT WENT

The ten-minute watch was a real cost — engineers stopped what they were doing to babysit the dashboards. It was also unreliable; subtle regressions snuck past tired eyes. Some deploys got rolled back at the next customer report instead of the next minute.

The deployment circuit breaker rolls a deploy back automatically if specified CloudWatch alarms breach during the deploy window. We wired up alarms on error rate, p99 latency, and synthetic-canary health. A breach during deploy triggers immediate rollback to the previous task definition.

Bad-deploy detection landed inside two minutes. Auto-rollbacks happened roughly three times per month — each one a preventable incident the team didn’t have to investigate. The human watch became optional; most deploys go in unwatched, with the breaker as the safety net.

READY WHEN YOU ARE

Let's get your AWS bill (and architecture) in order.

The discovery call is free. You walk away with at least one concrete idea — even if we never work together.

Or email directly →