CASE 66 · QUARRY · 2023
On-call runbooks that the next person on rotation can actually use.
A B2B SaaS platform had eighteen services, eighteen different on-call rotations, and eighteen different runbook formats — most of them outdated or missing. New rotation members spent their first quarter in survival mode. We standardised the runbook format and the on-call onboarding.
B2B SaaS
RELIABILITY
2023
RESULTS
What changed, by the numbers.
NEW ROTATION RAMP-UP
< 2w
RUNBOOK COVERAGE
100%
ESCALATIONS / WEEK
−54%
POSTMORTEM "RUNBOOK MISSING"
0
HOW IT WENT
The first on-call shift for a new engineer was always rough. Some services had wiki pages from 2020. Some had Notion docs nobody could find. Some just had "ask the team lead." Postmortems regularly cited "runbook was missing/outdated/wrong" as a contributing factor.
We built a Backstage template for service runbooks with required sections: the four most common alerts, their causes, their first-step mitigations, and the relevant dashboards. Each service owner ran a 90-minute workshop to fill in the template for their service. CloudWatch dashboards were embedded by URL.
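To make "required sections" concrete, here is a minimal sketch of a completeness check that could run in CI against each service's runbook. It is an illustration only, not the template we shipped: the dictionary layout and the field names (`alerts`, `name`, `cause`, `first_step`, `dashboard_url`) are hypothetical, and the runbook is assumed to have already been parsed into a plain Python dict.

```python
# Hypothetical field names. The engagement only specified that each alert
# entry records its cause, its first-step mitigation, and a dashboard.
REQUIRED_ALERT_FIELDS = ("name", "cause", "first_step", "dashboard_url")
REQUIRED_ALERT_COUNT = 4  # "the four most common alerts"


def validate_runbook(runbook: dict) -> list[str]:
    """Return a list of problems found; an empty list means the runbook passes."""
    problems = []
    alerts = runbook.get("alerts", [])

    # The template asks for the four most common alerts.
    if len(alerts) < REQUIRED_ALERT_COUNT:
        problems.append(
            f"expected at least {REQUIRED_ALERT_COUNT} alerts, found {len(alerts)}"
        )

    # Every alert entry must have each required field filled in.
    for index, alert in enumerate(alerts):
        label = alert.get("name") or f"alert #{index + 1}"
        for key in REQUIRED_ALERT_FIELDS:
            value = alert.get(key)
            if not isinstance(value, str) or not value.strip():
                problems.append(f"{label}: missing '{key}'")

    return problems
```

A check like this fails the build when a runbook lists fewer than four alerts or leaves a field blank, which is one way to keep coverage at 100% after the initial workshops.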
New rotation members now reach competence inside two weeks instead of two months. Escalations from junior on-call to senior dropped 54%, because most alerts now come with an actionable runbook from the first page. The "runbook missing" line item is gone from postmortems.
READY WHEN YOU ARE
Let's get your AWS bill (and architecture) in order.
The discovery call is free. You walk away with at least one concrete idea — even if we never work together.