CASE 173 · VERNAL · 2023
The platform team’s SLA, made measurable.
A B2B SaaS platform team had an "internal SLA" with its application-team customers, covering uptime for shared services such as the CI cluster, the artifact registry, and the secrets store. The SLA was claimed but never measured. We built the measurement and an internal-public dashboard.
B2B SaaS · PLATFORM · 2023
RESULTS
What changed, by the numbers.
SHARED SERVICES MEASURED: 14
SLA TRANSPARENCY: INTERNAL-PUBLIC
SLA-VIOLATION RESPONSE: < 1h
TRUST IN PLATFORM: +22 NPS
HOW IT WENT
The platform team had been frustrated that application teams routinely under-trusted them. "Is CI down?" was a recurring Slack question even when it wasn’t. The honest answer was that nobody, the platform team included, had visibility into whether CI was up or down at any given moment.
CloudWatch Synthetics ran scripted health checks against the 14 shared services. The Grafana dashboard was internal-public: every engineer could see the current state of every platform service. PagerDuty routed alarms from those checks to the platform team’s on-call rotation.
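To make "measured" concrete, here is a minimal sketch of how health-check results could be turned into a per-service availability figure and compared against an SLA target. The case study does not publish the targets or the check format, so everything named here is an assumption for illustration: the `CheckResult` type, the service names, the sample data, and the 99.9% target. It is not the team's actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CheckResult:
    """One scripted health-check run against a shared service (hypothetical shape)."""
    service: str
    passed: bool


def availability(results: list[CheckResult]) -> dict[str, float]:
    """Return the fraction of passing checks for each service."""
    totals: dict[str, int] = {}
    passes: dict[str, int] = {}
    for r in results:
        totals[r.service] = totals.get(r.service, 0) + 1
        passes[r.service] = passes.get(r.service, 0) + (1 if r.passed else 0)
    return {svc: passes[svc] / totals[svc] for svc in totals}


def sla_breaches(avail: dict[str, float], target: float) -> list[str]:
    """List the services whose measured availability is below the target."""
    return sorted(svc for svc, a in avail.items() if a < target)


# Hypothetical sample: 1000 CI checks with 2 failures, 1000 registry checks with none.
sample = (
    [CheckResult("ci-cluster", i >= 2) for i in range(1000)]
    + [CheckResult("artifact-registry", True) for _ in range(1000)]
)

measured = availability(sample)
breached = sla_breaches(measured, target=0.999)  # 99.9% is an assumed target

for svc, a in sorted(measured.items()):
    print(f"{svc}: {a:.3%}")
print("below target:", breached)
```

With the sample above, the CI cluster passes 998 of 1000 checks, which falls under the assumed target, while the registry does not breach. In practice the same ratio would be computed over a rolling window so that the dashboard and the alerting agree on what counts as a violation.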
Internal-NPS for the platform team improved 22 points in the quarter following rollout. SLA-violation response landed inside an hour. The "is CI down?" Slack pattern stopped — the answer was a Grafana link away.