CASE 125 · SANDPIPER · 2024
CI on Spot, with the wallet to prove it.
An open-source vendor ran GitHub Actions on a fleet of self-hosted EC2 runners: entirely on-demand, sized for peak. Off-peak utilisation was 12%. We rebuilt the runner fleet on Spot with Karpenter-driven scaling and brought CI compute spend down 82%.
Open-source vendor
COST
2024
RESULTS
What changed, by the numbers.
CI COMPUTE BILL
−82%
BUILD QUEUE TIME
−61%
SPOT INTERRUPTION IMPACT (BUILD-MINUTES)
< 0.4%
PROVISIONING TIME
< 25s
HOW IT WENT
The on-demand fleet had been sized for the worst Tuesday of the month. Most of the time it sat under-utilised, yet engineers still occasionally complained about queue times, because the real peaks were even sharper than the sizing assumed.
Actions Runner Controller (ARC) scaled runner pods on EKS, and Karpenter provisioned the nodes beneath them from Spot capacity across multiple instance families. EFS held the build cache so runners didn't start cold every time. Pod disruption budgets governed draining during the rare Spot interruption.
The CI bill dropped 82%. Peak-hour queue time dropped 61% because the elastic fleet could grow past the ceiling the on-demand sizing had imposed. Spot interruptions stayed under 0.4% of build-minutes; affected builds retried automatically and finished only a couple of minutes later.
READY WHEN YOU ARE
Let's get your AWS bill (and architecture) in order.
The discovery call is free. You walk away with at least one concrete idea, even if we never work together.