CASE 51 · FOUNDRY · 2023
Three observability stacks, one bill, one source of truth.
A travel platform ran Datadog ($28k/mo), New Relic ($14k/mo), and a self-hosted Prometheus/Grafana stack on EKS ($6k/mo of compute): three teams, three vendors, three on-call experiences. We consolidated to a single stack and cut the combined $48k monthly bill by $36k, keeping 94% of the functionality in active use and documenting a workaround for the rest.
Travel platform
COST
2023
RESULTS
What changed, by the numbers.
TOOLING BILL
−75%
TOOLS RETIRED
2
ON-CALL FAMILIARITY
UNIFIED
FEATURE PARITY
94%
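The tooling-bill figure can be checked from the three monthly costs quoted in the summary. The short Python sketch below only redoes that arithmetic. The remaining monthly spend it prints is derived from those numbers, not a figure stated separately in this case study.

```python
# Monthly figures as quoted in the case study summary (USD).
monthly_bills = {
    "Datadog": 28_000,
    "New Relic": 14_000,
    "Prometheus/Grafana on EKS (compute)": 6_000,
}
monthly_saving = 36_000

total_before = sum(monthly_bills.values())
total_after = total_before - monthly_saving  # derived, not quoted in the text
reduction_pct = 100 * monthly_saving / total_before

print(f"before: ${total_before:,}/mo")
print(f"after:  ${total_after:,}/mo")
print(f"change: -{reduction_pct:.0f}%")
```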
HOW IT WENT
The case for consolidation was on-call experience, not cost. When a pager woke an engineer at 3am, they first had to remember which tool held the runbook for which service. The cost win was a bonus.
We mapped the feature surface of each tool against the team’s actual usage and found that 94% of the functionality in active use could be served by Amazon Managed Grafana on top of AMP (Amazon Managed Service for Prometheus) and CloudWatch. The remaining 6% was a real gap: some Datadog APM features, such as real-time profiling, had no native equivalent. We accepted that gap and documented a workaround.
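A feature-mapping exercise like this one can be sketched as a small inventory plus a coverage calculation. The Python below is a minimal illustration, not the tooling used on the engagement. The `Feature` record, the `parity` helper, and every inventory row are hypothetical, and the sample rows are too few to reproduce the 94% figure.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class Feature:
    tool: str                         # tool the feature lives in today
    name: str
    in_active_use: bool               # does anyone actually rely on it?
    target_equivalent: Optional[str]  # None means no native equivalent


def parity(features: List[Feature]) -> Tuple[float, List[Feature]]:
    """Return (% of actively used features covered by the target stack, gaps)."""
    active = [f for f in features if f.in_active_use]
    gaps = [f for f in active if f.target_equivalent is None]
    covered = len(active) - len(gaps)
    pct = 100 * covered / len(active) if active else 0.0
    return pct, gaps


# Hypothetical inventory rows, for illustration only.
INVENTORY = [
    Feature("Datadog", "service dashboards", True, "Managed Grafana"),
    Feature("Datadog", "real-time profiling", True, None),
    Feature("New Relic", "alert policies", True, "Managed Grafana alerting"),
    Feature("Prometheus", "metrics scraping", True, "AMP"),
    Feature("New Relic", "unused synthetic checks", False, None),  # ignored: not in use
]

pct, gaps = parity(INVENTORY)
print(f"parity: {pct:.0f}%")
for gap in gaps:
    print(f"gap: {gap.tool} / {gap.name}")
```

Counting only features in active use matters here: a feature nobody relies on should neither block the migration nor inflate the parity score.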
Cutover ran service by service over six weeks with shadow alerting. On-call now opens one dashboard per service, in one tool. The savings funded the team’s annual re:Invent trip with money to spare.
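Shadow alerting generally means the new stack evaluates its alert rules in parallel without paging anyone, so its firings can be compared with the legacy tool's before a service is cut over. The sketch below shows one way to do that comparison in Python. The `compare_shadow` helper, the five-minute tolerance, and the alert names are assumptions for illustration, not details from this engagement.

```python
from datetime import datetime, timedelta
from typing import List, Tuple

Firing = Tuple[str, datetime]  # (alert name, time it fired)


def compare_shadow(
    legacy: List[Firing],
    shadow: List[Firing],
    tolerance: timedelta = timedelta(minutes=5),  # assumed matching window
) -> Tuple[List[Firing], List[Firing]]:
    """Pair each legacy firing with a shadow firing of the same alert
    within `tolerance`. Returns (missed by shadow, extra in shadow)."""
    unmatched_shadow = list(shadow)
    missed = []
    for name, fired_at in legacy:
        match = next(
            (s for s in unmatched_shadow
             if s[0] == name and abs(s[1] - fired_at) <= tolerance),
            None,
        )
        if match is None:
            missed.append((name, fired_at))
        else:
            unmatched_shadow.remove(match)
    return missed, unmatched_shadow


# Hypothetical firings for one service during the shadow period.
legacy = [
    ("checkout-5xx", datetime(2023, 5, 1, 3, 0)),
    ("search-latency", datetime(2023, 5, 1, 9, 30)),
]
shadow = [
    ("checkout-5xx", datetime(2023, 5, 1, 3, 2)),
    ("search-latency", datetime(2023, 5, 1, 11, 0)),
]

missed, extra = compare_shadow(legacy, shadow)
print("ready to cut over:", not missed and not extra)
```

Under this sketch, a service would be ready to cut over once both lists stay empty for an agreed period. A missed firing points at a rule that needs porting or tuning, and an extra firing points at noise the new stack would add to on-call.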