Your Check Interval Determines Your SLO Precision

It's Sunday morning. You're half-awake with your phone in hand, watching a deploy roll out. Something goes wrong — a bad config value, a missing environment variable — and your service starts returning 500s. You catch it in the logs within seconds, revert the deploy, and the site is healthy again in under two minutes. Total actual downtime: about 12 seconds of hard failures, maybe another 30 seconds of elevated error rates. You call it a near-miss and go back to sleep.

By Friday, your monthly SLO summary lands in your inbox. That one failed check on Sunday morning ate 12% of your entire month's error budget. Not because your site was down for 5 minutes — it wasn't. But because that's the finest resolution your monitoring system had. One failed poll, 5 minutes of downtime on paper, and the rest of the month spent holding your breath.

This is the other cost of long check intervals — not just missing outages entirely, but misrepresenting the ones you do catch.

The Math: How Much Budget Does One Check Actually Cost?

When a polling monitor catches a failure, it records the entire interval as unknown or degraded time. It doesn't know when the outage started within that window; it only knows that the check failed. So a failed check at a 5-minute interval charges the full 5 minutes against your error budget, whether the actual outage lasted 12 seconds or 4 minutes 59 seconds.
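The attribution rule is blunt enough to state in a few lines of code. This is a minimal sketch, not any particular vendor's implementation; `recorded_downtime` is a hypothetical helper name:

```python
def recorded_downtime(actual_outage_s: float, interval_s: float) -> float:
    """Downtime charged to the error budget for a single failed poll.

    The monitor only knows the check failed, not when the outage
    started inside the window, so the whole interval is charged.
    """
    return interval_s if actual_outage_s > 0 else 0.0

# A 12-second outage caught by one 5-minute (300 s) poll
# is charged the full 300 s: a 25x inflation.
charge = recorded_downtime(12, 300)
```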

For a 99.9% SLO over a 30-day month, your total allowed downtime is 43.2 minutes. Here's what a single failed check costs you at different intervals:

Check interval | Budget per month (99.9%/30d) | Cost of one failed check | % of monthly budget | Failed checks before budget exhausted
5 minutes      | 43.2 min                     | 5 min                    | 11.6%               | 8
1 minute       | 43.2 min                     | 1 min                    | 2.3%                | 43
30 seconds     | 43.2 min                     | 30 sec                   | 1.2%                | 86
10 seconds     | 43.2 min                     | 10 sec                   | 0.4%                | 259
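These figures are easy to reproduce yourself. A quick sketch, assuming a 30-day month:

```python
# 99.9% SLO over 30 days: 0.1% of the month may be down.
BUDGET_S = 0.001 * 30 * 24 * 3600  # 2592 s, i.e. 43.2 minutes

for interval_s in (300, 60, 30, 10):
    pct = 100 * interval_s / BUDGET_S          # budget cost of one failed check
    checks = int(BUDGET_S // interval_s)       # failed checks to exhaustion
    print(f"{interval_s:>4}s interval: one failed check = {pct:.1f}% "
          f"of budget; {checks} failed checks exhaust it")
```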

That Sunday morning deploy rollback? At 5-minute intervals, it consumed 11.6% of your budget. At 1-minute intervals, the same actual event — same 12 seconds of real downtime — costs 2.3%. At 30 seconds, it's 1.2%. The outage didn't change. Your ability to measure it did.

See the full budget math in the error budgets documentation.

Why This Hurts More Than It Looks

Budget erosion from coarse intervals isn't just a measurement problem — it's a confidence problem. When 12 seconds of real downtime shows up as 5 minutes in your SLO report, you can't spend the rest of the month normally. Do you freeze deploys? Slow down the release cadence? How do you explain to leadership that you burned 12% of budget on what was essentially a successful rollback?

The answer is: you can't, confidently, because your data doesn't support it. Your monitoring told you there was a 5-minute window where something was wrong. You know from logs that it was actually 12 seconds. But the budget report doesn't know that, and your SLO tooling doesn't know that. You have a number you can't trust in either direction.

Finer intervals give you a budget that actually reflects what happened. When a 12-second blip costs you 0.4% instead of 11.6%, you can see it for what it is: a minor event, handled quickly, that barely moved the needle. You can deploy again on Monday without second-guessing whether the budget can take it.

The Double Penalty at 5-Minute Intervals

Long check intervals hurt you twice. First, as the detection-gap post covers, they let short outages pass completely undetected — a 2-minute outage that resolves between checks never fires an alert and never appears in your incident log.

Second, for outages that are caught, the budget impact is inflated. The same 12-second event costs 30x more budget at 5 minutes than at 10 seconds. That inflation compounds across every noisy deploy, every flapping instance, every transient dependency hiccup. By the end of a busy month, you can find yourself in budget deficit from incidents that were, in reality, well within your reliability targets.
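The first penalty is easy to quantify: if an outage starts at a uniformly random moment relative to the poll schedule, a poll lands inside it with probability min(1, duration / interval). A small simulation, a sketch under that uniform-start assumption, bears this out:

```python
import random

def p_caught(outage_len_s: float, interval_s: float,
             trials: int = 100_000, seed: int = 0) -> float:
    """Estimate the probability that a polling monitor observes an
    outage, assuming the outage start is uniformly random within a
    poll cycle."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        start = rng.uniform(0, interval_s)
        # The next poll fires at t = interval_s; it lands inside the
        # outage iff the outage is still going at that moment.
        if start + outage_len_s >= interval_s:
            hits += 1
    return hits / trials

# p_caught(120, 300) comes out near 0.4: a 2-minute outage at
# 5-minute polling goes completely undetected ~60% of the time.
```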

A monitoring system running at coarse intervals is therefore bad at both ends of the outage spectrum: it misses the short ones entirely, and it over-charges your budget for the ones it catches. This isn't a configuration problem you can tune around — it's a resolution ceiling built into the interval itself.

What About the Extra Load?

The reasonable objection here is that shorter intervals mean more requests to your service. At 10-second intervals, you're making six times as many checks as at 1 minute. For most HTTP/HTTPS monitoring of normal production services, this is negligible. A single HTTP health check at 10-second intervals is 6 requests per minute — far less traffic than any real user would generate.

The only cases where interval frequency creates meaningful load are heavily rate-limited APIs, services with expensive health check implementations, or very large fleets where the aggregate request volume adds up. For a typical web application or API endpoint, moving from 5-minute to 1-minute intervals produces no measurable impact on your infrastructure.

The cost of coarse intervals is real and compounds over time. The cost of fine intervals, for most services, is effectively zero.

What We Recommend

For production-tier services with an active SLO, 1-minute intervals are the right default. They give you a budget that accurately reflects reality, detect the vast majority of real-world outages before they compound, and don't require any special justification to your infrastructure team.

For revenue-critical paths — checkout flows, payment APIs, anything where 30 seconds of downtime is a meaningful business event — consider 30-second intervals. The budget precision improvement is significant (2.3% down to 1.2% per failed check), and at that cadence you'll catch most short outages before they resolve on their own.

For batch or cron-style work where the service isn't continuously available, polling intervals don't fit the model anyway. Use heartbeat monitoring instead: the service signals Nines after each successful run, and you alert if the expected heartbeat doesn't arrive.
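In practice the heartbeat is a one-line ping at the end of the job. Here's a minimal Python sketch; the endpoint URL and `ping_heartbeat` helper are hypothetical placeholders, with the real URL coming from your monitor's configuration:

```python
import urllib.request

# Hypothetical heartbeat endpoint; substitute the URL from your
# Nines monitor configuration.
HEARTBEAT_URL = "https://heartbeats.example.com/ping/abc123"

def ping_heartbeat(url: str = HEARTBEAT_URL,
                   opener=urllib.request.urlopen) -> bool:
    """Signal a successful run. Swallows errors so a broken
    heartbeat can never crash an otherwise-successful job."""
    try:
        opener(url, timeout=10)
        return True
    except Exception:
        return False

def nightly_backup() -> None:
    pass  # ... the actual batch work goes here ...

if __name__ == "__main__":
    nightly_backup()
    # Ping only after the work succeeds; if the job dies midway,
    # the missing heartbeat is exactly what fires the alert.
    ping_heartbeat()
```

Note the ordering: the ping comes last, so a crash anywhere in the job means no heartbeat and therefore an alert.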

Check which intervals are available on your plan at the plan comparison page. Free-tier monitors run at 5-minute intervals; paid plans unlock 1-minute and shorter. Upgrade to get the precision your SLO actually needs.