Burn-Rate Incidents
A burn-rate incident is opened automatically when the multi-window detector finds that a monitor's SLO error budget is being consumed faster than it refills.
SLI types
- availability_burn: Triggered when the fraction of failed checks consumes availability budget faster than the SLO target supports. Runs on every monitor.
- latency_burn: Triggered when the fraction of slow checks (those exceeding latency_threshold_ms) consumes latency budget faster than the latency SLO supports. Runs on http_check monitors only.
A monitor can have both types of burn-rate incident open simultaneously.
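As a rough illustration of what "consuming budget faster than the SLO supports" means, the sketch below uses the common definition of burn rate: the observed bad fraction divided by the budget fraction (1 minus the SLO target). This page does not state the exact formula the detector uses, so treat the definition, the function names, and the sample values as assumptions.

```python
def burn_rate(bad_checks, total_checks, slo_target):
    """Return the burn rate as a multiple of the sustainable rate.

    Assumed definition: (bad / total) / (1 - slo_target).
    Returns None when there is no data to evaluate.
    """
    if total_checks == 0:
        return None
    budget = 1.0 - slo_target
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return (bad_checks / total_checks) / budget


def count_slow(latencies_ms, latency_threshold_ms):
    """Count checks whose latency exceeds latency_threshold_ms."""
    return sum(1 for ms in latencies_ms if ms > latency_threshold_ms)


# Hypothetical sample: 144 failed checks out of 10,000 against a 99.9% target.
availability = burn_rate(144, 10_000, 0.999)

# Hypothetical sample: latencies in milliseconds against a 500 ms threshold.
slow = count_slow([120, 480, 650, 900], 500)
```

Under this definition, a burn rate of 1.0 means the budget would run out exactly at the end of the SLO period, and anything above 1.0 means it runs out sooner.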
Detection windows
| Pair | Long window | Short window | Threshold |
|---|---|---|---|
| Fast | 1h | 5m | 14.4× |
| Slow | 6h | 30m | 6× |
Both windows in a pair must exceed the threshold for the pair to fire. Either pair firing opens an incident. See Multi-window thresholds for the derivation.
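The pair logic can be sketched as follows. This is a minimal illustration, not the detector's actual code: WindowPair, pair_fires, and should_open are hypothetical names, and the burn rates are assumed to arrive precomputed as multiples of the sustainable rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowPair:
    name: str
    long_window: str
    short_window: str
    threshold: float


# Values taken from the table above.
FAST = WindowPair("fast", "1h", "5m", 14.4)
SLOW = WindowPair("slow", "6h", "30m", 6.0)


def pair_fires(pair, long_burn, short_burn):
    """A pair fires only when both of its windows exceed the threshold."""
    return long_burn > pair.threshold and short_burn > pair.threshold


def should_open(burns):
    """burns maps each WindowPair to (long_burn, short_burn).

    An incident opens if either pair fires.
    """
    return any(pair_fires(p, lb, sb) for p, (lb, sb) in burns.items())
```

Requiring the short window to agree with the long one keeps a pair from firing on a spike that has already ended, since the short window drops back below the threshold quickly once the burn stops.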
Lifecycle
- Investigating — set on creation.
- Identified — set manually by an operator.
- Monitoring — set manually by an operator.
- Resolved — set automatically once burn returns below threshold and stays below for the 5-minute cooldown. Auto-resolve skips Monitoring.
Warmup gate
A burn-rate window pair only evaluates once the monitor's age is at least the long window (1h for fast, 6h for slow). Below that age the pair returns Unknown and cannot open or close an incident. If neither pair is eligible, the detector returns Unknown for the monitor.
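A sketch of the warmup gate, under stated assumptions: the function names and the "unknown" / "firing" / "ok" result strings are hypothetical, and the sketch assumes that when only one pair is eligible, the detector's result comes from that pair alone.

```python
from datetime import timedelta

# Long windows from the detection table: a pair is eligible once the
# monitor is at least this old.
LONG_WINDOWS = {
    "fast": timedelta(hours=1),
    "slow": timedelta(hours=6),
}


def eligible_pairs(monitor_age):
    """Return the names of pairs whose long window fits in the monitor's age."""
    return [name for name, window in LONG_WINDOWS.items() if monitor_age >= window]


def evaluate(monitor_age, fires):
    """fires maps pair name -> bool (whether that pair exceeds its threshold).

    Ineligible pairs are ignored; if none are eligible the result is "unknown".
    """
    pairs = eligible_pairs(monitor_age)
    if not pairs:
        return "unknown"
    return "firing" if any(fires[name] for name in pairs) else "ok"
```

For example, a monitor that is two hours old is evaluated on the fast pair only, so a slow-pair signal cannot open an incident until the monitor reaches six hours of age.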
Unknown state
The detector returns Unknown when the VM datasource errors, returns no data, or the warmup gate is in effect:
- If no incident is open, none is opened.
- If an incident is open, it stays open and the cooldown clock is paused.
The incident does not auto-resolve until the burn rate is confirmed below threshold with fresh data.
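The interaction between Unknown results and the cooldown can be sketched as a small state machine. The names here are hypothetical, and "paused" is interpreted as keeping the time already accumulated below threshold without advancing it; the real detector may track this differently.

```python
from dataclasses import dataclass

COOLDOWN_SECONDS = 5 * 60  # the 5-minute cooldown from the lifecycle section


@dataclass
class IncidentState:
    is_open: bool = False
    below_seconds: float = 0.0  # time confirmed below threshold so far


def step(state, result, elapsed_seconds):
    """Advance the state by one detector tick.

    result is "firing", "ok", or "unknown" (hypothetical labels).
    """
    if result == "unknown":
        # Nothing opens, nothing resolves, and the cooldown clock is paused.
        return state
    if result == "firing":
        state.is_open = True
        state.below_seconds = 0.0
        return state
    # result == "ok": fresh data confirms the burn rate is below threshold.
    if state.is_open:
        state.below_seconds += elapsed_seconds
        if state.below_seconds >= COOLDOWN_SECONDS:
            state.is_open = False
            state.below_seconds = 0.0
    return state
```

In this sketch, an Unknown tick that lasts ten minutes contributes nothing to the cooldown: the incident resolves only after five minutes of ticks backed by fresh data.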
Owner-only visibility
Burn-rate incidents are visible to the account owner on the monitor detail page and the incidents list. They never appear on a public status page. Only region-failure incidents are surfaced publicly.
Region-failure suppression
While a region-failure incident is open on a monitor, availability_burn evaluation for that monitor is skipped. The suppression is one-directional and one-axis:
- Only affects availability_burn, not latency_burn.
- Only applies while the region-failure incident is open. Resumes on the next tick after resolution.
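The suppression rule, combined with the rule that latency_burn runs on http_check monitors only, can be sketched as a per-tick selection of which SLI types to evaluate. The function name is hypothetical, and so is the tcp_check monitor type in the usage lines, which stands in for any non-HTTP monitor.

```python
def sli_types_to_evaluate(monitor_type, region_failure_open):
    """Return the burn-rate SLI types to evaluate for a monitor on this tick."""
    types = []
    # availability_burn runs on every monitor, unless a region-failure
    # incident is currently open on it.
    if not region_failure_open:
        types.append("availability_burn")
    # latency_burn runs on http_check monitors only and is never suppressed.
    if monitor_type == "http_check":
        types.append("latency_burn")
    return types


# Hypothetical usage: the flag is re-read every tick, so evaluation
# resumes on the first tick after the region-failure incident resolves.
during_outage = sli_types_to_evaluate("http_check", region_failure_open=True)
after_outage = sli_types_to_evaluate("http_check", region_failure_open=False)
non_http = sli_types_to_evaluate("tcp_check", region_failure_open=True)
```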
See also
- Incident detectors: region-failure and burn-rate
- Multi-window thresholds
- Incidents — region-failure lifecycle.