Burn-Rate Incidents

A burn-rate incident is opened automatically when the multi-window detector finds that a monitor's SLO error budget is being consumed faster than it refills.

SLI types

availability_burn
Triggered when the fraction of failed checks consumes availability budget faster than the SLO target supports. Runs on every monitor.
latency_burn
Triggered when the fraction of slow checks (those exceeding latency_threshold_ms) consumes latency budget faster than the latency SLO supports. Runs on http_check monitors only.

A monitor can have both types of burn-rate incident open simultaneously.
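As a rough illustration of what "consuming budget faster than the SLO supports" means, the sketch below uses the conventional burn-rate definition: the observed bad fraction divided by the error budget (1 minus the SLO target). The function name and parameters are hypothetical, and the detector's actual formula may differ. For latency_burn, "bad" would mean checks slower than latency_threshold_ms.

```python
def burn_rate(bad_checks: int, total_checks: int, slo_target: float) -> float:
    """Return how many times faster than sustainable the budget is being spent.

    Assumes the conventional definition: bad fraction / (1 - SLO target).
    A result of 1.0 means the budget is spent exactly as fast as it is allotted.
    """
    if total_checks <= 0:
        raise ValueError("total_checks must be positive")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    error_budget = 1.0 - slo_target          # allowed fraction of bad checks
    bad_fraction = bad_checks / total_checks  # observed fraction of bad checks
    return bad_fraction / error_budget
```

Under this definition, a 99% SLO with 1% of checks failing gives a burn rate of about 1, and 14.4% failing gives about 14.4.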

Detection windows

Pair   Long window   Short window   Threshold
Fast   1h            5m             14.4×
Slow   6h            30m

Both windows in a pair must exceed the threshold for the pair to fire. Either pair firing opens an incident. See Multi-window thresholds for the derivation.
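The firing rule above can be sketched as follows. This is a minimal illustration, not the detector's implementation: WindowPair, pair_fires, should_open, and default_pairs are hypothetical names, and the slow-pair threshold is left as a caller-supplied argument because this page does not state it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WindowPair:
    name: str
    long_window: str
    short_window: str
    threshold: float  # burn-rate multiple both windows must exceed


def default_pairs(slow_threshold: float) -> list[WindowPair]:
    """Build the two pairs; the slow threshold is not given in this text."""
    return [
        WindowPair("fast", "1h", "5m", 14.4),
        WindowPair("slow", "6h", "30m", slow_threshold),
    ]


def pair_fires(pair: WindowPair, long_burn: float, short_burn: float) -> bool:
    """A pair fires only if BOTH of its windows exceed the threshold."""
    return long_burn > pair.threshold and short_burn > pair.threshold


def should_open(readings: list[tuple[WindowPair, float, float]]) -> bool:
    """Either pair firing is enough to open an incident."""
    return any(pair_fires(p, long_b, short_b) for p, long_b, short_b in readings)
```

Requiring the short window as well as the long one means a pair stops firing soon after the burn subsides, instead of waiting for the long window to drain.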

Lifecycle

  1. Investigating — set on creation.
  2. Identified — set manually by an operator.
  3. Monitoring — set manually by an operator.
  4. Resolved — set automatically once burn returns below threshold and stays below for the 5-minute cooldown. Auto-resolve skips Monitoring.
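A small sketch of the auto-resolve step, assuming the detector tracks the time since which burn has been continuously below threshold. Status, auto_resolve, and below_since are hypothetical names. The point it illustrates is that resolution jumps straight to Resolved from whichever status the incident is in, without passing through Monitoring.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional

COOLDOWN = timedelta(minutes=5)  # cooldown stated in the lifecycle above


class Status(Enum):
    INVESTIGATING = "investigating"  # set on creation
    IDENTIFIED = "identified"        # set manually by an operator
    MONITORING = "monitoring"        # set manually by an operator
    RESOLVED = "resolved"            # set automatically


def auto_resolve(status: Status, below_since: Optional[datetime], now: datetime) -> Status:
    """Return RESOLVED once burn has stayed below threshold for the cooldown.

    below_since is None while burn is at or above threshold.
    """
    if status is Status.RESOLVED:
        return status
    if below_since is not None and now - below_since >= COOLDOWN:
        return Status.RESOLVED  # no intermediate MONITORING step
    return status
```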

Warmup gate

A burn-rate window pair only evaluates once the monitor's age is at least the long window (1h for fast, 6h for slow). Below that age the pair returns Unknown and cannot open or close an incident. If neither pair is eligible, the detector returns Unknown for the monitor.
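The gate can be pictured as a simple age check per pair. In this sketch, LONG_WINDOWS and eligible_pairs are hypothetical names. An empty result corresponds to the detector returning Unknown for the monitor.

```python
from datetime import timedelta

# Long window of each pair, as listed under Detection windows.
LONG_WINDOWS = {
    "fast": timedelta(hours=1),
    "slow": timedelta(hours=6),
}


def eligible_pairs(monitor_age: timedelta) -> list[str]:
    """Return the pairs whose long window fits within the monitor's age."""
    return [name for name, window in LONG_WINDOWS.items() if monitor_age >= window]
```

A monitor that is three hours old would have only the fast pair evaluated; the slow pair stays Unknown until the six-hour mark.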

Unknown state

The detector returns Unknown when the VM datasource errors, returns no data, or the warmup gate is in effect:

  • If no incident is open, none is opened.
  • If an incident is open, it stays open and the cooldown clock is paused.

The incident does not auto-resolve until the burn rate is confirmed below threshold with fresh data.
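One way to read these rules is as a per-tick state update in which Unknown leaves everything untouched. The sketch below assumes "paused" means the accumulated quiet time neither advances nor resets during Unknown ticks. Verdict, IncidentState, and step are hypothetical names, and the real detector may track the cooldown differently.

```python
from dataclasses import dataclass
from datetime import timedelta
from enum import Enum

COOLDOWN = timedelta(minutes=5)


class Verdict(Enum):
    FIRING = "firing"    # at least one pair exceeds its threshold
    BELOW = "below"      # fresh data confirms burn is below threshold
    UNKNOWN = "unknown"  # datasource error, no data, or warmup gate


@dataclass(frozen=True)
class IncidentState:
    is_open: bool = False
    quiet_for: timedelta = timedelta(0)  # time confirmed below threshold


def step(state: IncidentState, verdict: Verdict, tick: timedelta) -> IncidentState:
    """Advance the state by one detector tick of length `tick`."""
    if verdict is Verdict.UNKNOWN:
        return state  # nothing opens, nothing closes, cooldown clock paused
    if verdict is Verdict.FIRING:
        return IncidentState(is_open=True, quiet_for=timedelta(0))
    if not state.is_open:
        return state  # BELOW with no open incident: nothing to do
    quiet = state.quiet_for + tick
    if quiet >= COOLDOWN:
        return IncidentState(is_open=False, quiet_for=timedelta(0))
    return IncidentState(is_open=True, quiet_for=quiet)
```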

Owner-only visibility

Burn-rate incidents are visible to the account owner on the monitor detail page and the incidents list. They never appear on a public status page. Only region-failure incidents are surfaced publicly.

Region-failure suppression

While a region-failure incident is open on a monitor, availability_burn evaluation for that monitor is skipped. The suppression is one-directional and one-axis:

  • Only affects availability_burn, not latency_burn.
  • Only applies while the region-failure incident is open. Evaluation resumes on the next tick after the region-failure incident resolves.
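Put together with the SLI rules earlier on this page, the per-tick choice of which SLIs to evaluate might look like the sketch below. The function slis_to_evaluate and its parameters are hypothetical; only the SLI names and the http_check monitor type come from this page.

```python
def slis_to_evaluate(monitor_type: str, region_failure_open: bool) -> list[str]:
    """Return the burn-rate SLIs to evaluate for a monitor on this tick."""
    slis = []
    # availability_burn runs on every monitor, unless suppressed by an
    # open region-failure incident on that monitor.
    if not region_failure_open:
        slis.append("availability_burn")
    # latency_burn runs on http_check monitors only and is never suppressed.
    if monitor_type == "http_check":
        slis.append("latency_burn")
    return slis
```

Because the check is made on every tick, availability_burn comes back automatically on the first tick after the region-failure incident is no longer open.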

See also