Incidents
Nines opens incidents automatically when a monitor degrades. Understanding the two incident types — region-failure and burn-rate — helps you respond to the right signal.
Two incident types
Nines opens incidents for two distinct failure modes:
- Region-failure incidents — the monitor is unreachable from one or more geographic regions. These are opened immediately when enough regions report failure and resolved as soon as the monitor recovers in all regions.
- Burn-rate incidents — your SLO error budget is burning faster than your target allows, even if the monitor is currently passing. These require a Business plan and are only visible to the account owner. See Burn-rate incidents for details.
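This page does not give Nines' exact burn-rate formula. A common definition from SRE practice divides the observed error rate by the error budget, which is 1 minus the SLO target. A burn rate of 1 spends the budget exactly over the SLO window, and higher values exhaust it sooner. The sketch below assumes that definition. The function name burn_rate and its parameters are hypothetical and are not part of any Nines API.

```python
def burn_rate(bad_events: int, total_events: int, slo_target: float) -> float:
    """Hypothetical helper: how many times faster than allowed the error budget is spent.

    Assumes the common SRE definition: observed error rate / (1 - SLO target).
    """
    if total_events == 0:
        # With no events in the window the error rate is undefined.
        raise ValueError("no events in window; burn rate is undefined")
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    error_rate = bad_events / total_events
    error_budget = 1.0 - slo_target  # fraction of events allowed to fail
    return error_rate / error_budget
```

As an example, take a 99.9% availability target with 20 failed checks out of 10,000. That is a 0.2% error rate against a 0.1% budget, so burn_rate(20, 10_000, 0.999) returns roughly 2.0. At that pace the budget runs out in half the SLO window, even if the monitor happens to be passing right now.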
Region-failure lifecycle
A region-failure incident moves through three states:
- Investigating — Nines has detected failures in one or more regions and is collecting additional data.
- Identified — the source of the failure has been identified (this state is set manually via the incident detail page).
- Resolved — all regions are reporting success again. Nines resolves the incident automatically once recovery is confirmed.
You do not need to take any manual action to resolve a region-failure incident.
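The region-failure lifecycle can be modeled as a small state machine. In this sketch, Identified is a manual transition and Resolved happens automatically once every region's latest check passes. The class, its method names, and the region names in the usage are hypothetical. It also assumes that regions not reported as failing when the incident opens count as passing.

```python
from enum import Enum


class IncidentState(Enum):
    INVESTIGATING = "investigating"
    IDENTIFIED = "identified"
    RESOLVED = "resolved"


class RegionFailureIncident:
    """Hypothetical model of a region-failure incident's lifecycle."""

    def __init__(self, regions, failing):
        # Latest known result per region; True means the check passed there.
        # Assumption: regions not listed as failing are treated as passing.
        self.status = {region: region not in failing for region in regions}
        self.state = IncidentState.INVESTIGATING

    def mark_identified(self):
        # Manual transition, analogous to changing the status on the
        # incident detail page. Has no effect once the incident is resolved.
        if self.state is IncidentState.INVESTIGATING:
            self.state = IncidentState.IDENTIFIED

    def record(self, region, passed):
        # Apply a new check result; resolve automatically once every region passes.
        if self.state is IncidentState.RESOLVED:
            return
        self.status[region] = passed
        if all(self.status.values()):
            self.state = IncidentState.RESOLVED
```

Resolution does not depend on whether anyone moved the incident to Identified. Both Investigating and Identified go straight to Resolved when the last failing region recovers.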
Burn-rate incident lifecycle
Burn-rate incidents follow the same Investigating → Identified → Resolved flow, with one important difference: they do not auto-resolve when the metrics backend returns an error or no data.
If the metrics backend returns an error or has no data for the metric query — for example because a check worker restarted — Nines leaves the burn-rate incident open and the cooldown clock does not advance. The incident stays open until the burn rate is confirmed below threshold with fresh data. This prevents premature auto-resolution during transient data unavailability.
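The resolution rule above can be sketched as follows. An evaluation with an error or no data changes nothing, a fresh reading at or above the threshold restarts the cooldown, and only enough consecutive fresh readings below the threshold resolve the incident. The details here are assumptions, not documented Nines behavior: the cooldown is modeled as a hypothetical count of consecutive healthy evaluations (required_healthy), and a missing reading neither advances nor resets that count.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BurnRateIncident:
    """Hypothetical model of burn-rate incident auto-resolution."""

    threshold: float           # burn multiplier the incident is judged against
    required_healthy: int = 3  # hypothetical cooldown length, in evaluations
    healthy_streak: int = 0
    resolved: bool = False

    def evaluate(self, burn: Optional[float], backend_error: bool = False) -> None:
        if self.resolved:
            return
        if backend_error or burn is None:
            # No fresh data: keep the incident open and freeze the cooldown clock.
            return
        if burn >= self.threshold:
            # Still burning too fast: the cooldown starts over.
            self.healthy_streak = 0
            return
        # Fresh data confirms the burn rate is below threshold.
        self.healthy_streak += 1
        if self.healthy_streak >= self.required_healthy:
            self.resolved = True
```

Treating "no data" as "no information" rather than "healthy" is what keeps a restarting check worker from silently closing a real incident.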
Notification timing
Notifications are sent at these lifecycle events:
- Incident created — an incident.created webhook event and an email alert are dispatched immediately when an incident opens.
- Incident resolved — an incident.resolved webhook event and a resolution email are dispatched when the incident closes.
- Incident updated — an incident.updated webhook event is dispatched when an incident's details change (for example, when the status moves to Identified).
See Notifications for how to configure channels, and Webhooks for the full payload schema.
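A webhook receiver typically branches on the event name. The sketch below handles the three events listed above. The payload shape is an assumption: the top-level "event" field and the nested "incident" object with "monitor" and "status" keys are hypothetical placeholders, so check the Webhooks page for the real field names. The function name handle_nines_webhook is also hypothetical.

```python
import json


def handle_nines_webhook(body: bytes) -> str:
    """Hypothetical handler: map a webhook body to the action a receiver might take."""
    payload = json.loads(body)
    event = payload.get("event")            # hypothetical field name
    incident = payload.get("incident", {})  # hypothetical nested object
    monitor = incident.get("monitor", "unknown monitor")

    if event == "incident.created":
        return f"page on-call: incident opened for {monitor}"
    if event == "incident.updated":
        return f"update ticket: {monitor} is now {incident.get('status', 'unknown')}"
    if event == "incident.resolved":
        return f"close ticket: {monitor} recovered"
    # Ignore unknown event types so new ones don't break the receiver.
    return "ignored"
```

In practice you would call a function like this from whatever HTTP framework receives the webhook, keeping the handler quick and side-effect-light so the request is acknowledged promptly.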
Viewing incidents
All open and recent incidents are listed on the Incidents page in your dashboard. Each incident shows the monitor name, affected regions (for region-failure incidents), the time it opened, and the current status.
The incident detail page shows the full timeline and lets you update the status or add a note. Burn-rate incidents display the SLI type (availability or latency) and the burn multiplier at the time of detection.