Last Updated: November 21, 2025
Lifecycle
| Phase | Focus |
|---|---|
| Detect | Monitor alerts and anomalies |
| Declare | Escalate with severity + status page |
| Contain | Isolate systems and cut blast radius |
| Recover | Restore services + monitor stability |
| Postmortem | Document timeline, impact, and action items |
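
The five phases can be modeled as a small state machine so that tooling rejects skipped steps. This is a minimal sketch, not a prescribed implementation: the names `Phase`, `next_phase`, and `advance` are hypothetical, and the strict forward-only ordering is an assumption that the table itself does not state.

```python
from enum import Enum
from typing import Optional


class Phase(Enum):
    # Phase names are taken from the lifecycle table.
    DETECT = "Detect"
    DECLARE = "Declare"
    CONTAIN = "Contain"
    RECOVER = "Recover"
    POSTMORTEM = "Postmortem"


# Enum members iterate in definition order, which matches the table.
ORDER = list(Phase)


def next_phase(current: Phase) -> Optional[Phase]:
    """Return the phase that follows `current`, or None after Postmortem."""
    idx = ORDER.index(current)
    return ORDER[idx + 1] if idx + 1 < len(ORDER) else None


def advance(current: Phase, target: Phase) -> Phase:
    """Move to `target` only if it is the immediate next phase (assumed rule)."""
    if next_phase(current) is not target:
        raise ValueError(f"cannot move from {current.value} to {target.value}")
    return target


if __name__ == "__main__":
    phase = Phase.DETECT
    phase = advance(phase, Phase.DECLARE)
    print(phase.value)  # prints "Declare"
```

In practice a team may allow moving backward (for example, from Recover to Contain if the fix regresses); relaxing `advance` to permit that is a policy decision.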
Runbook Snippets
| Snippet | Action |
|---|---|
| pagerduty trigger | Alert the on-call roster |
| timeline update | Share status on incident doc |
| reroute traffic | Shift load to healthy regions |
| run postmortem | Capture learnings & owners |
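
As an illustration of the "reroute traffic" step, the sketch below redistributes routing weights from unhealthy regions onto healthy ones in proportion to their current share. The function name `reroute`, the region names, and the proportional policy are all assumptions made for this example; real load balancers and DNS providers expose their own weighting controls.

```python
from typing import Dict, Set


def reroute(weights: Dict[str, float], healthy: Set[str]) -> Dict[str, float]:
    """Set unhealthy regions to 0 and scale healthy ones to keep the same total."""
    healthy_total = sum(w for region, w in weights.items() if region in healthy)
    if healthy_total <= 0:
        # Nothing healthy is carrying traffic, so there is nowhere to shift load.
        raise ValueError("no healthy region is carrying traffic")
    total = sum(weights.values())
    return {
        region: (w * total / healthy_total if region in healthy else 0.0)
        for region, w in weights.items()
    }


if __name__ == "__main__":
    # Hypothetical regions and weights (percent of traffic).
    current = {"us-east": 50, "us-west": 30, "eu-west": 20}
    print(reroute(current, healthy={"us-west", "eu-west"}))
    # prints {'us-east': 0.0, 'us-west': 60.0, 'eu-west': 40.0}
```

Proportional scaling preserves the relative split between the remaining regions. It does not check whether those regions have capacity for the extra load, which would need a separate guard.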
Communication
Balance timely updates with accuracy and publish a concise postmortem summary.
💡 Pro Tip:
Document actions in a shared channel and formalize follow-ups before closing incidents.
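
One way to enforce the tip in tooling is to refuse to close an incident while any follow-up lacks an owner. The sketch below is a hypothetical in-memory model (`Incident`, `ActionItem`, and the ownership rule are assumptions for illustration), not the API of any particular incident tracker.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActionItem:
    description: str
    owner: Optional[str] = None  # None means nobody has taken the follow-up yet


@dataclass
class Incident:
    title: str
    timeline: List[str] = field(default_factory=list)
    action_items: List[ActionItem] = field(default_factory=list)
    closed: bool = False

    def log(self, entry: str) -> None:
        """Record an action, mirroring what is posted in the shared channel."""
        self.timeline.append(entry)

    def close(self) -> None:
        """Close only when every follow-up has an owner (assumed policy)."""
        unowned = [a.description for a in self.action_items if not a.owner]
        if unowned:
            raise ValueError(f"follow-ups without an owner: {unowned}")
        self.closed = True


if __name__ == "__main__":
    incident = Incident("Checkout latency spike")  # hypothetical incident
    incident.log("Shifted load to healthy regions")
    incident.action_items.append(ActionItem("Add capacity alert", owner="alice"))
    incident.close()
    print(incident.closed)  # prints True
```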