Postmortem basics
Topic: Monitoring basics
Summary
Write a postmortem after every significant incident. Include the timeline, root cause, impact, and action items. Keep the review blameless. Use this when you need to learn from outages and prevent recurrence.
Intent: How-to
Quick answer
- Sections: summary, impact, timeline, root cause, what went well, what to improve, and action items with owners.
- Keep it blameless: focus on systems and process, not individuals. Give every action item an owner and a due date. Publish the postmortem and discuss it with the team.
- Follow up on action items. Link the postmortem from the relevant runbook or alert. Reuse it for training.
Steps
- Draft: Write the summary, impact, timeline, and root cause. Note what went well and what to improve. List action items with owners.
- Review and publish: Hold a blameless review. Publish the postmortem, discuss it in a team meeting, and link it from the runbook.
- Follow up: Track action items and close them when done. Revisit the postmortem if a similar incident recurs.
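The draft and follow-up steps can be sketched as a small data model. This is only an illustration, not a prescribed format: the names `Postmortem`, `ActionItem`, `render`, and `overdue` are hypothetical, and the fields simply mirror the sections listed above. It assumes Python 3.9 or later.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class ActionItem:
    """One follow-up task; every item carries an owner and a due date."""
    description: str
    owner: str
    due: date
    done: bool = False


@dataclass
class Postmortem:
    """Hypothetical record whose fields mirror the postmortem sections."""
    title: str
    summary: str
    impact: str
    timeline: list[str]
    root_cause: str
    went_well: list[str] = field(default_factory=list)
    to_improve: list[str] = field(default_factory=list)
    action_items: list[ActionItem] = field(default_factory=list)


def render(pm: Postmortem) -> str:
    """Render the postmortem as plain text, one heading per section."""
    def bullets(items):
        # Fall back to a placeholder so an empty section stays visible.
        return [f"- {item}" for item in items] or ["- (none recorded)"]

    lines = [pm.title, ""]
    lines += ["Summary", pm.summary, ""]
    lines += ["Impact", pm.impact, ""]
    lines += ["Timeline", *bullets(pm.timeline), ""]
    lines += ["Root cause", pm.root_cause, ""]
    lines += ["What went well", *bullets(pm.went_well), ""]
    lines += ["What to improve", *bullets(pm.to_improve), ""]
    lines += ["Action items"]
    lines += bullets(
        f"{a.description} (owner: {a.owner}, due: {a.due.isoformat()}, "
        f"{'done' if a.done else 'open'})"
        for a in pm.action_items
    )
    return "\n".join(lines)


def overdue(pm: Postmortem, today: date) -> list[ActionItem]:
    """Return open action items whose due date is before `today`."""
    return [a for a in pm.action_items if not a.done and a.due < today]
```

Calling `overdue(pm, date.today())` before each team meeting is one way to keep the follow-up step visible. Rendering empty sections with a placeholder makes a missing "what went well" or "what to improve" entry easy to spot during review.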
Verification
- Postmortem published; actions tracked; runbook updated.
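The verification checks can also be expressed as a small function. This is a minimal sketch under stated assumptions: `PostmortemStatus` and `verification_gaps` are hypothetical names, and how each flag gets populated (wiki, ticket tracker, runbook repository) depends on your tooling.

```python
from dataclasses import dataclass


@dataclass
class PostmortemStatus:
    """Hypothetical snapshot of where a postmortem stands."""
    published: bool
    runbook_links_to_postmortem: bool
    action_items_total: int
    action_items_tracked: int


def verification_gaps(status: PostmortemStatus) -> list[str]:
    """Return a list of unmet verification checks; empty means all pass."""
    gaps = []
    if not status.published:
        gaps.append("postmortem is not published")
    untracked = status.action_items_total - status.action_items_tracked
    if untracked > 0:
        gaps.append(f"{untracked} action item(s) not tracked")
    if not status.runbook_links_to_postmortem:
        gaps.append("runbook does not link to the postmortem")
    return gaps
```

Returning the list of gaps, rather than a single pass or fail value, tells the reviewer exactly which step is still outstanding.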
Troubleshooting
- No root cause: Document what is known and unknown, and still add action items.
- Blame: Reframe the finding in terms of process and systems.