Playbook: Post-Incident Validation

Goal

Confirm that the system is healthy and fully operational after an incident has been resolved.

Trigger

Run this playbook immediately after a major incident (e.g. downtime, crash, configuration error) is resolved.

Checklist

  • Verify affected services are running:
    systemctl status <service>
    
  • Run an end-user smoke test (UI check or API ping)
  • Review logs:
    journalctl -u <service> --since "10 min ago"
    
  • Confirm alerts are cleared
  • Update incident log or ticket
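
The first three checks above could be scripted roughly as follows. This is a minimal sketch, not a definitive implementation: the SERVICE and HEALTH_URL defaults are hypothetical placeholders, the health endpoint is assumed to return a 2xx status when the service is healthy, and "no error-priority journal entries in the last 10 minutes" is only one possible definition of clean logs. Alert status and ticket updates depend on your tooling and are left manual.

```shell
#!/bin/sh
# Post-incident validation sketch.
# SERVICE and HEALTH_URL are hypothetical placeholders; override them per incident.
SERVICE="${SERVICE:-myapp.service}"
HEALTH_URL="${HEALTH_URL:-http://localhost:8080/health}"

failures=0

# Run a command quietly, print PASS/FAIL with a label, and count failures.
check() {
  label="$1"; shift
  if "$@" >/dev/null 2>&1; then
    echo "PASS: $label"
  else
    echo "FAIL: $label"
    failures=$((failures + 1))
  fi
}

# Succeeds only if journalctl runs and reports no entries at priority "err"
# or worse for the service in the last 10 minutes.
no_recent_errors() {
  out="$(journalctl -u "$SERVICE" --since "10 min ago" -p err --no-pager -q)" || return 1
  [ -z "$out" ]
}

# Run all automated checks; returns non-zero if any of them failed.
run_validation() {
  failures=0
  check "service is active" systemctl is-active --quiet "$SERVICE"
  check "health endpoint responds" curl --fail --silent --max-time 5 "$HEALTH_URL"
  check "no recent error logs" no_recent_errors
  [ "$failures" -eq 0 ]
}
```

To use it, source the file, set SERVICE and HEALTH_URL for the affected service, and call run_validation. A non-zero return means at least one check failed and the incident should not be closed yet.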

Output

A postmortem summary (Markdown or shared doc) and a scheduled retrospective.