Incident: Crashed Services
Description
A system service (e.g., nginx, docker, sshd) has crashed or fails to start.
Symptoms
- systemctl status <service> shows failed or inactive
- Logs contain "core dumped" or "exit code" errors
- Service not accessible over expected port
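The log symptom above can be checked with a short script. This is a minimal sketch, not a definitive detector: has_crash_markers is a hypothetical helper name, the 10-minute window is an assumption, and the search strings are simply the two phrases listed in the symptoms.

```shell
#!/usr/bin/env bash
# Sketch: return success if recent journal entries for a unit contain
# crash markers. has_crash_markers is a hypothetical helper name.
has_crash_markers() {
    local svc="$1"
    # --no-pager keeps journalctl non-interactive so it can be piped.
    # grep -E: extended regex, -i: case-insensitive, -q: quiet (status only).
    journalctl -u "$svc" --since "10 minutes ago" --no-pager \
        | grep -Eiq 'core dumped|exit code'
}
```

For example, `has_crash_markers nginx && echo "crash markers found"` prints the message only when one of the phrases appears in the selected log window.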
Root Cause Checklist
- Recent configuration change?
- Port conflict or dependency failure?
- Missing or corrupted binaries?
- Permission errors?
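The port-conflict question in the checklist can be answered with ss. The sketch below assumes a reasonably recent iproute2 (for the -H flag and the "sport" filter); port_in_use is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: return success if something is already listening on a TCP port.
# port_in_use is a hypothetical helper name.
port_in_use() {
    local port="$1"
    # -H: no header, -l: listening sockets, -t: TCP, -n: numeric ports.
    # The filter restricts output to sockets whose source port matches.
    [ -n "$(ss -Hltn "sport = :${port}")" ]
}
```

For example, `port_in_use 80 && echo "port 80 is taken"`. To see which process holds the port, running ss with -p added (usually as root) or lsof -i :<port> is a common next step.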
Resolution Steps
- Check service status:
systemctl status <service>
- View logs:
journalctl -u <service> --since "10 minutes ago"
- Restart the service:
systemctl restart <service>
- Validate config if applicable (e.g., nginx):
nginx -t
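The steps above can be strung together as one function. This is only a sketch under stated assumptions: triage_service is a hypothetical helper name, nginx is the only service with a config check here, and, unlike the order listed above, the config check runs before the restart so that a known-bad config is not restarted.

```shell
#!/usr/bin/env bash
# Sketch of the resolution steps as a single function.
# triage_service is a hypothetical helper name.
triage_service() {
    local svc="$1"

    # Check service status. systemctl status exits non-zero when the unit
    # is not running, so the exit code is ignored here.
    systemctl status --no-pager "$svc" || true

    # View recent logs for the unit.
    journalctl -u "$svc" --since "10 minutes ago" --no-pager

    # Validate config where a checker exists (assumption: only nginx here).
    if [ "$svc" = "nginx" ]; then
        nginx -t || { echo "config invalid; not restarting $svc" >&2; return 1; }
    fi

    # Restart, then confirm the unit actually came back up.
    systemctl restart "$svc" && systemctl is-active --quiet "$svc"
}
```

Calling `triage_service nginx` returns non-zero if the config check fails or the service is not active after the restart.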
Preventive Actions
- Use monitoring and alerts (e.g., systemd watchdogs)
- Automate config testing in CI/CD
- Isolate services in containers when possible
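As one possible shape for automated config testing, a CI step could run the same nginx -t check against each config file and fail the job on any error. The sketch assumes nginx is installed in the CI image and that absolute paths are passed; validate_nginx_configs is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch of a CI step: fail if any nginx config fails its syntax check.
# validate_nginx_configs is a hypothetical helper name.
validate_nginx_configs() {
    local failed=0 conf
    for conf in "$@"; do
        # nginx -t tests the configuration; -c selects the file to test.
        if nginx -t -c "$conf"; then
            echo "OK: $conf"
        else
            echo "FAILED: $conf" >&2
            failed=$((failed + 1))
        fi
    done
    # A non-zero return makes the CI job fail when any config is invalid.
    [ "$failed" -eq 0 ]
}
```

A pipeline step might then call `validate_nginx_configs /etc/nginx/nginx.conf`, with the path adjusted to wherever the repository's config is checked out.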
Tools & Commands
systemctl, journalctl, ss, lsof, docker logs