Incident: Crashed Services
Description
A system service (e.g., nginx, docker, sshd) has crashed or fails to start.
Symptoms
- systemctl status <service> shows failed or inactive
- Logs contain "core dumped" or "exit code" errors
- Service not accessible over expected port
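The log symptom can be checked quickly by filtering recent journal output for the two markers named above. This is a minimal sketch: the function name crash_lines is hypothetical, and the patterns are taken literally from the list, so real services may word their failures differently.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the log lines read from stdin that mention
# either crash marker from the Symptoms list (case-insensitive).
# Exits non-zero when nothing matches, which is grep's normal behaviour.
crash_lines() {
  grep -Ei 'core dumped|exit code'
}
```

To use it, pipe the output of journalctl -u <service> --since "10 minutes ago" --no-pager into crash_lines.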
Root Cause Checklist
- Recent configuration change?
- Port conflict or dependency failure?
- Missing or corrupted binaries?
- Permission errors?
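For the port-conflict item, one way to confirm that something is already bound to the service's port is to scan the listening sockets reported by ss. The sketch below assumes the default column layout of ss -ltn, where the fourth column is the local address and port; port_is_listening is a hypothetical name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed if the given port shows up as a local
# listening port in `ss -ltn`-style output read from stdin.
port_is_listening() {
  local port="$1"
  # Column 4 is "Local Address:Port". Split on ":" and compare the last
  # piece, so IPv6 entries such as [::]:22 are handled as well.
  awk -v p="$port" '
    { n = split($4, a, ":"); if (a[n] == p) found = 1 }
    END { exit (found ? 0 : 1) }
  '
}
```

For example, ss -ltn | port_is_listening 80 succeeds when port 80 is taken. Running ss -ltnp or lsof -i :80 as root then shows which process holds it.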
Resolution Steps
- Check service status:
    systemctl status <service>
- View logs:
    journalctl -u <service> --since "10 minutes ago"
- Restart the service:
    systemctl restart <service>
- Validate config if applicable (e.g., nginx):
    nginx -t
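The steps above can be chained into one helper. This is a sketch, not a definitive procedure: recover_service and the CONFIG_TEST_CMD variable are hypothetical names. It also runs the config test before the restart, a deliberate reordering of the list, on the assumption that restarting with a known-bad config is rarely useful.

```shell
#!/usr/bin/env bash
# Hypothetical helper: walk through the resolution steps for one service.
# Set CONFIG_TEST_CMD (an assumed variable, e.g. "nginx -t") to enable the
# optional config check.
recover_service() {
  local service="$1"

  # 1. Status. A failed unit returns non-zero here, so do not abort on it.
  systemctl status "$service" --no-pager || true

  # 2. Recent logs for the unit.
  journalctl -u "$service" --since "10 minutes ago" --no-pager || true

  # 3. Optional config test; skip the restart if it fails.
  if [ -n "${CONFIG_TEST_CMD:-}" ]; then
    if ! $CONFIG_TEST_CMD; then
      echo "config test failed; not restarting $service" >&2
      return 1
    fi
  fi

  # 4. Restart, then confirm the unit is active.
  systemctl restart "$service" && systemctl is-active --quiet "$service"
}
```

A typical call would be CONFIG_TEST_CMD="nginx -t" followed by recover_service nginx, run as root or through sudo.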
Preventive Actions
- Use monitoring and alerts (e.g., systemd watchdogs)
- Automate config testing in CI/CD
- Isolate services in containers when possible
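Automated config testing can be as small as a CI step that runs the service's own validator over each changed file. The sketch below uses nginx as the example; validate_configs is a hypothetical name. It assumes nginx is installed on the CI runner and that each argument is an absolute path to a complete config file.

```shell
#!/usr/bin/env bash
# Hypothetical CI gate: test every nginx config file given as an argument
# and return non-zero if any of them fails validation.
validate_configs() {
  local failed=0 conf
  for conf in "$@"; do
    # nginx -t tests the configuration; -c selects the file to test.
    if nginx -t -c "$conf"; then
      echo "OK   $conf"
    else
      echo "FAIL $conf" >&2
      failed=1
    fi
  done
  return "$failed"
}
```

A pipeline step could call validate_configs with the candidate files and let the non-zero return code fail the job.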
Tools & Commands
systemctl, journalctl, ss, lsof, docker logs