
Incident: Crashed Services

Description

A system service (e.g., nginx, docker, sshd) has crashed or fails to start.

Symptoms

  • systemctl status <service> shows failed or inactive
  • Logs contain "core dumped" messages or non-zero exit codes
  • Service is not reachable on its expected port
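
The log symptom above can be checked quickly by filtering journal output. This is a minimal sketch: scan_crash_lines is a hypothetical helper name, and the pattern assumes the common systemd wording ("core dumped", "code=exited, status=1/FAILURE"), which may differ on your distribution.

```shell
# Print only the log lines that look like a crash or abnormal exit.
# Reads log text on stdin, so it works with journalctl output or a saved file.
# ASSUMPTION: the pattern follows common systemd wording; adjust it to your logs.
scan_crash_lines() {
    # status=[1-9] skips clean exits such as "status=0/SUCCESS".
    # "|| true" keeps the exit code at 0 when nothing matches.
    grep -E 'core dumped|code=(exited|killed|dumped), status=[1-9]' || true
}

# Example (needs a systemd host):
#   journalctl -u nginx --since "10 minutes ago" --no-pager | scan_crash_lines
```

Empty output means no line matched the pattern. It does not prove the service is healthy.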

Root Cause Checklist

  • Recent configuration change?
  • Port conflict or dependency failure?
  • Missing or corrupted binaries?
  • Permission errors?
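
For the port-conflict item, one way to see what already holds a port is to filter ss output. This is a sketch under assumptions: listeners_on_port is a hypothetical helper name, and it expects the column layout of "ss -ltnp" (TCP only, so the local address is the fourth column).

```shell
# Print listening sockets bound to the given port.
# Reads "ss -ltnp" output on stdin.
# ASSUMPTION: ss was run with -t only, so column 4 is "Local Address:Port".
listeners_on_port() {
    awk -v port="$1" '
        $1 == "LISTEN" {
            # Take the text after the last ":" so "[::]:22" and "0.0.0.0:22" both work.
            n = split($4, parts, ":")
            if (parts[n] == port) print
        }'
}

# Example (root is needed to see process names of other users):
#   ss -ltnp | listeners_on_port 80
```

The port is compared exactly, so looking for 80 does not match 8080. If a process other than the crashed service shows up, that is a likely cause of the start failure.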

Resolution Steps

  1. Check service status:
    systemctl status <service>
    
  2. View logs:
    journalctl -u <service> --since "10 minutes ago"
    
  3. Validate config if applicable (e.g., nginx):
    nginx -t

  4. Restart the service:
    systemctl restart <service>
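
The steps above can be combined into a small wrapper that runs the config check before restarting, since restarting with a broken config would only fail again. This is a sketch, not a standard tool: safe_restart and its return codes are hypothetical.

```shell
# Restart a service only if its config check passes, then confirm it is active.
# Usage: safe_restart SERVICE [VALIDATE_CMD ...]
# ASSUMPTION: safe_restart is an illustrative helper, not a systemd command.
safe_restart() {
    service="$1"
    shift
    # Run the optional validation command (e.g. "nginx -t") first.
    if [ "$#" -gt 0 ] && ! "$@"; then
        echo "config check failed; not restarting $service" >&2
        return 1
    fi
    systemctl restart "$service" || return 2
    # is-active --quiet exits 0 only when the unit is active.
    systemctl is-active --quiet "$service" || return 3
}

# Example (run as root):
#   safe_restart nginx nginx -t
```

A return code of 1 means the config check failed, 2 means the restart command failed, and 3 means the unit was not active after the restart.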
    

Preventive Actions

  • Use monitoring and alerts (e.g., systemd watchdogs)
  • Automate config testing in CI/CD
  • Isolate services in containers when possible
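
As one possible complement to monitoring, systemd can restart a failed service on its own through a drop-in file. The sketch below writes such a drop-in. The helper name write_restart_dropin, the file name restart.conf, and the 5-second delay are assumptions for illustration.

```shell
# Write a systemd drop-in that restarts the unit after a failure.
# Usage: write_restart_dropin SERVICE [BASE_DIR]
# ASSUMPTION: helper name, file name and RestartSec value are illustrative.
write_restart_dropin() {
    unit="$1"
    base="${2:-/etc/systemd/system}"
    dir="$base/$unit.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/restart.conf" <<'EOF'
[Service]
Restart=on-failure
RestartSec=5s
# WatchdogSec= is only useful if the service sends WATCHDOG=1 via sd_notify.
# WatchdogSec=30s
EOF
}

# Example (run as root, then reload unit files):
#   write_restart_dropin nginx && systemctl daemon-reload
```

Automatic restarts can hide a recurring crash, so they work best alongside alerts on restart counts rather than as a replacement for them.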

Tools & Commands

  • systemctl, journalctl, ss, lsof, docker logs