Use Bash to Monitor and Restart Crashed Services Automatically
Introduction
In production environments, even the most stable services can fail unexpectedly. Maintaining uptime often requires an automated solution that monitors critical processes and restarts them when they crash. In this article, we’ll build a robust Bash-based monitoring script that automatically detects and recovers failed services. Along the way, we’ll add useful features like logging, email alerts, and configurable monitoring intervals — things that can help keep your systems running smoothly without constant manual supervision.
1. Checking for Running Services with Bash
Our first step is to detect whether a service or process is currently running. Bash provides several tools for this — the most common being pgrep and systemctl status. Here’s a simple example that checks whether a service is running:
#!/bin/bash
SERVICE_NAME="nginx"
if pgrep -x "$SERVICE_NAME" &> /dev/null; then
    echo "$SERVICE_NAME is running."
else
    echo "$SERVICE_NAME is NOT running."
fi
This snippet uses pgrep to search for a process by name. The -x flag ensures it matches the exact process name, reducing false positives. This basic check can then be wrapped inside a routine that continually monitors the service.
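To reuse that check from a monitoring routine, it helps to wrap it in a small function. Below is a minimal sketch; is_running is an illustrative helper name, not something defined elsewhere in this article:

```shell
#!/bin/bash
# A small wrapper around the pgrep check, so the same test can be
# reused by a monitoring loop. is_running is an illustrative name.
is_running() {
    pgrep -x "$1" &> /dev/null
}

# Sketch of a continuous monitor (the interval is an arbitrary choice):
# while true; do
#     is_running "nginx" || echo "nginx is down"
#     sleep 10
# done
```

The function simply propagates pgrep's exit status, so it can be used directly in if statements or with the || and && operators.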
2. Automatically Restarting Services on Failure
Once a crashed service is detected, the next step is to restart it. For system services managed by systemd, you can easily restart them with systemctl. For standalone scripts or binaries, you can use a defined start command. Here’s an example:
#!/bin/bash
SERVICE_NAME="nginx"
LOG_FILE="/var/log/service_monitor.log"

if ! pgrep -x "$SERVICE_NAME" &> /dev/null; then
    echo "[$(date)] $SERVICE_NAME not running, restarting..." | tee -a "$LOG_FILE"
    if systemctl restart "$SERVICE_NAME"; then
        echo "[$(date)] Successfully restarted $SERVICE_NAME." | tee -a "$LOG_FILE"
    else
        echo "[$(date)] Failed to restart $SERVICE_NAME." | tee -a "$LOG_FILE"
    fi
fi
Here, we not only restart the service but also log our actions. The tee command writes to both the console and the log file, providing visibility during manual debugging. Note that the script needs sufficient privileges — typically root — both to run systemctl restart and to write under /var/log.
3. Scheduling and Automating the Monitor
The next logical step is to make the script run regularly without manual intervention. We can use the Linux cron scheduler for periodic execution. Adding an entry to your cron table makes it effortless:
*/5 * * * * /usr/local/bin/service_monitor.sh
This runs the script every five minutes. Adjust the frequency based on how critical the service is. Running the check every minute may be necessary for critical APIs, but less frequent checks might suffice for background services to avoid excessive load.
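If sub-minute checks are needed, or cron is unavailable, the same logic can run as a long-lived loop with a configurable sleep. This is a sketch only; INTERVAL and run_monitor are illustrative names, and the iteration limit exists purely to keep the example finite:

```shell
#!/bin/bash
# Daemon-style alternative to cron: poll in a loop with a configurable
# interval. INTERVAL and run_monitor are illustrative names.
INTERVAL="${INTERVAL:-60}"

run_monitor() {
    local max_checks="$1"   # bounds the loop for demonstration; drop in real use
    local i
    for (( i = 1; i <= max_checks; i++ )); do
        # Replace this echo with the check-and-restart logic from section 2.
        echo "check $i"
        sleep "$INTERVAL"
    done
}
```

A loop like this would normally run under a process supervisor itself (or as a systemd service), so that the monitor is also restarted if it dies.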
4. Adding Email Alerts for Crash Events
Logging is great, but proactive alerting ensures you know about a problem even if the service restarts automatically. By using mail or sendmail, we can notify administrators when a restart happens:
#!/bin/bash
SERVICE_NAME="nginx"
ADMIN_EMAIL="admin@example.com"

if ! pgrep -x "$SERVICE_NAME" &> /dev/null; then
    echo "[$(date)] $SERVICE_NAME stopped. Restarting..." | mail -s "$SERVICE_NAME crash alert" "$ADMIN_EMAIL"
    systemctl restart "$SERVICE_NAME"
fi
Including notifications helps maintain awareness of underlying issues — such as repeated restarts due to ongoing configuration or resource problems — instead of silently masking them with automation.
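To keep the automation from silently masking a crash loop, one option is to cap consecutive restarts. The sketch below tracks attempts in a state file; COUNT_FILE, MAX_RESTARTS, and should_restart are illustrative names not used elsewhere in this article:

```shell
#!/bin/bash
# Throttle restarts: track attempts in a state file and refuse to restart
# once a cap is reached, so repeated failures surface to the administrator.
COUNT_FILE="${COUNT_FILE:-/tmp/service_restart_count}"
MAX_RESTARTS=3

should_restart() {
    local count
    count=$(cat "$COUNT_FILE" 2>/dev/null)
    count=${count:-0}                      # no file yet means zero attempts
    if [ "$count" -ge "$MAX_RESTARTS" ]; then
        return 1                           # cap reached: stop restarting
    fi
    echo $(( count + 1 )) > "$COUNT_FILE"
}

# Reset the counter (manually or from cron) once the service is stable:
# rm -f "$COUNT_FILE"
```

Calling should_restart before systemctl restart means the script sends alerts indefinitely but only attempts a bounded number of automatic recoveries.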
5. Enhancing Reliability with Configuration and Logging
To make this monitor more flexible, we can use a configuration file containing service definitions and respective restart commands. Here’s an example using a list of monitored services:
# /etc/service_monitor.conf
docker systemctl restart docker
nginx systemctl restart nginx
redis-server systemctl restart redis-server
We can modify our script to loop through this configuration file:
#!/bin/bash
CONFIG_FILE="/etc/service_monitor.conf"
LOG_FILE="/var/log/service_monitor.log"

while read -r SERVICE RESTART_CMD; do
    # Skip blank lines and comments in the config file.
    case "$SERVICE" in ""|\#*) continue ;; esac
    if ! pgrep -x "$SERVICE" &> /dev/null; then
        echo "[$(date)] $SERVICE crashed. Restarting..." | tee -a "$LOG_FILE"
        eval "$RESTART_CMD"
    fi
done < "$CONFIG_FILE"
This multi-service approach scales well and lets you define restart behaviors for different types of processes. The use of eval allows dynamic command execution, though it’s important to validate the configuration file for security reasons.
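One lightweight way to validate entries before passing them to eval (or to drop eval entirely and run the split command directly) is to reject lines containing shell metacharacters. validate_cmd is a hypothetical helper shown only as a sketch; a stricter design would whitelist exact commands instead:

```shell
#!/bin/bash
# Reject restart commands containing shell metacharacters before
# executing them. validate_cmd is an illustrative helper name.
validate_cmd() {
    ! printf '%s' "$1" | grep -q '[;&|`$()<>]'
}

# Usage inside the monitoring loop:
# if validate_cmd "$RESTART_CMD"; then
#     $RESTART_CMD      # word-splitting runs it without eval
# else
#     echo "Refusing suspicious command: $RESTART_CMD" >&2
# fi
```

Since the configuration format only needs simple commands like systemctl restart nginx, rejecting anything with pipes, substitutions, or redirections costs nothing in flexibility.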
Conclusion
Using Bash to automatically monitor and restart crashed services provides a simple yet effective form of resilience. By combining process checks, restart logic, logging, and alerts, you can achieve near-continuous uptime without complex infrastructure tools. For small- to medium-scale systems, this lightweight automation is a huge reliability boost. As you scale up, consider combining this approach with system monitoring frameworks like systemd watchdogs or tools like Monit or Supervisor for even greater robustness.
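As a concrete illustration of the systemd route mentioned above, systemd can restart failing services natively through a drop-in override — no external monitor required. The fragment below is a sketch; the path and values are illustrative:

```ini
# /etc/systemd/system/nginx.service.d/override.conf
[Service]
Restart=on-failure
RestartSec=5s
```

After adding the override, run systemctl daemon-reload so systemd picks up the change. The Bash monitor in this article still adds value on top of this: custom alerting, multi-service configuration, and checks for processes not managed by systemd.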