Watchdog script to keep an application running

I currently use an application called MxEasy on my Linux servers to display video from a couple of IP cameras. The software is rather buggy and crashes occasionally. I wrote a script that checks whether the application is running and, if it's not, launches it.

I've tried adding this line to my crontab to run the script. It runs the script but does not launch MxEasy. Is there anything I'm overlooking?

0,15,30,45,50 * * * * root  export DISPLAY=:0 && /etc/cron.hourly/MxEasyCheck.sh

BTW Ubuntu Server 12.04 is the OS

Here is MxEasyCheck.sh

MXEASY=$(ps -A | grep -w MxEasy)

if ! [ -n "$MXEASY" ] ; then
    /home/emuser/bin/MxEasy/startMxEasy.sh &
    exit
fi

#!/bin/bash
    #
    # watchdog
    #
    # Run as a cron job to keep an eye on what_to_monitor which should always
    # be running. Restart what_to_monitor and send notification as needed.
    #
    # This needs to be run as root or a user that can start system services.
    #
    # Revisions: 0.1 (20100506), 0.2 (20100507)

    NAME=sample_service
    NAME2=sample_service2
    START=/usr/sbin/$NAME
    START2=/usr/sbin/$NAME2
    NOTIFY=joe@gmail.com
    NOTIFYCC=jim@mail.com
    GREP=/bin/grep
    PS=/bin/ps
    NOP=/bin/true
    DATE=/bin/date
    # MAIL=/bin/mail
    RM=/bin/rm

    $PS -ef|$GREP -v grep|$GREP $NAME >/dev/null 2>&1
    case "$?" in
     0)
     # It is running in this case so we do nothing.
      echo "$NAME is RUNNING OK. Relax."

     $NOP
     ;;
     1)
     echo "$NAME is NOT RUNNING. Starting $NAME and sending notices."
     $START >/dev/null 2>&1 &
     NOTICE=/tmp/watchdog.txt
     echo "$NAME was not running and was started on `$DATE`" > $NOTICE
     # $MAIL -n -s "watchdog notice" -c $NOTIFYCC $NOTIFY < $NOTICE
     $RM -f $NOTICE
     ;;
    esac

     # GT06
    $PS -ef|$GREP -v grep|$GREP $NAME2 >/dev/null 2>&1
    case "$?" in
     0)
     # It is running in this case so we do nothing.
      echo "$NAME2 is RUNNING OK. Relax."

     $NOP
     ;;
     1)
     echo "$NAME2 is NOT RUNNING. Starting $NAME2 and sending notices."
     $START2 >/dev/null 2>&1 &
     NOTICE=/tmp/watchdog.txt
     echo "$NAME2 was not running and was started on `$DATE`" > $NOTICE
     # $MAIL -n -s "watchdog notice" -c $NOTIFYCC $NOTIFY < $NOTICE
     $RM -f $NOTICE
     ;;
    esac

    exit
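
The two case blocks above differ only in the service name, so the check can be folded into one function. The following is a minimal sketch under that assumption, not the original author's code: is_running and ensure_running are hypothetical helper names, and pgrep -x (exact process-name match) stands in for the ps | grep pipeline.

```shell
#!/bin/sh
# Sketch only: is_running and ensure_running are hypothetical helper names.

# Succeeds when a process whose name is exactly "$1" exists.
# Note: the kernel truncates process names to 15 characters.
is_running() {
    pgrep -x "$1" >/dev/null 2>&1
}

# ensure_running NAME COMMAND [ARGS...]
# Starts COMMAND in the background when no process called NAME is found.
ensure_running() {
    name=$1
    shift
    if is_running "$name"; then
        echo "$name is RUNNING OK."
    else
        echo "$name is NOT RUNNING. Starting $name."
        # Discard both stdout and stderr of the started command.
        "$@" >/dev/null 2>&1 &
    fi
}
```

A cron-driven script could then end with one line per service, for example: ensure_running sample_service /usr/sbin/sample_service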

Watchdog Script for the Linux Processes (Asterisk)

1. Create a script file. Let's say astSvcControl.sh at /root

#!/bin/bash
#script file: astSvcControl.sh

PROCESS="asterisk"
PROCCHK=$(ps aux|grep -c $PROCESS)
if [ $PROCCHK -eq 1 ]
then
/usr/sbin/asterisk
echo "Started Asterisk Service at $(date)" >> /var/log/asterisk/asterisk-check.log
else
echo "$PROCESS is running $PROCCHK processes" >> /var/log/asterisk/asterisk-check.log
fi

[Note: On SuSE Linux, ps aux|grep -c < process name > returns 1 if the process is not running and 2 or more if it is running.
This is because the grep < process name > command itself appears in the process list, so the count is always at least 1.
On some other flavors of Linux, it returns 0 if the process is not running and 1 or more if it is running.
]
2. Make sure that script is executable

#chmod 770 astSvcControl.sh

3. Run the script periodically using cronjobs

#crontab -e
* * * * * bash /root/astSvcControl.sh

(run every minute)

If you want to run the script every 5 minutes, do this:

#crontab -e
*/5 * * * * bash /root/astSvcControl.sh
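
The note in step 1 comes down to whether grep's own entry shows up in the ps output. Below is a minimal sketch of one way to make the count independent of that, assuming a hypothetical helper named count_procs that reads ps output on standard input. With it, a count of 0 always means "not running", whatever the distribution.

```shell
#!/bin/sh
# Sketch only: count_procs is a hypothetical helper name.

# count_procs PATTERN
# Reads ps-style lines on stdin and prints how many mention PATTERN,
# ignoring any line that contains the word "grep" (which also hides
# real processes with "grep" in their command line).
count_procs() {
    # grep -c prints 0 but exits non-zero when nothing matches,
    # so the exit status is deliberately ignored.
    grep -v grep | grep -c -- "$1" || true
}
```

In the script from step 1 this would be used as PROCCHK=$(ps aux | count_procs "$PROCESS"), with the test changed to [ "$PROCCHK" -eq 0 ].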

Recently at work I was given a feature to support the customization and installation of OpenPegasus CIMOM (CIM Server) on Linux machines in binary mode. What this means is that instead of building from source code on the Linux machines (as would be the sane thing to do in view of the huge compatibility issues), it was decided to create the binaries on my development box, and then bundle only the required portions as part of an installation script. The main reason for this was the fact that we had a dependency on an external CIM Provider (QLogic), who obviously provided us only with the binaries built on a base Linux machine (specifically, RHEL 5.8).

There were many interesting problems that arose due to library dependencies, OS/ABI incompatibilities, and GCC/GLIBC dependencies. I also learned a lot about the whole process of working with third-party vendors. I plan to cover all of them in a series of upcoming blog posts. For now, however, I would like to post some useful information about how I helped the installer team enhance their installation scripts by creating a service and a service watchdog for the OpenPegasus CIMOM bundled with the QLogic provider. For representative purposes, I will use the term “My Service” to refer to the hypothetical service. I will also provide the main logic of the relevant scripts that I wrote for the purpose without violating any NDA restrictions of my workplace! So let’s get right on to it then.

Creating a service in Linux using a shell script

Creating a service in Linux is a pretty simple task. You really just add execution privileges to the shell script, drop it into the /etc/init.d folder, and then invoke a series of commands. The code for the service that installs the OpenPegasus (version 2.11.0 used) CIMOM with the bundled QLogic CIM Provider binaries is listed as follows:

#!/bin/sh
 
# chkconfig: 2345 55 10
# description:My service
# processname:myservice
 
usage() {
        echo "service myservice {start|stop|status}"
        exit 0
}
 
export PEGASUS_ROOT=/opt/pegasus2.11.0
export PEGASUS_HOME=$PEGASUS_ROOT
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:$PEGASUS_ROOT/lib
export PEGASUS_PLATFORM=LINUX_IX86_GNU
export PATH=$PATH:$PEGASUS_ROOT/bin
export PEGASUS_HAS_SSL=yes
 
case $1 in
 
    start) $PEGASUS_ROOT/bin/cimserver
        ;;
    stop) $PEGASUS_ROOT/bin/cimserver -s
        ;;
    status) if pidof $PEGASUS_ROOT/bin/cimserver >/dev/null; then
            echo "Running"
        else
            echo "Not running"
        fi
        ;;
    *) usage
        ;;
esac

Explanation:

We start off with the usual shebang followed by the path to the “sh” executable (#!/bin/sh). The following lines are quite interesting, and worth explaining in a bit more detail. The # chkconfig: 2345 55 10 line informs the OS that we want this service script to be activated for Linux Run Levels 2, 3, 4, and 5. Check out “Linux Run Levels” for more information on Run Levels in Linux. The parameter 55 is the start priority, which fixes the service’s place in the startup order (lower numbers start earlier, so we usually want a moderately high value), while the last parameter 10 is the stop priority, which orders shutdown in the same way (this can be a moderately low value). The specific values will depend on your service’s usage patterns. The # description line is optional, and is used to give a descriptive name to the service. The # processname line is the name that you will use for your service, and is usually the same as your script name.

The rest of the logic is pretty simple: I want to support three options – start, stop, and status. For this purpose, I export the relevant environment variables in this script itself so that it does not pollute any other namespace (you could export them in ~/.profile, or ~/.bash_profile, or ~/.bashrc for instance if you want them to be globally available). Then I merely put the logic to start/stop/query the cimserver executable, which is the executable that actually represents the OpenPegasus CIMOM. The core logic of this service script is the command pidof $PEGASUS_ROOT/bin/cimserver, which returns the PID of the specified executable in the current environment.
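
Because pidof prints every matching PID, a status check is more robust if it relies on pidof's exit status rather than on its output. The following is a minimal sketch along those lines, not the script that was shipped: report_status is a hypothetical helper name, and the return value 3 follows the LSB convention for "program is not running", which lets other scripts test the result without parsing text.

```shell
#!/bin/sh
# Sketch only: report_status is a hypothetical helper name.

# report_status EXECUTABLE
# Prints a human-readable state and returns 0 when running,
# 3 (LSB: program is not running) otherwise.
report_status() {
    if pidof "$1" >/dev/null 2>&1; then
        echo "Running"
        return 0
    fi
    echo "Not running"
    return 3
}
```

In the service script, the status branch would then reduce to a call such as report_status "$PEGASUS_ROOT/bin/cimserver".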

To install this script as a service, the following commands are performed:

#cp myservice /etc/init.d
#chmod +x /etc/init.d/myservice
#chkconfig --add myservice
#chkconfig --level 2345 myservice on

The #chkconfig --add myservice command is what actually adds your script as a Linux service. For this, the script must be executable (chmod +x might be too permissive; feel free to choose a more restrictive permission), and must be present in /etc/init.d (or at least soft-linked from that directory). Then, finally, the #chkconfig --level 2345 myservice on command makes your service start automatically at system boot-up. This ensures that your service is always on so long as your Linux box is up. Neat!

But what happens if the service crashes while the machine is still up? It certainly will not restart itself. For this purpose, I decided to add a service watchdog for “myservice”, as shown in the following section.

Creating a service watchdog in Linux using a shell script

The service watchdog’s responsibility is to monitor the main service (say, every minute or so), check its status, and then restart it if it is not running. This ensures a maximum downtime of a minute (or whatever value you chose) for your service. It is quite a nifty feature indeed. This is similar to the scenario where, in Windows, you would set the service properties to “Automatically Restart”. The code for the watchdog for “myservice” is given below:

#!/bin/sh
 
#chkconfig: 2345 90 10
#description: watchdog for myservice
#processname: myservice-watchdog
 
MYSERVICE_PID=`pidof /opt/pegasus2.11.0/bin/cimserver`
 
check_myservice() {
        if [ -z "$MYSERVICE_PID" ];then
                service myservice start
        fi
}
 
check_myservice
 
usage() {
    echo "myservice-watchdog {start|stop|status}"
    exit 0
}
 
case $1 in
    start ) if [ -z "$MYSERVICE_PID" ];then
        service myservice start
        else
            echo "myservice is already running"
        fi
        ;;
    stop ) if [ -n "$MYSERVICE_PID" ];then
        service myservice stop
        else
            echo "myservice is already stopped"
        fi
        ;;
    status) if [ -z "$MYSERVICE_PID" ];then
            echo "myservice is not running"
        else
            echo "myservice is running"
        fi
        ;;
    *) usage
        ;;
esac

Explanation:

The logic for the watchdog might seem curiously similar to that of the service itself, and that is right. There were a number of reasons why I chose this approach:

The idea is to always monitor the state of the executable itself, and not the service. This ensures that if, for some reason, the service script returns spurious data, the watchdog can avoid spawning multiple instances of the executable, which would most likely fail anyway.
The watchdog is also installed as a service. This is not usually required, but in this case it needs to support the following options: start, stop, and status. In addition, the check_myservice function is the one that is used to monitor the service itself (actually the executable).
The watchdog is triggered to be run every minute using crontab. This will only run the check_myservice function, whereas any direct invocation of the watchdog will have to supply any one of the following options: start/stop/status.
The idea is to always handle the executable indirectly via the watchdog (start/stop/status) rather than directly through the service itself, even if that is also possible. This is more of a best practice than a strict requirement.
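
One way to arrange the points above is sketched here, as an illustration rather than the script that was shipped. It assumes hypothetical names (CIMSERVER, service_pid, watchdog), looks the PID up on every call instead of once at script start, and treats an invocation without arguments, which is what cron performs, as a silent check.

```shell
#!/bin/sh
# Sketch only: CIMSERVER, service_pid and watchdog are hypothetical names.

CIMSERVER=${CIMSERVER:-/opt/pegasus2.11.0/bin/cimserver}

# Prints the PID(s) of the executable, or nothing when it is not running.
service_pid() {
    pidof "$CIMSERVER" 2>/dev/null
}

# watchdog [start|stop|status]
# With no argument (the cron case) it only restarts a dead service.
watchdog() {
    pid=$(service_pid)
    case "${1:-check}" in
        check)
            [ -n "$pid" ] || service myservice start
            ;;
        start)
            if [ -z "$pid" ]; then service myservice start
            else echo "myservice is already running"; fi
            ;;
        stop)
            if [ -n "$pid" ]; then service myservice stop
            else echo "myservice is already stopped"; fi
            ;;
        status)
            if [ -n "$pid" ]; then echo "myservice is running"
            else echo "myservice is not running"; fi
            ;;
        *)
            echo "myservice-watchdog {start|stop|status}"
            return 1
            ;;
    esac
}
```

A script built this way would end with the single line watchdog "$@", so that both cron and a direct invocation go through the same function.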

The watchdog is installed as a service using the following commands:

#cp myservice-watchdog /etc/init.d
#chmod +x /etc/init.d/myservice-watchdog
#chkconfig --add myservice-watchdog
#chkconfig --level 2345 myservice-watchdog on

The explanation for the steps is the same as that for the installation of the main service itself. Note that the watchdog is likewise installed as a daemon.

Then we need to create a cron job that will trigger the check_myservice function of the watchdog every minute. For this, the best option (since we are triggering the whole process through an installation script) is to create a cron job in a text file, place that file in the /etc/cron.d directory (where additional cron job files can be placed), and then restart the crond daemon process to make the new cron job visible to the OS, as follows:

#echo "* * * * * root /etc/init.d/myservice-watchdog" > my.cron
#echo "" >> my.cron
#cp my.cron /etc/cron.d
#service crond restart

And that’s it! The most important bit to remember here is that the #echo "" >> my.cron line is required because of a quirk in the way cron behaves: it expects a newline after the last cron job in the file. If it is missing, cron will not fail or throw an error, but will silently skip the job! Trust me, this is mental agony that you definitely do not want to experience. The cron job itself is pretty simple: call the watchdog every minute (read up on the syntax and semantics of cron jobs in Linux if you are confused by that line).
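
Since the missing newline is so easy to overlook, an installer can generate the file with printf, which makes the line ending explicit. This is a minimal sketch, assuming a hypothetical helper named write_watchdog_cron; the root user field is also an assumption, included because files in /etc/cron.d name the user to run as before the command.

```shell
#!/bin/sh
# Sketch only: write_watchdog_cron is a hypothetical helper name.

# write_watchdog_cron FILE
# Writes a one-line cron.d fragment to FILE.
# The "root" user field is an assumption for this sketch.
write_watchdog_cron() {
    # printf '%s\n' always terminates the entry with a newline.
    printf '%s\n' '* * * * * root /etc/init.d/myservice-watchdog' > "$1"
}
```

The installer would call write_watchdog_cron my.cron and then copy my.cron to /etc/cron.d as before.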

I hope that this serves a useful purpose for anyone that is planning to explore creating services and watchdogs using shell scripts in Linux.

I was looking for a script to monitor a process and restart it if it failed on Linux, so here is a watch dog script that I found on SO, here: http://stackoverflow.com/a/16787862/4028210

The script will watch a process and restart it if it is not running.

The script

#!/bin/sh

PROCESS="$1"
PROCANDARGS="<YOUR PROCESS AND ARGS>"

while :
do
    RESULT=`pgrep ${PROCESS}`
    if [ "${RESULT:-null}" = null ]; then
        echo "${PROCESS} not running, starting ${PROCANDARGS}"
        $PROCANDARGS &
    else
         echo "running"
    fi
    sleep 10
done

Usage

Let’s say the script is named “wdt.sh”

make sure it is executable:

chmod +x ./wdt.sh

As an example we’ll keep gedit running, run it as below:

./wdt.sh gedit
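
The script above polls every 10 seconds. A different technique, sketched here as an assumption rather than as part of the original answer, is to run the process in the foreground and restart it as soon as it exits, so no polling is needed. supervise, max_runs, and delay are hypothetical names, and the run limit exists only so that the sketch terminates.

```shell
#!/bin/sh
# Sketch only: supervise and its parameters are hypothetical names.

# supervise MAX_RUNS DELAY COMMAND [ARGS...]
# Runs COMMAND, and each time it exits starts it again after DELAY
# seconds, up to MAX_RUNS runs in total.
supervise() {
    max_runs=$1
    delay=$2
    shift 2
    runs=0
    while [ "$runs" -lt "$max_runs" ]; do
        "$@"                # blocks until the process exits
        status=$?
        runs=$((runs + 1))
        echo "run $runs exited with status $status"
        if [ "$runs" -lt "$max_runs" ]; then
            sleep "$delay"
        fi
    done
    return 0
}
```

For the gedit example this could be invoked as supervise 1000 10 gedit, which assumes gedit stays in the foreground until it is closed.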