Viagra: Keeping Services Running on BSD
James O'Gorman
              Services must stay up. To make this happen, the system must be 
              set up and maintained correctly. Otherwise, daemons will die on 
              you. When that happens, your users will not be happy because they 
              will not get the expected services. So the question is: "How 
              do I keep my services up?" You may believe that if you build 
              the server correctly and run bug-free code, then bad things will 
              not happen. 
              However, bad things do happen. You can spend a lot of money on 
              good hardware that may still fail. For instance, what if there is 
              a previously unknown bug in your SMTP server, and some strange 
              client connects to it, sends a date string that is too long, and 
              the smtpd dies? What if you make a change to Apache and, instead 
              of sending it a -HUP, you kill it without realizing you've 
              done so? No matter how long you have been working with UNIX, you 
              are still human and mistakes happen. Even the best administrator 
              will have daemons die once in a while. 
              There are two common ways to deal with these problems. One way 
              is that someone lets you know when a service dies, then you fix 
              the problem and hope that it does not happen again. But what if 
              you are unavailable when the service dies? 
              The other thing you can try is to make the service restart itself 
              and then notify you. Your users then will not experience downtime, 
              you know what happened, and you can deal with the problem. 
              A while ago, I was looking for a way to keep services up and running 
              on my servers. I wanted a tool that would be small, easy to administer, 
              and give me good troubleshooting tools to track down the source 
              of any problem that may arise. 
              I checked different Web sites and found a few programs that looked 
              promising -- the best of which was Daemon Tools (http://cr.yp.to/daemontools.html). 
              After looking through the documentation, I decided that I did not 
              need the amount and type of features that Daemon Tools provided. 
              Furthermore, for the number of machines I wanted to use it on, the 
              setup time was going to take me a while. Although it did not fit 
              my needs, many people use it with great success. If you are looking 
              for a nice service-watching utility, compare the script I discuss 
              here with Daemon Tools, then choose the one that best fits your 
              needs. If you find that you need to do advanced things (e.g., easily 
              pause services, get their status, or have log file monitoring), 
              Daemon Tools may be your only option. 
              After I decided against using Daemon Tools, I found a promising 
              script on UGU (http://www.ugu.com). I tried to find the URL 
              to the original script, but was unsuccessful. The original script 
              was a csh script written for Sys V, and it was not very flexible. 
              I modified it to work with BSD and also altered 
              the way services are restarted. I named my modified copy Viagra, 
              because it keeps the services up and at the ready. A few months 
              after that, I bought the book Unix Hints and Hacks by Kirk 
              Waingrow and saw the original version in there. Because I had always 
              meant to make my version available to others, I figured now is as 
              good a time as any (see Listing 1). Let's take a quick look 
              at the script: 
              
             
#!/bin/csh
foreach DAEMON ( inetd apache )
        ps -cax | fgrep "$DAEMON:t" | cut -c27-80 > /dev/null
        if ( $status > 0) then
                echo "Restarting $DAEMON"
                date
                /root/scripts/start/$DAEMON &
        endif
end
              The Script
              I will walk through the script to give you an idea of how it works 
              and how to use it.
              The first line after the #!/bin/csh line:
              
             
foreach DAEMON ( inetd apache )
sets up the loop for the script. In between the parentheses, 
            insert the name of each service you want to watch as it would appear 
            in a ps listing; on each pass, DAEMON holds one of those names. Only 
            services that you want to monitor should be placed in here:  
             
ps -cax | fgrep "$DAEMON:t" | cut -c27-80 > /dev/null
Next, we do a process listing of the machine, ps -cax, pipe 
            the output of that to an fgrep statement that searches for the 
            service's name, fgrep "$DAEMON:t", then pipe that to a 
            cut statement. The cut statement keeps only columns 27 
            through 80 of each line, because in the ps listing, column 27 is where 
            the names of the daemons first appear. We are not interested in anything 
            that comes before that. The output from all of this is redirected to /dev/null, 
            because we are not really interested in what it returns, just its 
            exit status:  
             
if ( $status > 0) then
    echo "Restarting $DAEMON"
    date
Once we know whether the service is running, we have to act. The if 
            statement checks the exit status of the fgrep command. 
            If the exit status is 0, the service was found, the condition does not 
            match, and the script moves on. If the exit status is greater than 0, 
            the service was not found, so we echo a statement that tells which 
            daemon is restarting and print the date so we know when this happened.
              When the script runs from root's crontab, any output from cron gets 
              emailed to the owner of the crontab. If Apache has died, for example, 
              root will receive an email containing "Restarting apache" with the 
              date and time, and Apache will be restarted.
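
              One caveat with a plain fgrep is that it matches substrings, so a 
              search for inetd would also be satisfied by a process named xinetd. 
              The following is a minimal sh sketch of a stricter check, not part of 
              the original script. It assumes the command name is the last field of 
              each line, as it normally is in ps -cax output on BSD, and the 
              function name daemon_listed is made up for illustration.

```sh
#!/bin/sh
# Hypothetical helper (not in the original Viagra script).
# Reads a process listing on standard input and exits 0 only if the
# last field of some line is exactly the given name.
daemon_listed() {
    awk -v name="$1" '$NF == name { found = 1 } END { exit !found }'
}

# Example use, assuming BSD-style ps:
#   if ! ps -cax | daemon_listed inetd; then
#       echo "Restarting inetd"
#   fi
```

              As in the csh version, only the exit status matters; the helper 
              prints nothing.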
              Restarting the Service
              
             
/root/scripts/start/$DAEMON &
Once the previous steps are completed, we must restart the service. 
            To do that, we execute another script. I have Viagra set to execute a 
            script that is stored in /root/scripts/start/ and is named 
            after the daemon that needs to be restarted. I think this gives 
            us a lot of room in what we want to do next.
              For instance, I normally start Apache on 
              boot using apachectl. To keep this simple, we could place 
              a file in /root/scripts/start/ called apache, then 
              place just a couple of lines into the file. We could make those 
              lines just: 
              
             
#!/bin/sh
/usr/local/sbin/apachectl start
Then, when that script is executed, apachectl will be run 
            with the start command, just as it is when the system boots.
              Let's say you have been having a problem with Apache: 
              it keeps dying on you and you do not know why. We could use the 
              script that restarts Apache to do a few other things. For instance, 
              do a ps aux to get a snapshot of what processes are running 
              before you restart Apache. Perhaps a w to see who 
              is logged in and what they are doing. You could also play around 
              with vmstat to see what memory usage looks like at 
              that time, or send an email to your pager to let you know your box 
              is having problems. This could be a great troubleshooting tool for 
              your servers. 
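
              A restart script along those lines might look like the sketch below. 
              It is only an illustration: the function name capture_snapshot, the 
              log format, and the log file path in the usage comment are assumptions, 
              not something from the original script. Tools that are not installed 
              are simply skipped.

```sh
#!/bin/sh
# Hypothetical restart helper; names and log format are assumptions.

# Append a snapshot of the system to the file named in $2, labelled
# with the daemon name in $1.
capture_snapshot() {
    daemon=$1
    logfile=$2
    {
        echo "=== snapshot for $daemon at $(date) ==="
        for tool in "ps aux" "w" "vmstat"; do
            # ${tool%% *} strips any arguments, leaving the command name
            if command -v "${tool%% *}" >/dev/null 2>&1; then
                echo "--- $tool ---"
                $tool
            fi
        done
    } >> "$logfile" 2>&1
}

# Example use inside /root/scripts/start/apache (log path is made up):
#   capture_snapshot apache /var/log/apache-restarts.log
#   /usr/local/sbin/apachectl start
```

              Writing the snapshot to a file keeps the cron email short; the same 
              output could instead be left on standard output so that cron mails it 
              along with the restart notice.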
              You could also change the way Viagra runs the scripts. For instance, 
              if there are a lot of scripts you run in /usr/local/etc/rc.d/ 
              on boot, and you want to use those to restart your services, you 
              just change the line /root/scripts/start/$DAEMON & to /root/scripts/start/$DAEMON 
              start &. Then, make symlinks from the scripts in /usr/local/etc/rc.d 
              to your /root/scripts/start directory. For instance, if 
              you look at Apache and in /usr/local/etc/rc.d there is a 
              file called apache.sh, you could symlink that script to /root/scripts/start/apache. 
              Then, if you want to change the way Apache starts up, you only have 
              to make changes in one place. I prefer not to do this because, if 
              a service dies, I like to restart the daemon in a different manner 
              (e.g., to get a process listing mailed to me as well). 
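
              If you do go the symlink route, creating the links can be scripted. 
              The helper below is a sketch under assumptions: link_start_script is 
              a made-up name, and it takes the rc.d script, the start directory, and 
              the daemon name as arguments so nothing is hard-coded.

```sh
#!/bin/sh
# Hypothetical helper: link an rc.d script into the start directory
# under the daemon's name.
# Arguments: rc script path, start directory, daemon name.
link_start_script() {
    rc_script=$1
    start_dir=$2
    daemon=$3
    if [ ! -x "$rc_script" ]; then
        echo "not executable: $rc_script" >&2
        return 1
    fi
    mkdir -p "$start_dir" || return 1
    # -f replaces an existing link so the command can be re-run safely
    ln -sf "$rc_script" "$start_dir/$daemon"
}

# Example use with the paths from the text:
#   link_start_script /usr/local/etc/rc.d/apache.sh /root/scripts/start apache
```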
              From there, the script loops back and repeats these steps for any 
              other daemons you have defined. Once it has run through 
              them all, the script ends. 
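
              For readers who prefer the Bourne shell to csh, the same flow can be 
              sketched in sh. This is an illustrative port, not the author's script: 
              the function name watch_daemons and the START_DIR and PS_CMD variables 
              are assumptions, introduced so the restart directory and the listing 
              command can be overridden. It uses grep -F, the POSIX spelling of fgrep.

```sh
#!/bin/sh
# Illustrative sh port of the csh loop; the names below are assumptions.
START_DIR=${START_DIR:-/root/scripts/start}   # where restart scripts live
PS_CMD=${PS_CMD:-"ps -cax"}                   # command that lists processes

watch_daemons() {
    for daemon in "$@"; do
        # grep -F exits non-zero when no line of the listing contains the name
        if ! $PS_CMD | grep -F "$daemon" > /dev/null; then
            echo "Restarting $daemon"
            date
            # Run the per-daemon restart script in the background
            "$START_DIR/$daemon" &
        fi
    done
}

# Example use:
#   watch_daemons inetd apache
```

              In sh, the if tests the exit status of the last command in the 
              pipeline, which here is the grep itself.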
              When to Run? 
              After you have the script running the way you want, and watching 
              the daemons that you want it to watch, you must automate the running 
              of the script. In root's crontab, I have */10 * * * * /path/to/viagra. 
              This sets Viagra to run every ten minutes. Depending on your servers, 
              you may want it to run more or less often. Simply change the /10 
              to /5 if you want to run the script every five minutes, and 
              so on. 
              There is a lot you can do with this script. It may not be perfect, 
              but it has worked great for me. Feel free to look at it, poke it, 
              prod it, and change it. Use it if you like it, expand on it if you 
              like, or ignore it forever. Just make sure you have some way of 
              keeping your services up. 
              Jim O'Gorman lives in Lincoln, Nebraska with his wife, 
              son, and soon-to-be second child. He works for iPlanet E-Commerce 
              Solutions (a Sun-Netscape Alliance).