| checkcron: Checking for the Unexpected
 
Steven G. Isaacson 
Most system administrators use daemons like cron and
sendmail 
to help keep their systems running smoothly. Daemons
run unobtrusively 
in the background, starting up as needed to do their
work and then 
going back to sleep until more work needs to be done. 
Running in the background is good because for the most
part you don't 
need to see what is happening. But running in the background
also 
means that nothing obvious happens when things go wrong. 
What Could Go Wrong? 
On our main development system we use cron to bundle
up source 
code and then transfer it to various machines on the
network. The 
files are bundled, transferred, then unpacked on the
target system. 
One night errors were reported on the target system.
The next day 
changes were made to the source code to correct the
problem, but that 
night the same errors appeared. This went on for several
days, until 
someone discovered that the new code had not been transferred
to the 
target system. The new code had not been transferred
because cron 
had failed. 
This particular problem could be addressed by having
the target system 
move or remove the file when it was done with it, which
would cause 
a "missing file" error to be generated the
next night. But 
that only addresses one part of a complex system. 
What Else Could Go Wrong? 
Recently NIS failed on our "communication box,"
a computer 
dedicated to handling all of our incoming mail, that
is, mail from 
outside of the company. Without NIS the alias file was
useless and 
two days' worth of mail bounced back. 
What's needed is a general solution, a way to check
on background 
processes that doesn't itself rely upon background processes. 
A General Solution 
First, how do you tell if a background process like
cron is 
still running? Type ps -fu root, pipe the results to
grep, 
and look for cron (on some systems you cannot specify
the user 
and so must look through all processes). 
 
ps -fu root | grep cron 
 
That's easy enough to make into a shell script, and
you
could echo a warning if grep exits with a nonzero exit
status, indicating that /etc/cron was not found. The script
could check
for sendmail, NIS, ypbind -- any background processes
you want to keep tabs on.
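
As a rough illustration of the idea, the exit-status test might be
wrapped in a small function. This is only a sketch: the name
warn_if_missing and the wording of the warning are assumptions, not
taken from the listings. It is also the naive version, so it still
suffers from the self-match problem described in the next section.

```shell
#!/bin/sh
# Naive check, for illustration only.
# warn_if_missing is a hypothetical helper name. It reads a ps
# listing on standard input and looks for the string given as $1.
warn_if_missing() {
    # grep exits with a nonzero status when no line matches
    if grep "$1" >/dev/null; then
        return 0
    fi
    echo "WARNING: $1 does not appear to be running"
    return 1
}

# Typical use (not run here):
#   ps -fu root | warn_if_missing cron
```

Reading the listing from standard input keeps the ps options, which
vary from system to system, out of the function.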
But there are two problems. 
Two Problems 
The first problem is a technical one. You need to make
sure that you 
find what you're looking for ... and not what you're
looking for. 
Let me explain. 
When you type ps and grep for "cron," a new
process, with the word "cron" on its command
line, is started. 
Sometimes that process shows up and sometimes it doesn't,
depending 
upon the load on the system. So if cron was found in
the ps 
output, was it /etc/cron, "grep cron," or
both? 
So why not just look for /etc/cron? 
Checking for /etc/cron doesn't work because as soon
as you 
grep for /etc/cron, /etc/cron shows up as an 
argument on the grep command line.  
Listing 1 illustrates this problem with two examples.
The first example
usually works; the second never does. With the
addition of a
filter, you can make it work every time.
 
ps -fu root | sed '/grep/d' | grep cron 
 
The command sequence (ps | sed | grep) looks as
if it won't work because the sed filter that deletes
grep lines comes before the
call to grep.
But it does work. A program's position in a pipeline
determines how data flows, not when the program starts.
It is only after
the shell has
parsed the command line that the three processes are
started (almost
simultaneously). Before you can attach pipes, there
must be programs
to attach them to.
So, if the "grep cron" line appears in the
ps output, 
the sed command deletes it. If "grep cron"
doesn't 
appear, it's not deleted. Either way, you get the information
you 
need. 
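
The filter can be packaged the same way. The sketch below assumes a
hypothetical function name, daemon_running, and reads the ps listing
from standard input so that it can be tried against any listing. Note
that the sed process's own line also contains the string "grep" (in
'/grep/d'), so the filter removes it as well. By the same token, the
filter would hide any daemon whose command line happens to contain
"grep".

```shell
#!/bin/sh
# Filtered check, for illustration only.
# daemon_running is a hypothetical helper name. It reads a ps listing
# on standard input, deletes every line that mentions grep, and then
# searches what is left for the string given as $1.
# Exit status: 0 if a matching line remains, nonzero otherwise.
daemon_running() {
    sed '/grep/d' | grep "$1" >/dev/null
}

# Typical use (not run here):
#   ps -fu root | daemon_running /etc/cron ||
#       echo "WARNING: /etc/cron is not running"
```

Because the grep lines are gone before the search, looking for the
full path /etc/cron now works as well as looking for cron.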
The Real Problem 
The second problem is the real problem. 
How do you automatically check background processes
to see if they 
are still running? That is, how can you make it so the
checkcron 
script is run every so often without your having to
remember to do 
it? (Don't say cron!) 
I got around this automated process problem by using
something manual, 
my .profile. I simply added a call to checkcron. Now
whenever I log in I know within a few seconds if there
is a problem. 
Installation 
checkcron is in Listing 2. 
Installation is trivial. Customize the program for your
system (by 
editing the line with the list of daemons), and then
add one line 
to $HOME/.profile: 
 
checkcron & 
 
Every time you log in, you'll see a background pid 
number echoed to your screen and then whatever you normally
see when 
you log in. 
For each daemon that cannot be found, checkcron echoes
an error
message to your screen. If all daemons are accounted
for, it does
nothing. Simple.
checkcron may also be run from the command line if you
have 
been logged in for a while and simply want to double-check
your daemons.  
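
Independently of Listing 2, the overall shape of such a script can be
sketched as follows. This is an illustrative sketch, not the author's
listing: the DAEMONS variable, the report_missing function, and the
message text are all assumptions, and the daemon names must be edited
to match what ps shows on your system.

```shell
#!/bin/sh
# Sketch of a checkcron-style script; not the author's Listing 2.
# DAEMONS is a hypothetical, site-specific list. Edit it for your system.
DAEMONS="cron sendmail ypbind"

# report_missing (hypothetical name) reads a ps listing on standard
# input and prints one line for each name in $DAEMONS that is absent.
# It prints nothing when every daemon is accounted for.
report_missing() {
    # Capture the listing once, minus any lines that mention grep
    listing=$(sed '/grep/d')
    for d in $DAEMONS; do
        if printf '%s\n' "$listing" | grep "$d" >/dev/null; then
            :   # found, so stay silent
        else
            echo "checkcron: $d is not running"
        fi
    done
}

# On systems where ps cannot select by user, list all processes instead
ps -fu root | report_missing
```

Staying silent on success matches the behavior described above: the
only output at login is a complaint about a missing daemon.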
 
 About the Author
 
Steven G. Isaacson has been writing C and Informix
4GL applications 
since 1985. He is currently developing automated testing
tools for 
FourGen Software, the leading developer of accounting
software and 
CASE Tools for the UNIX market. He may be reached via
email at 
uunet!4gen!steve1 or steve1%4gen@uunet.uu.net.  
 
 
 |