A Community-Style Overnight Job Spooler
 
Leor Zolman 
For a small business running a single multi-user UNIX
system, processes 
typically fall into one of two categories: real-time,
interactive 
programs or batch style/background jobs. Interactive
programs such 
as the system shell, editors/word processors, spreadsheets,
and data 
entry systems all vie concurrently for slices of the
CPU pie. Such 
programs spend most of their real time blocked waiting
for user input, 
so they tend not to have much impact on system performance
(as long 
as there is enough main memory to keep the jobs from
getting swapped 
out to disk). 
Batch-style jobs such as reports, backup scripts, or
any CPU- or disk-intensive 
processes, on the other hand, have a relatively large
impact on system 
performance. Such jobs demand as much of the available
CPU resources 
as they can possibly get. It doesn't take many such
CPU- or disk-intensive 
background jobs running simultaneously to slow user
terminal response 
time down to a crawl. 
In many cases, those "expensive" batch-style
jobs would be 
less of a pain-in-the-CPU if they could be scheduled
so as not to 
compete head-to-head with the interactive processes
for system resources. 
In a business environment, the natural solution would
be to run those 
jobs overnight whenever feasible, and reserve the business
hours for 
interactive processes and high-priority batch jobs only. 
Users should routinely be given the option of running
batch-style 
jobs overnight. For jobs that must run immediately,
background execution 
should also be an option. However, with a little bit
of encouragement 
(and after enough instances of the molasses-syndrome
due to an overloaded 
system), users will understand that overnight queueing
works out best 
for everyone.  
Pros and Crons 
The basic UNIX System V configuration includes only
a rudimentary 
set of job scheduling tools. The primary facilities,
cron and 
at, allow the scheduling of jobs for execution at particular
dates and times, but have no provision for prioritizing
or sequencing 
those jobs in order to maximize system performance.
Three users might, 
unbeknownst to each other, all schedule jobs for execution
at 8:30 
P.M. cron will dutifully start them all up at 8:30 P.M.,
resulting in some serious context-switching overhead
while the jobs 
vie for system resources. 
Now consider the case of automated daily backups. You
could just set 
up the cron table to run the backup software every morning
at 5:00 A.M., but what happens if those three long batch
jobs 
are still slugging it out at 5:00 A.M., and changing
critical 
data in the process? cron doesn't care; it just runs
the backup. 
If the backup utility cannot properly coordinate file-locking
issues 
in the course of a backup, the result may be lost data. 
A better solution would be for all jobs scheduled for
overnight processing 
to be registered with a single overseeing system, and
for that system 
to be responsible for running the jobs in an orderly,
non-interfering 
manner. The simplest way to implement this "ordering"
is to 
ensure that all jobs are scheduled sequentially, such
that each job 
is run to completion with as little competition from
other jobs as 
possible -- especially other resource-intensive jobs. 
With the addition of a prioritizing scheme, critical
job-sequencing 
issues can also be properly managed. Then, for example,
the daily 
backup script can be configured at the lowest possible
priority, so 
that it runs only after all other jobs have been completed. 
In this article, I describe a set of Bourne Shell scripts
that 
work together to provide a sequential overnight job-spooling
facility. 
The package is geared towards a "community-style"
computing 
environment -- that is, an environment that allows any
user to 
invoke a particular overnight job and that prints out
or places the 
output resulting from the job in a public destination
area on-line, 
so that any other user may choose to view or print out
the results, 
as required by the specific application. 
Any stdout/stderr output not explicitly directed into
an output 
file by an overnight job will be captured into a default
location, 
generally accessible only by a system administrator.
This feature 
may be used as a simple status- and error-logging mechanism. 
Directory Structure 
The onitesetup.sh script (Listing 1) may be used to
set up 
the directory structure and appropriate permission settings
for the 
basic onite system. I've chosen a master directory location
of /usr/spool/onite for the example implementation;
another 
location may be more appropriate for your site. In those
scripts where 
applicable, the SPOOLDIR configuration variable identifies
the master onite directory.  
Several subdirectories exist immediately beneath the
master directory. 
The subdirectory jobs itself contains another tier of
subdirectories 
corresponding to the various job priority levels. The
system may be 
configured for any number of priority levels; when there
are n 
levels of priority, the subdirectories are named P1
through 
Pn.  
In scripts where applicable, the NPRIORITIES variable
defines 
the number of priority levels implemented.  
The subdirectory stdout receives the intermixed, non-directed
("bit bucket") output of both the stdout and
stderr 
streams for the last NTOLEAVE jobs that have been run
through 
the spooler. The value of NTOLEAVE is configured in
the master 
driver script, onitego.sh.  
The subdirectory jobsdone receives the "used"
job scripts 
for the last NTOLEAVE completed jobs. The contents of
this 
directory, along with the contents of stdout, as previously
noted, exist primarily to support post-mortem analysis
by the system 
administrator.  
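As a rough illustration of the layout just described (this is a sketch, not the code in Listing 1), the tree could be created along the following lines. The function name make_onite_tree is hypothetical, and the permission settings mentioned above are deliberately left out because they depend on the site.

```sh
# Hypothetical helper, not the author's Listing 1: build the onite tree.
# $1 = master directory (for example /usr/spool/onite)
# $2 = number of priority levels (the NPRIORITIES value)
make_onite_tree() {
    root=$1
    nprio=$2
    mkdir -p "$root/jobs" "$root/stdout" "$root/jobsdone" || return 1
    i=1
    while [ "$i" -le "$nprio" ]; do
        mkdir -p "$root/jobs/P$i" || return 1
        i=`expr $i + 1`
    done
    # Create an empty log file if none exists yet.
    if [ ! -f "$root/onite.log" ]; then
        : > "$root/onite.log"
    fi
}
```

Calling make_onite_tree /usr/spool/onite 7 would produce jobs/P1 through jobs/P7, stdout, jobsdone, and an empty onite.log beneath the master directory.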
The onitego.sh script emits a log of all overnight spooler
activity on its standard output and error streams. I've
arbitrarily 
configured the log file to record this output in /usr/spool/onite/onite.log.
The log file is created with the proper permissions
by the installation 
script, onitesetup.sh, but no other scripts explicitly
write 
to the log file. With the following line in the "root"
cron 
table, 
 
0 20,23 * * * /usr/local/onitego.sh >>/usr/spool/onite/onite.log 2>&1 
 
the output of the master driver script is appended onto
the end of the log file every time the master driver
script executes. 
A brief description of each individual script and auxiliary
tool in 
the onite package follows. 
The Configuration Script 
onitesetup.sh (Listing 1) initializes the directory
structure 
for your custom implementation of the onite system.
Configure 
lines 15-18 for your system; line 14, the debug flag,
may be used 
to create a "dummy" hierarchy in the current
directory for 
testing purposes. To test the onite system using this
dummy 
directory, copy all the scripts into your testing directory
and change 
the initialization of debug to Y in all scripts where
debug appears. This is especially useful once the system
has 
been officially installed and you wish to test some
new modifications 
without corrupting the currently active code and job
queue directories. 
The Master Driver Script 
onitego.sh (Listing 2), invoked from the cron table
as shown above, "wakes up" to execute all
spooled overnight 
job scripts in sequence. It scans all the $SPOOLDIR/jobs/P*
directories in order, beginning with P1, looking for
job files, 
and submits each job file encountered to the shell for
processing. 
The standard output and standard error from each job
is written to 
a file in the $SPOOLDIR/stdout directory with the same
name 
as the job file. All program output from the job script
should take 
the form of explicit output files or physical output.
Any output emitted 
through the stdout and stderr streams should be considered
for the system administrator's eyes only. 
After the job has finished executing, the job file itself
is moved 
to the $SPOOLDIR/jobsdone directory. 
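The scan-run-move cycle described above might be sketched as follows. This is a minimal illustration under stated assumptions, not the code in Listing 2: the function name run_queue and the wording of the log messages are hypothetical, and lock handling and cleanup are omitted.

```sh
# Hypothetical sketch of the driver's main loop (not Listing 2).
# $1 = master directory, $2 = number of priority levels
run_queue() {
    spooldir=$1
    nprio=$2
    count=0
    p=1
    while [ "$p" -le "$nprio" ]; do
        for job in "$spooldir/jobs/P$p"/*; do
            [ -f "$job" ] || continue    # empty directory: the glob matched nothing
            name=`basename "$job"`
            echo "Starting $name (priority $p) at `date`"
            # Capture undirected stdout/stderr in a file named after the job.
            sh "$job" > "$spooldir/stdout/$name" 2>&1 < /dev/null
            mv "$job" "$spooldir/jobsdone/$name"
            count=`expr $count + 1`
        done
        p=`expr $p + 1`
    done
    if [ "$count" -eq 0 ]; then
        echo "No jobs were queued"
    fi
}
```

Because the outer loop finishes one priority directory before moving to the next, every P1 job completes before any P2 job starts.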
The standard output of the onitego.sh script provides
a running 
log of job activity. If no jobs at all were queued for
overnight processing, 
then a message to that effect is emitted. Otherwise,
the script 
creates a lock file that exists for the duration of
all job processing, 
and, for each job, writes a message announcing the name
of that job 
and the time it begins its run. 
When all jobs have been processed, the fleave.sh utility
script 
is called to delete all files in the jobsdone and stdout
directories except for those corresponding to the most
recent $NTOLEAVE 
jobs. This keeps those directories from filling up with
too much junk. 
There are some basic limitations to the design of the
onite 
system. The primary hazard is the case where a user
is permitted to 
queue a job after the driver script has already begun
execution for 
the evening. If the job is queued at a priority level
equal to or 
higher than the level currently being processed (that is, into a
directory the driver is scanning or has already scanned), then the 
job may not be run until the next night. I've partially
addressed 
this issue by scheduling the driver script for two runs
per night, 
so that a job missed during the "first round"
is picked up 
for execution in the "second round." This
approach, however, 
assumes that all jobs from the first round are completed
before the 
scheduled time for the second round comes up; if the
earlier instance 
of the driver script is still running when the later
instance "wakes 
up," the later instance will see the lock file,
immediately abort, 
and go back to sleep. Also, a high-priority job that
ends up running 
in the second round will effectively have been bumped
down to the 
lowest possible priority, since all jobs from the first
round will 
by then have already completed. In other words, if the
priorities 
are really critical, then don't schedule the master
driver script 
for more than one run per night. 
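The lock-file test that makes a second instance back off can be sketched as below. The helper names and the use of the shell's noclobber option are assumptions made for illustration; the lock-file name and the exact test used in Listing 2 may differ.

```sh
# Hypothetical lock helpers (not taken from Listing 2).
# With noclobber (set -C) in effect, "> file" fails if the file already
# exists, so only one caller can create the lock. The subshell keeps the
# option from leaking into the rest of the script.
acquire_lock() {
    ( set -C; : > "$1" ) 2>/dev/null
}

release_lock() {
    rm -f "$1"
}
```

A driver built this way would call acquire_lock at startup, exit quietly if the call fails, and call release_lock once the last job has finished. A crash in between leaves a stale lock behind, which the administrator would have to remove by hand.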
The best way to prevent these kinds of conflicts is
to make sure no 
jobs are queued past the time when the first instance
of onitego.sh 
wakes up (see the discussion of spoolonite.sh below
for some 
built-in protective measures). 
"Run Driver NOW" Script 
From time to time, you might discover that onitego.sh
has not 
executed as normally scheduled. For instance, someone
may have inadvertently 
broken the root cron table entry while doing administrative
maintenance, or perhaps the system crashed before 
spooler startup time and wasn't brought back up
until after the 
startup time, so cron never had a chance to start the
process. 
onitenow.sh (Listing 3) is designed for one-shot invocation
by the system administrator in just such an event. The
script simply 
starts up the master driver immediately as a background
task immune 
to hang-up, and sends the output into the appropriate
log file. 
The Job Queuing Script 
The last of the major scripts in this package, spoolonite.sh
(Listing 4), schedules an overnight job for execution.
spoolonite.sh 
is typically run from within a shell script, accepting
the text 
of the job to be spooled on its standard input stream.
There is only 
one mandatory command line parameter, the job name,
and one optional 
parameter, the job priority level. If no priority level
is specified, 
then the job is assigned a priority of $DEFAULT_PRIORITY
as 
defined in the script. 
The two variables USE_CUTOFF and CUTOFF_TIME may be
configured to reject job submissions past a particular
time of day. 
If USE_CUTOFF is Y, then any attempt to queue a job
after the clock time specified by CUTOFF_TIME will be
rejected 
(lines 40-47). 
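A cutoff test of this kind can be reduced to a numeric comparison if the times are held as four-digit HHMM strings. That representation is an assumption made for this sketch, as is the function name past_cutoff; the format actually used by CUTOFF_TIME in Listing 4 may be different.

```sh
# Hypothetical cutoff check; assumes times are four-digit HHMM strings
# (for example 1930 for 7:30 P.M.).
# $1 = USE_CUTOFF flag (Y or N), $2 = cutoff time, $3 = current time
# Succeeds (status 0) when the submission should be rejected.
past_cutoff() {
    [ "$1" = Y ] && [ "$3" -gt "$2" ]
}
```

A caller might write: if past_cutoff "$USE_CUTOFF" "$CUTOFF_TIME" "`date +%H%M`"; then echo "Too late to queue jobs tonight"; exit 1; fi. Note that a plain comparison like this accepts submissions made after midnight, since 0100 is numerically less than 1930.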
The variable CHECK_LOCK may be configured to reject
job submissions 
once the nightly queue has begun executing; this, in
conjunction with 
the USE_CUTOFF mechanism, effectively eliminates the
possibility 
of "orphaned" jobs in the queue after the
master driver script 
has completed its run (lines 49-57). 
Since the contents of the stdout and jobsdone directories
are not broken down by priority level, only one instance
of any specific 
job name is allowed per night (lines 55-65). It is left
up to the 
system administrator, using the tools provided in this
package (such 
as oname.sh), to construct unique names for all job
scripts. 
Environmental Issues 
Since the master driver script is invoked from root's
cron 
table, all jobs are actually run under root's user-ID
and environment, 
not under the user-ID and environment of the invoking
user. Thus, 
spoolonite.sh must see to it that the original user's
environment 
is replicated as faithfully as possible at the time
his/her overnight 
job script is run. 
Line 79 begins to construct the job file by dumping
the entire contents 
of the user's environment settings into it. Line 78
prevents a nasty 
problem in the case where the user's PS1 (primary prompt
string) variable 
was exported and happens to contain a multiline string.
If PS1 were 
not redefined in this case to isolate the embedded newline
within 
a set of quote marks, then the shell would become confused
by the 
multiline string when the time came to interpret the
job script. If 
there are any other variables in your users' environments
that could 
conceivably be set to multiline string values and then
exported, those 
variables must be redefined in a similar manner before
line 79 executes. 
If any programs invoked from a user's job script need
access to any 
variables in the user's environment, then those environment
variables 
must be exported by the job script. The design of this
package assumes 
that "unsophisticated" users will not be creating
their own 
custom environment variables and spooling jobs for overnight
execution 
that depend on those variables. Sophisticated users
can include the 
commands to define and export such variables, if necessary,
on their 
own when preparing their scripts. 
When the list of common critical environment variables
is known, however, 
then that list may be specified as the value of toexport
(line 
29). For our installation, this list includes the PATH,
two variables 
relating to database configuration, and two that affect
printer output 
routing. I know these variables are defined in every
user's startup 
profile, because I maintain those profiles. 
In line 83, spoolonite.sh generates a cd statement that
sets the current directory for job execution to the
user's actual 
current directory. Finally, the explicit job script
text is copied 
from the standard input onto the end of the job file. 
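The overall shape of the resulting job file (variable settings, a cd, then the job text) can be illustrated with the sketch below. It is not the code in Listing 4: it carries over only an explicit list of variables rather than the whole environment, and the names sq_escape and build_job_file are hypothetical. Values are wrapped in single quotes so that embedded newlines, such as those in a multiline PS1, survive intact.

```sh
# Hypothetical sketch of job-file construction (not the code in Listing 4).

# Print $1 with each single quote rewritten as '\'' so that the result
# can be placed inside a single-quoted shell string.
sq_escape() {
    printf '%s\n' "$1" | sed "s/'/'\\\\''/g"
}

# $1 = job file to create; remaining arguments = names of the variables
# to carry over. The job text itself is read from standard input.
build_job_file() {
    jobfile=$1
    shift
    : > "$jobfile" || return 1
    for var in "$@"; do
        eval "val=\$$var"
        quoted=`sq_escape "$val"`
        printf "%s='%s'; export %s\n" "$var" "$quoted" "$var" >> "$jobfile"
    done
    # Run the job from the directory the user was in when queuing it.
    here=`pwd`
    quoted=`sq_escape "$here"`
    printf "cd '%s'\n" "$quoted" >> "$jobfile"
    # Append the job text from standard input.
    cat >> "$jobfile"
}
```

For example, echo 'myreport > report.out' | build_job_file /tmp/job PATH would write an assignment and export for PATH, a cd to the current directory, and then the report command (myreport is a made-up program name).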
Displaying the List 
showonite.sh (Listing 5) summarizes all jobs queued
for overnight 
processing, showing the job name, name of the invoking
user, and priority 
level. The contents of each priority directory are displayed
by piping 
the output of the l command to awk for formatting. 
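The pipe-to-awk approach might look something like the following. This sketch substitutes the portable ls -l for the l command and invents its own column layout; the function name show_queue is hypothetical, and job names are assumed to contain no whitespace.

```sh
# Hypothetical sketch (not Listing 5): list queued jobs by priority.
# $1 = master directory, $2 = number of priority levels
# Uses "ls -l" in place of "l"; assumes job names contain no whitespace.
show_queue() {
    spooldir=$1
    nprio=$2
    p=1
    while [ "$p" -le "$nprio" ]; do
        # Keep only regular files (lines starting with "-"); the owner is
        # the third field and the file name is the last one.
        ls -l "$spooldir/jobs/P$p" |
            awk -v prio="$p" '/^-/ { printf "%-20s %-10s P%s\n", $NF, $3, prio }'
        p=`expr $p + 1`
    done
}
```

Since each job file is created by the user who queued it, the file's owner doubles as the name of the invoking user.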
Cancelling a Job 
A user may change his/her mind about an overnight job,
and need to 
cancel it. killonite.sh (Listing 6) performs that duty.
It 
may be configured to restrict users to killing only
their own jobs, 
or to allow users to kill anyone's queued jobs, depending
upon the 
value of the OwnOnly variable (line 9). 
This script uses the utility script lpick.sh, described
below, 
to let the user pick a job "by number". 
Looking for a Particular Job 
It may not make any sense for certain kinds of jobs
-- for example, 
a process that checks a mailing list for illegal addresses
before 
a monthly mailing -- to be run more than once per night.
If someone 
requests such a job for the second time in a single
day, it can only 
be because they didn't realize someone else had already
scheduled 
it. isonite.sh (Listing 7) helps the system administrator
detect 
such duplications. Given a job name as the command-line
parameter, 
it returns a true status if a job by that name has already
been scheduled. 
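A check like this amounts to looking for the job name in every priority directory. The following sketch shows the idea; the function name is_queued is hypothetical, and Listing 7 may go about it differently.

```sh
# Hypothetical sketch (not Listing 7): succeed if a job is already queued.
# $1 = master directory, $2 = job name
is_queued() {
    for f in "$1"/jobs/P*/"$2"; do
        if [ -f "$f" ]; then
            return 0
        fi
    done
    return 1
}
```

A queuing script could then guard against duplicates with something like: if is_queued /usr/spool/onite addrcheck; then echo "Already scheduled for tonight"; exit 0; fi (the job name addrcheck is made up for the example).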
Generating a Unique Name 
When it makes sense for a certain type of job to be
scheduled for 
multiple runs in one evening, each instance of that
job must still 
be given a unique job name. The oname.sh script (Listing 8) 
is a simple inline tool for generation of unique file
names; it uses 
the tmpname.c program described below to generate a
file name 
in the system /tmp directory, then chops off the /tmp/
prefix to return just the base file name on the standard
output. 
For example, to generate a unique job name for an instance
of a report 
identified as ren, I might use: 
 
jobname=`oname.sh ren` 
 
General Utility Programs and Scripts 
All the scripts described above were written specifically
for the 
Overnight Spooler system. The short scripts and C programs
described 
in this section are general-purpose tools used by many
of our shell 
scripts, including the onite system. 
checknum.c (Listing 9) 
This C program examines its first command-line parameter,
converts 
the leading portion of it into a number value, and returns
that ASCII 
number alone on the standard output. If the parameter
contains no 
leading numeric component, the string ERROR is returned
instead 
and the program terminates with an error status of 1.
checknum 
is used by spoolonite.sh and onitego.sh. 
tmpname.c (Listing 10) 
tmpname.c simply extends the functionality of the tempnam()
C library function to create a tool available for use
directly in 
a shell script. For example, the following command creates
a unique 
file name in /tmp that begins with the characters "abc": 
 
filename=`tmpname abc` 
 
lpick.sh (Listing 11) 
Given a text file containing a list of items to select
from and a 
generic description of the flavor of item being chosen,
this script 
describes, sequentially numbers, and displays the list,
then waits 
for the user to select one of the items according to
the displayed 
sequence numbers. The user may either enter a sequence
number to make 
a selection, or press the return key alone to indicate
"none." 
If the user makes a selection, lpick.sh returns the
text of 
the selected item on the standard output; else, the
text ABORT 
is returned. killonite.sh uses lpick.sh for prompting
the user to select a job to cancel. 
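The number-and-select behavior can be sketched as follows. This is an illustration under assumptions, not Listing 11: the function name pick_item and the prompt wording are hypothetical, and the menu is written to the standard error stream so that the standard output carries only the result.

```sh
# Hypothetical sketch (not Listing 11): pick one line from a list by number.
# $1 = file with one item per line, $2 = description of the items
# The menu and prompt go to stderr; stdout carries only the chosen item,
# or the text ABORT.
pick_item() {
    listfile=$1
    what=$2
    echo "Select a $what:" >&2
    awk '{ printf "%3d) %s\n", NR, $0 }' "$listfile" >&2
    printf 'Enter a number (RETURN for none): ' >&2
    read choice
    case $choice in
        ''|*[!0-9]*) echo ABORT; return 0 ;;    # empty or non-numeric reply
    esac
    item=`awk -v n="$choice" 'NR == n' "$listfile"`
    if [ -n "$item" ]; then
        echo "$item"
    else
        echo ABORT                              # number out of range
    fi
}
```

A caller captures the result with command substitution, for example job=`pick_item /tmp/joblist job`, and then compares $job against ABORT before acting on it.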
fleave.sh (Listing 12) 
onitego.sh calls this utility script to clean out old
files 
in the jobsdone and stdout subdirectories. 
ask.sh (Listing 13) 
This little script prompts the user with a given text
string, insists 
upon a y/n response, and returns Y or N accordingly
on the standard output. 
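A prompt loop of this sort is short enough to show in full. The sketch below is one way to write it and is not necessarily how Listing 13 does it; the function name ask_yn and the "(y/n)?" suffix are assumptions.

```sh
# Hypothetical sketch (not Listing 13): insist on a yes/no answer.
# $1 = prompt text
# Prompts on stderr and repeats until the reply starts with y or n,
# then prints Y or N on stdout.
ask_yn() {
    while :; do
        printf '%s (y/n)? ' "$1" >&2
        read reply || return 1    # give up at end of input
        case $reply in
            [yY]*) echo Y; return 0 ;;
            [nN]*) echo N; return 0 ;;
        esac
    done
}
```

Typical use would be along the lines of: if [ "`ask_yn 'Run this report overnight'`" = Y ]; then ... fi.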
A Report Queuing Example 
Listing 14 shows an example script that spools a user-requested
report 
program as an overnight job. This script, invoked from
a menu system 
in our case, prompts the user for a publication code
(using the getmag 
shell tool) and proceeds to set up a job that runs a
set of mailing 
address consistency checks for the specified publication.
Some other 
internal shell tools, such as magname and nissue, appear
in the script, but their use is related to the specific
application 
and not to the spooler system in general. 
The job text is first written to a temporary file, then
the temporary 
file is fed to spoolonite.sh in line 49. After return
from 
spoolonite.sh, the temporary file is deleted. 
A Periodic Job Spooling Example 
Earlier I mentioned the problem of backup scheduling
conflicts. By 
spooling the backup routine as the lowest-priority overnight
job, 
all potential concurrency issues can be avoided, and
it is guaranteed 
that the backup program doesn't run until after all
other processes 
have completed their tasks. 
Say you have a backup driver script named dump.sh that
performs 
the physical backup operations, and you're currently
calling it directly 
from the cron table at some fixed hour of the night.
To convert 
this task into a spooled overnight job, create a special
driver to 
spool the dump.sh script as an overnight job. Such a
driver, 
named spooldumps.sh, is shown in Listing 15. 
Then, in your cron table, simply change the line that
used 
to call dump.sh to call spooldumps.sh instead, some
time before the nightly onitego.sh run is scheduled
to begin. 
For example, here is the root cron table entry from
our system: 
 
30 18 * * 1-5 /usr/local/spooldumps.sh 
 
This causes the spooldumps.sh script to execute 
every evening at 6:30 P.M. (our onitego.sh is scheduled
to start up at 8:00 P.M.). spooldumps.sh schedules the
dump.sh process (which resides in the /u3/Backup directory)
at priority 7, the lowest priority. Thus, the dump.sh
script is the last program to execute every night.  
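A spooling driver of this kind needs to do little more than hand a two-line job to spoolonite.sh. The sketch below shows the idea; it is not Listing 15, and the job name dump, the argument order (job name first, then priority), and the function name spool_dump are all assumptions.

```sh
# Hypothetical sketch (not Listing 15): queue the backup at lowest priority.
# $1 = spooler command to use (defaults to spoolonite.sh)
# The job name "dump" and the argument order are assumptions.
spool_dump() {
    "${1:-spoolonite.sh}" dump 7 <<'EOF'
cd /u3/Backup
./dump.sh
EOF
}
```

The job text changes to /u3/Backup explicitly because a job queued from cron would otherwise inherit whatever directory cron happened to start it in.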
 
 About the Author
 
Leor Zolman wrote BDS C, the first C compiler targeted
exclusively 
for personal computers. He is currently a system administrator
and 
software developer for R&D Publications, Inc., and
columnist for both 
The C Users Journal and Windows/DOS Developer's Journal.
Leor's first book, Illustrated C, has just been published
by 
R&D. He may be reached in care of R&D Publications,
Inc., or via net 
E-mail as leor@rdpub.com ("...!uunet!bdsoft!rdpub!leor"). 
 
 