| An older Option in C for find
 
Larry Reznick 
Why a C Program? 
UNIX comes with so many utilities that a lot of work
can be done by 
shell scripts that use those utilities -- and shell
scripts can 
usually be put together faster than C programs. Sometimes,
however, 
when the utility that does exactly what you want simply
does not exist, 
you just have to write a C program. 
A client of mine is putting together a system that will
receive files 
from customers transmitted by uucp. These files will
come in daily 
from all over the country and will contain various transaction
records 
that need further processing locally. At least once
a day, we want 
cron to wake up on my client's machine and move the
files out 
of the uucppublic directory into a directory where the
additional 
processing can be done. However, new files can come
in literally at 
any time, including the time that the cron job wants
to move 
the files away. The file currently being transmitted
by uucp must 
not be moved away while it is still being uploaded,
but all prior 
files will qualify. 
My first thought was to use the find utility, knowing
that 
it has a bunch of interesting options for qualifying
files and then 
emitting the names of those that qualify. However, none of its
timestamp comparisons (such as -atime) accepts an argument in
minutes, only in days.
find has a -newer option that allows a specific file's
timestamp to be the base time, so that all files newer
than that time 
will qualify. What we really needed was an -older option,
since 
we didn't want to take the file being uploaded currently,
but did 
want all files prior to that one. Using ! -newer might
do the 
trick if I could touch a file with the appropriate timestamp.
However, find has no -older option that would work with
a specific amount of time in minutes, so I decided to
write one. 
I wanted this program to act as if I had given a find
command 
that had this unsupported syntax: 
 
find dirname -older num_minutes -print
 
where dirname was the name of the directory containing
files I cared about, and num_minutes was a number, not
a filename. 
If any file in that directory was older than the specified
minutes, 
the pathname for that file would be printed out. Since
most of the 
customer files would take only a couple of minutes maximum
to transmit 
at 2,400 bps, and a few might take as long as 5 to 10
minutes, files 
older than 15 minutes would qualify. Anything more recent
than that 
would be picked up next time around. 
How older.c Works 
older.c, shown in Listing 1, meets these requirements.
To make 
it more generally useful, the program accepts a number
of minutes 
and a set of directories on the command line. While
this particular 
application would work with a 15-minute timeframe and
only with a 
specific directory, I was sure we would find other uses
that would 
have different requirements. If no time is given, the
program defaults 
to 15 minutes, and if no directories are named, it defaults
to the 
current directory. 
Looking at Listing 1, start with the FEBDAYS() macro.
This 
implements the rule that it is leap year in every year
divisible by 
4 except the century years, which must be divisible
by 400. The mod 
(%) operator gives the remainder after a division, so
if the result 
of a mod operation is zero, the value is evenly divisible
by 
the divisor. C implements short-circuiting for the Boolean
logic operators 
and (&&) and or (||). This means that if the
total truth 
can be determined by the first part of the operation,
the second part 
does not need to be evaluated. If the first part of
an && operation 
is FALSE, the whole thing is FALSE. If the first part
of an || operation is TRUE, the whole thing is TRUE.
The opposite requires that the second part be evaluated
to be certain. 
So, if the year is 2000, which is evenly divisible by
400, this routine 
will set February to 29 days without checking further.
A year that 
is not a century year is recognized as a leap year if
it is evenly 
divisible by 4. Whenever the divisor is a power of 2, the bitwise
AND operator (&) can be used instead of the % operator, with a value
one less than the divisor (3 here), to get the remainder of a
non-negative value. The quotient, if needed, can be delivered by
using a shift to the right (>>) instead. The shift and bitwise
operators are typically at least as fast as the mod and division
operators, although many optimizing compilers make this substitution
on their own.
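Listing 1 is not reproduced here, so the following is only a minimal sketch of what such a macro might look like. Only the name FEBDAYS() comes from the text; the IS_LEAP() helper and the exact expression are assumptions.

```c
/* Hypothetical reconstruction; the definition in Listing 1 may differ. */

/* Nonzero when y (a full, non-negative year such as 1996) is a leap year.
   The 400 test comes first so that || can short-circuit for years like
   2000. For non-negative y, (y & 3) == 0 is the same test as y % 4 == 0. */
#define IS_LEAP(y)  ((y) % 400 == 0 || ((y) % 100 != 0 && ((y) & 3) == 0))

/* Number of days in February of year y. */
#define FEBDAYS(y)  (IS_LEAP(y) ? 29 : 28)
```

With these definitions, FEBDAYS(2000) and FEBDAYS(1996) give 29, while FEBDAYS(1900) and FEBDAYS(1995) give 28.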
The variable progname is made global for error handling.
If 
the program reports an error, the program's name will
be part of that 
error message. Since any of the functions might deliver
an error message, 
but only main() can know the program name from the command
line, using a global variable to hold the name eliminates
the need 
to pass it around to all the functions as a parameter.
So, the first 
thing that main() does is grab argv[0], the program's
name, and put it into that variable. No matter how many
other arguments 
will be given on the command line, even the wrong number,
argv[0] 
will be present. 
The next step in main() is to check the command line
argument 
list. The total number of arguments on the command line
is in argc, 
the arguments themselves being in argv[]. For this program,
no arguments need be given, and there are no hyphenated options. So,
if a hyphen is the first character of the first argument, the program
takes it as a request for help and outputs a Usage message.
If no arguments other than the program's name are given,
argc 
will be 1 and the default time of 15 minutes will be
taken off the 
current system time. Otherwise, the command line has
the number of 
minutes to be taken off. The program will presume that
only an int 
capacity will be needed. (On many 32-bit UNIX systems, an int is the
same size as long, so this can be quite a number of
minutes.
If a long versus short is strictly required, one of
them should be used, not int. While the int is theoretically
the most efficient type for the system [this is a religious
issue 
and different compiler writers will disagree for the
same system], 
it is also the least portable type since the ANSI C
Standard allows 
it to be the same size as short or long, or somewhere
in between.) If an int is 16 bits, the maximum value
of a signed 
int is 32,767 minutes, which amounts to over 22 days.
If it 
is 32 bits, the maximum signed value is 2,147,483,647
minutes, which 
amounts to almost 4,083 years! Either way is sufficient
for this program's 
needs. 
If a minutes argument is given on the command line,
directory names 
may also be given (directory names may not be given
without a minutes 
argument). If no directories are named, the current
directory is used. 
If directories are named, a loop runs through them one
at a time. 
If an error occurs, the program quits the loop. 
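The command-line rules described above can be sketched roughly as follows. The parse_args() helper and the DEFAULT_MINUTES name are hypothetical and are not taken from Listing 1; only progname and the 15-minute default come from the text.

```c
#include <stdlib.h>

#define DEFAULT_MINUTES 15   /* assumed name for the 15-minute default */

/* Global so that any function can prefix its error messages with it. */
const char *progname;

/*
 * Hypothetical helper, not from Listing 1.
 * Returns -1 if the first argument starts with a hyphen, in which case
 * the caller would print a Usage message. Otherwise it stores the number
 * of minutes in *minutes and returns the index in argv[] of the first
 * directory name. A return value equal to argc means no directories
 * were named, so the caller would use the current directory.
 */
int parse_args(int argc, char *argv[], int *minutes)
{
    progname = argv[0];            /* present whatever else was typed */

    if (argc > 1 && argv[1][0] == '-')
        return -1;                 /* treated as a request for help */

    *minutes = (argc > 1) ? atoi(argv[1]) : DEFAULT_MINUTES;
    return (argc > 1) ? 2 : 1;
}
```

A main() built on this would loop from the returned index up to argc, handing each directory name to the file-listing function, or pass "." when the index equals argc.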
The reduce_time() function takes the passed number of
minutes 
off the current time. While the ANSI C Standard gave
us a lot of flexibility 
in working with calendar and clock times, it did not
provide date 
arithmetic functions. The closest it came to that was
the difftime() 
function, which takes two time_t values and subtracts
them, 
giving the difference as a type double. 
It is extremely important to avoid the trap of assuming that time_t
values are counts of seconds. The ANSI C Standard requires only that
time_t be an arithmetic type capable of representing times; it leaves
the encoding unspecified. A specific calendar date and time can be
converted into a time_t by the mktime() function, but a specific
amount of time cannot portably be added to or subtracted from a
time_t value, since there is no guarantee that the time_t is a number
of seconds. Moreover, even where compilers do deliver a time_t as a
number of seconds elapsed from an epoch, you cannot assume that all
will use the same epoch. difftime() allows you to handle these
differences.
While it would be easy to multiply minutes by 60 and
take the resulting 
seconds off the current time represented as a time_t
value 
to get the starting time for the timestamp comparisons,
to do so would 
risk making the code nonportable. The only truly portable
solution 
is to go through the struct tm data type and muck around
with 
the various parts of the calendar and clock. 
Therefore, I take the current time() and plug that into
the 
localtime() function, which translates the time_t value
into calendar and clock information for the local timezone.
The minutes 
and hours can be adjusted easily, and the days will
be whatever is 
left over if enough minutes were given. I take the total
minutes (0 
to 59) and subtract those from the current time's minutes.
A negative 
result means that the time crossed backward into the
previous hour, 
so I add the hour back into the minutes and subtract
one from the 
hour. I do the same thing with the hours, except this
time a negative 
result means a cross back into the previous day. 
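The borrowing described above might look something like this. It is a sketch only: reduce_clock() is a hypothetical name, the code assumes a non-negative number of minutes, and Listing 1 may arrange the arithmetic differently.

```c
#include <time.h>

/*
 * Hypothetical helper, not from Listing 1.
 * Subtracts `minutes` (assumed >= 0) from the clock fields of *t and
 * returns the number of whole days still to be taken off the date.
 * Only tm_min and tm_hour are changed.
 */
int reduce_clock(struct tm *t, int minutes)
{
    int days  = minutes / (24 * 60);   /* whole days */
    int hours = (minutes / 60) % 24;   /* 0 to 23 left-over hours */
    int mins  = minutes % 60;          /* 0 to 59 left-over minutes */

    t->tm_min -= mins;
    if (t->tm_min < 0) {               /* crossed into the previous hour */
        t->tm_min += 60;
        t->tm_hour--;
    }

    t->tm_hour -= hours;
    if (t->tm_hour < 0) {              /* crossed into the previous day */
        t->tm_hour += 24;
        days++;
    }
    return days;
}
```

For example, taking 15 minutes off 00:05 leaves 23:50 and returns 1, meaning one day must still come off the calendar date.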
These steps may seem to be a lot of trouble, but if
the current time 
is just a few minutes after midnight, the subtraction
will have to 
deal with a day on the calendar. The real problem is
in the lack of 
standard functions for doing date arithmetic. Maybe
the ANSI committee 
will do something about this the next time around. While
accounting 
requirements of, say, 30-, 60-, and 90-day aging or
more are usually 
met by adding 1, 2, 3, or more to the month number rather
than using 
a strict number of days, other applications might need
to be more 
precise. The Standard does help here: mktime() does not require the
members of its struct tm argument to be within their normal ranges.
If it is given more seconds, minutes, hours, days, or months than is
reasonable, or even a negative value, it converts the number to
the correct calendar amount and hands back the adjusted time_t
value with leap years, timezones, and so forth accounted for.
Listing 1 does not rely on that normalization and adjusts the
fields by hand.
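For comparison, the ANSI C Standard specifies that the members of the struct tm handed to mktime() are not restricted to their usual ranges and are normalized on return. Under that assumption (a conforming library), the subtraction can be sketched in a few lines. The minutes_before() name is hypothetical and is not part of Listing 1.

```c
#include <time.h>

/*
 * Hypothetical helper, not from Listing 1.
 * Returns the time `minutes` before the local calendar time in *base,
 * relying on mktime() to normalize the out-of-range tm_min value.
 */
time_t minutes_before(const struct tm *base, int minutes)
{
    struct tm t = *base;   /* work on a copy; mktime() rewrites its argument */

    t.tm_min -= minutes;   /* may go negative; mktime() borrows as needed */
    t.tm_isdst = -1;       /* let the library decide whether DST applies */
    return mktime(&t);
}
```

Passing the result of localtime() for the current time as base gives the starting time for the timestamp comparisons without any hand-written calendar arithmetic.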
Once the problem has been reduced to a specific number
of days by 
which the calendar should be adjusted, a loop is needed
to work within 
the days of each month. Leap day fluctuation is accounted
for by adjusting 
February's days (day[1]). If the number of days to be removed
from the date is at least the day of the month, I reduce that
number of days by the day of the month; since this brings the
calendar date back to the previous month, I reduce the month
number. If reducing the month number requires it, I reduce the
year also and recalculate February's days. Regardless, I set the
day of the month to the number of days in this new month and
repeat the operation until the number of days to be taken off
becomes less than the value of the day of the month. At this point,
I take that number of days off, and the correct date of the adjusted
month (in the adjusted year if needed) is delivered.
By building this 
directly into the struct tm, the result can be passed
directly 
to mktime(), which returns the resulting time_t value
from 
the reduce_time() function. 
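One way to read the day-adjustment loop is sketched below. It is an interpretation of the description, not a copy of Listing 1: reduce_days() and the mdays[] array are hypothetical names, and the code assumes a non-negative number of days.

```c
#include <time.h>

/* Days in February for a full year such as 1996 (see the leap-year rule). */
#define FEBDAYS(y) \
    (((y) % 400 == 0 || ((y) % 100 != 0 && (y) % 4 == 0)) ? 29 : 28)

/*
 * Hypothetical helper, not from Listing 1.
 * Moves the date in *t back by `days` (assumed >= 0), stepping through
 * the month lengths. Only tm_mday, tm_mon, and tm_year are changed.
 */
void reduce_days(struct tm *t, int days)
{
    int mdays[12] = { 31, 28, 31, 30, 31, 30, 31, 31, 30, 31, 30, 31 };

    mdays[1] = FEBDAYS(t->tm_year + 1900);   /* tm_year counts from 1900 */

    while (days >= t->tm_mday) {
        days -= t->tm_mday;              /* use up the rest of this month */
        if (--t->tm_mon < 0) {           /* crossed into the previous year */
            t->tm_mon = 11;
            t->tm_year--;
            mdays[1] = FEBDAYS(t->tm_year + 1900);
        }
        t->tm_mday = mdays[t->tm_mon];   /* last day of the new month */
    }
    t->tm_mday -= days;                  /* fewer days left than tm_mday */
}
```

After this step the struct tm holds an ordinary calendar date, so it can be handed to mktime() to obtain the time_t value.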
Two interesting side-effects result from this. First,
since subtracting 
a negative number is equivalent to adding, a negative
number of minutes 
will add minutes to the current time. While not useful
for this particular 
application, since it works with file timestamps, this
capability 
could be handy for other programs using this function.
Second, since 
mktime() takes the timezone and Daylight Saving Time
into
consideration, the result will be plus or minus an hour
depending 
on whether the new time has crossed over one of the
DST boundary dates. 
This program is calculating an absolute time in minutes
without regard 
to adjustment of clocks made at DST boundary dates,
so the hour lost 
or subsequently recovered will show up in the difftime()
between 
the starting time and the new reduced time. Nevertheless,
the result 
is a correct absolute number of minutes prior to the
current time. 
One remaining issue about the reduce_time() function
would 
be to eliminate its association with the current time.
Instead of 
calculating the now variable from the time() function,
you could pass it into reduce_time() as a parameter,
also named 
now. With that, a specific number of minutes can be
removed 
from (or added to by using a negative number of minutes)
any time. 
Finding the Target Files 
The show_files() function takes a directory name and
a starting 
time. It comes up with every filename in the specified
directory and 
checks each file's timestamp against the starting time.
Reading the 
filenames from a directory is no more complicated than
reading records 
from a sequential data file. The directory is opened
with the opendir() 
function, the filenames are delivered with the readdir()
function 
in a structure, and the directory is closed with the
closedir() 
function. 
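The open, read, close pattern can be sketched as follows. The list_dir() function is a hypothetical illustration that simply prints every name it is given; it is not the show_files() function from Listing 1.

```c
#include <dirent.h>
#include <stdio.h>

/*
 * Hypothetical helper, not from Listing 1.
 * Prints every name readdir() delivers for dirname, one per line.
 * Returns the number of names printed, or -1 if the directory
 * could not be opened.
 */
long list_dir(const char *dirname)
{
    DIR *dp = opendir(dirname);
    struct dirent *entry;
    long count = 0;

    if (dp == NULL)
        return -1;

    while ((entry = readdir(dp)) != NULL) {   /* NULL marks the end */
        printf("%s\n", entry->d_name);
        count++;
    }
    closedir(dp);
    return count;
}
```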
The function takes the given directory name, opens the
directory, 
and copies the name into the pathname[] variable. A
trailing 
slash is concatenated to it and the null terminator
is replaced to 
make it a regular string again. Notice that strlen()
is used 
to figure the subscript of the terminator. That information
gets translated 
into a direct placing of the / character on top of the
terminator 
without having to use strcat(), which would make yet
another 
pass through the string to find that terminator. (The
dirlen variable 
needs to be increased to represent the adding of that
/ character.) 
Since the length of the pathname part containing the
directory's name 
is known, every filename within that directory can be
appended to 
the same path, once the name is discovered. All you
have to do is 
keep track of where the pathname part ends -- and that
is what 
dirlen is for. 
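The pathname handling might be sketched like this. The names start_path(), set_filename(), and PATHSIZE are hypothetical, and the length checks are an addition of this sketch; the text indicates the original does not guard against overflow.

```c
#include <string.h>

#define PATHSIZE 256   /* assumed name for the 256-character buffer size */

/*
 * Hypothetical helper, not from Listing 1.
 * Copies dirname plus a trailing '/' into pathname and returns the
 * length of that prefix (dirlen), or 0 if it would not fit.
 */
size_t start_path(char pathname[PATHSIZE], const char *dirname)
{
    size_t dirlen = strlen(dirname);

    if (dirlen + 2 > PATHSIZE)     /* room for '/' and the terminator */
        return 0;
    strcpy(pathname, dirname);
    pathname[dirlen++] = '/';      /* lands on top of the old terminator */
    pathname[dirlen] = '\0';       /* make it a regular string again */
    return dirlen;
}

/*
 * Places filename right after the directory prefix, replacing whatever
 * filename was there before. Returns 0 on success, -1 if too long.
 */
int set_filename(char pathname[PATHSIZE], size_t dirlen, const char *filename)
{
    if (dirlen + strlen(filename) + 1 > PATHSIZE)
        return -1;
    strcpy(pathname + dirlen, filename);
    return 0;
}
```

Because dirlen marks where the directory part ends, each new filename overwrites the previous one without the directory name being copied again.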
The pathname[] variable is set to 256 characters, allowing the
full pathname to be no more than 255 characters. Since BSD and SVR4
allow 255-character filenames, a long filename with its directory
path added could exceed this buffer, so this is not a particularly
safe strategy.
Still, this 
method should work with most pathnames, and serves to
keep the example 
simple. A more robust solution would allocate the buffer
from the 
heap and allow it to grow on demand. You might want
to challenge yourself 
to rewrite it that way. 
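One possible shape for that rewrite is sketched below. The struct pathbuf type and the pathbuf_set() function are hypothetical names invented for this illustration, and a fuller version would probably keep the directory prefix in place rather than copying it each time.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical growable buffer; start it out as { NULL, 0 }. */
struct pathbuf {
    char  *buf;    /* heap storage, or NULL before first use */
    size_t size;   /* bytes currently allocated */
};

/*
 * Stores "dir/name" in pb, growing the buffer on demand.
 * Returns pb->buf, or NULL if memory ran out (the old buffer, if any,
 * is left intact). The caller frees pb->buf when finished.
 */
char *pathbuf_set(struct pathbuf *pb, const char *dir, const char *name)
{
    size_t need = strlen(dir) + 1 + strlen(name) + 1;   /* '/' and '\0' */

    if (need > pb->size) {
        char *p = realloc(pb->buf, need);   /* acts like malloc for NULL */
        if (p == NULL)
            return NULL;
        pb->buf = p;
        pb->size = need;
    }
    strcpy(pb->buf, dir);
    strcat(pb->buf, "/");
    strcat(pb->buf, name);
    return pb->buf;
}
```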
The readdir() loop checks the file's name to see if
the first 
character is a dot (.). The readdir() function delivers
every 
name, including the directory names . and .., and the
hidden files beginning with a dot. The program should
ignore such 
files, so, if the name does not begin with a period,
the program appends 
it to the pathname[] variable and passes the result
to the 
stat() function. This handy function reads the inode
information 
for that file, delivering all sorts of useful facts
about the file, 
including the time of the last modification (st_mtime).
That 
time is a time_t type, so it can be plugged directly
into the 
difftime() function. 
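The dot test and the stat() call can be sketched as two small helpers. Both is_hidden() and file_mtime() are hypothetical names used for illustration; Listing 1 presumably does this work inline in the readdir() loop.

```c
#include <sys/types.h>
#include <sys/stat.h>
#include <time.h>

/* Nonzero for ".", "..", and any other name beginning with a dot. */
int is_hidden(const char *name)
{
    return name[0] == '.';
}

/*
 * Hypothetical helper, not from Listing 1.
 * Stores the last-modification time of path in *mtime.
 * Returns 0 on success, -1 if stat() fails (errno says why).
 */
int file_mtime(const char *path, time_t *mtime)
{
    struct stat st;

    if (stat(path, &st) != 0)
        return -1;
    *mtime = st.st_mtime;
    return 0;
}
```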
The difftime() function delivers the difference between two
time_t values in seconds, represented as a double type.
Treating time as an increasing value, regardless of the form of that
value, difftime() subtracts the second argument from the first.
With start_time as the first argument and the file's modification
time as the second, a result greater than zero means the file's
modification timestamp is older than the start_time, and so the
file's name is printed.
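The comparison itself comes down to a single call. The is_older() wrapper below is a hypothetical name for illustration, and it assumes start_time is passed as the first argument to difftime().

```c
#include <time.h>

/*
 * Hypothetical helper, not from Listing 1.
 * Nonzero when file_mtime is strictly earlier than start_time.
 * difftime(a, b) returns a - b in seconds, so a positive result
 * means start_time is the later of the two.
 */
int is_older(time_t file_mtime, time_t start_time)
{
    return difftime(start_time, file_mtime) > 0.0;
}
```

A file modified exactly at start_time does not qualify under this test, since the difference is then zero.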
Conclusion 
The older program emits full pathnames as the -print
option might do in the find program. Since we have started
using it in shell scripts, we have found additional
uses for it. I 
hope you'll find it equally handy.  
 
 About the Author
 
Larry Reznick has been programming professionally since
1978. 
He is currently working on systems programming in UNIX
and DOS. He 
teaches C language courses at American River College
in Sacramento 
and is the owner of Rezolution Technical Books. He can
be reached 
via email at: rezbook!reznick@csusac.ecs.csus.edu. 
 
 
 |