Useful Scripts for Overworked Administrators
 Mark Prager
              I work for a startup company, which means we face the usual problems 
              of financing. Because many automated system tools are very expensive, 
              I have written several scripts to help automate some of my daily 
              tasks that monitor our system. I write these scripts in ksh 
              and csh, with a few small C programs (compiled with gcc) 
              where necessary, because these seem to be the least complicated 
              tools for the job. The scripts usually come out self-documenting, 
              which means I can leave them running, return to them several 
              months later, and still understand what I was trying to do. 
              The scripts run under Solaris 2.5.1 through 7 and can easily 
              be adapted to other UNIX operating systems.
              Most of the scripts are run from cron so that I get 
              periodic checks, although nothing stops them from being run ad 
              hoc. I also assume that the servers can log in to one another 
              without being prompted for passwords (achieved by using the 
              .rhosts file in the root home directory, or hosts.equiv 
              in the /etc directory). This security arrangement is sufficient 
              for our network; however, .rhosts and hosts.equiv 
              are not considered secure enough for many organizations, so you 
              may need to adapt these scripts to the security structure of your 
              own environment. These scripts could probably be made more 
              efficient; however, I wrote them during a severe time crunch, and 
              because I am of the opinion that you don't fix something that 
              is not broken, I left them as originally written once they were 
              working.
              The first script (Listing 1) is a fairly simple script that I 
              wrote to monitor the disk space on our servers. In the script, the 
              variable "limit" represents the percentage limit after 
              which I want to receive an alert that the disk is getting full. 
              The variable comp_list is the list of servers that I want 
              to check. The next two lines are the initialization of the output 
              file that will be emailed at the end of the operation. The script 
              then runs on each server in the list, gets the percentage used (from 
              the output of df) of every filesystem, and gets the mount point 
              (mnt) for each percentage. The script then checks each 
              percentage to see whether it is greater than limit. If 
              so, it appends a line to the output file stating which filesystem 
              is filling up and its percentage used. At the end of the operation, 
              if an output file has been generated, it is mailed to me.
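The check described above can be sketched roughly as follows. This is only an illustration, not the code of Listing 1: the function name check_df and the variable outfile are hypothetical, the parsing assumes df -k output in the usual Solaris column order (capacity in the fifth column, mount point in the sixth) with each filesystem on a single line, and the -v option assumes a POSIX awk (nawk on older Solaris releases).

```shell
#!/bin/sh
# check_df (hypothetical name): read `df -k` output on stdin and print one
# line for every filesystem whose use is above the limit.
#   $1 = server name to show in the report
#   $2 = percentage limit
check_df() {
    awk -v host="$1" -v limit="$2" '
        NR > 1 {                      # skip the df header line
            pct = $5                  # capacity column, e.g. "91%"
            sub(/%/, "", pct)         # strip the percent sign
            if (pct + 0 > limit + 0)
                printf " %d%% on %s : %s\n", pct, host, $6
        }'
}

# Possible use inside the per-server loop (assumes rsh access as described):
#   rsh "$comp" df -k | check_df "$comp" "$limit" >> "$outfile"
```

Keeping the parsing in one function that reads standard input means it can be tried on saved df output before it is pointed at live servers.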
              Example of the output from Listing 1:
              
             
From: Super-User [root@sword.fish]
Sent: Sunday, February 11, 2001 5:00 PM
To: mark.prager@seabridgenetworks.com
 91% on barracuda : /raid308
 92% on seal : /export/raid1
I run this script at hourly intervals; however, it could be run at 
            shorter intervals, and the alerting mechanism (email) could be changed 
            to an SMS messenger program or an X Window pop-up. Similarly, the script 
            can be slightly modified to record the usage of all filesystems on 
            all the servers at periodic intervals, writing the output to a file 
            that could later be processed to produce a history of the disk 
            space usage on all servers.
              At our company, instead of having a UNIX desktop for every user, 
              we have a number of central servers. Using a PC tool like Exceed, 
              every user can log into every server. The trouble with this scenario 
              is that every user likes to think that the server belongs to him 
              alone! To discourage such thinking, I wrote a script (Listing 2) 
              that warns users if they have too many processes running on the 
              server. Although this script does not actually kill any processes, 
              the warnings can be annoying and are good enough to keep the users 
              aware of what they are doing.
              The main part of the script starts on line 14. I first get a list 
              of all the processes on the server and filter out those users that 
              I don't want to be notified about (e.g., root and daemon). 
              The UID part filters out the banner line of the ps command. 
              The list is then sorted according to username. Line 16 is used to 
              initialize various shell variables; the variable last is 
              the username that will be checked. I initially gave it the value 
              qwert, because I know there are no users with that name on 
              our system.
              The mailusermap file looks like this:
              
             
...
markp+mark.prager@seabridgenetworks.com+35
tvguser+tvguser@seabridgenetworks.com+70
ccadm+mark.prager@seabridgenetworks.com+100
...
It is basically a database of all the users on the system, their email 
            addresses, and the number of processes each is allowed to have. As 
            shown by the example above, markp can have up to 35 processes, 
            while tvguser (which might be a common or group account) is 
            allowed up to 70 processes.
              The first time around, the loop does nothing because there is 
              no user called qwert. The next time around, we get the process 
              limit of that user (userquota), and the loop then counts how many 
              processes that person has. If the variable last is not the 
              same as variable i, then we have finished counting all the 
              processes for that user (remember the list was sorted on line 15).
              Lines 23-29 check whether the user has overstepped the limit. 
              If so, the function mail_to_user is called (lines 2-13). 
              Lines 34-41 repeat the body of the loop for the last user 
              on the sorted ps list, because the loop ends before that 
              user's count has been checked.
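The counting technique described above (a sorted list, a sentinel value in last, and a final check after the loop) can be sketched as follows. This is a simplified illustration rather than the code of Listing 2: the function name count_sorted is hypothetical, it reads one username per line from standard input, and it only prints the totals instead of comparing them with a quota.

```shell
#!/bin/sh
# count_sorted (hypothetical name): stdin is a SORTED list of usernames, one
# per line; prints "user count" for each distinct user.
count_sorted() {
    last=qwert      # sentinel, assumed not to be a real username
    count=0
    while read -r i
    do
        if [ "$i" != "$last" ]; then
            # The name changed, so the previous user's tally is complete.
            if [ "$last" != qwert ]; then
                echo "$last $count"
            fi
            last=$i
            count=0
        fi
        count=$((count + 1))
    done
    # The loop ends before the final user is reported, so the same
    # check has to be repeated once more here.
    if [ "$last" != qwert ]; then
        echo "$last $count"
    fi
}
```

For example, feeding it the lines markp, markp, markp, tvguser prints "markp 3" followed by "tvguser 1".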
              In the mail_to_user function, lines 5 and 6 determine 
              the user to be informed of the quota overload, and line 7 calls a 
              simple script that prints the beginning of the email to be 
              sent. The executable on line 8, pstree, is a freebie I downloaded 
              from the Internet; it prints the process tree for 
              a given user. Line 9 finishes off the email, and line 11 sends 
              it to the user.
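Looking up a user's email address and process limit in a file laid out like mailusermap might be done along the following lines. This is a sketch under assumptions, not the code of Listing 2: the function name lookup_user is hypothetical, and it relies only on the user+email+limit layout shown earlier.

```shell
#!/bin/sh
# lookup_user (hypothetical name): print "email limit" for a user.
#   $1 = username
#   $2 = path to a file of user+email+limit lines
# Returns non-zero if the user has no entry.
lookup_user() {
    # Setting IFS to + for the read splits each line into its three fields.
    while IFS=+ read -r name email limit
    do
        if [ "$name" = "$1" ]; then
            echo "$email $limit"
            return 0
        fi
    done < "$2"
    return 1
}
```

With the sample entries shown earlier, lookup_user tvguser mailusermap would print the tvguser address followed by 70.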
              I run the following script hourly in conjunction with another 
              script from cron:
              
             
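#!/bin/csh
# Collect the over-quota report from every server; cron mails the output.
set comp_list = 'stingray medusa sword seal shark salmon tuna octopus dolphin'
# Make sure no stale temporary file is left over from an earlier run.
rm -f /tmp/comp$$
foreach comp ( $comp_list )
        # Backticks capture the output of the remote script.
        set res=`rsh $comp /usr/local/scripts/count_proc_ksh`
        echo $res >> /tmp/comp$$
end
# Each report entry ends in @; turn it into a newline.
cat /tmp/comp$$ | tr '@' '\n'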
Notice that the last line translates @ into a newline character. 
            This is because line 4 of the main script prints the user who 
            has overstepped the limit and the host, terminated by an @. 
            Hence, at the end of the script, we get a report of all the users 
            that have overstepped their limits, sent by email (the output of cron) 
            to the administrator. Figure 1 shows an example of the letter sent 
            to a user. Figure 2 shows an example of the email sent to me.
              One problem with having central servers is that, if I want to 
              find a certain process on all the servers, I must log in to each 
              server, do a ps on it, and search for the process. To avoid 
              this, I wrote the following small script, which I can run centrally:
              
             
#!/bin/ksh
comp_list="stingray barracuda medusa seal salmon octopus tuna dolphin sword"
for comp in $comp_list
do
rsh $comp "ps -ef | sort | sed 's/^/'$comp' /'"
done
This script runs through the list of all the servers and, for each 
            server, runs the ps command, sorts the output, and (using sed) 
            prefixes each line with the name of the server. This script is very useful, 
            especially when looking for a user who is hogging system resources 
            through heavy commands like make and link.
              A slight modification of the above script allows me to check the 
              availability status of the important servers at our site. The servers 
              need not be only UNIX; they can be NT machines or other black boxes 
              such as routers:
              
             
#!/bin/ksh
# ping all servers - when one goes down - let me know.
servers="router1 router2 accelar1 barracuda shark medusa sword dolphin 
tuna seal octopus salmon stingray tiger hippo rhino puma fox zebra elephant wolf"
for i in $servers
do
    # Solaris ping prints "no answer from <host>" when the 10-second timeout expires.
    A=`ping $i 10 | grep "no answer"`
    if [[ $A != "" ]]; then
        ## Program to notify me of the problem by Xmessage
        DISPLAY=172.30.30.122:0.0
        export DISPLAY
        echo "$i is DOWN" | /usr/local/bin/xmessage -fn charr24 \
          -bg yellow -fg blue -file - -center &
        # SMS Page me too
        cd /users/system/mark/sms
        # Cellcom
        /users/system/mark/sms/page_mp "PING" "server $i is DOWN"
    fi
done
Each server is pinged with a 10-second timeout. If a "no 
            answer" is received, an X-message pop-up window is sent to me 
            saying that the specific server is not answering, and I am also paged. 
            Because the page_mp script is adapted especially for the mobile 
            service in my country, I won't go into those details here. However, 
            the script can easily be modified to send email, or to send pop-up 
            windows using Samba, to inform me of a problem. Note that this script only 
            reports a ping or communication problem; a server that does not 
            answer (because of a DoS attack, for example) could still be running other 
            activities, such as databases.
              Conclusion
              It can be easy to write many useful systems administration scripts 
              that will save you time and money on a day-to-day basis. Many of 
              the expensive commercial tools cover the same aspects and provide 
              similar results. All of these scripts were written with standard 
              UNIX commands and are therefore easy to adapt. There are many other 
              free tools on the Internet that can be downloaded and adapted too, 
              such as the performance analysis scripts written by Adrian Cockcroft 
              using his own scripting language (http://www.sun.com/951001/columns/adrian/column2.html). 
              In some cases, a shell script might not be enough, and you might need to 
              move to some other scripting language or, in the worst case, 
              write a simple program in C or another language to handle the problem.
              Mark Prager is the Senior UNIX Manager at Seabridge and has 
              a 15-year history with the software industry. He is skilled in many 
              aspects of the software industry, including software engineering, 
              computer security, and network planning. He is also a frequent contributor 
              to the CCIUG newsgroup and is experienced in the management of Rational's 
              ClearCase and MultiSite.