INTRODUCTION
============

One of the many challenges facing UNIX systems administrators in large,
heterogeneous environments is the management of numerous, dissimilar hosts.
In order to distribute software, make configuration changes or perform
routine customer support, it is often necessary for machines to be
broken down into categories, based on their architecture and OS level, or
referenced by subnet. And most useful to the administrator is the ability
to execute arbitrary commands on these groups of hosts, or execute a series
of commands on a single machine with the target hostnames as arguments.

Virtually every systems administration group in a large environment
maintains a script or program for accessing hosts en masse. This article
describes the masshosts tool, a highly configurable Perl script that fills
this role, with an emphasis on performance, efficiency and flexibility.


ABOUT MASSHOSTS
===============

Masshosts grew from a shell script that was originally written in our
computing environment for the purpose of executing a single command on a
large number of machines. This shell script, named allhosts, took as its
arguments a list of keywords, where each keyword would correspond to a list
of hosts that fell into a common category (for example, the keyword "ibmws"
would correspond to a list of IBM RS/6000 workstations). When allhosts was
run, it would either print out the host list that corresponded to your
keywords or, via a command-line option, rsh to each host in that list and
execute a desired command.

The primary problem with allhosts, however, was its speed: each rsh command
would execute in series, requiring the first rsh to complete before the
second could begin. Given the length of time needed to perform each remote
execution, using allhosts to execute the desired commands on a large number
of hosts would be painfully slow. And if any of the hosts in the list
happened to be unresponsive, the entire execution string would be held up,
waiting for that rsh to time out.

Consider, for example, a list of keywords that would correspond to 200
hosts. If we assume each rsh and command would take roughly three seconds
to complete, then using allhosts to execute that command on all 200
machines would take at least 10 minutes. And, if even 2% of those hosts
happen to be down, off the net or otherwise unavailable, that would be an
additional four minutes of execution time due to rsh timeouts. If we extend
this list of targets to include 1000 machines, not too unusual in our
environment, then we would be looking at a minimum execution time of just
under an hour, plus potentially 20 minutes in rsh timeouts at the same 2%
failure rate.
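
The arithmetic above can be checked with a few lines of Perl. This is only a
back-of-the-envelope sketch: the 60-second rsh timeout is an assumption (it
is the value implied by four minutes of delay for four dead hosts), and the
subroutine names are made up for illustration.

```perl
use strict;
use warnings;

# Minutes spent running the command itself, one host after another.
sub serial_minutes {
    my ($hosts, $secs_per_host) = @_;
    return $hosts * $secs_per_host / 60;
}

# Minutes lost waiting for rsh to time out on unreachable hosts.
# $pct_down is a whole-number percentage; $timeout_secs is an assumption.
sub timeout_minutes {
    my ($hosts, $pct_down, $timeout_secs) = @_;
    return ($hosts * $pct_down / 100) * $timeout_secs / 60;
}

for my $hosts (200, 1000) {
    printf "%4d hosts: %d min of work + %d min of timeouts\n",
        $hosts, serial_minutes($hosts, 3), timeout_minutes($hosts, 2, 60);
}
```

Under these assumptions the 200-host run costs 10 minutes plus 4 minutes of
timeouts, and the 1000-host run costs 50 minutes plus 20 minutes of
timeouts.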

Reality, of course, was much worse. It was not uncommon for an allhosts run
against 400 machines to take well over an hour, and runs against larger
lists of hosts had to be "split up" and run in parallel. It was out of this
parallel run technique that masshosts was born: in order to cut down
execution times, a new script was developed in Perl that would actively
manage its child processes, and provide a means for executing these
processes in parallel instead of in series. The end result of this new
approach was a performance boost that was limited only by the CPU of the
local machine; and because rsh is cheap in terms of CPU consumption, it was
not uncommon to run 30 or more processes at a single time, giving a
speedup of roughly 30x or more. Command runs that once took an hour or
more would complete in minutes, and smaller runs would finish in a few
seconds. This performance boost was invaluable to our administrative
processes.

Since its first incarnation, masshosts has undergone many modifications,
all designed to increase its usability and flexibility without sacrificing
its performance. The masshosts you see here was written in Perl, and makes
use of Perl5 techniques. It was developed and tested on 5.004, but it
should run on earlier versions with few or no modifications (though note
that the -c option requires Graham Barr's excellent IO::Socket library).

Overview
--------

Masshosts can work with three different types of input: a list of hosts, a
list of filters or a list of keywords. The distinction and relationship
between keywords and filters is very important, so we'll discuss this in
detail before moving on.

The default behavior of masshosts is to take a list of keywords, turn those
into filters, and use the filters for matching hosts from a predefined list
or database. The keyword, then, functions as an easy-to-remember "tag" that
corresponds to a potentially complex filter, but it is the filter that is
actually used in matching a hostname. Keywords are mapped to filters via
the filter configuration file, which is explained in more detail below.

The actual host lookup and matching is performed by a custom subroutine
named "getHosts", which you (the administrator) provide. You can make the
"getHosts" routine as simple as pattern matching hostnames from /etc/hosts
(or the hosts.byname map in NIS), or as complex as looking up hosts in some
centralized database or flat text file (based on machine attributes). The
"getHosts" subroutine API, and two samples, are also presented later on.

The command syntax for masshosts is:

     masshosts [ -f | -F | -l | -L | -K ] [ -ivV ] [ -x | -X file ]
               [ -n net_expr ]
               [ [ -crz ] [ -p N ] [ -t time_limit ] [ -o prefix ]
                 -e "cmd <arguments>" ] arg1 arg2 ...

By default, arguments are a list of keywords. These keywords are looked up
in the filters file and used to fetch a list of filters. These filters are
then used to return a list of hostnames. If multiple arguments are
specified, they are OR'd together.

Options
-------

 -c           Check the connectivity to the remote host before
              attempting an rsh. Only meaningful when combined
              with the -r switch.

 -e cmd       Execute the command(s) specified by cmd for each
              hostname in the match list. The command will be
              executed on the local machine unless the -r option
              is specified. If the command string or argument
              list contains the literal pattern %HOST%, it will
              be replaced with the current hostname before being
              executed.

 -f           Arguments are an explicit list of filters to use
              for matching hosts. Filters will be OR'd together.

 -h           Print usage.

 -i           Prepend all command output with the string
              "hostname:", for each matching hostname. Only
              useful when combined with the -r switch. Ignored if
              -o is specified. This option is handy when you are
              letting the output from masshosts go to
              stdout/stderr, and you want to see which host said
              what.

 -l           Arguments are an explicit list of hostnames. Useful
              when you already know the list of machines, and
              just want to run commands on/against them quickly.

 -n net_expr  Only hosts whose IP address matches net_expr will
              be returned. This is interpreted literally as a
              regular expression, so be careful.

 -o prefix    Send the standard output and standard error from
              commands run to the files prefix.hostname.out and
              prefix.hostname.err for each hostname that is
              matched. Only meaningful when used with the -e
              switch.

 -p N         Run commands in parallel, keeping N jobs active
              simultaneously. Only meaningful when used with the
              -e switch.

 -r           Rsh(1) to each matching host and execute commands
              on the remote machine. Only meaningful when used
              with the -e switch.

 -t time_limit  Time limit, in seconds, for command execution when
                making parallel runs. Only meaningful when combined
                with the -p switch.

 -v           Be mildly verbose: display the list of outstanding
              processes after all processes have been spawned.
              Very useful if the -r switch is specified.
              Currently only meaningful when combined with the -p
              switch.

 -x           Exclude any hosts listed in the default exclusion
              file.

 -z           Delete any output files that are zero-length (i.e.,
              empty). Only meaningful when used with the -o
              switch.

 -F           Arguments are files that contain an explicit list
              of filters to use for matching hosts. Using - as an
              argument specifies standard input. Filters will be
              OR'd together.

 -K           Arguments are files that contain a list of
              keywords. These keywords will be looked up in the
              filters file and used to fetch a list of filters.
              These filters will then be used to match hostnames.

 -L           Arguments are files that contain an explicit list
              of hosts. Like -l, this is useful for those times
              when you already have a list of machines, and you
              need to run commands on/against them quickly. Using
              - as an argument specifies standard input.

 -V           Be very verbose: show when child processes are
              spawned, as well as when they are collected.
              Currently only meaningful when combined with the -p
              switch.

 -X file      Exclude any hosts listed in file.



Examples
--------

1. masshosts sparc

     Prints a list of all machines corresponding to the keyword
     "sparc".

2. masshosts -r -e date sparc

     Rsh's to each machine corresponding to the keyword "sparc" and
     runs the 'date' command.

3. masshosts -Licr -p 25 -e 'last -10' /tmp/machines

     Rsh's to each host listed in /tmp/machines and executes the
     command 'last -10', first checking that each host is reachable.
     Runs 25 rsh's in parallel, and prepends the output to stdout/stderr
     with the target machine's hostname.

The config file
---------------

Masshosts uses a configuration file for customizing its default behavior.
The location of the configuration file is hardcoded into masshosts itself
as a Perl "require" directive. This is the only line of the masshosts
source that you should have to change (see Program Listing 1).

The configuration variables are as follows:

 Variable           Description

 $CONNECT_TIMEOUT   Used by the -c option. Specifies how long, in
                    seconds, we should wait for a successful
                    connection to a machine before assuming it is
                    down. Set this to something short: if a host
                    doesn't respond in 10 seconds or so, chances
                    are it's not going to, or it has other
                    problems that you may want to look at
                    personally.

 $EXCLUDE_FILE      The location of the file containing a list of
                    hosts to exclude from masshosts runs. Host
                    exclusion is explained in detail, below.

 $FILTER_FILE       The file that maps keywords to filters. This
                    is explained in detail, below.

 $GETHOSTS_PL       The location of the Perl library file
                    containing your "getHosts" subroutine. It is
                    included in masshosts via a "require"
                    directive.

 $RSH_CMD           The command name to use when performing an
                    rsh to a remote host. Useful for specifying,
                    for example, ssh instead of rsh.
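
As a rough illustration, a configuration file that sets these five variables
might look like the following. Every path and value here is a placeholder
chosen for the example, not a default shipped with masshosts.

```perl
# masshosts.pl: sample configuration (all values are placeholders)

$CONNECT_TIMEOUT = 10;      # seconds to wait for a connection with -c
$EXCLUDE_FILE    = '/usr/local/etc/masshosts.exclude';
$FILTER_FILE     = '/usr/local/etc/masshosts.filters';
$GETHOSTS_PL     = '/usr/local/lib/masshosts/gethosts.pl';
$RSH_CMD         = '/usr/bin/rsh';   # or the path to ssh

1;  # a file loaded with "require" must return a true value
```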



The filter configuration file
-----------------------------

The filter configuration file defines the associations between keywords and
filters. When a keyword is specified on the command line, masshosts
consults the filter file for that keyword, and returns each matching
filter. The format of the filter file is relatively simple, consisting of two
"fields" that are white-space separated. Blank lines are ignored, and #
signs denote comments (in-line comments are allowed). The first field
contains the filter, and the second field contains a pipe-separated list of
keywords that will match that filter. This keyword list is actually treated
as a regular expression, and every keyword expression in the filters file
is matched against each keyword specified on the masshosts command line.
Keywords match on whole-words only.

For example, if the filter configuration file contained the following
lines:

 sun\d+             sparc|sunos       # Sun sparcs running SunOS

 sunsol\d+          sparc|solaris     # Sun sparcs running Solaris

 x86sol\d+          x86|solarisx86    # Intel PC's running Solaris x86

 x86lin\d+          x86|linux         # Intel PC's running Linux

then the keyword "sparc" would match lines one and two, but not lines three
and four. The keyword "solaris" would match line two, but not lines one,
three or four.

Note that the matches from each keyword are combined together to form the
final list. The masshosts command line "masshosts sparc x86" would match
all four lines in the filter configuration file, and return the list of
hosts that matched each of these four filters. Another way of looking at
this is that the keywords themselves are OR'd together, so that adding more
keywords potentially gives you more matches, and hence more machines in the
list.
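
The lookup described above can be sketched in a few lines of Perl. This is
not the masshosts source: the subroutine name is hypothetical, and anchoring
the keyword expression with ^ and $ is just one way of getting whole-word
matches.

```perl
use strict;
use warnings;

# Return the filters whose keyword expression matches any of @keywords.
# $lines is a reference to an array of filter-file lines.
sub filters_for_keywords {
    my ($lines, @keywords) = @_;
    my @filters;
    for my $line (@$lines) {
        (my $text = $line) =~ s/#.*//;          # strip comments
        my ($filter, $kw_expr) = split ' ', $text;
        next unless defined $kw_expr;           # blank or incomplete line
        # Anchor the expression so keywords match on whole words only.
        push @filters, $filter
            if grep { /^(?:$kw_expr)$/ } @keywords;
    }
    return @filters;
}

my @filter_file = (
    'sun\d+      sparc|sunos     # Sun sparcs running SunOS',
    'sunsol\d+   sparc|solaris   # Sun sparcs running Solaris',
    'x86sol\d+   x86|solarisx86  # Intel PCs running Solaris x86',
    'x86lin\d+   x86|linux       # Intel PCs running Linux',
);

print join(' ', filters_for_keywords(\@filter_file, 'solaris')), "\n";
print join(' ', filters_for_keywords(\@filter_file, 'sparc', 'x86')), "\n";
```

The first call returns only the sunsol filter, because "solaris" is not a
whole-word match for "solarisx86"; the second returns all four filters.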


SPECIAL FEATURES
================

Masshosts has several special features to improve its overall performance.
These features serve to reduce its run time, make it less susceptible to
network outages and exclude certain hosts from the final match list,
regardless of the keywords or filters that were specified on the command
line.

Executing commands on remote hosts vs. local host
-------------------------------------------------

For each host that your masshosts query returns, you have the opportunity to
run a command, with arguments, via the -e parameter. This command can be
executed either locally (the default) or on the remote host through an
rsh(1), if the -r switch is specified.

If the string %HOST% appears in either the command or the argument list, it
will be replaced by the current hostname in masshosts' execution queue.
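
A minimal sketch of this substitution, using a hypothetical helper name and
a made-up command, might be:

```perl
use strict;
use warnings;

# Replace every literal %HOST% in a command string with the hostname.
sub expand_host {
    my ($cmd, $host) = @_;
    $cmd =~ s/%HOST%/$host/g;
    return $cmd;
}

print expand_host('ping -c 1 %HOST% > /tmp/ping.%HOST%', 'ss102'), "\n";
```

This is what makes local commands useful: the command runs on your own
machine, but takes each target hostname as an argument.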

Parallel execution
------------------

The -p switch is arguably masshosts' most powerful option. Rather than
waiting for a single command to complete before starting a new one,
masshosts will spawn the desired number of processes (N) so that they run in
parallel. As each child process exits and is collected, masshosts will
spawn a new process in its place, always keeping N processes active at any
given time.

Processes can run either on the local machine, or via the -r switch, on the
remote host through an rsh(1). Note that you must be careful when
specifying the parameter to the -p switch so that you do not overload your
local machine. Commands that run locally can easily consume so many CPU
cycles that your performance worsens rather than improves. Unless your jobs
are going to be spending a significant amount of time waiting for something
(e.g., I/O), keep the number of parallel processes small.

When using the -r switch, however, you can jack up the -p parameter to
fairly large values (25 and 30 are not unreasonable numbers). In terms of
local CPU, rsh(1) processes are relatively cheap.
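
The "keep N jobs active" technique can be sketched with fork and wait. The
code below is an illustration only, not the masshosts implementation: the
subroutine name is hypothetical, and the job simply runs true(1) locally
where a real run would invoke rsh.

```perl
use strict;
use warnings;

# Run $job->($host) in a child process for every host, keeping at most
# $max children alive at once. Returns a hash of host => wait status.
sub run_parallel {
    my ($max, $hosts, $job) = @_;
    my @queue = @$hosts;
    my (%pid_to_host, %status);

    while (@queue or %pid_to_host) {
        # Fill any free slots.
        while (@queue and keys(%pid_to_host) < $max) {
            my $host = shift @queue;
            my $pid  = fork();
            die "fork failed: $!" unless defined $pid;
            if ($pid == 0) {                 # child
                exit $job->($host);
            }
            $pid_to_host{$pid} = $host;      # parent
        }
        # Block until one child exits, then free its slot.
        my $pid = wait();
        last if $pid == -1;
        my $host = delete $pid_to_host{$pid};
        $status{$host} = $? if defined $host;
    }
    return %status;
}

my %result = run_parallel(3, [map { "host$_" } 1 .. 8], sub {
    my ($host) = @_;
    return system('true') >> 8;   # stand-in for: rsh $host command
});
print scalar(keys %result), " jobs collected\n";
```

Because the parent only refills a slot after collecting a child, the number
of live children never exceeds the limit, no matter how long the host list
is.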

Avoiding rsh(1) timeouts
------------------------

Perhaps the biggest source of potential delays in a masshosts run is rsh(1)
timeouts. In every environment, there are bound to be machines that are
down, not responding to network requests and possibly even off the net
entirely, but still in the hosts file or the local hosts database.
Masshosts, of course, has no way of knowing which machines are up, or are
supposed to be, and will blindly attempt to connect to every host in its
execution queue. For the machines that it can't reach, however, the end
result is that rsh(1) will hang, waiting for either a connection or a
timeout. These timeouts can take anywhere from one to two minutes to occur,
depending on your local OS.

With the -c switch, however, you can attempt to avoid rsh(1) timeouts. When
specified, masshosts will first check the network connectivity to the
remote host by attempting to connect to that host's shell port. If a
connection is not established within $CONNECT_TIMEOUT seconds, masshosts
assumes that the host is unreachable or unusable, and will not attempt an
rsh(1). This feature is implemented through Graham Barr's IO::Socket
package.
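
A connectivity check of this kind can be sketched with IO::Socket::INET. The
subroutine name is hypothetical; port 514 is the conventional "shell" port
used by rsh, and the timeout argument plays the role of $CONNECT_TIMEOUT.

```perl
use strict;
use warnings;
use IO::Socket::INET;

# True if a TCP connection to $host:$port succeeds within $timeout seconds.
sub host_reachable {
    my ($host, $port, $timeout) = @_;
    my $sock = IO::Socket::INET->new(
        PeerAddr => $host,
        PeerPort => $port,
        Proto    => 'tcp',
        Timeout  => $timeout,
    );
    return 0 unless $sock;
    close $sock;
    return 1;
}

# Example call (commented out so the sketch does not touch the network):
# print "ss102 is up\n" if host_reachable('ss102', 514, 10);
```

A host that refuses the connection fails the check immediately, while a host
that is down or off the net fails it after the timeout, which is far shorter
than waiting for rsh itself to give up.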

Of course, just because a machine is up, that doesn't mean that it can be
rsh'd to successfully. A down fileserver, busy CPU and a variety of other
problems can prevent a machine that is up and alive from actually executing
your commands once you have connected to it. The -c switch won't help you
in these circumstances, but the -t switch will.

Execution time limits
---------------------

When running commands in parallel, you can specify a time limit on overall
command execution in order to prevent stalled machines from tying up the
execution queue. The -t switch specifies, in seconds, the time limit for
the command to complete, including the rsh itself. This is implemented via
a call to alarm(2). Be careful when using this option, particularly if you
are executing on multiple machine types and/or speeds: set the time limit
according to the projected execution time of your slowest host.
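
One way to build such a limit on alarm is sketched below; it is an
assumption about the general approach, not a copy of the masshosts code. The
child sets an alarm and then execs the command. A pending alarm survives
exec, so a command that overruns is killed by SIGALRM. The subroutine names
are hypothetical, and sleep(1) stands in for a stalled rsh.

```perl
use strict;
use warnings;
use POSIX qw(SIGALRM);

# Run @cmd in a child with a time limit. Returns the wait status ($?);
# a child killed by the alarm has SIGALRM in the low seven bits.
sub run_with_limit {
    my ($limit, @cmd) = @_;
    my $pid = fork();
    die "fork failed: $!" unless defined $pid;
    if ($pid == 0) {
        alarm $limit;            # pending alarms survive exec
        exec @cmd or exit 127;   # only reached if exec fails
    }
    waitpid($pid, 0);
    return $?;
}

# True if the wait status says the child died from SIGALRM.
sub timed_out { return ($_[0] & 127) == SIGALRM }

my $status = run_with_limit(1, 'sleep', '5');
print timed_out($status) ? "timed out\n" : "finished\n";
```

Since the limit covers the whole child process, it includes connection
setup as well as the remote command, which is why it should be sized for
the slowest host.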

Host exclusion
--------------

With the -x or -X switches, you tell masshosts to exclude any hostname that
appears in a "host exclusion" file: -x uses the default file named by
$EXCLUDE_FILE, while -X takes the file name as its argument. This file has
a very simple format:
one hostname per line, with no added spaces and no comments. Hostnames must
be an exact match.
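
Exact-match exclusion is naturally expressed as a hash lookup. The sketch
below uses hypothetical subroutine names and an in-memory string in place of
the real exclusion file.

```perl
use strict;
use warnings;

# Read one hostname per line from $fh and return a lookup hash.
sub read_exclusions {
    my ($fh) = @_;
    my %excluded;
    while (my $host = <$fh>) {
        chomp $host;
        $excluded{$host} = 1 if length $host;
    }
    return %excluded;
}

# Keep only the hosts that are not in the exclusion hash (exact match).
sub apply_exclusions {
    my ($hosts, $excluded) = @_;
    return grep { !$excluded->{$_} } @$hosts;
}

# An in-memory file stands in for the real exclusion file.
my $sample = "ss102\nhp903\n";
open my $fh, '<', \$sample or die "open: $!";
my %excluded = read_exclusions($fh);
close $fh;

print join(' ', apply_exclusions([qw(rs006 ss102 ss1020 hp903)], \%excluded)), "\n";
```

Note that ss1020 survives even though ss102 is excluded: the comparison is on
the whole hostname, not on a prefix or pattern.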

IP address matching
-------------------

Specifying the -n switch and its argument allows you to restrict the
masshosts host list to machines whose IP addresses match the given regular
expression. All known IP addresses for each machine in the host list are
checked against this regular expression for a match. The upside of this
technique is that multi-homed hosts will be included if one of their
interfaces matches the regex. The downside is that it slows down masshosts,
since a gethostbyname(3) call is made for each hostname in the host list:
the longer the host list, the slower the process.

This switch is useful when you only want to hit machines on a specific
network or subnet.
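
The check can be sketched as follows, with hypothetical subroutine names and
IPv4 addresses only. The example pattern also shows why care is needed with
-n: the dots are escaped and the expression is anchored, because an
unescaped, unanchored "10.1.2" would also match an address such as
10.152.3.4.

```perl
use strict;
use warnings;
use Socket qw(inet_ntoa);

# All IPv4 addresses that the resolver knows for $host, as dotted quads.
sub host_addresses {
    my ($host) = @_;
    my (undef, undef, undef, undef, @addrs) = gethostbyname($host);
    return map { inet_ntoa($_) } @addrs;
}

# True if any address in @ips matches the regular expression $net_expr.
sub any_ip_matches {
    my ($net_expr, @ips) = @_;
    return scalar grep { /$net_expr/ } @ips;
}

# A multi-homed host is kept if any one of its interfaces matches.
my $net_expr = '^10\.1\.2\.';
print any_ip_matches($net_expr, '192.168.5.9', '10.1.2.77')
    ? "match\n" : "no match\n";

# In a real run the addresses would come from the resolver:
# my $keep = any_ip_matches($net_expr, host_addresses('ss102'));
```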


WRITING THE CUSTOM HOST-MATCHING SUBROUTINE
===========================================

At the heart of masshosts lies the subroutine that actually takes a list of
filters and uses it to generate a list of hostnames that meet the
conditions specified by one or more of those filters. This subroutine is
named "getHosts", and it must be supplied by you.

Because every environment names its hosts differently, it is not possible
to provide a generic getHosts subroutine that will work for everyone. Some
environments, for example, may choose a naming convention for their hosts
where the hostname identifies the type of machine. (Our environment takes
this approach: all RS6k's are named with "rs" followed by a numerical
suffix, all Solaris machines are "ss" followed by a numerical suffix, and
so on). Other administrators may maintain a host information database,
whether it be as simple as a flat text file, or as complex as a formal SQL
database, where machine configuration information is stored for each host
on the network.

The purpose of providing a standard API to the getHosts subroutine, and not
including it as a part of masshosts itself, is to allow you, the systems
administrator, to easily integrate masshosts into your environment. By
writing your own getHosts routine, you choose how to translate filters to
hostnames, providing ultimate flexibility.

This section describes the API for the getHosts subroutine, and provides
two samples that you can either incorporate with few or no changes, or use
as the basis for building your own.

API
---

The API for the getHosts subroutine is quite simple: only two arguments are
passed, and both are references to arrays. The getHosts subroutine itself
is stored in a file named by $GETHOSTS_PL, and is included into the Perl
script via a "require" directive.

The first argument, which we'll call "$arefHosts", is a reference to an
array containing the hosts that we have found in the getHosts subroutine.
The second argument, which we'll call "$arefFilters", is a reference to an
array containing our filters. Each host that matches one of the filters in
@$arefFilters should be pushed onto the array @$arefHosts.
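
To make the calling convention concrete, here is a deliberately trivial
getHosts together with the caller's side. The hard-coded host list is purely
hypothetical, and the text above does not say what the routine should
return, so this sketch returns nothing and communicates only through the
array reference.

```perl
use strict;
use warnings;

# Hypothetical host inventory; a real getHosts would consult /etc/hosts,
# NIS, or a database instead.
my @KNOWN_HOSTS = qw(rs006 rs007 ss102 hp903);

# getHosts(\@hosts, \@filters): push each host that matches at least
# one filter onto the caller's array.
sub getHosts {
    my ($arefHosts, $arefFilters) = @_;
    for my $host (@KNOWN_HOSTS) {
        push @$arefHosts, $host
            if grep { $host =~ /^(?:$_)$/ } @$arefFilters;
    }
    return;
}

# Caller's side: both arguments are passed by reference.
my @hosts;
my @filters = ('rs\d+', 'hp\d+');
getHosts(\@hosts, \@filters);
print "@hosts\n";
```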

The following two examples show two different implementations of the
"getHosts" subroutine, and will be discussed in detail.

Example 1: Matching hostnames in /etc/hosts (or hosts.byname)
-------------------------------------------------------------

Listing 2.1 shows code for a getHosts subroutine that performs the simplest
form of hostname matching: each filter is a regular expression, and those
regular expressions are matched against the hostnames in /etc/hosts or the
hosts.byname NIS map.

This function assumes that you have some sort of naming convention that
identifies machine type according to its hostname. As previously mentioned,
our environment uses this technique in naming hosts, where the hostname
consists of an alphabetic "tag" indicating its architecture and OS,
followed by a numerical sequence number (rs006, ss102, hp903, etc.). With
this arrangement, it is possible to create a regular expression that
matches, say, all RS6k's or HP workstations, and the performance is on par
with the grep family of commands. The drawback, though, is that you are
limited by the granularity of your naming convention. Our hostnames, for
example, don't differentiate between HP-UX 9.x and HP-UX 10.x, so when we
ask for HP machines, we get all of them regardless of their OS level.

Given this naming convention, our filter configuration file might look like
Figure 1.1. Providing the keyword "aix" on the masshosts command line would
correspond to the filter "ibm\d+". If we specified "solaris" as a keyword,
we would get two filters: "sunsol\d+" and "sunafs\d+".

The getHosts function in Listing 2.1 takes these filters from @$arefFilters
and forms a single regular expression of the form:

          \b(filter1|filter2|filter3)\b

Using our above examples and Figure 1.1, then, the keyword "aix" would
generate the regular expression "\b(ibm\d+)\b", and the keyword "solaris"
would generate "\b(sunsol\d+|sunafs\d+)\b". The \b designations help
prevent unintended matches: if we were, for example, to specify "nfs" as
our keyword, we would not want the regular expression "fs\d+" to also match
our AFS servers, whose naming convention is "sunafs\d+".

Each line of the hosts file (or the hosts.byname NIS map if $USE_NIS is
set) is then matched against this regular expression. If a match is
successful, the matching hostname is pushed onto the @$arefHosts array.
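
The approach just described can be sketched as follows. This is an
independent illustration, not a reproduction of Listing 2.1: the helper name
getHostsFromHandle and the sample data are hypothetical, and the NIS case
(reading the hosts.byname map instead of the file) is left out.

```perl
use strict;
use warnings;

# Match every line read from $fh against \b(filter1|filter2|...)\b and
# push the hostname that matched onto @$arefHosts.
sub getHostsFromHandle {
    my ($fh, $arefHosts, $arefFilters) = @_;
    return unless @$arefFilters;
    my $alternation = join '|', @$arefFilters;
    my $regex = qr/\b($alternation)\b/;
    while (my $line = <$fh>) {
        $line =~ s/#.*//;                 # ignore comments
        push @$arefHosts, $1 if $line =~ $regex;
    }
    return;
}

# The two-argument routine that masshosts expects, reading /etc/hosts.
sub getHosts {
    my ($arefHosts, $arefFilters) = @_;
    open my $fh, '<', '/etc/hosts' or die "/etc/hosts: $!";
    getHostsFromHandle($fh, $arefHosts, $arefFilters);
    close $fh;
    return;
}

# Sample data in hosts-file format.
my $sample = <<'EOF';
10.1.2.6    ibm006
10.1.2.12   sunsol012
10.1.3.4    sunafs004
10.1.4.9    fs009
EOF

open my $fh, '<', \$sample or die "open: $!";
my @hosts;
getHostsFromHandle($fh, \@hosts, ['fs\d+']);
close $fh;
print "@hosts\n";
```

With the filter "fs\d+" only fs009 is returned; sunafs004 is skipped because
there is no word boundary between the "a" and the "fs".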

Example 2: Querying a host information database
-----------------------------------------------

Listing 2.2 shows how masshosts could be used to query an LDAP database,
using Graham Barr's Net::LDAP Perl module, which can be obtained from the
Perl-LDAP page on the Web at http://www.connect.net/gbarr/perl-ldap/. LDAP
was chosen for this example because both the Perl modules and the
University of Michigan's LDAP server implementation (see
http://www.umich.edu/~dirsvcs/ldap/) are freely available.

The advantages of using a database query to match hostnames are numerous,
and are limited only by the amount and type of data you choose to store for
your machines. For the purposes of this example, let's assume that our
records have the following structure (in reality, there would be additional
fields required by the LDAP database, but we'll leave them out for
simplicity):

          dn: hostname=fs5, o=ourcompany
          arch: aushp
          hostname: fs5
          o: ourcompany
          osname: SunOS
          osrelease: 4.1.4
          osversion: 1

The attributes could be generated using uname -a, and then updated into the
database. We could then use masshosts to allow queries against this
information, creating a filter configuration file similar to Figure 1.2.

One could even expand the attribute list for a machine's entry, allowing
storage of IP address, total RAM, CPU type, ethernet address and even which
NIS server the host is bound to (provided you update records often, of
course). The more data you choose to store, the more granularity you have
in selecting the machines for your masshosts run. For truly complex
queries, one could skip using keywords altogether, and instead use the -f
or -F options to specify search filters directly, giving you the ability to
make customized queries on the fly.
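
A getHosts routine along these lines might look like the sketch below. It is
not Listing 2.1 or 2.2: the server name is a placeholder, the filters are
assumed to be ordinary LDAP search filters over the attributes shown in the
sample record, and the or_filter helper is hypothetical. It uses the
Net::LDAP interface as it exists today, which may differ in detail from the
version available when this article was written.

```perl
use strict;
use warnings;

# Combine several LDAP search filters into one OR filter: (|(a)(b)...).
# Filters not already wrapped in parentheses are wrapped first.
sub or_filter {
    my @filters = map { /^\(.*\)$/ ? $_ : "($_)" } @_;
    return @filters == 1 ? $filters[0] : '(|' . join('', @filters) . ')';
}

# getHosts backed by an LDAP directory. Server name and base DN are
# placeholders; Net::LDAP is loaded only when the routine is called.
sub getHosts {
    my ($arefHosts, $arefFilters) = @_;
    require Net::LDAP;
    my $ldap = Net::LDAP->new('ldap.example.com') or die "connect: $@";
    $ldap->bind;                                   # anonymous bind
    my $mesg = $ldap->search(
        base   => 'o=ourcompany',
        filter => or_filter(@$arefFilters),
        attrs  => ['hostname'],
    );
    die $mesg->error if $mesg->code;
    push @$arefHosts, map { $_->get_value('hostname') } $mesg->entries;
    $ldap->unbind;
    return;
}

print or_filter('osname=SunOS', '(&(osname=HP-UX)(osrelease=B.10.20))'), "\n";
```

Combining the filters into a single OR expression means one search request
per masshosts run, which mirrors the way multiple filters are OR'd together
elsewhere in masshosts.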


INSTALLATION
============

To install masshosts:

  1. Install the masshosts script in the desired location (/usr/local/bin,
     etc.)
  2. Change the hardcoded "require" line of masshosts to reflect the
     location of your config file
  3. Create your filters file
  4. Create the getHosts subroutine
  5. Create your config file, masshosts.pl


CONCLUSION
==========

Masshosts is a powerful and flexible tool for accessing large numbers of
machines. The custom "getHosts" subroutine makes it easy to integrate
masshosts into your existing environment, and allows you to define
host-matching filters that range from simple pattern matches to complex
database queries. The performance boosts of parallel process execution can
save you hours of execution time, and make it possible to run commands on a
large number of hosts in a reasonable amount of time, with little or no
operator babysitting. These, combined with its other features, make
masshosts an invaluable tool for administering large computing
environments.


ABOUT THE AUTHOR
================

John Mechalas has a B.S. and M.S. in Aeronautical and Astronautical
Engineering from Purdue University. He has worked at Intel Corporation for
four years, where he currently manages a UNIX systems administration and
security team for a large microprocessor design site. He can be reached at
johnm@ichips.intel.com.

