Dynamic Round Robin
 Jonathan D. Leghart
              As more organizations become reliant on Web servers for day-to-day 
              operations, systems administrators are faced with the task of ensuring 
              that the company site is always available. Although there are several 
              products that build clusters or actively balance a load across multiple 
              machines, sometimes the expense or complexity can be prohibitive.
              DNS Round Robin has been around for quite a while and is still 
              widely used. The concept is straightforward -- for a single 
              hostname, create multiple address records. The BIND server will 
              then return a list of addresses to the requesting resolver in rotating 
              succession. Using round robin has some advantages -- it is very 
              easy to configure and inexpensive to implement. However, since BIND 
              was not designed to actively monitor hosts, using round robin does 
              not provide a true load-balancing solution. For example, if you 
              have four Web servers configured in a round robin and one server 
              goes down, the name server has no knowledge of the unavailable server 
              and will continue to return that address. Thus, roughly one fourth of 
              your visitors will be directed to a server that cannot respond.
              With the release of BIND 8, several new features were added including 
              the ability to configure dynamic DNS zones. Dynamic DNS is often 
              used in environments where clients use DHCP for network configuration. 
              It allows the systems administrator to keep accurate tables without 
              constantly assigning static IPs and updating zone files. Of course, 
              if a systems administrator wants to update a zone, a simple utility 
              called nsupdate can be used to add or remove records, including 
              those for a round robin configuration.
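As a rough illustration of what such an update looks like, the sketch below (in Python, not the Perl used later in this article) builds the text one might feed to nsupdate on standard input. The helper name build_update, the host name, and the addresses are hypothetical. The "update add" and "update delete" lines follow the format nsupdate accepts, and a blank line ends the request.

```python
def build_update(action, name, ttl, address):
    """Return nsupdate input that adds or deletes one A record."""
    if action == "add":
        line = f"update add {name} {ttl} A {address}"
    elif action == "delete":
        # A delete names the exact record to remove; no TTL is needed.
        line = f"update delete {name} A {address}"
    else:
        raise ValueError(f"unknown action: {action}")
    # The trailing blank line tells nsupdate to send the request.
    return line + "\n\n"


# Hypothetical round robin name and addresses.
print(build_update("add", "www.rr.foo.bar.", 60, "192.168.1.10"), end="")
print(build_update("delete", "www.rr.foo.bar.", 60, "192.168.1.11"), end="")
```

Keeping the text generation separate from the call to nsupdate makes it easy to inspect exactly what would be sent before any zone is changed.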
              Webwatch.pl is a simple Perl script that watches over your Web 
              servers and adds or removes them from a dynamic DNS zone, 
              depending on whether each server in the round robin configuration 
              is available (Listing 1). (Listings are available from 
              the Sys Admin Web site: www.sysadminmag.com.) The script is 
              configured with a list of servers and groups (round robins). It reads 
              each server in the list, checks whether it is currently a part of 
              the round robin group, and then attempts to connect to port 80 on 
              the server. From these two tests, there are four possible outcomes:
              
              
             
                If the server is already in the round robin and a connection 
                is made, nothing is done; 
                If the server is in the round robin but a connection cannot 
                be made to port 80, the script will remove it from the round robin; 
                If the server was not in the round robin but a connection can 
                be made, it will be added back; 
                If the server is not in the round robin and no connection is 
                established, no DNS changes will be made.
              Configuring the Zone
              The first step to create a dynamic round robin is to configure 
              a zone that allows updates. Although it is easy to make a zone dynamic, 
              I don't like to make my entire domain dynamic. Instead, I configure 
              a subdomain specifically for dynamic, round robin configurations. 
              In my parent zone file, I can then create CNAME records to point to 
              the round robin in my dynamic zone. The configuration would look 
              something like this:
              
             
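In /etc/named.conf:
    zone "foo.bar" IN {
        type master;
        file "db.foo.bar";
    };
    zone "rr.foo.bar" IN {
        type master;
        file "dynamic/db.rr.foo.bar";
        // host that is allowed to send updates
        allow-update { 192.168.1.5; };
    };
In the db.foo.bar zone file:
    ; the trailing dot keeps the target from being expanded relative to foo.bar
    www        CNAME        www.rr.foo.bar.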
Now a starting zone file for rr.foo.bar will need to be created. 
            The zone file need only contain a base set of information, including 
            a default TTL, SOA information, and NS and MX records. It should look 
            something like this: 
             
$TTL  86400
@     IN    SOA    rr.foo.bar.    hostmaster.foo.bar. (
            2001070100   ; Serial
            10800        ; Refresh
            3600         ; Retry
            604800       ; Expire
            86400 )      ; Min TTL
            NS    ns1.foo.bar.
            NS    ns2.foo.bar.
            MX    10 mail.foo.bar.   ; MX needs a preference value; 10 is an assumed example
Since the script will be adding and removing hosts, the initial file 
            does not need to contain any host information. You will, however, 
            need to be sure that this zone file is writable by the UID that 
            named runs as, because named itself rewrites the file when it 
            applies updates.
              Script Configuration
              After the zone is set up, the script will need some minor configuration 
              changes. The first section of the script defines all the local values 
              for each installation. The value for $domain will be the 
              newly created dynamic domain (rr.foo.bar, in this case). 
              The $ttl value is the timeout value for positive responses 
              from the BIND server. (It is important to understand that with BIND 
              8, the named server recognizes different TTL values for positive 
              and negative responses for a zone or even a particular host.) For 
              any installation, this value should not exceed the amount of time 
              between Web server checks; otherwise, a caching name server may 
              keep an address cached even if the script has removed it from the 
              round robin.
              The next value, $timeout, is the number of seconds the 
              script will wait for a connection with a single Web server before 
              assuming the system is unavailable. This number, multiplied by the 
              total number of hosts the script will be checking, should not exceed 
              the time interval between checks.
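The two timing rules above reduce to simple arithmetic, sketched here in Python. The function name timing_problems and the sample numbers are hypothetical; the rules themselves are the ones just described: the TTL should not exceed the check interval, and the timeout multiplied by the number of hosts should not exceed it either.

```python
def timing_problems(ttl, timeout, host_count, interval):
    """Return a list of rule violations; an empty list means the values fit."""
    problems = []
    # Worst case: every host is down and each check waits the full timeout.
    worst_case = timeout * host_count
    if worst_case > interval:
        problems.append(
            f"worst-case run of {worst_case}s exceeds the {interval}s interval"
        )
    if ttl > interval:
        problems.append(f"TTL of {ttl}s exceeds the {interval}s interval")
    return problems


# Hypothetical setup: 4 hosts, 10-second timeout, 60-second TTL, checks every 5 minutes.
print(timing_problems(ttl=60, timeout=10, host_count=4, interval=300))

# Hypothetical setup that breaks both rules: 8 hosts checked every minute with a one-day TTL.
print(timing_problems(ttl=86400, timeout=10, host_count=8, interval=60))
```

In the first case the worst-case run is 40 seconds against a 300-second interval, so nothing is reported. In the second, 80 seconds of possible waiting does not fit into a 60-second interval, and the cached answers would outlive several checks.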
              The next two values, $logfile and $nsupdate, are the paths to 
              the log file and to the nsupdate program. You can run this script as 
              any user, because the named server only cares where a DNS update 
              is coming from, not who is making it. You will, however, need to 
              be sure that the logfile is writable by the UID executing the script.
              The next value is a hash that defines your servers and the round 
              robin set to which they belong. For each IP address you want the 
              system to check, you must associate a hostname. For example, say 
              you have www.domain.com for your primary content, and images.domain.com 
              -- a set of servers for those bandwidth-hogging pictures. Here 
              you can define all the IPs of those servers and their appropriate 
              group (www or images). If you have several groups of servers, you 
              may want to consider running multiple scripts. Remember that the 
              interval between script runs should be greater than the number of 
              hosts you are checking in one script multiplied by the $timeout 
              value.
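To picture the structure, here is a rough Python equivalent of such a hash, together with a helper that groups the addresses by round robin name. The addresses, the variable names, and the group_members helper are all hypothetical and are not taken from the script.

```python
from collections import defaultdict

# Hypothetical addresses; each IP maps to the round robin it belongs to.
servers = {
    "192.168.1.10": "www",
    "192.168.1.11": "www",
    "192.168.1.20": "images",
}


def group_members(servers):
    """Invert the IP-to-group mapping into group -> list of IPs."""
    groups = defaultdict(list)
    for ip, group in servers.items():
        groups[group].append(ip)
    return dict(groups)


domain = "rr.foo.bar"  # the dynamic zone from the earlier example
for group, ips in sorted(group_members(servers).items()):
    print(f"{group}.{domain}: {', '.join(ips)}")
```

Counting the entries in this mapping also gives the host count needed for the interval calculation.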
              The Script at Work
              The rest of the script is straightforward. To make troubleshooting 
              and customization easy, each major function was broken out:
              
              CheckConfig -- Ensures all the values in the initial configuration 
              section are usable. This includes ensuring the log file is writable 
              and that nsupdate is executable.
              CheckDNS -- Determines whether a server's IP is a part 
              of a round robin set. It uses the standard Perl function gethostbyname 
              to get the list of IPs for the round robin host.
              CheckHTTP -- Attempts to create a TCP connection to a server 
              using port 80. It will wait up to the number of seconds specified 
              in $timeout for the connection to complete; otherwise, it returns 
              failure. For implementations that require more than just a connection to 
              the IP (e.g., monitoring virtual servers), this section could be 
              modified to actually request data from the HTTP server and perform 
              some sort of validation to determine whether to return success or 
              failure.
              ModDNS -- The routine that interacts with the nsupdate 
              command. It simply reads all the parameters passed into the function 
              and feeds them into nsupdate.
              Logger -- Used to create entries in the webwatch.log 
              file. For those wanting to log to syslog, this routine could be 
              modified. It could also be changed to only log negative results. 
              For the truly dedicated, you can even set this up to send a pager 
              message any time a system is dropped from the round robin.
              TimeStamp -- A simple routine that will format the time for 
              the Logger routine.
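To show how the two checks combine into the four outcomes listed earlier, here is a minimal Python sketch. It is not the Perl from Listing 1. The function names mirror the routines above but are otherwise hypothetical, and the port parameter is an addition for testing.

```python
import socket


def check_dns(ip, rr_name):
    """True if ip is among the addresses the resolver returns for rr_name."""
    try:
        _name, _aliases, addresses = socket.gethostbyname_ex(rr_name)
    except OSError:
        return False  # the name does not resolve, so nothing is in the set
    return ip in addresses


def check_http(ip, timeout, port=80):
    """True if a TCP connection to ip:port completes within timeout seconds."""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False  # refused, unreachable, or timed out


def decide(in_round_robin, reachable):
    """Map the two test results to 'add', 'delete', or None (no change)."""
    if in_round_robin and not reachable:
        return "delete"
    if reachable and not in_round_robin:
        return "add"
    return None  # either up and listed, or down and already absent
```

A caller would run check_dns and check_http for each configured address and pass the two results to decide, sending an update only when the answer is not None.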
              Running the Script
              Now that the script is set up and your dynamic zone is ready to 
              go, it's time to run the script. A simple entry in cron will 
              take care of that. Don't forget the magic formula, however: 
              the interval between script runs should be greater than the 
              total number of hosts being checked multiplied by the timeout value.
              Assuming the system running the script can see all of your servers, 
              your dynamic zone should start to populate. Running nslookup will 
              allow you to see whether the entries are showing up. Once you have 
              confirmed that the script is running, there are a few maintenance 
              tasks that you will need to perform. Obviously, you will need to 
              check the log file often for errors. You should also rotate it, 
              so it doesn't get unmanageable. As with any important service, 
              you will want to periodically check to be sure the script hasn't 
              unexpectedly died.
              Some Final Notes on Implementation
              If you decide to use this solution to manage your systems, there 
              are a few things to consider when designing your network. Be careful 
              that your monitoring system won't be on a network that may 
              lose connectivity with the Web servers, yet remain connected to 
              the DNS server. In that case, the script would remove all of your 
              Web servers from the round robin, even though they may still be 
              available to the rest of the net.
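One possible guard against this failure mode, which is not part of the script as described here, is to skip removals when every member of a round robin looks down at once, on the assumption that the monitor is more likely to be cut off than the whole server farm. The Python sketch below is hypothetical, including the function name and the sample addresses.

```python
def removals_to_apply(members, unreachable):
    """Return the members to drop, or none if every member looks down."""
    down = [ip for ip in members if ip in unreachable]
    if members and len(down) == len(members):
        # All servers failing together suggests the monitor lost connectivity.
        return []
    return down


# Hypothetical round robin of three servers.
members = ["192.168.1.10", "192.168.1.11", "192.168.1.12"]
print(removals_to_apply(members, {"192.168.1.11"}))
print(removals_to_apply(members, set(members)))
```

The trade-off is that a genuine outage of every server leaves dead addresses in the zone, so a guard like this is best paired with an alert.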
              Another point to consider is what may happen if a particular server 
              becomes overloaded. I recently suggested this script to a 
              client who relied on round robin to balance the company's Web 
              load (more than two million visitors a day) over several servers 
              located across the United States. The client made the point that 
              sometimes servers just get overloaded, which would cause timeouts 
              and result in a server that is up (but very busy) being removed 
              from the round robin. While that is true, you could also argue that 
              removing the server from the round robin would help reduce the load. 
              Then, when the server became less loaded, it would automatically be 
              put back into the round robin. In either case, it's important 
              to be sure you have some other mechanism in place to monitor the 
              health of your systems.
              Finally, although I "home-grow" many solutions to make 
              day-to-day administration easier (not to mention keeping my pager 
              quiet at night), I often see situations where a few dollars spent 
              would have prevented hours of frustration and downtime. If your 
              installation requires a robust product, and your company can afford 
              it, spend the time and effort to research the right solution for 
              you.
              Resources
              BIND source and documentation are available at: http://www.isc.org/products/BIND
              Albitz, Paul, and Cricket Liu. 2001. DNS and BIND, 4th 
              Edition. O'Reilly & Associates.
              Wall, Larry, Tom Christiansen, and Jon Orwant. 2000. Programming Perl, 
              3rd Edition. O'Reilly & Associates.
              Jonathan Leghart has been messing with computers since learning 
              to program BASIC in the fourth grade. For the past five years he 
              has focused primarily on UNIX and network administration with a 
              particular interest in writing Perl scripts. He currently works 
              as a Network Systems Engineer for Lucent Worldwide Services, providing 
              consulting services to enterprise customers. Jonathan can be contacted 
              at: jonathan@leghart.org.