Web Hosting: A Migrational Case Study
Ripduman Sohan
              Hosting, the act of providing a service on behalf of an individual 
              or company, is a concept that has been around for as long as the 
              Internet. There are many types of hosting services, including Web, 
              mail, and database hosting. However, the most popular and longest-lived 
              hosting service has been Web site hosting.
              Many organizations, such as universities, commercial companies, 
              and ISPs, provide this essential service for their users or customers. 
Today, the Web is the most popular medium for retrieving information from the Internet. To ensure your material in this information cornucopia is readily available, it's essential to configure your end to handle anything your users may throw at it, without inconveniencing them or creating extra work for you.
In this article, I present a case study: the migration of 203 virtual hosts, many of which had backend databases, from one server to another. The Web server was Apache and the database MySQL, both running on FreeBSD and being transferred to Linux. I intend to show you how simply this can be done and share some of the tricks and pitfalls generally involved in setting up, running, and successfully migrating medium- to large-scale Web sites with this software. I've covered virtual hosting because that's what the original job entailed, and because I wanted to be as thorough as possible. Nevertheless, almost all of the concepts in this article should be adaptable to single Web sites and different software with little or no tweaking.
              The Scenario
The source system was a box running FreeBSD 3 on a Pentium II 266 located in San Francisco. It was connected to the Internet via a 256-Kbps link and was running Apache 1.3.1. Of the 203 virtual hosts, 30 required databases, so it also had MySQL 2.3 installed. The machine setup was incompetent -- so incompetent that the actual MySQL database was available directly off the Web. It also had no backups. The target machine was a brand new, default-installation Red Hat 6.3 machine on a T3 link in New York. I didn't have physical access to either machine and was working over a satellite link with a 700-ms lag.
              The reason for the changeover was twofold. The company was increasingly 
              aware of the insecurity and lack of power of the source machine 
              in relation to their increasing customer base, and they were also 
              getting a better deal with a new co-location provider. My job was 
              to move the whole system, with zero downtime and no loss of client 
              data.
              
            The Move
              Backup
The move started with the most important thing -- a system backup! I couldn't back up the user Web files or databases due to the high load on the system; as soon as I touched any of them, the machine became highly erratic. With 2.3 GB of user data on the system and no local means of backup, I wasn't going to transfer it all over the Internet link either. Therefore, I initially backed up just the httpd.conf file and the password and group files. Before you start a migration of your own, I advise you to check that the latest system backup is valid. You do have one, right?
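For a configuration-only backup like this, a couple of scp invocations from a remote machine are enough. A minimal sketch, assuming the stock FreeBSD locations (your Apache configuration may well live elsewhere):

# Copy just the critical configuration off the loaded source box.
# Paths are FreeBSD defaults and may differ on your system.
scp source:/usr/local/etc/apache/httpd.conf backup/
scp source:/etc/master.passwd source:/etc/group backup/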
              Analysis
              The next move was to download and install analog. This is a very 
              well-known and comprehensive Web log analyzer, available as a package 
              for most platforms. You can get started quickly with the following 
              steps:
              
1. Install analog. Use your package manager -- usually rpm -i analog.rpm on Linux.
2. Edit analog.cfg, usually found at /etc/analog.cfg. Point the LOGFILE section at your Web server's logfile and the OUTFILE section at your desired output filename.
3. Turn on the hourly report by adding the command FULLHOURLY ON to the analog.cfg file.
4. Run the binary, usually /usr/bin/analog.
              
This will create a full hourly breakdown report from your log, which you can view with any browser. I did this to build a time profile, so I would know the best time to log into the machine and copy the data. Most Web servers go through a daily cycle of use, depending on the time zone of their audience, and it's best to work when the load average is lowest to minimize disruption to the system. If you can do the migration with downtime, or without affecting the service, go ahead and skip this step.
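To make the steps concrete, here is a minimal analog.cfg sketch; the two paths are examples and should point at your own log and document tree:

# /etc/analog.cfg -- paths below are examples; adjust to your system
LOGFILE /var/log/httpd/access_log
OUTFILE /var/www/html/report.html
FULLHOURLY ON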
New System Build
After working out my optimal timing, I built the new server. If you choose to run a dedicated machine as a server, a little forethought in design can go a long way toward preventing problems. Your most important resources on a Web server are memory and disk space, so work on maximizing those. Make your Web data partition as large as possible and put it on a separate drive if necessary. Most default Linux server installations come with setups that are not really optimal for Web servers -- do you really need X Windows? What about the GIMP? Get rid of all unnecessary software. This usually frees up to 800 MB and makes software conflicts less likely. On most Red Hat-compatible Linux distributions, you can use the following commands to work with installed packages:
              
              rpm -qa -- Provides the full list of installed packages
              rpm -qi packagename -- Information on a selected package
              rpm -e packagename -- Removes the selected package
              
Next, trim memory usage. Disable (or replace with lighter equivalents) all services that don't have to be running: atd, bind, dhcpd, and Sendmail (replaceable by ssmtp) are the usual candidates. You can usually remove the packages outright or just pull their init scripts out of the startup directory. Also ensure that you have adequate swap space (usually as much as available RAM). Swap space, at least in Linux, comes in two variants: partition or file. Use the partition type -- that way, if your swap space gets corrupted, your filesystem won't be.
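On a Red Hat-style system, the trimming might look like the following sketch; the service names are the usual candidates named above, so check chkconfig --list to see what your machine actually starts:

# Stop unneeded services from starting at boot
chkconfig atd off
chkconfig named off
# Alternatively, move the init symlink out of the runlevel directory
# (the S-number prefix varies by release)
mv /etc/rc.d/rc3.d/S40atd /etc/rc.d/disabled.S40atd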
I then upgraded the "key software" -- the software on which system functionality depends. This is usually Apache and its related support software. It's worth using the latest stable version of your key software that you deem fit for consumption. (My personal method of choosing an Apache version is to use Netcraft to see what a large, busy site such as http://www.slashdot.org is running.) If you have any modules used by Apache (e.g., PHP), it's worth getting their latest versions too. This is also the time to install any third-party software you want to use; ProFTPD (the Professional FTP Daemon) and OpenSSH are popular options in this respect. If you're using a system with a package manager and you don't need any non-standard options (i.e., those requiring a source compile), get the installable package: any issues between that particular OS and the software will already have been worked out by the package maintainers.
              Performance Tuning
              The crux of the matter is configuring your software so it performs 
              well. Although I could write several articles on how to configure 
              your system, I'll only give you the main ideas behind maximizing 
              performance for Apache and, to some extent, the system.
Apache is the machine's interface to the world. Configure it poorly, and your beautiful new server with its oceans of RAM and your T3 link won't be worth anything.
              To configure it well, first, get rid of all unnecessary Apache 
              modules and add any custom ones you do want. You can do this by 
              editing the httpd.conf file and looking for lines similar 
              to:
              
             
LoadModule info_module        modules/mod_info.so
This tells Apache to load the module info_module into memory when it starts. Go through each LoadModule line of the default installation and disable every module you'll never need. (This is done by putting a # sign at the front of the line.) Typical modules rarely used on production Web servers are mod_autoindex (which creates automatic indexes for directories) and libproxy.so (the proxy caching module). You can find the complete description of each standard module in the Apache documentation. Trimming modules minimizes the memory Apache uses, because fewer loaded modules mean less memory allocated to module code. Sometimes, disabling modules can also lead to server speed increases.
If you're using Perl as a scripting language on the server, consider loading mod_perl. This eliminates starting a new instance of Perl every time a script runs, which means better response times from the server.
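As an illustration, the trimmed module section of an Apache 1.3 httpd.conf might look like the sketch below. The module names and paths follow a typical DSO build and may differ on yours; on Apache 1.3, remember to comment out the matching AddModule lines as well:

# Disabled -- not needed on this production server
#LoadModule autoindex_module  modules/mod_autoindex.so
#LoadModule proxy_module      modules/libproxy.so
# Kept -- this server's scripts use mod_perl
LoadModule perl_module        modules/libperl.so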
Next, I always modify the lines StartServers, MinSpareServers, and MaxSpareServers. These lines go together. To understand them, remember that creating a process is quite expensive in terms of time on most operating systems, and in Apache, each process is known as a server. Hence, you'll want to start a reasonable number of processes when the program starts up (the StartServers line), while keeping a reasonable number of idle processes ready to serve incoming requests before any additional ones must be created (the MinSpareServers line). Conversely, you don't want to waste memory on spare processes lying about after they've finished their work (the MaxSpareServers line). I find the values 8, 4, and 10 work well for most setups.
The next lines to modify are MaxClients and MaxRequestsPerChild. MaxClients sets the maximum number of clients that can connect to the server simultaneously. A larger number allows more concurrent connections but degrades performance under load; a smaller number means the opposite. A good compromise is a value of 200. MaxRequestsPerChild sets the number of requests each process can handle before it is forced to die. This prevents errant processes (e.g., one that leaks memory) from hogging system resources. If you're confident everything works well, you can set this value to zero, so that children never expire, for that little extra boost in performance.
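Putting the suggestions above together, the relevant httpd.conf lines would read:

StartServers          8
MinSpareServers       4
MaxSpareServers      10
MaxClients          200
# 0 = children never expire; use only once you trust every module and script
MaxRequestsPerChild   0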
              As a trick, you can use the above parameters to provide a limited 
              service while you perform maintenance or migration work. When I 
              migrated my data, I restarted Apache on the source machine with 
              a single server and a MaxClients of 50. This allowed users 
              to still get (some) service while I had a more usable machine.
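In httpd.conf terms, the limited-service configuration looked something like this (a sketch of the idea rather than the exact file):

# Cut-down settings on the source box for the duration of the copy
StartServers      1
MinSpareServers   1
MaxSpareServers   1
MaxClients       50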
You can also turn off hostname lookups (the HostnameLookups line). This stops Apache from looking up and logging the DNS name of each connecting client, rather than just its IP address. Finally, you should avoid providing server-side includes (.shtml files), because they force Apache to parse each page it sends and make those pages uncacheable.
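The lookup change is a one-liner in httpd.conf:

HostnameLookups Off
# ...and leave server-parsed HTML disabled, i.e., no line like:
# AddHandler server-parsed .shtml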
Regarding the system itself, two things are consequential to performance. The first is the maximum number of open files you can have at any one time. In Linux, you can raise this limit by writing to the file /proc/sys/fs/file-max. The command echo 16384 > /proc/sys/fs/file-max will increase the maximum number of open files to 16384. For ease of administration, you can put this command in one of your startup scripts.
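For example, appending it to rc.local (the usual catch-all startup script on Red Hat-style systems) makes the setting survive a reboot:

# /etc/rc.d/rc.local -- raise the system-wide open-file limit at boot
echo 16384 > /proc/sys/fs/file-max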
              The second thing you can do is to rebuild the kernel, cutting 
              out all the unnecessary drivers. This frees memory and makes the 
              kernel leaner and, therefore, faster. However, if you're working 
              off a remote link, be certain the kernel works before deployment. 
              Having an exact replica locally, in terms of software and hardware, 
              may help here.
              Data Migration
              Data migration is relatively painless if carried out properly. 
              The first thing to do is to make sure all your user accounts have 
              been duplicated with the correct logins and passwords. If you're 
              migrating between homogeneous machines, just copy the relevant password 
              and shadow files across. Otherwise, you may have to manually migrate 
              accounts, but this is dependent on source and destination systems. 
              Most shadow systems now have interchangeable password files, but 
check first. I was lucky: all the account information for the virtual hosts (including passwords) had been assigned by the company, so I made a text file with the user information in it and used the Linux newusers command to create all the new users.
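newusers reads passwd-style lines, one account per line. A small sketch, with hypothetical names and values:

# users.txt -- name:password:uid:gid:gecos:home:shell (all values are examples)
jdoe:Secr3t:1001:1001:Customer Site:/home/jdoe:/bin/false
asmith:Passw0rd:1002:1001:Customer Site:/home/asmith:/bin/false

newusers users.txt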
              Remember when migrating user accounts to make sure that all the 
              accounts on the new server have the same group identifier (GID) 
              and user identifier (UID) as on the old system. This prevents permission 
              and ownership problems when you copy the data across. It may also 
              be beneficial to create separate groups for different account classes. 
              For example, I have separate groups for the Web sites that connect 
              to databases, and for those that don't.
After the users are set up on the system, you need to create or migrate the Apache configuration for each virtual host. An easy way of keeping your virtual host configuration separate is to put it in its own file; for example, the directive Include conf/vhosts.conf in your httpd.conf lets you keep the additional configuration directives (the virtual hosts) in vhosts.conf. This makes a migration easy -- just modify the file for your new configuration and include it in your new setup.
              You must ensure that all virtual hosts have their own transfer 
              and error log files. This is handy for the customers, because it 
              allows them to maintain and analyze their own log information. It's 
              handy for you, because it frees you of the same task.
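A sketch of one entry in such a vhosts.conf, with placeholder names and address, showing the per-host logs:

# conf/vhosts.conf -- pulled in from httpd.conf via: Include conf/vhosts.conf
NameVirtualHost 192.0.2.10

<VirtualHost 192.0.2.10>
    ServerName   www.example.com
    DocumentRoot /home/jdoe/public_html
    TransferLog  logs/example.com-access_log
    ErrorLog     logs/example.com-error_log
</VirtualHost>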
              After this, it's smooth sailing. All that's left is 
              archiving the data off the old server and restoring it on the new 
              one. There are several ways to do this. My favorite method requires 
              both machines to have OpenSSH installed. Then use the following 
              command, carried out in the data directory of the source server:
              
             
tar -cf - * |(ssh -l username destination.host.com tar -xvpf -)
This archives all the data on the source server and unarchives it at the destination, all in one command. Nevertheless, if you feel so inclined, go through the tar, copy, untar cycle instead. I recommend you don't change the Web data directories when moving the files, in case of any hard-coded paths.
Databases should also be migrated at this point. With MySQL, the process is easy. The basic steps are dumping the data to a text file, copying the file to the new server, creating the database on the new server, and importing the data into the database. A quick, typical example is:
              
             
oldserver$ mysqldump dbname > outfile    (dump the database dbname to file outfile)
oldserver$ scp outfile newserver:        (copy outfile to the new server using secure copy)
newserver$ mysqladmin create dbname      (create the database dbname on the new server)
newserver$ mysql dbname < outfile        (import the data from file outfile)
Testing
After migration, you're almost there -- but do things
              work? The last thing you want is to change your DNS entries to point 
              to the new server, or create new DNS entries only to realize that 
              things don't work. However, you can't check to see whether 
              things work unless you move the DNS entries!
              Fortunately, there are a number of solutions to this problem. 
              The most comprehensive one I use is the following:
              
              1. Create a DNS server on an extra machine with fake records that 
              indicate the new server is the Web server for all the virtual hosts 
              you are hosting on it.
              2. Find a set of machines on the same network as the fake DNS 
              server that will be used for testing. Point their primary DNS server 
              to this server.
              3. Surf the virtual host sites to see whether they work.
              
This works because the fake DNS server claims to be authoritative for the virtual hosts' domains and, hence, hands the substitute records to the client machines. The client machines then contact the new server and request data from it. The advantage of this approach is that a number of machines can be used for testing simultaneously.
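Assuming the fake server runs BIND, one zone per hosted domain is enough. A minimal sketch, where the names and the 192.0.2.20 address are placeholders for the new server, loaded from named.conf as the master zone for example.com:

$TTL 3600
@     IN  SOA  ns.example.com. hostmaster.example.com. (
                  2003010101 ; serial
                  3600       ; refresh
                  900        ; retry
                  604800     ; expire
                  3600 )     ; minimum TTL
      IN  NS   ns.example.com.
@     IN  A    192.0.2.20
www   IN  A    192.0.2.20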
A less elaborate but easier option is to have one machine on the network configured with no DNS server but with a modified hosts file (/etc/hosts on Linux) pointing at the virtual hosts. If your server machine itself needs to do some sort of host lookup, this option is very useful, because it allows you to check that the whole system works before migrating any records.
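The hosts file on the test machine is just a few lines (addresses and names are placeholders):

# /etc/hosts on the test machine
192.0.2.20   www.example.com example.com
192.0.2.20   www.example.org example.org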
Once you're satisfied that everything works, change the DNS settings to reflect the new server. The job is now done. Keep the old server active for some time after the changeover, because DNS propagation takes time (usually a few days, but sometimes up to a month). You can then gradually retire it and recycle or destroy it.
              Security
              You can never be too safe on a production server. A few things 
              to check are:
              
              
             
- Don't give users shell access to the system if they don't require it. Make their shells /bin/false.
- If you allow CGI scripts, vet them to ensure they aren't malicious.
- If you use a custom Web-based administration tool, make sure it's secure.
- If the main access to your system is via ftp, consider using ProFTPD. This is an excellent ftp server with lots of security features, including the ability to lock users into their home directories, apply quotas, allow logins without a valid shell, and control the maximum number of concurrent user logins.
- As much as possible, use OpenSSH to access your system. Disable telnet and any other services you don't need.
- Periodically check and update your software for any discovered security vulnerabilities.
- Use TCP Wrappers to control access to services (see the sketch after this list).
- Periodically monitor your log files for any suspicious activity.
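For TCP Wrappers, a default-deny policy is the safest starting point. A sketch, assuming your daemons are run through tcpd or built against libwrap (the network below is a placeholder):

# /etc/hosts.deny -- refuse everything not explicitly allowed
ALL: ALL

# /etc/hosts.allow -- then open only what you need, to whom you need it
sshd: 192.0.2.0/255.255.255.0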
Conclusion
As you know, Web servers are integral but sensitive parts of today's Internet. I hope this article has given you some insight into how important it is to plan ahead when preparing or migrating a Web server. I have also tried to show how simple a setup or migration can be, with little or no downtime, if the planning is done well.
Links
Analog Web log analyzer -- http://www.analog.cx
SSMTP (send-only Sendmail emulator) -- http://rpmfind.net/linux/RPM/contrib/libc6/i386////ssmtp-2.38-1.i386.html
Netcraft -- http://www.netcraft.com
ProFTPD (Professional FTP Daemon) -- http://www.proftpd.net
OpenSSH -- http://www.openssh.com/
Ripduman Sohan is currently finishing a degree in Software Engineering at City University, London. He is originally from Kenya, where he is still based, and has been using and promoting *nix-based systems since he was 14 years old.