AIX
              Alternate Disk Installation
 Jeff Marsh
              In this article, I will describe some tools within AIX (some new, 
              some old) that can help you reduce the off-hours time spent by your 
              administration staff during maintenance upgrades. I will also show 
              you some uses for these same toolsets that can help you reduce recovery 
              times due to rootvg corruption.
              Alternate Disk Installation
              What is it? According to the IBM AIX Installation Guide:
              "Alternate disk installation, available in AIX Version 4.3, 
              allows installing the system while it is up and running, allowing 
              installation or upgrade down time to be decreased considerably."
              Thus, with another set of bootable drives within a server, you 
              can install maintenance (e.g., upgrade your system from AIX 4.3.3.04 
              to AIX 4.3.3.06) during the day without interruption or any effects 
              to the running applications. However, you will still need a reboot 
              to make it active.
              The support model prior to Alternate Disk Installation required 
              all work to be done off-hours during an application maintenance 
              window that generally took two to four hours. Now you can reduce 
              that off-hour time from two to four hours per server to just the 
              time to reboot. I'll also show you how you can complete multiple 
              upgrades in that same reboot window using Network Installation Manager 
              (NIM).
              Requirements
              To enable Alternate Disk Installation, you need to install the 
              following base-level filesets and upgrade to at least these corresponding 
              fileset levels. These filesets do not require a reboot to install:
              
             
Base level filesets:            Fileset levels:
bos.alt_disk_install.rte              26
bos.alt_disk_install.boot_images      27
              You will also need another free, bootable drive within your server. 
              In our case, we are configuring new servers with four internal drives 
              for systems administration purposes: two drives for the mirrored 
              primary rootvg, and two for alt_disk_install implementations. You 
              could get by with just one additional drive, but we prefer to have 
              two.
              How It Works
              Alternate Disk Installation works by cloning your primary rootvg 
              running on hdisk0 and hdisk1, for example, to a second set of drives, 
              hdisk2 and hdisk3. After the system completes those copies using 
              basic find, backup, and restfile utilities, 
              it will install the latest maintenance level you designate.
              This process is shown in Figure 1. First, you clone hdisk0/1 to 
              hdisk2/3, and then you apply maintenance to the newly cloned hdisk2/3 
              while the applications continue to run against hdisk0/1.
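              The clone-and-update sequence just described can also be started 
              from the command line instead of SMIT. What follows is a minimal 
              sketch rather than a tested procedure. It assumes the maintenance 
              images sit in a directory such as /mnt and relies on the -C (clone), 
              -b (bundle), and -l (image location) flags of alt_disk_install. The 
              function name build_clone_cmd is hypothetical, and the function only 
              prints the command so that it can be reviewed before anyone runs it.

```shell
#!/bin/sh
# Print (do not run) an alt_disk_install command that clones rootvg to the
# given target disks and then applies update_all from the images directory.
# build_clone_cmd is a hypothetical helper name used only for illustration.
build_clone_cmd() {
    images_dir=$1
    shift
    if [ $# -eq 0 ]; then
        echo "usage: build_clone_cmd images_dir disk [disk ...]" >&2
        return 1
    fi
    # -C clone rootvg, -b bundle to install, -l location of install images
    echo "alt_disk_install -C -b update_all -l $images_dir $*"
}

# Example: the target disks and image location used in this article.
build_clone_cmd /mnt hdisk2 hdisk3
```

              Printing the command first keeps the sketch safe to try on any 
              machine; on a real server you would review the output and then run 
              it as root.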
              To complete this task from SMIT, issue the following fast path. 
              You should expect to see the following panels:
              
             
smitty alt_clone
Clone the rootvg to an Alternate Disk:
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
* Target Disk(s) to install                      [hdisk2 hdisk3]
  Phase to execute                               all                  +
  image.data file                                []                   /
  Exclude list                                   []                   /
  Bundle to install                              [update_all]         +
    -OR-
  Fileset(s) to install                          []
  
  Fix bundle to install                          []
    -OR-
  Fixes to install                               []
  
  Directory or Device with images                [/mnt]
  (required if filesets, bundles or fixes used)
  
  installp Flags
  COMMIT software updates?                       yes                  +
  SAVE replaced files?                           no                   +
  AUTOMATICALLY install requisite software?      yes                  +
  EXTEND file systems if space needed?           yes                  +
  OVERWRITE same or newer versions?              no                   +
  VERIFY install and check file sizes?           no                   +
  Customization script                           []                   /
  Set bootlist to boot from this disk
  on next reboot?                                yes
  
  Reboot when complete?                          no                   +
  Verbose output?                                no                   +
  Debug output?                                  no                   +
[BOTTOM]
  
F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do
              From the example above, note the following:
              
               You are cloning to hdisk2 and hdisk3. 
               You are running an update_all operation of maintenance 
                mounted at the /mnt mount point. (In this case, this is 
                a CD-ROM with the AIX 4.3.3.06 maintenance filesets.) 
               You are specifying that this operation should change the bootlist 
                to hdisk2 and hdisk3 after completion. 
               You are not asking the process to reboot immediately 
                upon completion of the upgrade because this is something you want 
                to schedule in an appropriate maintenance window.
              
              At the completion of the operation, you will notice from the bootlist 
              -m normal -o command that the bootlist is set to hdisk2 hdisk3, 
              and issuing an lspv command will show the following:
              
             
root@aknimp1:/> lspv
hdisk0         000f261d90bf6ea0    rootvg         
hdisk1         000f261dae86d104    rootvg         
hdisk2         000f261db52d4d95    altinst_rootvg
hdisk3         000f261db52d4ca6    altinst_rootvg
hdisk4         000f018d07d4f412    None           
hdisk5         000f261dbde71c66    None           
hdisk6         000f261dbd8eea89    nimresvg       
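              A check like the one above is easy to script. The following sketch 
              filters lspv-style output for the disks that belong to a named 
              volume group, which gives a quick way to confirm that the clone 
              landed on the disks you intended. The helper name disks_in_vg is 
              hypothetical, and the sample text is copied from the listing above 
              so that the example runs on any system with a POSIX shell and awk.

```shell
#!/bin/sh
# List the disks that belong to a given volume group, reading lspv-style
# output (disk name, PVID, volume group) on standard input.
# disks_in_vg is a hypothetical helper name used only for illustration.
disks_in_vg() {
    awk -v vg="$1" '$3 == vg { printf "%s%s", sep, $1; sep = " " }
                    END { print "" }'
}

# Sample input copied from the lspv output shown above. On a live system
# you would pipe the real command instead:  lspv | disks_in_vg altinst_rootvg
sample='hdisk0         000f261d90bf6ea0    rootvg
hdisk1         000f261dae86d104    rootvg
hdisk2         000f261db52d4d95    altinst_rootvg
hdisk3         000f261db52d4ca6    altinst_rootvg
hdisk4         000f018d07d4f412    None'

printf '%s\n' "$sample" | disks_in_vg altinst_rootvg
```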
              At this point, you have cloned and installed the latest AIX maintenance 
              level during the day. You are now ready to activate that latest maintenance 
              with a reboot operation at whatever time is appropriate for the outage 
              to your application users. You can save significant off-hours time 
              for maintenance upgrades; our off-hours time has been reduced to the 
              time needed for a simple reboot.
              Alternate Disk Installation -- After the Reboot
              After the reboot, issue the oslevel command or complete 
              the appropriate verifications to ensure your maintenance upgrade 
              occurred as expected. If you issue the lspv command, you 
              will notice the following:
              
             
root@aknimp1:/> lspv
hdisk0         000f261d90bf6ea0    old_rootvg     
hdisk1         000f261dae86d104    old_rootvg     
hdisk2         000f261db52d4d95    rootvg         
hdisk3         000f261db52d4ca6    rootvg         
hdisk4         000f018d07d4f412    None           
hdisk5         000f261dbde71c66    None           
hdisk6         000f261dbd8eea89    nimresvg       
              Both hdisk2 and hdisk3, from which you have booted, now show a volume 
              group identifier of rootvg. Hdisks 0 and 1 now show a volume group 
              of old_rootvg and are varied off.
              Now, you have several options. My preference is to leave hdisk0 
              and hdisk1 alone with the old maintenance levels in case you need 
              to fall back on them.
              Let's assume that after the reboot your applications aren't 
              working well with the latest maintenance. The previous support model 
              suggests that you need to get the mksysb backup taken prior 
              to your upgrade and begin a restore process. This could take two 
              hours or more, with the hope that the tape image was good. The new 
              support model with Alternate Disk Installation says to change your 
              bootlist back to hdisk0 and hdisk1 and to reboot the server. At 
              some future point, when you decide the maintenance is good and you 
              don't need to fall back, you can clone the latest maintenance 
              residing on hdisk2/3 back to hdisk0/1.
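              The fallback described above comes down to two actions: point the 
              normal-mode bootlist at the old disks, then reboot. The sketch below 
              only prints the commands involved. The helper name fallback_plan is 
              hypothetical, and using shutdown -Fr for the reboot is my assumption 
              about how you would restart the server; substitute whatever restart 
              procedure your site uses.

```shell
#!/bin/sh
# Print the commands for falling back to the previous rootvg disks.
# fallback_plan is a hypothetical helper; it only echoes commands.
fallback_plan() {
    if [ $# -eq 0 ]; then
        echo "usage: fallback_plan disk [disk ...]" >&2
        return 1
    fi
    echo "bootlist -m normal $*"   # point the normal-mode bootlist at the old disks
    echo "bootlist -m normal -o"   # display the bootlist to confirm the change
    echo "shutdown -Fr"            # assumed restart method: fast shutdown and reboot
}

fallback_plan hdisk0 hdisk1
```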
              Cloning Back to hdisk0/1
              To complete the cloning of hdisk2/3 back to hdisk0/1, you must 
              issue the following commands:
              
             
               alt_disk_install -W hdisk0 hdisk1 -- Wakes up the 
                old_rootvg 
               alt_disk_install -S -- Puts the old_rootvg 
                back to sleep 
               alt_disk_install -X old_rootvg -- Removes the old_rootvg 
                volume group name associated with hdisk0/1 from the ODM and assigns 
                them a value of "none", which will allow the cloning 
                to recur cleanly. 
                smitty alt_clone -- Reclone back to hdisk0/1 using 
                the previous example.
              I will discuss using the above commands further in the next section; 
              however, in order to reclone drives that have been previously used 
              to boot from, you must follow the commands verbatim to remove the 
              knowledge of the old_rootvg volume group name from the ODM.
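              Because the order of those commands matters, it can help to keep 
              them in a small script. The sketch below strings the steps together 
              and stops at the first failure. It defaults to a dry run that only 
              prints each command. The names run, reclone_to_old_disks, and DRY_RUN 
              are hypothetical, and the final step uses alt_disk_install -C from 
              the command line in place of the smitty alt_clone panel, which is my 
              substitution rather than the procedure listed above.

```shell
#!/bin/sh
# Run a command, or only print it when DRY_RUN=1 (the default here).
DRY_RUN=${DRY_RUN:-1}
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# The steps from the list above, in order; stop at the first failure.
reclone_to_old_disks() {
    run alt_disk_install -W "$@"       || return 1  # wake up old_rootvg
    run alt_disk_install -S            || return 1  # put it back to sleep
    run alt_disk_install -X old_rootvg || return 1  # remove old_rootvg from the ODM
    run alt_disk_install -C "$@"                    # assumed CLI form of the reclone
}

reclone_to_old_disks hdisk0 hdisk1
```

              Setting DRY_RUN=0 would execute the commands for real, so review 
              the printed plan on a test server first.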
              Other Uses for alt_disk_install
              Some other tasks that alt_disk_install may help with 
              are:
              
               Nightly backup of your system -- Using alt_disk_install, 
                you can back up your system nightly (or at whatever frequency is 
                appropriate) without having to manage mksysb tapes. If 
                you suffer some type of rootvg corruption, either major 
                or minor, you can restore using the data on the cloned drives. 
               mksysb Images -- The alt_disk_install command 
                can be used to install mksysb images (AIX 4.3 or later) onto AIX 4.1 
                and later versions. 
               You can also use alt_disk_install for recovery of corrupted 
                files in rootvg and to reduce the size of logical volumes 
                in rootvg, as described in the following sections.
              Recovery of Corrupted Files in rootvg
              If you suffer major corruption (hdisk failure), and the server 
              crashes, and if you have cloned that data to another bootable drive, 
              you could interface with SMS, for example, to change your bootlist 
              to your other cloned drives and quickly recover the server.
              If you suffer minor corruption within the rootvg where 
              a file or a few files are corrupted or inadvertently deleted, you 
              can wake up the cloned copy of the rootvg and copy those 
              deleted or corrupted files back to the primary rootvg while 
              the server is up and running.
              In this example, you are booted against hdisk0/1 and have recently 
              cloned the system to hdisk2/3. To access the cloned copy of the 
              rootvg while the server is up and running, complete the following:
              
              1. alt_disk_install -W hdisk2 hdisk3 -- Wakes up the 
              cloned copy:
              
             
root@aknimp1:/> alt_disk_install -W hdisk2 hdisk3
Waking up altinst_rootvg volume group ...
Replaying log for /dev/alt_hd4.
              2. From a df -k command, you will notice that the wake-up command 
              has mounted the alternate rootvg filesystems, whose mount points 
              are prefixed with /alt_inst:
             
root@aknimp1:/> df -k
Filesystem    1024-blocks    Free %Used  Iused %Iused Mounted on
/dev/hd4            49152    5608   89%   1226     5% /
/dev/hd2           753664    5056  100%  19966    11% /usr
/dev/hd9var         16384   14340   13%    222     6% /var
/dev/hd3            32768   30376    8%     98     2% /tmp
/dev/lvexport      131072  126772    4%     41     1% /export
/dev/lv01         4980736   94468   99%   4546     1% /export/lpp_source
/dev/lv02          917504  448868   52%  29468    13% /export/spot
/dev/lvmksysb    15204352 3381328   78%     31     1% /export/mksysb
/dev/lvadmin       131072  126868    4%     25     1% /admin
/dev/hd1            16384   15820    4%     20     1% /home
/dev/lvadsm         16384      56  100%     21     1% /var/adsm
/dev/alt_hd4        49152    5704   89%   1192     5% /alt_inst
/dev/alt_lvadmin   131072  126868    4%     25     1% /alt_inst/admin
/dev/alt_hd1        16384   15820    4%     20     1% /alt_inst/home
/dev/alt_hd3        32768   30376    8%     98     2% /alt_inst/tmp
/dev/alt_hd2       753664    5056  100%  19966    11% /alt_inst/usr
/dev/alt_hd9var     16384   14380   13%    219     6% /alt_inst/var
/dev/alt_lvadsm     16384    1848   89%     20     1% /alt_inst/var/adsm
3. Copy the corrupted files from the appropriate alt_inst logical 
            volume/filesystem. In this case, I corrupted my /etc/hosts 
            file, so I will issue the following command to restore it from my 
            latest cloned backup:  
             
cp /alt_inst/etc/hosts /etc/hosts
              4. When you have restored the required files, put the altinst_rootvg 
              back to sleep, which unmounts the /alt_inst logical volumes/filesystems, 
              by issuing:
             
alt_disk_install -S
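              When more than one file needs to come back, step 3 can be wrapped 
              in a small helper. The sketch below copies each named file from the 
              woken-up clone only when the clone has it and it differs from the 
              live copy. The function name restore_from_clone and the ALT_ROOT and 
              LIVE_ROOT variables are hypothetical; they default to /alt_inst and 
              the running root so the same code can be tried against scratch 
              directories first.

```shell
#!/bin/sh
# Copy files from a woken-up clone back to the running system, but only
# when the clone's copy exists and differs from the live file.
# restore_from_clone, ALT_ROOT, and LIVE_ROOT are illustrative names.
ALT_ROOT=${ALT_ROOT:-/alt_inst}   # where alt_disk_install -W mounts the clone
LIVE_ROOT=${LIVE_ROOT:-}          # empty means the running root filesystem

restore_from_clone() {
    for path in "$@"; do
        src="$ALT_ROOT$path"
        dst="$LIVE_ROOT$path"
        if [ ! -f "$src" ]; then
            echo "skip $path: not present in clone" >&2
            continue
        fi
        if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
            echo "unchanged $path"
        else
            # -p preserves the modification time and permissions of the clone's copy
            cp -p "$src" "$dst" && echo "restored $path"
        fi
    done
}

# Example (run between alt_disk_install -W and alt_disk_install -S):
#   restore_from_clone /etc/hosts
```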
              Reducing Logical Volume Sizes Within the rootvg
              Remember the pain associated with the need to reduce the size 
              of a logical volume within the rootvg? It took a tape restore 
              of the system to complete. Now, you can complete that reduction 
              within a simple cloning process. The steps to complete that process 
              are as follows:
              
              1. Issue a mkszfile command to create the /image.data 
              file.
              2. Edit the /image.data file and specify SHRINK=yes 
              in the logical_volume_policy stanza:
              
             
image_data:
        IMAGE_TYPE= bff
        DATE_TIME= Tue Oct 3 10:29:55 CDT 2000
        UNAME_INFO= AIX aknimp1 3 4 000F261D4C00
        PRODUCT_TAPE= no
        USERVG_LIST= nimresvg
        OSLEVEL= 4.3.3.10
logical_volume_policy:
        SHRINK= yes
        EXACT_FIT= no
ils_data:
        LANG=  en_US
              3. Clone the rootvg to hdisk2 and hdisk3, specifying your customized 
              /image.data file by issuing one of the following commands:
              
              smitty alt_clone (remember to specify the location of 
              your image.data file on the image.data file prompt)
              
              or
              
              alt_disk_install -i/image.data -B -C hdisk2 hdisk3 (from 
              the command line)
              4. After the completion of the cloning operation, wake up the 
              altinst_rootvg by issuing:
              
             
alt_disk_install -W hdisk2 hdisk3
              5. Review your df -k output and compare the sizes of the primary 
              logical volumes with those of their /alt_inst counterparts.
              6. If you are satisfied with the sizing reduction, change your 
              bootlist (bootlist -m normal hdisk2 hdisk3) and reboot.
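              The edit in step 2 can be done by hand, but a filter makes it 
              repeatable. The sketch below rewrites only the SHRINK line inside 
              the logical_volume_policy stanza and passes every other line through 
              untouched. The function name set_shrink_yes is hypothetical, and the 
              sample input is a fragment shaped like the stanza shown in step 2, 
              not a complete image.data file.

```shell
#!/bin/sh
# Set "SHRINK= yes" inside the logical_volume_policy stanza of image.data-style
# text read from standard input; all other lines are printed unchanged.
# set_shrink_yes is a hypothetical helper name used only for illustration.
set_shrink_yes() {
    awk '
        # A stanza header is a bare name followed by a colon.
        /^[A-Za-z_]+:[ \t]*$/ { in_policy = ($1 == "logical_volume_policy:") }
        in_policy && /^[ \t]*SHRINK=/ { sub(/SHRINK=.*/, "SHRINK= yes") }
        { print }
    '
}

# Example on a fragment shaped like the stanza shown in step 2.
printf '%s\n' \
    'logical_volume_policy:' \
    '        SHRINK= no' \
    '        EXACT_FIT= no' \
    'ils_data:' \
    '        LANG=  en_US' | set_shrink_yes
```

              On a real system you might write the result to a new file, for 
              example set_shrink_yes < /image.data > /tmp/image.data.new, and 
              review it before replacing the original.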
              Network Installation Management (NIM)
              I want to briefly discuss NIM and show how well it interfaces 
              with alternate disk installation. It can easily help you to manage 
              upgrades on a group of servers, thus saving you even more time.
              What Is NIM?
              Paraphrasing from the AIX Network Installation Management Guide 
              and Reference, "NIM is a base component of AIX and permits 
              and aids in the installation and maintenance of AIX, its basic 
              operating system, and additional software and fixes that may be 
              supplied over the network. NIM provides for the customization of 
              machines both during and after installation. As a result, NIM has 
              eliminated the reliance of the systems administration staff on tapes 
              and CD-ROMs for software installation and maintenance."
              In this case, we are using NIM to centrally manage a group of 
              standalone machines (NIM clients) from a centrally located, 
              network-attached NIM master. From the NIM master, you can manage operating 
              system installations, maintenance upgrades, mksysb images 
              for backup and recovery, installation of new servers (cloning), 
              and the re-installation of existing servers in case of a disaster.
              There's a great deal of functionality provided by NIM. I 
              recommend reviewing the usage guide to see what NIM features could 
              benefit your environment. I also recommend a good Redbook from IBM, 
              NIM: From A to Z in AIX 4.3 (SG24-5524-00), which was published 
              in February 2000.
              I won't cover the specifics of setting up the NIM master 
              and the corresponding NIM client configurations; it is not an overly 
              complicated process. However, it will require someone with NIM-specific 
              knowledge to lay out the functional NIM environment. If you support 
              SP complexes, you have already had a fair amount of exposure to 
              NIM even though it is buried one layer below PSSP.
              One key feature of NIM that will help manage a group of servers 
              concurrently is the Machine Group definition. Within NIM, you can 
              operate as easily on a single machine as you can on a group of machines. 
              For instance, we have defined several machine groups within our 
              NIM master environment. These definitions allow us to operate on 
              a group of like servers concurrently. 
              How Does It Integrate with Alternate Disk Installation?
              NIM knows how to fully exploit Alternate Disk Installation. For 
              example, look at the initial clone and update_all operation. 
              Let's say you want to use NIM to extend the model (instead 
              of upgrading the maintenance level on a single server) and you want 
              to complete this operation on ten Lotus Notes servers that are similarly 
              configured and are defined in a Notes machine group within NIM. 
              From SMIT on the NIM master, issue the following fast path and you 
              will see this panel:
              
             
smitty nim_alt_clone 
Clone the rootvg to an Alternate Disk
Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                   [Entry Fields]
* Target Machine / Group to Install                [NOTES]              +
* Target Disk(s) to install                        [hdisk2 hdisk3]
  Phase to execute                                 all                  +
  IMAGE_DATA resource                              []                   +/
    EXCLUDE_FILES resource                         []                   +/
     (leave blank to include all files in backup)
  
  BUNDLE to install                                []                   +
    -OR-
  Fileset(s) to install                            []
  
  FIX_BUNDLE to install                            []                   +
    -OR-
  FIXES to install                                 [update_all]
  
  LPP_SOURCE                                       [aix433_lppsource]   +
  (required if filesets, bundles or fixes used)
  
  installp Flags
    COMMIT software updates?                       yes                  +
    SAVE replaced files?                           no                   +
    AUTOMATICALLY install requisite software?      yes                  +
    EXTEND filesystems if space needed?            yes                  +
    OVERWRITE same or newer versions?              no                   +
    VERIFY install and check file sizes?           no                   +
  Customization SCRIPT resource                    []                   +/
  Set bootlist to boot from this disk
  on next reboot                                   yes                  +
  Reboot when complete?                            no                   +
  Verbose output?                                  no                   +
  Debug output?                                    no                   +
  Group controls (only valid for group targets):
    Number of concurrent operations                []                   #
    Time limit (hours)                             []                   #
F1=Help             F2=Refresh           F3=Cancel            F4=List
F5=Reset            F6=Command           F7=Edit              F8=Image
F9=Shell            F10=Exit             Enter=Do                    
In this example, you would cause every server defined in the Notes 
            Machine group to begin a process to clone itself from hdisk0/1 to 
            hdisk2/3. At the completion of the cloning operation, NIM would then 
            NFS-mount the aix433_lppsource resource (in this case, it's 
            the AIX 4.3.3 lppsource filesystem, which includes the 4.3.3.06 
            maintenance) and apply it to the newly cloned hdisk2/3 on each of 
            these servers. This also instructs NIM to change the bootlist on each 
            of these servers as a part of the operation but does not cause an 
            immediate reboot. I recommend, however, using NIM to schedule a reboot 
            of all these servers during the maintenance window.
              All of this work, including the cloning and upgrading of the maintenance 
              level, can be completed during the day without affecting the running 
              application (e.g., Notes). For the previous support model, this 
              same upgrade would have taken about 2 hours per server plus reboot 
              time to complete during an application maintenance window, generally 
              in the middle of the night. If a single person worked to complete 
              this process, this could have taken about 25 hours spread across 
              multiple weekends to complete. With NIM and Alternate Disk Installation, 
              this upgrade outage can be reduced to the time to reboot these 10 
              servers concurrently (or about 30 minutes, in our case). Note that 
              your time may vary depending on speed of network, number of filesets 
              being updated, time to reboot, and problems encountered.
              Figure 2 shows the process using NIM/Machine Groups and Alternate 
              Disk Installation. First, you instruct the NIM master to have each 
              of the servers in the defined machine group clone hdisk0/1 to hdisk2/3 
              (depicted in red). Then, NIM will NFS-mount the appropriate LPPSOURCE 
              filesystem containing the AIX 4.3.3.06 maintenance level and apply 
              that maintenance to the newly cloned drives (operation in green). 
              Again, this process happens concurrently on all servers in the defined 
              NIM machine group without affecting the running applications.
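              The same group operation can be requested from the NIM master's 
              command line. The sketch below only prints a candidate command. It 
              assumes that the nim alt_disk_install operation accepts the source, 
              disk, fixes, and lpp_source attributes in the way the SMIT fields 
              suggest, so check the nim documentation for your AIX level before 
              relying on it. The helper name build_nim_clone_cmd is hypothetical; 
              NOTES and aix433_lppsource are the group and resource names from the 
              panel shown earlier.

```shell
#!/bin/sh
# Print (do not run) a nim command that asks a client or machine group to
# clone rootvg to the target disks and apply update_all from an lpp_source.
# build_nim_clone_cmd is a hypothetical helper name used only for illustration.
build_nim_clone_cmd() {
    if [ $# -lt 3 ]; then
        echo "usage: build_nim_clone_cmd target lpp_source disk [disk ...]" >&2
        return 1
    fi
    target=$1
    lpp_source=$2
    shift 2
    # The disk list is a single attribute value, so it is quoted as one word.
    echo "nim -o alt_disk_install -a source=rootvg -a disk='$*'" \
         "-a fixes=update_all -a lpp_source=$lpp_source $target"
}

build_nim_clone_cmd NOTES aix433_lppsource hdisk2 hdisk3
```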
              Conclusion
              My team is in the process of rolling out this methodology change. 
              I think we can significantly reduce the amount of time spent in 
              support of our current AIX standalone infrastructure. I also think 
              Alternate Disk Installation and NIM can help you better manage 
              your infrastructure and provide some consistency to your installation, 
              upgrade, maintenance, and build procedures. I hope 
              the above discussion will help you significantly reduce the amount 
              of off-hours time associated with maintenance or fileset upgrades 
              within AIX.
              Jeff Marsh is the Systems Advisor to the UNIX Server Team working 
              at American Century Investments, a premier investment manager serving 
              nearly two million individual and institutional investors. Jeff 
              can be contacted at: jeffrey_marsh@americancentury.com.