AIX Alternate Disk Installation

Jeff Marsh

In this article, I will describe some tools within AIX (some new, some old) that can help you reduce the off-hours time spent by your administration staff during maintenance upgrades. I will also show you some uses for these same toolsets that can help you reduce recovery times due to rootvg corruption.

Alternate Disk Installation

What is it? According to the IBM AIX Installation Guide:

"Alternate disk installation, available in AIX Version 4.3, allows installing the system while it is up and running, allowing installation or upgrade down time to be decreased considerably."

Thus, with another set of bootable drives within a server, you can install maintenance (e.g., upgrade your system from AIX 4.3.3.04 to AIX 4.3.3.06) during the day without interrupting or otherwise affecting the running applications. However, you will still need a reboot to make the new level active.

The support model prior to Alternate Disk Installation required all work to be done off-hours during an application maintenance window that generally took two to four hours. Now you can reduce that off-hours time from two to four hours per server to just the time needed to reboot. I'll also show you how you can complete multiple upgrades in that same reboot window using Network Installation Management (NIM).

Requirements

To enable Alternate Disk Installation, you need to install the following base-level filesets and upgrade to at least these corresponding fileset levels. These filesets do not require a reboot to install:

Base level filesets:            Fileset levels:
bos.alt_disk_install.rte              26
bos.alt_disk_install.boot_images      27
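
Before going further, it may help to confirm that both filesets are actually present. The following is a minimal sketch, not a definitive check: it assumes lslpp -L style output with the fileset name in the first column, and the function name missing_filesets is hypothetical. It tests only for presence and does not compare fileset levels.

```shell
#!/bin/sh
# Report which of the two required filesets are absent from an
# lslpp-style listing read on stdin (fileset name in column one).
# missing_filesets is a hypothetical helper, not an AIX command.
missing_filesets() {
    awk '
        BEGIN {
            need["bos.alt_disk_install.rte"] = 1
            need["bos.alt_disk_install.boot_images"] = 1
        }
        # Tick off each required fileset as it appears in the listing
        ($1 in need) { delete need[$1] }
        # Whatever is left was never seen
        END { for (f in need) print f }
    ' | sort
}

# Typical use on the AIX host (not executed here):
#   lslpp -L "bos.alt_disk_install.*" 2>/dev/null | missing_filesets
```

Empty output means both filesets were found; any name printed still needs to be installed.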

You will also need another free, bootable drive within your server. In our case, we are configuring new servers with four internal drives for systems administration purposes: two drives for the mirrored primary rootvg, and two for alt_disk_install implementations. You could get by with just one additional drive, but we prefer to have two.

How It Works

Alternate Disk Installation works by cloning your primary rootvg running on hdisk0 and hdisk1, for example, to a second set of drives, hdisk2 and hdisk3. After the system completes those copies using basic find, backup, and restfile utilities, it will install the latest maintenance level you designate.

This process is shown in Figure 1. First, you clone hdisk0/1 to hdisk2/3, and then you apply maintenance to the newly cloned hdisk2/3 while the applications continue to run against hdisk0/1.

To complete this task from SMIT, issue the following fast path. You should expect to see the following panels:

smitty alt_clone

Clone the rootvg to an Alternate Disk:
Type or select values in entry fields.
Press Enter AFTER making all desired changes.

* Target Disk(s) to install                      [hdisk2 hdisk3]
  Phase to execute                               all               +
  image.data file                                []                /
  Exclude list                                   []                /
  Bundle to install                              [update_all]      +
    -OR-
  Fileset(s) to install                          []

  Fix bundle to install                          []
    -OR-
  Fixes to install                               []

  Directory or Device with images                [/mnt]
  (required if filesets, bundles or fixes used)

  installp Flags
  COMMIT software updates?                       yes               +
  SAVE replaced files?                           no                +
  AUTOMATICALLY install requisite software?      yes               +
  EXTEND file systems if space needed?           yes               +
  OVERWRITE same or newer versions?              no                +
  VERIFY install and check file sizes?           no                +
  Customization script                           []                /
  Set bootlist to boot from this disk
  on next reboot?                                yes

  Reboot when complete?                          no                +
  Verbose output?                                no                +
  Debug output?                                  no                +
[BOTTOM]

F1=Help             F2=Refresh          F3=Cancel           F4=List
F5=Reset            F6=Command          F7=Edit             F8=Image
F9=Shell            F10=Exit            Enter=Do

From the example above, note the following:

You are cloning to hdisk2 and hdisk3.
You are running an update_all operation of maintenance mounted in the /mnt mount point. (In this case, this is a CD-ROM with the AIX 4.3.3.06 maintenance filesets.)
You are specifying that this operation should change the bootlist to hdisk2 and hdisk3 after completion.
You are not asking the process to complete an immediate reboot upon completion of the upgrade because this is something you want to schedule in an appropriate maintenance window.
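
If you prefer the command line to SMIT, the same choices can be expressed roughly as follows. This is a sketch under stated assumptions: I am assuming that -C (clone), -b (bundle), and -l (location of images) correspond to the panel fields above, and that the command's defaults (set the bootlist, do not reboot) match the panel. The helper name clone_cmd is hypothetical, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Print the alt_disk_install command line that mirrors the SMIT panel.
# clone_cmd is a hypothetical helper; the flag mapping is an assumption
# to verify against the alt_disk_install documentation for your level.
# Usage: clone_cmd IMAGES_DIR DISK [DISK...]
clone_cmd() {
    images=$1
    shift
    # -C: clone the running rootvg
    # -b: bundle to install after the clone (update_all, as in the panel)
    # -l: directory or device holding the installation images
    echo "alt_disk_install -C -b update_all -l $images $*"
}

clone_cmd /mnt hdisk2 hdisk3
```

Review the printed command, then run it by hand on the server once you are satisfied with it.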

At the completion of the operation, the bootlist -m normal -o command will show that the bootlist is set to hdisk2 hdisk3, and issuing an lspv command will show the following:

root@aknimp1:/> lspv
hdisk0         000f261d90bf6ea0    rootvg         
hdisk1         000f261dae86d104    rootvg         
hdisk2         000f261db52d4d95    altinst_rootvg
hdisk3         000f261db52d4ca6    altinst_rootvg
hdisk4         000f018d07d4f412    None           
hdisk5         000f261dbde71c66    None           
hdisk6         000f261dbd8eea89    nimresvg
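
When you repeat this on many servers, reading lspv output by eye gets tedious. The sketch below pulls out the disks that belong to a given volume group so a script can confirm the clone landed where you expected. It assumes the three-column lspv layout shown above; disks_in_vg is a hypothetical helper name.

```shell
#!/bin/sh
# List the disks that lspv-style output (on stdin) assigns to one
# volume group, space-separated on a single line.
# disks_in_vg is a hypothetical helper, not an AIX command.
# Usage: lspv | disks_in_vg VGNAME
disks_in_vg() {
    awk -v vg="$1" '
        $3 == vg { out = out sep $1; sep = " " }
        END { print out }
    '
}

# Typical use on the AIX host (not executed here):
#   [ "$(lspv | disks_in_vg altinst_rootvg)" = "hdisk2 hdisk3" ] && echo "clone in place"
```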

At this point, you have cloned and installed the latest AIX maintenance level during the day. You are now ready to activate that latest maintenance with a reboot operation at whatever time is appropriate for the outage to your application users. You can save significant off-hours time for maintenance upgrades; our off-hours time has been reduced to the time needed for a simple reboot.

Alternate Disk Installation -- After the Reboot

After the reboot, issue the oslevel command or complete the appropriate verifications to ensure your maintenance upgrade occurred as expected. If you issue the lspv command, you will notice the following:

root@aknimp1:/> lspv
hdisk0         000f261d90bf6ea0    old_rootvg     
hdisk1         000f261dae86d104    old_rootvg     
hdisk2         000f261db52d4d95    rootvg         
hdisk3         000f261db52d4ca6    rootvg         
hdisk4         000f018d07d4f412    None           
hdisk5         000f261dbde71c66    None           
hdisk6         000f261dbd8eea89    nimresvg

Both hdisk2 and hdisk3, from which you have booted, now show a volume group identifier of rootvg. Hdisks 0 and 1 now show a volume group of old_rootvg and are varied off.

Now, you have several options. My preference is to leave hdisk0 and hdisk1 alone with the old maintenance levels in case you need to fall back on them.

Let's assume that after the reboot your applications aren't working well with the latest maintenance. Under the previous support model, you would need to retrieve the mksysb backup taken prior to your upgrade and begin a restore process. This could take two hours or more, with the hope that the tape image was good. Under the new support model with Alternate Disk Installation, you simply change your bootlist back to hdisk0 and hdisk1 and reboot the server. At some future point, when you decide the maintenance is good and you don't need to fall back, you can clone the latest maintenance residing on hdisk2/3 back to hdisk0/1.
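
The fallback itself amounts to a handful of commands. The sketch below only prints them so you can review the plan before running anything; fallback_plan is a hypothetical helper, and the use of shutdown -Fr for the reboot is my assumption about how you would restart the server.

```shell
#!/bin/sh
# Print (rather than run) the fallback steps: point the normal-mode
# bootlist back at the original disks, display it, then reboot.
# fallback_plan is a hypothetical helper name.
# Usage: fallback_plan DISK [DISK...]
fallback_plan() {
    echo "bootlist -m normal $*"
    echo "bootlist -m normal -o"
    echo "shutdown -Fr"
}

fallback_plan hdisk0 hdisk1
```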

Cloning Back to hdisk0/1

To complete the cloning of hdisk2/3 back to hdisk0/1, you must issue the following commands:

alt_disk_install -W hdisk0 hdisk1 -- Wakes up the old_rootvg
alt_disk_install -S -- Puts the old_rootvg back to sleep
alt_disk_install -X old_rootvg -- Removes the old_rootvg volume group name associated with hdisk0/1 from the ODM and assigns them a value of "none", which will allow the cloning to recur cleanly.
smitty alt_clone -- Reclone back to hdisk0/1 using the previous example.

I will discuss using the above commands further in the next section; however, in order to reclone drives that have been previously used to boot from, you must follow the commands verbatim to remove the knowledge of the old_rootvg volume group name from the ODM.
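
Because the order of these commands matters, you may want to wrap the first three in a small script. The following is a sketch only: run and release_old_rootvg are hypothetical helper names, and the DRY_RUN switch is my own convention so that the sequence can be printed and reviewed before it is executed on a real server.

```shell
#!/bin/sh
# With DRY_RUN=1 (the default here) each command is only printed.
# Set DRY_RUN=0 on a real AIX host to execute the commands.
# run and release_old_rootvg are hypothetical helper names.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: release_old_rootvg DISK [DISK...]
# Stops at the first command that fails.
release_old_rootvg() {
    # Wake up old_rootvg on the named disks
    run alt_disk_install -W "$@" || return 1
    # Put it back to sleep
    run alt_disk_install -S || return 1
    # Remove the old_rootvg definition from the ODM
    run alt_disk_install -X old_rootvg
}

release_old_rootvg hdisk0 hdisk1
```

Once this completes, the disks should be free for the smitty alt_clone step described above.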

Other Uses for alt_disk_install

Some other items that alt_disk_install may be helpful with are:

Nightly backup of your system -- Using alt_disk_install, you can back up your system nightly (or at whatever frequency is appropriate) without having to manage mksysb tapes. If you suffer some type of rootvg corruption, either major or minor, you can restore using the data on the cloned drives.
mksysb Images -- The alt_disk_install command can be used to install mksysb images (AIX 4.3 or later) onto systems running AIX 4.1 and later versions.
You can also use alt_disk_install for recovery of corrupted files in rootvg and to reduce the size of logical volumes in rootvg, as described in the following sections.

Recovery of Corrupted Files in rootvg

If you suffer major corruption (such as an hdisk failure) that crashes the server, and you have cloned that data to another bootable drive, you can use SMS, for example, to change your bootlist to the cloned drives and quickly recover the server.

If you suffer minor corruption within the rootvg where a file or a few files are corrupted or inadvertently deleted, you can wake up the cloned copy of the rootvg and copy those deleted or corrupted files back to the primary rootvg while the server is up and running.

In this example, you are booted against hdisk0/1 and have recently cloned the system to hdisk2/3. To access the cloned copy of the rootvg while the server is up and running, complete the following:

1. alt_disk_install -W hdisk2 hdisk3 -- Wakes up the cloned copy:

root@aknimp1:/> alt_disk_install -W hdisk2 hdisk3
Waking up altinst_rootvg volume group ...
Replaying log for /dev/alt_hd4.

2. From a df -k command, you will notice that the wake-up command has mounted the alternate rootvg filesystems under the /alt_inst prefix:

root@aknimp1:/> df -k
Filesystem    1024-blocks    Free %Used  Iused %Iused Mounted on
/dev/hd4            49152    5608   89%   1226     5% /
/dev/hd2           753664    5056  100%  19966    11% /usr
/dev/hd9var         16384   14340   13%    222     6% /var
/dev/hd3            32768   30376    8%     98     2% /tmp
/dev/lvexport      131072  126772    4%     41     1% /export
/dev/lv01         4980736   94468   99%   4546     1% /export/lpp_source
/dev/lv02          917504  448868   52%  29468    13% /export/spot
/dev/lvmksysb    15204352 3381328   78%     31     1% /export/mksysb
/dev/lvadmin       131072  126868    4%     25     1% /admin
/dev/hd1            16384   15820    4%     20     1% /home
/dev/lvadsm         16384      56  100%     21     1% /var/adsm
/dev/alt_hd4        49152    5704   89%   1192     5% /alt_inst
/dev/alt_lvadmin   131072  126868    4%     25     1% /alt_inst/admin
/dev/alt_hd1        16384   15820    4%     20     1% /alt_inst/home
/dev/alt_hd3        32768   30376    8%     98     2% /alt_inst/tmp
/dev/alt_hd2       753664    5056  100%  19966    11% /alt_inst/usr
/dev/alt_hd9var     16384   14380   13%    219     6% /alt_inst/var
/dev/alt_lvadsm     16384    1848   89%     20     1% /alt_inst/var/adsm

3. Copy the corrupted files from the appropriate alt_inst logical volume/filesystem. In this case, I corrupted my /etc/hosts file, so I will issue the following command to restore it from my latest cloned backup:

cp /alt_inst/etc/hosts /etc/hosts

4. When you have restored the required files, put the altinst_rootvg back to sleep, which will unmount the /alt_inst filesystems, by issuing:

alt_disk_install -S
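
If you restore files this way often, step 3 can be wrapped in a small function that refuses to copy when the file is missing from the clone and skips the copy when the two versions are already identical. This is a minimal sketch: restore_from_clone, ALT_ROOT, and DEST_ROOT are hypothetical names, and DEST_ROOT exists only so the function can be tried against scratch directories.

```shell
#!/bin/sh
# Copy one file from the woken-up clone back to the running system.
# ALT_ROOT is where the clone is mounted. DEST_ROOT is normally empty
# and is there only for trying the function out in scratch directories.
# All three names are hypothetical.
ALT_ROOT=${ALT_ROOT:-/alt_inst}
DEST_ROOT=${DEST_ROOT:-}

# Usage: restore_from_clone /etc/hosts
restore_from_clone() {
    src=$ALT_ROOT$1
    dst=$DEST_ROOT$1
    if [ ! -f "$src" ]; then
        echo "not found in clone: $src" >&2
        return 1
    fi
    # Nothing to do if the live file already matches the cloned copy
    if [ -f "$dst" ] && cmp -s "$src" "$dst"; then
        echo "unchanged: $dst"
        return 0
    fi
    # -p keeps the mode and timestamps of the cloned copy
    cp -p "$src" "$dst" && echo "restored: $dst"
}
```

Run it between the wake-up and sleep commands, for example restore_from_clone /etc/hosts.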

Reducing Logical Volumes Size Within the rootvg

Remember the pain associated with the need to reduce the size of a logical volume within the rootvg? It took a tape restore of the system to complete. Now, you can complete that reduction within a simple cloning process. The steps to complete that process are as follows:

1. Issue a mkszfile command to create the /image.data file.

2. Edit the /image.data file and specify SHRINK=yes in the logical_volume_policy stanza:

image_data:
        IMAGE_TYPE= bff
        DATE_TIME= Tue Oct 3 10:29:55 CDT 2000
        UNAME_INFO= AIX aknimp1 3 4 000F261D4C00
        PRODUCT_TAPE= no
        USERVG_LIST= nimresvg
        OSLEVEL= 4.3.3.10

logical_volume_policy:
        SHRINK= yes
        EXACT_FIT= no

ils_data:
        LANG=  en_US

3. Clone the rootvg to hdisk2 and hdisk3, specifying your customized /image.data file by issuing one of the following commands:

smitty alt_clone (remember to specify the location of your image.data file on the image.data file prompt)

alt_disk_install -i/image.data -B -C hdisk2 hdisk3 (from the command line)

4. After the completion of the cloning operation, wake up the altinst_rootvg by issuing:

alt_disk_install -W hdisk2 hdisk3

5. Review your df -k output and compare the primary logical volume sizing to their /alt_inst counterparts.

6. If you are satisfied with the sizing reduction, change your bootlist (bootlist -m normal hdisk2 hdisk3) and reboot.
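
Steps 2 and 5 above lend themselves to scripting. The sketch below is one possible approach rather than a definitive tool: set_shrink rewrites image.data so that SHRINK is yes in the logical_volume_policy stanza only, and compare_sizes pairs each /alt_inst filesystem in df -k output with its primary counterpart. Both function names are hypothetical, and the code assumes the image.data and df -k layouts shown in this article.

```shell
#!/bin/sh
# set_shrink and compare_sizes are hypothetical helper names.

# Rewrite image.data (stdin to stdout) so that SHRINK is "yes",
# touching only the logical_volume_policy stanza.
set_shrink() {
    awk '
        # A stanza header starts in column one and ends its name with ":"
        /^[A-Za-z_]+:/ { stanza = $1 }
        stanza == "logical_volume_policy:" && $1 ~ /^SHRINK=/ {
            sub(/SHRINK=.*/, "SHRINK= yes")
        }
        { print }
    '
}

# From df -k output on stdin, print for each cloned filesystem:
#   mount point, primary size in KB, clone size in KB
compare_sizes() {
    awk '
        NR == 1 { next }    # skip the header line
        $NF == "/alt_inst" { alt["/"] = $2; next }
        # Strip the 9-character /alt_inst prefix to get the primary name
        index($NF, "/alt_inst/") == 1 { alt[substr($NF, 10)] = $2; next }
        { pri[$NF] = $2 }
        END { for (m in alt) if (m in pri) print m, pri[m], alt[m] }
    ' | sort
}

# Typical use on the AIX host (not executed here):
#   set_shrink < /image.data > /tmp/image.data.new
#   df -k | compare_sizes
```

Check the rewritten file before copying it over /image.data, and run compare_sizes only while the altinst_rootvg is awake.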

Network Installation Management (NIM)

I want to briefly discuss NIM and show how well it interfaces with alternate disk installation. It can easily help you to manage upgrades on a group of servers, thus saving you even more time.

What Is NIM?

Paraphrasing from the AIX Network Installation Management Guide and Reference, "NIM is a base component of AIX and permits and aids in the installation and maintenance of AIX, its basic operating system, and additional software and fixes that may be supplied over the network. NIM provides for the customization of machines both during and after installation. As a result, NIM has eliminated the reliance of the systems administration staff on tapes and CD-ROMs for software installation and maintenance."

In this case, you are using NIM to manage a group of standalone machines (NIM clients) from a centrally located, network-attached NIM master. From the NIM master, you can manage operating system installations, maintenance upgrades, mksysb images for backup and recovery, installation of new servers (cloning), and the re-installation of existing servers in case of a disaster.

There's a great deal of functionality provided by NIM. I recommend reviewing the usage guide to see what NIM features could benefit your environment. I also recommend a good Redbook from IBM, NIM: From A to Z in AIX 4.3 (SG24-5524-00), which was published in February 2000.

I won't cover the specifics of setting up the NIM master and the corresponding NIM client configurations; it is not an overly complicated process. However, it will require someone with NIM-specific knowledge to lay out the functional NIM environment. If you support SP complexes, you have already had a fair amount of exposure to NIM even though it is buried one layer below PSSP.

One key feature of NIM that will help manage a group of servers concurrently is the Machine Group definition. Within NIM, you can operate as easily on a single machine as you can a group of machines. For instance, we have defined several machine groups within our NIM master environment. These definitions allow us to operate on a group of like servers concurrently.

How Does It Integrate with Alternate Disk Installation?

NIM knows how to fully exploit Alternate Disk Installation. For example, look at the initial clone and update_all operation. Let's say you want to use NIM to extend the model (instead of upgrading the maintenance level on a single server) and you want to complete this operation on ten Lotus Notes servers that are similarly configured and are defined in a Notes machine group within NIM. From SMIT on the NIM master, issue the following fast path and you will see this panel:

smitty nim_alt_clone 

Clone the rootvg to an Alternate Disk

Type or select values in entry fields.
Press Enter AFTER making all desired changes.
                                                   [Entry Fields]
* Target Machine / Group to Install                [NOTES]              +
* Target Disk(s) to install                        [hdisk2 hdisk3]
  Phase to execute                                 all                  +
  IMAGE_DATA resource                              []                   +/
    EXCLUDE_FILES resource                         []                   +/
     (leave blank to include all files in backup)
  
  BUNDLE to install                                []                   +
    -OR-
  Fileset(s) to install                            []
  
  FIX_BUNDLE to install                            []                   +
    -OR-
  FIXES to install                                 [update_all]
  
  LPP_SOURCE                                       [aix433_lppsource]   +
  (required if filesets, bundles or fixes used)
  
  installp Flags
    COMMIT software updates?                       yes                  +
    SAVE replaced files?                           no                   +
    AUTOMATICALLY install requisite software?      yes                  +
    EXTEND filesystems if space needed?            yes                  +
    OVERWRITE same or newer versions?              no                   +
    VERIFY install and check file sizes?           no                   +
  Customization SCRIPT resource                    []                   +/
  Set bootlist to boot from this disk
  on next reboot                                   yes                  +
  Reboot when complete?                            no                   +
  Verbose output?                                  no                   +
  Debug output?                                    no                   +
  Group controls (only valid for group targets):
    Number of concurrent operations                []                   #
    Time limit (hours)                             []                   #

F1=Help             F2=Refresh           F3=Cancel            F4=List
F5=Reset            F6=Command           F7=Edit              F8=Image
F9=Shell            F10=Exit             Enter=Do

In this example, you would cause every server defined in the Notes Machine group to begin a process to clone itself from hdisk0/1 to hdisk2/3. At the completion of the cloning operation, NIM would then NFS-mount the aix433_lppsource resource (in this case, it's the AIX 4.3.3 lppsource filesystem, which includes the 4.3.3.06 maintenance) and apply it to the newly cloned hdisk2/3 on each of these servers. This also instructs NIM to change the bootlist on each of these servers as a part of the operation but does not cause an immediate reboot. I recommend, however, using NIM to schedule a reboot of all these servers during the maintenance window.
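
The same operation can also be started from the command line on the NIM master. Treat the following as a sketch under assumptions rather than the exact command SMIT generates: the attribute names (source, disk, fixes, lpp_source, set_bootlist, boot_client) are my understanding of the NIM alt_disk_install operation and should be checked against the NIM guide for your level. The helper name nim_clone_cmd is hypothetical, and it only prints the command.

```shell
#!/bin/sh
# Print the nim command that corresponds to the SMIT panel values.
# nim_clone_cmd is a hypothetical helper; the attribute names are
# assumptions to verify against the NIM documentation.
# Usage: nim_clone_cmd TARGET LPP_SOURCE DISK [DISK...]
nim_clone_cmd() {
    target=$1
    lpp=$2
    shift 2
    # The remaining arguments are the target disks, quoted as one value
    echo "nim -o alt_disk_install -a source=rootvg -a disk='$*'" \
         "-a fixes=update_all -a lpp_source=$lpp" \
         "-a set_bootlist=yes -a boot_client=no $target"
}

nim_clone_cmd NOTES aix433_lppsource hdisk2 hdisk3
```

The target can be a single NIM client or, as here, a machine group such as NOTES.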

All of this work, including the cloning and upgrading of the maintenance level, can be completed during the day without affecting the running application (e.g., Notes). Under the previous support model, this same upgrade would have taken about two hours per server plus reboot time to complete during an application maintenance window, generally in the middle of the night. If a single person worked to complete this process, it could have taken about 25 hours spread across multiple weekends. With NIM and Alternate Disk Installation, this upgrade outage can be reduced to the time needed to reboot these ten servers concurrently (about 30 minutes, in our case). Note that your time may vary depending on the speed of the network, the number of filesets being updated, the time to reboot, and any problems encountered.

Figure 2 shows the process using NIM/Machine Groups and Alternate Disk Installation. First, you instruct the NIM master to have each of the servers in the defined machine group clone hdisk0/1 to hdisk2/3 (depicted in red). Then, NIM will NFS-mount the appropriate LPPSOURCE filesystem containing the AIX 4.3.3.06 maintenance level and apply that maintenance to the newly cloned drives (operation in green). Again, this process happens concurrently on all servers in the defined NIM machine group without affecting the running applications.

Conclusion

My team is in the process of rolling out this methodology change. I think we can significantly reduce the amount of time spent in support of our current AIX standalone infrastructure. I also think Alternate Disk Installation and NIM can help you better manage your infrastructure and provide some consistency to your installation, upgrade, maintenance, and build procedures. In conclusion, I hope the above discussion will help you significantly reduce the amount of off-hours time associated with maintenance or fileset upgrades within AIX.

Jeff Marsh is the Systems Advisor to the UNIX Server Team working at American Century Investments, a premier investment manager serving nearly two million individual and institutional investors. Jeff can be contacted at: jeffrey_marsh@americancentury.com.