Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


6.25.4 Testing hot standby processing

After you have finished checking the operation of HA Monitors and servers, you must test hot standby processing. Perform this test after you have started HA Monitors and servers.

(1) Checking planned hot standby

Execute the planned hot-standby switchover command (monswap command) to test planned hot-standby switchover. After that, execute the server and host status display command (monshow command).

Shown below is an example of the execution results of the server and host status display command (monshow command) when planned hot standby processing performed from host 1 to host 2 was successful. If the status of the target server is ONL, the planned hot standby processing was successful.

# /opt/hitachi/HAmon/bin/monshow
KAMN213-I Own host name : host2
    Own servers               Pair servers
    Alias     Status          Status     Host name
    server2   ONL

If you are using the multi-standby function, confirm that servers have started successfully in all secondary systems.
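
As a rough illustration, the ONL check described above could be scripted by parsing the text that the monshow command prints. The following Python sketch assumes the output layout shown in the example (a header row beginning with Alias, followed by one row per server whose first two columns are the alias and the status). The names parse_monshow and switchover_succeeded are hypothetical helpers introduced here, not part of HA Monitor.

```python
from typing import Dict

# Sample text copied from the monshow example in this subsection.
SAMPLE = """KAMN213-I Own host name : host2
    Own servers               Pair servers
    Alias     Status          Status     Host name
    server2   ONL
"""


def parse_monshow(output: str) -> Dict[str, str]:
    """Map each own-server alias to its status.

    Assumption: rows after the header that starts with "Alias" hold the
    alias in column 1 and the own-server status in column 2.
    """
    statuses: Dict[str, str] = {}
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "Alias":
            in_table = True  # data rows follow this header row
            continue
        if in_table and len(fields) >= 2:
            statuses[fields[0]] = fields[1]
    return statuses


def switchover_succeeded(output: str, alias: str) -> bool:
    """Return True if the given server is reported as ONL."""
    return parse_monshow(output).get(alias) == "ONL"


print(switchover_succeeded(SAMPLE, "server2"))
```

In practice the text would come from running /opt/hitachi/HAmon/bin/monshow on the destination host instead of from the SAMPLE constant.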

(2) Checking automatic hot standby in the event of a server failure

You must generate a server failure to check whether HA Monitor performs automatic hot standby processing successfully. After recovering from the failure, execute the server and host status display command (monshow command).

Shown below is an example of the execution results of the server and host status display command (monshow command) when automatic hot standby processing from host 1 to host 2 was successful. If the status of the target server is ONL, the automatic hot standby processing was successful.

# /opt/hitachi/HAmon/bin/monshow
KAMN213-I Own host name : host2
    Own servers               Pair servers
    Alias     Status          Status     Host name
    server2   ONL

(3) Checking automatic hot standby in the event of a program error

If a server is in the monitor mode and the program management function is used, you must generate a program error to check whether HA Monitor performs hot standby processing automatically. After recovering from the error, execute the server and host status display command (monshow command) with the -u option specified.

Shown below is an example of the execution results of the server and host status display command (monshow command) when automatic hot standby processing was successful after a program error. If the program status is ACT, the automatic hot standby processing was successful.

# /opt/hitachi/HAmon/bin/monshow -u
KAMN213-I Own host name : host2
     Alias     Program Alias  Status   Patrol time
     server2   UAP1            ACT        3600
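
The ACT check could be scripted in a similar way. The following Python sketch assumes the layout of the monshow -u example above, and in particular that every data row carries all four columns (server alias, program alias, status, patrol time) and that aliases contain no spaces. The names parse_monshow_u and program_recovered are hypothetical helpers introduced here, not part of HA Monitor.

```python
from typing import Dict, Tuple

# Sample text copied from the monshow -u example in this subsection.
SAMPLE_U = """KAMN213-I Own host name : host2
     Alias     Program Alias  Status   Patrol time
     server2   UAP1            ACT        3600
"""


def parse_monshow_u(output: str) -> Dict[Tuple[str, str], str]:
    """Map (server alias, program alias) to the program status.

    Assumption: each row after the header that starts with "Alias" has
    the server alias, program alias, and status in columns 1 to 3.
    """
    programs: Dict[Tuple[str, str], str] = {}
    in_table = False
    for line in output.splitlines():
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if fields[0] == "Alias":
            in_table = True  # data rows follow this header row
            continue
        if in_table and len(fields) >= 3:
            programs[(fields[0], fields[1])] = fields[2]
    return programs


def program_recovered(output: str, server: str, program: str) -> bool:
    """Return True if the given program is reported as ACT."""
    return parse_monshow_u(output).get((server, program)) == "ACT"


print(program_recovered(SAMPLE_U, "server2", "UAP1"))
```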

(4) Checking automatic hot standby in the event of a host failure

You must generate a host failure to check whether HA Monitor performs automatic hot standby processing successfully. You can generate a host failure by forcibly terminating the host. If you have specified VMware ESXi-based virtualization in the reset path configuration, you must also verify that the reset path is set correctly by generating a host failure on the local host and then checking whether the virtual machine is reset.

After recovering from the failure, execute the server and host status display command (monshow command).

Shown below is an example of the execution results of the server and host status display command (monshow command) when automatic hot standby processing from host 1 to host 2 was successful. If the status of the target server is ONL, the automatic hot standby processing was successful.

# /opt/hitachi/HAmon/bin/monshow
KAMN213-I Own host name : host2
    Own servers               Pair servers
    Alias     Status          Status     Host name
    server2   ONL

If a file system is created on a shared disk, it might take time to perform a file system integrity check when the server is started. Note that a timeout might occur if the dev_timelimit, dev_offlimit, or dev_onlimit operand (shared-resource switchover timeout value) is specified in the server environment definition for a server that will use the file system. Therefore, perform hot-standby switchover resulting from a host failure to confirm that a timeout does not occur.
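
When running this test, it can help to record how long the server takes to reach ONL so that the figure can be compared with the timeout values in the server environment definition. The following Python sketch shows one possible polling helper. It measures elapsed time as seen by the script, not HA Monitor's internal timers, so the result is indicative only. The function wait_for_status and its parameters are hypothetical; get_status stands for any callable that returns the current status string, for example by parsing monshow output.

```python
import time
from typing import Callable, Optional


def wait_for_status(
    get_status: Callable[[], Optional[str]],
    wanted: str = "ONL",
    limit_seconds: float = 60.0,
    interval: float = 1.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll until get_status() returns `wanted`; return the elapsed seconds.

    Raises TimeoutError if the status is still not `wanted` once
    limit_seconds have passed. clock and sleep are injectable so the
    helper can be exercised without real waiting.
    """
    start = clock()
    while True:
        elapsed = clock() - start
        if get_status() == wanted:
            return elapsed
        if elapsed >= limit_seconds:
            raise TimeoutError(
                f"status was not {wanted} after {elapsed:.1f} seconds"
            )
        sleep(interval)
```

Here limit_seconds would be chosen by the tester, for example a value just below the smallest of the configured shared-resource switchover timeout values.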

After you have performed hot standby processing resulting from a host failure, confirm that OS dump files are output successfully and that neither setting errors nor capacity shortages occur. For details about how to verify that dump file output processing was successful, see the Linux Tough Dump documentation.

If hosts are to be reset to protect data on shared disks, but a specified reset path is invalid, the wrong host might be reset. To check the validity of the reset path settings, generate a host failure on each host and confirm that hot standby processing is performed correctly. If you use the physical partition reset function in a virtualization environment created by either Hitachi server virtualization (Virtage) or VMware ESXi, check hot standby processing resulting from a failure in the guest OS as well as hot standby processing resulting from a host failure caused by a failure in the host OS or hardware.
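
To make sure that no host or failure type is skipped, the required test runs can be listed before testing begins. The following Python sketch builds such a checklist under the reading that each host needs one host-failure test, or two tests (guest OS failure, and host OS or hardware failure) when the physical partition reset function is used in a virtualization environment. The function name reset_path_test_cases and the host names are illustrative only.

```python
from typing import List, Tuple


def reset_path_test_cases(
    hosts: List[str], virtualized: bool = False
) -> List[Tuple[str, str]]:
    """List (host, failure kind) pairs that should each be tested once."""
    if virtualized:
        # Physical partition reset in a virtualization environment:
        # test both failure types on every host.
        kinds = ["guest OS failure", "host OS or hardware failure"]
    else:
        kinds = ["host failure"]
    return [(host, kind) for host in hosts for kind in kinds]


for host, kind in reset_path_test_cases(["host1", "host2"], virtualized=True):
    print(f"generate {kind} on {host}, then confirm hot standby with monshow")
```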

If you use SCSI reservation for shared disk, confirm that all shared disk reservations were obtained in the active system after hot standby processing. For details about how to check the reservation status of shared disks, see 4. Obtain the reserved status for the disk in 7.5.7 Handling device failures on a shared disk (while the active server is running) (using SCSI reservation for shared disk).