Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


3.3.6 Physical partition reset function in a virtualization environment

HA Monitor enables you to use Hitachi server virtualization (Virtage) or VMware ESXi to perform the hot standby operation in a virtualization environment.

In BladeSymphony with Hitachi server virtualization (Virtage) installed, you can divide the processor into logical partitions (LPARs) and perform the hot standby operation in units of LPARs. An environment that uses Hitachi server virtualization to run LPARs is called the LPAR mode, and an environment that does not use it is called the Basic mode. In this manual, the Basic mode generally also refers to a BladeSymphony environment in which Hitachi server virtualization is not installed.

If you use VMware ESXi, you can use a single processor to run multiple virtual machines and perform the hot standby operation in units of virtual machines.

This subsection explains the physical partition reset function, an option available in these virtualization environments, by comparing it with the normal hot standby operation.


(1) Normal hot standby operation

HA Monitor detects host failures for each LPAR in the LPAR mode and for each virtual machine when VMware ESXi is used. By default, if a host failure occurs in the active system, the HA Monitor for the standby system resets only the LPAR or virtual machine where the HA Monitor for the active system is running. The figure below shows the normal hot standby operation in the event of a host failure in the active system in the LPAR mode. When VMware ESXi is used, the operation is the same as for the LPAR mode; read LPAR as virtual machine.

Figure 3‒25: Hot standby operation in the LPAR mode

[Figure]

(2) Operation when the physical partition reset function is used

Normally, if HA Monitor's host reset operation fails, the target host is placed in a wait state that requires user intervention.

If a reset operation on a host (LPAR or virtual machine) fails, the physical partition reset function determines that one of the failures listed below has occurred and resets the entire processor containing the erroneous host (LPAR or virtual machine):

When this function is used, HA Monitor performs the reset operation regardless of whether another LPAR or virtual machine is running in the processor.

The entire processor is reset only when the failed host and the host that detected the failure are located in different processors. If the failed host and the detecting host are located in the same processor, the entire processor is not reset, and the target host waits for user intervention.

When the physical partition reset function is used, HA Monitor gives precedence to performing the hot standby operation for the failed LPAR or virtual machine over continuing operation in other LPARs or virtual machines in the same processor. As a result, the standby server on the target host starts as the active server without having to wait for user intervention.
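The rules described above can be summarized as a small decision function. The following Python sketch is only an illustration of that logic, not HA Monitor code: the names Host, Action, and action_after_host_reset are hypothetical, and the processor field is an assumed identifier for the physical processor that contains the LPAR or virtual machine.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    """Possible outcomes after the standby side attempts a host reset."""
    HOT_STANDBY = "start the standby server as the active server"
    RESET_PROCESSOR = "reset the entire processor, then perform hot standby"
    WAIT_FOR_USER = "wait for user intervention"


@dataclass(frozen=True)
class Host:
    name: str       # LPAR or virtual machine name (hypothetical)
    processor: str  # physical processor containing the host (hypothetical)


def action_after_host_reset(failed: Host,
                            detector: Host,
                            host_reset_succeeded: bool,
                            partition_reset_enabled: bool) -> Action:
    """Illustrative summary of the behavior described in this subsection."""
    if host_reset_succeeded:
        # Normal hot standby: only the failed LPAR or VM was reset.
        return Action.HOT_STANDBY
    if partition_reset_enabled and failed.processor != detector.processor:
        # The whole processor is reset, regardless of whether other
        # LPARs or virtual machines are running in it.
        return Action.RESET_PROCESSOR
    # Function not used, or both hosts share the same processor.
    return Action.WAIT_FOR_USER
```

For example, calling action_after_host_reset(Host("lpar1", "blade0"), Host("lpar1s", "blade1"), False, True) returns Action.RESET_PROCESSOR, whereas the same call with both hosts in "blade0" returns Action.WAIT_FOR_USER.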

The figure below shows the operation when the physical partition reset function is used in the LPAR mode. When the physical partition reset function is used with VMware ESXi, operation is the same as for the LPAR mode. Simply replace LPAR with virtual machine.

Figure 3‒26: Operation using the physical partition reset function

[Figure]

If the hot standby operation is performed by using the physical partition reset function, the entire processor is reset. This means that all other LPARs (virtual machines) in the same processor are reset at the same time.

Before using this function, you must check that the processor subject to the reset operation does not contain any LPAR or virtual machine that is more important in terms of job processing than hot standby. If such an LPAR or virtual machine is running in the processor, do not use this function. For details about the HA Monitor processing when the physical partition reset function is not used and a host reset has failed, see 4.2.4 Processing in the event a host reset has failed.
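The check described above can be done with a simple inventory. The following Python sketch assumes that you maintain your own mapping of processors to the LPARs or virtual machines they contain and your own list of guests whose jobs are more important than hot standby. Both data structures and the function name are hypothetical; HA Monitor does not provide them.

```python
from typing import Dict, Iterable, List, Set, Tuple


def can_use_partition_reset(guests_by_processor: Dict[str, Iterable[str]],
                            critical_guests: Set[str],
                            processor: str) -> Tuple[bool, List[str]]:
    """Return (ok, blockers) for a processor that may be reset as a whole.

    blockers lists the guests in the processor whose job processing is
    considered more important than hot standby. If any exist, the
    physical partition reset function should not be used.
    """
    guests = guests_by_processor.get(processor, ())
    blockers = sorted(g for g in guests if g in critical_guests)
    return (not blockers, blockers)


# Hypothetical inventory: two processors, each containing two LPARs.
inventory = {
    "blade0": ["lpar1", "lpar2"],
    "blade1": ["lpar1s", "billing"],
}
critical = {"billing"}

ok_blade0, blockers_blade0 = can_use_partition_reset(inventory, critical, "blade0")
ok_blade1, blockers_blade1 = can_use_partition_reset(inventory, critical, "blade1")
```

In this hypothetical inventory, blade0 passes the check, while blade1 does not because it contains the guest named billing.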

(3) Required environment settings

Use the partition_reset operand in the HA Monitor environment settings to specify whether to use the physical partition reset function.