Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


4.2.2 Hot standby operation when both hosts detect a failure at the same time

If the active and standby systems both detect a failure at the same time, both systems might issue a host reset, resulting in termination of both systems. To prevent this, HA Monitor provides a function for determining the host that is to perform a host reset first (host that has reset priority). For details about configurations in which active and standby systems might both be terminated, see 3.3.1 Preventing concurrent host resets.

This subsection explains how HA Monitor performs the hot standby operation when the active and standby systems both detect a failure at the same time. The following figure shows the HA Monitor processing flow in the event both hosts detect a host failure at the same time.

Figure 4‒13: Processing flow when both hosts detect a host failure at the same time

[Figure]

#1

This setting is specified with the standbyreset operand in the HA Monitor environment settings.

#2

This setting is specified with the cpudown operand in the HA Monitor environment settings. If online is specified, the active system becomes the host that has reset priority. If standby is specified, the standby system becomes the host that has reset priority. If system is specified, the host that has reset priority is not determined.

#3

This setting is specified with the address operand in the HA Monitor environment settings.

#4

If VMware ESXi-based virtualization is used, the system waits for 40 seconds.
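
As an illustration of note #2, the following Python sketch maps each cpudown value to the system that receives reset priority. The function name and the return values are hypothetical and are not part of HA Monitor. The sketch covers only the cpudown operand and does not model the standbyreset or address operands.

```python
from typing import Optional


# Hypothetical helper for illustration only; not an HA Monitor API.
def reset_priority_role(cpudown: str) -> Optional[str]:
    """Return the system that has reset priority for a cpudown value.

    "active" or "standby" names the system with reset priority.
    None means that no host with reset priority is determined.
    """
    mapping = {
        "online": "active",    # the active system has reset priority
        "standby": "standby",  # the standby system has reset priority
        "system": None,        # reset priority is not determined
    }
    if cpudown not in mapping:
        raise ValueError(f"unexpected cpudown value: {cpudown!r}")
    return mapping[cpudown]
```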

As the figure indicates, the hot standby operation differs depending on whether the user specifies the host that has reset priority. The following subsections explain these differences.

(1) Hot standby operation when the user specifies the host that has reset priority

If the HA Monitor environment settings specify the host that has reset priority and both hosts detect a failure at the same time, the host that has reset priority resets the remote host. A host that does not have reset priority waits for 20 seconds (40 seconds if VMware ESXi-based virtualization is used) after detecting the failure, and then issues a reset request. Note that even if a host has been defined as having reset priority, its reset priority might change depending on whether the local host has more running servers than the remote host.
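
The wait times described above can be sketched in Python as follows. This is a minimal illustration, not HA Monitor code: the function name and parameters are hypothetical, and the sketch assumes that the host with reset priority issues its reset without any added wait. It does not model the change in reset priority that depends on the number of running servers.

```python
# Hypothetical helper for illustration only; not an HA Monitor API.
def reset_wait_seconds(has_reset_priority: bool, uses_vmware_esxi: bool) -> int:
    """Return how long a host waits after detecting a failure
    before it issues a reset request."""
    if has_reset_priority:
        # Assumption: the host with reset priority does not wait.
        return 0
    # A host without reset priority waits 20 seconds,
    # or 40 seconds if VMware ESXi-based virtualization is used.
    return 40 if uses_vmware_esxi else 20
```

Because the host without reset priority waits before it issues its request, the host with reset priority has time to reset the remote host first.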

(2) Hot standby operation when the user does not specify the host that has reset priority

If the HA Monitor environment settings do not specify the host that has reset priority and both hosts detect a failure at the same time, HA Monitor issues a reset request from both hosts. In this case, concurrent host resets are prevented because SVP processes only the first reset request that it accepts. In this operation mode, the time required for the hot standby operation is reduced because neither host waits before issuing a reset request.
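
The following Python sketch illustrates the idea that only the first accepted reset request is processed. It is a toy model under stated assumptions: the SvpModel class, its method, and the host names are hypothetical, and the sketch does not represent the actual SVP interface or the cluster management function.

```python
import threading
from typing import Dict, Optional


class SvpModel:
    """Toy stand-in for SVP (hypothetical): it accepts only the
    first reset request and rejects every later one."""

    def __init__(self) -> None:
        self._lock = threading.Lock()
        self.accepted_from: Optional[str] = None

    def request_reset(self, requester: str) -> bool:
        """Return True if this request is the one that is processed."""
        with self._lock:
            if self.accepted_from is None:
                self.accepted_from = requester
                return True
            return False


def simulate_concurrent_requests() -> Dict[str, bool]:
    """Issue a reset request from two hosts at the same time and
    report which request was accepted."""
    svp = SvpModel()
    results: Dict[str, bool] = {}

    def issue(host: str) -> None:
        results[host] = svp.request_reset(host)

    threads = [threading.Thread(target=issue, args=(host,))
               for host in ("host1", "host2")]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

In this model, exactly one of the two requests is accepted regardless of which host issues its request first, so only one host is reset.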

You can choose this operation mode if you employ a mutual hot-standby configuration, use BladeSymphony, and SVP supports the cluster management function. For details about the cluster management function, see the hardware-related documentation. If you want hot standby switching to be performed to one of the hosts in the event of a failure on all monitoring paths, you must specify the host that has reset priority.

If you do not specify the host that has reset priority, you must specify the reset path and SVP settings. For details, see 6.7.1 Information required for configuration (BladeSymphony) and (7) Mutual hot-standby configuration (when a host with reset priority is not specified).