Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


4.5.1 Detection of host failures and host reset (multi-standby)

This subsection describes some of the ways the following behave differently when the multi-standby function is used: detection of a host failure and host reset, detection of a host failure and SCSI reservation, and the function for controlling hot standby based on the availability of LAN communications.

In a hot-standby configuration that uses the multi-standby function, standby servers run on multiple standby systems. Therefore, if a failure such as a monitoring path failure occurs, multiple standby systems might start hot standby processing at the same time. To prevent such concurrent hot standby operations, HA Monitor also monitors hosts among the standby systems when the multi-standby function is used.

(1) Host reset

If a host failure is detected in a standby system, the host that detected the failure resets the host that is running a standby server with a higher priority than the local host's standby server. The following figure shows the processing flow for monitoring host status and resetting hosts when the multi-standby function is used.

Figure 4‒27: Processing flow for monitoring host status and resetting hosts (when the multi-standby function is used)

[Figure]

In the figure above, standby system 2, whose standby server has priority 2, resets standby system 1 because the standby server on the failed host has priority 1, which is higher than its own.

If the reset attempt from standby system 2 fails, or the reset operation itself fails, a different host issues the reset. For details about which host issues the reset when a reset attempt from a standby system fails, see 4.2.3 Host reset when there are multiple standby systems.
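The reset decision in (1) can be sketched as a simple priority comparison. This is a minimal illustration under stated assumptions, not HA Monitor's implementation: the `Host` type and `hosts_to_reset` function are hypothetical, a smaller priority number is assumed to mean a higher priority (as in the example above, where priority 1 outranks priority 2), and the fallback described in 4.2.3 is not modeled.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Host:
    name: str
    # Priority of the standby server on this host; 1 is the highest (assumption).
    priority: int


def hosts_to_reset(local: Host, failed: List[Host]) -> List[str]:
    """Return the names of the failed hosts that `local` should reset.

    Hypothetical sketch of the rule in (1): the detecting host resets a failed
    host only if that host runs a standby server with a higher priority
    (a smaller number) than the local host's standby server.
    """
    return [h.name for h in failed if h.priority < local.priority]


standby1 = Host("standby1", 1)
standby2 = Host("standby2", 2)
print(hosts_to_reset(standby2, [standby1]))  # ['standby1']
print(hosts_to_reset(standby1, [standby2]))  # []
```

The second call shows the reverse case: standby system 1 does not reset standby system 2, because the failed host's standby server has a lower priority than its own.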

(2) SCSI reservation

If a configuration is split into multiple configuration segments due to a monitoring path failure, hot standby processing is started for each configuration segment. The host that will continue with jobs as the active system is the host that contains the server with the highest priority within the configuration segment that contains the server with the lowest priority.

The following figure shows the flow of hot standby processing when failures occur on multiple hosts at the same time.

Figure 4‒28: Hot standby processing when failures occur on multiple hosts at the same time (when the multi-standby function is used)

[Figure]

In this figure, HA Monitor detects a failure on the active server and switches from host 1 in the active system to host 2, which has the highest priority among the standby systems. When a failure is then detected on host 2, which has the highest priority and is now running the active server, HA Monitor switches from host 2 to host 3.
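The switchover sequence in Figure 4‒28 can be approximated by always moving the active server to the surviving standby host with the highest priority. This sketch assumes that priority 1 is the highest and ignores resets, SCSI reservation, and timing; the `failover_sequence` function and the host names are hypothetical.

```python
from typing import Dict, List


def failover_sequence(active: str, standbys: Dict[str, int], failures: List[str]) -> List[str]:
    """Return the hosts that run the active server, in order, as failures occur.

    active   -- host initially running the active server
    standbys -- standby host name -> priority (1 is the highest; assumption)
    failures -- hosts that fail, in the order the failures are detected
    """
    failed = set()
    current = active
    history = [current]
    for host in failures:
        failed.add(host)
        if host != current:
            continue  # a failed standby host is simply excluded from later choices
        survivors = [(prio, name) for name, prio in standbys.items() if name not in failed]
        if not survivors:
            break  # no standby host is left to take over the active server
        _, current = min(survivors)  # highest priority = smallest number
        history.append(current)
    return history


print(failover_sequence("host1", {"host2": 1, "host3": 2}, ["host1", "host2"]))
# ['host1', 'host2', 'host3']
```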

(3) Function for controlling hot standby based on the availability of LAN communications

If a host failure is detected on the active system or a standby system, HA Monitor checks whether LAN communication is available. HA Monitor then runs, as the active server, the highest-priority standby server among those on hosts on which LAN communication is available, and leaves the other standby servers on standby. HA Monitor also terminates the server on any host on which LAN communication is not available.

The following figure shows the general flow of operations when a host failure is detected due to a monitoring path failure, and communication is not available via a business application LAN in the active system, but communication is available via a business application LAN in the standby system.

Figure 4‒29: Hot standby operation when communication is not available via a business application LAN in the active system, but communication is available via a business application LAN in the standby system

[Figure]

The following figure shows the general flow of operations when a host failure is detected due to a monitoring path failure, and communication is available via a business application LAN in the active system, but communication is not available via a business application LAN in the standby system.

Figure 4‒30: Hot standby operation when communication is available via a business application LAN in the active system, but communication is not available via a business application LAN in the standby system

[Figure]
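The decision in (3) can be sketched as a function that maps each host to an action. This is a minimal illustration, not HA Monitor's implementation: the `Server` fields and `plan_actions` function are hypothetical, a smaller priority number is assumed to mean a higher priority, and the current active server is assumed to keep running when its own business application LAN is available, which is how the scenario in Figure 4‒30 reads. The sketch is only meant to reproduce the two scenarios in Figures 4‒29 and 4‒30.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Server:
    host: str
    is_active: bool      # True for the server currently running in the active system
    priority: int        # standby priority; 1 is the highest (assumption)
    lan_available: bool  # hypothetical result of the business-LAN availability check


def plan_actions(servers: List[Server]) -> Dict[str, str]:
    """Map each host to 'active', 'standby', or 'terminate'.

    - A server on a host whose business LAN is unavailable is terminated.
    - Among hosts whose LAN is available, one server runs as the active server:
      the current active server if it is reachable (assumption), otherwise the
      highest-priority standby server.
    - Every other reachable server stays on standby.
    """
    reachable = [s for s in servers if s.lan_available]
    # Sort key: the current active server first, then standby servers by priority.
    winner = min(reachable, key=lambda s: (not s.is_active, s.priority), default=None)
    actions: Dict[str, str] = {}
    for s in servers:
        if not s.lan_available:
            actions[s.host] = "terminate"
        elif s is winner:
            actions[s.host] = "active"
        else:
            actions[s.host] = "standby"
    return actions


# Scenario of Figure 4-29, with a hypothetical third host added.
fig29 = [
    Server("host1", True, 0, False),
    Server("host2", False, 1, True),
    Server("host3", False, 2, True),
]
print(plan_actions(fig29))
# {'host1': 'terminate', 'host2': 'active', 'host3': 'standby'}
```

In this example the active system loses its business application LAN, so its server is terminated and the highest-priority reachable standby server becomes the active server; the remaining reachable standby server stays on standby.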