Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


4.2.1 How to determine a reset issuing host

In HA Monitor, if a host failure occurs on the active system, its standby system performs a host reset on the active system and then performs the hot standby operation. Depending on the hot-standby configuration, multiple standby systems might issue a host reset, thereby delaying recovery of the active system. For this reason, you must designate in advance the standby system that is to perform host reset. The standby system designated to perform host reset is called the reset issuing host. For details about the configuration in which multiple standby systems might issue a host reset, see 3.3.3 Preventing multiple resets on a host.

This subsection explains how to designate the reset issuing host and the processing performed by the reset issuing host in the event of a host failure.

(1) How to designate the reset issuing host

At the time of system startup, the operator starts the active server in the active system. In the standby system, the operator starts the standby server corresponding to the active server. When the first combination (pair) of active and standby systems is established, that standby system becomes the reset issuing host for the paired active system. If a host failure occurs in the active system, the first standby system paired with the active system (reset issuing host) resets the active system. Even if the same active system is paired with another standby system, only the first standby system paired with that active system can reset the active system as the reset issuing host.

If the servers are terminated and the pair relationship between the active system and the first standby system (the reset issuing host) is no longer valid, a standby system that is newly paired with the active system becomes the reset issuing host. If the active system is paired with multiple standby systems at that point, the standby system with the lowest host address becomes the reset issuing host thereafter.

The following figure shows how HA Monitor designates the reset issuing host.

Figure 4‒10: How HA Monitor designates the reset issuing host

[Figure]

In this figure, when active server 1 is terminated and the pair relationship between this active system and standby system 1 is no longer valid, one of the standby systems that are paired later, standby system 2 or standby system 3, becomes the reset issuing host. In this example, standby system 2 becomes the reset issuing host because its host address is smaller than that of standby system 3.
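
The designation rule above can be sketched as follows. This is an illustrative model only, not HA Monitor code: the class name ActiveSystem, its methods, and the use of integers for host addresses are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ActiveSystem:
    """Hypothetical model of the standby systems paired with one active system."""
    paired: List[int] = field(default_factory=list)  # host addresses of paired standby systems
    issuer: Optional[int] = None                     # host address of the reset issuing host

    def pair(self, host_address: int) -> None:
        if host_address not in self.paired:
            self.paired.append(host_address)
        # The first standby system to be paired becomes the reset issuing host.
        # Standby systems paired later do not replace it.
        if self.issuer is None:
            self.issuer = host_address

    def unpair(self, host_address: int) -> None:
        self.paired.remove(host_address)
        if self.issuer == host_address:
            # The pair with the reset issuing host is no longer valid:
            # the remaining standby system with the lowest host address takes over.
            self.issuer = min(self.paired) if self.paired else None


system = ActiveSystem()
for address in (1, 3, 2):   # standby system 1 is paired first
    system.pair(address)
first_issuer = system.issuer
system.unpair(1)            # pair with standby system 1 ends
next_issuer = system.issuer
```

In this sketch, standby system 1 is the reset issuing host while its pair is valid, and standby system 2 takes over afterward because its host address is lower than that of standby system 3.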

(2) Processing by the reset issuing host in the event of a host failure

The reset issuing host resets the active system and then notifies the other standby systems of completion of the reset operation. Upon receiving the reset completion message, a standby system that is not the reset issuing host switches the active server in the active system that was reset over to the standby server on its local host.

If a standby system that is not the reset issuing host detects a host failure in the active system, it takes no action until it receives a reset completion message from the reset issuing host. If it does not receive the message within 40 seconds (80 seconds when VMware ESXi-based virtualization is used), for a reason such as a failure in the monitoring LAN, it determines that the reset by the reset issuing host has failed and resets the active system itself.

The following figure shows the processing by the reset issuing host in the event of a host failure.

Figure 4‒11: Processing by the reset issuing host in the event of a host failure

[Figure]

The following provides the details of the processing by the reset issuing host in the event of a host failure, where the item numbers correspond to the numbers in the figure:

  1. Send a reset instruction.

    When the standby system (reset issuing host) detects a host failure, it resets the active system.

  2. Send a reset completion message.

    When the reset operation has been completed, the reset issuing host sends a reset completion message to the other standby systems.

  3. Perform the hot standby operation.

    When the reset completion message is received, each active server in the active system that was reset is switched over to the corresponding standby server.
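
The three steps above can be traced with a short simulation. This is a minimal sketch: the function name process_host_failure, the host names, and the event strings are hypothetical, and the relative order in which the standby systems perform the hot standby operation in step 3 is an assumption.

```python
from typing import List, Tuple


def process_host_failure(issuer: str, other_standbys: List[str]) -> List[Tuple[str, str]]:
    """Return the ordered (host, action) events for one failed active system."""
    events: List[Tuple[str, str]] = []

    # 1. The reset issuing host resets the failed active system.
    events.append((issuer, "reset active system"))

    # 2. After the reset completes, it notifies every other standby system.
    for standby in other_standbys:
        events.append((issuer, f"send reset completion message to {standby}"))

    # 3. Each standby system performs the hot standby operation for its own servers.
    #    (The order among standby systems is assumed here.)
    for standby in [issuer] + other_standbys:
        events.append((standby, "perform hot standby operation"))

    return events


events = process_host_failure("standby1", ["standby2", "standby3"])
```

The point of the ordering is that no standby system other than the reset issuing host begins the hot standby operation before the reset completion message has been sent.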

The following figure shows the processing when a standby system that is not the reset issuing host has detected a host failure in the active system.

Figure 4‒12: Processing when a standby system that is not the reset issuing host has detected a host failure in the active system

[Figure]

The following provides the details of the processing when a standby system that is not the reset issuing host has detected a host failure in the active system, where the item numbers correspond to the numbers in the figure:

  1. No reset completion message has been received.

    If a standby system that is not the reset issuing host detects a host failure in the active system, it waits for a reset completion message from the reset issuing host.

  2. Send a reset instruction.

    If no reset completion message is received, for a reason such as a failure in the monitoring LAN, the host that detected the failure resets the active system.

The amount of time the host waits for the reset completion message is 40 seconds (80 seconds if VMware ESXi-based virtualization is used).
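
The wait-then-reset behavior of a standby system that is not the reset issuing host can be sketched with a blocking queue and a timeout. This is an illustration under stated assumptions, not HA Monitor's implementation: the function names, the returned strings, and the use of queue.Queue as a stand-in for the monitoring LAN are all hypothetical. Only the 40-second and 80-second values come from this manual.

```python
import queue

RESET_WAIT_SECONDS = 40        # normal environment
RESET_WAIT_SECONDS_ESXI = 80   # VMware ESXi-based virtualization


def reset_wait_seconds(vmware_esxi: bool) -> int:
    """Return how long to wait for the reset completion message."""
    return RESET_WAIT_SECONDS_ESXI if vmware_esxi else RESET_WAIT_SECONDS


def on_host_failure_detected(inbox: queue.Queue, wait_seconds: float) -> str:
    """Decide what a non-issuing standby system does after detecting a host failure.

    inbox is assumed to carry only reset completion messages.
    """
    try:
        # Take no action until the reset issuing host reports completion.
        inbox.get(timeout=wait_seconds)
    except queue.Empty:
        # No message arrived in time (for example, a monitoring LAN failure):
        # treat the reset as failed and reset the active system from this host.
        return "reset active system"
    return "perform hot standby operation"


# Usage with a short wait so that the example finishes quickly.
received = queue.Queue()
received.put("reset completed")
action_when_notified = on_host_failure_detected(received, wait_seconds=0.05)
action_when_silent = on_host_failure_detected(queue.Queue(), wait_seconds=0.05)
```

In a real deployment the wait would be reset_wait_seconds(False) or reset_wait_seconds(True) rather than the shortened value used in the usage lines.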