Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


1.2.4 Failures detected by HA Monitor

This subsection explains the failures that are detected by HA Monitor.

HA Monitor performs a hot-standby switchover when it detects a server failure (a failure that occurs on a server), a host failure (a failure that occurs on a host), or a resource failure (a failure that occurs on a resource such as a LAN or disk).


(1) Range of monitoring performed by HA Monitor

To detect server failures, host failures, and resource failures, HA Monitor monitors the following items:

#: System disks and disks for business use are monitored.

The following figure shows the range of monitoring performed by the HA Monitors in the system configuration.

Figure 1‒3: Range of monitoring performed by HA Monitor

[Figure]

(2) Server failures

Server failures can be classified as those that the server itself detects and those that the server cannot detect.

(a) Failures that the server itself detects

Failures that the server detects include the following:

  • Server's own logical errors

  • Any status that disables server operation because of a failure in a resource (such as a disk device)

In the event of a failure that the server itself detects, the server notifies HA Monitor of the failure and terminates abnormally. On receiving the failure notification, HA Monitor executes hot standby. During hot standby processing, the standby server uses its own recovery function, based on the inherited information, to restart operation and take over job processing.

(b) Failures that the server cannot detect

Failures that the server cannot detect include server slowdown.

HA Monitor detects the failures that the server itself cannot detect. When HA Monitor detects such a failure, it terminates the erroneous server and then executes hot standby.
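The source does not describe how HA Monitor recognizes a slowdown. As a rough illustration only, the following sketch assumes a common timeout-based approach: the server reports periodically, and the monitor treats prolonged silence as a slowdown. The class `ServerWatch`, its fields, and the timeout value are hypothetical and are not part of HA Monitor.

```python
from dataclasses import dataclass


@dataclass
class ServerWatch:
    """Hypothetical watcher; not HA Monitor's actual mechanism."""
    timeout_s: float            # assumed: longest tolerated silence, in seconds
    last_report_s: float = 0.0  # time of the most recent report from the server

    def report_alive(self, now_s: float) -> None:
        # Called whenever the server reports that it is still making progress.
        self.last_report_s = now_s

    def is_slow(self, now_s: float) -> bool:
        # A slowdown is assumed once the silence exceeds the timeout.
        return now_s - self.last_report_s > self.timeout_s


watch = ServerWatch(timeout_s=5.0)
watch.report_alive(10.0)
print(watch.is_slow(14.0))  # silence of 4 s, within the timeout
print(watch.is_slow(16.0))  # silence of 6 s, exceeds the timeout
```

In this sketch the server cannot notice its own slowdown, so the decision is made entirely on the monitoring side, which matches the division of roles described above.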

(c) Difference in detectable server failures depending on the server operating mode

Detectable failures depend on whether the server is running in the server mode or the monitor mode. The following table shows the server failure detection methods and their availability in each server operating mode.

Table 1‒2: Availability of server failure detection methods in each server operating mode

| Detection method | Server mode | Monitor mode |
|---|---|---|
| The server detects the failure and notifies HA Monitor. | Y | Y (Creation of a server monitoring command is required#1) |
| HA Monitor monitors the server and detects failures. | Y | N#2 |

Legend:

Y: Can be detected

N: Cannot be detected

#1: In the monitor mode, in order to have HA Monitor perform the hot standby operations automatically in the event of a failure that the server itself can detect, you must create a server monitoring command. For details, see 3.2.1 Monitoring a server in the monitor mode (for the active server). Note that when you use the monitor-mode program management function to monitor UAPs, there is no need to implement UAP monitoring processing in the server monitoring commands. For details, see 3.6.1 Monitoring UAPs.

#2: You can use the monitor-mode program management function to monitor UAPs.
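For readers who want to use Table 1‒2 in their own tooling, the table can be encoded as a simple lookup. This is only an illustrative sketch: the key strings and the function `can_detect` are hypothetical names chosen here and do not correspond to any HA Monitor definition or command.

```python
# Table 1-2 encoded as a lookup. All names are illustrative assumptions.
AVAILABILITY = {
    # (detection method, server operating mode): can the failure be detected?
    ("server_reports_failure", "server"): True,
    ("server_reports_failure", "monitor"): True,       # needs a server monitoring command (#1)
    ("ha_monitor_detects_failure", "server"): True,
    ("ha_monitor_detects_failure", "monitor"): False,  # UAP monitoring is still possible (#2)
}


def can_detect(method: str, mode: str) -> bool:
    """Return the Y/N entry of Table 1-2 as a boolean."""
    return AVAILABILITY[(method, mode)]


print(can_detect("ha_monitor_detects_failure", "monitor"))
```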

(3) Host failures

Host failures are failures that occur in any part of the host other than the server. HA Monitor assumes that a host failure has occurred when alive messages from the host are disrupted. Host failures include the following:

If any of the preceding failures disable communications through all monitoring paths or are reported by the failure management processor, HA Monitor detects a host failure, terminates the erroneous host, and then executes hot-standby switchover.
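The detection condition in the preceding paragraph can be sketched as a small predicate. This is a minimal illustration, assuming that the state of each monitoring path is available as a boolean; the function name, parameter names, and path names are hypothetical and do not reflect HA Monitor internals.

```python
from typing import Mapping


def host_failure_detected(path_alive: Mapping[str, bool],
                          reported_by_failure_processor: bool) -> bool:
    """Hypothetical predicate for the host failure condition.

    path_alive maps each monitoring path to True if alive messages
    still arrive over that path.
    """
    # Communication must be lost on ALL monitoring paths; one healthy path is enough
    # to rule out this condition. An empty map is treated as "nothing to judge".
    all_paths_down = bool(path_alive) and not any(path_alive.values())
    return all_paths_down or reported_by_failure_processor


print(host_failure_detected({"path1": False, "path2": False}, False))
print(host_failure_detected({"path1": True, "path2": False}, False))
```

The sketch shows why multiple monitoring paths reduce false detections: the loss of a single path alone does not satisfy the condition.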

(4) Resource failure

A failure that occurs on a hardware component in the system is called a resource failure. HA Monitor performs a hot-standby switchover when detecting any of the following failures:

When HA Monitor detects a LAN failure or disk failure, it switches the active host by planned hot-standby switchover or by host pair shutdown, in the same way as when a server failure or host failure is detected.

(a) LAN failure

A failure that occurs on a LAN is called a LAN failure. HA Monitor can detect a LAN failure by monitoring whether communication is possible over the LAN. For details, see 3.4.1 LAN monitoring and automatic hot standby in the event of a failure.
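As an illustration of "monitoring whether communication is possible", the following sketch probes a peer with a TCP connection attempt. This is an assumption made for the example; the source does not state which protocol or target HA Monitor uses for LAN monitoring (see 3.4.1 for the actual behavior). The function name `can_communicate` and its parameters are hypothetical.

```python
import socket


def can_communicate(host: str, port: int, timeout_s: float = 1.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout_s."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

A monitor built on such a probe would typically call it periodically and require several consecutive failures before declaring a LAN failure, so that a single lost packet does not trigger a switchover.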

(b) Disk failure

A failure that occurs on a disk is called a disk failure. HA Monitor can detect a disk failure by monitoring whether a write to the disk is possible. For details, see 3.4.2 Disk monitoring.
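The idea of checking whether a write to the disk is possible can be sketched as a small write probe. This is a simplified illustration that writes a temporary file through the file system; it is not HA Monitor's actual monitoring method (see 3.4.2), and the function name `disk_writable` and the probe file prefix are hypothetical.

```python
import os
import tempfile


def disk_writable(directory: str) -> bool:
    """Return True if a small file can be written and flushed in directory."""
    try:
        fd, path = tempfile.mkstemp(dir=directory, prefix=".probe-")
        try:
            os.write(fd, b"probe")
            os.fsync(fd)  # force the data to the device, not only to the page cache
        finally:
            os.close(fd)
            os.unlink(path)  # leave no probe file behind
        return True
    except OSError:
        return False
```

Calling `os.fsync` matters in this sketch: without it, a write could appear to succeed from cache even though the underlying disk is no longer accepting data.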