1.2.4 Failures detected by HA Monitor
This subsection explains the failures that are detected by HA Monitor.
HA Monitor performs a hot-standby switchover when it detects a server failure that occurs on a server, a host failure that occurs on a host, or a resource failure that occurs on a resource (a LAN or a disk).
(1) Range of monitoring performed by HA Monitor
To detect server failures, host failures, and resource failures, HA Monitor monitors the following items:
- Servers
- Hosts
- Resources (LANs and disks#)
#: System disks and disks for business use are monitored.
The following figure shows the range of monitoring performed by the HA Monitors in the system configuration.
(2) Server failures
Server failures can be classified as those that the server itself detects and those that the server cannot detect.
(a) Failures that the server itself detects
Failures that the server detects include the following:
- Server's own logical errors
- Any status that disables server operation because of a failure in a resource (such as a disk device)
In the event of a failure that the server itself detects, the server notifies HA Monitor of the failure and terminates abnormally. The HA Monitor that receives the failure notification executes hot standby. During hot standby processing, the standby server uses its own recovery function based on the inherited information to restart operation and inherits job processing.
(b) Failures that the server cannot detect
Failures that the server cannot detect include server slowdown.
HA Monitor detects the failures that the server itself cannot detect. When HA Monitor detects such a failure, it terminates the erroneous server and then executes hot standby.
(c) Difference in detectable server failures depending on the server operating mode
Detectable failures depend on whether the server is running in the server mode or the monitor mode. The following table shows the server failure detection methods and their availability in each server operating mode.
| Detection method | Server mode | Monitor mode |
|---|---|---|
| The server detects the failure and notifies HA Monitor. | Y | Y (Creation of a server monitoring command is required#1) |
| HA Monitor monitors the server and detects failures. | Y | N#2 |
Legend:
- Y: Can be detected
- N: Cannot be detected
#1: In the monitor mode, in order to have HA Monitor perform the hot standby operations automatically in the event of a failure that the server itself can detect, you must create a server monitoring command. For details, see 3.2.1 Monitoring a server in the monitor mode (for the active server). Note that when you use the monitor-mode program management function to monitor UAPs, there is no need to implement UAP monitoring processing in the server monitoring commands. For details, see 3.6.1 Monitoring UAPs.
#2: You can use the monitor-mode program management function to monitor UAPs.
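The table above can also be expressed as a small lookup structure, which may be convenient when documenting or scripting checks around a configuration. The following is only an illustrative sketch: the names `Mode`, `Method`, `DETECTION`, and `can_detect` are hypothetical and are not part of HA Monitor.

```python
from enum import Enum


class Mode(Enum):
    SERVER = "server mode"
    MONITOR = "monitor mode"


class Method(Enum):
    SERVER_NOTIFIES = "The server detects the failure and notifies HA Monitor."
    HA_MONITOR_WATCHES = "HA Monitor monitors the server and detects failures."


# (detection method, operating mode) -> (detectable?, note from the table footnotes)
DETECTION = {
    (Method.SERVER_NOTIFIES, Mode.SERVER): (True, ""),
    (Method.SERVER_NOTIFIES, Mode.MONITOR): (
        True,
        "Creation of a server monitoring command is required.",
    ),
    (Method.HA_MONITOR_WATCHES, Mode.SERVER): (True, ""),
    (Method.HA_MONITOR_WATCHES, Mode.MONITOR): (
        False,
        "UAPs can be monitored with the monitor-mode program management function.",
    ),
}


def can_detect(method: Method, mode: Mode) -> bool:
    """Return True if the table marks this combination as Y (can be detected)."""
    return DETECTION[(method, mode)][0]
```

For example, `can_detect(Method.HA_MONITOR_WATCHES, Mode.MONITOR)` corresponds to the N#2 cell of the table.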
(3) Host failures
A host failure is a failure that occurs in any part of the host other than the server. HA Monitor assumes that a host failure has occurred when alive messages are disrupted. Host failures include the following:
- Host's hardware failures or power failures
  If dual hardware (such as processors and disks) is employed by using functionalities such as OS functions, reduced operation of a single hardware unit is not treated as a host failure.
- Kernel failures
- HA Monitor failures
- Failures on all monitoring paths
- Host slowdown
If any of the preceding failures disable communications through all monitoring paths or are reported by the failure management processor, HA Monitor detects a host failure, terminates the erroneous host, and then executes hot-standby switchover.
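The idea of treating the loss of alive messages on all monitoring paths as a host failure can be sketched as follows. This is a minimal illustration under stated assumptions, not HA Monitor's implementation: the class name `AliveMessageTracker`, the path names, and the timeout value are hypothetical, and reports from the failure management processor are not modeled.

```python
import time


class AliveMessageTracker:
    """Tracks when an alive message was last seen on each monitoring path."""

    def __init__(self, paths, timeout_s, clock=time.monotonic):
        if not paths:
            raise ValueError("at least one monitoring path is required")
        self._timeout_s = timeout_s  # assumed silence threshold, in seconds
        self._clock = clock          # injectable so the logic can be tested
        now = clock()
        self._last_seen = {path: now for path in paths}

    def record_alive(self, path):
        """Call this whenever an alive message arrives on the given path."""
        self._last_seen[path] = self._clock()

    def silent_paths(self):
        """Return the paths on which no alive message arrived within the timeout."""
        now = self._clock()
        return [p for p, seen in self._last_seen.items()
                if now - seen > self._timeout_s]

    def host_failure_suspected(self):
        """True only when alive messages have stopped on every monitoring path."""
        return len(self.silent_paths()) == len(self._last_seen)
```

In this sketch, silence on a single path does not count as a host failure; only silence on every path does, which mirrors the condition that communications through all monitoring paths are disabled.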
(4) Resource failures
A failure that occurs on a hardware component in the system is called a resource failure. HA Monitor performs a hot-standby switchover when detecting any of the following failures:
- LAN failure
- Disk failure
When HA Monitor detects a LAN failure or disk failure, it switches the active host by planned hot-standby switchover or by host pair shutdown, in the same way as when a server failure or host failure is detected.
(a) LAN failure
A failure that occurs on a LAN is called a LAN failure. HA Monitor can detect a LAN failure by monitoring whether communication is possible over the LAN. For details, see 3.4.1 LAN monitoring and automatic hot standby in the event of a failure.
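As a rough illustration of checking whether communication is possible over a LAN, the following sketch attempts a TCP connection to a peer and reports whether it succeeded. The use of TCP, the function name `lan_reachable`, and the timeout value are assumptions made for this example; the actual method used by HA Monitor is described in 3.4.1.

```python
import socket


def lan_reachable(host, port, timeout_s=2.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout_s."""
    try:
        # create_connection raises OSError (including timeouts) on failure.
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        return False
```

A monitoring loop would typically call such a probe at a fixed interval and act only after repeated failures, to avoid reacting to a single dropped connection.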
(b) Disk failure
A failure that occurs on a disk is called a disk failure. HA Monitor can detect a disk failure by monitoring whether a write to the disk is possible. For details, see 3.4.2 Disk monitoring.
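The principle of detecting a disk failure by checking whether a write is possible can be illustrated with a simple write probe. This is only a sketch under assumptions: the function name `disk_writable` and the use of a temporary probe file are hypothetical and do not describe how HA Monitor performs disk monitoring (see 3.4.2 for that).

```python
import os
import tempfile


def disk_writable(directory):
    """Return True if a small file can be written and flushed to the directory."""
    try:
        # The probe file is deleted automatically when the block exits.
        with tempfile.NamedTemporaryFile(dir=directory, prefix=".probe_") as probe:
            probe.write(b"probe")
            probe.flush()
            os.fsync(probe.fileno())  # force the data to the device
        return True
    except OSError:
        return False
```

Calling `os.fsync` matters in this sketch because a write that only reaches the OS cache could succeed even when the underlying device cannot accept data.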