Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


4.7.1 UAP monitoring control

When you modify a UAP in such a manner that it can issue HA Monitor's APIs, you can use HA Monitor to monitor the UAP without having to create a UAP monitoring process in the server monitoring command.

The UAP issues two APIs: the hamon_patrolstart function, which starts UAP monitoring, and the hamon_patrolstop function, which stops it. For details about the APIs, see 10. APIs.

HA Monitor monitors a UAP by monitoring UAP processes and operation reports. You can choose to monitor both UAP processes and operation reports or only UAP processes. To specify the monitoring method, you use the patrol operand in the monitor-mode program environment definition and the hamon_patrolstart function. Select the monitoring method as appropriate for the types of failures you wish to have detected.

The following table lists the types of detectable failures and the available monitoring methods.

Table 4‒12: Detectable failures and available monitoring methods

| Detectable failure | Monitoring processes | Monitoring operation reports |
|---|---|---|
| UAP abnormal termination | Y | Y |
| UAP slowdown: system call hang | N | Y |
| UAP slowdown: page-in wait | N | Y |
| UAP slowdown: scheduling wait | N | Y |

Legend:

Y: Can be detected.

N: Cannot be detected.
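
The choice of monitoring method can be read directly off Table 4‒12. The following Python sketch is only an illustration of that lookup. The names DETECTABLE and needs_operation_reports are hypothetical and are not part of HA Monitor.

```python
# Detection capability per monitoring method, transcribed from Table 4-12.
# The dictionary and function names are hypothetical illustration only.
DETECTABLE = {
    "UAP abnormal termination": {"processes": True, "operation reports": True},
    "System call hang": {"processes": False, "operation reports": True},
    "Page-in wait": {"processes": False, "operation reports": True},
    "Scheduling wait": {"processes": False, "operation reports": True},
}


def needs_operation_reports(failures):
    """Return True if any listed failure cannot be detected by process monitoring alone."""
    return any(not DETECTABLE[failure]["processes"] for failure in failures)
```

For example, a list that contains only "UAP abnormal termination" can be covered by monitoring UAP processes, whereas any list that includes one of the slowdown causes also requires monitoring operation reports.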

The following subsections explain HA Monitor processing for monitoring UAP processes and operation reports.

(1) Monitoring UAP processes

HA Monitor checks for a UAP process by connecting to the UAP when the hamon_patrolstart function is issued, and then by monitoring that connection. If the connection is lost, HA Monitor detects the loss of the process and determines that a UAP failure has occurred. HA Monitor monitors a UAP connection at one-second intervals.
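
The idea of detecting the loss of a process through the loss of its connection can be modeled with an ordinary socket pair. The following Python sketch rests on assumptions: this section does not describe the transport that HA Monitor actually uses, and connection_lost and demo are hypothetical names. The sketch only shows why a closed connection is a reliable sign that the peer process has gone away.

```python
import select
import socket


def connection_lost(monitor_end, timeout=1.0):
    """Poll the monitor's end once; return True if the peer has gone away."""
    readable, _, _ = select.select([monitor_end], [], [], timeout)
    if not readable:
        return False  # nothing happened during this polling interval
    try:
        data = monitor_end.recv(1, socket.MSG_PEEK)
    except ConnectionError:
        return True  # the connection was reset by the peer
    return data == b""  # an empty read means the peer closed the connection


def demo():
    """Return the poll results before and after the simulated UAP goes away."""
    monitor_end, uap_end = socket.socketpair()
    try:
        before = connection_lost(monitor_end, timeout=0.0)
        uap_end.close()  # stands in for the UAP process terminating
        after = connection_lost(monitor_end, timeout=1.0)
        return before, after
    finally:
        monitor_end.close()
```

The default timeout of one second mirrors the one-second monitoring interval described above. When a process terminates, the operating system closes its open connections, so the monitoring side needs no cooperation from the failed process.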

If a program restart command has been created when HA Monitor detects a UAP failure, HA Monitor restarts the UAP; otherwise, HA Monitor performs necessary operations including restarting the server and hot standby processing. For details about HA Monitor processing that occurs after detection of a UAP failure, see (3) HA Monitor processing after detection of a UAP failure.

The following figure shows the processing flow when HA Monitor monitors UAP processes.

Figure 4‒36: Processing flow when HA Monitor monitors UAP processes

[Figure]

HA Monitor starts monitoring UAP processes when it receives the monitoring request that the UAP makes by issuing the hamon_patrolstart function. If the UAP does not issue the hamon_patrolstart function, HA Monitor does not detect UAP failures. In that case, if the start_timeout operand has been specified in the server environment definition, a timeout occurs when the monitor-mode server is started.
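
The relationship between the monitoring request and the start timeout can be summarized as a small state function. The following Python sketch is a simplified model under assumptions: the function name, the state labels, and the way elapsed time is compared against the start_timeout value are all hypothetical, and the actual timing rules are those of the start_timeout operand in the server environment definition.

```python
def monitoring_state(start_requested, elapsed_seconds, start_timeout=None):
    """Model what HA Monitor is doing for a UAP during server startup.

    start_requested: whether the UAP has issued its monitoring start request.
    elapsed_seconds: time since the monitor-mode server began starting (assumed).
    start_timeout:   configured timeout in seconds, or None if not specified.
    """
    if start_requested:
        return "monitoring"
    if start_timeout is not None and elapsed_seconds >= start_timeout:
        return "start timeout"
    return "not monitoring"  # UAP failures are not detected in this state
```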

(2) Monitoring UAP operation reports

HA Monitor can detect failures, including UAP slowdowns, that the UAP itself cannot detect, such as those caused by system call hangs, page-in waits, and scheduling waits (see Table 4‒12). It does this by monitoring the operation reports that are sent to HA Monitor at specific intervals by APIs issued from the UAP.

You must specify a UAP operation report monitoring interval (program slowdown monitoring interval) in the patrol operand in the monitor-mode program environment definition. If no operation report is sent from the UAP within the specified program slowdown monitoring interval, HA Monitor determines that a UAP failure has occurred.
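
The slowdown check described above is a watchdog on the time of the last operation report. The following Python sketch models only that rule. The class and method names are hypothetical, timestamps are passed in explicitly so that the example is deterministic, and the real interval is the value specified in the patrol operand.

```python
class OperationReportWatchdog:
    """Hypothetical model of the program slowdown monitoring interval."""

    def __init__(self, interval, started_at):
        self.interval = interval        # monitoring interval in seconds
        self.last_report = started_at   # monitoring start counts as the first report

    def report(self, now):
        """Record an operation report received from the UAP at time `now`."""
        self.last_report = now

    def failed(self, now):
        """Return True if no report has arrived within the interval."""
        return now - self.last_report > self.interval
```

With an interval of 10 seconds and a last report at time 8, a check at time 17 passes and a check at time 19 reports a failure. A UAP that is blocked in a system call or waiting to be scheduled cannot send its next report, which is why this method detects slowdowns that process monitoring misses.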

When HA Monitor detects a UAP failure, it forcibly terminates the process that issued the hamon_patrolstart function. If a program restart command has been created, HA Monitor restarts the UAP; otherwise, HA Monitor performs necessary operations including restarting the server and hot standby processing. For details about HA Monitor processing that occurs after detection of a UAP failure, see (3) HA Monitor processing after detection of a UAP failure.

If the UAP generates a process, you must specify in the program restart command and the server termination command the processing needed to terminate the generated process.

The following figure shows the processing flow when HA Monitor monitors UAP operation reports.

Figure 4‒37: Processing flow when HA Monitor monitors UAP operation reports

[Figure]

HA Monitor starts monitoring UAP operation reports when it receives the monitoring request that the UAP makes by issuing the hamon_patrolstart function. If the UAP does not issue the hamon_patrolstart function, HA Monitor does not detect UAP failures. In that case, if the start_timeout operand has been specified in the server environment definition, a timeout occurs when the monitor-mode server is started.

(3) HA Monitor processing after detection of a UAP failure

The HA Monitor processing that occurs after a UAP failure has been detected depends on the program and server definitions and on the server status.

The following figure shows the HA Monitor processing after detection of a UAP failure.

Figure 4‒38: HA Monitor processing after detection of a UAP failure

[Figure]

#1

Specify this in the restartcommand operand in the monitor-mode program environment definition.

#2

Specify this in the exec_retry operand in the monitor-mode program environment definition.

#3

Specify this in the servexec_retry or switch_error operand in the server environment definition.

#4

Specify this in the servexec_retry or switch_retry operand in the server environment definition.
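
The first decision in this flow, as described in (1) and (2), is whether a program restart command exists. The following Python sketch models only that top-level branch and does not reproduce Figure 4‒38. It assumes that UAP restarts are bounded by a retry limit in the manner suggested by the exec_retry operand. The function name, parameter names, and return labels are hypothetical, and the server-level branches (server restart or hot standby) are collapsed into a single outcome because they depend on the server definitions and server status.

```python
def action_after_uap_failure(restart_command_defined, restarts_so_far, restart_limit):
    """Return the next recovery step in a simplified model of the flow.

    restart_command_defined: whether a program restart command has been created.
    restarts_so_far:         UAP restarts already attempted (assumed counter).
    restart_limit:           assumed upper bound on UAP restarts.
    """
    if restart_command_defined and restarts_so_far < restart_limit:
        return "restart UAP"
    # Server restart or hot standby, chosen according to the server
    # environment definition and the server status.
    return "server-level recovery"
```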

(4) Monitoring a UAP consisting of multiple processes

If one UAP consists of multiple processes, HA Monitor can group those processes together and monitor them as a single UAP (a UAP defined in the monitor-mode program environment definition). HA Monitor monitors the individual processes that have issued the hamon_patrolstart function. If a failure occurs in one of the monitored processes, HA Monitor treats the failure as a UAP failure.

Note that the same program name must be set in the HAMON_UAPNAME environment variable for all the processes that issue the hamon_patrolstart function.
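
The grouping rule can be pictured as follows. The Python sketch below is a model under assumptions, not HA Monitor's implementation: group_by_uap and failed_uaps are hypothetical names, and each process is represented by a process ID together with its environment variables. Only the HAMON_UAPNAME variable name comes from this manual.

```python
from collections import defaultdict


def group_by_uap(processes):
    """Group process IDs by the program name in HAMON_UAPNAME.

    processes: iterable of (pid, environment dict) pairs.
    """
    groups = defaultdict(set)
    for pid, env in processes:
        groups[env["HAMON_UAPNAME"]].add(pid)
    return dict(groups)


def failed_uaps(groups, failed_pids):
    """A UAP counts as failed if any one of its monitored processes failed."""
    failed = set(failed_pids)
    return {name for name, pids in groups.items() if pids & failed}
```

If processes 101 and 102 both set HAMON_UAPNAME to the same program name, a failure of process 102 alone is enough for that UAP to be treated as failed, while a UAP with a different program name is unaffected.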

The following figure shows an example of monitoring the UAP operation reports of a UAP that consists of multiple processes.

Figure 4‒39: Example of monitoring the UAP operation reports of a UAP that consists of multiple processes

[Figure]