Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


3.4.2 Disk monitoring

With HA Monitor disk monitoring, hot-standby switchover can occur when access to either of the following disks fails:

  • System disk

  • Disks for business use

This section describes monitoring of system disks and disks for business use.


(1) System disk monitoring

If system disk monitoring is not used and the system disk becomes inaccessible, hot-standby switchover does not occur as long as the OS and HA Monitor are still operating. If system disk monitoring is used, hot-standby switchover can occur when the system disk becomes inaccessible, regardless of whether the OS and HA Monitor can still operate.

(a) System disk monitoring method

In system disk monitoring, a file created on the system disk is used as the monitoring target, and HA Monitor periodically checks whether that file can be written to. Note that the monitoring target must be a file in a file system. For details about how to set the monitoring target, see (c) Required environment settings.
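A single write check against the monitoring-target file might look like the following. This is a minimal sketch, not HA Monitor's implementation: the helper name probe_write, the marker written, and the use of fsync to push the data to the device are all assumptions made for illustration.

```python
import os
import tempfile


def probe_write(path: str) -> bool:
    """Return True if a short write to an existing file at `path` succeeds.

    Hypothetical helper: opens the file for writing (without creating it),
    writes a marker at offset 0, and calls fsync so the data is pushed to
    the device instead of only to the page cache.
    """
    try:
        fd = os.open(path, os.O_WRONLY)
    except OSError:
        # Missing file, permission error, or I/O error on open.
        return False
    try:
        os.write(fd, b"probe\n")
        os.fsync(fd)
        return True
    except OSError:
        # Write or flush failed: treat it as a failed access attempt.
        return False
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "probe.dat")
        open(target, "w").close()  # the monitoring target must already exist
        print(probe_write(target))                            # True
        print(probe_write(os.path.join(tmp, "missing.dat")))  # False
```

Opening without O_CREAT mirrors the requirement that the target file is created in advance: a missing file counts as a failed check instead of being silently recreated.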

The following describes when HA Monitor monitors the system disk and how the monitoring is processed.

Times when HA Monitor performs monitoring

The following shows when HA Monitor starts monitoring and when it stops monitoring.

  • Monitoring start time

    • When HA Monitor startup is completed

    • When system disk monitoring is restarted#

  • Monitoring end time

    • When HA Monitor stops

    • When system disk monitoring is temporarily stopped#

#

For details about temporarily stopping and restarting system disk monitoring, see 9.6 monchange (changes settings and operations while HA Monitor and servers are running).

Processing of monitoring

An attempt is made to write to the monitoring-target file at set intervals (the check interval). If a write to the monitoring-target file is successful, HA Monitor judges that the system disk is healthy and continues monitoring. The following figure shows the processing that is performed if a write to the monitoring-target file is successful.

Figure 3‒31: Processing performed if a write to the monitoring-target file is successful

[Figure]

If an attempt to write to the monitoring-target file fails, HA Monitor retries the write at the retry interval. If the retries fail as many times as the specified retry count, HA Monitor judges that access to the system disk failed. The following figure shows the processing that is performed if a write to the monitoring-target file fails. In this example, the retry count is 1.

Figure 3‒32: Processing performed if a write to the monitoring-target file fails (when the retry count is 1)

[Figure]
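The retry behavior can be sketched as one check cycle. This is a minimal sketch under stated assumptions, not HA Monitor's code: the function name check_once, the injected probe and sleep callables, and the reading that the retry count is the number of retries made after the first failed attempt are all assumptions.

```python
import time
from typing import Callable


def check_once(probe: Callable[[], bool], retry_count: int,
               retry_interval: float,
               sleep: Callable[[float], None] = time.sleep) -> bool:
    """One check-interval cycle: an initial write plus up to retry_count retries.

    Returns True (disk judged healthy) as soon as any attempt succeeds, and
    False (access judged to have failed) if the first attempt and every
    retry fail.
    """
    if probe():
        return True
    for _ in range(retry_count):
        sleep(retry_interval)  # wait one retry interval before retrying
        if probe():
            return True
    return False
```

With a retry count of 1, a disk that keeps failing is written to twice (the initial attempt and one retry) before the failure is reported. Injecting sleep lets the logic be exercised without real delays.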

If the system disk does not respond to a write to the monitoring-target file, HA Monitor waits for a response for up to retry interval × (retry count + 1). If no response is received within that period, HA Monitor judges that access to the system disk failed. The following figure shows the processing that is performed if the system disk does not respond to a write to the monitoring-target file. In this example, the retry count is 1.

Figure 3‒33: Processing performed if the system disk does not respond to a write to the monitoring-target file (when the retry count is 1)

[Figure]
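As a worked example of the no-response period, a retry interval of 5 seconds and a retry count of 1 (illustrative values, not documented defaults) give 5 × (1 + 1) = 10 seconds. The sketch below computes that period and bounds a write with it by running the write in a worker thread. The function names are hypothetical, and the thread-based timeout is one possible technique, not the method HA Monitor uses.

```python
import concurrent.futures
import time
from typing import Callable


def no_response_timeout(retry_interval: float, retry_count: int) -> float:
    """Longest wait for a write response: retry interval x (retry count + 1)."""
    return retry_interval * (retry_count + 1)


def write_responds(probe: Callable[[], bool], retry_interval: float,
                   retry_count: int) -> bool:
    """Run `probe` in a worker thread and give up once the period elapses.

    Returns False if the probe fails or does not finish in time. A hung
    write cannot be cancelled from Python, so the worker is left running
    and the executor is shut down without waiting for it.
    """
    timeout = no_response_timeout(retry_interval, retry_count)
    pool = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = pool.submit(probe)
    try:
        return bool(future.result(timeout=timeout))
    except concurrent.futures.TimeoutError:
        return False
    finally:
        pool.shutdown(wait=False)


if __name__ == "__main__":
    print(no_response_timeout(5, 1))  # 10

    def slow_probe() -> bool:
        time.sleep(0.5)  # simulates a disk that answers too late
        return True

    print(write_responds(slow_probe, 0.1, 1))  # False (period is 0.2 s)
```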

(b) Processing performed if access to the system disk fails

The processing that HA Monitor performs differs depending on the type of the node for which system disk access failed and whether hot-standby switchover is possible. The following shows possible patterns of processing performed when system disk access fails.

If the primary system fails to access the system disk when hot-standby switchover is possible

Processing is performed as follows:

  1. The primary system notifies the secondary system that access failed.

  2. The primary system stops.

    If the method of hot-standby switchover triggered by host reset is selected, a host reset occurs. If any other method is selected, an OS panic is triggered.

  3. The secondary system performs hot-standby switchover based on the selected method.

The following figure shows the preceding operation.

Figure 3‒34: Processing performed if the primary system fails to access the system disk when hot-standby switchover is possible

[Figure]

If the secondary system fails to access the system disk when hot-standby switchover is possible

Processing is performed as follows:

  1. The secondary system notifies the primary system that access failed.

  2. The secondary system stops.

    If the method of hot-standby switchover triggered by host reset is selected, a host reset occurs, regardless of the setting of the standbyreset operand in the HA Monitor environment settings. If any other method is selected, an OS panic is triggered.

The following figure shows the preceding operation.

Figure 3‒35: Processing performed if the secondary system fails to access the system disk when hot-standby switchover is possible

[Figure]

If the primary system fails to access the system disk when hot-standby switchover is impossible

HA Monitor does not perform hot-standby switchover and continues monitoring the system disk. If hot-standby switchover becomes possible after the failure occurs, HA Monitor behaves as described for the case in which the primary system fails to access the system disk when hot-standby switchover is possible.

If the secondary system fails to access the system disk when hot-standby switchover is impossible

HA Monitor continues monitoring the system disk. If hot-standby switchover becomes possible after the failure occurs, HA Monitor behaves as described for the case in which the secondary system fails to access the system disk when hot-standby switchover is possible.
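The four patterns above can be summarized as a small decision function. This is an illustrative sketch only: the Node enum, the function name, and the step strings are hypothetical, and the caller is assumed to re-evaluate once hot-standby switchover becomes possible.

```python
from enum import Enum
from typing import List


class Node(Enum):
    PRIMARY = "primary"
    SECONDARY = "secondary"


def on_system_disk_failure(node: Node, switchover_possible: bool,
                           host_reset_method: bool) -> List[str]:
    """Return the steps taken when `node` cannot access its system disk.

    If switchover is impossible, monitoring simply continues; the caller is
    expected to call this again once switchover becomes possible.
    """
    if not switchover_possible:
        return ["continue system disk monitoring"]
    peer = Node.SECONDARY if node is Node.PRIMARY else Node.PRIMARY
    # Host reset for the host-reset switchover method, OS panic otherwise.
    # On the secondary system this applies regardless of standbyreset.
    stop = "host reset" if host_reset_method else "OS panic"
    steps = [
        f"notify {peer.value} system of access failure",
        f"stop {node.value} system by {stop}",
    ]
    if node is Node.PRIMARY:
        steps.append("secondary system performs hot-standby switchover")
    return steps
```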

(c) Required environment settings

This subsection describes the environment settings that must be specified to monitor the system disk.

Create a monitoring definition file for the system disk, and then specify use as the value of the disk_ptrl operand in the HA Monitor environment settings.

For details about how to specify the settings in the monitoring definition file for the system disk, see (1) Settings in the files required for monitoring the system disk in 6.20.2 Settings in the files required for disk monitoring.

(2) Monitoring of a disk for business use

If HA Monitor is not monitoring a disk for business use and either of the following cases applies, HA Monitor does not perform hot-standby switchover even when the disk for business use becomes inaccessible:

In the preceding situation, if HA Monitor is set to monitor a disk for business use, hot-standby switchover can occur when the disk becomes inaccessible.

If the server is designed so that a failure in a disk for business use causes a server failure, HA Monitor can perform hot-standby switchover through server monitoring. In this case, monitoring of disks for business use is not necessary.

(a) Method of monitoring a disk for business use

In monitoring of a disk for business use, a file created on the disk for business use is used as the monitoring target, and HA Monitor periodically checks whether that file can be written to.

Note that the monitoring target must be a file in a file system; character special files cannot be used as the monitoring target. However, HA Monitor can still monitor a disk for business use on which character special files are configured, provided that the monitoring target is a file in a file system on that disk. For details about how to set the monitoring target, see (c) Required environment settings.
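Before a path is used as a monitoring target, its file type can be checked with the standard os.stat and stat.S_ISREG / stat.S_ISCHR calls. The function name and return strings below are hypothetical; this is a pre-check sketch and does not reproduce any validation HA Monitor itself performs.

```python
import os
import stat
import tempfile


def classify_monitoring_target(path: str) -> str:
    """Report whether `path` is usable as a disk-monitoring target.

    Only regular files in a file system are accepted; character special
    files (raw devices) are rejected, as are directories and other types.
    """
    mode = os.stat(path).st_mode
    if stat.S_ISCHR(mode):
        return "rejected: character special file"
    if stat.S_ISREG(mode):
        return "ok: regular file"
    return "rejected: not a regular file"


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as f:
        print(classify_monitoring_target(f.name))  # ok: regular file
    print(classify_monitoring_target(tempfile.gettempdir()))  # rejected: not a regular file
```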

The following describes when HA Monitor monitors a disk for business use and how the monitoring is processed.

Times when HA Monitor performs monitoring

The following shows when HA Monitor starts monitoring and when it stops monitoring.

  • Monitoring start time

    • When startup of the active server is completed

    • When the monitoring of a disk for business use is restarted#

  • Monitoring end time

    • When the active server stops

    • When the monitoring of a disk for business use is temporarily stopped#

Note that disks for business use are not monitored on a server that is in the restart wait state.

#

For details about temporarily stopping and restarting the monitoring of a disk for business use, see 9.6 monchange (changes settings and operations while HA Monitor and servers are running).
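The start and end conditions above, together with the restart wait exception, can be read as a single predicate. The dataclass and its field names are hypothetical labels for the states described in this section, not HA Monitor data structures.

```python
from dataclasses import dataclass


@dataclass
class ServerStatus:
    """Hypothetical snapshot of the states that govern monitoring."""
    startup_completed: bool   # startup of the active server has completed
    stopped: bool             # the active server has stopped
    monitoring_paused: bool   # monitoring temporarily stopped (see monchange)
    restart_wait: bool        # server is in the restart wait state


def business_disk_monitored(s: ServerStatus) -> bool:
    """True while disks for business use would be monitored for this server."""
    return (s.startup_completed and not s.stopped
            and not s.monitoring_paused and not s.restart_wait)
```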

Processing of monitoring

HA Monitor attempts one write to the monitoring-target file at set intervals (the check interval). If a write to the monitoring-target file is successful, HA Monitor judges that the disk for business use is healthy and continues monitoring. The following figure shows the processing that is performed if a write to the monitoring-target file is successful.

Figure 3‒36: Processing performed if a write to the monitoring-target file is successful

[Figure]

If an attempt to write to the monitoring-target file fails, HA Monitor retries the write at the retry interval. If the retries fail as many times as the specified retry count, HA Monitor judges that access to the disk for business use failed. The following figure shows the processing that is performed if a write to the monitoring-target file fails. In this example, the retry count is 1.

Figure 3‒37: Processing performed if a write to the monitoring-target file fails (when the retry count is 1)

[Figure]

If the disk for business use does not respond to a write to the monitoring-target file, HA Monitor waits for a response for up to retry interval × (retry count + 1). If no response is received within that period, HA Monitor judges that access to the disk for business use failed. The following figure shows the processing that is performed if the disk for business use does not respond to a write to the monitoring-target file. In this example, the retry count is 1.

Figure 3‒38: Processing performed if the disk for business use does not respond to a write to the monitoring-target file (when the retry count is 1)

[Figure]

(b) Processing performed if access to a disk for business use fails

If access to a disk for business use fails, HA Monitor performs one of the following operations based on the setting of the disk_ptrl_act operand in the server environment definition:

  • Performs planned hot-standby switchover for servers.

  • Performs host pair shutdown so that hot-standby switchover is performed as it would be for a host failure.

  • Continues monitoring of the disk for business use without performing hot-standby switchover.

Planned hot-standby switchover for servers and host pair shutdown can be performed only when hot-standby switchover for servers is possible. If hot-standby switchover for servers is impossible when access to the disk for business use fails but becomes possible later, hot-standby switchover is performed at that time.
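The interaction between the disk_ptrl_act setting and switchover availability can be sketched as follows. The enum members and their values are hypothetical labels for the three behaviors listed above; they are not the actual values of the disk_ptrl_act operand, which are defined in the server environment definition reference.

```python
from enum import Enum


class DiskPtrlAct(Enum):
    # Hypothetical labels for the three documented behaviors.
    PLANNED_SWITCHOVER = "planned server switchover"
    HOST_PAIR_SHUTDOWN = "host pair shutdown"
    CONTINUE = "continue monitoring"


def on_business_disk_failure(act: DiskPtrlAct, switchover_possible: bool) -> str:
    """Decide what happens when access to a disk for business use fails."""
    if act is DiskPtrlAct.CONTINUE:
        return "continue monitoring"
    if not switchover_possible:
        # The configured action is held until switchover becomes possible.
        return "pending: " + act.value
    return "perform: " + act.value
```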

If servers are grouped, servers for which grouped-system switchover can be performed are switched over on a server group basis. If a server group includes servers for which grouped-system switchover could not be performed, start those servers on the host that has become active.
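For a server group, the outcome splits the members into servers switched over with the group and servers that must be started on the new active host. The mapping of server name to a "grouped-system switchover possible" flag is a hypothetical input used only for illustration.

```python
from typing import Dict, List, Tuple


def plan_group_switchover(group: Dict[str, bool]) -> Tuple[List[str], List[str]]:
    """Split a server group by whether grouped-system switchover is possible.

    Returns (switched, to_start): servers switched over as a group, and
    servers that must be started on the host that has become active.
    """
    switched = [name for name, ok in group.items() if ok]
    to_start = [name for name, ok in group.items() if not ok]
    return switched, to_start
```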

The following figure shows a typical example of the processing performed if access to a disk for business use fails when hot-standby switchover is possible. In this example, planned hot-standby switchover for servers is performed.

Figure 3‒39: Processing performed if access to a disk for business use fails when hot-standby switchover is possible (in a case where planned hot-standby switchover for servers is performed)

[Figure]

(c) Required environment settings

This subsection describes the environment settings that must be specified to monitor disks for business use.

For each disk for business use to be monitored, create a monitoring definition file for the disk for business use. Then, specify use as the value of the disk_ptrl operand in the server environment definition.

For details about how to specify the settings in a monitoring definition file for the disk for business use, see (2) Settings in the files required for monitoring disks for business use in 6.20.2 Settings in the files required for disk monitoring.