Nonstop Database, HiRDB Version 9 System Operation Guide

26.9.7 System switchover when an error other than a server failure occurs

This subsection explains how to perform a system switchover when one of the following errors occurs:

Organization of this subsection
(1) A large number of server processes terminate abnormally
(2) An RDAREA I/O error (path error) occurs

(1) A large number of server processes terminate abnormally

If a large number of server processes terminate abnormally, new services might not be accepted. Although abnormal termination of server processes will not cause HiRDB to terminate abnormally, HiRDB is essentially in online stopped status. Also, because HiRDB does not terminate abnormally, a system switchover is not performed. The following subsections explain how to perform system switchovers when HiRDB is in online stopped status.

(a) System switchover preparations

Specify the pd_down_watch_proc operand.

When the number of server processes that terminate abnormally within a specified period of time exceeds the value set in the pd_down_watch_proc operand, the unit can be terminated abnormally. The facility that terminates HiRDB abnormally in such circumstances is called the process abnormal termination monitoring facility. Use this facility to terminate HiRDB abnormally, and thereby trigger a system switchover, when HiRDB is in online stopped status. For details about the process abnormal termination monitoring facility, see 8.13 Monitoring the number of times server processes terminate abnormally (abnormal termination monitoring facility).
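The counting rule described above can be pictured with a small model. The following is only an illustrative sketch in Python, not HiRDB code: the class name, the `limit` and `window_seconds` parameters, and the sample values are all hypothetical, and the actual facility may count terminations differently.

```python
from collections import deque


class AbnormalTerminationMonitor:
    """Toy model of counting abnormal terminations within a monitoring period."""

    def __init__(self, limit, window_seconds):
        self.limit = limit                    # hypothetical stand-in for the pd_down_watch_proc value
        self.window_seconds = window_seconds  # hypothetical monitoring period, in seconds
        self._events = deque()                # timestamps of recent abnormal terminations

    def record(self, timestamp):
        """Record one abnormal termination; return True if the unit should be brought down."""
        self._events.append(timestamp)
        # Discard terminations that are no longer inside the monitoring period.
        while self._events and timestamp - self._events[0] >= self.window_seconds:
            self._events.popleft()
        # The count must exceed (not merely reach) the configured value.
        return len(self._events) > self.limit


monitor = AbnormalTerminationMonitor(limit=3, window_seconds=600)
decisions = [monitor.record(t) for t in (0, 100, 200, 300)]
print(decisions)
```

In this model the fourth abnormal termination inside the period is the one that exceeds a limit of 3, so only the last decision is True.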

(b) Reducing the system switchover time

When a large number of server processes terminate abnormally, a large amount of troubleshooting information might be output, causing the ensuing system switchover to take a long time. Specifying the following operands suppresses output of troubleshooting information, making it possible to reduce the system switchover time when many server processes have terminated abnormally:

Also, if you specify Y in the pd_ha_switch_timeout operand and, during a system switchover, the internal termination processing of the running HiRDB (the normal unit, in the case of the standby-less system switchover facility) takes longer than the server failure monitoring time, the system switchover proceeds without waiting for that termination processing to finish.
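The effect of pd_ha_switch_timeout on the waiting time can be sketched as a simple rule. This Python fragment is a model for illustration only: the function name and parameters are hypothetical, and the behavior when Y is not specified (waiting for termination processing to finish) is an assumption rather than something stated here.

```python
def wait_before_switchover(termination_seconds, monitoring_seconds, switch_timeout):
    """Seconds the switchover waits for the running system's termination processing."""
    if switch_timeout == "Y":
        # With Y, the wait is capped at the server failure monitoring time.
        return min(termination_seconds, monitoring_seconds)
    # Assumption: without Y, the switchover waits until termination processing ends.
    return termination_seconds


print(wait_before_switchover(300, 60, "Y"))
print(wait_before_switchover(300, 60, "N"))
```

With a termination processing time of 300 seconds and a monitoring time of 60 seconds, the model waits 60 seconds when Y is specified and 300 seconds otherwise.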

(2) An RDAREA I/O error (path error) occurs

This subsection explains how to perform a system switchover when an RDAREA input/output error (path error) occurs. Here, an input/output error (I/O error) means an error that occurs when HiRDB fails to perform an operation on a file because HiRDB cannot identify the file. The error code returned from the request for access to the HiRDB file system is -1544.

(a) System switchover preparations

Specify the pd_db_io_error_action operand.

If unitdown is specified in the pd_db_io_error_action operand, the unit terminates abnormally when an RDAREA I/O error occurs, which causes a system switchover to be performed. When the cause of the I/O error is a path error, job tasks can continue because I/O processing becomes possible again after the system switchover. Here, a path error means a status in which files cannot be accessed because the communication path between HiRDB and the files was interrupted.

For details about specifying unitdown in the pd_db_io_error_action operand, see 20.20 Actions to take when an RDAREA I/O error occurs.

Check the value specified in the pd_mode_conf operand.

If a system switchover is to be performed when HiRDB terminates abnormally, specify pd_mode_conf=MANUAL2. If a system switchover is not to be performed (if HiRDB is set to restart in the abnormally terminated system), specify pd_mode_conf=MANUAL1.

Specify the switchtype operand for HA Monitor or Hitachi HA Toolkit Extension (applicable to the server mode only).

Specify switch in HA Monitor's or Hitachi HA Toolkit Extension's switchtype operand. When switch is specified, a system switchover will be performed when HiRDB terminates abnormally.
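The three preparation items above can be summarized as a checklist. The following Python sketch is one hypothetical way to express it: it works on a plain dictionary of operand values supplied by the caller, does not read HiRDB, HA Monitor, or Hitachi HA Toolkit Extension definition files, and uses names (`EXPECTED`, `check_switchover_settings`) invented for this example. The switchtype entry applies to the server mode only.

```python
# Values this subsection asks for so that an RDAREA I/O error leads to a system switchover.
EXPECTED = {
    "pd_db_io_error_action": "unitdown",
    "pd_mode_conf": "MANUAL2",
    "switchtype": "switch",  # HA Monitor or Hitachi HA Toolkit Extension, server mode only
}


def check_switchover_settings(settings):
    """Return one message per operand whose value differs from the expected one."""
    problems = []
    for operand, expected in EXPECTED.items():
        actual = settings.get(operand)
        if actual != expected:
            problems.append(f"{operand}: expected {expected}, found {actual}")
    return problems


# Hypothetical example: pd_mode_conf is still set for restart in the same system.
example = {
    "pd_db_io_error_action": "unitdown",
    "pd_mode_conf": "MANUAL1",
    "switchtype": "switch",
}
problems = check_switchover_settings(example)
print(problems)
```

Here the only reported item is pd_mode_conf, because MANUAL1 restarts HiRDB in the abnormally terminated system instead of allowing the switchover.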

(b) Operation

When an I/O error occurs and HiRDB terminates abnormally, perform a system switchover and continue the processing that was in progress when the error occurred. To resolve the error, read the messages that are output. Then perform another system switchover, or terminate and restart HiRDB, as appropriate. If the I/O error recurs after the system switchover, the RDAREA shuts down. If this happens, use the database recovery utility (pdrstr command) to recover the RDAREA.