Nonstop Database, HiRDB Version 9 System Operation Guide


26.7.8 System switchover when an error other than a server failure occurs

This subsection explains how to perform a system switchover when one of the following errors occurs:

Organization of this subsection
(1) A large number of server processes terminate abnormally
(2) An RDAREA I/O error (path error) occurs

(1) A large number of server processes terminate abnormally

If a large number of server processes terminate abnormally, new services might not be accepted. Although abnormal termination of server processes will not cause HiRDB to terminate abnormally, HiRDB is essentially in online stopped status. Also, because HiRDB does not terminate abnormally, a system switchover is not performed. The following subsections explain how to perform system switchovers when HiRDB is in online stopped status.

(a) System switchover preparations

Specify the pd_down_watch_proc operand.

HiRDB (or the applicable unit in a HiRDB parallel server configuration) terminates abnormally when the number of server processes that terminate abnormally within a specified period of time exceeds the value set in the pd_down_watch_proc operand. The facility that terminates HiRDB abnormally in these circumstances is called the process abnormal termination monitoring facility. Use this facility to terminate HiRDB abnormally, and thereby trigger a system switchover, when HiRDB is in online stopped status. For details about the process abnormal termination monitoring facility, see 8.13 Monitoring the number of times server processes terminate abnormally (abnormal termination monitoring facility).
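
The threshold check described above can be sketched as follows. This is an illustrative model only, not HiRDB code: the class name, the parameter names, and the use of a sliding window are assumptions made for the example, and the actual counting method is defined by the facility described in 8.13.

```python
from collections import deque


class AbnormalTerminationMonitor:
    """Illustrative model only; not how HiRDB implements the facility."""

    def __init__(self, max_terminations: int, window_seconds: float):
        # Hypothetical stand-ins for the limit and period that
        # the pd_down_watch_proc operand controls.
        self.max_terminations = max_terminations
        self.window_seconds = window_seconds
        self._events = deque()

    def record_termination(self, now: float) -> bool:
        """Record one abnormal termination at time `now` (in seconds).

        Returns True when the number of terminations inside the window
        exceeds the limit, which is the point at which the unit would be
        terminated abnormally.
        """
        self._events.append(now)
        # Assumption: a sliding window. Drop events older than the window.
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.max_terminations


monitor = AbnormalTerminationMonitor(max_terminations=3, window_seconds=60)
results = [monitor.record_termination(t) for t in (0, 10, 20, 30, 200)]
print(results)  # [False, False, False, True, False]
```

In this sketch the fourth termination within 60 seconds exceeds the limit of 3, while a termination that arrives long after the earlier ones starts a fresh count.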

Specify the switchtype operand for HA Monitor or Hitachi HA Toolkit Extension.

Specify switch in the switchtype operand for HA Monitor or Hitachi HA Toolkit Extension. When switch is specified, a system switchover will be performed when HiRDB terminates abnormally.

(b) Mutual system switchover configuration

In a mutual system switchover configuration, performing a system switchover might not be effective and might actually increase traffic, because after the switchover more than one HiRDB runs on the same server machine. If you are using the process abnormal termination monitoring facility in a mutual system switchover configuration, we recommend that you do not perform a system switchover when HiRDB terminates abnormally. Instead, restart HiRDB in the system where it terminated abnormally by specifying pd_mode_conf=MANUAL1.

When running in the server mode, specify either restart or manual in the switchtype operand of HA Monitor or Hitachi HA Toolkit Extension. When restart is specified, HiRDB in the system where the error occurred restarts. When HiRDB cannot be restarted in the system where the error occurred, perform a system switchover and restart HiRDB in the system that was the switchover destination. When manual is specified, a system switchover will not be performed automatically even if HiRDB cannot be restarted.
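
The behavior of the three switchtype values mentioned in this subsection can be summarized as a small decision function. This is a sketch of the behavior as described in the text, not HA Monitor code; the function name and the returned strings are hypothetical.

```python
def action_after_abnormal_termination(switchtype: str,
                                      restart_succeeded: bool) -> str:
    """Return the outcome described in the text for each switchtype value.

    `restart_succeeded` is ignored for "switch", because no restart in
    the failed system is attempted in that case.
    """
    if switchtype == "switch":
        # A system switchover is performed when HiRDB terminates abnormally.
        return "switchover"
    if switchtype == "restart":
        # Restart in the failed system first; switch over if that fails.
        return "restarted in place" if restart_succeeded else "switchover"
    if switchtype == "manual":
        # No automatic switchover, even if the restart fails.
        if restart_succeeded:
            return "restarted in place"
        return "no automatic switchover"
    raise ValueError(f"unknown switchtype value: {switchtype!r}")


for value in ("switch", "restart", "manual"):
    print(value, "->", action_after_abnormal_termination(value, False))
```

The loop prints the outcome for the case in which HiRDB cannot be restarted in the system where the error occurred, which is where the three values differ.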

(c) Reducing the system switchover time

When a large number of server processes terminate abnormally, a large amount of troubleshooting information might be output, causing the ensuing system switchover to take a long time. Specifying the following operands suppresses output of troubleshooting information, making it possible to reduce the system switchover time when many server processes have terminated abnormally:

Also, if you specify Y in the pd_ha_switch_timeout operand and the internal termination processing of the running HiRDB takes longer than the server failure monitoring time during a system switchover, the system switchover can proceed without waiting for that termination processing to finish.
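
The effect of the pd_ha_switch_timeout operand can be expressed as a simple condition. The following is a minimal sketch under stated assumptions: the function and parameter names are hypothetical, and the behavior for values other than Y is assumed to be "wait", which the text implies but does not state.

```python
def switchover_waits_for_termination(pd_ha_switch_timeout: str,
                                     termination_seconds: float,
                                     monitoring_seconds: float) -> bool:
    """Return True if the switchover waits for internal termination processing."""
    timed_out = termination_seconds > monitoring_seconds
    # With Y, exceeding the server failure monitoring time lets the
    # switchover go ahead without waiting.
    if pd_ha_switch_timeout == "Y" and timed_out:
        return False
    # Assumption: in every other case the switchover waits.
    return True


print(switchover_waits_for_termination("Y", 120.0, 60.0))  # False
print(switchover_waits_for_termination("Y", 30.0, 60.0))   # True
print(switchover_waits_for_termination("N", 120.0, 60.0))  # True
```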

(2) An RDAREA I/O error (path error) occurs

This subsection explains how to perform a system switchover when an RDAREA input/output error (path error) occurs. In this context, an input/output error (I/O error) is an error that occurs when HiRDB fails to perform an operation on a file because HiRDB cannot identify the file. The error code returned from the request for access to the HiRDB file system is -1544.

(a) System switchover preparations

Specify the pd_db_io_error_action operand.

If unitdown is specified in the pd_db_io_error_action operand, HiRDB (or the unit in a HiRDB parallel server configuration) terminates abnormally when an RDAREA I/O error occurs, causing a system switchover to be performed. When the cause of the I/O error is a path error, job tasks can continue because I/O processing becomes possible again after the system switchover. In this context, a path error is a status in which files cannot be accessed because the communication path between HiRDB and the files was interrupted.

For details about specifying unitdown in the pd_db_io_error_action operand, see 20.20 Actions to take when an RDAREA I/O error occurs.

Specify the switchtype operand in HA Monitor or Hitachi HA Toolkit Extension.

Specify switch in the switchtype operand of HA Monitor or Hitachi HA Toolkit Extension. When switch is specified, a system switchover will be performed when HiRDB terminates abnormally.

(b) Operation

When an I/O error occurs and HiRDB terminates abnormally, perform a system switchover and continue the processing that was in progress when the error occurred. To resolve the error, read the messages that are output. Then, perform another system switchover, or terminate and restart HiRDB, as appropriate. If the I/O error recurs after the system switchover, the RDAREA shuts down. If this happens, use the database recovery utility (pdrstr command) to recover the RDAREA.
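
The sequence described in (a) and (b) can be summarized as follows. This is a sketch of the described behavior only, not HiRDB code; the function name and returned strings are hypothetical, and values of pd_db_io_error_action other than unitdown are deliberately left out because this subsection does not describe them (see 20.20).

```python
def outcome_of_rdarea_io_error(error_action: str,
                               already_switched_over: bool) -> str:
    """Return the outcome described in this subsection for an RDAREA I/O error."""
    if error_action != "unitdown":
        # Other settings are outside the scope of this subsection.
        return "not covered here; see 20.20"
    if already_switched_over:
        # The error recurred after the switchover.
        return "RDAREA shuts down; recover it with pdrstr"
    # First occurrence: the unit goes down, which triggers the switchover.
    return "unit terminates abnormally; system switchover"


print(outcome_of_rdarea_io_error("unitdown", already_switched_over=False))
print(outcome_of_rdarea_io_error("unitdown", already_switched_over=True))
```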