If a large number of server processes have terminated abnormally, HiRDB may be unable to accept new service requests. HiRDB itself does not terminate abnormally just because server processes have terminated abnormally, but it is effectively in online stopped status. Also, because HiRDB has not terminated abnormally, system switchover is not performed. The procedure for performing system switchover when HiRDB is in online stopped status is explained below.
HiRDB (or applicable unit for a HiRDB/Parallel Server) can be terminated abnormally when the number of server processes terminating abnormally exceeds the value set in the pd_down_watch_proc operand during a specified period of time. The facility that terminates HiRDB abnormally in such circumstances is called the process abnormal termination monitoring facility. This facility is used to terminate HiRDB abnormally and perform system switchover when HiRDB is in online stopped status. For details about the process abnormal termination monitoring facility, see 8.13 Monitoring the number of times server processes terminate abnormally (abnormal termination monitoring facility).
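The counting rule described above can be pictured with a small model. The following Python sketch is illustrative only and is not HiRDB's implementation: the class name, the `limit` and `window_seconds` parameters, and the sliding-window interpretation of "a specified period of time" are all assumptions made for the example. See the pd_down_watch_proc description for the actual semantics.

```python
from collections import deque


class AbnormalTerminationMonitor:
    """Hypothetical model of a termination-count monitor (not HiRDB code)."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit                    # assumed stand-in for the pd_down_watch_proc count
        self.window_seconds = window_seconds  # assumed monitoring period
        self._events = deque()                # timestamps of recent abnormal terminations

    def record(self, timestamp: float) -> bool:
        """Record one abnormal termination; return True if the limit is exceeded."""
        self._events.append(timestamp)
        # Discard terminations that fall outside the monitoring period.
        while self._events and timestamp - self._events[0] >= self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.limit
```

In this model, with `limit=3` and `window_seconds=600`, the fourth termination recorded inside the same 600-second window is the first one for which `record` returns `True`. That is the point at which the facility would terminate HiRDB abnormally so that system switchover can proceed.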
If system switchover is to be performed when the process abnormal termination monitoring facility terminates HiRDB abnormally, specify pd_mode_conf=MANUAL2. If system switchover is not to be performed (that is, if HiRDB is to be restarted on the system where it terminated abnormally), specify pd_mode_conf=MANUAL1.
Specify switch in the switchtype operand for HA monitor or Hitachi HA Toolkit Extension. When switch is specified, system switchover will be performed when HiRDB terminates abnormally.
In this case, a system cannot be switched automatically even if a large number of server processes terminate abnormally and HiRDB terminates abnormally. Systems can be switched only by a user operation (such as by executing a system switchover shell script). Example system switchover operations are explained below.
In a mutual system switchover configuration, performing a system switchover may not be effective and may actually increase traffic, because more than one HiRDB is then running on the same server machine. When using the process abnormal termination monitoring facility in such a configuration, Hitachi recommends that you not perform system switchover when HiRDB terminates abnormally. Instead, restart HiRDB on the system where it terminated abnormally by specifying pd_mode_conf=MANUAL1.
When running in the server mode, specify either restart or manual in the switchtype operand of HA monitor or Hitachi HA Toolkit Extension. When restart is specified, HiRDB restarts on the system where the error occurred. If HiRDB cannot be restarted on that system, a system switchover is performed and HiRDB restarts on the switchover destination system. When manual is specified, system switchover is not performed automatically even if HiRDB cannot be restarted.
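The effect of each switchtype value can be summarized as a small decision function. This is a sketch of the behavior described in this section, not an interface of HA monitor or Hitachi HA Toolkit Extension. The function name, the `restart_succeeded` parameter, and the returned labels are hypothetical.

```python
def server_mode_action(switchtype: str, restart_succeeded: bool = True) -> str:
    """Return what happens after HiRDB terminates abnormally (illustrative model)."""
    if switchtype == "switch":
        # System switchover is performed when HiRDB terminates abnormally.
        return "switchover"
    if switchtype == "restart":
        # HiRDB restarts in place; switchover happens only if the restart fails.
        return "restart_in_place" if restart_succeeded else "switchover"
    if switchtype == "manual":
        # No automatic switchover, even if HiRDB cannot be restarted.
        return "restart_in_place" if restart_succeeded else "wait_for_operator"
    raise ValueError(f"unknown switchtype: {switchtype!r}")
```

For example, `server_mode_action("restart", restart_succeeded=False)` yields `"switchover"`, whereas the same failure with `"manual"` leaves the decision to the operator.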
When a large number of server processes terminate abnormally, a large amount of troubleshooting information may be output, which can lengthen system switchover. Specifying the operands listed below suppresses output of troubleshooting information and makes it possible to reduce the system switchover time when many server processes have terminated abnormally:
Also, specifying Y in the pd_ha_switch_timeout operand makes it possible to perform system switchover without waiting for HiRDB termination processing on the running system (the normal BES unit, for the standby-less system switchover facility) if that termination processing exceeds the server failure monitoring time when system switchover occurs. Note that this operand can be specified only when operating in the server mode.
Table 25-55 lists the errors that affect the system switchover time.
Table 25-55 Errors that affect the system switchover time
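Error type (cause of system switchover) | | | | | Effects? Monitor mode | Effects? Server mode
---|---|---|---|---|---|---
Abnormal termination of HiRDB | Abnormal termination of pdprcd | | | | No | No
Abnormal termination of HiRDB | Abnormal termination of system server | | | | Yes | No
Abnormal termination of HiRDB | Abnormal termination of user server | Critical | | | May | No
Abnormal termination of HiRDB | Abnormal termination of user server | Non-critical¹ | PDCWAITTIME exceeded | pd_client_waittime_over_abort=Y (default) | May | No
Abnormal termination of HiRDB | Abnormal termination of user server | Non-critical¹ | PDCWAITTIME exceeded | pd_client_waittime_over_abort=N | No | No
Abnormal termination of HiRDB | Abnormal termination of user server | Non-critical¹ | Internal forced termination² | | May | No
Abnormal termination of HiRDB | Abnormal termination of user server | Non-critical¹ | Abort | | May | No
Abnormal termination of HiRDB | Abnormal termination of user server | Non-critical¹ | Rollback occurred in UAP connected to XA | | May | No
Abnormal termination of HiRDB | Abnormal termination of user server | Non-critical¹ | Other than the above | | May | No
Slowdown of HiRDB | No response from pdprcd | | | | Yes | No
System failure | | | | | No | No
Planned system switchover | | | | | No | No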
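For scripted checks, the contents of Table 25-55 can be carried as a lookup table. The Python sketch below is a transcription aid only. The dictionary keys are hypothetical labels chosen for this example, and the values are copied from the table as (monitor mode, server mode).

```python
# Transcribed from Table 25-55: error -> (monitor mode, server mode).
# "May" means the error may affect the system switchover time.
SWITCHOVER_TIME_EFFECTS = {
    "abnormal termination of pdprcd": ("No", "No"),
    "abnormal termination of system server": ("Yes", "No"),
    "critical abnormal termination of user server": ("May", "No"),
    "PDCWAITTIME exceeded, pd_client_waittime_over_abort=Y": ("May", "No"),
    "PDCWAITTIME exceeded, pd_client_waittime_over_abort=N": ("No", "No"),
    "internal forced termination": ("May", "No"),
    "abort": ("May", "No"),
    "rollback occurred in UAP connected to XA": ("May", "No"),
    "other non-critical user server termination": ("May", "No"),
    "no response from pdprcd": ("Yes", "No"),
    "system failure": ("No", "No"),
    "planned system switchover": ("No", "No"),
}


def effect_on_switchover_time(error: str, mode: str) -> str:
    """Look up whether an error affects switchover time in the given mode."""
    monitor, server = SWITCHOVER_TIME_EFFECTS[error]
    if mode == "monitor":
        return monitor
    if mode == "server":
        return server
    raise ValueError(f"mode must be 'monitor' or 'server', got {mode!r}")
```

A call such as `effect_on_switchover_time("no response from pdprcd", "monitor")` returns the corresponding table cell, which makes it easy to see that every entry in the server mode column is No.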