25.20.1 A large number of server processes has terminated abnormally

If a large number of server processes has terminated abnormally, new services may not be accepted. The abnormal termination of server processes does not cause HiRDB itself to terminate abnormally, but HiRDB is effectively in online stopped status. Because HiRDB has not terminated abnormally, system switchover is not performed either. The procedure for performing system switchover when HiRDB is in this online stopped status is explained below.

Organization of this subsection
(1) System switchover preparations
(2) Mutual system switchover configuration
(3) Reducing system switchover time

(1) System switchover preparations

(a) Specify the pd_down_watch_proc operand

HiRDB (or the applicable unit for a HiRDB/Parallel Server) can be terminated abnormally when the number of server processes that terminate abnormally within a specified period of time exceeds the value set in the pd_down_watch_proc operand. The facility that terminates HiRDB abnormally in such circumstances is called the process abnormal termination monitoring facility. Use this facility to terminate HiRDB abnormally, and thereby trigger system switchover, when HiRDB is in online stopped status. For details about the process abnormal termination monitoring facility, see 8.13 Monitoring the number of times server processes terminate abnormally (abnormal termination monitoring facility).
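The counting rule above can be pictured as a sliding time window. The following Python sketch is a hypothetical model, not HiRDB code: the class name and the max_terminations and window_seconds parameters are assumptions standing in for the values specified in pd_down_watch_proc, and the exact window boundary HiRDB uses is not stated in this section.

```python
from collections import deque


class AbnormalTerminationMonitor:
    """Hypothetical model of the process abnormal termination monitoring facility.

    max_terminations and window_seconds stand in for the values specified
    in the pd_down_watch_proc operand; this is not HiRDB's implementation.
    """

    def __init__(self, max_terminations: int, window_seconds: float) -> None:
        self.max_terminations = max_terminations
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of recent abnormal terminations

    def record(self, timestamp: float) -> bool:
        """Record one server process abnormal termination.

        Returns True when the count inside the window exceeds the threshold,
        that is, when the model would terminate HiRDB (or the unit) abnormally.
        """
        self._events.append(timestamp)
        # Forget terminations that are older than the monitoring window.
        # The newest entry is never dropped because window_seconds > 0.
        while timestamp - self._events[0] >= self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.max_terminations


if __name__ == "__main__":
    monitor = AbnormalTerminationMonitor(max_terminations=3, window_seconds=60)
    # Four abnormal terminations within 60 seconds exceed the threshold of 3.
    print([monitor.record(t) for t in (0, 10, 20, 30)])  # [False, False, False, True]
```

Terminations that fall outside the window no longer count, so a steady trickle of isolated failures would not bring HiRDB down in this model; only a burst does.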

(b) Check the value specified in the pd_mode_conf operand

If system switchover is to be performed when the process abnormal termination monitoring facility terminates HiRDB abnormally, specify pd_mode_conf=MANUAL2. If system switchover is not to be performed (that is, if HiRDB is to be restarted on the system where it terminated abnormally), specify pd_mode_conf=MANUAL1.

(c) Specify the switchtype operand for HA monitor or Hitachi HA Toolkit Extension (applicable to the server mode only)

Specify switch in the switchtype operand for HA monitor or Hitachi HA Toolkit Extension. When switch is specified, system switchover will be performed when HiRDB terminates abnormally.
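Steps (a) through (c) amount to a short checklist. The Python function below is a hypothetical helper (its name and parameters are assumptions, not part of HiRDB, HA monitor, or Hitachi HA Toolkit Extension) that reports which of these settings are missing when the goal is to have system switchover performed after the process abnormal termination monitoring facility terminates HiRDB abnormally.

```python
from typing import List, Optional


def switchover_setting_problems(
    mode: str,
    pd_down_watch_proc_specified: bool,
    pd_mode_conf: str,
    switchtype: Optional[str] = None,
) -> List[str]:
    """Return the settings that would prevent switchover after abnormal termination.

    mode is "server" or "monitor". Hypothetical helper that only encodes
    steps (a) to (c) of this subsection.
    """
    problems = []
    # (a) Without pd_down_watch_proc, HiRDB stays in online stopped status
    #     instead of terminating abnormally, so nothing triggers switchover.
    if not pd_down_watch_proc_specified:
        problems.append("specify the pd_down_watch_proc operand")
    # (b) MANUAL2 requests system switchover; MANUAL1 restarts HiRDB in place.
    if pd_mode_conf != "MANUAL2":
        problems.append("specify pd_mode_conf=MANUAL2")
    # (c) In the server mode, switchover happens only if switchtype is switch.
    if mode == "server" and switchtype != "switch":
        problems.append("specify switch in the switchtype operand")
    return problems


if __name__ == "__main__":
    print(switchover_setting_problems("server", True, "MANUAL1", "restart"))
    # ['specify pd_mode_conf=MANUAL2', 'specify switch in the switchtype operand']
```

The switchtype check is skipped in the monitor mode because step (c) applies only to the server mode; in the monitor mode, switchover still requires a user operation.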

(d) Monitoring of the system switchover time period (applicable to the monitor mode only)

In the monitor mode, systems cannot be switched automatically even if a large number of server processes terminate abnormally and HiRDB terminates abnormally. System switchover can be performed only by a user operation (such as executing a system switchover shell script). Example system switchover operations are explained below.

(2) Mutual system switchover configuration

In a mutual system switchover configuration, more than one HiRDB runs on the same server machine, so performing a system switchover may not be effective and may actually increase traffic. When using the process abnormal termination monitoring facility in a mutual system switchover configuration, Hitachi recommends that you not perform system switchover when HiRDB terminates abnormally. Instead, specify pd_mode_conf=MANUAL1 so that HiRDB is restarted on the system where it terminated abnormally.

When running in the server mode, specify either restart or manual in the switchtype operand of HA monitor or Hitachi HA Toolkit Extension. When restart is specified, HiRDB is restarted on the system where the error occurred. If HiRDB cannot be restarted on that system, system switchover is performed and HiRDB is restarted on the switchover destination system. When manual is specified, system switchover is not performed automatically even if HiRDB cannot be restarted.
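The three switchtype values discussed in this section lead to different outcomes once HiRDB has terminated abnormally. The following Python function is a simplified, hypothetical reading of that text (it does not call HA monitor, and the function name and outcome strings are assumptions); the restart_succeeded flag represents whether HiRDB could be restarted on the system where the error occurred.

```python
def outcome_after_abnormal_termination(switchtype: str, restart_succeeded: bool) -> str:
    """Simplified model of the switchtype behavior described in this section.

    Hypothetical helper: restart_succeeded says whether HiRDB could be
    restarted on the system where the error occurred.
    """
    if switchtype == "switch":
        # switch: system switchover is performed when HiRDB terminates abnormally.
        return "system switchover"
    if switchtype == "restart":
        if restart_succeeded:
            return "HiRDB restarted on the same system"
        # restart: fall back to switchover if the local restart fails.
        return "system switchover, then HiRDB restarted on the destination"
    if switchtype == "manual":
        if restart_succeeded:
            return "HiRDB restarted on the same system"
        # manual: never switch automatically, even if the restart fails.
        return "no automatic switchover; operator action required"
    raise ValueError(f"unexpected switchtype: {switchtype!r}")


if __name__ == "__main__":
    for st in ("switch", "restart", "manual"):
        print(st, "->", outcome_after_abnormal_termination(st, restart_succeeded=False))
```

Comparing restart and manual with restart_succeeded=False shows the only difference between them: whether HA monitor falls back to system switchover on its own.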

(3) Reducing system switchover time

When a large number of server processes terminate abnormally, a large amount of troubleshooting information may be output, which can make system switchover take a long time. Specifying the operands listed below suppresses output of troubleshooting information and can reduce the system switchover time when many server processes have terminated abnormally:
  • pd_cancel_dump=noput
  • pd_dump_suppress_watch_time

Also, if you specify Y in the pd_ha_switch_timeout operand, system switchover can be performed without waiting for HiRDB termination processing on the running system to finish when that termination processing (for the standby-less system switchover facility, termination processing of the normal BES unit) exceeds the server failure monitoring time. Note that this operand can be specified only when operating in the server mode.
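The effect of pd_ha_switch_timeout=Y can be sketched as a cap on how long switchover waits for termination processing. This Python function is a simplified, hypothetical timing model (the function name, parameters, and the sample durations are assumptions); real switchover time also includes HA monitor processing and the restart of HiRDB on the destination system.

```python
def switchover_wait_seconds(
    termination_seconds: float,
    failure_monitoring_seconds: float,
    pd_ha_switch_timeout: str = "N",
    server_mode: bool = True,
) -> float:
    """Seconds that switchover waits for termination processing on the running system.

    Hypothetical model: with Y, the wait is capped at the server failure
    monitoring time; otherwise switchover waits for termination to finish.
    """
    if pd_ha_switch_timeout == "Y":
        if not server_mode:
            raise ValueError("pd_ha_switch_timeout can be specified only in the server mode")
        # Do not wait past the server failure monitoring time.
        return min(termination_seconds, failure_monitoring_seconds)
    # Without Y, switchover waits until termination processing finishes.
    return termination_seconds


if __name__ == "__main__":
    # Hypothetical figures: termination takes 300 s, monitoring time is 60 s.
    print(switchover_wait_seconds(300, 60, "N"))  # 300
    print(switchover_wait_seconds(300, 60, "Y"))  # 60
```

When termination processing finishes within the monitoring time, the setting makes no difference in this model; it only matters when termination runs long, for example while troubleshooting information is being written.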

Table 25-55 lists the errors that affect the system switchover time.

Table 25-55 Errors that affect the system switchover time

                                                    Effect on switchover time
Error type (cause of system switchover)             Monitor mode  Server mode
Abnormal termination of HiRDB
  Abnormal termination of pdprcd                    No            No
  Abnormal termination of system server             Yes           No
  Abnormal termination of user server
    Critical                                        May           No
    Non-critical (note 1)
      PDCWAITTIME exceeded
        pd_client_waittime_over_abort=Y (default)   May           No
        pd_client_waittime_over_abort=N             No            No
      Internal forced termination (note 2)          May           No
      Abort                                         May           No
      Rollback occurred in UAP connected to XA      May           No
      Other than the above                          May           No
Slowdown of HiRDB
  No response from pdprcd                           Yes           No
System failure                                      No            No
Planned system switchover                           No            No

Legend:
No: Has no effect on the system switchover time. However, depending on when the error occurs, the system switchover time may be affected.
May: May affect the system switchover time. Specifying the operands listed below makes it possible to minimize the effect these errors have on the system switchover time:
  • pd_cancel_dump=noput
  • pd_dump_suppress_watch_time
Yes: Often affects the system switchover time.
Note 1: In the case of this error, HiRDB does not usually terminate abnormally. However, when the pd_down_watch_proc operand is specified, the number of server processes terminating abnormally is monitored, and HiRDB is terminated abnormally if this number exceeds the specified value.
Note 2: HiRDB issues SIGKILL internally and terminates processing. This does not include forced termination that results when PDCWAITTIME is exceeded or when the pdcancel command is issued.
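To see at a glance which errors the troubleshooting-suppression operands can help with, Table 25-55 can be transcribed as data. The dictionary and helper below are a hypothetical Python sketch (the names are assumptions, not a HiRDB interface); the values are copied from the table.

```python
from typing import Dict, List, Tuple

# Table 25-55 transcribed as data (hypothetical structure, not a HiRDB API).
# Key: error type path; value: (effect in monitor mode, effect in server mode).
HIRDB_DOWN = "Abnormal termination of HiRDB"
USER_SERVER = HIRDB_DOWN + " / Abnormal termination of user server"
NON_CRITICAL = USER_SERVER + " / Non-critical"
PDCWAITTIME = NON_CRITICAL + " / PDCWAITTIME exceeded"

EFFECTS: Dict[str, Tuple[str, str]] = {
    HIRDB_DOWN + " / Abnormal termination of pdprcd": ("No", "No"),
    HIRDB_DOWN + " / Abnormal termination of system server": ("Yes", "No"),
    USER_SERVER + " / Critical": ("May", "No"),
    PDCWAITTIME + " / pd_client_waittime_over_abort=Y (default)": ("May", "No"),
    PDCWAITTIME + " / pd_client_waittime_over_abort=N": ("No", "No"),
    NON_CRITICAL + " / Internal forced termination": ("May", "No"),
    NON_CRITICAL + " / Abort": ("May", "No"),
    NON_CRITICAL + " / Rollback occurred in UAP connected to XA": ("May", "No"),
    NON_CRITICAL + " / Other than the above": ("May", "No"),
    "Slowdown of HiRDB / No response from pdprcd": ("Yes", "No"),
    "System failure": ("No", "No"),
    "Planned system switchover": ("No", "No"),
}


def errors_with_effect(mode: str, effect: str) -> List[str]:
    """List error types whose effect in the given mode ("monitor" or "server") matches."""
    column = {"monitor": 0, "server": 1}[mode]
    return [error for error, effects in EFFECTS.items() if effects[column] == effect]


if __name__ == "__main__":
    # "May" errors in the monitor mode: the cases where pd_cancel_dump=noput
    # and pd_dump_suppress_watch_time can reduce the switchover time.
    for error in errors_with_effect("monitor", "May"):
        print(error)
```

Querying the server column returns no "Yes" or "May" entries, which matches the table: in the server mode none of these errors is listed as affecting the system switchover time.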