Nonstop Database, HiRDB Version 9 System Operation Guide

15.12 Facility for taking a unit down when a physical error is detected

This section explains how to use the facility for taking a unit down when a physical error is detected. Here, a physical error is an I/O error or a file-opening error that occurs during access to an RDAREA. For such errors, the KFPH00306-E message reports the shutdown cause as either i/o error occurred or open error occurred.

Organization of this section
(1) About the facility for taking a unit down when a physical error is detected
(2) Notes
(3) Preparation
(4) Actions to take when a physical error occurs

(1) About the facility for taking a unit down when a physical error is detected

If an error occurs in an RDAREA during operation, HiRDB localizes the affected range by placing the RDAREA in error shutdown status, and then continues processing. However, if a system has a small number of RDAREAs, the entire job might stop, even if only the RDAREA where the error occurred is placed in error shutdown status. In this case, before the job can be resumed, the RDAREA that was placed in error shutdown must be recovered.

When the facility for taking a unit down when a physical error is detected is used, a physical error causes the unit to go down instead of placing the RDAREA where the error occurred in error shutdown status. (For the master directory RDAREA, a unit down occurs when a physical error is detected even if this facility is not being used.) Because the unit goes down, you can remove the cause of the physical error before restarting the unit. This lets you avoid the work that would have been needed to recover the RDAREA had it been placed in error shutdown status.
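The behavior described above can be summarized as a small decision function. This is only an illustrative model of the rules stated in this section, not HiRDB code; the names Outcome, outcome_on_physical_error, is_master_directory, and facility_enabled are hypothetical.

```python
from enum import Enum


class Outcome(Enum):
    UNIT_DOWN = "unit down"
    ERROR_SHUTDOWN = "RDAREA placed in error shutdown status"


def outcome_on_physical_error(is_master_directory: bool,
                              facility_enabled: bool) -> Outcome:
    """Model what happens when a physical error is detected in an RDAREA.

    is_master_directory: True if the error is in the master directory RDAREA.
    facility_enabled:    True if the facility for taking a unit down when a
                         physical error is detected is being used.
    """
    # The master directory RDAREA always causes a unit down,
    # whether or not the facility is being used.
    if is_master_directory:
        return Outcome.UNIT_DOWN
    # With the facility in use, the unit goes down and the RDAREA
    # is not placed in error shutdown status.
    if facility_enabled:
        return Outcome.UNIT_DOWN
    # Otherwise, only the affected RDAREA is placed in error shutdown status.
    return Outcome.ERROR_SHUTDOWN
```

Under this model, the only case that leaves an RDAREA in error shutdown status (and therefore requires RDAREA recovery) is an error outside the master directory RDAREA while the facility is not in use.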

(2) Notes

Pay attention to the following when you use this facility:

(3) Preparation

To use this facility, specify the system common definition (pdsys) as follows:

  1. Specify unitdown for the pd_db_hold_action operand.
    When this value is specified, the KFPH23047-I message is output and the unit goes down when a physical error is detected. In this case, no error shutdown occurs in the RDAREA.
  2. Specify MANUAL2 for the pd_mode_conf operand.
    By specifying this value, you can prevent HiRDB (the unit) from automatically restarting following a unit down.
    If AUTO or MANUAL1 is specified, HiRDB (the unit) might restart automatically after this facility takes the unit down, before the cause of the error has been removed. The physical error would then be detected again, resulting in repeated unit downs and restarts.
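As a rough aid for reviewing these two settings, the following sketch checks definition text for the values recommended above. It assumes that operands are written one per line in the form "set operand = value"; the helper names (parse_operands, check_unitdown_settings), the SAMPLE text, and the simplified parsing are illustrative assumptions and do not cover the full definition syntax.

```python
import re

# Values recommended in this section for using the facility.
RECOMMENDED = {
    "pd_db_hold_action": "unitdown",
    "pd_mode_conf": "MANUAL2",
}


def parse_operands(text: str) -> dict:
    """Collect 'set operand = value' lines (assumed format) into a dict."""
    operands = {}
    for line in text.splitlines():
        match = re.match(r"\s*set\s+(\w+)\s*=\s*(\S+)", line)
        if match:
            operands[match.group(1)] = match.group(2)
    return operands


def check_unitdown_settings(text: str) -> list:
    """Return a list of problems; an empty list means both values match."""
    operands = parse_operands(text)
    problems = []
    for name, expected in RECOMMENDED.items():
        actual = operands.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected}, found {actual}")
    return problems


# Hypothetical excerpt of a system common definition (pdsys).
SAMPLE = """\
set pd_db_hold_action = unitdown
set pd_mode_conf = MANUAL2
"""

if __name__ == "__main__":
    for problem in check_unitdown_settings(SAMPLE):
        print(problem)
```

An operand that is missing from the text is reported as "found None", because the sketch does not know the default that HiRDB would apply in that case.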

(4) Actions to take when a physical error occurs

When the facility for taking a unit down when a physical error is detected takes a unit down, the HiRDB administrator must follow the procedure shown in the following figure.

Figure 15-6 Actions to take when a physical error occurs

[Figure]

#
If the cause of the physical error cannot be eliminated, the unit will go down again for the same reason even after it is restarted. In that case, specify dbhold for the pd_db_hold_action operand in the system common definition so that the facility for taking a unit down when a physical error is detected is not used.