Nonstop Database, HiRDB Version 9 System Operation Guide
(1) About the facility for taking a unit down when a physical error is detected
If an error occurs in an RDAREA during operation, HiRDB localizes the affected range by placing the RDAREA in error shutdown status, and then continues processing. However, if a system has a small number of RDAREAs, the entire job might stop, even if only the RDAREA where the error occurred is placed in error shutdown status. In this case, before the job can be resumed, the RDAREA that was placed in error shutdown must be recovered.
When the facility for taking a unit down when a physical error is detected is used, HiRDB takes the unit down instead of placing the RDAREA where the error occurred in error shutdown status. (For the master directory RDAREA, a unit down occurs when a physical error is detected even if this facility is not being used.) Because the unit goes down rather than the RDAREA being shut down, you can remove the cause of the physical error before restarting the unit. This allows you to avoid the work that would have been necessary to recover the RDAREA had it been placed in error shutdown status.
Pay attention to the following when you use this facility:
- If a physical error that occurs during RDAREA access causes the KFPH00307-E message to be output and the RDAREA to be placed in command shutdown status, the unit does not go down, even if unitdown is specified in the pd_db_hold_action operand.
- If a physical error occurs when you are using the facility for taking a unit down when a physical error is detected, the processing-target RDAREA might be placed in error shutdown in the following cases:
  - A UAP or utility is being executed in the pre-update log acquisition mode or no-log mode.
  - A UAP or utility is being executed on a user LOB RDAREA that was placed in the no-log mode because NO was specified for the RECOVERY operand of the CREATE TABLE statement.
When you use the facility for taking a unit down when a physical error is detected, avoid these types of operations as much as possible. If they are required, make a backup before executing a UAP or utility so that you will be able to recover the RDAREA to the latest status even if it is placed in error shutdown status.
- When you use the facility for taking a unit down when a physical error is detected, monitor the KFPH23047-I message. If a unit goes down and the KFPH23047-I message is output, take the actions described below in (4) Actions to take when a physical error occurs. If you restart HiRDB before removing the cause of unit down, the physical error will be detected again, resulting in repeated unit downs and restarts.
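The monitoring described in the last note can be sketched as a simple scan for the message ID. This is a minimal illustration only: the message IDs come from the text above, but the function name `find_unit_down_messages` and the layout of the sample log lines are assumptions made for this example, not the actual HiRDB message log format.

```python
# Minimal sketch: pick out lines that carry the KFPH23047-I message ID.
# The ID is taken from the text; the log line layout below is hypothetical.
UNIT_DOWN_MSG = "KFPH23047-I"


def find_unit_down_messages(lines):
    """Return the lines (without trailing newline) that contain KFPH23047-I."""
    return [line.rstrip("\n") for line in lines if UNIT_DOWN_MSG in line]


if __name__ == "__main__":
    # Hypothetical sample lines; the text after each message ID is a placeholder.
    sample = [
        "KFPH00307-E (placeholder message text)\n",
        "KFPH23047-I (placeholder message text)\n",
    ]
    for hit in find_unit_down_messages(sample):
        # A hit means: follow the actions in (4) before restarting the unit.
        print("unit down reported:", hit)
```

In practice the input would be whatever log the site already collects HiRDB messages into; a hit means the cause of the physical error should be removed before the unit is restarted.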
(3) Preparation
To use this facility, specify the system common definition (pdsys) as follows:
- Specify unitdown for the pd_db_hold_action operand.
When this value is specified, the KFPH23047-I message is output and the unit goes down when a physical error is detected. In this case, no error shutdown occurs in the RDAREA.
- Specify MANUAL2 for the pd_mode_conf operand.
By specifying this value, you can prevent HiRDB (the unit) from automatically restarting following a unit down.
If AUTO or MANUAL1 is specified, HiRDB (the unit) might automatically restart after this facility takes the unit down, before the cause of the error is removed. In this case, the physical error will be detected again, resulting in repeated unit downs and restarts.
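The two settings above can be checked mechanically before the system is started. The following is a minimal sketch, assuming that operands appear in the system common definition as lines of the form `set operand = value`; that line format, the `REQUIRED` table, and the function name `check_pdsys` are assumptions of this example and should be verified against the system definition manual.

```python
import re

# Operand names and values are taken from the text above.
REQUIRED = {
    "pd_db_hold_action": "unitdown",
    "pd_mode_conf": "MANUAL2",
}

# Assumed line format: "set <operand> = <value>" (an assumption of this sketch).
SET_LINE = re.compile(r"^\s*set\s+(\w+)\s*=\s*(\S+)")


def check_pdsys(text):
    """Return {operand: (expected, found)} for each operand that does not match.

    An operand that is missing from the text is reported with found=None.
    """
    found = {}
    for line in text.splitlines():
        match = SET_LINE.match(line)
        if match:
            found[match.group(1)] = match.group(2)
    return {
        name: (expected, found.get(name))
        for name, expected in REQUIRED.items()
        if found.get(name) != expected
    }


if __name__ == "__main__":
    sample = "set pd_db_hold_action = unitdown\nset pd_mode_conf = AUTO\n"
    for name, (expected, actual) in check_pdsys(sample).items():
        print(f"{name}: expected {expected}, found {actual}")
```

An empty result means both operands are set as this section requires; any entry in the result identifies a setting that could lead to the repeated unit downs and restarts described above.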
(4) Actions to take when a physical error occurs
If the facility for taking a unit down when a physical error is detected takes a unit down, the HiRDB administrator must follow the procedure shown in the following figure.
Figure 15-6 Actions to take when a physical error occurs
#: If the cause of the physical error cannot be eliminated, this same cause will make the unit go down again even after it is restarted. If this happens, specify dbhold for the pd_db_hold_action operand in the system common definition to avoid using the facility for taking a unit down when a physical error is detected.
All Rights Reserved. Copyright (C) 2011, 2015, Hitachi, Ltd.