7.5.9 Handling device failures on a shared disk while the standby server is starting, on standby, or terminating (when using SCSI reservation for the shared disk)
If an I/O error occurs on the device specified in the scsi_device or dmmp_device operand of the server environment definition while the standby server is starting, on standby, or terminating, HA Monitor issues the KAMN725-W and KAMN726-E messages and continues processing. In a redundant configuration that uses multipath software, HA Monitor issues the KAMN726-E message only when failures have occurred on all paths to the same shared disk.
To recover from a device failure that occurs while the standby server is starting, on standby, or terminating:
- Determine the cause of the device failure.
Identify the cause by checking the KAMN725-W and KAMN726-E messages and any messages issued by the kernel, and by using hardware management tools.
- Resolve the cause of the device failure.
Take the appropriate corrective action, such as replacing the failed device.
In a redundant configuration that uses multipath software, do not restore the failed path to online status (failback) yet; failback is performed in the next step.
- In a multipath configuration, restore the recovered path to online status (failback).
Perform the failback by using the appropriate command provided by the multipath software (HDLM, DMMP, or HFC-PCM). For details about restoring paths to online status, see the manual Hitachi Dynamic Link Manager Software User's Guide (for Linux(R) systems) for HDLM, or the documentation for DMMP or HFC-PCM.
For a single-path configuration, or in a VMware ESXi-based virtualization environment (where DMMP is not used), this step is not necessary.
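As an aid to the first step (determining the cause), the following Python sketch groups syslog lines by the two HA Monitor message IDs named in this section. It is a minimal sketch that matches only the IDs KAMN725-W and KAMN726-E; the function name `collect_device_failure_messages` is hypothetical, and the syslog file location and the rest of each message's text are not documented here, so treat them as assumptions.

```python
from typing import Dict, Iterable, List

# Message IDs named in this section. Only the IDs are matched; the remaining
# message text and the log file location are assumptions.
HA_MONITOR_IDS = ("KAMN725-W", "KAMN726-E")


def collect_device_failure_messages(lines: Iterable[str]) -> Dict[str, List[str]]:
    """Group log lines by the HA Monitor message ID they contain (hypothetical helper)."""
    found: Dict[str, List[str]] = {msg_id: [] for msg_id in HA_MONITOR_IDS}
    for line in lines:
        for msg_id in HA_MONITOR_IDS:
            if msg_id in line:
                # Keep the whole line so it can be compared with kernel messages
                # and hardware management tool output logged around the same time.
                found[msg_id].append(line.rstrip("\n"))
    return found
```

For example, pass an open file object for whichever syslog file your system uses. In a multipath configuration, a KAMN726-E entry indicates that all paths to the same shared disk have failed.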
If you restart the OS while a server is running as the active system on the other system, the following message might be output to the syslog file:
kernel: sd x:x:x:x: reservation conflict (x: numeric value)
The exact message might differ depending on the OS version.
This message indicates that the active server holds a SCSI reservation that protects the shared disk. This is normal behavior, and no action is required.
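When reviewing syslog after such an OS restart, the following Python sketch can help separate the reservation-conflict notice described above from other disk messages. The pattern follows the quoted format, with each x treated as a decimal number. Because the wording can vary by OS version, the regular expression and the hypothetical function name `is_reservation_conflict_notice` are assumptions to adjust for your environment. A match is expected only while the active server on the other system holds the reservation.

```python
import re

# Follows the format quoted in this section:
#   kernel: sd x:x:x:x: reservation conflict   (x: numeric value)
# Assumption: the wording may differ by OS version, so adjust as needed.
RESERVATION_CONFLICT = re.compile(
    r"kernel:\s+sd\s+\d+:\d+:\d+:\d+:\s+reservation conflict"
)


def is_reservation_conflict_notice(line: str) -> bool:
    """Return True if the line looks like the SCSI reservation-conflict notice."""
    return RESERVATION_CONFLICT.search(line) is not None
```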