Hitachi

JP1 Version 12 JP1/Network Node Manager i Setup Guide


19.8.3 General HA troubleshooting

(1) Resource hosting subsystem process stops unexpectedly

On a computer running Windows Server 2012, Windows Server 2012 R2, or Windows Server 2016, starting an HA cluster resource might cause the resource hosting subsystem (rhs.exe) process to stop unexpectedly.

For details about this known problem, see the Microsoft Support Website article:

http://support.microsoft.com/kb/978527

Important

Always run the NNMi resource in a separate resource monitor (rhs.exe) specific to the resource group.

(2) Product monitoring times out

The system log contains a message similar to the following example:

VCS ERROR V-16-2-13027 Thread(...) Resource(<resource group>-app) - monitor procedure did not complete within the expected time.

This message indicates that the product could not monitor the resources within the time set in Veritas Cluster Server or Symantec Cluster Server.

The default timeout value in Veritas Cluster Server or Symantec Cluster Server is 60 seconds.

To change the timeout value set in Veritas Cluster Server or Symantec Cluster Server, run the following commands (in the order shown here):

/opt/VRTSvcs/bin/haconf -makerw
/opt/VRTSvcs/bin/hares -override resource-group-app MonitorTimeout
/opt/VRTSvcs/bin/hares -modify resource-group-app MonitorTimeout <value in seconds>
/opt/VRTSvcs/bin/haconf -dump -makero
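
The four commands above can be wrapped in a small shell function so that the order is always preserved. The following is a minimal sketch, not part of the product: the names VCS_BIN, DRY_RUN, run, and set_monitor_timeout are illustrative assumptions, and only the haconf and hares invocations come from this guide.

```shell
#!/bin/sh
# Sketch only: VCS_BIN, DRY_RUN, run, and set_monitor_timeout are
# illustrative names and do not come from the guide.
VCS_BIN="${VCS_BIN:-/opt/VRTSvcs/bin}"

# run: print the command when DRY_RUN=1, otherwise execute it.
run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# set_monitor_timeout RESOURCE SECONDS
set_monitor_timeout() {
    resource=$1
    seconds=$2
    # Reject anything that is not a string of digits.
    case $seconds in
        ''|*[!0-9]*)
            echo "timeout must be a whole number of seconds" >&2
            return 2
            ;;
    esac
    # Same order as in the guide. The chain stops at the first failure,
    # in which case the configuration may still be writable.
    run "$VCS_BIN/haconf" -makerw &&
    run "$VCS_BIN/hares" -override "$resource" MonitorTimeout &&
    run "$VCS_BIN/hares" -modify "$resource" MonitorTimeout "$seconds" &&
    run "$VCS_BIN/haconf" -dump -makero
}
```

Running `DRY_RUN=1 set_monitor_timeout resource-group-app 120` prints the four commands without changing the cluster configuration, which is a convenient way to check the resource name and value before running them for real.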

(3) Log files on the active cluster node are not updating

This situation is normal. It occurs because the log files have been redirected to the shared disk.

For NNMi, review the log files in the location specified by HA_NNM_LOG_DIR in the ov.conf file.
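
To find the redirected log directory quickly, the HA_NNM_LOG_DIR entry can be read from ov.conf with a short helper. This is a minimal sketch that assumes ov.conf stores settings as KEY=value lines; the function name conf_value is hypothetical, and the path to ov.conf is passed as an argument because this subsection does not state it.

```shell
#!/bin/sh
# Sketch only: conf_value is an illustrative helper, and the KEY=value
# layout of ov.conf is an assumption.

# conf_value FILE KEY: print the value assigned to KEY in FILE.
conf_value() {
    # Strip "KEY =" (with optional surrounding spaces) and print the rest.
    # If KEY appears more than once, the last entry is used (assumption).
    sed -n "s/^[[:space:]]*$2[[:space:]]*=[[:space:]]*//p" "$1" | tail -n 1
}
```

For example, `log_dir=$(conf_value /path/to/ov.conf HA_NNM_LOG_DIR)` stores the directory in log_dir, after which `ls -lt "$log_dir"` lists the most recently updated log files first.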

(4) Cannot start the NNMi HA resource group on a particular cluster node

If the nnmhargconfigure.ovpl or nnmhastartrg.ovpl command does not correctly start, stop, or switch the NNMi HA resource group, review the following information:

If you cannot locate the source of the problem, you can start the NNMi HA resource group manually by using HA product commands:

  1. Mount the shared disk.

  2. Assign the virtual host to the network interface:

    • WSFC

      - Start Failover Cluster Management.

      - Expand the resource group.

      - Right-click resource-group-ip, and then click Bring Online.

    • VCS or SCS

      /opt/VRTSvcs/bin/hares -online resource-group-ip -sys local-host-name

  3. Start the HA resource group.

    Example:

    • Windows

      %NnmInstallDir%misc\nnm\ha\nnmhastartrg.ovpl NNM -start resource-group

    • Linux

      $NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl NNM -start resource-group

    Return code 0 indicates that NNMi started successfully.

    Return code 1 indicates that NNMi did not start correctly.
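
On Linux, the return code from step 3 can be checked in a script rather than by eye. The following is a minimal sketch; the wrapper name start_nnmi_rg is hypothetical, and the handling of return codes other than 0 and 1 is an assumption, because this guide describes only those two values.

```shell
#!/bin/sh
# Sketch only: start_nnmi_rg is an illustrative wrapper, not a product command.

# start_nnmi_rg COMMAND [ARGS...]: run the start command and report its return code.
start_nnmi_rg() {
    "$@"
    rc=$?
    case $rc in
        0) echo "NNMi started successfully" ;;
        1) echo "NNMi did not start correctly" ;;
        *) echo "unexpected return code: $rc" ;;  # not described in the guide
    esac
    return "$rc"
}
```

A call such as `start_nnmi_rg "$NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl" NNM -start resource-group` runs the command shown in step 3 and passes its return code back to the caller, so the wrapper can be used in an `if` statement.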

(5) The message "System error XXXX occurred" is displayed (in Windows)

A system (OS or cluster software) error might have occurred. For details, see the OS or cluster software documentation.

Error examples: Examples of errors in WSFC are described below.

Action: Check the nature of the system error and take appropriate action. When a specified IP address is not valid for NNMi, as is the case in these examples, check the IP address that is to be used.