19.8.3 General HA troubleshooting

Organization of this subsection

(1) Resource hosting subsystem process stops unexpectedly
(2) Product monitoring times out
(3) Log files on the active cluster node are not updating
(4) Cannot start the NNMi HA resource group on a particular cluster node
(5) The message System error XXXX occurred is displayed (in Windows)

(1) Resource hosting subsystem process stops unexpectedly

When you start an HA cluster resource on a computer running the Windows Server 2012, Windows Server 2012 R2, or Windows Server 2016 operating system, the resource hosting subsystem (rhs.exe) process can stop unexpectedly.

For details about this known problem, see the Microsoft Support Website article:

http://support.microsoft.com/kb/978527

Important: Always run the NNMi resource in a separate resource monitor (rhs.exe) specific to the resource group.


(2) Product monitoring times out

The system log contains a message similar to the following example:

VCS ERROR V-16-2-13027 Thread(...) Resource(<resource group>-app) - monitor procedure did not complete within the expected time.

This message indicates that the product could not monitor the resources within the time set in Veritas Cluster Server or Symantec Cluster Server.

A timeout value of 60 seconds is set as the default for Veritas Cluster Server or Symantec Cluster Server.

To change the timeout value set in Veritas Cluster Server or Symantec Cluster Server, run the following commands (in the order shown here):

/opt/VRTSvcs/bin/haconf -makerw
/opt/VRTSvcs/bin/hares -override resource-group-app MonitorTimeout
/opt/VRTSvcs/bin/hares -modify resource-group-app MonitorTimeout <value in seconds>
/opt/VRTSvcs/bin/haconf -dump -makero
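
The sequence above can be sketched as a small dry-run helper that prints the four commands for a given resource and timeout, so that they can be reviewed before being run on a cluster node. The function name monitor_timeout_cmds, the resource name nnmha-app, and the value 120 are hypothetical and are not taken from the product documentation.

```shell
#!/bin/sh
# Sketch only: prints the commands, it does not change the cluster configuration.
# The resource name and timeout used below are hypothetical examples.

VCS_BIN=/opt/VRTSvcs/bin

# Emit the four commands in the required order, one per line.
monitor_timeout_cmds() {
    resource=$1   # for example <resource group>-app
    timeout=$2    # new MonitorTimeout value in seconds
    printf '%s\n' \
        "$VCS_BIN/haconf -makerw" \
        "$VCS_BIN/hares -override $resource MonitorTimeout" \
        "$VCS_BIN/hares -modify $resource MonitorTimeout $timeout" \
        "$VCS_BIN/haconf -dump -makero"
}

# Dry run for a hypothetical resource and a 120-second timeout.
monitor_timeout_cmds "nnmha-app" 120
```

The first command makes the cluster configuration writable, and the last command saves it and returns it to read-only, which is why the order matters.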


(3) Log files on the active cluster node are not updating

This situation is normal. It occurs because the log files have been redirected to the shared disk.

For NNMi, review the log files in the location specified by HA_NNM_LOG_DIR in the ov.conf file.
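
As a minimal sketch on Linux, the log directory could be looked up as follows. It assumes that ov.conf stores settings as KEY=VALUE lines. The helper name ha_log_dir and the OV_CONF variable are hypothetical. Set OV_CONF to the actual path of the ov.conf file on the node.

```shell
#!/bin/sh
# Assumption: ov.conf contains a line of the form HA_NNM_LOG_DIR=<directory>.

# Print the value of HA_NNM_LOG_DIR from the given configuration file.
ha_log_dir() {
    conf_file=$1
    # If the key appears more than once, keep the last occurrence.
    sed -n 's/^HA_NNM_LOG_DIR=//p' "$conf_file" | tail -n 1
}

# OV_CONF is a hypothetical variable that holds the path of ov.conf.
if [ -n "${OV_CONF:-}" ] && [ -r "$OV_CONF" ]; then
    log_dir=$(ha_log_dir "$OV_CONF")
    echo "HA log directory: $log_dir"
    # List the most recently modified log files first.
    ls -lt "$log_dir" | head
fi
```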


(4) Cannot start the NNMi HA resource group on a particular cluster node

If the nnmhargconfigure.ovpl or nnmhastartrg.ovpl command does not correctly start, stop, or switch the NNMi HA resource group, review the following information:

WSFC

- In Failover Cluster Management, review the state of the resource group and underlying resources.

- Review the Event Viewer log for any errors.

VCS or SCS

- Run /opt/VRTSvcs/bin/hares -state to review the resource state.

- For failed resources, review the /var/VRTSvcs/log/resource.log file for the resource that is failing. Resources are referenced by the agent type (for example, IP*.log, Mount*.log, and Volume*.log).
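
For VCS or SCS, the state check can be narrowed to faulted resources with a small filter. This is a sketch that assumes hares -state prints whitespace-separated columns in the order resource, attribute, system, value, with a header line that starts with #. The function name faulted_resources is hypothetical.

```shell
#!/bin/sh
# Assumption: `hares -state` output has the columns
#   #Resource  Attribute  System  Value

# Read `hares -state` output on standard input and list faulted resources.
faulted_resources() {
    awk '$1 !~ /^#/ && $2 == "State" && $4 ~ /FAULTED/ { print $1 " on " $3 }'
}

# Run the check only where the VCS command is installed.
if [ -x /opt/VRTSvcs/bin/hares ]; then
    /opt/VRTSvcs/bin/hares -state | faulted_resources
fi
```

Each resource that is listed points to the agent log to review next, for example a Mount*.log file for a mount resource.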

If you cannot locate the source of the problem, you can start the NNMi HA resource group manually by using HA product commands:

1. Mount the shared disk.

2. Assign the virtual host to the network interface:

   - WSFC

     - Start Failover Cluster Management.

     - Expand the resource group.

     - Right-click resource-group-ip, and then click Bring Online.

   - VCS or SCS

     /opt/VRTSvcs/bin/hares -online resource-group-ip -sys local-host-name

3. Start the HA resource group.

   Example:

   - Windows

     %NnmInstallDir%misc\nnm\ha\nnmhastartrg.ovpl NNM -start resource-group

   - Linux

     $NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl NNM -start resource-group

   Return code 0 indicates that NNMi started successfully.

   Return code 1 indicates that NNMi did not start correctly.
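
On Linux, the return code can be checked with a small wrapper such as the following sketch. The function name report_start and the resource group name nnmha are hypothetical. Only return codes 0 and 1 are described in this section, so any other value is reported as unexpected.

```shell
#!/bin/sh
# Run the given start command and report its return code.
report_start() {
    "$@"
    rc=$?
    case $rc in
        0) echo "NNMi started successfully" ;;
        1) echo "NNMi did not start correctly" ;;
        *) echo "Unexpected return code: $rc" ;;
    esac
    return $rc
}

# Usage on a cluster node (the resource group name nnmha is hypothetical):
# report_start "$NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl" NNM -start nnmha
```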


(5) The message System error XXXX occurred is displayed (in Windows)

A system (OS or cluster software) error might have occurred. For details, see the OS or cluster software documentation.

Error examples: The following are examples of errors in WSFC.

Example: System error 5054 occurred (0x000013be). The cluster network is invalid.

This error occurs in the cluster.exe command that creates the IP address resource if the IP address specified for NNMi belongs to the internal network used for heartbeat.

Example: System error 5057 occurred (0x000013c1). That cluster IP address is already in use.

This error occurs in the cluster.exe command that creates the IP address resource if the IP address specified for NNMi is already in use.

Action: Check the nature of the system error and take appropriate action. When a specified IP address is not valid for NNMi, as is the case in these examples, check the IP address that is to be used.
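
When you look up a system error in the OS or cluster software documentation, it can help to know that the number in the message and the value in parentheses are the same code in decimal and hexadecimal. The following sketch shows the conversion. The function name to_hex is hypothetical.

```shell
#!/bin/sh
# Convert a decimal system error number to the hexadecimal form shown in the message.
to_hex() {
    printf '0x%08x\n' "$1"
}

to_hex 5054   # prints 0x000013be
to_hex 5057   # prints 0x000013c1
```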
