Hitachi

Job Management Partner 1 Version 10 Job Management Partner 1/Consolidated Management 2/Network Node Manager i Setup Guide


17.8.3 General HA troubleshooting

(1) Resource hosting subsystem process stops unexpectedly (Windows Server 2008 R2)

On a computer running the Windows Server 2008 R2 operating system, starting an HA cluster resource causes the resource hosting subsystem (rhs.exe) process to stop unexpectedly.

For details about this known problem, see the Microsoft Support Website article:

http://support.microsoft.com/kb/978527

Important note

Always run the NNMi resource in a separate resource monitor (rhs.exe) specific to the resource group.

(2) Product startup times out

The system log contains a message similar to the following example:

VCS ERROR V-16-1-13012 Thread(...) Resource(resource-group-app): online procedure did not complete within the expected time.

This message indicates that the product did not start completely within the time set in Veritas Cluster Server or Symantec Cluster Server. The NNMi-provided HA configuration scripts set this timeout value to 30 minutes.

To change the timeout value set in Veritas Cluster Server or Symantec Cluster Server on the Solaris operating system, run the following commands (in the order shown here):

/opt/VRTSvcs/bin/haconf -makerw
/opt/VRTSvcs/bin/hares -modify resource-group-app OnlineTimeout value-in-seconds
/opt/VRTSvcs/bin/haconf -dump -makero
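
Because the value must be given in seconds (the 30-minute default corresponds to 1800), it can help to generate the three commands from a value in minutes and review them before running anything. The following is a minimal sketch under stated assumptions, not part of the product: the function name print_online_timeout_cmds and the resource name nnmrg-app are hypothetical, and the function only prints the commands, it does not run them.

```shell
#!/bin/sh
# Hypothetical helper: prints (does not run) the VCS commands that set OnlineTimeout.
# Arguments: resource name (for example resource-group-app) and timeout in minutes.
print_online_timeout_cmds() {
    resource=$1
    minutes=$2
    # Accept only a positive whole number without leading zeros, so that the
    # shell arithmetic below never interprets the value as octal.
    case $minutes in
        ''|0*|*[!0-9]*)
            echo "timeout must be a positive whole number of minutes" >&2
            return 1
            ;;
    esac
    seconds=$((minutes * 60))   # OnlineTimeout is specified in seconds
    echo "/opt/VRTSvcs/bin/haconf -makerw"
    echo "/opt/VRTSvcs/bin/hares -modify $resource OnlineTimeout $seconds"
    echo "/opt/VRTSvcs/bin/haconf -dump -makero"
}

# Example: 45 minutes for a hypothetical resource named nnmrg-app.
print_online_timeout_cmds nnmrg-app 45
```

Printing the commands first keeps the cluster configuration untouched until the resource name and the value have been checked.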

(3) Product monitoring times out

The system log contains a message similar to the following example:

VCS ERROR V-16-2-13027 Thread(...) Resource(resource-group-app) - monitor procedure did not complete within the expected time.

This message indicates that the product could not monitor the resources within the time set in Veritas Cluster Server or Symantec Cluster Server.

A timeout value of 60 seconds is set as the default for Veritas Cluster Server or Symantec Cluster Server.

To change the timeout value set in Veritas Cluster Server or Symantec Cluster Server on the Solaris operating system, run the following commands (in the order shown here):

/opt/VRTSvcs/bin/haconf -makerw
/opt/VRTSvcs/bin/hares -override resource-group-app MonitorTimeout
/opt/VRTSvcs/bin/hares -modify resource-group-app MonitorTimeout value-in-seconds
/opt/VRTSvcs/bin/haconf -dump -makero

(4) Log files on the active cluster node are not updating

This situation is normal. It occurs because the log files have been redirected to the shared disk.

For NNMi, review the log files in the location specified by HA_NNM_LOG_DIR in the ov.conf file.
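
To find the redirected log directory quickly, the HA_NNM_LOG_DIR value can be read from ov.conf with a short script. This is a minimal sketch under stated assumptions: it assumes that ov.conf stores settings as NAME=value lines, the function name get_ha_log_dir is hypothetical, and the path to ov.conf must be supplied by the caller because this subsection does not state it.

```shell
#!/bin/sh
# Hypothetical helper: prints the value of HA_NNM_LOG_DIR from an ov.conf file.
# Assumption: ov.conf stores settings as NAME=value lines; adjust the
# pattern if the file on your system uses a different layout.
get_ha_log_dir() {
    conf_file=$1
    [ -r "$conf_file" ] || { echo "cannot read $conf_file" >&2; return 1; }
    # If the name appears more than once, use the last occurrence.
    value=$(sed -n 's/^HA_NNM_LOG_DIR=//p' "$conf_file" | tail -n 1)
    [ -n "$value" ] || { echo "HA_NNM_LOG_DIR is not set in $conf_file" >&2; return 1; }
    printf '%s\n' "$value"
}
```

For example, ls -lt "$(get_ha_log_dir path-to-ov.conf)" lists the most recently updated log files first, where path-to-ov.conf is the location of the file on your system.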

(5) Cannot start the NNMi HA resource group on a particular cluster node

If the nnmhargconfigure.ovpl or nnmhastartrg.ovpl command does not correctly start, stop, or switch the NNMi HA resource group, review the following information:

If you cannot locate the source of the problem, you can start the NNMi HA resource group manually by using HA product commands:

  1. Mount the shared disk.

  2. Assign the virtual host to the network interface:

    • WSFC

      - Start Failover Cluster Management.

      - Expand the resource group.

      - Right-click resource-group-ip, and then click Bring Online.

    • Serviceguard

      - HP-UX: Run /usr/sbin/cmmodnet to add the IP address.

      - Linux: Run /usr/local/cmcluster/bin/cmmodnet to add the IP address.

    • VCS or SCS

      /opt/VRTSvcs/bin/hares -online resource-group-ip -sys local-host-name

  3. Start the HA resource group.

    Example:

    • Windows

      %NnmInstallDir%\misc\nnm\ha\nnmhastartrg.ovpl NNM -start resource-group

    • UNIX

      $NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl NNM -start resource-group

    Return code 0 indicates that NNMi started successfully.

    Return code 1 indicates that NNMi did not start correctly.
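
On UNIX, the return code can be checked in a script instead of by hand. The following is a minimal sketch, not part of the product: the function name start_and_report is hypothetical, and the treatment of return codes other than 0 and 1 is an assumption, because only those two values are described here.

```shell
#!/bin/sh
# Hypothetical wrapper: runs the command passed as arguments and reports
# the outcome based on its return code.
start_and_report() {
    rc=0
    "$@" || rc=$?
    case $rc in
        0) echo "NNMi started successfully" ;;
        1) echo "NNMi did not start correctly" ;;
        *) echo "unexpected return code: $rc" ;;   # not described in this guide
    esac
    return "$rc"
}
```

A possible invocation is start_and_report "$NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl" NNM -start resource-group, with resource-group replaced by the name of the NNMi HA resource group.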

(6) The message System error XXXX occurred is displayed (in Windows)

A system (OS or cluster software) error might have occurred. For details, see the OS or cluster software documentation.

Error examples: Examples of errors in WSFC are described below.

Action: Check the nature of the system error and take appropriate action. When a specified IP address is not valid for NNMi, as is the case in these examples, check the IP address that is to be used.