Hitachi

Job Management Partner 1 Version 10 Job Management Partner 1/Consolidated Management 2/Network Node Manager i Setup Guide


17.8.4 NNMi-specific HA troubleshooting

The topics in this subsection apply to HA configuration for NNMi only.

(1) Re-enabling NNMi for HA after all cluster nodes are unconfigured

When all NNMi HA cluster nodes have been unconfigured, the ov.conf file no longer contains any mount point references to the NNMi shared disk. To re-create the mount point reference without overwriting the data on the shared disk, follow these steps on the primary cluster node:

  1. Stop NNMi if it is running:

    ovstop -c

  2. Reset the reference to the shared disk:

    • Windows

      %NnmInstallDir%\misc\nnm\ha\nnmhadisk.ovpl NNM -setmount HA-mount-point

    • UNIX

      $NnmInstallDir/misc/nnm/ha/nnmhadisk.ovpl NNM -setmount HA-mount-point

  3. In the ov.conf file, verify the entries related to HA mount points.

    For the location of the ov.conf file, see 17.9.1 NNMi HA configuration files.
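The check in step 3 can be sketched for UNIX as a small shell helper. This is a minimal sketch, not part of the product: the function names `ov_conf_get` and `check_ha_mount_point` are hypothetical, and the sketch assumes that ov.conf stores entries as plain KEY=value lines and that the mount point entry is named HA_MOUNT_POINT, as in the nnmhaclusterinfo.ovpl command shown in (7). Confirm both against your own ov.conf file.

```shell
#!/bin/sh
# Print the value of KEY from an ov.conf-style file.
# ASSUMPTION: entries are plain KEY=value lines; confirm against your ov.conf.
ov_conf_get() {
    conf_file=$1
    key=$2
    # Keep only lines that begin with KEY=, strip the key, use the last match.
    sed -n "s/^${key}=//p" "$conf_file" | tail -n 1
}

# Report whether the HA mount point entry is present and non-empty.
# ASSUMPTION: the entry is named HA_MOUNT_POINT.
check_ha_mount_point() {
    value=$(ov_conf_get "$1" HA_MOUNT_POINT)
    if [ -n "$value" ]; then
        echo "HA_MOUNT_POINT=$value"
    else
        echo "HA_MOUNT_POINT is missing or empty" >&2
        return 1
    fi
}
```

For example, `check_ha_mount_point /path/to/ov.conf` prints the entry when it is set and returns a nonzero status otherwise, where `/path/to/ov.conf` stands for the location given in 17.9.1.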

(2) NNMi does not start correctly under HA

When NNMi does not start correctly, you must determine whether the issue is a hardware issue with the virtual IP address or the disk, or whether the issue is some form of application failure. During this determination process, put the system in maintenance mode.

To fix this problem:

  1. On the active cluster node in the HA cluster, disable HA resource group monitoring by creating the following maintenance file:

    Windows: %NnmDataDir%\hacluster\resource-group\maintenance

    UNIX: $NnmDataDir/hacluster/resource-group/maintenance

  2. Start NNMi:

    ovstart

  3. Verify that NNMi started correctly:

    ovstatus -c

    All NNMi services must show the state RUNNING. If this is not the case, troubleshoot the process that does not start correctly.

  4. After completing your troubleshooting, delete the maintenance file:

    Windows: %NnmDataDir%\hacluster\resource-group\maintenance

    UNIX: $NnmDataDir/hacluster/resource-group/maintenance
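Creating and deleting the maintenance file in steps 1 and 4 can be wrapped in helpers so that the file is not left behind by accident. The following is a minimal sketch for UNIX. The function names are hypothetical, and the data directory and resource group name are passed in as arguments rather than discovered.

```shell
#!/bin/sh
# Build the maintenance file path from the NNMi data directory ($1) and the
# HA resource group name ($2), using the UNIX layout given in the steps above.
maintenance_file() {
    printf '%s/hacluster/%s/maintenance\n' "$1" "$2"
}

# Create the maintenance file to disable HA resource group monitoring.
enter_maintenance() {
    file=$(maintenance_file "$1" "$2")
    rg_dir=$(dirname "$file")
    # Refuse to create a new resource group directory; a missing one
    # usually means a wrong data directory or resource group name.
    if [ ! -d "$rg_dir" ]; then
        echo "no such directory: $rg_dir" >&2
        return 1
    fi
    # touch leaves the content of an existing maintenance file unchanged.
    touch "$file"
}

# Delete the maintenance file to re-enable monitoring.
leave_maintenance() {
    rm -f "$(maintenance_file "$1" "$2")"
}
```

A session might then run `enter_maintenance "$NnmDataDir" resource-group`, followed by `ovstart` and `ovstatus -c`, and finish with `leave_maintenance "$NnmDataDir" resource-group` once troubleshooting is complete.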

(3) Changes to NNMi data are not seen after failover

The NNMi configuration points to a different system than the one on which NNMi is running. To fix this problem, verify that the ov.conf file has appropriate entries for the following items:

For the location of the ov.conf file, see 17.9.1 NNMi HA configuration files.

(4) nmsdbmgr does not start after HA configuration

This situation usually occurs as a result of starting NNMi after running the nnmhaconfigure.ovpl command but without having run the nnmhadisk.ovpl command with the -to option specified. In this case, the HA_POSTGRES_DIR entry in the ov.conf file specifies the location on the shared disk, but this location is not available to NNMi.

To fix this problem:

  1. On the active cluster node in the HA cluster, disable HA resource group monitoring by creating the following maintenance file:

    • Windows: %NnmDataDir%\hacluster\resource-group\maintenance

    • UNIX: $NnmDataDir/hacluster/resource-group/maintenance

  2. Copy the NNMi database to the shared disk:

    • Windows: %NnmInstallDir%\misc\nnm\ha\nnmhadisk.ovpl NNM -to HA-mount-point

    • UNIX: $NnmInstallDir/misc/nnm/ha/nnmhadisk.ovpl NNM -to HA-mount-point

  3. Start the NNMi HA resource group:

    • Windows: %NnmInstallDir%\misc\nnm\ha\nnmhastartrg.ovpl NNM resource-group

    • UNIX: $NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl NNM resource-group

  4. Start NNMi:

    ovstart

  5. Verify that NNMi started correctly:

    ovstatus -c

    All NNMi services must show the state RUNNING.

  6. After completing your troubleshooting, delete the maintenance file:

    • Windows: %NnmDataDir%\hacluster\resource-group\maintenance

    • UNIX: $NnmDataDir/hacluster/resource-group/maintenance
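The verification in step 5 can be partly automated by filtering the status listing for services that are not in the RUNNING state. The sketch below is hedged: the function name `not_running` is hypothetical, and it assumes that each service line of the `ovstatus -c` output has the form "name, PID, state, message" in whitespace-separated columns with a header line that begins with "Name". Check this against the actual output on your system and adjust the field numbers if it differs.

```shell
#!/bin/sh
# Read a status listing on standard input and print every service whose
# state column is not RUNNING. Returns nonzero if any such service exists.
# ASSUMPTION: service lines look like "NAME PID STATE message...", and the
# header line starts with "Name". Lines with fewer than 3 fields are skipped.
not_running() {
    awk '
        NF >= 3 && $1 != "Name" && $3 != "RUNNING" { print $1, $3; bad = 1 }
        END { exit bad + 0 }
    '
}
```

Used as `ovstatus -c | not_running`, the helper prints nothing and succeeds when every listed service is RUNNING.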

(5) NNMi runs correctly on only one HA cluster node (Windows)

The Windows operating system requires two different virtual IP addresses, one for the HA cluster and one for the HA resource group. If the virtual IP address of the HA cluster is the same as that of the NNMi HA resource group, NNMi runs correctly only on the node associated with the HA cluster IP address.

To correct this problem, change the virtual IP address of the HA cluster to a unique value within the network.

(6) Disk failover does not occur

This situation can arise when the operating system does not support the shared disk. Review the HA product, operating system, and disk manufacturer documentation to determine whether these products can work together.

When a disk failure occurs, NNMi does not start on failover. Most likely, nmsdbmgr fails because the HA_POSTGRES_DIR directory does not exist. Verify that the shared disk is mounted and that the appropriate files are accessible.
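The check described above can be sketched for UNIX as follows. This is an illustration under stated assumptions, not a product tool: the function name `check_postgres_dir` is hypothetical, and the sketch assumes that ov.conf stores HA_POSTGRES_DIR as a plain KEY=value line.

```shell
#!/bin/sh
# Check that the directory named by HA_POSTGRES_DIR in ov.conf ($1) exists
# and is readable. Returns 2 if the entry is absent, 1 if the directory is
# missing or unreadable, 0 otherwise.
# ASSUMPTION: ov.conf holds plain KEY=value lines.
check_postgres_dir() {
    conf_file=$1
    dir=$(sed -n 's/^HA_POSTGRES_DIR=//p' "$conf_file" | tail -n 1)
    if [ -z "$dir" ]; then
        echo "HA_POSTGRES_DIR not set in $conf_file" >&2
        return 2
    fi
    if [ -d "$dir" ] && [ -r "$dir" ]; then
        echo "ok: $dir"
    else
        echo "missing or unreadable: $dir" >&2
        return 1
    fi
}
```

A return status of 1 suggests that the shared disk is not mounted on this node or that the database was never copied to it.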

(7) Shared disk is not accessible (Windows)

The nnmhaclusterinfo.ovpl -config NNM -get HA_MOUNT_POINT command returns nothing.

The drive of the shared disk mount point must be fully specified during HA configuration.

Example: Y:

To correct this problem, run the nnmhaconfigure.ovpl command on each node in the HA cluster. Fully specify the drive of the shared disk mount point.

(8) Shared disk files are not found by the secondary cluster node after failover

The most common cause of this situation is that the nnmhadisk.ovpl command was run with the -to option specified while the shared disk was not mounted. In this case, the data files are copied to the local disk, so the files are not available on the shared disk.

To fix this problem:

  1. On the active cluster node in the HA cluster, disable HA resource group monitoring by creating the following maintenance file:

    Windows: %NnmDataDir%\hacluster\resource-group\maintenance

    UNIX: $NnmDataDir/hacluster/resource-group/maintenance

  2. Log on to the active cluster node and verify that the disk is mounted and available.

  3. Stop NNMi:

    ovstop

  4. Copy the NNMi database to the shared disk:

    Windows: %NnmInstallDir%\misc\nnm\ha\nnmhadisk.ovpl NNM -to HA-mount-point

    UNIX: $NnmInstallDir/misc/nnm/ha/nnmhadisk.ovpl NNM -to HA-mount-point

  5. Start the NNMi HA resource group:

    Windows: %NnmInstallDir%\misc\nnm\ha\nnmhastartrg.ovpl NNM resource-group

    UNIX: $NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl NNM resource-group

  6. Start NNMi:

    ovstart

  7. Verify that NNMi started correctly:

    ovstatus -c

    All NNMi services must show the state RUNNING.

  8. After completing your troubleshooting, delete the maintenance file:

    Windows: %NnmDataDir%\hacluster\resource-group\maintenance

    UNIX: $NnmDataDir/hacluster/resource-group/maintenance
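Because the root cause in this case is copying data while the shared disk is not mounted, the check in step 2 is worth scripting before step 4 is run. The following is a minimal sketch for UNIX using the POSIX `df -P` command. The function names are hypothetical, and mount points whose paths contain spaces are not handled.

```shell
#!/bin/sh
# Read "df -P" output on standard input and print the "Mounted on" column
# of the first data line (the last whitespace-separated field).
df_mount_column() {
    awk 'NR == 2 { print $NF }'
}

# Succeed only if the directory $1 is itself a mount point, not merely a
# directory that lives on some other (for example, local) file system.
is_mount_point() {
    # Resolve symbolic links so the path can be compared with df output.
    dir=$(cd "$1" 2>/dev/null && pwd -P) || return 1
    [ "$(df -P "$dir" | df_mount_column)" = "$dir" ]
}
```

For example, `is_mount_point HA-mount-point || echo "shared disk is not mounted"` can guard the nnmhadisk.ovpl command in step 4, so that the data files are not copied to the local disk again.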