Hitachi

JP1 Version 12 JP1/Network Node Manager i Setup Guide


19.8.4 NNMi-specific HA troubleshooting

The topics in this subsection apply to HA configuration for NNMi only.

(1) NNMi does not start correctly under HA

When NNMi does not start correctly, determine whether the cause is a hardware problem with the virtual IP address or the disk, or some form of application failure. Put the system in maintenance mode while you make this determination.

To fix this problem:

  1. On the active cluster node in the HA cluster, disable HA resource group monitoring by creating the following maintenance file:

    Windows: %NnmDataDir%hacluster\resource-group\maintenance

    Linux: $NnmDataDir/hacluster/resource-group/maintenance

  2. Start NNMi:

    ovstart

  3. Verify that NNMi started correctly:

    ovstatus -c

    All NNMi services must show the state RUNNING. If this is not the case, troubleshoot the process that does not start correctly.

  4. After completing your troubleshooting, delete the maintenance file:

    Windows: %NnmDataDir%hacluster\resource-group\maintenance

    Linux: $NnmDataDir/hacluster/resource-group/maintenance
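The maintenance-file steps above can be wrapped in small helpers on Linux. The following is a minimal sketch, not part of the product: the function names are hypothetical, it assumes that the NnmDataDir environment variable is set, that the resource group directory already exists, and that an empty maintenance file is sufficient.

```shell
#!/bin/sh
# Hypothetical helpers for toggling NNMi HA maintenance mode on Linux.
# Assumption: NnmDataDir is set in the environment.

# Print the maintenance file path for the resource group given as $1.
maintenance_file() {
    printf '%s/hacluster/%s/maintenance\n' "${NnmDataDir:?NnmDataDir is not set}" "$1"
}

# Create the maintenance file (disables HA resource group monitoring).
# Fails if the resource group directory does not exist.
enter_maintenance() {
    f=$(maintenance_file "$1") || return 1
    touch "$f"
}

# Delete the maintenance file (re-enables HA resource group monitoring).
leave_maintenance() {
    f=$(maintenance_file "$1") || return 1
    rm -f "$f"
}
```

A typical session would call enter_maintenance resource-group, perform the troubleshooting, and then call leave_maintenance resource-group.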

(2) Changes to NNMi data are not seen after failover

The NNMi configuration points to a different system than the one on which NNMi is running. To fix this problem, verify that the ov.conf file has appropriate entries for the following items:

For the location of the ov.conf file, see 19.9.1 NNMi HA configuration files.
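Checking ov.conf entries by hand is error-prone, so a small lookup helper may help. The following is a minimal sketch under the assumption that ov.conf contains one KEY=value entry per line. The function names are hypothetical, and the list of keys to check is supplied by the caller.

```shell
#!/bin/sh
# Hypothetical helpers for inspecting an ov.conf-style file.
# Assumption: the file holds one KEY=value entry per line.

# Print the value of key $2 from file $1 (first match only).
ovconf_get() {
    sed -n "s/^$2=//p" "$1" | head -n 1
}

# Check that every key given after the file name has a non-empty value.
# Prints each missing key to stderr and returns 1 if any is missing.
ovconf_require() {
    conf=$1
    shift
    missing=0
    for key in "$@"; do
        if [ -z "$(ovconf_get "$conf" "$key")" ]; then
            echo "missing or empty: $key" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, ovconf_require path-to-ov.conf HA_MOUNT_POINT HA_POSTGRES_DIR reports whether those two entries are present. Whether the values point to the correct system still needs to be checked manually.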

(3) nmsdbmgr does not start after HA configuration

This situation usually occurs when NNMi is started after running the nnmhaconfigure.ovpl command but without first running the nnmhadisk.ovpl command with the -to option specified. In this case, the HA_POSTGRES_DIR entry in the ov.conf file specifies a location on the shared disk, but this location is not available to NNMi.

To fix this problem:

  1. On the active cluster node in the HA cluster, disable HA resource group monitoring by creating the following maintenance file:

    • Windows: %NnmDataDir%hacluster\resource-group\maintenance

    • Linux: $NnmDataDir/hacluster/resource-group/maintenance

  2. Copy the NNMi database to the shared disk:

    • Windows: %NnmInstallDir%misc\nnm\ha\nnmhadisk.ovpl NNM -to HA-mount-point

    • Linux: $NnmInstallDir/misc/nnm/ha/nnmhadisk.ovpl NNM -to HA-mount-point

  3. Start the NNMi HA resource group:

    • Windows: %NnmInstallDir%misc\nnm\ha\nnmhastartrg.ovpl NNM resource-group

    • Linux: $NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl NNM resource-group

  4. Start NNMi:

    ovstart

  5. Verify that NNMi started correctly:

    ovstatus -c

    All NNMi services must show the state RUNNING.

  6. After completing your troubleshooting, delete the maintenance file:

    • Windows: %NnmDataDir%hacluster\resource-group\maintenance

    • Linux: $NnmDataDir/hacluster/resource-group/maintenance
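The check that all services show the state RUNNING can be scripted. The following is a minimal sketch that assumes the output of ovstatus -c is a table whose third whitespace-separated column is the state, with a header line whose third field is State. Verify this layout against your own output before relying on it. The function name is hypothetical.

```shell
#!/bin/sh
# Hypothetical filter for `ovstatus -c` output read from stdin.
# Assumption: columns are Name, PID, State, ... and the header's third field is "State".

# Print "name: state" for every service whose state is not RUNNING.
# Returns 1 if at least one such service is found, 0 otherwise.
not_running() {
    awk '
        NF >= 3 && $3 != "State" && $3 != "RUNNING" { print $1 ": " $3; bad = 1 }
        END { exit bad }
    '
}
```

Usage: ovstatus -c | not_running. A non-zero exit status means that at least one process needs troubleshooting.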

(4) NNMi runs correctly on only one HA cluster node (Windows)

The Windows operating system requires two different virtual IP addresses, one for the HA cluster and one for the HA resource group. If the virtual IP address of the HA cluster is the same as that of the NNMi HA resource group, NNMi runs correctly only on the node associated with the HA cluster IP address.

To correct this problem, change the virtual IP address of the HA cluster to a unique value within the network.

(5) Disk failover does not occur

This situation can arise when the operating system does not support the shared disk. Review the HA product, operating system, and disk manufacturer documentation to determine whether these products can work together.

When a disk failure occurs, NNMi does not start on failover. Most likely, nmsdbmgr fails because the HA_POSTGRES_DIR directory does not exist. Verify that the shared disk is mounted and that the appropriate files are accessible.
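On Linux, the mount and directory checks described above can be sketched as follows. This is an illustration only: the function names are hypothetical, the mount check reads the standard /proc/mounts table (in which spaces in paths appear as \040), and the caller supplies the mount point and the HA_POSTGRES_DIR value.

```shell
#!/bin/sh
# Hypothetical checks for the shared disk on Linux.

# Return 0 if $1 is listed as a mount point in the mounts table.
# $2 = mounts table to read (defaults to /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Check that the shared disk is mounted and the database directory is usable.
# $1 = HA mount point, $2 = database directory, $3 = optional mounts table.
check_shared_disk() {
    if ! is_mounted "$1" "$3"; then
        echo "shared disk is not mounted at $1" >&2
        return 1
    fi
    if [ ! -d "$2" ] || [ ! -r "$2" ]; then
        echo "directory is missing or unreadable: $2" >&2
        return 1
    fi
    return 0
}
```

Usage: check_shared_disk HA-mount-point database-directory, where database-directory is the value of the HA_POSTGRES_DIR entry.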

(6) Shared disk is not accessible (Windows)

If the command nnmhaclusterinfo.ovpl -config NNM -get HA_MOUNT_POINT displays nothing, the shared disk cannot be accessed because the specified mount point is incorrect.

The drive of the shared disk mount point must be fully specified during HA configuration.

Example: Y:

To correct this problem, run the nnmhaconfigure.ovpl command on each node in the HA cluster. Fully specify the drive of the shared disk mount point.

(7) Shared disk files are not found on the secondary cluster node after failover

The most common cause of this situation is that the nnmhadisk.ovpl command was run with the -to option specified while the shared disk was not mounted. In this case, the data files are copied to the local disk, so the files are not available on the shared disk.

To fix this problem:

  1. On the active cluster node in the HA cluster, disable HA resource group monitoring by creating the following maintenance file:

    Windows: %NnmDataDir%hacluster\resource-group\maintenance

    Linux: $NnmDataDir/hacluster/resource-group/maintenance

  2. Log on to the active cluster node and verify that the disk is mounted and available.

  3. Stop NNMi:

    ovstop

    Windows: net stop NnmTrapReceiver

    Linux:

    - For distributions that use systemd to manage services

    /opt/OV/bin/nettrap stop

    - Other distributions

    /etc/init.d/nettrap stop

  4. Copy the NNMi database to the shared disk:

    Windows: %NnmInstallDir%misc\nnm\ha\nnmhadisk.ovpl NNM -to HA-mount-point

    Linux: $NnmInstallDir/misc/nnm/ha/nnmhadisk.ovpl NNM -to HA-mount-point

  5. Start the NNMi HA resource group:

    Windows: %NnmInstallDir%misc\nnm\ha\nnmhastartrg.ovpl NNM resource-group

    Linux: $NnmInstallDir/misc/nnm/ha/nnmhastartrg.ovpl NNM resource-group

  6. Start NNMi:

    ovstart

  7. Verify that NNMi started correctly:

    ovstatus -c

    All NNMi services must show the state RUNNING.

  8. After completing your troubleshooting, delete the maintenance file:

    Windows: %NnmDataDir%hacluster\resource-group\maintenance

    Linux: $NnmDataDir/hacluster/resource-group/maintenance
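Step 3 above uses a different trap receiver stop command depending on whether the Linux distribution uses systemd. The choice can be sketched as follows, assuming that the presence of the /run/systemd/system directory indicates a systemd-managed host. The function name is hypothetical, and the two commands are the ones listed in step 3.

```shell
#!/bin/sh
# Hypothetical helper: print the trap receiver stop command for this host.
# $1 = directory whose presence indicates systemd (defaults to /run/systemd/system).
nettrap_stop_command() {
    if [ -d "${1:-/run/systemd/system}" ]; then
        echo "/opt/OV/bin/nettrap stop"
    else
        echo "/etc/init.d/nettrap stop"
    fi
}
```

Run the printed command after ovstop. The optional argument exists only so that the selection logic can be exercised without a systemd host.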