Hitachi

JP1 Version 12 JP1/Base User's Guide


5.5.4 Registering daemons in the cluster software

Register the JP1/Base daemons in the cluster software used in your system so that they are subject to failover. For details on the registration procedure, see the documentation for your cluster software.

The information needed to register JP1/Base in the cluster software is shown below:

Functionality

Description

Start

Start JP1/Base.

  • Command

    jbs_start.cluster logical-host-name

  • End timing of the start command

    The start command ends after JP1/Base has started. If startup does not complete within the timeout period (typically 60 seconds) for any reason, the command ends before JP1/Base has started. In that case, the startup attempt is not canceled: the command ends, but JP1/Base continues trying to start.

  • Result judgment for the start command

    Judge whether JP1/Base started successfully from the information in the Operation monitoring section of this table. Usually, the operation monitoring functionality of the clustering software is used. The return value of the start command cannot be used for this judgment because it is only ever 0 (normal end) or 1 (invalid argument).

Stop

Stop JP1/Base.

  • Command

    jbs_stop.cluster logical-host-name

  • End timing of the stop command

    The stop command ends after JP1/Base has stopped. If the stop does not complete within the timeout period (typically 60 seconds) for any reason, the command ends before JP1/Base has stopped. In that case, the stop attempt is not canceled: the command ends, but JP1/Base continues trying to stop.

  • Result judgment for the stop command

    Judge whether JP1/Base stopped successfully from the information in the Operation monitoring section of this table. The return value of the stop command cannot be used for this judgment because it is only ever 0 (normal end) or 1 (invalid argument).

Remarks:

After the stop command finishes, execute the jbs_spmd_status and jevstat commands to check whether JP1/Base has stopped normally. If JP1/Base has not stopped, execute the jbs_killall.cluster command described under Kill below.

Operation monitoring

Use the return values from the jbs_spmd_status and jevstat commands to monitor whether JP1/Base is operating normally. These commands judge the operating status based on whether each process is running or not.

Some clustering software does not support this functionality. Register this functionality only when a failover is required upon a failure in JP1/Base.

  • Command

    jbs_spmd_status -h logical-host-name

    jevstat logical-host-name

  • Result judgment for operation monitoring

    The return values have the following meanings:

    Return value = 0 (all operating)

    JP1/Base is operating normally.

    Return value = 1 (error)

    An unrecoverable error has occurred. Judge this as a failure.

    Note

    If you execute the jbs_spmd_status command on the secondary node with the shared disk offline, it returns 1 because the shared disk is not found.

    Return value = 4 (partial stop)

    Some of the JP1/Base processes have stopped for some reason. Judge this as a failure (for UNIX).#

    Return value = 8 (all stopped)

    All processes of JP1/Base have stopped for some reason. Judge this as a failure.

    Return value = 12 (error but retry possible)

    An error that can be recovered by retrying occurred while the jbs_spmd_status command was checking the operating status. Retry the check, up to a specified number of times. For the jevstat command, this return value indicates an error that cannot be recovered by retrying.

Kill

Kill JP1/Base and release the resources it has been using.

  • Command

    jbs_killall.cluster logical-host-name

When you execute the jbs_killall.cluster command, each process is forcibly terminated without the normal stop processing of JP1/Base being performed.

Note

Stop JP1/Base using the stop command before executing the kill command. Use the kill command only when a problem has occurred, for example, when the stop command fails to stop JP1/Base.
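The stop, check, and kill sequence described above could be scripted roughly as follows. This is a minimal sketch rather than a procedure from this manual: the function name stop_jp1base and the *_CMD override variables are hypothetical, the JP1/Base commands are assumed to be on PATH, and a return value of 8 (all stopped) from both jbs_spmd_status and jevstat is assumed to mean that JP1/Base has stopped normally.

```shell
#!/bin/sh
# Sketch: stop JP1/Base on a logical host, then force-kill it only if it
# is still running. stop_jp1base and the *_CMD variables are hypothetical
# names used for illustration; they are not part of JP1/Base.

stop_jp1base() {
    host=$1

    # The return value of the stop command is not used for judgment.
    "${STOP_CMD:-jbs_stop.cluster}" "$host" || true

    # Check both status commands after the stop command has finished.
    spmd_rc=0
    "${SPMD_STATUS_CMD:-jbs_spmd_status}" -h "$host" >/dev/null 2>&1 || spmd_rc=$?
    jev_rc=0
    "${JEVSTAT_CMD:-jevstat}" "$host" >/dev/null 2>&1 || jev_rc=$?

    # Assumption: 8 (all stopped) from both commands means a normal stop.
    if [ "$spmd_rc" -eq 8 ] && [ "$jev_rc" -eq 8 ]; then
        return 0
    fi

    # Anything else: fall back to the kill command.
    "${KILL_CMD:-jbs_killall.cluster}" "$host"
}
```

A stop script registered in the cluster software might then call stop_jp1base with the logical host name. Whether a status error (return value 1) should also trigger the kill command, as it does here, depends on your environment.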

#

In Windows, operation differs from that in UNIX because of the relationship with Windows service control. If some processes have stopped in Windows, JP1 process management automatically stops all the processes and places the service in the stopped state. You can detect a failure either by detecting that the service has stopped or by checking whether the jbs_spmd_status command returns 8.
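The return value handling for operation monitoring could be wrapped in a monitor script along the following lines. This is a minimal sketch for UNIX under stated assumptions: the function names, the verdict strings, the default retry count and interval, and the *_CMD and RETRY_INTERVAL variables are all hypothetical, the JP1/Base commands are assumed to be on PATH, and any return value not listed in the table is treated as a failure.

```shell
#!/bin/sh
# Sketch of an operation-monitoring check for one logical host (UNIX).
# Function names, verdict strings, and the *_CMD / RETRY_INTERVAL
# variables are hypothetical; they are not part of JP1/Base.

# Map a jbs_spmd_status return value to a verdict.
classify_spmd_status() {
    case "$1" in
        0)  echo ok ;;       # all operating
        12) echo retry ;;    # error, but retry is possible
        *)  echo fail ;;     # 1, 4 (UNIX), 8, and (assumption) anything else
    esac
}

# Return 0 if JP1/Base is operating normally, 1 if a failure should be reported.
monitor_jp1base() {
    host=$1
    max_retries=${2:-3}      # hypothetical default
    attempt=0
    while :; do
        rc=0
        "${SPMD_STATUS_CMD:-jbs_spmd_status}" -h "$host" >/dev/null 2>&1 || rc=$?
        verdict=$(classify_spmd_status "$rc")
        if [ "$verdict" != retry ]; then
            break
        fi
        attempt=$((attempt + 1))
        if [ "$attempt" -gt "$max_retries" ]; then
            return 1         # still getting 12 after the allowed retries
        fi
        sleep "${RETRY_INTERVAL:-5}"
    done
    if [ "$verdict" != ok ]; then
        return 1
    fi
    # For jevstat, every nonzero value (including 12) is treated as a failure.
    "${JEVSTAT_CMD:-jevstat}" "$host" >/dev/null 2>&1 || return 1
    return 0
}
```

A monitor script registered in the cluster software might call monitor_jp1base with the logical host name and exit with its status. Separating the classification from the retry loop keeps the mapping of return values easy to adjust.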

Remarks: Restarting JP1

If a JP1 failure is detected in a cluster system, the primary server might restart JP1 to attempt recovery before it performs a failover to the secondary server.

In such a case, we recommend that you restart JP1 under the control of the clustering software rather than by using the restart functionality of JP1 process management.

Because the clustering software itself attempts to restart JP1 after it detects a failure, it might interfere with the normal operation of the JP1 restart functionality. To ensure a more reliable restart, restart JP1 under the control of the clustering software.