Job Management Partner 1/Base User's Guide



3.5.4 Registering daemons in the cluster software

Register the JP1/Base daemons in the cluster software used in your system so that they are subject to failover. For details on the registration procedure, see the documentation for your cluster software. Keep the following points in mind when registering the daemons:

The information needed when registering JP1/Base into cluster software is shown below:

Functionality: Description
Start: Start JP1/Base.
  • Command
    jbs_start.cluster logical-host-name
  • End timing of the start command
    The start command ends after JP1/Base is started. If JP1/Base has not finished starting when the timeout period (typically 60 seconds) elapses, the command ends before JP1/Base is started. In that case, the startup attempt is not canceled: the command ends, but JP1/Base continues trying to start.
  • Result judgment for the start command
    Determine the result of starting JP1/Base by using the operation monitoring method described below. Usually, the operation monitoring functionality of the clustering software is used. The return value of the start command cannot be used for this judgment because it is only ever 0 (normal end) or 1 (abnormal argument), neither of which indicates whether JP1/Base actually started.
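
Where a start script has to decide for itself whether JP1/Base came up, one option is to poll the monitoring commands after the start command returns. The following is a minimal sketch under stated assumptions, not a procedure from this guide: the function names check_status and wait_for_jp1base, the attempt count, and the polling interval are all hypothetical, and only the jbs_spmd_status and jevstat command lines are taken from this section.

```shell
#!/bin/sh
# Hypothetical wrapper around the two monitoring commands named in this
# section. It succeeds only when both commands return 0.
check_status() {
    jbs_spmd_status -h "$1" >/dev/null 2>&1 &&
        jevstat "$1" >/dev/null 2>&1
}

# Hypothetical helper: poll until JP1/Base is reported as running.
#   $1 = logical host name
#   $2 = maximum number of checks (assumption, choose for your system)
#   $3 = seconds to wait between checks (assumption)
wait_for_jp1base() {
    host=$1
    attempts=$2
    interval=$3
    i=0
    while [ "$i" -lt "$attempts" ]; do
        if check_status "$host"; then
            return 0    # both commands returned 0: operating normally
        fi
        i=$((i + 1))
        sleep "$interval"
    done
    return 1            # not running within the allowed number of checks
}
```

A start script might call jbs_start.cluster with the logical host name, ignore its return value, and then call wait_for_jp1base with the same host name. If the clustering software provides its own operation monitoring, that functionality would normally be used instead of a loop like this.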
Stop: Stop JP1/Base.
  • Command
    jbs_stop.cluster logical-host-name
  • End timing of the stop command
    The stop command ends after JP1/Base is stopped. If JP1/Base has not finished stopping when the timeout period (typically 60 seconds) elapses, the command ends before JP1/Base is stopped. In that case, the stop attempt is not canceled: the command ends, but JP1/Base continues trying to stop.
  • Result judgment for the stop command
    Determine the result of stopping JP1/Base by using the operation monitoring method described below. The return value of the stop command cannot be used for this judgment because it is only ever 0 (normal end) or 1 (abnormal argument), neither of which indicates whether JP1/Base actually stopped.

Remarks:
We recommend that you execute the kill command, described below, after the stop command ends. This ensures that the process terminates regardless of any problem, thus preventing failovers from failing.
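
The recommended sequence (stop command first, then the kill command) could be wrapped in a stop script along the following lines. This is only a sketch: the function name stop_jp1base and the STOP_CMD and KILL_CMD variables are hypothetical, and the way the kill command's exit status is handled is an assumption, because this section does not document the return values of jbs_killall.cluster.

```shell
#!/bin/sh
# Hypothetical stop wrapper for a cluster stop script.
# STOP_CMD and KILL_CMD default to the commands named in this section and
# can be overridden, for example for a dry run.
STOP_CMD=${STOP_CMD:-jbs_stop.cluster}
KILL_CMD=${KILL_CMD:-jbs_killall.cluster}

stop_jp1base() {
    host=$1
    # The stop command returns only 0 (normal end) or 1 (abnormal argument),
    # so its return value is deliberately not treated as the stop result.
    "$STOP_CMD" "$host" || true
    # Run the kill command after the stop command ends so that no process
    # is left behind. Passing its exit status through is an assumption;
    # adjust this to what your cluster software expects.
    "$KILL_CMD" "$host"
}
```

A cluster stop script would then call stop_jp1base with the logical host name as its only argument.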
Operation monitoring: Use the return values from the jbs_spmd_status and jevstat commands to monitor whether JP1/Base is operating normally. These commands judge the operating status based on whether each process is running.
Some clustering software does not support this functionality. Register this functionality only when a failover is required upon a failure in JP1/Base.
  • Command
    jbs_spmd_status -h logical-host-name
    jevstat logical-host-name
  • Result judgment for operation monitoring
    The return values have the following meanings:
    Return value = 0 (all operating)
    JP1/Base is operating normally.
    Return value = 1 (error)
    An unrecoverable error has occurred. Judge this as a failure.
    Note
    If you execute the jbs_spmd_status command on the secondary node with the shared disk offline, it returns 1 because the shared disk is not found.
    Return value = 4 (partial stop)
    Some of the JP1/Base processes have stopped for some reason. Judge this as a failure (for UNIX).#
    Return value = 8 (all stopped)
    All processes of JP1/Base have stopped for some reason. Judge this as a failure.
    Return value = 12 (error but retry possible)
    While the jbs_spmd_status command is checking the operating status, an error has occurred which can be recovered by retry. Retry checking the operating status up to a specified number of times. For the jevstat command, this return value indicates an error for which retry is not possible.
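
The return-value rules above can be expressed as a small monitor script. The sketch below is illustrative only: every function name, the RETRY_INTERVAL variable, and the choice to treat a return value of 12 that persists after the allowed retries as a failure are assumptions, not behavior stated in this guide. Only the two command lines and the meanings of the return values come from this section.

```shell
#!/bin/sh
# Map a jbs_spmd_status return value to a decision: ok, retry, or failure.
judge_spmd_status() {
    case "$1" in
        0)  echo ok ;;        # all operating
        12) echo retry ;;     # error, but retry possible
        *)  echo failure ;;   # 1, 4, 8, and any unexpected value
    esac
}

# For jevstat, 12 cannot be retried, so only 0 counts as healthy.
judge_jevstat() {
    case "$1" in
        0) echo ok ;;
        *) echo failure ;;
    esac
}

# Thin wrappers so that the real commands can be swapped out.
run_spmd_status() { jbs_spmd_status -h "$1" >/dev/null 2>&1; }
run_jevstat()     { jevstat "$1" >/dev/null 2>&1; }

# Hypothetical monitor. $1 = logical host name, $2 = maximum retries.
# Returns 0 when JP1/Base is judged healthy and 1 otherwise.
monitor_jp1base() {
    host=$1
    max_retries=$2
    retries=0
    while :; do
        rc=0
        run_spmd_status "$host" || rc=$?
        verdict=$(judge_spmd_status "$rc")
        if [ "$verdict" != retry ]; then
            break
        fi
        if [ "$retries" -ge "$max_retries" ]; then
            verdict=failure   # assumption: still 12 after all retries
            break
        fi
        retries=$((retries + 1))
        sleep "${RETRY_INTERVAL:-5}"
    done
    if [ "$verdict" != ok ]; then
        return 1
    fi
    rc=0
    run_jevstat "$host" || rc=$?
    [ "$(judge_jevstat "$rc")" = ok ]
}
```

Treating 4 as a failure matches the UNIX rule above. In Windows the same mapping is harmless, because a partial stop ends with the service stopped and a return value of 8.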
Kill: Kill JP1/Base and release the resources it has been using.
  • Command
    jbs_killall.cluster logical-host-name
When you execute the jbs_killall.cluster command, each process is forcibly stopped without the normal JP1/Base stop processing being performed.

Note
Stop JP1/Base using the stop command before executing the kill command. Use the kill command only when a problem has occurred, for example, when executing the stop command cannot terminate processing.

# In Windows, operation differs from that in UNIX due to the relationship with service control by Windows. If some processes have stopped in Windows, the JP1 process management automatically stops all the processes, placing the service into the stopped state. You can determine a failure by detecting the stop of the service or when the jbs_spmd_status command returns a value of 8.

Remarks: Restarting JP1
If a JP1 failure is detected in a cluster system, the primary server might restart JP1 to attempt recovery before it performs a failover to the secondary server.
In such a case, we recommend that you use the clustering software control to restart JP1 rather than restarting by JP1 process management.
Because the clustering software also attempts to restart JP1 after it detects a failure, it might prevent the JP1 restart functionality from operating normally. To ensure a more reliable restart, restart JP1 under the control of the clustering software.


All Rights Reserved. Copyright (C) 2009, Hitachi, Ltd.