Job Management Partner 1/Base User's Guide

3.5.4 Registering daemons in the cluster software

In the cluster software used in your system, register the JP1/Base daemons so that they can fail over. For details on the registration procedure, see the documentation for your cluster software. Keep the following points in mind when registering the daemons:

Ensure that the secondary node can take over the daemons from the primary node, together with the IP address and shared disk. Also, if the failover of an application program leads to the failover of a service, ensure that the secondary node can also take over the application program.

After the logical IP address and shared disk have become available, start JP1/Base first, and then start JP1/IM and JP1/AJS. When stopping the products, stop them in the reverse order.

The information needed when registering JP1/Base into cluster software is shown below:

Functionality Description

Start: Start JP1/Base.

Command
jbs_start.cluster logical-host-name

End timing of the start command
The start command ends after JP1/Base is started. If JP1/Base has not finished starting when the timeout period (typically 60 seconds) elapses, the command ends before JP1/Base is started. In that case, the attempt to start JP1/Base is not canceled; the command ends, but the attempt to start JP1/Base continues.

Result judgment for the start command
The result of starting JP1/Base should be determined by the operation monitor method described below. Usually, the operation monitor functionality of the clustering software is used. The return value of the start command cannot be used for judgment because it is either 0 (normal end) or 1 (abnormal argument).

Stop: Stop JP1/Base.

Command
jbs_stop.cluster logical-host-name

End timing of the stop command
The stop command ends after JP1/Base is stopped. If JP1/Base has not finished stopping when the timeout period (typically 60 seconds) elapses, the command ends before JP1/Base is stopped. In that case, the attempt to stop JP1/Base is not canceled; the command ends, but the attempt to stop JP1/Base continues.

Result judgment for the stop command
The result of stopping JP1/Base should be determined by the operation monitor method described below. The return value of the stop command cannot be used for judgment because it is either 0 (normal end) or 1 (abnormal argument).

Remarks:

We recommend that you execute the kill command, described below, after the stop command ends. This ensures that the process terminates regardless of any problem, thus preventing failovers from failing.

Operation monitoring: Use the return values from the jbs_spmd_status and jevstat commands to monitor whether JP1/Base is operating normally. These commands judge the operating status based on whether each process is running.
Some clustering software does not support this functionality. Register this functionality only when a failover is required upon a failure in JP1/Base.

Command
jbs_spmd_status -h logical-host-name
jevstat logical-host-name

Result judgment for operation monitoring
The return values have the following meanings:
Return value = 0 (all operating)
JP1/Base is operating normally.
Return value = 1 (error)
An unrecoverable error has occurred. Judge this as a failure.
Note
If you execute the jbs_spmd_status command on the secondary node with the shared disk offline, it returns 1 because the shared disk is not found.
Return value = 4 (partial stop)
Some of the JP1/Base processes have stopped for some reason. Judge this as a failure (for UNIX).#
Return value = 8 (all stopped)
All processes of JP1/Base have stopped for some reason. Judge this as a failure.
Return value = 12 (error but retry possible)
While the jbs_spmd_status command is checking the operating status, an error has occurred which can be recovered by retry. Retry checking the operating status up to a specified number of times. For the jevstat command, this return value indicates an error for which retry is not possible.

Kill: Kill JP1/Base and release the resources it has been using.

Command
jbs_killall.cluster logical-host-name

When you execute the jbs_killall.cluster command, each process is forcibly stopped without performing any processing for stopping JP1/Base.

Note

Stop JP1/Base using the stop command before executing the kill command. Use the kill command only when a problem has occurred, for example, when executing the stop command cannot terminate processing.

# In Windows, operation differs from that in UNIX due to the relationship with service control by Windows. If some processes have stopped in Windows, the JP1 process management automatically stops all the processes, placing the service into the stopped state. You can determine a failure by detecting the stop of the service or when the jbs_spmd_status command returns a value of 8.
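The result judgment for operation monitoring can be sketched as a small monitor script. This is only an illustration, assuming a UNIX node (where a return value of 4 is judged as a failure) and assuming that the two commands are on the PATH. The names Verdict, judge, check_with_retry, and monitor_commands, and the default of three retries, are hypothetical and are not part of JP1/Base. Return values that are not listed above are treated as failures, which is also an assumption.

```python
import subprocess
from enum import Enum
from typing import Callable, Dict, List


class Verdict(Enum):
    OK = "ok"            # JP1/Base is operating normally
    FAILURE = "failure"  # report a failure to the clustering software
    RETRY = "retry"      # check the operating status again


def monitor_commands(logical_host: str) -> Dict[str, List[str]]:
    """Build the two monitoring command lines shown in the table."""
    return {
        "jbs_spmd_status": ["jbs_spmd_status", "-h", logical_host],
        "jevstat": ["jevstat", logical_host],
    }


def run_command(argv: List[str]) -> int:
    """Run a command and return its exit status (assumes it is on the PATH)."""
    return subprocess.run(argv, check=False).returncode


def judge(command: str, return_code: int) -> Verdict:
    """Map a monitoring return value to a verdict (UNIX judgment)."""
    if return_code == 0:
        return Verdict.OK
    # 12 can be retried only for jbs_spmd_status; for jevstat it is an error.
    if return_code == 12 and command == "jbs_spmd_status":
        return Verdict.RETRY
    # 1, 4 (UNIX), 8, and any unlisted value (assumption) count as failures.
    return Verdict.FAILURE


def check_with_retry(command: str, run: Callable[[], int],
                     max_retries: int = 3) -> Verdict:
    """Repeat the check while the verdict is RETRY, up to max_retries retries."""
    for _ in range(max_retries + 1):
        verdict = judge(command, run())
        if verdict is not Verdict.RETRY:
            return verdict
    return Verdict.FAILURE  # retries exhausted
```

A monitor script could call check_with_retry once per command, passing for example `lambda: run_command(monitor_commands("hostA")["jevstat"])`, and report a failure to the clustering software if either verdict is FAILURE. The logical host name hostA is a placeholder.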

Remarks: Restarting JP1

If a JP1 failure is detected in a cluster system, the primary server might restart JP1 to attempt recovery before it performs a failover to the secondary server.

In such a case, we recommend that you use the clustering software control to restart JP1 rather than restarting by JP1 process management.

Because the clustering software itself attempts to restart JP1 after it detects a failure, it might interfere with the normal operation of the JP1 restart functionality. To ensure a more reliable restart, restart JP1 under the control of the clustering software.
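A restart controlled from outside JP1 might look roughly like the following sketch. It is an illustration only: in practice the clustering software runs this sequence through its own resource definitions. The function names, the single restart attempt, the polling counts, and the assumption that the commands are on the PATH are all hypothetical. The sketch ignores the return values of the stop, kill, and start commands, because the table states that they cannot be used for judgment, and relies on the monitoring commands instead.

```python
import subprocess
import time
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], int]


def run(argv: Sequence[str]) -> int:
    """Run a command and return its exit status (assumes it is on the PATH)."""
    return subprocess.run(list(argv), check=False).returncode


def is_healthy(host: str, runner: Runner,
               attempts: int = 3, interval: float = 5.0) -> bool:
    """Poll both monitoring commands; healthy only if both return 0."""
    for attempt in range(attempts):
        if (runner(["jbs_spmd_status", "-h", host]) == 0
                and runner(["jevstat", host]) == 0):
            return True
        if attempt < attempts - 1:
            time.sleep(interval)  # startup may continue after the start command ends
    return False


def restart_or_failover(host: str, runner: Runner = run,
                        max_restarts: int = 1, poll_attempts: int = 3,
                        poll_interval: float = 5.0) -> str:
    """Try a local restart first; return "failover" if JP1/Base stays down."""
    for _ in range(max_restarts):
        runner(["jbs_stop.cluster", host])     # normal stop first
        runner(["jbs_killall.cluster", host])  # then kill, as recommended
        runner(["jbs_start.cluster", host])
        if is_healthy(host, runner, poll_attempts, poll_interval):
            return "recovered"
    return "failover"
```

Passing the runner as a parameter keeps the sequence testable without JP1/Base installed; a real script would use the default runner and translate the result into whatever signal the clustering software expects.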
