Hitachi

Job Management Partner 1 Version 10 Job Management Partner 1/Integrated Management - Manager Configuration Guide


6.3.4 Registering into the cluster software (for UNIX)

To apply cluster operation to JP1/IM - Manager, you must register JP1/IM - Manager and JP1/Base on the logical host into the cluster software, and then set them to be started and terminated by the cluster software.

Start the services in the following order: the resources (shared disk and logical IP address), JP1/Base, and then JP1/IM - Manager.

(1) Creating a script to be registered into the cluster software

When you use UNIX cluster software, you normally create a tool, such as a script, that controls the application, and then register that script into the cluster software. In general, such a script must provide the start, stop, operation monitoring, and forced termination functions.

This subsection describes the JP1/IM - Manager information that is needed to design a script. You use this information to create a script that controls JP1/IM - Manager according to the cluster software specifications, and then you register the script into the cluster software.

Table 6‒14: Detailed information for script design in cluster registration

Function to be registered

Description

Start

Starts JP1/IM - Manager.

  • Command to be used

    jco_start.cluster logical-host-name

  • Start command termination timing

    The start command waits for JP1/IM - Manager to start before it terminates. However, if the startup processing does not complete within the timeout period (60 seconds by default) because of a problem, the command terminates while the startup processing is still underway. The command does not cancel the startup processing.

  • Check the start command result

    The script should determine the result of starting JP1/IM - Manager by the operation monitoring method described below. Normally, the result is determined by the cluster software's operation monitoring. The return value of the start command is 0 (normal termination) or 1 (argument error). Therefore, the result cannot be determined from the return value.
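
    As an illustration of this check, the following is a minimal sketch of a start function, not a definitive implementation. It assumes that a single status check after the start command returns is sufficient. The names start_jp1im, START_CMD, and STATUS_CMD are hypothetical; only the jco_start.cluster and jco_spmd_status command lines come from this subsection.

```shell
#!/bin/sh
# Sketch only: start_jp1im, START_CMD, and STATUS_CMD are hypothetical names.
# The commands are held in variables so the logic can be tried without JP1.
START_CMD=${START_CMD:-jco_start.cluster}
STATUS_CMD=${STATUS_CMD:-jco_spmd_status}

# start_jp1im LOGICAL_HOST
# Returns 0 when JP1/IM - Manager is running after the start command,
# 1 when the start command reports an argument error, and otherwise
# the nonzero return value of the status command.
start_jp1im() {
    lhost=$1
    start_rc=0
    "$START_CMD" "$lhost" || start_rc=$?
    # A return value of 1 from the start command means an argument error.
    [ "$start_rc" -eq 0 ] || return 1
    # A return value of 0 does not prove that startup completed,
    # so the result is taken from the operation monitoring command.
    "$STATUS_CMD" -h "$lhost" >/dev/null 2>&1
}
```

    A cluster start script might call start_jp1im with the logical host name and report a failure to the cluster software when the function returns a nonzero value.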

Stop

Terminates JP1/IM - Manager.

  • Command to be used

    jco_stop.cluster logical-host-name

  • Stop command termination timing

    The stop command waits for JP1/IM - Manager to terminate before it terminates. However, if the stop processing does not complete within the timeout period (60 seconds by default) because of a problem, the command terminates while the stop processing is still underway. The command does not cancel the stop processing.

  • Check the stop command result

    The script should determine the result of terminating JP1/IM - Manager by the operation monitoring method described below. The return value of the stop command is 0 (normal termination) or 1 (argument error). Therefore, the result cannot be determined from the return value.

We recommend that you execute the forced termination command described below after the stop command has terminated. This enables you to terminate the process and prevent a failover error even in the event of a problem.

JP1/IM - Manager operation monitoring #1

Monitors normal operation of JP1/IM - Manager.

  • Command to be used

    jco_spmd_status -h logical-host-name

To determine whether JP1/IM - Manager is running normally, check the return value of the jco_spmd_status command. This command determines the status from the operating status of each process.

Some cluster software does not provide the operation monitoring function. If there is no need to perform failover in the event of a JP1/IM - Manager failure, do not register this function.

  • Check the operation monitoring result

    The following explains how to interpret the return value:

    Return value = 0 (all running):

    JP1/IM - Manager is running normally.

    Return value = 1 (error):

    An unrecoverable error occurred. Treat this as a failure.

    Note: If you execute the jco_spmd_status command on the secondary server while its shared disk is offline, the return value is 1 because the shared disk is not available.

    Return value = 4 (partially stopped):

    Some JP1/IM - Manager processes are stopped due to a problem. Treat this as a failure.

    Return value = 8 (all stopped):

    All JP1/IM - Manager processes are stopped due to a problem. Treat this as a failure.

    Return value = 12 (retriable error):

    While the jco_spmd_status command was checking the operating status, an error occurred that can be recovered from by retrying. Retry the operating status check, up to the number of retries that you have specified.
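
    The interpretation above can be sketched as a monitoring function. This is a minimal sketch under assumptions: the retry limit (MAX_RETRIES), the wait between retries (RETRY_INTERVAL), and the names monitor_jp1im and STATUS_CMD are hypothetical and are not defined by JP1/IM - Manager. Adjust them to the specifications of your cluster software.

```shell
#!/bin/sh
# Sketch only: monitor_jp1im, STATUS_CMD, MAX_RETRIES, and RETRY_INTERVAL
# are hypothetical names, and the default values below are assumptions.
STATUS_CMD=${STATUS_CMD:-jco_spmd_status}
MAX_RETRIES=${MAX_RETRIES:-3}        # retries allowed for return value 12
RETRY_INTERVAL=${RETRY_INTERVAL:-5}  # seconds to wait between retries

# monitor_jp1im LOGICAL_HOST
# Returns 0 when JP1/IM - Manager is running normally, and 1 when the
# state should be reported to the cluster software as a failure.
monitor_jp1im() {
    lhost=$1
    attempt=0
    while :; do
        rc=0
        "$STATUS_CMD" -h "$lhost" >/dev/null 2>&1 || rc=$?
        case $rc in
            0)  return 0 ;;                   # all running
            12) attempt=$((attempt + 1))      # retriable error
                if [ "$attempt" -gt "$MAX_RETRIES" ]; then
                    return 1                  # retries exhausted
                fi
                sleep "$RETRY_INTERVAL" ;;
            *)  return 1 ;;                   # 1, 4, or 8; other values are
                                              # also treated as a failure here
                                              # (an assumption of this sketch)
        esac
    done
}
```

    With the defaults shown, a persistent return value of 12 results in one initial check plus three retries before the function reports a failure.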

IM database operation status checking #2

Checks to see if the IM databases are running normally.

  • Command to be used

    jimdbstatus -h logical-host-name

To determine the operating status, check the return value of the jimdbstatus command.

  • Check the operating status result

    The following explains how to interpret the return value:

    Return value = 0: Running

    Return value = 1: The jimdbstatus command terminated abnormally.

    Return value = 4: Start or stop processing is underway.

    Return value = 8: Stopped (IM database is in restart-interrupted status and is unstable)

    Return value = 12: Stopped (stopped normally)

    Return value = 16: Not running (Only the Windows service control program is running. IM database core is not running.)

    Return value = 20: Installed HiRDB has not been set up.

    Return values 1 and 4 are subject to retries. Return values 8 and above indicate an error and are subject to failover.
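
    The retry and failover rule above might be expressed in a script as follows. This is a minimal sketch: the names classify_imdb_status, check_imdb, and IMDB_STATUS_CMD are hypothetical, and treating return values that are not listed above as failover is an assumption of the sketch.

```shell
#!/bin/sh
# Sketch only: classify_imdb_status, check_imdb, and IMDB_STATUS_CMD are
# hypothetical names. Only "jimdbstatus -h" comes from this subsection.
IMDB_STATUS_CMD=${IMDB_STATUS_CMD:-jimdbstatus}

# classify_imdb_status RETURN_VALUE
# Prints the action to take for a jimdbstatus return value.
classify_imdb_status() {
    case $1 in
        0)   echo running ;;    # IM database is running
        1|4) echo retry ;;      # abnormal command end, or start/stop underway
        *)   echo failover ;;   # 8, 12, 16, 20; unlisted values are also
                                # mapped to failover (an assumption)
    esac
}

# check_imdb LOGICAL_HOST
# Runs the status command and prints the resulting action.
check_imdb() {
    imdb_rc=0
    "$IMDB_STATUS_CMD" -h "$1" >/dev/null 2>&1 || imdb_rc=$?
    classify_imdb_status "$imdb_rc"
}
```

    A monitoring script might call check_imdb with the logical host name, repeat the check while the printed action is retry, and report a failure to the cluster software when it is failover.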

Forced termination

Forcibly terminates JP1/IM - Manager and releases the current resources.

  • Command to be used

    jco_killall.cluster logical-host-name

The jco_killall.cluster command forcibly terminates each process without performing JP1/IM - Manager termination processing.

Note:

Before you execute forced termination, use the stop command to terminate JP1/IM - Manager.
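
The recommended sequence of the stop command followed by forced termination might be scripted as follows. This is a minimal sketch, not a definitive implementation: the names stop_jp1im, STOP_CMD, and KILL_CMD are hypothetical, and ignoring the return value of the forced termination command is an assumption, because this subsection does not describe that value.

```shell
#!/bin/sh
# Sketch only: stop_jp1im, STOP_CMD, and KILL_CMD are hypothetical names.
# The commands are held in variables so the logic can be tried without JP1.
STOP_CMD=${STOP_CMD:-jco_stop.cluster}
KILL_CMD=${KILL_CMD:-jco_killall.cluster}

# stop_jp1im LOGICAL_HOST
# Returns the stop command's return value:
# 0 (normal termination) or 1 (argument error).
stop_jp1im() {
    lhost=$1
    stop_rc=0
    # The stop command returns when JP1/IM - Manager has terminated
    # or when its timeout period has elapsed.
    "$STOP_CMD" "$lhost" || stop_rc=$?
    # Forced termination runs afterward in every case, so that any
    # remaining processes are ended before failover continues.
    # Its return value is ignored (an assumption of this sketch).
    "$KILL_CMD" "$lhost" || :
    return "$stop_rc"
}
```

Because the return value of the stop command cannot show whether JP1/IM - Manager actually stopped, a stop script built on this sketch would still rely on operation monitoring to confirm the result.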

#1

The commands used to check JP1 operation are the same on UNIX and Windows, but their behavior differs.

Windows behavior differs from UNIX behavior because of its association with Windows service control. On Windows, when some of the processes terminate, JP1 process management automatically terminates all of the processes and places the service in stopped status. Either treat a stopped service as an error, or detect the error when a command such as jco_spmd_status returns a value of 8.

#2

Executed when the IM databases are used.

Reference note

About JP1 restart

When a JP1 failure is detected in a cluster system, the cluster software may retry restarting JP1 on the same server before it fails over to the secondary server.

In such a case, do not use JP1 process management to perform the restart.

The cluster software attempts the restart after it has detected the JP1 failure. Depending on the nature of the failure, JP1's own restart function may be affected and may not restore normal operation. To restart JP1 reliably, have the cluster software perform the restart.

(2) Setting the resource start and stop sequence

To execute JP1/IM - Manager and JP1/Base on the logical host, the shared disk and logical IP addresses must be available for use.

Set the start and stop sequence or resource dependencies in such a manner that they are controlled by the cluster software as shown below.