Hitachi

JP1 Version 12 JP1/Automatic Job Management System 3 Administration Guide


12.2.2 Procedure for recovery from a failure in a JP1/AJS3 system that connects to an external database

The following figure shows an overview of recovery from a failure occurring in a JP1/AJS3 system that connects to an external database.

Figure 12‒2: Overview of recovery from a failure occurring in a JP1/AJS3 system that connects to an external database

[Figure]

The following shows the procedure for recovery from a failure in a JP1/AJS3 system that connects to an external database.

  1. Restore the JP1/AJS3 system from a backup of a JP1/AJS3 system that has been set up completely.

  2. Apply the environment information that you checked on the manager host before the failure occurred to the restored JP1/AJS3 system.

    For details about the environment information to be checked on the manager host before a failure occurs, see 12.2.1(2) Checking the environment information of the manager host.

  3. Run the jajs_extdb command to check the scheduler services used on the manager host and the databases in use.

    Execute the following command.

    jajs_extdb -v

    For details about the jajs_extdb command, see jajs_extdb in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

  4. Start all the databases that you checked in step 3.

  5. Run the jajs_extdb command with the -p option specified to set up the connection between the manager host and the agent management database again.

    Execute the following command.

    jajs_extdb -p -a

  6. Run the jajs_extdb command with the -p option specified to set up a connection between the scheduler service and the database again.

    Run the following command for each of the scheduler services that you checked in step 3:

    jajs_extdb -p -F scheduler-service-name

  7. Start the JP1/AJS3 services.

  8. Check the statuses that the jobnets and jobs had when the failure occurred, and then change the statuses or rerun the jobnets and jobs as necessary.

    When the jajs_extdb command has been run, the scheduler service starts in disaster-recovery mode. In this mode, the scheduler service starts with execution suppressed, and the statuses of jobs and jobnets are changed. Use JP1/AJS3 - View or a command to check the status of each job, and then change job statuses or rerun jobs as needed. Pay particular attention to jobs that were running when the failure occurred: because JP1/AJS3 operation was interrupted, the restored environment might not have inherited their statuses.

    For details about the status in which jobnets and jobs are placed when JP1/AJS3 is started in disaster-recovery mode, see the description of the statuses of jobnets and jobs when a disaster recovery start is performed in 6.2.1(3)(a) Statuses when a JP1/AJS3 service on the manager host is restarted.

    For details about how to check and change job execution statuses and how to rerun a job, see the JP1/Automatic Job Management System 3 Operator's Guide.

    When monitoring of a start condition has stopped, generations in Wait for start cond. status that were started by the start condition are processed as follows depending on the status of the waiting generation:

    • A generation of the jobnet is in Wait for start cond. status even though the start conditions are satisfied, because concurrent execution is disabled and another generation of the jobnet is already running

      For the generations of a jobnet in Wait for start cond. status, the status remains the same in the restored environment. For the generation that is already running, the status changes to Interrupted. After the status of this generation changes to Interrupted, execution of the generations in Wait for start cond. status will restart.

    • Start conditions are not satisfied or only partially satisfied

      After the failure is corrected, all generations of this jobnet disappear.

  9. Delete the manager host information that is related to event jobs and remains on the agent host.

    Run the following command on the manager host to delete the manager host information that existed when the failure occurred and remains on the agent host.

    For the -a option, specify the execution host name that corresponds to the target execution agent name.

    jpomanevreset -h logical-host-name -F scheduler-service-name -dh manager-host-name -a execution-host-name

    If the manager host to be recovered is a physical host, you do not need to specify the -h option.

    If the manager host to be recovered is a logical host, specify the manager host name of the logical host for the -h option and -dh option.

  10. Run the jajs_extdb command to clear the execution-suppression start status of the scheduler service.

    Run the following command for each of the scheduler services that you checked in step 3:

    jajs_extdb -r -F scheduler-service-name

  11. Run the ajsalter command to cancel suppression of the execution of jobs and jobnets, and then resume operations.

    Run the following command for each of the scheduler services that you checked in step 3:

    ajsalter -F scheduler-service-name -s none

    For details about the ajsalter command, see ajsalter in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.
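
Because steps 5, 6, 9, 10, and 11 repeat the same commands for each scheduler service, it can help to generate the command lines for review before running anything. The following is a minimal sketch, not part of the product: the function names and the dry-run approach are illustrative assumptions, and the scheduler service names must be supplied by the operator from the result of step 3. It also assumes that the -F option in step 10 takes each scheduler service name in turn, as the text of that step describes. The functions only print command lines and never run a JP1/AJS3 command.

```shell
#!/bin/sh
# Dry-run helpers: each function only prints command lines for review.
# Function names are illustrative; the command lines follow the steps above.

# Steps 5 and 6: agent management database first, then each scheduler service.
# Arguments: the scheduler service names checked in step 3.
print_reconnect_commands() {
    echo "jajs_extdb -p -a"
    for service in "$@"; do
        echo "jajs_extdb -p -F $service"
    done
}

# Step 9: one command per scheduler service and execution host.
# $1 = "logical" or "physical", $2 = scheduler service name,
# $3 = manager host name, $4 = execution host name.
print_event_reset_command() {
    host_type=$1
    service=$2
    manager=$3
    exec_host=$4
    if [ "$host_type" = "logical" ]; then
        # Logical host: -h and -dh both take the logical host's manager host name.
        echo "jpomanevreset -h $manager -F $service -dh $manager -a $exec_host"
    else
        # Physical host: the -h option is not needed.
        echo "jpomanevreset -F $service -dh $manager -a $exec_host"
    fi
}

# Steps 10 and 11: clear the execution-suppression start status for every
# service, then cancel suppression of job and jobnet execution.
print_resume_commands() {
    for service in "$@"; do
        echo "jajs_extdb -r -F $service"
    done
    for service in "$@"; do
        echo "ajsalter -F $service -s none"
    done
}
```

For example, calling print_reconnect_commands with the arguments AJSROOT1 and AJSROOT2 (the second name is hypothetical) prints the step 5 command followed by one step 6 command per service. Printing the commands instead of running them keeps steps 7 and 8, which require manual checks, under the operator's control.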