Hitachi

JP1 Version 12 JP1/Automatic Job Management System 3 Administration Guide


12.2.4 Procedure for recovery from a database failure

The following figure shows an overview of recovery from a database failure.

Figure 12‒3: Overview of recovery from a database failure

[Figure]

The following shows the procedure for restoring the system if a failure occurs in instances of the external database.

  1. Stop the scheduler service or the JP1/AJS3 service.

    If you restore a database instance that is used as the agent management database, you must stop the JP1/AJS3 service beforehand.

    If you restore only the database instances for a scheduler service, stop all scheduler services that are connected to the database instances in which a failure occurred. For details on how to stop a scheduler service, see 7.5.2 Stopping the scheduler service.

  2. Restore all database instances in which a failure occurred.

    For details about how to restore a database, see the documentation for the cloud environment that you are using.

  3. Run the jajs_extdb command to check the scheduler services used on the manager host and the database instances in use.

    Execute the following command:

    jajs_extdb -v

  4. Start the restored database instances.

  5. Execute the jajs_extdb command with the -p option specified.

    To restore the agent management database, execute the following command, and then set up the connection between the manager host and the agent management database again.

    jajs_extdb -p -a

    To restore the database of a scheduler service, execute the following command for each of the scheduler services that you checked in step 3, and then set up the connection between the scheduler service and the database again.

    jajs_extdb -p -F scheduler-service-name

  6. Start the scheduler service or JP1/AJS3 service.

    If you have restored the database instance that is used as the agent management database, you must start the JP1/AJS3 service.

    If you have restored only the database instances for a scheduler service, start only the scheduler services that you stopped in step 1. For details about how to start a scheduler service, see 7.5.1 Starting the scheduler service.

    Note that you need to perform the remaining steps (steps 7 to 10) only if you have restored the database instances for a scheduler service.

  7. Check the status of jobnets and jobs that existed when the failure occurred, and then change the status or rerun the jobnets and jobs as necessary.

    After the jajs_extdb command has been run, the scheduler service starts in disaster-recovery mode. In this mode, the scheduler service starts with execution suppressed, and the statuses of jobs and jobnets are changed. Use JP1/AJS3 - View or a command to check the status of each job, and then, if necessary, change the statuses of jobs or rerun jobs. Pay particular attention to the statuses of jobs that were running when the failure occurred. Because JP1/AJS3 operation was interrupted, the restored environment might not have inherited the statuses of such jobs.

    For details about the status in which jobnets and jobs are placed when JP1/AJS3 is started in disaster-recovery mode, see the description of the statuses of jobnets and jobs when a disaster recovery start is performed in 6.2.1(3)(a) Statuses when a JP1/AJS3 service on the manager host is restarted.

    For details about how to check and change job execution statuses and how to rerun a job, see the JP1/Automatic Job Management System 3 Operator's Guide.

    When monitoring of a start condition has stopped, generations in Wait for start cond. status that were started by the start condition are processed as follows depending on the status of the waiting generation:

    • The start conditions are satisfied, but a generation of the jobnet remains in Wait for start cond. status because concurrent execution is disabled and another generation of the jobnet is already running

      For the generations of a jobnet in Wait for start cond. status, the status remains the same in the restored environment. For the generation that is already running, the status changes to Interrupted. After the status of this generation changes to Interrupted, execution of the generations in Wait for start cond. status will restart.

    • Start conditions are not satisfied or only partially satisfied

      After the failure is corrected, all generations of this jobnet disappear.

  8. Delete the manager host information that is related to event jobs and remains on the agent host.

    Run the following command on the manager host to delete the manager host information that existed when the failure occurred and remains on the agent host.

    For the -a option, specify the execution host name that corresponds to the target execution agent name.

    jpomanevreset -h logical-host-name -F scheduler-service-name -dh manager-host-name -a execution-host-name

    If the manager host to be recovered is a physical host, you do not need to specify the -h option.

    If the manager host to be recovered is a logical host, specify the manager host name of the logical host for the -h option and -dh option.

  9. Run the jajs_extdb command to clear the execution-suppression start status of the scheduler service.

    Run the following command for each of the scheduler services that you checked in step 3:

    jajs_extdb -r -F scheduler-service-name

    For details about the jajs_extdb command, see jajs_extdb in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

  10. Run the ajsalter command to cancel suppression of the execution of jobs and jobnets, and then resume operations.

    Run the following command for each of the scheduler services that you checked in step 3:

    ajsalter -F scheduler-service-name -s none

    For details about the ajsalter command, see ajsalter in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.
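
The per-scheduler-service commands in steps 5, 8, 9, and 10 can be collected in a small dry-run helper. The following is a minimal sketch, not a definitive implementation: it only prints command lines for review and does not run anything against JP1/AJS3. The function names and the names AJSROOT1, AJSROOT2, mgrhost, and agthost are hypothetical examples; replace them with the scheduler services reported by jajs_extdb -v and with your own host names. The command options are those shown in the steps above.

```shell
#!/bin/sh
# Dry-run sketch: prints the commands from steps 5, 8, 9, and 10.
# Nothing is executed; review the output and run each command
# manually at the corresponding step of the procedure.

# Step 5: one reconnect command per scheduler service name given as an argument.
print_reconnect_cmds() {
  for svc in "$@"; do
    printf 'jajs_extdb -p -F %s\n' "$svc"
  done
}

# Step 8: build the jpomanevreset command line.
#   $1 = logical host name (pass "" for a physical host, which omits -h)
#   $2 = scheduler service name
#   $3 = manager host name (same value as $1 for a logical host)
#   $4 = execution host name
print_evreset_cmd() {
  if [ -n "$1" ]; then
    printf 'jpomanevreset -h %s -F %s -dh %s -a %s\n' "$1" "$2" "$3" "$4"
  else
    printf 'jpomanevreset -F %s -dh %s -a %s\n' "$2" "$3" "$4"
  fi
}

# Steps 9 and 10: first clear the execution-suppression start status for
# every service, then cancel suppression of execution for every service.
print_resume_cmds() {
  for svc in "$@"; do
    printf 'jajs_extdb -r -F %s\n' "$svc"
  done
  for svc in "$@"; do
    printf 'ajsalter -F %s -s none\n' "$svc"
  done
}

# Example with hypothetical names on a physical manager host.
print_reconnect_cmds AJSROOT1 AJSROOT2
print_evreset_cmd "" AJSROOT1 mgrhost agthost
print_resume_cmds AJSROOT1 AJSROOT2
```

Printing the commands rather than running them keeps the manual checks in steps 6 and 7 (starting the services and reviewing job statuses) between the reconnect commands and the resume commands.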