Hitachi

JP1 Version 12 JP1/Automatic Job Management System 3 Administration Guide


11.2.3 Returning operation to the main site

This subsection describes how to resume operation at the rebuilt main site. The flow of this procedure is shown in the following figure.

Figure 11‒18: Flow of resuming operation at main site

[Figure]

(1) Tasks for resuming operation

The following explains the tasks to be performed before resuming operation, and the procedure for resuming operation.

Cautionary notes
  • The following procedure is applicable whether or not the logical host names at the main site and remote site are the same.

  • If the logical host names at the main site and remote site are the same, specify the logical host name of the main site for logical-host-at-remote-site as a command argument in the procedure.

(a) Tasks to be performed before resuming operation

If the logical host of JP1/Base is used on the same shared disk as the logical host of JP1/AJS3, you need to perform operations for JP1/Base before performing the JP1/AJS3 operation-switch procedure. The procedures for JP1/Base vary depending on whether the main site and remote site use the same logical host names and IP addresses.

The operations for each case are described below. The logical hosts of JP1/Base and JP1/AJS3 can be used on the same shared disk only when JP1/Base is used solely as a prerequisite product for JP1/AJS3.

■ If the main site and remote site use the same logical host names and IP addresses

If the event database is inherited, operations for JP1/Base are not required. However, the event database might be damaged due to a disaster. Recovery operations might be required depending on the situation. The actions to be taken are described for each message that is output to the integrated trace log. Follow the procedure and perform the recovery operations. If the event database is not inherited, initialize the event database by following the procedure described in If KAJP1059-E is displayed.

If KAJP1059-E is displayed:

The event service cannot start because the event database is damaged. Execute the following command to initialize the event DB. The initialization deletes the data of the event database.

jevdbinit {-b|-n}

Executing the jevdbinit command deletes the event database and then creates it again. The serial numbers in the event database before deletion are inherited. If you want to back up the damaged event database before initialization, specify the -b option. If you do not want to back up the database, specify the -n option. You can check the contents of the backup database by using the jevexport command to output the contents to a CSV file. For details about the jevdbinit command and the database backup, see the descriptions of the jevdbinit command in the JP1/Base User's Guide.

When the jevdbinit command is executed, if the serial numbers in the event database cannot be inherited, the initialization fails. If the KAJP1789-E message is displayed, specify 0 as the start number in the -s option, and then create the event database again.

jevdbinit -s 0 {-b|-n}

If KAJP1057-W or KAJP1058-W is displayed:

The event database might contain invalid records, and search performance might degrade. At the next restart, initialize the event database by following the procedure described in If KAJP1059-E is displayed.

If KAJP1075-W is displayed:

The repetition prevention table is in an invalid state. Stop the event service, and then execute the jevdbmkrep command to reconfigure the repetition prevention table. For details about the jevdbmkrep command, see the descriptions of the jevdbmkrep command in the JP1/Base User's Guide.
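
The message-to-action mapping above can be summarized as a small dry-run helper. This is a minimal sketch, not part of JP1/Base: the function name event_db_recovery_action and its arguments are hypothetical, and the function only prints the suggested action without executing anything. The KAJP1789-E case assumes that an earlier jevdbinit attempt has already failed.

```shell
#!/bin/sh
# Dry-run sketch: prints the recovery action for a message ID found in the
# integrated trace log. Nothing is executed. The function name and its
# arguments are hypothetical and are not part of JP1/Base.
#   $1: message ID
#   $2: "backup" to keep a copy of the damaged database (-b);
#       any other value skips the backup (-n)
event_db_recovery_action() {
    msg_id=$1
    if [ "${2:-}" = "backup" ]; then opt="-b"; else opt="-n"; fi
    case "$msg_id" in
        KAJP1059-E)
            # The event database is damaged: initialize it.
            echo "jevdbinit $opt" ;;
        KAJP1789-E)
            # Serial numbers could not be inherited: start numbering at 0.
            echo "jevdbinit -s 0 $opt" ;;
        KAJP1057-W|KAJP1058-W)
            # Possible invalid records: initialize at the next restart.
            echo "at next restart: jevdbinit $opt" ;;
        KAJP1075-W)
            # The repetition prevention table is invalid.
            echo "stop the event service, then run jevdbmkrep" ;;
        *)
            echo "no action defined for $msg_id" >&2
            return 1 ;;
    esac
}
```

For example, event_db_recovery_action KAJP1059-E backup prints jevdbinit -b. For the arguments of jevdbmkrep, see the JP1/Base User's Guide; the sketch deliberately does not guess them.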

■ If the main site and remote site use the same logical host names and different IP addresses

Perform the following operations:

  • Change the settings of JP1/Base according to the IP address change of the logical host.

  • Initialize the JP1/Base event database.

For details about the operations to change the settings of JP1/Base, see the operations when changing IP addresses in the JP1/Base User's Guide. For details about how to initialize the event database of JP1/Base, see the procedure described in If KAJP1059-E is displayed in If the main site and remote site use the same logical host names and IP addresses.

■ If the main site and remote site use different logical host names and IP addresses

Perform the following operations:

  • Change the settings of JP1/Base according to the change in the host names and the IP addresses of the logical host.

  • Initialize the JP1/Base event database.

For details about the operations to change the settings of JP1/Base, see the operations when changing host names and IP addresses in the JP1/Base User's Guide. For details about how to initialize the event database of JP1/Base, see the procedure described in If KAJP1059-E is displayed in If the main site and remote site use the same logical host names and IP addresses.
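
The three cases above differ only in whether the logical host names and the IP addresses match between the sites. The following dry-run sketch restates that decision as a lookup. The function name jp1base_pre_tasks and the same/different keywords are hypothetical, and the function only prints the tasks. The combination of different host names with the same IP addresses is not described in this subsection, so the sketch reports it as not covered.

```shell
#!/bin/sh
# Dry-run sketch: lists the JP1/Base tasks to perform before the JP1/AJS3
# operation-switch procedure. The function name and the "same"/"different"
# keywords are hypothetical. Nothing is executed.
#   $1: "same" or "different" (logical host names at main and remote sites)
#   $2: "same" or "different" (IP addresses at main and remote sites)
jp1base_pre_tasks() {
    case "$1/$2" in
        same/same)
            echo "event database inherited: no JP1/Base operation required"
            echo "event database not inherited or damaged: initialize it with jevdbinit" ;;
        same/different)
            echo "change JP1/Base settings for the new IP address"
            echo "initialize the event database with jevdbinit" ;;
        different/different)
            echo "change JP1/Base settings for the new host name and IP address"
            echo "initialize the event database with jevdbinit" ;;
        *)
            # Not one of the three cases described in this subsection.
            echo "combination not covered by this subsection: $1/$2" >&2
            return 1 ;;
    esac
}
```

For example, jp1base_pre_tasks same different prints the two tasks for the case in which only the IP addresses differ.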

(b) Procedure for resuming operation

The procedure for resuming operation at the main site is shown below. In a non-cluster environment, perform only the tasks for the primary node.

  1. Make sure that no jobs are running on the primary or secondary nodes of the remote site.

    Be sure to also terminate jobnet connectors that are controlling the order of jobnet execution between different scheduler services, and the jobnets at the forwarding destination hosts of those jobnet connectors.

  2. Stop JP1/AJS3 - Manager on the primary and secondary nodes of the remote site.

    If the logical host is attached (connected) to the queueless agent service of the remote site, execute the following command to detach (disconnect) it:

    ajsqldetach -h logical-host-at-remote-site -k

    For details about the ajsqldetach command, see ajsqldetach in 4. Commands Used for Special Operation in the manual JP1/Automatic Job Management System 3 Command Reference.

    After JP1/AJS3 - Manager is stopped, use hardware operations to stop the copy process between the shared disks and write-enable the main volume at the main site.

  3. On the primary node at the main site, set up the logical host that is to operate as the main site.

    Execute the following command:

    jajs_rpsite -m CHANGE -h logical-host-at-main-site

    After changing the definition of the main site, initiate the copy process from the main site to the remote site by way of a hardware operation.

  4. If the logical host names at the main and remote sites are different, remove the logical host name of the remote site from the event action control agent.

    Perform this step only when the logical host names at the main and remote sites are different.

    If you executed event jobs on the logical host of the remote site, the event action control agent retains the logical host name of the remote site.

    Therefore, if the logical host names at the main and remote sites are different, clear the logical host name of the remote site from the event action control agent on the logical host at the main site. Clear the logical host name before you return operation to the main site.

    To do so, perform the following steps on the shared agent or logical host of the main site.

    (1) Display a list of manager host names recorded by the event action control agent.

    Execute the following command to check whether the event action control agent has retained the logical host name of the remote site.

    Execute the command on the logical host at the main site:

    jpoagoec -p -h logical-host-at-main-site

    Execute the command on the shared agent (physical host):

    jpoagoec -p

    Execute the command on the shared agent (logical host):

    jpoagoec -p -h logical-host

    (2) If the logical host name of the remote site appears in the command output in step (1), execute the following command:

    Execute the command on the logical host at the main site:

    jpoagoec -d logical-host-at-remote-site -h logical-host-at-main-site

    Execute the command on the shared agent (physical host):

    jpoagoec -d logical-host-at-remote-site

    Execute the command on the shared agent (logical host):

    jpoagoec -d logical-host-at-remote-site -h logical-host

    For details about the jpoagoec command, see jpoagoec in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

  5. Check the status of ISAM files in the database of the execution environment for QUEUE jobs and submit jobs. If necessary, re-create the ISAM files.

    For operation in the standard configuration, you must confirm that the status of the ISAM files copied to the remote site is valid. For details about how to check the status of ISAM files, see 2.11.1 Procedure for checking the status of ISAM files in the manual JP1/Automatic Job Management System 3 Troubleshooting.

    If the status of the ISAM files is invalid, a problem such as a job startup failure might disable further operation on the remote site. If such a problem occurs, re-create the ISAM files according to 2.11.2 Procedure for re-creating the execution environment database for QUEUE jobs and submit jobs in the manual JP1/Automatic Job Management System 3 Troubleshooting. However, use the following procedure to restart the JP1/AJS3 service.

  6. On the primary node at the main site, perform a disaster recovery start of JP1/AJS3 - Manager.

    In disaster-recovery mode, JP1/AJS3 - Manager starts with job execution suppressed.

    To perform a disaster recovery start of JP1/AJS3 - Manager:

    In Windows:

    Run JP1/AJS3 with -disaster specified as a startup parameter. For details about how to temporarily change the startup behavior of JP1/AJS3, see 6.2.1 Temporarily changing the start mode of JP1/AJS3.

    In UNIX:

    Execute the following command:

    jajs_spmd -h logical-host-at-main-site -disaster

    For details about the jajs_spmd command, see jajs_spmd in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

    If you specify a hot start or warm start the first time you start the JP1/AJS3 services after switching sites, the startup mode automatically changes to a disaster recovery start.

  7. After starting JP1/AJS3 - Manager, perform the necessary tasks on the primary node at the main site.

    Prepare the JP1/AJS3 system for operation at the main site, by altering the definition of the execution agent to suit the main site environment and checking job statuses.

    The tasks you need to perform after starting JP1/AJS3 - Manager in disaster recovery mode at the main site are the same as those performed when switching operation to the remote site. With reference to 11.2.1(2) Tasks to perform after JP1/AJS3 - Manager startup, configure the main site according to how you use the system.

  8. Cancel the suppression of job execution on the primary node at the main site.

    Execute the following command to cancel the suppression of job execution.

    ajsalter -s none -F scheduler-service-name

    After canceling the suppression of job execution, resume work tasks at the main site.
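
As a review aid, the command sequence in steps 1 to 8 can be printed in order by a dry-run helper. This is a minimal sketch under stated assumptions: print_resume_plan is a hypothetical function, not a JP1/AJS3 command, and it only prints the commands so that an operator can check them before running each one on the appropriate node. It shows the UNIX form of step 6 and the logical-host form of the jpoagoec commands. If both sites use the same logical host name, pass that name for both arguments, and step 4 is skipped.

```shell
#!/bin/sh
# Dry-run sketch of procedure (b). It only prints commands and reminders.
# print_resume_plan is a hypothetical helper, not a JP1/AJS3 command.
#   $1: logical host at the main site
#   $2: logical host at the remote site (same as $1 if the names match)
#   $3: scheduler service name
print_resume_plan() {
    main_host=$1
    remote_host=$2
    service=$3

    echo "# Step 1: confirm that no jobs are running at the remote site"
    echo "# Step 2: only if the logical host is attached to the queueless agent service"
    echo "ajsqldetach -h $remote_host -k"
    echo "# Step 2: stop the copy process and write-enable the main volume (hardware)"
    echo "# Step 3: set up the logical host at the main site"
    echo "jajs_rpsite -m CHANGE -h $main_host"
    echo "# Step 3: start the copy process from the main site to the remote site (hardware)"
    if [ "$main_host" != "$remote_host" ]; then
        echo "# Step 4: list the recorded manager host names"
        echo "jpoagoec -p -h $main_host"
        echo "# Step 4: run only if $remote_host appears in the list"
        echo "jpoagoec -d $remote_host -h $main_host"
    else
        echo "# Step 4: skipped because the logical host names are the same"
    fi
    echo "# Step 5: check the ISAM files and re-create them if necessary"
    echo "# Step 6: UNIX form; in Windows, specify -disaster as a startup parameter"
    echo "jajs_spmd -h $main_host -disaster"
    echo "# Step 7: perform the tasks required after startup"
    echo "# Step 8: cancel the suppression of job execution"
    echo "ajsalter -s none -F $service"
}
```

For example, print_resume_plan hostA hostB AJSROOT1 prints the plan for two sites with different logical host names. The host names and the scheduler service name in this example are placeholder values.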