Hitachi

JP1 Version 12 JP1/Automatic Job Management System 3 Administration Guide


11.2.1 Switching operation to the remote site

If a large-scale disaster or other unexpected event renders the main site inoperative, system operation switches to the remote site and work task activity resumes.

This subsection describes the procedure for switching operation from the main site to the remote site.

The following figure shows the flow of switching operation from the main site to the remote site:

Figure 11‒15: Flow of switching operation to remote site

[Figure]

In the interval after operation is switched but before copying between shared disks resumes, remember to take backups of the JP1/AJS3 data to prepare for any contingency. Until the main site is rebuilt, back up the JP1/AJS3 system at the remote site and restore it from the backup information if the system fails. For details about backup and recovery, see 2. Backup and Recovery.


(1) Tasks for switching operation

The following explains the tasks to be performed before switching operation and the procedure for switching operation.

Cautionary notes
  • The following procedure is applicable whether or not the logical host names at the main site and remote site are the same.

  • If the logical host names at the main site and remote site are the same, specify the logical host name of the main site for logical-host-at-remote-site as a command argument in the procedure.

(a) Tasks to be performed before switching operation

If the logical host of JP1/Base is used on the same shared disk as the logical host of JP1/AJS3, you need to perform operations for JP1/Base before performing the JP1/AJS3 operation-switch procedure. The procedures for JP1/Base vary depending on whether the main site and remote site use the same logical host names and IP addresses.

The operations for each case are described below. The logical hosts of JP1/Base and JP1/AJS3 can be used on the same shared disk only when JP1/Base is used solely as a prerequisite product for JP1/AJS3.

■ If the main site and remote site use the same logical host names and IP addresses

If the event database is inherited, operations for JP1/Base are not required. However, the event database might have been damaged by the disaster, so recovery operations might be required depending on the situation. The actions to take are described below for each message that might be output to the integrated trace log. Follow the applicable procedure to perform the recovery operations. If the event database is not inherited, initialize the event database according to the procedure described in If KAJP1059-E is displayed.

If KAJP1059-E is displayed:

The event service cannot start because the event database is damaged. Execute the following command to initialize the event database. Initialization deletes the data in the event database.

jevdbinit {-b | -n}

Executing the jevdbinit command deletes the event database and then re-creates it. The serial numbers in the event database before deletion are inherited. To back up the damaged event database before initialization, specify the -b option; otherwise, specify the -n option. You can check the contents of the backed-up database by using the jevexport command to output them to a CSV file. For details about the jevdbinit command and the database backup, see the descriptions of the jevdbinit command in the JP1/Base User's Guide.

If the serial numbers in the event database cannot be inherited when you execute the jevdbinit command, initialization fails. If the KAJP1789-E message is displayed, specify 0 as the start number in the -s option, and then create the event database again.

jevdbinit -s 0 {-b | -n}

If KAJP1057-W or KAJP1058-W is displayed:

The event database might contain invalid records, and search performance might degrade. At the next restart, initialize the event database by following the procedure described in If KAJP1059-E is displayed.

If KAJP1075-W is displayed:

The repetition prevention table is in an invalid state. Stop the event service, and then execute the jevdbmkrep command to reconfigure the repetition prevention table. For details about the jevdbmkrep command, see the descriptions of the jevdbmkrep command in the JP1/Base User's Guide.
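The message-to-action mapping above can be kept as a lookup helper in a runbook script. The following is a minimal sketch, not part of JP1/Base. The function names are hypothetical, and the command strings reuse only the jevdbinit and jevdbmkrep forms shown in this section. The options for jevdbmkrep are documented in the JP1/Base User's Guide and are not modeled here.

```python
# Hypothetical helper, not part of JP1/Base: maps the messages described
# above to the recovery action this section recommends.


def jevdbinit_command(backup: bool, reset_serial: bool = False) -> str:
    """Build a jevdbinit command line.

    backup=True uses -b (back up the damaged database first);
    backup=False uses -n (no backup).
    reset_serial=True adds -s 0 (used after KAJP1789-E).
    """
    parts = ["jevdbinit"]
    if reset_serial:
        parts += ["-s", "0"]
    parts.append("-b" if backup else "-n")
    return " ".join(parts)


def recovery_action(message_id: str, backup: bool = True) -> str:
    """Return the recommended action for a message in the integrated trace log."""
    if message_id == "KAJP1059-E":
        # Event database damaged: initialize it (this deletes its data).
        return jevdbinit_command(backup)
    if message_id == "KAJP1789-E":
        # Serial numbers could not be inherited: restart numbering from 0.
        return jevdbinit_command(backup, reset_serial=True)
    if message_id in ("KAJP1057-W", "KAJP1058-W"):
        # Possibly invalid records: initialize at the next restart.
        return "at next restart: " + jevdbinit_command(backup)
    if message_id == "KAJP1075-W":
        # Repetition prevention table invalid; jevdbmkrep options not modeled.
        return "stop the event service, then run jevdbmkrep"
    raise ValueError(f"no action documented here for {message_id}")


for msg in ("KAJP1059-E", "KAJP1789-E", "KAJP1057-W", "KAJP1075-W"):
    print(msg, "->", recovery_action(msg, backup=False))
```

Raising an error for unlisted messages keeps the helper from silently suggesting an initialization that this guide does not call for.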

■ If the main site and remote site use the same logical host names and different IP addresses

Perform the following operations:

  • Change the settings of JP1/Base according to the IP address change of the logical host.

  • Initialize the JP1/Base event database.

For details about the operations to change the settings of JP1/Base, see the operations when changing IP addresses in the JP1/Base User's Guide. For details about how to initialize the event database of JP1/Base, see the procedure described in If KAJP1059-E is displayed in If the main site and remote site use the same logical host names and IP addresses above.

■ If the main site and remote site use different logical host names and IP addresses

Perform the following operations:

  • Change the settings of JP1/Base according to the change in the host names and the IP addresses of the logical host.

  • Initialize the JP1/Base event database.

For details about the operations to change the settings of JP1/Base, see the operations when changing host names and IP addresses in the JP1/Base User's Guide. For details about how to initialize the event database of JP1/Base, see the procedure described in If KAJP1059-E is displayed in If the main site and remote site use the same logical host names and IP addresses.
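The three cases above can be expressed as a small decision function. This is a hypothetical sketch: the names are illustrative, and the returned strings only summarize the operations listed above. The combination of different host names with identical IP addresses is not one of the documented cases, so the sketch rejects it rather than guessing.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SiteNaming:
    same_host_names: bool    # logical host names match at both sites
    same_ip_addresses: bool  # logical host IP addresses match at both sites


def jp1base_tasks(naming: SiteNaming, event_db_inherited: bool = True) -> List[str]:
    """Return the JP1/Base operations to perform before switching (sketch)."""
    init_db = "initialize the JP1/Base event database"
    if naming.same_host_names and naming.same_ip_addresses:
        # No JP1/Base operation unless the event database is not inherited.
        # Message-driven recovery (KAJP1059-E and so on) is handled separately.
        return [] if event_db_inherited else [init_db]
    if naming.same_host_names:
        return ["change JP1/Base settings for the IP address change", init_db]
    if not naming.same_ip_addresses:
        return ["change JP1/Base settings for the host name and IP address change", init_db]
    # Different names but the same IP addresses is not a case covered above.
    raise ValueError("combination not covered by this section")


print(jp1base_tasks(SiteNaming(same_host_names=True, same_ip_addresses=False)))
```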

(b) Procedure for switching operation

When switching operation, stop the copy processing between shared disks by performing a hardware operation, and then change the status of the remote site so that data can be written to the remote volume. To switch operation:

  1. On the primary node of the remote site, configure a logical host to assume the role of the main site host.

    As a user with administrator permissions (in Windows) or a superuser (in UNIX), log in to the host at the remote site and execute the following command:

    jajs_rpsite -m CHANGE -h logical-host-at-remote-site

  2. If the logical host names at the main and remote sites are different, remove the logical host name of the main site from the event action control agent.

    Perform this step only when the logical host names at the main and remote sites are different.

    If an event job was running that targeted the logical host of the main site, the event action control agent will retain the logical host name of the main site.

    If an event job was running that targeted an agent in an environment in which the agent is shared by the main site and the remote site, the event action control agent on the agent host will retain the logical host name of the main site.

    Therefore, if the logical host names at the main and remote sites are different, you must remove the logical host name of the main site from the event action control agent on the host running the event job.

    To remove the name, perform the following operations on the shared agent or on the logical host at the remote site:

    (1) Display a list of manager host names recorded by the event action control agent.

    Execute the following command to check whether the event action control agent has recorded the logical host name of the main site:

    Execute the command on the logical host at the remote site:

    jpoagoec -p -h logical-host-at-remote-site

    Execute the command on the shared agent (physical host):

    jpoagoec -p

    Execute the command on the shared agent (logical host):

    jpoagoec -p -h logical-host

    (2) If the logical host name of the main site appears in the list displayed in step (1), execute the following command:

    Execute the command on the logical host at the remote site:

    jpoagoec -d logical-host-at-main-site -h logical-host-at-remote-site

    Execute the command on the shared agent (physical host):

    jpoagoec -d logical-host-at-main-site

    Execute the command on the shared agent (logical host):

    jpoagoec -d logical-host-at-main-site -h logical-host

    For details about the jpoagoec command, see jpoagoec in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

  3. Check the status of ISAM files in the database of the execution environment for QUEUE jobs and submit jobs. If necessary, re-create the ISAM files.

    For operation in the standard configuration, you must confirm that the status of the ISAM files copied to the remote site is valid. For details about how to check the status of ISAM files, see 2.11.1 Procedure for checking the status of ISAM files in the manual JP1/Automatic Job Management System 3 Troubleshooting.

    If the status of the ISAM files is invalid, a problem such as a job startup failure might prevent further operation at the remote site. If such a problem occurs, re-create the ISAM files according to 2.11.2 Procedure for re-creating the execution environment database for QUEUE jobs and submit jobs in the manual JP1/Automatic Job Management System 3 Troubleshooting. However, to restart the JP1/AJS3 service, use the procedure in step 4.

  4. Start JP1/AJS3 - Manager on the primary node at the remote site.

    (i) When performing a disaster recovery start of JP1/AJS3 - Manager

    A disaster recovery start refers to starting JP1/AJS3 - Manager with job execution suspended.

    For some jobs, the job execution status at the main site is not carried over when you start JP1/AJS3 - Manager at the remote site. By starting JP1/AJS3 - Manager with job execution temporarily suspended, you can change the status of jobs and jobnets or rerun them as needed.

    To perform a disaster recovery start of JP1/AJS3 - Manager:

    • In Windows:

      Run JP1/AJS3 with -disaster specified as a startup parameter. For details about how to temporarily change the startup behavior of JP1/AJS3, see 6.2.1 Temporarily changing the start mode of JP1/AJS3.

    • In UNIX:

      Execute the following command:

      jajs_spmd -h logical-host-at-remote-site -disaster

      For details about the jajs_spmd command, see jajs_spmd in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

    If you specify a hot start or warm start the first time you start the JP1/AJS3 services after switching sites, the startup mode automatically changes to a disaster recovery start. At this time, execute the jajs_spmd_status command to confirm that all processes are running normally.

    If an error occurs during the startup of JP1/AJS3 - Manager, depending on the timing of the error, the JP1/AJS3 service might stop, or the scheduler service might stop immediately after JP1/AJS3 has started, and the disaster recovery start might be canceled. In this case, correct the cause of the error, and then use either of the following methods to explicitly restart the service with a disaster recovery start:

    To restart the JP1/AJS3 service:

    If you are using Windows, start JP1/AJS3 with the -disaster option specified for the startup parameter. If you are using UNIX, execute the jajs_spmd command with the -disaster option specified.

    To restart only the scheduler service:

    Execute the jajs_spmd -n jajs_schd command with the -disaster option specified. Alternatively, execute the ajsstart command with the -D option specified.

    (ii) When performing a cold start of JP1/AJS3 - Manager

    The procedure for cold-starting JP1/AJS3 - Manager depends on the system configuration. For details about the configuration of a disaster recovery system, see 11.1.3 Configuration of a disaster recovery system compatible with JP1/AJS3.

    • When using the same agent hosts (with agent sharing)

      In Windows:

      Run JP1/AJS3 with -cold specified as a startup parameter. For details about how to temporarily change the startup behavior of JP1/AJS3, see 6.2.1 Temporarily changing the start mode of JP1/AJS3.

      In UNIX:

      Execute the following command:

      jajs_spmd -h logical-host-at-remote-site -cold

      For details about the jajs_spmd command, see jajs_spmd in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

    • When using different agent hosts (without agent sharing)

      Perform the following operations:

      (1) Configure JP1/AJS3 so that the scheduler service does not start automatically when JP1/AJS3 - Manager starts.

      Specify no as the value of the AUTOSTART environment setting parameter for the scheduler service running on the logical host at the remote site.

      If the logical host is configured to run multiple instances of the scheduler service, specify no for the AUTOSTART environment setting parameter of each scheduler service.

      This step is not required if no is already specified for the scheduler service.

      For details about the AUTOSTART environment setting parameter, see 20.4 Setting up the scheduler service environment in the JP1/Automatic Job Management System 3 Configuration Guide.

      (2) Start JP1/AJS3 - Manager.

      In Windows:

      Run JP1/AJS3 with -cold specified as a startup parameter. For details about how to temporarily change the startup behavior of JP1/AJS3, see 6.2.1 Temporarily changing the start mode of JP1/AJS3.

      In UNIX:

      Execute the following command:

      jajs_spmd -h logical-host-at-remote-site -cold

      For details about the jajs_spmd command, see jajs_spmd in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

      (3) Delete the information retained by the event action control manager.

      Execute the following command:

      jpomanevreset -h logical-host-at-remote-site -F scheduler-service-name -all -s

      If the logical host is configured to run multiple instances of the scheduler service, execute the command for each scheduler service.

      For details about the jpomanevreset command, see jpomanevreset in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

      (4) Execute the following command to start the scheduler services.

      jajs_spmd -h logical-host-at-remote-site -n jajs_schd -F scheduler-service-name -cold

      If the logical host is configured to run multiple instances of the scheduler service, execute the command for each scheduler service.

      For details about the jajs_spmd command, see jajs_spmd in 3. Commands Used for Normal Operations in the manual JP1/Automatic Job Management System 3 Command Reference.

      (5) Configure the scheduler services to start automatically again.

      Change the value of the AUTOSTART environment setting parameter back to yes (start automatically) for the scheduler services running on the logical host at the remote site.

      This step is not required for scheduler services for which no change was made in step (1).

  5. After starting JP1/AJS3 - Manager, perform the necessary tasks on the primary node at the remote site.

    Prepare the JP1/AJS3 system for operation at the remote site by altering the definition of the execution agent to suit the remote site environment and checking job statuses.

    For details about the tasks you need to perform after starting JP1/AJS3 - Manager, see 11.2.1(2) Tasks to perform after JP1/AJS3 - Manager startup.

  6. Cancel the suppression of job execution at the remote site.

    Execute the following command to cancel the suppression of job execution.

    ajsalter -s none -F scheduler-service-name

    After canceling the suppression of job execution, resume work tasks at the remote site.

    You do not need to perform this step if you elected to perform a cold start of JP1/AJS3 - Manager.
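For a UNIX remote site, the command-line steps of the procedure above can be collected into an ordered checklist. The following is a minimal sketch under these assumptions:

- It only builds strings; it does not run anything.
- It uses only the commands and options shown in this procedure.
- Entries starting with # stand for manual actions (ISAM check, AUTOSTART edits, post-startup tasks).
- rhost, mhost, and AJSROOT1 are placeholder names.
- The Windows startup-parameter method is not modeled.

```python
from typing import List, Optional


def switch_commands(remote_host: str,
                    scheduler_services: List[str],
                    main_host: Optional[str] = None,
                    start_mode: str = "disaster",
                    agent_sharing: bool = True) -> List[str]:
    """Return the UNIX-side command sequence for the switch, in order.

    main_host: logical host name at the main site, or None if the names at
    both sites are the same (step 2 is then skipped).
    Entries that start with '#' are manual actions, not commands.
    """
    if start_mode not in ("disaster", "cold"):
        raise ValueError("start_mode must be 'disaster' or 'cold'")
    cmds = [f"jajs_rpsite -m CHANGE -h {remote_host}"]  # step 1
    if main_host is not None and main_host != remote_host:  # step 2
        cmds.append(f"jpoagoec -p -h {remote_host}")
        # Run the next line only if the list above shows main_host.
        cmds.append(f"jpoagoec -d {main_host} -h {remote_host}")
    cmds.append("# step 3: check the ISAM files and re-create them if invalid")
    if start_mode == "disaster":  # step 4 (i)
        cmds.append(f"jajs_spmd -h {remote_host} -disaster")
    elif agent_sharing:  # step 4 (ii), same agent hosts
        cmds.append(f"jajs_spmd -h {remote_host} -cold")
    else:  # step 4 (ii), different agent hosts
        cmds.append("# set AUTOSTART to no for each scheduler service")
        cmds.append(f"jajs_spmd -h {remote_host} -cold")
        for svc in scheduler_services:
            cmds.append(f"jpomanevreset -h {remote_host} -F {svc} -all -s")
        for svc in scheduler_services:
            cmds.append(f"jajs_spmd -h {remote_host} -n jajs_schd -F {svc} -cold")
        cmds.append("# set AUTOSTART back to yes where it was changed")
    cmds.append("# step 5: perform the tasks after JP1/AJS3 - Manager startup")
    if start_mode == "disaster":  # step 6 (not needed after a cold start)
        for svc in scheduler_services:
            cmds.append(f"ajsalter -s none -F {svc}")
    return cmds


for line in switch_commands("rhost", ["AJSROOT1"], main_host="mhost"):
    print(line)
```

Generating the sequence rather than executing it leaves room for the operator's checks between steps, such as confirming the jpoagoec -p output and the ISAM file status.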

(2) Tasks to perform after JP1/AJS3 - Manager startup

This subsection describes the tasks that you need to perform after starting JP1/AJS3 - Manager at the remote site when switching operation.

The tasks you need to perform after starting JP1/AJS3 - Manager differ depending on whether the logical host names at the main and remote sites are the same.

Carry out the tasks listed in the following table to suit your particular system.

Table 11‒2: Tasks to perform after JP1/AJS3 - Manager starts

No. 1: Changing the connection-target logical host
  Description: Change the logical host to which JP1/AJS3 - View and other products connect to the logical host of the remote site.
  Logical host names are different: Y
  Logical host names are the same: C#1
  Reference: See (a) Changing the connection-target logical host

No. 2: Changing the execution host names of execution agents
  Description: Change the definitions of execution host names for execution agents to match the host configuration at the remote site.
  Logical host names are different: Y
  Logical host names are the same: Y
  Reference: See (b) Defining execution agents

No. 3: Checking and modifying unit statuses
  Description: Check the job statuses that were in effect when the main site went down, and change or rerun jobs whose status was not carried over to the remote site.
  Logical host names are different: Y
  Logical host names are the same: Y
  Reference: See (c) Checking job execution statuses

No. 4: Checking the status of remote jobnets
  Description: If the system contains either of the following remote jobnets, check its status and restart it if necessary:
    • A remote jobnet defined at the main site
    • A remote jobnet that specifies the logical host at the main site as its execution manager
  Logical host names are different: Y
  Logical host names are the same: Y
  Reference: See (d) Checking the status of remote jobnets

No. 5: Switching sites when controlling the execution order of root jobnets in different scheduler services
  Description: If you are using jobnet connectors to control the execution order of root jobnets under different scheduler services, check the status of jobnet connectors, connection-target jobnets, and connection-target planning groups, and restart them if necessary.
  Logical host names are different: Y
  Logical host names are the same: Y
  Reference: See (e) Switching sites when controlling the execution order of root jobnets in different scheduler services

No. 6: Deleting information related to the main site retained by the agent hosts
  Description: If the main site and remote site use the same agent hosts, execute the jpomanevreset command to delete information about the main site from the agent hosts.
  Logical host names are different: Y
  Logical host names are the same: C#2
  Reference: See (f) Deleting information about the main site from agent hosts

No. 7: Attaching the logical host to the queueless agent service
  Description: If you use queueless jobs, execute the ajsqlattach command to attach (connect) the logical host of the remote site to the queueless agent service.
  Logical host names are different: Y
  Logical host names are the same: Y
  Reference: See (g) Attaching the logical host to the queueless agent service

No. 8: Changing the names of the hosts on which queueless jobs will be executed
  Description: If you use queueless jobs, change the names of the hosts on which queueless jobs will be executed to match the host configuration at the remote site.
  Logical host names are different: Y
  Logical host names are the same: Y
  Reference: See (h) Changing the names of the hosts on which queueless jobs will be executed

Legend:

Y: The task is required.

C: The task is required depending on the condition.

#1

The task is required if the IP address of the connection-target logical host is different after switching operation.

#2

The task is required if an event job was being executed when the main site went down.

Perform the task if either of the following conditions exists:

  • When you check the job statuses in No. 3, the status of an event job is Unknown end status.

  • When you execute the jpoagtjobshow command on the agent host, the status of an event job on the logical host at the remote site is Now running.
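The Y/C columns of Table 11‒2 and footnotes #1 and #2 can be encoded as a checklist filter. The following is a hypothetical sketch with these limits:

- It models only those columns and footnotes.
- It does not model the feature-dependent conditions in the Description text, such as whether remote jobnets, jobnet connectors, shared agent hosts, or queueless jobs are actually in use.
- The task labels are short paraphrases.

```python
from typing import Dict, List

# Short labels for the rows of Table 11-2 (paraphrased).
TASKS: Dict[int, str] = {
    1: "Change the connection-target logical host",
    2: "Change the execution host names of execution agents",
    3: "Check and modify unit statuses",
    4: "Check the status of remote jobnets",
    5: "Check jobnet connectors and their connection targets",
    6: "Delete main-site information retained by agent hosts",
    7: "Attach the logical host to the queueless agent service",
    8: "Change the execution host names for queueless jobs",
}


def footnote2_applies(event_job_was_running: bool,
                      unknown_end_status_seen: bool,
                      now_running_on_agent: bool) -> bool:
    """Condition #2: an event job was running when the main site went down,
    and either check in the legend shows a leftover."""
    return event_job_was_running and (unknown_end_status_seen or now_running_on_agent)


def required_tasks(names_same: bool,
                   ip_changed: bool = False,
                   event_job_condition: bool = False) -> List[int]:
    """Return the row numbers that apply, per the Y/C columns only."""
    required = []
    for no in sorted(TASKS):
        if not names_same:
            required.append(no)  # every row is Y when the names differ
        elif no == 1:
            if ip_changed:  # C#1
                required.append(no)
        elif no == 6:
            if event_job_condition:  # C#2
                required.append(no)
        else:
            required.append(no)
    return required


for no in required_tasks(names_same=True, ip_changed=True):
    print(no, TASKS[no])
```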

(a) Changing the connection-target logical host

When switching operation, change the connection-target logical host for JP1/AJS3 - View and other products to the logical host at the remote site. Whether the change is required depends on whether the logical host names at the main and remote sites are the same.

If the logical host names at the main and remote sites are different:

Switching operation also changes the logical host name. Therefore, change the connection target for JP1/AJS3 - View and other products from the logical host on the main site to the logical host on the remote site.

You need to change the connection-target host for the following connections:

  • Connections from JP1/AJS3 - View

  • Connections from JP1/AJS3 - Definition Assistant

  • Execution of remote commands

  • Connection destinations for logging or jobs that monitor JP1 events

If the logical host names at the main and remote sites are the same:

No operation is required if the IP address of the logical host to be used is the same after switching operation.

If the IP address for the logical host is different, change the connection-target host shared by the main and remote sites (such as a shared agent) to the logical host at the remote site.

To do this, change the communication settings on the shared host, including name resolution by DNS and the hosts file. Note that you might need to restart the system depending on the communication settings.

You need to change the connection-target host for the following connections:

  • Connections from JP1/AJS3 - Manager (execution of remote commands)

  • Connections from JP1/AJS3 - Agent

  • Connections from JP1/AJS3 - View

  • Connections from JP1/AJS3 - Definition Assistant

(b) Defining execution agents

Change the host name definitions for execution agents to match the agent host configuration at the remote site. To check the definition of an execution agent, use the ajsagtprint or ajsagtshow command.

To change the definition of an execution agent, execute the ajsagtalt command. To add an execution agent, execute the ajsagtadd command.

For details about these commands, see the manual JP1/Automatic Job Management System 3 Command Reference.

(c) Checking job execution statuses

This step is required if you perform a disaster recovery start of JP1/AJS3 - Manager.

Use JP1/AJS3 - View or commands to check the job statuses that were in effect when the main site went down. If a job's status was not carried over to the remote site, change the status manually or rerun the job.

Pay particular attention to jobs that were running when the main site went down. The status of these jobs does not carry over to the remote site.

For details about the job statuses that apply when operation resumes at the remote site, see the description of the statuses of jobnets and jobs when a disaster recovery start is performed in 6.2.1(3)(a) Statuses when a JP1/AJS3 service on the manager host is restarted.

For details about how to check and change job execution statuses and how to rerun a job, see the JP1/Automatic Job Management System 3 Operator's Guide.

When monitoring of a start condition has stopped, generations in Wait for start cond. status that were started by the start condition are processed as follows depending on the status of the waiting generation:

  • A generation of a jobnet is in Wait for start cond. status despite the start conditions being satisfied because concurrent execution is disabled and another instance of the jobnet is already running

    The generation retains its Wait for start cond. status after operation is switched, and the running jobnet enters Interrupted status. At this point, the generation of the jobnet in Wait for start cond. status starts executing.

  • Start conditions are not satisfied or only partially satisfied

    The generation of the jobnet disappears after operation is switched.
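The two cases above reduce to a simple rule for a generation in Wait for start cond. status. This is a hypothetical sketch: the function name and result strings are illustrative. A generation whose start conditions are satisfied and that is not blocked by disabled concurrent execution would not be waiting, so the sketch rejects that input.

```python
def generation_after_switch(start_conditions_satisfied: bool,
                            blocked_by_concurrency: bool) -> str:
    """Outcome for a generation in Wait for start cond. status (sketch)."""
    if start_conditions_satisfied and blocked_by_concurrency:
        # Concurrent execution is disabled and another instance was running.
        return "retained; starts when the running jobnet enters Interrupted status"
    if not start_conditions_satisfied:
        # Conditions not satisfied, or only partially satisfied.
        return "disappears"
    raise ValueError("a satisfied, unblocked generation would not be waiting")


print(generation_after_switch(True, True))
print(generation_after_switch(False, False))
```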

(d) Checking the status of remote jobnets

If a remote jobnet is defined on the main site or specifies the logical host of the main site as its transfer destination host, use the following procedure to check the status of the jobnet and resume its operation if needed.

■ If the remote jobnet is defined on the main site

  1. From JP1/AJS3 - View, log in to JP1/AJS3 - Manager on the transfer destination host.

    Log in to the host defined as the execution manager for the remote jobnet.

  2. Search for the unit name that identifies a remote jobnet, and examine the status of the jobnets that correspond to remote jobnets in Now running status.

    If multiple instances of a remote jobnet with the same name are registered for execution, the search produces multiple results. In this case, you can identify the corresponding jobnet from its execution start time and its definition.

  3. Check the execution status of the jobnet that corresponds to the remote jobnet.

    Based on the result, take the appropriate action such as rerunning the remote jobnet.

    To run the jobnet again when the logical host names at the main and remote sites are different, specify the remote site manager host name for Target manager in the remote jobnet definition on the forwarding source host.

■ If the remote jobnet specifies the logical host of the main site as its transfer destination host

If you start JP1/AJS3 - Manager in disaster recovery mode

  1. From JP1/AJS3 - View, log in to JP1/AJS3 - Manager on the remote site.

  2. Search for the unit name that identifies a remote jobnet, and examine the status of the jobnets that correspond to remote jobnets in Now running status.

    If multiple instances of a remote jobnet with the same name are registered for execution, the search produces multiple results. In this case, you can identify the corresponding jobnet from its execution start time and its definition.

  3. Check the execution status of the jobnet that corresponds to the remote jobnet.

    Based on the result, take the appropriate action such as rerunning the remote jobnet.

    To run the jobnet again when the logical host names at the main and remote sites are different, specify the remote site manager host name for Target manager in the remote jobnet definition on the forwarding source host.

If you cold-start JP1/AJS3 - Manager

  1. From JP1/AJS3 - View, log in to the JP1/AJS3 - Manager where remote jobnets are defined.

  2. Forcibly terminate any remote jobnets that specify the logical host of the main site as their execution manager.

    This must be done manually because a remote jobnet has no way of detecting when the corresponding jobnet at the forwarding target host has finished executing.

  3. Rerun the remote jobnet as needed.

    To run the jobnet again when the logical host names at the main and remote sites are different, specify the remote site manager host name for Target manager in the remote jobnet definition on the forwarding source host.

(e) Switching sites when controlling the execution order of root jobnets in different scheduler services

If you use jobnet connectors to control the order of root jobnet execution in different scheduler services, check the statuses of the jobnet connectors and the jobnets or planning groups serving as their connection destinations, and resume their operation as necessary.

The procedure differs depending on whether the unit is scheduled to execute that day or on a later day. The procedure for each scenario is described below.

■ For units scheduled to execute that day

Check the status of the following units that specify the logical host at the main site as their connection host.

  • Jobnet connectors

  • Connection-destination jobnets

  • Connection-destination planning groups

Check the status of jobnets with jobnet connectors and their connection destination jobnets, and manually control the execution sequence as needed.

■ For units scheduled to execute on a later day

If the logical host names at the main and remote sites are different, change the connection host of the following units that specify the logical host at the main site as their connection host:

  • Jobnet connectors

  • Connection-destination jobnets

  • Connection-destination planning groups

To change the connection host of a unit:

  1. Identify the units whose connection host needs changing.

    Check the following units:

    • All root jobnets defined under a connection-destination root jobnet or planning group

    • Jobnet connectors

    You can view the connection-destination jobnets that are registered for execution and jobnet connectors by executing the following command for each scheduler service on the manager host at the remote site:

    ajsshow -v start-date -w end-date -i "%JJ %CH:%CF:%CN" -ER /

    The meaning of the format indicators is as follows:

    %JJ: Full path name of unit

    %CH: Name of connection-destination host

    %CF: Name of scheduler service at connection destination

    %CN: Full path name of connection-destination unit

    Units whose connection host needs changing are those for which the command output shows a connection-destination unit in another scheduler service. Units that do not appear in the command output must be identified manually.

  2. Change the unit definitions.

    How you do this depends on the version of JP1/AJS3 - Manager on which execution order is being controlled, and the type of unit that serves as the connection destination. In this context, the JP1/AJS3 - Manager on which execution order is being controlled is the JP1/AJS3 - Manager where the connection-destination unit is registered. This is Host B in the example in the following figure:

    Figure 11‒16: JP1/AJS3 - Manager subject to execution order control

    [Figure]

    When the JP1/AJS3 - Manager subject to execution order control is 09-00 or later and the connection destination is a jobnet connector or root jobnet

    You can use the jobnet release function to replace the definition of any unit scheduled for a future date without canceling its registration.

    First, copy the root jobnet that includes the connection-destination jobnet and the jobnet connector, and then modify the copy by specifying the host name of the remote site as the connection host of the jobnet and jobnet connector.

    You can then change the jobnet definition by performing a release registration to replace the original jobnet definition with the one you modified.

    For details about the jobnet release function, see 7.3 Switching a jobnet definition while the jobnet is registered for execution.

    When the JP1/AJS2 - Manager subject to execution order control is 08-50 or earlier, or the connection destination is a planning group or a unit whose registration has been canceled by a cold start or other process

    Cancel the registration of the unit you want to modify. If the unit is a planning group, cancel the registration of all the jobnets in that planning group. Then, change the connection host of the unit to the host name of the remote site, and register the unit again.
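The ajsshow output used in step 1 can be filtered with a short script. The following is a minimal sketch under these assumptions:

- Each output line follows the -i format "%JJ %CH:%CF:%CN": the unit path, one space, then host:service:unit.
- Unit paths contain no spaces.
- Units without connection information print empty fields.
- Host names are compared case-insensitively.
- The sample lines and the names mhost, rhost, and AJSROOT2 are hypothetical.

The sketch flags units that specify the main-site logical host as their connection host. Units that do not appear in the output still have to be found manually.

```python
from typing import Iterable, List, NamedTuple


class ConnectionEntry(NamedTuple):
    unit: str     # %JJ: full path name of the unit
    host: str     # %CH: connection-destination host
    service: str  # %CF: connection-destination scheduler service
    target: str   # %CN: full path name of the connection-destination unit


def parse_line(line: str) -> ConnectionEntry:
    """Split one line of the assumed "%JJ %CH:%CF:%CN" layout."""
    unit, _, rest = line.strip().partition(" ")
    # maxsplit=2 keeps any further colons inside the target path.
    host, service, target = (rest.split(":", 2) + ["", "", ""])[:3]
    return ConnectionEntry(unit, host, service, target)


def units_to_change(lines: Iterable[str], main_host: str) -> List[ConnectionEntry]:
    """Entries whose connection host is the main-site logical host."""
    entries = (parse_line(line) for line in lines if line.strip())
    return [e for e in entries
            if e.host.casefold() == main_host.casefold() and e.target]


sample = [
    "/net1 mhost:AJSROOT2:/grp/net2",
    "/net3 rhost:AJSROOT2:/net4",
    "/net5 ::",
]
for entry in units_to_change(sample, "mhost"):
    print(entry.unit, "->", entry.host, entry.service, entry.target)
```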

(f) Deleting information about the main site from agent hosts

If the main site and remote site use the same agent hosts, agent hosts that monitor jobs based on requests from the main site retain information about the main site even after operation switches to the remote site.

Execute the following command on the manager host at the remote site to clear the information about the main site from an agent host.

jpomanevreset -h logical-host-at-remote-site -F scheduler-service-name -dh logical-host-at-main-site -a agent-host-name

Make sure that the agent host specified in the -a option is running. If you specify an agent host that is not running, does not exist, or cannot communicate with the manager host, the command might take a long time to execute.

For details about the jpomanevreset command, see jpomanevreset in 3. Commands Used for Normal Operations in the JP1/Automatic Job Management System 3 Command Reference.
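When several agent hosts are shared between the sites, the command above must be issued once per agent host on the manager host at the remote site. The following is a minimal sketch that builds one argument list per agent host, using only the options shown above. The names rhost, mhost, AJSROOT1, and agent1/agent2 are placeholders. By default the sketch only prints the commands. The optional timeout is an assumption added to bound the long waits described above when an agent host is stopped or unreachable.

```python
import shlex
import subprocess
from typing import Iterable, List, Optional


def reset_commands(remote_host: str, scheduler_service: str,
                   main_host: str, agent_hosts: Iterable[str]) -> List[List[str]]:
    """One jpomanevreset argument list per agent host."""
    return [
        ["jpomanevreset", "-h", remote_host, "-F", scheduler_service,
         "-dh", main_host, "-a", agent]
        for agent in agent_hosts
    ]


def run_all(commands: List[List[str]], dry_run: bool = True,
            timeout: Optional[float] = None) -> None:
    """Print the commands (dry run) or execute them one at a time.

    timeout is in seconds; subprocess.run raises TimeoutExpired when it
    elapses, and check=True raises CalledProcessError on a nonzero exit.
    """
    for argv in commands:
        if dry_run:
            print(shlex.join(argv))
        else:
            subprocess.run(argv, check=True, timeout=timeout)


run_all(reset_commands("rhost", "AJSROOT1", "mhost", ["agent1", "agent2"]))
```

Passing arguments as a list rather than a single shell string avoids quoting problems with host or service names.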

(g) Attaching the logical host to the queueless agent service

When using queueless jobs, execute the following command to attach (connect) the logical host of the remote site to the queueless agent service.

ajsqlattach -h logical-host-at-remote-site

(h) Changing the names of the hosts on which queueless jobs will be executed

Change the execution host names specified in Exec-agent to the names of the hosts on which the queueless jobs will be executed at the remote site.