Hitachi

JP1 Version 12 JP1/Automatic Job Management System 3 Troubleshooting


2.6.10 Troubleshooting a situation in which the statuses of jobs become unknown on the manager host

The statuses of jobs that were run on an agent host might become unknown on the manager host, for example, in the following cases: when the JP1/AJS3 service is started by warm start or disaster recovery start, or when a communication failure occurs between the manager host and agent host.

In such cases, you can recover the jobs by checking the job execution result log file and the event job execution result log file that are output on the agent host.

Note

If JP1/AJS3 was upgraded from an earlier version and is used with the default settings, the job execution result log file and event job execution result log file are not output. To output these files, set the JOBEXECRESULTLOG and EVJOBEXECRESULTLOG environment setting parameters to 1 or 2. For details, see 3.4.9 Estimating the size of job execution result log files and event job execution result log files in the JP1/Automatic Job Management System 3 System Design (Configuration) Guide.

The following table shows the correspondence between job types and the log files to which log data is to be output.

Table 2‒3: Job types and the log files to which log data is to be output

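| No. | Job type (unit) | Log file to which log data is to be output |
|---|---|---|
| 1 | PC Job# | Job execution result log file |
| 2 | Unix Job# | Job execution result log file |
| 3 | Action job# | Job execution result log file |
| 4 | Submit job | Job execution result log file |
| 5 | QUEUE Job | Job execution result log file |
| 6 | Custom Job | Job execution result log file |
| 7 | HTTP connection job | Job execution result log file |
| 8 | Passing information setting job | Job execution result log file |
| 9 | Flexible Job | Job execution result log file |
| 10 | Event job | Event job execution result log file |
| 11 | OR Job | No log data is output. |
| 12 | Judgment Job | No log data is output. |
| 13 | Jobnet Connector | No log data is output. |
| 14 | Remote Jobnet | No log data is output. |

#

No log data is output if the job is run by a queueless service.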
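The correspondence in Table 2‒3 can be sketched as a small lookup, which might be useful when scripting log collection. The following Python sketch is illustrative only: the function name `log_file_for`, the `queueless` parameter, and the lowercase labels are hypothetical and are not part of JP1/AJS3.

```python
# Illustrative lookup based on Table 2-3. All names are hypothetical helpers.
JOB_LOG = "job execution result log file"
EVENT_JOB_LOG = "event job execution result log file"

# Job types whose log data goes to the job execution result log file.
_JOB_LOG_TYPES = {
    "pc job", "unix job", "action job", "submit job", "queue job",
    "custom job", "http connection job",
    "passing information setting job", "flexible job",
}
# Job types for which no log data is output.
_NO_LOG_TYPES = {"or job", "judgment job", "jobnet connector", "remote jobnet"}
# Job types marked "#" in the table: no log data under a queueless service.
_QUEUELESS_TYPES = {"pc job", "unix job", "action job"}


def log_file_for(job_type, queueless=False):
    """Return the log file label for a job type, or None if nothing is output."""
    key = job_type.strip().lower()
    if queueless and key in _QUEUELESS_TYPES:
        return None
    if key in _JOB_LOG_TYPES:
        return JOB_LOG
    if key == "event job":
        return EVENT_JOB_LOG
    if key in _NO_LOG_TYPES:
        return None
    raise ValueError(f"unknown job type: {job_type!r}")
```

For example, `log_file_for("Unix Job", queueless=True)` returns `None`, which reflects the footnote for jobs run by a queueless service.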


(1) Recovery procedures for jobs other than event jobs

For jobs other than event jobs, you can perform recovery measures by following one of two procedures: a procedure using the ajsshow command, or a procedure using the scheduler log. Select the appropriate procedure according to the status of the manager host. The following table shows the statuses of the manager host and the corresponding recovery procedures.

Table 2‒4: Statuses of the manager host and the corresponding recovery procedures (for jobs other than event jobs)

| Status of the manager host | Recovery procedure |
|---|---|
| The embedded database can be used (job execution information remains). | Perform recovery by using the ajsshow command. |
| Only the scheduler log can be used. | Perform recovery by using the scheduler log. |
| The manager host has been set up again (neither the embedded database nor the scheduler log can be used). | Recovery is not possible. |
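The selection in Table 2‒4 amounts to a short decision. The following Python sketch shows it under the assumption that you already know which resources remain on the manager host; the function name and return labels are hypothetical.

```python
def recovery_procedure(embedded_db_usable, scheduler_log_usable):
    """Pick a recovery procedure for jobs other than event jobs (Table 2-4).

    Returns a label for the procedure, or None if recovery is not possible.
    """
    if embedded_db_usable:
        # Job execution information remains in the embedded database.
        return "ajsshow command"
    if scheduler_log_usable:
        # Only the scheduler log can be used.
        return "scheduler log"
    # The manager host was set up again and neither resource is available.
    return None
```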

(a) Recovery procedure using the ajsshow command

  1. Run the ajsshow command, and then check the execution ID, job number, and agent host name of the target job.

    Example:

    ajsshow -f "%#,%I %H" /JOBNETNAME/JOBNAME

  2. On the agent host that you checked in step 1, search the job execution result log file for the full unit name of the target job combined with the execution ID and job number that you checked in step 1.

    Example:

    Search for the following string: AJSROOT1:/JOBNETNAME/JOBNAME:@111,200001

  3. To determine the status of the job on the agent host, check whether the KAVU3613-I, KAVU3614-I (Windows only), KAVU3615-I (UNIX only), and KAVU3616-I messages have been output.

    For details about the relationship between the messages that were output and the job status, see (3) Relationship between the messages that were output and the job status.

  4. Perform recovery for the job according to the job status on the agent host.

    Rerun the job or change the status of the job.
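Steps 2 and 3 can be sketched in Python as follows. This is a minimal sketch, not a JP1/AJS3 tool: the helper names are hypothetical, and the code assumes only that the search string and the message ID appear on the same log line. The actual layout of the job execution result log file is not described here, so adapt the matching as needed.

```python
# Message IDs checked in step 3.
MESSAGE_IDS = ("KAVU3613-I", "KAVU3614-I", "KAVU3615-I", "KAVU3616-I")


def build_search_key(scheduler_service, unit_name, execution_id, job_number):
    """Build a search string in the form shown in the step 2 example."""
    return f"{scheduler_service}:{unit_name}:{execution_id},{job_number}"


def find_messages(log_lines, search_key):
    """Return the set of message IDs found on lines that contain search_key.

    Assumption: the message ID and the search key are on the same line.
    Plain substring matching is used, so a longer job number that begins
    with the same digits would also match.
    """
    found = set()
    for line in log_lines:
        if search_key not in line:
            continue
        for message_id in MESSAGE_IDS:
            if message_id in line:
                found.add(message_id)
    return found
```

A possible use is to read the log file into a list of lines, call `find_messages` with the key built from the values checked in step 1, and then compare the result with the tables in (3).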

(b) Recovery procedure using the scheduler log

  1. Search the scheduler log file, and then check the execution ID, job number, and agent host name of the target job.

  2. On the agent host that you checked in step 1, search the job execution result log file for the full unit name of the target job combined with the execution ID and job number that you checked in step 1.

    Example:

    Search for the following string: AJSROOT1:/JOBNETNAME/JOBNAME:@111,200001

  3. To determine the status of the job on the agent host, check whether the KAVU3613-I, KAVU3614-I (Windows only), KAVU3615-I (UNIX only), and KAVU3616-I messages have been output.

    For details about the relationship between the messages that were output and the job status, see (3) Relationship between the messages that were output and the job status.

  4. Perform recovery for the job according to the job status on the agent host.

    Rerun the job or change the status of the job.

(2) Recovery procedures for event jobs

Recovery for an event job can be performed only if the embedded database on the manager host can be used and the execution information of the job remains. In other cases, recovery is not possible.

The recovery procedure for an event job is as follows:

  1. On the agent host, check the occurrence status (occurrence time) of the event you want to monitor.

    For example, in the case of the JP1 event reception monitoring job, check the event database.

  2. In the event job execution result log file, search for a start-event log record for which the KAVT0966-I message was output around the time that the monitoring-target event in step 1 occurred. Then, check the time at which the message was output and the unit ID.

  3. On the manager host, run the ajsname command, and then obtain the full unit name from the unit ID.

    Confirm that the full unit name that you obtained is that of the event job that monitors the event in step 1.

  4. Compare the time that the event in step 1 occurred and the time that the message in step 2 was output.

    If the time that the event in step 1 occurred is earlier than the time that the message in step 2 was output, the event occurred before the monitoring of the event started. Manually run the succeeding job, if necessary.
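The comparison in step 4 can be sketched as follows. This Python sketch assumes that you have already converted both times to `datetime` values in the same time zone; the function name is hypothetical, and the timestamp formats of the event database and of the log file are not covered.

```python
from datetime import datetime


def event_preceded_monitoring(event_time, monitoring_message_time):
    """Return True if the event occurred before monitoring started.

    event_time: when the monitoring-target event occurred (step 1).
    monitoring_message_time: when the KAVT0966-I message was output (step 2).
    """
    return event_time < monitoring_message_time


# Made-up sample times for illustration only.
occurred = datetime(2024, 1, 1, 9, 59, 30)
monitoring_started = datetime(2024, 1, 1, 10, 0, 0)
if event_preceded_monitoring(occurred, monitoring_started):
    print("The event occurred before monitoring started; "
          "manually run the succeeding job if necessary.")
```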

(3) Relationship between the messages that were output and the job status

You can determine the status of a job from the messages that have been output to the job execution result log file. The status determined differs depending on whether the job is a flexible job.

(a) For jobs that are not flexible jobs

The following table shows the relationship between the messages that were output and the job status for a job that is not a flexible job.

Table 2‒5: Relationship between the messages that were output and the job status (for a job that is not a flexible job)

| No. | KAVU3613-I (at reception) | KAVU3614-I or KAVU3615-I (at the start)# | KAVU3616-I (at the end) | Status of the job | Status of the executable file or script file | Explanation |
|---|---|---|---|---|---|---|
| 1 | Not output | Not output | Not output | Not run yet | Not started yet | This is the status before the job is accepted from the manager host. |
| 2 | Output | Not output | Not output | Not run yet | Not started yet | This is the status in which the job has been accepted from the manager but has not been run yet. |
| 3 | Output | Output | Not output | Now running | Now running | This is the status in which the job is running. |
| 4 | Output | Not output | Output | Failed to start | Not started yet | In Windows: This is the status in which the start of the executable file failed (for example, the executable file did not exist). In UNIX: This is the status in which the start of the script file failed. In either case, determine the cause of the failure from the error messages that have been output to the integrated trace log file on the agent host. |
| 5 | Output | Output | Output | Execution ended | Ended | This is the status in which the job ran successfully and then ended. |

#

In Windows, the KAVU3614-I message is output. In UNIX, the KAVU3615-I message is output.
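Table 2‒5 can be read as a lookup keyed on which of the three messages were output. The following Python sketch encodes the five rows; the function name is hypothetical, and combinations that the table does not list return `None` rather than a guessed status.

```python
# Keys: (KAVU3613-I output, KAVU3614-I or KAVU3615-I output, KAVU3616-I output)
# Values: (status of the job, status of the executable file or script file)
_STATUS_TABLE = {
    (False, False, False): ("Not run yet", "Not started yet"),
    (True, False, False): ("Not run yet", "Not started yet"),
    (True, True, False): ("Now running", "Now running"),
    (True, False, True): ("Failed to start", "Not started yet"),
    (True, True, True): ("Execution ended", "Ended"),
}


def job_status(received, started, ended):
    """Look up the status for a job that is not a flexible job (Table 2-5).

    Returns None for a combination that the table does not describe.
    """
    return _STATUS_TABLE.get((bool(received), bool(started), bool(ended)))
```

For example, `job_status(True, False, True)` returns `("Failed to start", "Not started yet")`, in which case the integrated trace log file on the agent host is the next place to look.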

(b) For flexible jobs

For a flexible job, the processing status on the relay agent is output to the job execution result log file. To check the status of the executable file or script file, you need to check the job-destination log file (ajsfxexec{1|2}.log) on the destination agent.

In a configuration that does not use relay agents, the manager host acts as a relay agent. In this configuration, messages are output to the job execution result log file on the manager host.

The following table shows the relationship between the messages that were output and the job status for a flexible job.

Table 2‒6: Relationship between the messages that were output and the job status (for a flexible job)

| No. | KAVU3613-I (at reception) | KAVU3614-I or KAVU3615-I (at the start)#1 | KAVU3616-I (at the end) | Status of the flexible job | Status of the executable file or script file | Explanation |
|---|---|---|---|---|---|---|
| 1 | Not output | Not output | Not output | Not run yet | Not started yet | This is the status before the flexible job is accepted from the manager host. |
| 2 | Output | Not output | Not output | Not run yet | --#2 | This is the status in which the flexible job has been accepted from the manager but has not been run yet. |
| 3 | Output | Output | Not output | Now running | --#2 | This is the status in which the flexible job was started on a relay agent. To check whether the executable file was run, check the log file (ajsfxexec{1|2}.log) on the destination agent. |
| 4 | Output | Not output | Output | Failed to start | Not started yet | This is the status in which the start of the flexible job failed. Determine the cause of the failure from the error messages that have been output to the integrated trace log file on the agent host. |
| 5 | Output | Output | Output | Execution ended | --#2 | This is the status in which the flexible job ended on a relay agent. To check whether the executable file was run, check the log file (ajsfxexec{1|2}.log) on the destination agent. |

#1

In Windows, the KAVU3614-I message is output. In Linux, the KAVU3615-I message is output.

#2

For details about how to check the status of an executable file or script file, see (4) How to check the status on the destination agent.

(4) How to check the status on the destination agent

The following describes how to check the status of an executable file or script file on the destination agent when a flexible job is used.

(a) If broadcast execution is not used

  1. Check the full unit name and execution ID of the target flexible job from a message that has been output to the job execution result log file on the host that requested the flexible job (relay agent).

  2. In the flexible-job-requester log file (ajsfxreq{1|2}.log) on the host that requested the flexible job (relay agent), check whether a message that meets the following condition has been output:

    • The full unit name and execution ID that are output in the KAVS8115-I message are the same as the full unit name and execution ID that you checked in step 1.

    If a message that meets this condition has been output, check the host name of the destination agent and the maintenance information (uuid) that are output in that message.

  3. On the destination agent that you checked in step 2 (the host on which the flexible job was run), check whether messages that meet the following conditions have been output to the flexible-job-destination log file (ajsfxexec{1|2}.log):

    • The maintenance information that is output in the KAVS8139-I message is the same as the maintenance information (uuid) that you checked in step 2.

    • The maintenance information that is output in the KAVS8140-I message is the same as the maintenance information (uuid) that you checked in step 2.

    Determine the status based on whether the KAVS8139-I and KAVS8140-I messages that meet these conditions have been output:

    • If both messages exist, the user program has ended.

    • If only the KAVS8139-I message exists, the user program is currently running.

    • If neither of the messages exists, the user program has not started.
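The decision in step 3 can be sketched in Python as follows. This is a sketch under stated assumptions: the function name is hypothetical, and the code assumes that the message ID and the maintenance information (uuid) appear on the same line of the flexible-job-destination log file. The case in which only the KAVS8140-I message is found is not described above, so the sketch reports it as undetermined.

```python
def user_program_status(log_lines, uuid):
    """Classify the user program from lines of ajsfxexec{1|2}.log.

    Assumption: each relevant line contains both the message ID and the
    maintenance information (uuid) checked in step 2.
    """
    has_8139 = any("KAVS8139-I" in line and uuid in line for line in log_lines)
    has_8140 = any("KAVS8140-I" in line and uuid in line for line in log_lines)
    if has_8139 and has_8140:
        return "ended"
    if has_8139:
        return "running"
    if not has_8140:
        return "not started"
    # Only KAVS8140-I was found; this combination is not covered above.
    return "undetermined"
```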

(b) If broadcast execution is used

  1. Check the full unit name and execution ID of the target flexible job from a message that has been output to the job execution result log file on the host that requested the flexible job (relay agent).

  2. In the flexible-job-requester log file (ajsfxreq{1|2}.log) on the host that requested the flexible job (relay agent), check whether a message that meets the following condition has been output:

    • The full unit name and execution ID that are output in the KAVS8137-I message are the same as the full unit name and execution ID that you checked in step 1.

    If a message that meets this condition has been output, check the maintenance information (uuid) that is output in that message.

  3. In the log file (ajsfxdstr{1|2}.log) on the broadcast agent, check whether a message that meets the following condition has been output:

    • The full unit name and execution ID that are output in the KAVS8148-I message are the same as the full unit name and execution ID that you checked in step 1.

    If a message that meets this condition has been output, run the ajsfxbcstatus command on the broadcast agent, and then check the destination agent.

  4. Perform this step for each destination agent. Check whether messages that meet the following conditions have been output to the flexible-job-destination log file (ajsfxexec{1|2}.log) on the destination agent:

    • The maintenance information that is output in the KAVS8139-I message is the same as the maintenance information (uuid) that you checked in step 2.

    • The maintenance information that is output in the KAVS8140-I message is the same as the maintenance information (uuid) that you checked in step 2.

    Determine the status based on whether the KAVS8139-I and KAVS8140-I messages that meet these conditions have been output:

    • If both messages exist, the user program has ended.

    • If only the KAVS8139-I message exists, the user program is currently running.

    • If neither of the messages exists, the user program has not started.
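Because step 4 is repeated for each destination agent, it can help to group the agents by result. The following Python sketch assumes that you have already determined, for each destination agent, whether the matching KAVS8139-I and KAVS8140-I messages exist; the function name, the status labels, and the agent names in the usage note are hypothetical.

```python
def group_agents_by_status(flags_by_agent):
    """Group destination agents by user program status.

    flags_by_agent maps an agent host name to a pair of booleans:
    (matching KAVS8139-I found, matching KAVS8140-I found).
    """
    groups = {"ended": [], "running": [], "not started": [], "undetermined": []}
    for agent, (has_8139, has_8140) in sorted(flags_by_agent.items()):
        if has_8139 and has_8140:
            status = "ended"
        elif has_8139:
            status = "running"
        elif not has_8140:
            status = "not started"
        else:
            # Only KAVS8140-I was found; not covered by the procedure.
            status = "undetermined"
        groups[status].append(agent)
    return groups
```

For example, passing `{"agentA": (True, True), "agentB": (True, False)}` places `agentA` under `"ended"` and `agentB` under `"running"`.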