2.6.10 Troubleshooting a situation in which the statuses of jobs become unknown on the manager host
The statuses of jobs that were run on an agent host might become unknown on the manager host, for example, in the following cases: when the JP1/AJS3 service is started by warm start or disaster recovery start, or when a communication failure occurs between the manager host and agent host.
In such cases, you can perform job recovery measures by checking the following files that have been output on the agent host:
- Job execution result log file
- Event job execution result log file

Note: If JP1/AJS3 was upgraded from an earlier version by an upgrade installation and is used under the default settings, the job execution result log file and event job execution result log file are not output. If you want these files to be output, set the JOBEXECRESULTLOG and EVJOBEXECRESULTLOG environment setting parameters to 1 or 2. For details, see 3.4.9 Estimating the size of job execution result log files and event job execution result log files in the JP1/Automatic Job Management System 3 System Design (Configuration) Guide.
The following table shows the correspondence between job types and the log files to which log data is to be output.
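No. | Job type (unit) | Log file to which log data is to be output |
---|---|---|
1 | PC Job# | Job execution result log file |
2 | Unix Job# | Job execution result log file |
3 | Action job# | Job execution result log file |
4 | Submit job | Job execution result log file |
5 | QUEUE Job | Job execution result log file |
6 | Custom Job | Job execution result log file |
7 | HTTP connection job | Job execution result log file |
8 | Passing information setting job | Job execution result log file |
9 | Flexible Job | Job execution result log file |
10 | Event job | Event job execution result log file |
11 | OR Job | No log data is output. |
12 | Judgment Job | No log data is output. |
13 | Jobnet Connector | No log data is output. |
14 | Remote Jobnet | No log data is output. |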
(1) Recovery procedures for jobs other than event jobs
For jobs other than event jobs, you can perform recovery measures by following one of two procedures: a procedure using the ajsshow command, or a procedure using the scheduler log. Select the appropriate procedure according to the status of the manager host. The following table shows the statuses of the manager host and the corresponding recovery procedures.
Status of the manager host | Recovery procedure |
---|---|
The embedded database can be used (job execution information remains) | Perform recovery by using the ajsshow command. |
Only the scheduler log can be used | Perform recovery by using the scheduler log. |
The manager host has been set up again (neither the embedded database nor the scheduler log can be used). | Recovery is not possible. |
(a) Recovery procedure using the ajsshow command
1. Run the ajsshow command, and then check the execution ID, job number, and agent host name of the target job.
   Example:
   ajsshow -f "%#,%I %H" /JOBNETNAME/JOBNAME
2. Search the job execution result log file by using as search conditions the full unit name of the target job in combination with the execution ID and job number that you checked in step 1.
   Example:
   Find the following: AJSROOT1:/JOBNETNAME/JOBNAME:@111,200001
3. Check whether messages KAVU3613-I, KAVU3614-I (in Windows only), KAVU3615-I (in UNIX only), and KAVU3616-I have been output to determine the status of the job on the agent host.
   For details about the relationship between the messages that were output and the job status, see (3) Relationship between the messages that were output and the job status.
4. Perform recovery for the job according to the job status on the agent host.
   Rerun the job or change the status of the job.
(b) Recovery procedure using the scheduler log
1. Search the scheduler log file, and then check the execution ID, job number, and agent host name of the target job.
2. Search the job execution result log file by using as search conditions the full unit name of the target job in combination with the execution ID and job number that you checked in step 1.
   Example:
   Find the following: AJSROOT1:/JOBNETNAME/JOBNAME:@111,200001
3. Check whether messages KAVU3613-I, KAVU3614-I (in Windows only), KAVU3615-I (in UNIX only), and KAVU3616-I have been output to determine the status of the job on the agent host.
   For details about the relationship between the messages that were output and the job status, see (3) Relationship between the messages that were output and the job status.
4. Perform recovery for the job according to the job status on the agent host.
   Rerun the job or change the status of the job.
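The search in step 2 and the message check in step 3 are the same in both procedures and could be scripted along the following lines. This is only a sketch: it assumes that each record in the job execution result log file is a single line of text containing both the message ID and the search string, which this section does not confirm. The function names and the sample lines are hypothetical.

```python
import re

# Message IDs named in step 3 of the procedures above.
MESSAGE_IDS = ("KAVU3613-I", "KAVU3614-I", "KAVU3615-I", "KAVU3616-I")


def build_search_key(full_unit_name, execution_id, job_number):
    """Join the values from step 1 in the form shown in the manual's example."""
    return f"{full_unit_name}:{execution_id},{job_number}"


def find_messages(log_lines, search_key):
    """Return the set of message IDs found on lines that contain the search key.

    Assumption: one log record per line, carrying both the message ID and
    the search key. Adjust this if the real log layout differs.
    """
    # The lookahead keeps job number 200001 from also matching 2000011.
    pattern = re.compile(re.escape(search_key) + r"(?!\d)")
    found = set()
    for line in log_lines:
        if not pattern.search(line):
            continue
        for message_id in MESSAGE_IDS:
            if message_id in line:
                found.add(message_id)
    return found


if __name__ == "__main__":
    # Hypothetical sample lines, not real JP1/AJS3 log output.
    sample = [
        "KAVU3613-I AJSROOT1:/JOBNETNAME/JOBNAME:@111,200001",
        "KAVU3615-I AJSROOT1:/JOBNETNAME/JOBNAME:@111,200001",
        "KAVU3613-I AJSROOT1:/JOBNETNAME/OTHER:@112,200002",
    ]
    key = build_search_key("AJSROOT1:/JOBNETNAME/JOBNAME", "@111", "200001")
    print(sorted(find_messages(sample, key)))
```

To scan a real file, pass an open file object as `log_lines`; the path and encoding of the log file depend on the environment and are not assumed here.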
(2) Recovery procedures for event jobs
Recovery for an event job can be performed only if the embedded database on the manager host can be used and the execution information of the job remains. In other cases, recovery is not possible.
The recovery procedure for an event job is as follows:
1. On the agent host, check the occurrence status (occurrence time) of the event you want to monitor.
   For example, in the case of the JP1 event reception monitoring job, check the event database.
2. In the event job execution result log file, search for a start-event log record for which the KAVT0966-I message was output around the time that the monitoring-target event in step 1 occurred. Then, check the time at which the message was output and the unit ID.
3. On the manager host, run the ajsname command, and then obtain the full unit name from the unit ID.
   Confirm that the full unit name that you obtained is the job in step 1.
4. Compare the time that the event in step 1 occurred and the time that the message in step 2 was output.
   If the time that the event in step 1 occurred is earlier than the time that the message in step 2 was output, the event occurred before the monitoring of the event started. Manually run the succeeding job, if necessary.
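The comparison in step 4 can be illustrated with a short sketch. The timestamp layout in `TIME_FORMAT` and the function name are assumptions made for this example, and both times are assumed to have been recorded in the same time zone.

```python
from datetime import datetime

# Assumed timestamp layout; adjust it to match how the times were recorded.
TIME_FORMAT = "%Y/%m/%d %H:%M:%S"


def event_preceded_monitoring(event_time, message_time):
    """Return True if the event occurred before the KAVT0966-I message was output.

    event_time:   occurrence time of the monitored event (step 1)
    message_time: output time of the KAVT0966-I message (step 2)
    """
    event = datetime.strptime(event_time, TIME_FORMAT)
    message = datetime.strptime(message_time, TIME_FORMAT)
    return event < message
```

A result of `True` corresponds to the case described in step 4, in which the event occurred before monitoring started and the succeeding job might need to be run manually. This section does not describe the other case, so the sketch draws no conclusion from a result of `False`.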
(3) Relationship between the messages that were output and the job status
You can determine the status of a job from the messages that have been output to the job execution result log file. The status determined differs depending on whether the job is a flexible job.
(a) For jobs that are not flexible jobs
The following table shows the relationship between the messages that were output and the job status for a job that is not a flexible job.
No. | KAVU3613-I (at reception) | KAVU3614-I or KAVU3615-I (at the start)# | KAVU3616-I (at the end) | Status of the job | Status of the executable file or script file | Explanation |
---|---|---|---|---|---|---|
1 | Not output | Not output | Not output | Not run yet | Not started yet | This is the status before the job is accepted from the manager host. |
2 | Output | Not output | Not output | Not run yet | Not started yet | This is the status in which the job has been accepted from the manager but has not been run yet. |
3 | Output | Output | Not output | Now running | Now running | This is the status in which the job is running. |
4 | Output | Not output | Output | Failed to start | Not started yet | |
5 | Output | Output | Output | Execution ended | Ended | This is the status in which the job ran successfully and then ended. |
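The table for jobs that are not flexible jobs can be read as a lookup keyed on which of the three messages were output. The following sketch encodes only the five rows listed in that table; the names `JobState`, `STATUS_TABLE`, and `classify` are hypothetical, and any combination that the table does not list returns `None` rather than a guessed status.

```python
from typing import NamedTuple, Optional


class JobState(NamedTuple):
    job_status: str      # "Status of the job" column
    program_status: str  # "Status of the executable file or script file" column


# Key: (KAVU3613-I output, KAVU3614-I or KAVU3615-I output, KAVU3616-I output)
STATUS_TABLE = {
    (False, False, False): JobState("Not run yet", "Not started yet"),
    (True, False, False): JobState("Not run yet", "Not started yet"),
    (True, True, False): JobState("Now running", "Now running"),
    (True, False, True): JobState("Failed to start", "Not started yet"),
    (True, True, True): JobState("Execution ended", "Ended"),
}


def classify(received, started, ended) -> Optional[JobState]:
    """Look up the job status for one combination of output messages.

    Returns None for combinations that the table does not cover.
    """
    return STATUS_TABLE.get((bool(received), bool(started), bool(ended)))
```

For example, `classify(True, True, False)` yields the "Now running" row, while `classify(False, True, True)` yields `None` because the table has no such row.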
(b) For flexible jobs
For a flexible job, the processing status on the relay agent is output to the job execution result log file. To check the status of the executable file or script file, you need to check the job-destination log file (ajsfxexec{1|2}.log) on the destination agent.
In a configuration that does not use relay agents, the manager host acts as a relay agent. In this configuration, messages are output to the job execution result log file on the manager host.
The following table shows the relationship between the messages that were output and the job status for a flexible job.
No. | KAVU3613-I (at reception) | KAVU3614-I or KAVU3615-I (at the start)#1 | KAVU3616-I (at the end) | Status of the flexible job | Status of the executable file or script file | Explanation |
---|---|---|---|---|---|---|
1 | Not output | Not output | Not output | Not run yet | Not started yet | This is the status before the flexible job is accepted from the manager host. |
2 | Output | Not output | Not output | Not run yet | --#2 | This is the status in which the flexible job has been accepted from the manager but has not been run yet. |
3 | Output | Output | Not output | Now running | --#2 | This is the status in which the flexible job was started on a relay agent. To check whether the executable file was run, check the log file (ajsfxexec{1|2}.log) on the destination agent. |
4 | Output | Not output | Output | Failed to start | Not started yet | This is the status in which start of the flexible job failed. Determine the cause of the failure from the error messages that have been output to the integrated trace log file on the agent host. |
5 | Output | Output | Output | Execution ended | --#2 | This is the status in which the flexible job ended on a relay agent. To check whether the executable file was run, check the log file (ajsfxexec{1|2}.log) on the destination agent. |
(4) How to check the status on the destination agent
The following describes how to check the status of an executable file or script file on the destination agent when a flexible job is used.
(a) If broadcast execution is not used
1. Check the full unit name and execution ID of the target flexible job from a message that has been output to the job execution result log file on the host that requested the flexible job (relay agent).
2. In the flexible-job-requester log file (ajsfxreq{1|2}.log) on the host that requested the flexible job (relay agent), check whether a message that meets the following condition has been output:
   - The full unit name and execution ID that are output in the KAVS8115-I message are the same as the full unit name and execution ID that you checked in step 1.

   If a message that meets this condition has been output, check the host name of the destination agent and the maintenance information (uuid) that are output in that message.
3. In the flexible-job-destination log file (ajsfxexec{1|2}.log) on the host on which the flexible job was run (the destination agent whose host name you checked in step 2), check whether messages that meet the following conditions have been output:
   - The maintenance information that is output in the KAVS8139-I message is the same as the maintenance information (uuid) that you checked in step 2.
   - The maintenance information that is output in the KAVS8140-I message is the same as the maintenance information (uuid) that you checked in step 2.

   Determine the status based on whether the KAVS8139-I and KAVS8140-I messages that meet these conditions have been output:
   - If both messages exist, the user program has ended.
   - If only the KAVS8139-I message exists, the user program is currently running.
   - If neither of the messages exists, the user program has not started.
(b) If broadcast execution is used
1. Check the full unit name and execution ID of the target flexible job from a message that has been output to the job execution result log file on the host that requested the flexible job (relay agent).
2. In the flexible-job-requester log file (ajsfxreq{1|2}.log) on the host that requested the flexible job (relay agent), check whether a message that meets the following condition has been output:
   - The full unit name and execution ID that are output in the KAVS8137-I message are the same as the full unit name and execution ID that you checked in step 1.

   If a message that meets this condition has been output, check the maintenance information (uuid) that is output in that message.
3. In the log file (ajsfxdstr{1|2}.log) on the broadcast agent, check whether a message that meets the following condition has been output:
   - The full unit name and execution ID that are output in the KAVS8148-I message are the same as the full unit name and execution ID that you checked in step 1.

   If a message that meets this condition has been output, run the ajsfxbcstatus command on the broadcast agent, and then check the destination agent.
4. Perform this step for each destination agent. Check whether messages that meet the following conditions have been output in the flexible-job-destination log file (ajsfxexec{1|2}.log) on the destination agent:
   - The maintenance information that is output in the KAVS8139-I message is the same as the maintenance information (uuid) that you checked in step 2.
   - The maintenance information that is output in the KAVS8140-I message is the same as the maintenance information (uuid) that you checked in step 2.

   Determine the status based on whether the KAVS8139-I and KAVS8140-I messages that meet these conditions have been output:
   - If both messages exist, the user program has ended.
   - If only the KAVS8139-I message exists, the user program is currently running.
   - If neither of the messages exists, the user program has not started.
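Both procedures end with the same determination on the destination agent: match the maintenance information (uuid), and then see which of the KAVS8139-I and KAVS8140-I messages exist. A minimal sketch of that final check follows. It assumes that each record in ajsfxexec{1|2}.log is one line of text containing both the message ID and the uuid, which this section does not confirm. The function name and the sample values are hypothetical.

```python
def user_program_status(log_lines, uuid):
    """Classify the user program from destination-agent log lines.

    Assumption: one log record per line, carrying both the message ID and
    the maintenance information (uuid).
    """
    started = False   # a matching KAVS8139-I message exists
    finished = False  # a matching KAVS8140-I message exists
    for line in log_lines:
        if uuid not in line:
            continue
        if "KAVS8139-I" in line:
            started = True
        if "KAVS8140-I" in line:
            finished = True

    if started and finished:
        return "ended"
    if started:
        return "running"
    if not finished:
        return "not started"
    # Only KAVS8140-I was found; the procedure does not cover this case.
    return "undetermined"


if __name__ == "__main__":
    # Hypothetical sample lines and uuid, not real JP1/AJS3 log output.
    sample_uuid = "123e4567-e89b-12d3-a456-426614174000"
    sample = [f"KAVS8139-I {sample_uuid}"]
    print(user_program_status(sample, sample_uuid))
```

When broadcast execution is used, the same function would be applied once per destination agent, with that agent's log lines and the uuid from step 2.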