Job Management Partner 1/Automatic Job Management System 3 Troubleshooting
This section describes how to troubleshoot problems related to setup, service startup, and JP1/AJS3 operation.
- Organization of this section
- (1) JP1/AJS3 setup does not terminate normally
- (2) A JP1/AJS3 service has not started
- (3) A JP1/AJS3 service takes too much time to start
- (4) JP1/AJS3 does not function normally
(1) JP1/AJS3 setup does not terminate normally
Possible causes are as follows:
- If the KAVU5921-E message (Environment settings or the logical host name is invalid.) is output:
JP1/Base might not have been set up, or a logical host name specified during setup for cluster operation might be invalid.
Check the setup procedure and perform it again. During setup for cluster operation, make sure that you specify both the -mh option and a logical host name in the jpqimport command.
- If the KAVU5950-E message (The same identifier or object name is already specified. (line:line-number)) is output:
An agent definition ($agent), queue definition ($queue), or exclusive execution resource definition ($res) in the configuration definition file for the execution environment (jpqsetup.conf) for QUEUE jobs and submit jobs might be invalid.
Check the definitions in the configuration definition file for the execution environment for QUEUE jobs and submit jobs. Correct any definitions that need to be corrected, and set up JP1/AJS3 again.
The storage location of the configuration definition file for the execution environment for QUEUE jobs and submit jobs is as follows:
- In Windows:
- JP1/AJS3-installation-folder\conf\jpqsetup.conf
- In UNIX:
- /etc/opt/jp1ajs2/conf/jpqsetup.conf
Make sure that the definitions in the configuration definition file for the execution environment for QUEUE jobs and submit jobs meet the following conditions:
- A duplicate ID is not defined in $agent $an (n is an agent ID).
- The same ID is not shared by def_queue $qn (n is a default queue ID) and $queue $qn (n is a queue ID).
- A duplicate ID is not defined in $queue $qn (n is a queue ID).
- A duplicate ID is not defined in $res $rn (n is an exclusive execution resource ID).
- A duplicate agent name is not defined.
- A duplicate queue name is not defined.
- A duplicate exclusive execution resource name is not defined.
For details about the definitions in the configuration definition file for the execution environment for QUEUE jobs and submit jobs (jpqsetup.conf), see jpqimport in 3. Commands Used for Special Operation in the manual Job Management Partner 1/Automatic Job Management System 3 Command Reference 2.
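The duplicate-ID conditions listed above can be checked mechanically before you set up JP1/AJS3 again. The following is a minimal sketch, not a tool supplied with JP1/AJS3. The function name find_duplicate_ids is hypothetical, and the sketch assumes that each definition starts on its own line with its keyword ($agent, $queue, def_queue, or $res) followed by the ID. It counts $queue and def_queue IDs together, and it does not check for duplicate agent names, queue names, or exclusive execution resource names.

```shell
#!/bin/sh
# find_duplicate_ids FILE  (hypothetical helper, not part of JP1/AJS3)
# Reports every ID that is used more than once in a jpqsetup.conf-style file.
# Assumption: each definition starts a line with its keyword followed by the ID.
find_duplicate_ids() {
    awk '
        # Drop an opening brace that is attached directly to the ID.
        function clean(id) { sub(/[{].*/, "", id); return id }

        $1 == "$agent"                      { agent[clean($2)]++ }
        # $queue and def_queue IDs must not collide, so count them together.
        $1 == "$queue" || $1 == "def_queue" { queue[clean($2)]++ }
        $1 == "$res"                        { res[clean($2)]++ }

        END {
            for (id in agent) if (agent[id] > 1) print "duplicate agent ID: " id
            for (id in queue) if (queue[id] > 1) print "duplicate queue ID: " id
            for (id in res)   if (res[id] > 1)   print "duplicate resource ID: " id
        }
    ' "$1"
}
```

In UNIX, the function could be called as find_duplicate_ids /etc/opt/jp1ajs2/conf/jpqsetup.conf. No output means that no duplicate IDs were found under the stated assumption.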
(2) A JP1/AJS3 service has not started
Possible causes are as follows:
- If the KAVU5285-E message (There is no the database table, or it is short of the system resources. (reason-location)) is output to the integrated trace log:
If you are using QUEUE jobs or submit jobs, the job execution environment database for QUEUE jobs and submit jobs might not have been created correctly. Use the jpqimport command to create or re-create the job execution environment database for QUEUE jobs and submit jobs. For details about how to create or re-create the database, see 2.12(2) Procedure for re-creating the execution environment database for QUEUE jobs and submit jobs.
- If the KAVU5284-E message (It is short of the system resources. (reason-location)) is output to the integrated trace log:
System resources, such as semaphores, required for JP1/AJS3 operation might not be sufficient.
Check the estimate for system resources, make sure that system resources are sufficient, and then restart JP1/AJS3.
- If you start a JP1/AJS3 service when memory is insufficient, the KAVU1203-E message (The agent process could not be started. (Reason code: 12)) or the KAVU1204-E message (The manager process could not be started. (Reason code: 12)) might be output to the integrated trace log. If either message is output, reconsider the memory estimate. If any unnecessary applications are running, stop them and restart the JP1/AJS3 service.
- When you start a JP1/AJS3 service, the KAVU1203-E message (The agent process could not be started. (Reason code: 0xffffffff)) or the KAVU1204-E message (The manager process could not be started. (Reason code: 0xffffffff)) might be output to the integrated trace log. If either message is output, initialization of the JP1/AJS3 service might have failed. Check the message that is output immediately before this message in the integrated trace log, eliminate the cause of the error, and then restart the JP1/AJS3 service.
- If you restart a JP1/AJS3 service that has terminated abnormally, the KAVU1103-I message (Process monitor (logical-host-name) is already running on the same host.) or the KAVU4111-E message (Job queuing control (logical-host-name) or jpqimport command is already running on the same host.) might be output to the integrated trace log. In this case, when the JP1/AJS3 service terminated abnormally, some of the JP1/AJS3 processes might have remained because they could not be stopped. Accordingly, perform the following procedure to forcibly terminate JP1/AJS3 processes and then restart the JP1/AJS3 service.
- In Windows:
- Use the jajs_spmd_status command to check the status of JP1/AJS3 processes. If the submitqueue, queuea, or queuem process has not stopped, restart the system.
- In UNIX:
- Use the jajs_spmd_status command to check the status of JP1/AJS3 processes. If the jpqmon process has not stopped, execute the following commands to kill the jpqagt process.
- # ps -ef | grep jpqagt
- # kill -KILL jpqagt-process-ID-output-by-previous-command
- Use the jajs_spmd_status command again to check the status of JP1/AJS3 processes. If the jpqman_hst or jpqman process has not stopped, execute the following command to kill that process.
- # kill -KILL jpqman_hst-process-ID-or-jpqman-process-ID-output-by-jajs_spmd_status-command
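The UNIX check for leftover processes can be sketched as a small filter over ps -ef output. This is an illustrative sketch only: the function name leftover_jpq_pids is hypothetical, and it assumes that the process ID is the second column of the ps output. It only lists candidate processes and does not kill anything. A process that merely has one of these names as a command argument would also be listed, so review the output before acting on it.

```shell
#!/bin/sh
# leftover_jpq_pids  (hypothetical helper, not part of JP1/AJS3)
# Reads `ps -ef`-style output on stdin and prints "PID NAME" for each line
# that contains one of the process names mentioned in this section.
# Assumption: the PID is the second column of the input.
leftover_jpq_pids() {
    awk '
        {
            for (i = 3; i <= NF; i++) {
                name = $i
                sub(/.*\//, "", name)   # strip any directory prefix
                if (name == "jpqagt" || name == "jpqmon" ||
                    name == "jpqman" || name == "jpqman_hst") {
                    print $2, name
                    break               # report each process line once
                }
            }
        }'
}
```

For example, run ps -ef | leftover_jpq_pids as a superuser. Each process ID that is printed can then be passed to kill -KILL as described in the procedure above.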
(3) A JP1/AJS3 service takes too much time to start
When JP1/AJS3 starts, it requests the authentication server to perform initialization. Even if the authentication server is not running, JP1/AJS3 can still start, but startup takes time.
To prevent a slow startup, start the authentication server before you start JP1/AJS3.
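If startup is scripted, the ordering can be enforced by a small wrapper. The following is a sketch under stated assumptions: start_in_order is a hypothetical function, and both arguments are placeholders for whatever commands start the authentication server and JP1/AJS3 in your environment. The sketch does not name those commands, and it applies only when both can be started from the same host.

```shell
#!/bin/sh
# start_in_order AUTH_START_CMD AJS_START_CMD  (hypothetical helper)
# Runs the authentication-server start command first and runs the
# JP1/AJS3 start command only if the first command succeeded.
# Both arguments are placeholders supplied by the caller.
start_in_order() {
    auth_start_cmd=$1
    ajs_start_cmd=$2

    if ! sh -c "$auth_start_cmd"; then
        echo "authentication server start failed; JP1/AJS3 was not started" >&2
        return 1
    fi

    sh -c "$ajs_start_cmd"
}
```

The function returns the exit status of the second command, or 1 if the first command fails, so a calling script can stop early instead of starting JP1/AJS3 slowly without the authentication server.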
(4) JP1/AJS3 does not function normally
Check for the following:
- Make sure that JP1/AJS3 is not in a state in which programs can stop, such as the standby, resume, or suspended state.
- If you have changed the system time, make sure that you did so by using the procedure described in 8.9.3 Changing the date and time of the system in the Job Management Partner 1/Automatic Job Management System 3 Administration Guide.
Copyright (C) 2009, 2010, Hitachi, Ltd.
Copyright (C) 2009, 2010, Hitachi Solutions, Ltd.