Job Management Partner 1/Automatic Job Management System 3 Troubleshooting
- Organization of this subsection
- (1) Execute the data collection tool
- (2) Obtain the contents of the core file
- (3) Check the status of processes
- (4) Check the operation data
- (5) Collect information about the embedded database
(1) Execute the data collection tool
The following describes the procedure for executing the data collection tool and the data that the tool collects.
(a) Procedure for executing the data collection tool
Execute the data collection tool as shown below. For details about how to set up the data collection tool, see 15. Collecting Log Data in the Job Management Partner 1/Automatic Job Management System 3 Configuration Guide 1.
The following shows an example of executing the data collection tool:
# /home/jp1ajs2/trouble.sh
By default, the results of executing the data collection tool are output to the following files under /tmp/jp1ajs2/trouble/. Back up these files.
- For physical hosts:
- JP1_DEFAULT_1st.tar.Z
The data for the first reports is output.
- JP1_DEFAULT_2nd.tar.Z
All the other data is output.
- For logical hosts:
- logical-host-name_1st.tar.Z
The data for the first reports is output.
- logical-host-name_2nd.tar.Z
All the other data is output.
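The backup step can be sketched as a small shell function. This is only an illustration: the function name backup_trouble_archives and the backup destination are hypothetical, and the glob patterns assume that the archive names contain _1st.tar and _2nd.tar as listed above.

```sh
# Copy the archives created by the data collection tool to a backup directory.
# backup_trouble_archives is a hypothetical helper, not part of JP1/AJS3.
# Usage: backup_trouble_archives dest-dir [src-dir]
backup_trouble_archives() {
    dest_dir=$1
    src_dir=${2:-/tmp/jp1ajs2/trouble}   # default output directory of the tool
    if [ -z "$dest_dir" ]; then
        echo "usage: backup_trouble_archives dest-dir [src-dir]" >&2
        return 2
    fi
    mkdir -p "$dest_dir" || return 1
    copied=0
    for archive in "$src_dir"/*_1st.tar* "$src_dir"/*_2nd.tar*; do
        [ -f "$archive" ] || continue      # skip patterns that matched nothing
        cp -p "$archive" "$dest_dir"/ || return 1
        copied=$((copied + 1))
    done
    echo "$copied"                         # number of archives copied
}
```

For example, backup_trouble_archives /var/backup/jp1ajs2 copies the archives from the default output directory and prints how many files were copied.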
When you use the data collection tool in a cluster system, you can collect data by specifying a logical host name. The data collection tool also provides options for limiting the amount of data that is collected. The following describes the syntax of the data collection tool:
- Format
_04 [-h logical-host-name] [-f storage-directory] [-s] [-t] [-u] [additional-file]
- Description
- The data collection tool obtains maintenance information, such as the JP1/AJS3 definitions, operating information, and information about the OS.
- Execution permission
- Superuser
- Arguments
- -h logical-host-name
- Specify the name of the target logical host.
- The data collection tool collects the data on the physical host in addition to the data on the specified logical host.
- If you do not specify a name, the data collection tool obtains the physical host logs.
- -f storage-directory
- Use a full path without space characters to specify the directory used to store the collected information. If the specified directory name contains a space character, the system assumes that the character string before the space is the storage directory name and treats the characters after the space as other arguments.
- When you use a relative path to specify a storage directory, the specified path is created under the root directory and the collected data is stored there.
- If you omit this option, the collected data is output to /tmp/jp1ajs2/trouble/.
- -s
- Specify this option if you do not want to collect information about the database used by JP1/AJS3.
- If you do not specify this option, the data collection tool collects information about the database.
- -t
- Specify this option if you do not want to obtain the hosts, services, and passwd files.
- -u
- Specify this option if you do not want to obtain the core file.
- Even if you specify this option, the back trace information is collected.
- additional-file
- Use a full path without space characters to specify a file that is not usually obtained by using the data collection tool, such as the core file of the JP1/AJS3 commands. If the specified file name contains a space character, the system assumes that the character string before the space is an additional file name and treats the characters after the space as other arguments.
- Using this argument, you can collect information that is not automatically collected by the data collection tool.
- If the core file is specified as an additional file, the core file is collected even if you specify the -u option.
- You can specify a directory name for additional-file. If you specify a directory, all the data in the specified directory is collected.
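Because both -f and additional-file expect a full path without space characters, it can help to check a value before passing it to the tool. The following is a minimal sketch; is_valid_tool_path is a hypothetical helper name, and rejecting relative paths is a choice made for this sketch (the tool itself accepts a relative path for -f and creates it under the root directory).

```sh
# Return 0 if the argument is a full path that contains no space characters.
# is_valid_tool_path is a hypothetical helper, not part of JP1/AJS3.
is_valid_tool_path() {
    case $1 in
        *" "*) return 1 ;;   # a space would split the value into two arguments
        /*)    return 0 ;;   # full (absolute) path
        *)     return 1 ;;   # relative path or empty string
    esac
}
```

A caller might use it as: is_valid_tool_path "$dir" && _04 -f "$dir"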
- Cautionary notes
- If you collect data about a logical host in a cluster system, you must mount the shared disk for the logical host.
- The data collection tool compresses the collected data as follows for each OS.
For UNIX
The data collection tool uses the compress command to compress the data. If the compress command is not installed in the environment in which the data collection tool is executed, the tool uses the gzip command instead. If neither the compress command nor the gzip command is available, the data collection tool uses the tar command to archive and output the data. If the tar command is not installed, the data collection tool terminates abnormally, and processing ends.
- Because a general user might not have read permission for some of the files that the script collects, the script must be executed by the superuser.
- If you have already created a file containing the results of executing the script, JP1/AJS3 outputs a message asking for permission to overwrite the file. Enter y to overwrite the file. If you do not want to overwrite the file, enter n.
- If no core dump file is output, a message (Status of tar: core? is unknown. The file is not dumped.) appears. This is not a problem.
- If the target product is not installed or is being used by another process, or a file that cannot be accessed because of its file attribute is detected, a message reporting that there is no applicable directory or file or that the target file cannot be accessed might appear during the collection of data. This is not a problem.
- Because the ajs2collectcore command is executed internally while the data collection tool is being executed, some data cannot be collected, depending on the OS. For details, see ajs2collectcore (UNIX only) in 2. Commands in the manual Job Management Partner 1/Automatic Job Management System 3 Command Reference 1.
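The order in which the compression commands are tried (compress, then gzip, then tar alone) can be illustrated as follows. This is a sketch of the documented order only; select_archiver is a hypothetical name, and the actual implementation inside the data collection tool might differ.

```sh
# Print the name of the first available command in the documented order:
# compress, then gzip, then plain tar. Returns 1 if none is available.
# select_archiver is a hypothetical helper, not part of JP1/AJS3.
select_archiver() {
    for candidate in compress gzip tar; do
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```

Running select_archiver before executing the tool shows which form of output to expect in the environment.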
- Return values
- 0: Normal end
- Value other than 0: Abnormal end
- Messages
| Message | System action | Meaning and user action |
|---|---|---|
| Directory directory-name is created | Continues processing. | The indicated directory has been created. |
| Overwrite file (file-name) ok? | Waits for a response from the user. | The system is asking whether it is all right to overwrite the indicated file. To continue the processing, enter y. To cancel the processing, enter n. |
| [CAUTION] When a target program is not installed, or when file access fails because some other process is using the file or because a necessary file-access permission is lacking, a message might be output that states that file access failed or a directory or file does not exist. Such a message does not indicate a problem. | Continues processing. | A warning message. This message might be output during the collection of data that indicates that the file is being used or that the file does not exist. This is not a problem. |
| Output file name :(file-name) | Ends processing. | The indicated file has been created. |
| Write permission error (directory-name) | Terminates the processing. | This message reports that the user does not have write permission. Possible reasons are as follows: the user does not have permission to create directories; the directory is being used by another process. Correct the error, and then re-execute the data collection tool. |
| Make directory (directory-name) is unsuccessful | Terminates the processing. | This message reports that the indicated directory could not be created. Possible reasons are as follows: the user does not have permission to create directories; the directory is being used by another process. Correct the error, and then re-execute the data collection tool. |
| Read permission error(file-name) | Terminates the processing. | This message reports that the user does not have read permission. A possible reason is as follows: the user does not have permission to create directories. Correct the error, and then re-execute the data collection tool. |
| File file-name is not found | Terminates the processing. | The directory or file specified as the additional file does not exist. Specify a correct path, and then re-execute the data collection tool. |
| [ -s ] [ -f output-file ] [ -h Logical-Host-Name ] [ -t ] [ -u ] [ add-in-file ... ] | Terminates the processing. | The option is specified incorrectly. Specify the option correctly, and then re-execute the data collection tool. |
- Example 1
- The following command collects data about a physical host:
_04
- Example 2
- The following command collects data about a logical host (cluster):
_04 -h cluster
- Example 3
- The following command outputs information, including the core file (/tmp/core), to a specified directory (/tmp/trouble):
_04 -f /tmp/trouble /tmp/core
(b) Data that can be collected by using the data collection tool
You can use the data collection tool (_04) to collect the following types of data.
- For physical hosts:
- The data for first reports (/tmp/jp1ajs2/trouble/JP1_DEFAULT_1st.tar.Z#1)
| Name of directory or file containing collected data | Description |
|---|---|
| /etc/hosts | hosts file |
| /etc/passwd | passwd file |
| /etc/services | services file |
| /etc/.hitachi/pplistd/pplistd | Information about installed Hitachi products |
| /etc/opt/jp1ajs2/conf, /etc/opt/jp1ajs2cm/conf, /etc/opt/jp1base/conf | Directories containing the environment settings files |
| /opt/HIRDB_J/spool/pdlckinf | Embedded database deadlock timeout information file |
| /opt/jp1/hcclibcnf/regdir | Common definition |
| /opt/jp1ajs2/PatchHistory, /opt/jp1ajs2/PatchLog, /opt/jp1ajs2v/PatchHistory, /opt/jp1ajs2v/PatchLog | Patch information |
| /var/adm/syslog/syslog.log (for HP-UX), /var/adm/messages (for Solaris), /var/adm/syslog (for AIX) | syslog and the directories containing syslog |
| /opt/hitachi/HNTRLib/spool, /var/opt/hitachi/HNTRLib2/spool | Integrated trace logs |
| /var/opt/jp1ajs2/jobinf | Directory containing the information about jobs |
| /var/opt/jp1ajs2/log | Directory containing log files |
| /var/opt/jp1ajs2/log/_04.filelist | List of files |
| /var/opt/jp1ajs2/log/_04.osinfo | OS-related information |
| /var/opt/jp1ajs2/log/_04.processlist | List of processes |
| /var/opt/jp1ajs2/log/_04.backtrace | Back trace information |
| /var/opt/jp1ajs2/log/ajsagtshow.txt | Execution result of the ajsagtshow command |
| /var/opt/jp1ajs2/log/ajsagtprint.txt | Execution result of the ajsagtprint command |
| /var/opt/jp1ajs2/log/jajs_status.txt | Execution result of the jajs_status command |
| /var/opt/jp1ajs2/sys | Directory containing the system files |
| /var/opt/jp1ajs2/tmp/schedule/pd*.trc | Trace information related to the embedded database |
| /var/opt/jp1ajs2cm/log, /var/opt/jp1ajs2v/log, /var/opt/jp1base/log | Directories containing log files |
| /tmp/jp1ajs2/trouble#1/EMBDB/_JF*#2/conf | Embedded database definition file |
| /tmp/jp1ajs2/trouble#1/EMBDB/_JF*#2/spool | Embedded database failure investigation file |
| /tmp/jp1ajs2/trouble#1/EMBDB/_JF*#2/etc | Other information related to the embedded database that is needed for investigation |
- #1
- Output destination when the -f option is omitted.
- #2
- _JF* indicates an embedded database identifier (_JF0, _JF1, _JF2, and so on). A directory is created for each identifier.
- The data for second reports (/tmp/jp1ajs2/trouble/JP1_DEFAULT_2nd.tar.Z#1)
| Name of directory or file containing collected data | Description |
|---|---|
| /tmp/jp1ajs2/trouble#1/CARDIR/coreinfo-ISAM.shmdump.tar.Z#2, /tmp/jp1ajs2/trouble#1/CARDIR/coreinfo-Scheduler.shmdump.tar.Z#2, /tmp/jp1ajs2/trouble#1/CARDIR/../../core.Z#2, /tmp/jp1ajs2/trouble#1/CARDIR/../../coreinfo-analyze.tar.Z#2, /tmp/jp1ajs2/trouble#1/CARDIR/ProgMon.shmdump, /tmp/jp1ajs2/trouble#1/CARDIR/coreinfo-host.shmdump | Shared memory information used by ISAM and the scheduler, core dump files, and shared library information |
| /var/opt/jp1ajs2/database, /var/opt/jp1ajs2cm/database, /tmp/jp1ajs2/trouble#1/embdatabase/_JF*#3 | Database storage directories |
| /additionally-collected-data#4 | Additionally collected data |
- #1
- Output destination when the -f option is omitted.
- #2
- This file is output to the directory containing the core dump file that has been obtained.
- #3
- _JF* indicates an embedded database identifier (_JF0, _JF1, _JF2, and so on). A directory is created for each identifier.
- #4
- This file is created when additional-file is specified as the argument.
- For logical hosts:
- The data for first reports (/tmp/jp1ajs2/trouble/logical-host-name_1st.tar.Z#1)
| Name of directory or file containing collected data | Description |
|---|---|
| /shared-directory-name/jp1ajs2/backup | Directory containing backup files |
| /shared-directory-name/jp1ajs2/conf | Directory containing environment settings files |
| /shared-directory-name/jp1ajs2/jobinf | Directory containing job information files |
| /shared-directory-name/jp1ajs2/log | Directory containing log files |
| /shared-directory-name/jp1ajs2/sys | Directory containing the system files |
| /shared-directory-name/jp1ajs2/tmp | Directory containing work files |
| /shared-directory-name/jp1base/conf | Directory containing the environment settings files |
| /shared-directory-name/jp1base/log | Directory containing log files |
| /shared-directory-name/jp1ajs2/log/ajsagtshow.txt | Execution result of the ajsagtshow command |
| /shared-directory-name/jp1ajs2/log/ajsagtprint.txt | Execution result of the ajsagtprint command |
| /shared-directory-name/jp1ajs2/log/jajs_status.txt | Execution result of the jajs_status command |
| /tmp/jp1ajs2/trouble#1/EMBDB_logical-host-name/_JF*#2/conf | Embedded database definition file |
| /tmp/jp1ajs2/trouble#1/EMBDB_logical-host-name/_JF*#2/spool | Embedded database failure investigation file |
| /tmp/jp1ajs2/trouble#1/EMBDB_logical-host-name/_JF*#2/etc | Other information related to the embedded database that is needed for investigation |
- #1
- Output destination when the -f option is omitted.
- #2
- _JF* indicates an embedded database identifier (_JF0, _JF1, _JF2, and so on). A directory is created for each identifier.
- The data for second reports (/tmp/jp1ajs2/trouble/logical-host-name_2nd.tar.Z#1)
| Name of directory or file containing collected data | Description |
|---|---|
| /tmp/jp1ajs2/trouble#1/CARDIR_logical-host-name/ProgMon.shmdump, /tmp/jp1ajs2/trouble#1/CARDIR_logical-host-name/coreinfo-host.shmdump | Shared memory dumps |
| /shared-directory-name/jp1ajs2/database, /shared-directory-name/jp1ajs2cm/database, /tmp/jp1ajs2/trouble#1/embdatabase_logical-host-name/_JF*#2 | Database storage directories |
- #1
- Output destination when the -f option is omitted.
- #2
- _JF* indicates an embedded database identifier (_JF0, _JF1, _JF2, and so on). A directory is created for each identifier.
(2) Obtain the contents of the core file
Obtain the contents of the core file if the file has been output.
The core file is output to one of the following directories:
- /opt/jp1ajs2/bin#1
- /var/opt/jp1ajs2/database#1
- /var/opt/jp1ajs2cm/database#1
- User home directory#2
- Current directory in which the command was executed
- #1
- The data collection tool can be used to collect data.
- #2
- If the core file was output after connection from JP1/AJS3 - View, this directory is the home directory of the mapped OS user.
If you want to collect only the information needed for analysis of the core file, use the ajs2collectcore command. For details about this command, see ajs2collectcore (UNIX only) in 2. Commands in the manual Job Management Partner 1/Automatic Job Management System 3 Command Reference 1.
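To see quickly whether a core file exists in any of the directories listed above, a check along the following lines could be used. This is a sketch under assumptions: find_core_files is a hypothetical helper, and matching file names that begin with "core" is an assumption, because core file naming depends on the OS and its settings.

```sh
# List regular files whose names begin with "core" in the given directories.
# find_core_files is a hypothetical helper, not part of JP1/AJS3.
find_core_files() {
    for dir in "$@"; do
        [ -d "$dir" ] || continue          # ignore directories that do not exist
        for f in "$dir"/core*; do
            [ -f "$f" ] && echo "$f"
        done
    done
    return 0
}

# Example (not executed here):
# find_core_files /opt/jp1ajs2/bin /var/opt/jp1ajs2/database /var/opt/jp1ajs2cm/database "$HOME" .
```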
(3) Check the status of processes
Use the ps command to check the operating status of processes.
For details about JP1/AJS3 processes, see B.3 Processes (for UNIX).
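As an illustration, the output of the ps command can be narrowed to the lines of interest with a small filter. The helper name filter_processes and the default pattern "ajs" are assumptions made for this sketch; take the actual process names from B.3 Processes (for UNIX).

```sh
# Keep the header line of ps output plus every line that matches a pattern.
# filter_processes is a hypothetical helper; the default pattern is an assumption.
filter_processes() {
    pattern=$1
    [ -n "$pattern" ] || pattern='ajs'
    # NR == 1 keeps the column header; the second condition keeps matching lines.
    awk -v pat="$pattern" 'NR == 1 || $0 ~ pat'
}

# Example (not executed here):
# /usr/bin/ps -elf | filter_processes
```

Note that the filter itself might appear in the result, because its own command line contains the pattern.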
(4) Check the operation data
For the problem that has occurred, check the operation data and record it. You need to check the following information:
- Details about the operation
- Time that the problem occurred
- Machine configuration (the version of each OS, host name, JP1/AJS3 - Manager and JP1/AJS3 - Agent configuration, and JP1/AJS3 Console Manager and JP1/AJS3 Console Agent configuration)
You can check the machine configuration by executing a command. The following table lists the commands you can use to check the machine configuration for each OS.
Table 1-17 UNIX commands that can be used to check the machine configuration
| OS | Command for checking the OS version | Command for checking the size of physical memory on the host | Command for checking the process information and required memory size |
|---|---|---|---|
| HP-UX | /usr/bin/uname -a | /usr/sbin/dmesg | /usr/bin/ps -elf |
| Solaris | /usr/bin/uname -a | /usr/sbin/prtconf | /usr/bin/ps -elf |
| AIX | /usr/bin/uname -a | /usr/sbin/bootinfo -r | /usr/bin/ps -elf |
- Note
- The options used in each command in the above table are typical options of the respective OSs. How the options are specified might vary depending on the environment being used. For details, see the documentation for the applicable OS.
- Whether the problem is reproducible
- Names of the users, if any, who logged in from JP1/AJS3 - View or JP1/AJS3 Console View
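The machine configuration commands in Table 1-17 can be gathered into one record with a short script. The following is a sketch only: memory_command_for and record_machine_config are hypothetical names, and the mapping relies on the OS name that uname -s reports (Solaris reports SunOS).

```sh
# Print the memory-size command from Table 1-17 for an OS name as reported
# by "uname -s". Returns 1 for an OS that is not listed in the table.
memory_command_for() {
    case $1 in
        HP-UX) echo "/usr/sbin/dmesg" ;;
        SunOS) echo "/usr/sbin/prtconf" ;;        # uname -s reports Solaris as SunOS
        AIX)   echo "/usr/sbin/bootinfo -r" ;;
        *)     return 1 ;;
    esac
}

# Write the OS version, memory information, and process list to one file.
# record_machine_config is a hypothetical helper, not part of JP1/AJS3.
record_machine_config() {
    out_file=$1
    {
        echo "== OS version =="
        uname -a
        echo "== Physical memory =="
        if mem_cmd=$(memory_command_for "$(uname -s)"); then
            $mem_cmd    # intentionally unquoted so that "bootinfo -r" splits into command and option
        else
            echo "no command listed for this OS"
        fi
        echo "== Processes =="
        ps -elf
    } > "$out_file" 2>&1
}
```

For example, record_machine_config /tmp/machine_config.txt writes all three items to a single file that can be kept with the operation data.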
(5) Collect information about the embedded database
You need the following information if an error occurs while you are using the embedded database:
- Data needed to investigate the cause
- Information needed to re-create the environment of the embedded database
The following describes how to collect the above information.
(a) Information required to investigate the cause
To investigate the cause of an error, you mainly need to collect information about the OS and information about the embedded database. Use OS commands to collect information about the OS. Use the embedded database commands to collect information about the embedded database.
If you use the support service to solve problems, you must submit the information listed in the table below. This table describes the type of information needed to investigate the cause of problems and how to collect information for each type of problem. Problems are classified into seven levels, of which level 1 has the highest priority.
Problems are grouped by type as follows:
- Performance
The following processing or operation takes too much time:
- Startup of the embedded database system (including normal startup, restart, and startup after action has been taken for a failure)
- Stopping of the embedded database system (including normal stopping and forced stopping)
- Execution of an operation command for the embedded database
- No response
A response is not returned when the following processing or operation is performed:
- Startup of the embedded database system (including normal startup, restart, and startup after action has been taken for a failure)
- Stopping of the embedded database system (including normal stopping and forced stopping)
- Execution of an operation command for the embedded database
- Abnormal end
One of the following has occurred:
- Abnormal termination of the embedded database system
- Abnormal termination of an embedded database process
- Abnormal termination of an operation command for the embedded database
Table 1-18 Information needed to investigate the cause of failures and how to collect that information
| No. | Component | Information to be collected | Collection method | Performance | No response | Abnormal end |
|---|---|---|---|---|---|---|
| 1 | OS | syslog | Use an OS function (command). | 1 | 1 | 1 |
| 2 | OS | CPU usage rate and device status | Use an OS command, such as the sar command, to collect the information. For details about the commands, see the documentation for the OS. | 3 | 4 | 3 |
| 3 | OS | CPU running status and memory status for processes | Use an OS command, such as the top command, to collect the information. For details about the commands, see the documentation for the OS. | 3 | 4 | 3 |
| 4 | OS | Virtual memory | Use an OS command, such as the vmstat command, to collect the information. For details about the commands, see the documentation for the OS. | 3 | 4 | 3 |
| 5 | OS | Network status | Use an OS command, such as the netstat command, to collect the information. For details about the commands, see the documentation for the OS. | 3 | 4 | 3 |
| 6 | Embedded database | Information about embedded database failures | Obtain the files under the following directories and store them on a DAT or another storage device: embedded-database-practical-directory/spool and embedded-database-practical-directory/tmp. An error log file and a command log file are output to the above directories. | 2 | 2 | 2 |
| 7 | Embedded database | Error log file | The error log is output to a file under embedded-database-practical-directory/spool/errlog. | 2 | 2 | 2 |
| 8 | Embedded database | Command log file | The command log is output to a file under embedded-database-practical-directory/spool/cmdlog. | 2 | 2 | 2 |
| 9 | Embedded database | Embedded database system definitions | Obtain the files under embedded-database-practical-directory/conf and store them on a DAT or another storage device. | 4 | 5 | 4 |
| 10 | Embedded database | SQL trace file and error log file | Obtain the output files and store them on a DAT or another storage device. A file name begins with pderr or pdsql. | -- | 6 | 5 |
| 11 | Embedded database | System log file | Use the ajsembdboplog command to unload the system log. Obtain the unload log file and store it on a DAT or another storage device. | 6 | 7 | 6 |
- Legend:
- --: Information need not be collected.
- Note
- A file to which command output is continuously redirected keeps growing at a fixed rate and takes up disk space. To avoid this, create general-purpose shell scripts that switch the output file and reuse files after several generations.
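A general-purpose script of the kind mentioned in the note could look like the following. This is a minimal sketch: rotate_log is a hypothetical helper, the default of three generations is arbitrary, and the file names in the example are placeholders.

```sh
# Rotate a log file through a fixed number of generations:
# file.(n-1) becomes file.n, ..., and file becomes file.1.
# The oldest generation is overwritten, so disk usage stays bounded.
# rotate_log is a hypothetical helper, not part of JP1/AJS3.
rotate_log() {
    log_file=$1
    generations=$2
    [ -n "$generations" ] || generations=3
    i=$generations
    while [ "$i" -gt 1 ]; do
        prev=$((i - 1))
        [ -f "$log_file.$prev" ] && mv -f "$log_file.$prev" "$log_file.$i"
        i=$prev
    done
    [ -f "$log_file" ] && mv -f "$log_file" "$log_file.1"
    : > "$log_file"    # start a new, empty current file
}

# Example (not executed here): switch files, then append one hour of samples.
# rotate_log /var/tmp/vmstat.log 3
# vmstat 60 60 >> /var/tmp/vmstat.log
```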
(b) Information needed to re-create the embedded database environment
If a problem occurs during operation of the embedded database, you might need to re-create the environment that produced the problem to test whether the problem is reproducible or to investigate the cause of the problem. To enable this procedure, collect the following information needed to re-create the embedded database environment.
- conf under the embedded database practical directory (if the user has changed the definition files)
- Environment variables related to the embedded database
- Data in the embedded database
Use the ajsembdbrorg command to collect the data in the embedded database.
To collect the information needed to re-create the embedded database environment:
1. Start the embedded database.
2. Execute the ajsembdbrorg command with the -k unld option specified.
3. Save conf under the embedded database practical directory in a folder of your choice.
4. Record the environment variables related to the embedded database.
For details about how to use the ajsembdbrorg command and a description of the command, see 10.2.2 Reorganizing a database in the Job Management Partner 1/Automatic Job Management System 3 Administration Guide.
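The last two steps of the procedure (saving conf and recording the environment variables) can be sketched as follows. This is an illustration under assumptions: save_embdb_environment is a hypothetical helper, and selecting environment variables whose names begin with PD is an assumption about which variables relate to the embedded database. Starting the embedded database and executing the ajsembdbrorg command are not covered by the sketch.

```sh
# Save a copy of the conf directory and record the environment variables
# related to the embedded database.
# save_embdb_environment is a hypothetical helper, not part of JP1/AJS3.
save_embdb_environment() {
    practical_dir=$1    # embedded database practical directory
    dest_dir=$2         # folder of your choice
    mkdir -p "$dest_dir" || return 1
    cp -Rp "$practical_dir/conf" "$dest_dir/" || return 1
    # Assumption: the relevant variable names begin with "PD". Adjust as needed.
    env | grep '^PD' > "$dest_dir/embdb_env.txt" || true
    return 0
}
```

The function creates dest-dir/conf and dest-dir/embdb_env.txt, which can be stored together with the unloaded data.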
Copyright (C) 2009, 2010, Hitachi, Ltd.
Copyright (C) 2009, 2010, Hitachi Solutions, Ltd.