Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


7.3.1 Checking the execution log for the monitoring command (when the ptrlcmd_ex or sby_ptrlcmd_ex operand is specified on a monitor-mode server)

The start result, termination result, and execution status of the monitoring command for a server are output to the execution log for the monitoring command. The execution status includes cases in which execution was skipped because the previously executed monitoring command had not yet returned a result.

You can examine the execution results and statuses of the monitoring command for a server by checking the log for that server. While the system is operating, the monitoring command execution log is updated continuously.

If a server failure is detected by the monitoring command for a server, HA Monitor makes a backup file (.bak) of the log. If the .bak file already exists, its extension is changed to .bak2, and the log is backed up to a new .bak file. In this way, three generations of log data are kept: the current log, the .bak file, and the .bak2 file.
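The rotation described above can be sketched as follows. This is only an illustration of the documented behavior, not HA Monitor's implementation: the function name rotate_on_failure is hypothetical, and copying (rather than renaming) the current log is an assumption.

```python
import os
import shutil
import tempfile


def rotate_on_failure(log_path: str) -> None:
    """Illustrate the documented rotation: .bak becomes .bak2, then the log is backed up to .bak."""
    bak = log_path + ".bak"
    bak2 = log_path + ".bak2"
    if os.path.exists(bak):
        # os.replace overwrites an existing .bak2, so at most three generations remain.
        os.replace(bak, bak2)
    # Assumption: the current log stays in place and is copied to the new .bak file.
    shutil.copyfile(log_path, bak)


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    log = os.path.join(workdir, "sv1_ptrlcmdlog")
    for generation in ("first failure", "second failure"):
        with open(log, "w") as f:
            f.write(generation)
        rotate_on_failure(log)
    print(sorted(os.listdir(workdir)))
```

After two simulated failures, the directory holds the current log, the .bak file (most recent failure), and the .bak2 file (second most recent failure).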

Important

If you need to check a server log while the server is operating, view the contents of the log only; do not save the log. If you save the log, log data that was output while you were viewing the log might be lost.

The following describes the execution log.


(1) Types of files

The following table lists execution log files. Each file name includes a server alias name, which is obtained from the value of the alias operand in the server environment definition (servers).

Table 7‒1: List of execution log files

| File name | Description |
| --- | --- |
| /opt/hitachi/HAmon/spool/ptrlcmd_ex/server-alias-name_ptrlcmdlog | This is the file of the current execution log for the monitoring command. |
| /opt/hitachi/HAmon/spool/ptrlcmd_ex/server-alias-name_ptrlcmdlog.bak | This is an execution log backup file created when the most recent server failure occurred. |
| /opt/hitachi/HAmon/spool/ptrlcmd_ex/server-alias-name_ptrlcmdlog.bak2 | This is an execution log backup file created when the second most recent server failure occurred. |
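As a small illustration of the naming rule in the table, the following sketch builds the three file names from a server alias name. The function name execution_log_paths and the dictionary keys are hypothetical; only the directory and the file name pattern come from the table.

```python
from pathlib import PurePosixPath

# Directory and naming pattern as listed in the table of execution log files.
LOG_DIR = PurePosixPath("/opt/hitachi/HAmon/spool/ptrlcmd_ex")


def execution_log_paths(alias: str) -> dict:
    """Return the current log and its two backup generations for a server alias name."""
    base = f"{alias}_ptrlcmdlog"
    return {
        "current": LOG_DIR / base,
        "most_recent_failure": LOG_DIR / f"{base}.bak",
        "second_most_recent_failure": LOG_DIR / f"{base}.bak2",
    }


if __name__ == "__main__":
    for label, path in execution_log_paths("sv1").items():
        print(f"{label}: {path}")
```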

(2) File contents

The following describes the contents of each execution log file.

The header section of each file consists of a comment on the file and the offset management information (offset), which is used for wrapping around the file.

File's header section
#    HA monitor ptrlcmd_ex Logging File
#
#      offset:131       
# DATE     TIME            [PID] [PPID] [PGID] [UID] DATA
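If a script needs the offset management information, it can be read from the header comments. The following is a minimal sketch that only extracts the number; the function name read_header_offset is hypothetical, and the sketch makes no assumption about how HA Monitor interprets the value when wrapping around the file.

```python
import re
from typing import Iterable, Optional

# Matches a header line such as "#      offset:131       ".
OFFSET_RE = re.compile(r"^#\s*offset:(\d+)\s*$")


def read_header_offset(lines: Iterable[str]) -> Optional[int]:
    """Return the offset value from the header section, or None if it is not found."""
    for line in lines:
        if not line.startswith("#"):
            break  # the header comments are over and the data section has begun
        match = OFFSET_RE.match(line)
        if match:
            return int(match.group(1))
    return None


if __name__ == "__main__":
    header = [
        "#    HA monitor ptrlcmd_ex Logging File",
        "#",
        "#      offset:131       ",
        "# DATE     TIME            [PID] [PPID] [PGID] [UID] DATA",
    ]
    print(read_header_offset(header))
```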
File's data section

● Entry point information

date time [process-ID] [parent-process-ID] [process-group-ID] [UID] [server-identification-name] operand-name command-line-information + entry-identifier

date: The date on which the command information was logged is output in YYYY/MM/DD format.

time: The time at which the command information was logged is output in HH:MM:SS.mmmmmm format. mmmmmm is the number of microseconds.

process-ID: The process ID of the monitoring command is output.

parent-process-ID: The parent process ID of the monitoring command is output.

process-group-ID: The process group ID of the monitoring command is output.

UID: The actual UID of the user who executed the monitoring command is output. If the actual UID cannot be obtained, 0 is output.

server-identification-name: The identification name of the server on which the monitoring command is executed is output.

operand-name: The name of the operand in which the executed monitoring command is specified is output.

command-line-information + entry-identifier: The command line of the monitoring command and the string start, which indicates an entry point, are output. The command line is enclosed in double quotation marks.

● Execution skip information

date time [process-ID] [parent-process-ID] [process-group-ID] [UID] [server-identification-name] operand-name command-line-information + entry-identifier

date: Same as date under Entry point information.

time: Same as time under Entry point information.

process-ID: The process ID of the process that requested output of the information is output.

parent-process-ID: The parent process ID of the process that requested output of the information is output.

process-group-ID: The process group ID of the process that requested output of the information is output.

UID: 0 is output.

server-identification-name: Same as server-identification-name under Entry point information.

operand-name: Same as operand-name under Entry point information.

command-line-information + entry-identifier: The string skip, which indicates that the previously executed monitoring command had not yet returned a result, and the process ID of that monitoring command (in pid:process-ID format) are output.

● Exit information

date time [process-ID] [parent-process-ID] [process-group-ID] [UID] [server-identification-name] operand-name command-line-information + entry-identifier termination-information

date: Same as date under Entry point information.

time: Same as time under Entry point information.

process-ID: Same as process-ID under Entry point information.

parent-process-ID: Same as parent-process-ID under Entry point information.

process-group-ID: 0 is output.

UID: 0 is output.

server-identification-name: Same as server-identification-name under Entry point information.

operand-name: Same as operand-name under Entry point information.

command-line-information + entry-identifier: The command line of the monitoring command and the string end, which indicates an exit point, are output. The command line is enclosed in double quotation marks.

termination-information: The termination status of the monitoring command is output. If the command terminated abnormally, the termination status is followed by the string no exit.
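The three record types described above can be split into fields with a single regular expression. The following is a minimal sketch, not a parser supplied with HA Monitor: the names LINE_RE, Record, and parse_line are hypothetical, and the pattern accepts the server identification name with or without square brackets because the format description and the examples in this subsection differ on that point.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

LINE_RE = re.compile(
    r"^(?P<date>\d{4}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}\.\d{6}) "
    r"\[(?P<pid>\d+)\] \[(?P<ppid>\d+)\] \[(?P<pgid>\d+)\] \[(?P<uid>\d+)\] "
    r"\[?(?P<server>[^\]\s]+)\]? (?P<operand>\S+) "
    r'(?:"(?P<command>.*)" )?: (?P<kind>start|skip|end)(?P<rest>.*)$'
)


class Record(NamedTuple):
    when: datetime
    pid: int
    ppid: int
    pgid: int
    uid: int
    server: str
    operand: str
    command: Optional[str]  # None for skip records, which carry no command line
    kind: str               # "start", "skip", or "end"
    detail: str             # "" for start, "pid:N" for skip, termination information for end


def parse_line(line: str) -> Optional[Record]:
    """Parse one data line; return None for header comments and other non-data lines."""
    m = LINE_RE.match(line.rstrip("\n"))
    if m is None:
        return None
    when = datetime.strptime(m["date"] + " " + m["time"], "%Y/%m/%d %H:%M:%S.%f")
    return Record(
        when, int(m["pid"]), int(m["ppid"]), int(m["pgid"]), int(m["uid"]),
        m["server"], m["operand"], m["command"], m["kind"], m["rest"].strip(" ,"),
    )


if __name__ == "__main__":
    sample = '2017/04/01 19:46:18.324938 [9725] [8461] [0] [0] sv1 ptrlcmd_ex "/opt/hitachi/HAmon/etc/patrol_ex.sh" : end 0'
    print(parse_line(sample))
```

Header lines that begin with # do not match the pattern, so a caller can pass every line of the file to parse_line and discard the None results.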

(3) Examples of information output to a file

The following shows examples of information that is output to an execution log file.

● Example of information output when the server was operating normally

#    HA monitor ptrlcmd_ex Logging File
#
#      offset:53614     
# DATE     TIME            [PID] [PPID] [PGID] [UID] DATA
...
...
2017/04/01 19:46:15.794243 [9725] [8461] [9725] [0] sv1 ptrlcmd_ex "/bin/sh -c /opt/hitachi/HAmon/etc/patrol_ex.sh 1>./spool/ptrlcmd_ex/sv1.log 2>&1" : start
2017/04/01 19:46:18.324938 [9725] [8461] [0] [0] sv1 ptrlcmd_ex "/opt/hitachi/HAmon/etc/patrol_ex.sh" : end 0
2017/04/01 19:46:23.752926 [9871] [9866] [9871] [0] sv1 ptrlcmd_ex "/bin/sh -c /opt/hitachi/HAmon/etc/patrol_ex.sh 1>./spool/ptrlcmd_ex/sv1.log 2>&1" : start
2017/04/01 19:46:28.253276 [9871] [9866] [0] [0] sv1 ptrlcmd_ex "/opt/hitachi/HAmon/etc/patrol_ex.sh" : end 0
...
...
#####

The string indicating that the monitoring command started (start) and the string indicating that the command terminated normally (end 0) are output repeatedly.
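When the log shows this pattern, pairing each start record with the end record that has the same process ID gives the run time of each monitoring command. The following is a minimal sketch under two assumptions: the lines are supplied in chronological order (because the file wraps around, they might need to be sorted by timestamp first), and the name run_durations is hypothetical.

```python
import re
from datetime import datetime

# Captures the timestamp, the process ID, the record type, and the termination status if present.
EVENT_RE = re.compile(r"^(\S+ \S+) \[(\d+)\] .*: (start|end)(?: (\d+))?")


def run_durations(lines):
    """Return (pid, termination status, elapsed seconds) for each start/end pair."""
    started = {}
    results = []
    for line in lines:
        m = EVENT_RE.match(line)
        if m is None:
            continue  # header comments, skip records, and elided lines
        when = datetime.strptime(m.group(1), "%Y/%m/%d %H:%M:%S.%f")
        pid = int(m.group(2))
        if m.group(3) == "start":
            started[pid] = when
        elif pid in started:
            status = int(m.group(4)) if m.group(4) else None
            elapsed = (when - started.pop(pid)).total_seconds()
            results.append((pid, status, elapsed))
    return results


if __name__ == "__main__":
    sample = [
        '2017/04/01 19:46:15.794243 [9725] [8461] [9725] [0] sv1 ptrlcmd_ex "/bin/sh -c /opt/hitachi/HAmon/etc/patrol_ex.sh 1>./spool/ptrlcmd_ex/sv1.log 2>&1" : start',
        '2017/04/01 19:46:18.324938 [9725] [8461] [0] [0] sv1 ptrlcmd_ex "/opt/hitachi/HAmon/etc/patrol_ex.sh" : end 0',
    ]
    for pid, status, elapsed in run_durations(sample):
        print(pid, status, elapsed)
```

Skip records are ignored here because their process ID is that of the process that requested the output, not that of the monitoring command.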

● Example of information output when the monitoring command timed out due to slowdown of the command or server processing

#    HA monitor ptrlcmd_ex Logging File
#
#      offset:53614     
# DATE     TIME            [PID] [PPID] [PGID] [UID] DATA
...
...
2017/04/01 19:46:15.794243 [9725] [8461] [9725] [0] sv1 ptrlcmd_ex "/bin/sh -c /opt/hitachi/HAmon/etc/patrol_ex.sh 1>./spool/ptrlcmd_ex/sv1.log 2>&1" : start
2017/04/01 19:46:25.811760 [8461] [8385] [8384] [0] sv1 ptrlcmd_ex : skip, pid:9725
2017/04/01 19:46:35.811760 [8461] [8385] [8384] [0] sv1 ptrlcmd_ex : skip, pid:9725
2017/04/01 19:46:45.811760 [8461] [8385] [8384] [0] sv1 ptrlcmd_ex : skip, pid:9725
2017/04/01 19:46:55.811760 [8461] [8385] [8384] [0] sv1 ptrlcmd_ex : skip, pid:9725
...
...
#####

Although the string indicating that the monitoring command started (start) was output, the string indicating that the command ended (end) was not output because the command did not terminate. After the monitoring command was started, the string indicating that command execution was skipped (skip) was output repeatedly until the command timed out. In such a case, also check the details log to determine why the monitoring command did not terminate.
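This pattern can be detected mechanically by looking for start records that have no matching end record and counting the skip records that refer to the same process ID. The following is a minimal sketch; the name unfinished_commands is hypothetical, and the lines are assumed to be in chronological order.

```python
import re

START_END_RE = re.compile(r"^\S+ \S+ \[(\d+)\] .*: (start|end)\b")
SKIP_RE = re.compile(r": skip, pid:(\d+)\s*$")


def unfinished_commands(lines):
    """Return {pid: number of skip records} for commands that logged start but not end."""
    open_pids = {}
    for line in lines:
        skipped = SKIP_RE.search(line)
        if skipped:
            pid = int(skipped.group(1))  # process ID of the command that has not returned
            if pid in open_pids:
                open_pids[pid] += 1
            continue
        m = START_END_RE.match(line)
        if m is None:
            continue
        pid = int(m.group(1))
        if m.group(2) == "start":
            open_pids[pid] = 0
        else:
            open_pids.pop(pid, None)  # the command returned, so it is no longer pending
    return open_pids


if __name__ == "__main__":
    sample = [
        '2017/04/01 19:46:15.794243 [9725] [8461] [9725] [0] sv1 ptrlcmd_ex "/bin/sh -c /opt/hitachi/HAmon/etc/patrol_ex.sh 1>./spool/ptrlcmd_ex/sv1.log 2>&1" : start',
        "2017/04/01 19:46:25.811760 [8461] [8385] [8384] [0] sv1 ptrlcmd_ex : skip, pid:9725",
        "2017/04/01 19:46:35.811760 [8461] [8385] [8384] [0] sv1 ptrlcmd_ex : skip, pid:9725",
    ]
    print(unfinished_commands(sample))
```

A non-empty result identifies the monitoring command to investigate in the details log.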

● Example of information output when the monitoring command resulted in an error (with the termination status 10)

#    HA monitor ptrlcmd_ex Logging File
#
#      offset:53614     
# DATE     TIME            [PID] [PPID] [PGID] [UID] DATA
...
...
2017/04/01 19:46:15.794243 [9725] [8461] [9725] [0] sv1 ptrlcmd_ex "/bin/sh -c /opt/hitachi/HAmon/etc/patrol_ex.sh 1>./spool/ptrlcmd_ex/sv1.log 2>&1" : start
2017/04/01 19:46:18.324938 [9725] [8461] [0] [0] sv1 ptrlcmd_ex "/opt/hitachi/HAmon/etc/patrol_ex.sh" : end 10
...
...
#####

The string indicating that the monitoring command started (start) and the string indicating that the monitoring command failed (end 10) have been output. In such a case, also check the details log to determine why the monitoring command failed.
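To locate such records in a long log, the end records can be filtered by termination status. The following is a minimal sketch; the name failed_runs is hypothetical, and the handling of abnormal termination assumes that the string no exit appears after the status on the same line, as described for termination-information.

```python
import re

# Captures the timestamp, the process ID, the termination status, and any trailing text.
END_RE = re.compile(r"^(\S+ \S+) \[(\d+)\] .*: end (\d+)(.*)$")


def failed_runs(lines):
    """Return one dict per end record with a nonzero status or an abnormal termination."""
    failures = []
    for line in lines:
        m = END_RE.match(line.rstrip("\n"))
        if m is None:
            continue
        status = int(m.group(3))
        abnormal = "no exit" in m.group(4)  # assumed placement of the "no exit" string
        if status != 0 or abnormal:
            failures.append(
                {"time": m.group(1), "pid": int(m.group(2)), "status": status, "abnormal": abnormal}
            )
    return failures


if __name__ == "__main__":
    sample = [
        '2017/04/01 19:46:15.794243 [9725] [8461] [9725] [0] sv1 ptrlcmd_ex "/bin/sh -c /opt/hitachi/HAmon/etc/patrol_ex.sh 1>./spool/ptrlcmd_ex/sv1.log 2>&1" : start',
        '2017/04/01 19:46:18.324938 [9725] [8461] [0] [0] sv1 ptrlcmd_ex "/opt/hitachi/HAmon/etc/patrol_ex.sh" : end 10',
    ]
    for failure in failed_runs(sample):
        print(failure)
```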