uCosminexus Application Server, Maintenance and Migration Guide


3.3.1 Data acquisition settings using failure detection time commands (Systems for executing J2EE applications)

This subsection describes how to specify the settings to acquire the data for troubleshooting using the failure detection time commands. Note that you can collect the material acquired by the failure detection time commands as the snapshot log.

There are two types of failure detection time commands: commands that the system provides and commands that the user creates.

By default, when an error occurs in a logical server, the failure detection time commands provided by the system are executed, and thread dumps and trace based performance analysis files are acquired. The snapshot log is collected before the logical server where the error occurred is terminated. For the information that can be acquired by executing the failure detection time commands provided by the system, see 2.3.2(1) Information that can be acquired by executing the failure detection time commands provided by the system.

To change the operation settings of the failure detection time commands provided by the system, you need to specify (1) Environment settings in the Management Server and (2) Environment settings in the Administration Agent. To use the user created failure detection time commands, you need to specify (1) and (2), and also perform (3) Creating a command file of the user created failure detection time commands. Each setting is described in (1) to (3) below.

Note
To collect the material acquired by the user created failure detection time commands as the snapshot log, you need to add the location of that material to the snapshot log collection destinations. For details about adding a snapshot log collection destination, see 3.3.3(3) Customizing the snapshot log collection destination.
Organization of this subsection
(1) Environment settings in the Management Server
(2) Environment settings in the Administration Agent
(3) Creating a command file of the user created failure detection time commands

(1) Environment settings in the Management Server

Use mserver.properties (the environment settings file of the Management Server) to specify the operation of the failure detection time commands. Specify the following keys.

Key | Description | Setting requirement (System) | Setting requirement (User)
com.cosminexus.mngsvr.sys_cmd.abnormal_end.enabled | Specify whether to use system provided failure detection time commands. The default setting is true (use). | O | --
com.cosminexus.mngsvr.usr_cmd.abnormal_end.enabled | Specify whether to use the user created failure detection time commands. The default setting is false (do not use). | -- | R
com.cosminexus.mngsvr.sys_cmd.abnormal_end.timeout | Specify the waiting period for termination of system provided failure detection time commands. If the command does not terminate even after the specified time lapses, the user recovery process continues. | O | --
com.cosminexus.mngsvr.usr_cmd.abnormal_end.timeout | Specify the waiting period for termination of the user created failure detection time commands. | -- | O
com.cosminexus.mngsvr.snapshot.auto_collect.enabled | Specify whether to acquire the snapshot log when an error occurs or for batch restart. The default setting is true (acquire the snapshot log). | O | O
com.cosminexus.mngsvr.snapshot.collect.point | Specify one of the following as the snapshot log collection timing: before terminating the logical server, or before restarting the J2EE server. The default timing is before terminating the logical server. | O | O

Legend:
System: Setting requirement when using the system provided failure detection time commands.
User: Setting requirement when using the user created failure detection time commands.
R: Required
O: Required only when changing the default settings.
--: Not required
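
As an illustration, the following mserver.properties fragment enables the user created failure detection time commands and leaves the other Boolean keys at the defaults stated in the table. It is a minimal sketch: only keys whose values are given in this subsection are shown, and the timeout and collection timing keys are omitted because their value formats are not described here.

```properties
# Use the system provided failure detection time commands (default: true).
com.cosminexus.mngsvr.sys_cmd.abnormal_end.enabled=true
# Use the user created failure detection time commands (default: false).
com.cosminexus.mngsvr.usr_cmd.abnormal_end.enabled=true
# Acquire the snapshot log when an error occurs or for batch restart (default: true).
com.cosminexus.mngsvr.snapshot.auto_collect.enabled=true
```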

(2) Environment settings in the Administration Agent

Use adminagent.properties (Administration Agent property file) to specify the material to be acquired by the failure detection time commands.

In the following keys of adminagent.properties, specify the number of snapshot log files to be collected, whether to collect material using the failure detection time commands, and the path of the failure detection time commands. For details on the files defining the snapshot log collection target, see 3.3.3(3) Customizing the snapshot log collection destination.

Key | Description | Setting requirement (System) | Setting requirement (User)
adminagent.snapshotlog.num_snapshots | Specify the number of snapshot log files to be collected as the primary delivery data for each logical server. | O | O
adminagent.snapshotlog.listfile.2.num_snapshots | Specify the number of snapshot log files to be collected as the secondary delivery data for each logical server. | O | O
adminagent.j2ee.sys_cmd.abnormal_end.threaddump | Specify whether to acquire thread dumps using the system provided failure detection time commands. | O | --
adminagent.sys_cmd.abnormal_end.prftrace | Specify whether to acquire the trace based performance analysis file using the system provided failure detection time commands. | O | --
adminagent.logical-server-type.usr_cmd.abnormal_end | Specify the path of failure detection time commands to be executed for each type of logical server. | -- | R

Legend:
System: Setting requirement when using the system provided failure detection time commands.
User: Setting requirement when using the user created failure detection time commands.
R: Required
O: Required only when changing the default settings.
--: Not required
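
As a rough sketch, an adminagent.properties entry that registers a user created command for J2EE servers might look like the following. Two things here are assumptions rather than facts from this subsection: that j2ee is the value substituted for logical-server-type (inferred from the adminagent.j2ee.sys_cmd.abnormal_end.threaddump key), and the script path, which is purely hypothetical.

```properties
# "j2ee" replaces logical-server-type (assumption); the path is a hypothetical example.
adminagent.j2ee.usr_cmd.abnormal_end=/home/admin/usrcmd/j2ee_failure.sh
```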

(3) Creating a command file of the user created failure detection time commands

Code the user created failure detection time commands in a command file (a batch file or a shell script file). In the command file, you can reference the environment variables described in the following table to use information about the logical server where the error occurred and about the error itself.

Table 3-7 Environment variables that you can code in the command file of the user created failure detection time commands

Environment variable | Description
COSMI_MNG_LSNAME | Logical server name of the logical server where the error occurred. When an error occurs in the naming service of the logical CTM, the logical server name of the logical CTM is set.
COSMI_MNG_RSNAME | Real server name of the logical server where the error occurred. For a logical server other than a J2EE server or SFO server, the logical server name is set.
COSMI_MNG_LSPID | Process IDs to be monitored when the logical server starts. When multiple process IDs are monitored on an indirectly started logical user server, the process IDs are separated by commas (,) in the order in which they are acquired by the command that is executed to acquire the process IDs when the logical user server starts.
COSMI_MNG_LSARGS | Command line used when the logical server was started.
COSMI_MNG_TIME_SUSPENDED | Time at which the hang-up was detected, expressed as the elapsed time (unit: ms) since 00:00:00 on January 1, 1970, Coordinated Universal Time (UTC). The value is set only if a hang-up is detected.
COSMI_MNG_TIME_TERMINATED | Time at which abnormal termination (process down) was detected, expressed as the elapsed time (unit: ms) since 00:00:00 on January 1, 1970, Coordinated Universal Time (UTC). The value is not set if a hang-up occurs.
COSMI_MNG_WEB_SYSTEM | Web system to which the logical server where the error occurred belongs. The value is not required if you do not use the Smart Composer function.
COSMI_MNG_TIER | Physical tier to which the logical server where the error occurred belongs. The value is not required if you do not use the Smart Composer function.
COSMI_MNG_UNIT | Service unit to which the logical server where the error occurred belongs. The value is not required if you do not use the Smart Composer function.
COSMI_MNG_HWS | Cosminexus HTTP Server installation directory.

The Management Server cannot acquire the standard output or the standard error output of commands executed as failure detection time commands. To keep this output, the command file must write it to a file during command execution.
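
One way to keep the output is to redirect it inside the command file itself. The following shell script is a minimal sketch of that idea: it records the error context from the environment variables in Table 3-7 and sends both standard output and standard error output to a file. The function name log_failure_context, the USRCMD_LOG_DIR override, and the /tmp/usrcmd_logs directory are hypothetical names chosen for this illustration.

```shell
#!/bin/sh
# Sketch of a user created failure detection time command.
# LOG_DIR is a hypothetical location, not one defined by the product.
LOG_DIR="${USRCMD_LOG_DIR:-/tmp/usrcmd_logs}"

# Append the error context to the file given as $1.
# Both stdout and stderr of the grouped commands go to that file,
# because the Management Server does not capture them.
log_failure_context() {
    {
        echo "logical server: ${COSMI_MNG_LSNAME:-unset}"
        echo "real server: ${COSMI_MNG_RSNAME:-unset}"
        echo "process IDs: ${COSMI_MNG_LSPID:-unset}"
        echo "hang-up detected (ms since epoch): ${COSMI_MNG_TIME_SUSPENDED:-unset}"
        echo "process down detected (ms since epoch): ${COSMI_MNG_TIME_TERMINATED:-unset}"
    } >>"$1" 2>&1
}

mkdir -p "$LOG_DIR"
log_failure_context "$LOG_DIR/failure_$$.log"
```

Any other command placed inside the braces has its output captured in the same file.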

(a) Examples of obtaining the user dump or core dump

The following examples execute the drwtsn32 command or the kill command to collect user dumps or core dumps when an error is detected in the J2EE server:
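
As a minimal sketch of the UNIX case, the script below sends a signal with the kill command to every process ID listed in COSMI_MNG_LSPID. The choice of SIGABRT (whose default action is to terminate the process with a core dump, provided that the system's core file limits allow it), the function name request_core_dumps, and the KILL_CMD override are assumptions made for this illustration.

```shell
#!/bin/sh
# Sketch only: the signal (ABRT) and the KILL_CMD override are assumptions.
KILL_CMD="${KILL_CMD:-kill}"

# Send SIGABRT to each PID in a comma-separated list such as "1234,5678".
request_core_dumps() {
    old_ifs=$IFS
    IFS=,
    # $1 is split on commas here, once, when the loop starts.
    for pid in $1; do
        IFS=$old_ifs
        if [ -n "$pid" ]; then
            $KILL_CMD -s ABRT "$pid"
        fi
    done
    IFS=$old_ifs
}

# COSMI_MNG_LSPID is set by the Management Server when the command is started.
if [ -n "${COSMI_MNG_LSPID:-}" ]; then
    request_core_dumps "$COSMI_MNG_LSPID"
fi
```

Splitting on commas matters because COSMI_MNG_LSPID can hold several process IDs for an indirectly started logical user server.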

(b) Example of obtaining the thread dump

The following is an example in which the cjdumpsv command is executed to obtain a thread dump of the J2EE server (real server name: J2EEServer) when a Web server error occurs.

In this example, the cjdumpsv command is executed multiple times so that you can check how the status of each thread changes over time. As a guideline, execute the cjdumpsv command about ten times at three-second intervals.
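
The repeated execution described above can be sketched as a simple loop. In this illustration, the cjdumpsv location in the CJDUMPSV variable, the function name take_thread_dumps, and the DUMP_LOG output file are assumptions; adjust them to your installation. The server name J2EEServer, the count of ten, and the three-second interval come from the description above.

```shell
#!/bin/sh
# Sketch only: the cjdumpsv path and the log location are assumptions.
CJDUMPSV="${CJDUMPSV:-/opt/Cosminexus/CC/server/bin/cjdumpsv}"

# take_thread_dumps SERVER_NAME COUNT INTERVAL_SECONDS
take_thread_dumps() {
    i=1
    while [ "$i" -le "$2" ]; do
        "$CJDUMPSV" "$1"
        # Wait between dumps, but not after the last one.
        if [ "$i" -lt "$2" ]; then
            sleep "$3"
        fi
        i=$((i + 1))
    done
}

# Run only if the command is actually installed at the assumed path.
if [ -x "$CJDUMPSV" ]; then
    take_thread_dumps J2EEServer 10 3 >>"${DUMP_LOG:-/tmp/cjdumpsv.log}" 2>&1
fi
```

The output of each cjdumpsv call is appended to a file because the Management Server does not capture it.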

(c) Operating user-created failure detection time commands

The logical CTM starts, stops, and monitors two processes: the global CORBA Naming Service and the CTM daemon. Different commands are executed depending on whether the error within the logical server is detected in the global CORBA Naming Service or in the CTM daemon.

Moreover, because an error can be detected in either of the two processes (the CTM daemon or the global CORBA Naming Service) in the logical CTM, the log that reports the startup of the failure detection time commands in the logical server (CTM) is output to the Management Server log.