Hitachi

JP1 Version 12 JP1/SNMP System Observer Description, Operator's Guide and Reference


2.5.2 Process and service monitoring

APM performs process monitoring on the server on which the monitored processes are running. SSO then determines the process status, service status, and application status based on the process status change events that it receives from APM.

In process and service monitoring, the following functions can be used:

  1. Monitored status management

  2. Threshold monitoring (process monitoring only)

  3. Monitoring of service operating states

  4. Automated action and remote command

Each of the above is explained below.

(1) Monitored status management

SSO manages the monitored status of each process and service. The following figure shows the monitored statuses managed by SSO and the timing of monitored status changes.

Figure 2‒59: Monitored statuses of a process

[Figure]

  1. Monitoring has started.

  2. Monitoring has stopped due to the following reasons:

    • The monitoring conditions were not successfully set for the monitoring server.

    • The APM of the monitoring server stopped.

    • The health check for the APM of the monitoring server failed.

    • The status change trap from APM was lost.

  3. The APM of the monitoring server started.

  4. Either of the following events occurred:

    • Either SSO stopped or the ssoapmon daemon process paused.#

    • Monitoring was stopped by the user.

# If monitoring stops because the ssoapmon daemon process paused, a status change event is not issued.

When the monitored status changes, SSO can issue a monitored status change event. For details on events, see G. Events.

(2) Threshold monitoring

When a threshold is set, you can check whether a monitored process has moved outside the threshold. The results of threshold monitoring are reported as the process status or application status. The status of a process or child process is one of three categories: Normal, Critical, or Unknown. The status of an application is one of four categories: Normal, Warning, Critical, or Unknown.

If the process monitoring status changes, a process status change event is issued. For details on these events, see G. Events.

The threshold is specified as a number of running processes. If multiple instances of a process with the same name run, or if a wildcard character is used in the process name, you must specify both a minimum (lower threshold) and a maximum (upper threshold) for the number of running processes. If the number of running processes moves outside the specified threshold range, the process status changes. The following table shows how the process status and child process status are determined.

Table 2‒16: Method for determining the status of a process or child process

| Status | Determination method |
|---|---|
| Normal | The number of running processes# of the monitored processes is within the threshold limit. |
| Critical | The number of running processes# of the monitored processes is outside the threshold limit. |
| Unknown | No process status change event exists. |

# Zombie processes that can be detected only if the OS of the agent is HP-UX or HP-UX (IPF) are not included in the number of running processes.
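The determination rule in Table 2‒16 can be sketched as follows. This is only an illustration, not SSO's implementation: the function name `process_status` is hypothetical, `None` stands in for "no process status change event exists", and treating the lower and upper thresholds as inclusive bounds is an assumption.

```python
from typing import Optional


def process_status(running: Optional[int], lower: int, upper: int) -> str:
    """Classify a process from its running-instance count.

    running is None when no process status change event exists.
    Assumption: the threshold range is inclusive at both ends.
    """
    if running is None:
        return "Unknown"
    if lower <= running <= upper:
        return "Normal"
    return "Critical"


print(process_status(3, 1, 5))     # inside the range
print(process_status(0, 1, 5))     # below the lower threshold
print(process_status(None, 1, 5))  # no event exists
```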

SSO determines application status in accordance with process status. The following table lists how application status is determined.

Table 2‒17: Method for determining the status of an application

| Status | Determination method |
|---|---|
| Normal | All the processes are normal. |
| Warning | There is at least one normal and one critical process. In addition, unknown processes do not exist. |
| Critical | All the processes are critical. |
| Unknown | At least one process is unknown. |
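The aggregation in Table 2‒17 can be sketched as follows. The function name `application_status` is hypothetical, and returning Unknown for an application with no monitored processes is an assumption, because the table does not cover that case. The same rule applies when the inputs are service statuses.

```python
from typing import Iterable


def application_status(statuses: Iterable[str]) -> str:
    """Derive an application status from its process (or service) statuses."""
    items = list(statuses)
    # Assumption: an empty list is treated as Unknown.
    if not items or "Unknown" in items:
        return "Unknown"
    if all(s == "Normal" for s in items):
        return "Normal"
    if all(s == "Critical" for s in items):
        return "Critical"
    # Only Normal and Critical remain, with at least one of each.
    return "Warning"


print(application_status(["Normal", "Critical", "Normal"]))
```

Checking for Unknown first reflects the table's rule that a single unknown process makes the whole application Unknown, regardless of the other statuses.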

(a) Notes on threshold monitoring

Generally, in UNIX, when a given process generates a child process, that child process temporarily inherits the process name, command line name, and other execution environment settings of the parent process. For this reason, when SSO monitors processes on UNIX, the number of running monitored processes might include the number of their child processes.

Consequently, for processes monitored on UNIX, if the upper threshold is set without taking the number of child processes into consideration, the threshold might be exceeded and Critical status reported even though the process is operating normally. If the OS of the monitoring server is UNIX, you must therefore tune the value of the upper threshold. Set the upper threshold value as shown below for each monitoring-target process and its child processes.

If the maximum number of processes that concurrently exist as child processes of a monitoring-target process (or child process) is known:

Assume that the maximum number of instances of a process (or child process) to be monitored is m, and the maximum number of child processes that concurrently exist per process is n. Then, set the value obtained from the following calculation formula for the upper threshold of that process (or child process):

m × (1 + n)

However, if the result of the above calculation exceeds 9999, set 9999.

If the maximum number of processes that exist simultaneously as the child processes of a monitoring-target process (or child process) is unknown:

Set 9999.
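As a worked example of the rule above, the following sketch computes the upper threshold. The function name `upper_threshold` is hypothetical, and passing `None` for n to mean "the maximum number of concurrent child processes is unknown" is a convention chosen for this illustration.

```python
from typing import Optional

MAX_UPPER_THRESHOLD = 9999  # cap stated in the text above


def upper_threshold(m: int, n: Optional[int]) -> int:
    """Return the upper threshold for a monitored process on UNIX.

    m: maximum number of instances of the monitored process
    n: maximum number of concurrent child processes per instance,
       or None if that number is unknown
    """
    if n is None:
        return MAX_UPPER_THRESHOLD
    return min(m * (1 + n), MAX_UPPER_THRESHOLD)


print(upper_threshold(3, 4))      # 3 x (1 + 4)
print(upper_threshold(2000, 5))   # result exceeds 9999, so it is capped
print(upper_threshold(3, None))   # unknown child process count
```

For example, with m = 3 and n = 4 the upper threshold is 3 × (1 + 4) = 15.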

(3) Monitoring of service operating states

If you have mapped service states and service operating states, you can manage the operating status of monitored services by using the service monitoring status and application status. There are three service monitoring statuses (Normal, Critical, and Unknown) and four application statuses (Normal, Warning, Critical, and Unknown).

If the status of a service changes, a service status change event is issued. If the status of an application changes, an application status change event is issued. For details on these events, see G. Events.

The following table lists how the service monitoring status is determined.

Table 2‒18: Method for determining the service monitoring status

| Status | Determination method |
|---|---|
| Normal | The monitored service is operating in the status that was set as Normal in the service operating status mapping. |
| Critical | The monitored service is operating in the status that was set as Critical in the service operating status mapping. |
| Unknown | No service status change event has been issued. Alternatively, the specified service does not exist on the monitoring server. |
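The lookup described in Table 2‒18 can be sketched as follows. Everything here is illustrative: the function name, the service state names in the example mapping ("Running", "Stopped"), and the choice to return Unknown for a state that is missing from the mapping are assumptions, not SSO's actual mapping format or behavior.

```python
from typing import Mapping, Optional


def service_monitoring_status(state: Optional[str],
                              mapping: Mapping[str, str]) -> str:
    """Map a reported service state to a service monitoring status.

    state is None when no service status change event has been issued
    or when the service does not exist on the monitoring server.
    """
    if state is None:
        return "Unknown"
    # Assumption: a state with no mapping entry is treated as Unknown.
    return mapping.get(state, "Unknown")


# Hypothetical service operating status mapping.
example_mapping = {"Running": "Normal", "Stopped": "Critical"}
print(service_monitoring_status("Running", example_mapping))
print(service_monitoring_status(None, example_mapping))
```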

SSO determines application status in accordance with service status. The following table lists how application status is determined.

Table 2‒19: Method for determining the status of an application

| Status | Determination method |
|---|---|
| Normal | All the services are normal. |
| Warning | There is at least one normal and one critical service. In addition, unknown services do not exist. |
| Critical | All the services are critical. |
| Unknown | At least one service is unknown. |

(4) Automated actions and remote commands

A command can be automatically executed when an application status changes. Commands can be set individually for the normal, warning, and critical regions. These commands can be executed on the monitoring manager and on the monitoring server. Executing a command automatically on the monitoring manager or monitoring server is called an automated action. Commands that are executed either automatically or on demand on the monitoring server are called remote commands. You can also specify variables in a command. For details on the variables you can define, see H. Variables That Can Be Defined via Automated Action.

From the Process Monitor window, you can execute on demand the commands that were registered by the monitoring application on the monitoring server. You cannot specify variables for commands that are executed on demand.

To execute commands as automatic actions, you must be a superuser (in Linux) or a member of the Administrators group (in Windows). To execute remote commands in Linux, you must be a superuser. To execute remote commands in Windows, you must have permission to log on to SNMP System Observer - Agent for Process, which is an APM service.

If you run batch files on Windows, add cmd /q /c at the beginning of the command line. For example, to execute C:\temp\aaa.bat, specify cmd /q /c C:\temp\aaa.bat.
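If command lines are generated by a script, the batch file rule can be applied with a small helper such as the following sketch. The function name is hypothetical, recognizing batch files by the .bat or .cmd extension is an assumption, and paths that contain spaces or need quoting are not handled.

```python
def action_command_line(target: str) -> str:
    """Prefix Windows batch files with 'cmd /q /c'; leave other commands as-is."""
    # Assumption: a batch file is identified by its .bat or .cmd extension.
    if target.lower().endswith((".bat", ".cmd")):
        return "cmd /q /c " + target
    return target


print(action_command_line(r"C:\temp\aaa.bat"))
```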

The following lists the triggers for the execution of automatic actions and remote commands.

Note that automatic actions are executed according to the application status determined when the ssoapmon daemon process is started, as shown in the following table.

Table 2‒20: Application statuses and automatic action execution triggers

| Application status when ssoapmon is started | Automatic action execution trigger |
|---|---|
| Normal | Unknown -> Normal |
| Warning | Unknown -> Warning |
| Critical | Unknown -> Critical |
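The startup behavior in Table 2‒20 can be sketched as follows: the application status determined at startup is treated as a change from Unknown, and the command set for that status is selected. The function name, the dictionary of commands, and the example command strings are all hypothetical. Returning no command when the status at startup is Unknown is an assumption, because the table lists no trigger for that case.

```python
from typing import Mapping, Optional, Tuple


def startup_trigger(status_at_start: str,
                    commands: Mapping[str, str]
                    ) -> Optional[Tuple[str, str, Optional[str]]]:
    """Return (old status, new status, command) for the startup trigger.

    commands maps "Normal", "Warning", and "Critical" to the command
    set for that status; a status with no command yields None.
    """
    if status_at_start not in ("Normal", "Warning", "Critical"):
        # Assumption: no automatic action runs for Unknown at startup.
        return None
    return ("Unknown", status_at_start, commands.get(status_at_start))


# Hypothetical commands set for each status.
example_commands = {
    "Normal": "/opt/scripts/notify_ok.sh",
    "Critical": "/opt/scripts/notify_down.sh",
}
print(startup_trigger("Critical", example_commands))
```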