Job Management Partner 1/Base User's Guide



16.5.4 Errors detected by the health check function

The health check function can detect errors in the JP1/Base processes. The following describes the causes and recovery actions for errors detected by the health check function.

Organization of this subsection
(1) A large amount of system resources (CPU, disk, and other resources) is being consumed, or the number of process requests exceeds the performance limit
(2) The command process does not end as expected, or the command process does not end and still retains system resources
(3) A process is in a deadlock or infinite loop
(4) Unable to connect to the host to be monitored

(1) A large amount of system resources (CPU, disk, and other resources) is being consumed, or the number of process requests exceeds the performance limit

Cancel any processing that places a high load on the system.

(2) The command process does not end as expected, or the command process does not end and still retains system resources

Using an OS function such as the kill command, forcibly end the command process.
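
On UNIX, one cautious way to end a command process is to send SIGTERM first and fall back to SIGKILL only if the process is still present after a grace period. The following Python sketch illustrates that approach. The helper name force_end and the grace period are illustrative assumptions, not part of JP1/Base, and the PID of the command process must be identified beforehand (for example, with the ps command).

```python
import os
import signal
import time


def force_end(pid: int, grace_seconds: float = 5.0) -> str:
    """End the process `pid` (hypothetical helper, UNIX only).

    Sends SIGTERM, waits up to `grace_seconds`, then sends SIGKILL if the
    PID still exists. Returns "not found", "terminated", or "killed"
    ("killed" means only that SIGKILL was sent).
    """
    try:
        os.kill(pid, signal.SIGTERM)  # equivalent to: kill <pid>
    except ProcessLookupError:
        return "not found"

    deadline = time.monotonic() + grace_seconds
    while time.monotonic() < deadline:
        try:
            os.kill(pid, 0)  # signal 0 only checks that the PID exists
        except ProcessLookupError:
            return "terminated"
        time.sleep(0.1)

    try:
        os.kill(pid, signal.SIGKILL)  # equivalent to: kill -9 <pid>
    except ProcessLookupError:
        return "terminated"
    return "killed"
```

Ending a process owned by another user requires the appropriate privileges; without them, os.kill raises PermissionError, which this sketch does not handle.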

(3) A process is in a deadlock or infinite loop

If a process goes into a deadlock or infinite loop and fails to end in a timely manner, take the recovery action described in the following table.

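No.  Function                  Process name   Recovery action
---  ------------------------  -------------  ------------------------------------------
1    Process management        jbs_spmd       Restart JP1/Base.
2    Authentication server     jbssessionmgr  In Windows: Restart the JP1/Base services
3    Configuration management  jbsroute       (process management including user
4    Command execution         jcocmd         management).
5    Plugin service            jbsplugind     In UNIX: Restart JP1/Base.#
---  ------------------------  -------------  ------------------------------------------
6    Event service             jevservice     Restart the event service.
                                              In Windows: Restart the JP1/Base Event
                                              service.
                                              In UNIX: Restart the event service.#
---  ------------------------  -------------  ------------------------------------------
7    Log file trap             jevtraplog     Restart the log-file trap management
                               jevlogd        service (daemon).
                                              In Windows: Restart the JP1/Base LogTrap
                                              service.
                                              In UNIX: Restart the log-file trap
                                              management daemon.#
                               -------------  ------------------------------------------
                               jelparentim    Using the jevlogstart command, restart the
                               jelchildim     log file trap that has the ID indicated in
                                              the error message.
---  ------------------------  -------------  ------------------------------------------
8    Event log trap            jevtrapevt     Restart the event log trapping service
                                              (JP1/Base EventlogTrap).
---  ------------------------  -------------  ------------------------------------------
9    SNMP trap converter       imevtgw        Restart NNM.
---  ------------------------  -------------  ------------------------------------------
10   Health check              jbshcd         Restart JP1/Base.
                               jbshchostd     In Windows: Restart the JP1/Base services
                                              (process management including user
                                              management).
                                              In UNIX: Restart JP1/Base.#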

#: After terminating the processes with the stop command, use the ps -el command to make sure all the processes have ended. If any processes are still active, end them with the kill command. Then restart the processes using the start command.
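
The check described in the note above (confirming with ps -el that no processes remain after the stop command) can be sketched in Python. This is a minimal sketch under stated assumptions: the helper names remaining_processes and check_after_stop are hypothetical, the ps -el output is assumed to have a header row with PID and CMD columns (as on Linux; other UNIX systems may label the command column differently), and the process names are those listed in the table for the rows marked with #.

```python
import os
import subprocess

# Process names from the table rows whose recovery action is marked with "#".
JP1_BASE_PROCESSES = frozenset({
    "jbs_spmd", "jbssessionmgr", "jbsroute", "jcocmd", "jbsplugind",
    "jevservice", "jevtraplog", "jevlogd", "jbshcd", "jbshchostd",
})


def remaining_processes(ps_output, names=JP1_BASE_PROCESSES):
    """Return (pid, command) pairs from `ps -el` output whose command is in `names`."""
    lines = ps_output.strip().splitlines()
    if not lines:
        return []
    header = lines[0].split()
    pid_col = header.index("PID")
    cmd_col = header.index("CMD")  # assumption: Linux-style column label
    found = []
    for line in lines[1:]:
        # CMD is the last column, so limit the split to keep it in one field.
        fields = line.split(None, cmd_col)
        if len(fields) <= cmd_col:
            continue
        command = os.path.basename(fields[cmd_col].split()[0])
        if command in names:
            found.append((int(fields[pid_col]), command))
    return found


def check_after_stop():
    """Run `ps -el` and list the named processes that are still active."""
    output = subprocess.run(
        ["ps", "-el"], capture_output=True, text=True, check=True
    ).stdout
    return remaining_processes(output)
```

If check_after_stop() returns an empty list, the processes have ended and the start command can be run. Otherwise, the returned PIDs identify the processes to end with the kill command first.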


(4) Unable to connect to the host to be monitored

All Rights Reserved. Copyright (C) 2009, Hitachi, Ltd.