Job Management Partner 1/Base User's Guide

2.4.2 Setup for handling possible errors in JP1/Base

JP1/Base provides the following features to minimize the effects of a failure in JP1/Base on system operations based on JP1/IM or JP1/AJS:

Health check
The health check function can detect hangups (infinite loops or deadlocks) or abnormal termination (other than forced termination) of processes such as process management, the event service, and event conversion.

Detection of errors by the process management function
The service can detect abnormal termination of a process managed by the process management service and switching of the authentication server.

Restart when a process abnormally terminates
JP1/Base automatically restarts a process managed by the process management service if that process terminates abnormally.

Restart the event service when a process abnormally terminates (UNIX only)
JP1/Base automatically restarts an event service process on the physical host if that process terminates abnormally.

Data collection when a failure occurs
Troubleshooting information can be collected when a problem occurs in JP1/Base.

A process might terminate abnormally due to an error or it might be forcibly terminated by the OS kill command or other means. In the latter case, the health check function detects the process as having stalled, not as having terminated abnormally. To make sure that all process terminations are detected, use the process management function in conjunction with the health check function.

The following figure shows the range of process errors that can be detected by the health check and process management functions.

Figure 2-3 Range of process errors that can be detected by the health check and process management functions

How to set each function is described below.

Organization of this subsection

(1) Detecting process errors using the health check function

(2) Detecting process termination and authentication server switching

(3) Restarting abnormally terminated processes managed by the process management function

(4) Restarting an abnormally-terminated event service process (UNIX only)

(5) Hitachi Network Objectplaza Trace Library (HNTRLib2)

(6) Preparing to collect information when a problem occurs (Windows only)

(1) Detecting process errors using the health check function

Use of the health check function enables early detection of process errors. Message notification enables the operator to identify the process in which the error occurred and take action to minimize the effects. To use the health check function, JP1/Base 07-51 or a later version must be installed on the monitoring host and target hosts.

(a) Enabling the health check function

The health check function is disabled by default. How to enable the health check function is described below. In a cluster system, enable the health check function on both the physical hosts and logical hosts after you complete the setup of the logical hosts.

To enable the health check function:
Register information to enable the health check function in the common definition information.

1-1 Copy the model file (jbshc_setup.conf.model) for the common definition settings file (health check function) using any file name.

1-2 Edit the copied file.

1-3 Execute the following command:

jbssetcnf file-name-of-copied-file

The health check function information is registered in the common definition information.

For details on the jbssetcnf command, see jbssetcnf in 13. Commands.

For details on the common definition settings file (health check function), see Common definition settings file (health check function) in 14. Definition Files.

Edit the health check definition file (jbshc.conf).
Define the monitoring target host and monitoring interval. For details on the health check definition file, see Health check definition file in 14. Definition Files.

Change the settings for forwarding JP1 events.
Add the following condition to the forwarding settings file (forward) to send JP1 events issued by the health check function to the higher-level management server.
E.OBJECT_TYPE IN JBSHC
For details on the forwarding settings file (forward), see Forwarding settings file in 14. Definition Files.

Restart all JP1/Base services and NNM (if using the SNMP trap converters).
The health check function starts and process monitoring begins.
If a line in the health check definition file contains an error, that line is ignored and the default value, if any, applies.
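
Step 1 of the procedure above (copy the model file, then register the edited copy) could be scripted roughly as follows. This is a minimal sketch, not part of JP1/Base: the function names and the dry_run flag are illustrative assumptions, and the file locations are left to the caller. The only command it invokes is jbssetcnf with the copied file as its single argument, as shown in step 1-3. Editing the copied file (step 1-2) is still done by hand.

```python
import shutil
import subprocess
from pathlib import Path


def build_jbssetcnf_command(settings_file):
    """Return the argument list for registering a settings file (step 1-3)."""
    return ["jbssetcnf", str(settings_file)]


def copy_model_file(model_file, work_copy):
    """Copy the model file under a new name (step 1-1) and return the copy's path."""
    work_copy = Path(work_copy)
    shutil.copyfile(model_file, work_copy)
    return work_copy


def register_settings(settings_file, dry_run=True):
    """Register an already edited copy; with dry_run=True, only return the command."""
    command = build_jbssetcnf_command(settings_file)
    if not dry_run:
        # check=True raises CalledProcessError if jbssetcnf exits with a nonzero code.
        subprocess.run(command, check=True)
    return command
```

With dry_run left at its default, register_settings only returns the command it would run, which makes it possible to review the invocation before applying it on a production host.
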
(b) Checking the health check settings

To check the health check settings and whether failovers at error detection are enabled, execute the following command and refer to the common definition information:
jbsgetcnf 
In the output information, locate the section about the health check function and check the settings.

For details on the jbsgetcnf command, see jbsgetcnf in 13. Commands. For details on the common definition information, see Common definition settings file (health check function) in 14. Definition Files.

(c) Changing the health check settings

To add a target host or change the monitoring interval:

Edit the health check definition file (jbshc.conf).
For details on the health check definition file, see Health check definition file in 14. Definition Files.

Apply the new settings in the health check definition file (jbshc.conf).
In Windows, restart the JP1/Base (process management) service.
In UNIX, execute the jbs_spmd_reload command. For details on the jbs_spmd_reload command, see jbs_spmd_reload in 13. Commands.
The reloaded settings apply at the next monitoring round.
If the reload fails because a line in the health check definition file (jbshc.conf) contains an error, that line is ignored and the previous setting applies.

Note on reloading settings

If the settings are reloaded after an error has been detected during remote host monitoring, the monitoring status at the target host will be reset. If the failed host has not been restored when next polled, the health check function issues an error message or JP1 event again. If the failed host has been restored, no recovery message or JP1 event is issued.

(d) Disabling the health check function

To disable the health check function:

Edit the common definition settings file (health check function).

1-1 Copy the model file for the common definition settings file (health check function) using any file name.

1-2 Edit the copied file.

For details on the common definition settings file (health check function), see Common definition settings file (health check function) in 14. Definition Files.

Execute the following command:
jbssetcnf file-name-of-copied-file
The health check function is disabled.
For details on the jbssetcnf command, see jbssetcnf in 13. Commands.

Restart all JP1/Base services and NNM (if using the SNMP trap converters).

(e) Upgrading from JP1/Base Version 7 or earlier in a clustering environment

If you are using a cluster system with JP1/Base version 07-00 or earlier, you must upgrade the logical host environment after performing an overwrite installation of JP1/Base version 07-51 or later. For details on the upgrade procedure, see 2.2.3(5) Overwrite installation (for Windows) or 2.3.4(5) Overwrite installation (for UNIX).

After upgrading the logical host environment, perform the steps described in (a) Enabling the health check function.

(f) Notes

Note the following points when using the health check function.

A process that is forcibly terminated by the kill command or other means is not detected as having terminated abnormally. Instead, the health check function detects that there is no response from the process (error message KAVA7014-E). Note that the elapsed time at error detection in this case is not the time that has passed since the kill command was executed. Because the health check function determines the error status from the update time of the shared memory used internally by the process, the error might be detected very soon after the process is forcibly terminated.

When a process is forcibly terminated by the kill command or other means and termination processing does not finish, a message reporting that an error was detected in the aborted process might be issued when you restart the affected service.

When process restart is specified in the extended startup process definition file (jp1bs_service_0700.conf) for a process that ends abnormally, a message (KAVB3605-I or KAVB3616-I) will be output to report that the process has restarted. This might be followed by another message (KAVA7017-E) reporting abnormal termination of the process. Check the process status using the jbs_spmd_status command.
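
The first note above can be illustrated with a simple staleness check. This is a conceptual sketch only: it does not reproduce the internal implementation of JP1/Base, and the function name, the timestamps, and the 30-second threshold are hypothetical values chosen for the example.

```python
def is_unresponsive(last_update, now, threshold_seconds):
    """Return True if the last recorded update is older than the threshold."""
    return (now - last_update) > threshold_seconds


# Hypothetical timeline, in seconds:
#   t=100  the process last updates its shared memory
#   t=128  the process is forcibly terminated
#   t=131  the next check runs
THRESHOLD = 30
killed_at = 128
checked_at = 131

flagged = is_unresponsive(last_update=100, now=checked_at, threshold_seconds=THRESHOLD)
seconds_since_kill = checked_at - killed_at
```

In this timeline the process is flagged only 3 seconds after it was killed, because the threshold is measured from the last update (31 seconds earlier), not from the moment of termination.
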

(2) Detecting process termination and authentication server switching

When a process ends abnormally or the authentication server is switched automatically in a system with two authentication servers, JP1/Base outputs an error message to the integrated trace log. Such a message can also be issued as a JP1 event. For details on the JP1 events issued by JP1/Base, see 15. JP1 Events.

(a) Monitored processes

JP1/Base detects abnormal termination of the following processes managed by the process management service (jbs_spmd):

jbssessionmgr (authentication server)

jbsroute (configuration management)

jcocmd (command execution)

jbsplugind (plugin service)

jbshcd (health check: for monitoring the local host)

jbshchostd (health check: for monitoring remote hosts)

jbssrvmgr (service management control)

jbslcact (local action)

jbscomd (inter-process communication)

(b) Triggering of JP1 events

When JP1 event issuance is enabled, a JP1 event is issued in the following situations:

Process managed by the process management service

When a timeout occurs at process startup

When the process ends abnormally

When no startup notification is received and a timeout occurs at process startup

When restart of a managed process that ended abnormally is completed#
#: Only if restart has been specified for the process.

Authentication server (in a system with a secondary authentication server)

When connection to the authentication server fails and the connection is automatically blocked

When a blocked status is automatically released

When connection is blocked to both the primary and secondary authentication servers

(c) Setup

To set up this functionality:

Edit the JP1/Base parameter definition file (jp1bs_param_V7.conf).
The Restart or not parameter is the fourth of the values separated by vertical bars (|). Specify 0 (do not restart; the default) or 1 (restart) for this parameter. Do not change the third of the values separated by vertical bars (|). For details on the JP1/Base parameter definition file, see JP1/Base parameter definition file in 14. Definition Files.

Execute the jbssetcnf command.
The settings in the JP1/Base parameter definition file (jp1bs_param_V7.conf) are reflected in the common definition information.
For details on the jbssetcnf command, see jbssetcnf in 13. Commands.

Restart JP1/Base and the programs that require JP1/Base.
The settings are applied.
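
The edit to the Restart or not parameter described in the first step can be sketched as a small helper that changes only the fourth vertical-bar-separated value and leaves every other value, including the third, untouched. The helper name is an assumption, and the sample line is a made-up placeholder rather than a real definition; the actual layout of each line is documented in 14. Definition Files.

```python
def set_restart_flag(line, restart):
    """Return the line with its fourth |-separated value set to 1 or 0."""
    fields = line.split("|")
    if len(fields) < 4:
        raise ValueError("expected at least four values separated by '|'")
    # Index 3 is the fourth value (Restart or not).
    # Index 2, the third value, is deliberately left unchanged.
    fields[3] = "1" if restart else "0"
    return "|".join(fields)


# Placeholder line for illustration only; not a real definition.
sample = "a|b|c|0|e|"
updated = set_restart_flag(sample, restart=True)  # "a|b|c|1|e|"
```
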

(3) Restarting abnormally terminated processes managed by the process management function

Starting JP1/Base causes multiple processes to be generated. JP1/Base Version 7 or a later version can automatically restart a process that ends abnormally.

The process restart functionality described here is intended to restart JP1/Base in a non-cluster system. If you want to restart a process in a cluster system, use the cluster software.

(a) Target processes

The target processes are the following processes managed by the process management function (jbs_spmd):

jbssessionmgr (authentication server)

jbsroute (configuration management)

jcocmd (command execution)

jbsplugind (plugin service)

jbshcd (health check: for monitoring the local host)

jbshchostd (health check: for monitoring remote hosts)

jbssrvmgr (service management control)

jbslcact (local action)

jbscomd (inter-process communication)

(b) Setup

To set up this functionality:

Edit the extended startup process definition file (jp1bs_service_0700.conf).
For details on the extended startup process definition file, see Extended startup process definition file in 14. Definition Files.

Enable the setting.
To enable the automatic restart setting, restart JP1/Base or execute the reload command (jbs_spmd_reload).

Disable Dr. Watson error notification (Windows only).
If an error occurs and the Dr. Watson message box is displayed, the process cannot be restarted, so you need to disable the message display.
From the Start menu, choose Run, and then execute drwtsn32. In the Dr. Watson dialog box, clear the Visual Notification check box.
Because the Dr. Watson settings are common to the whole system, this change applies to all programs in the system.
From the command prompt, execute the following command to enable the settings for Dr. Watson:
drwtsn32 -i
This command installs Dr. Watson as the default application debugger.

Disable Microsoft error reporting (Windows only).
When an error occurs, a dialog box for reporting the error to Microsoft appears. This prevents the process from restarting. You must therefore disable such error reporting.
1. In the Control Panel, double-click System.
2. Select the Advanced tab, and then click Error Reporting.
3. Select the Disable error reporting radio button, and make sure that the But notify me when critical errors occur check box is cleared.

(4) Restarting an abnormally-terminated event service process (UNIX only)

The UNIX version of JP1/Base version 9 or later can automatically restart an event service process on the physical host when the process terminates abnormally. This setting is disabled by default.

For the Windows version of JP1/Base, configure service restart in the Windows Service Control Manager instead.

The process restart functionality described here is intended to restart JP1/Base in a non-cluster system. If you want to restart a process in a cluster system, use the cluster software.

(a) Target processes

The target process is the jevservice (event service) child process that is managed by the parent jevservice (event service) process.

You can view the process ID of the parent jevservice process by using the jevstat command.

(b) Setup

To set up this functionality:

Define the restart parameter in the event server settings file (conf).

Start the event service.

For details on the event server settings file (conf), see Event server settings file in 14. Definition Files.

(5) Hitachi Network Objectplaza Trace Library (HNTRLib2)

JP1/Base outputs log files using the Hitachi Network Objectplaza Trace Library (HNTRLib2). These log files trace the system processing invoked in JP1/Base and in program products for which JP1/Base is a prerequisite program. The logged data can be used to investigate the cause of any errors that might occur in a JP1 program.

The following defaults are set for the HNTRLib2:

Size of one log file: 256 KB

Maximum number of log files: 4

Output directory:

In Windows:

system-drive\Program Files\Hitachi\HNTRLib2\spool\hntr2*.log

In UNIX:

/var/opt/hitachi/HNTRLib2/spool/hntr2*.log
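
As a quick worked example of what these defaults imply, the sketch below computes the approximate disk space the trace logs can occupy (256 KB x 4 files = 1,024 KB) and lists the log files that match the UNIX output path. The constant and function names are illustrative assumptions, and the figure assumes that each file stays within its configured size.

```python
import glob

DEFAULT_LOG_FILE_SIZE_KB = 256  # default size of one log file
DEFAULT_LOG_FILE_COUNT = 4      # default maximum number of log files
UNIX_LOG_PATTERN = "/var/opt/hitachi/HNTRLib2/spool/hntr2*.log"


def max_log_usage_kb(size_kb=DEFAULT_LOG_FILE_SIZE_KB, count=DEFAULT_LOG_FILE_COUNT):
    """Approximate upper bound on disk space used by the trace logs, in KB."""
    return size_kb * count


def list_trace_logs(pattern=UNIX_LOG_PATTERN):
    """Return the trace log files that match the pattern, sorted by name."""
    return sorted(glob.glob(pattern))
```

If the defaults are changed with the hntr2util or hntr2conf command, pass the new values to max_log_usage_kb to re-estimate the space required.
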

Usually, there is no need to change the default settings, but you can view and change the default settings by executing the hntr2util, hntr2conf, or hntr2getconf command. For details on the commands, see hntr2util (Windows only), hntr2util (UNIX only), hntr2conf, and hntr2getconf.

Note

As of Version 7, automatic uninstallation functionality has been added to the Hitachi Network Objectplaza Trace Library, and the library name has changed from HNTRLib to HNTRLib2. If you have used Version 6 or earlier of JP1/Base, note that information related to the Network Objectplaza Trace Library, such as the command names and output destinations, differs between Version 6 and Version 7.

(6) Preparing to collect information when a problem occurs (Windows only)

Prepare the supplied data collection tool in advance so that it is ready if a problem occurs. When you execute this tool, it collects the information needed to investigate the problem.

The data collection tool can collect memory dumps and crash dumps, among other information. To output these dumps, perform the following setup in advance. Completing this setup enables dump data to be collected by the data collection tool.

(a) Setting up the memory dump output

In the Control Panel, double-click System.

Select the Advanced tab, and then choose Set for Startup and Recovery.

For the Write Debugging Information options, select Complete Memory Dump, and then in the Dump File entry box, specify the file to which you want to output memory dumps.

Note

The size of a memory dump depends on the size of physical memory: the more physical memory is installed, the larger the memory dump. Allocate enough disk space for collecting memory dumps. For details, see STOP error in the Windows Help.
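
A rough pre-check of the available disk space might look like the following sketch. It assumes, as a rule of thumb that is not stated in this manual, that a complete memory dump needs about as much disk space as the installed physical memory; see the Windows Help for the exact requirement. The function name and the caller-supplied memory size are illustrative.

```python
import shutil


def has_room_for_dump(dump_dir, physical_memory_bytes):
    """Return True if the volume holding dump_dir has at least
    physical_memory_bytes of free space."""
    free_bytes = shutil.disk_usage(dump_dir).free
    return free_bytes >= physical_memory_bytes
```

For example, on a host with 8 GB of physical memory, has_room_for_dump(dump_dir, 8 * 1024**3) reports whether the volume containing dump_dir currently has at least that much free space.
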

(b) Setting up the crash dump output

To set up the crash dump output:

From the Start menu, choose Run.

Type drwtsn32 in the text box and click the OK button.

The Dr. Watson dialog box appears.

Select Create Crash Dump File, and specify an output file in the Crash Dump text box.

Click the OK button.

Note

Crash dumps contain error information not only for JP1 but also for other application programs. When a crash dump is output, the available disk space decreases accordingly. When you set up the crash dump output, make sure that there is enough disk space for it.
