Job Management Partner 1/Base User's Guide



2.4.2 Setup for handling possible errors in JP1/Base

JP1/Base provides the following features to minimize the effects of a failure in JP1/Base on system operations based on JP1/IM or JP1/AJS:

A process can terminate abnormally because of an error, or it can be forcibly terminated by the OS kill command or by other means. In the latter case, the health check function reports the process as stalled, not as abnormally terminated. To make sure that every process termination is detected, use the process management function together with the health check function.

The following figure shows the range of process errors that can be detected by the health check and process management functions.

Figure 2-3 Range of process errors that can be detected by the health check and process management functions

[Figure]

The setup procedure for each function is described below.

Organization of this subsection
(1) Detecting process errors using the health check function
(2) Detecting process termination and authentication server switching
(3) Restarting processes managed by the abnormally terminated process management function
(4) Restarting an abnormally-terminated event service process (UNIX only)
(5) Hitachi Network Objectplaza Trace Library (HNTRLib2)
(6) Preparing to collect information when a problem occurs (Windows only)

(1) Detecting process errors using the health check function

Use of the health check function enables early detection of process errors. Message notification enables the operator to identify the process in which the error occurred and take action to minimize the effects. To use the health check function, JP1/Base 07-51 or a later version must be installed on the monitoring host and target hosts.
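
Before you enable the function, you might want to check the JP1/Base version on each host. The following Python sketch compares version strings in the VV-RR form used in this manual (such as 07-51). It assumes that any additional parts are also hyphen-separated numbers. The host names and versions in the example are hypothetical, and the sketch does not cover how to obtain the installed version from each host.

```python
from __future__ import annotations


def parse_version(version: str) -> tuple[int, ...]:
    """Split a hyphen-separated version such as '07-51' into integers."""
    return tuple(int(part) for part in version.strip().split("-"))


def supports_health_check(installed: str, minimum: str = "07-51") -> bool:
    """Return True if the installed version is at least `minimum`.

    Assumes the VV-RR[-ZZ] format. Tuples compare element by element,
    so (9, 0) > (7, 51) and (7, 51) >= (7, 51).
    """
    return parse_version(installed) >= parse_version(minimum)


# Hypothetical hosts and versions, for illustration only.
hosts = {"monitor01": "09-00", "agent01": "07-51", "agent02": "07-00"}
for name, version in hosts.items():
    status = "OK" if supports_health_check(version) else "upgrade required"
    print(f"{name}: {version} -> {status}")
```

Comparing tuples of integers rather than the raw strings means the result does not depend on leading zeros.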

(a) Enabling the health check function

The health check function is disabled by default. How to enable the health check function is described below. In a cluster system, enable the health check function on both the physical hosts and logical hosts after you complete the setup of the logical hosts.

To enable the health check function:

  1. Register information to enable the health check function in the common definition information.

    1-1 Copy the model file (jbshc_setup.conf.model) for the common definition settings file (health check function) to a file with any name.

    1-2 Edit the copied file.

    1-3 Execute the following command:
    jbssetcnf file-name-of-copied-file
    The health check function information is registered in the common definition information.
    For details on the jbssetcnf command, see jbssetcnf in 13. Commands.
    For details on the common definition settings file (health check function), see Common definition settings file (health check function) in 14. Definition Files.
  2. Edit the health check definition file (jbshc.conf).
    Define the monitoring target host and monitoring interval. For details on the health check definition file, see Health check definition file in 14. Definition Files.
  3. Change the settings for forwarding JP1 events.
    Add the following condition to the forwarding settings file (forward) to send JP1 events issued by the health check function to the higher-level management server.
    E.OBJECT_TYPE IN JBSHC
    For details on the forwarding settings file (forward), see Forwarding settings file in 14. Definition Files.
  4. Restart all JP1/Base services and NNM (if using the SNMP trap converters).
    The health check function starts and process monitoring begins.
    If a line in the health check definition file contains an error, that line is ignored and its default value, if one exists, is used.
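
Conceptually, the condition added in step 3 selects events whose OBJECT_TYPE attribute equals one of the listed values (here, only JBSHC). The following Python sketch models that matching rule for illustration only. It is not the JP1/Base filter engine, and the event dictionaries are hypothetical sample data.

```python
from __future__ import annotations


def parse_in_condition(line: str) -> tuple[str, set[str]]:
    """Parse a condition of the form 'ATTRIBUTE IN value1 value2 ...'."""
    tokens = line.split()
    if len(tokens) < 3 or tokens[1] != "IN":
        raise ValueError(f"not an IN condition: {line!r}")
    return tokens[0], set(tokens[2:])


def matches(event: dict[str, str], condition: str) -> bool:
    """Return True if the event's attribute equals one of the listed values."""
    attribute, values = parse_in_condition(condition)
    return event.get(attribute) in values


condition = "E.OBJECT_TYPE IN JBSHC"
events = [
    {"E.OBJECT_TYPE": "JBSHC"},  # hypothetical health check event
    {"E.OBJECT_TYPE": "JOB"},    # hypothetical event from another source
    {},                          # event without the attribute
]
print([matches(e, condition) for e in events])  # [True, False, False]
```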

(b) Checking the health check settings

To check the health check settings, including whether failover on error detection is enabled, execute the following command and refer to the common definition information:

jbsgetcnf 

In the output information, locate the section about the health check function and check the settings.

For details on the jbsgetcnf command, see jbsgetcnf in 13. Commands. For details on the common definition information, see Common definition settings file (health check function) in 14. Definition Files.
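
If the jbsgetcnf output is long, a small filter can help you locate the health check section. The following Python sketch assumes two things: that the output uses the bracketed layout of common definition settings files (a [key] line followed by "name"=value lines), and that the health check key path contains the string JBSHC. Confirm both against your own output. The sample text is hypothetical.

```python
from __future__ import annotations


def extract_sections(text: str, keyword: str) -> list[str]:
    """Return the lines of every bracketed section whose header contains keyword.

    Assumes the common definition layout: a line such as [SOME-KEY] starts a
    section, and the lines after it belong to that section until the next
    bracketed header line.
    """
    selected: list[str] = []
    in_match = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # New section header: keep the section only if it names the keyword.
            in_match = keyword.upper() in stripped.upper()
        if in_match:
            selected.append(line)
    return selected


# Hypothetical sample output; the key paths and value names are placeholders.
sample = r"""[JP1_DEFAULT\JP1BASE\]
"SAMPLE_A"="value"
[JP1_DEFAULT\JP1BASE\JBSHC\]
"SAMPLE_B"=dword:00000001
[JP1_DEFAULT\JP1BASE\OTHER\]
"SAMPLE_C"="value"
"""

print("\n".join(extract_sections(sample, "JBSHC")))
```

To filter live output instead of the sample, pass in the text captured from jbsgetcnf, for example subprocess.run(["jbsgetcnf"], capture_output=True, text=True).stdout.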

(c) Changing the health check settings

To add a target host or change the monitoring interval:

  1. Edit the health check definition file (jbshc.conf).
    For details on the health check definition file, see Health check definition file in 14. Definition Files.
  2. Apply the new settings in the health check definition file (jbshc.conf).
    In Windows, restart the JP1/Base (process management) service.
    In UNIX, execute the jbs_spmd_reload command. For details on the jbs_spmd_reload command, see jbs_spmd_reload in 13. Commands.
    The reloaded settings take effect from the next monitoring cycle.
    If a line in the health check definition file (jbshc.conf) contains an error at reload, that line is ignored and the previous setting remains in effect.
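
The startup rule in (a) (an invalid line falls back to the default) differs from the reload rule here (an invalid line keeps the previous setting). The following Python sketch shows the difference. The KEY=VALUE syntax, the key name interval, and its default of 300 are hypothetical placeholders, not the real jbshc.conf format. For the actual parameters, see Health check definition file in 14. Definition Files.

```python
from __future__ import annotations

from typing import Callable, Optional

# Hypothetical schema: parameter name -> (validator, default or None).
# These are placeholders, NOT the real jbshc.conf parameters.
SCHEMA: dict[str, tuple[Callable[[str], bool], Optional[str]]] = {
    "interval": (str.isdigit, "300"),
}


def load(lines: list[str], previous: Optional[dict[str, str]] = None) -> dict[str, str]:
    """Apply KEY=VALUE lines, ignoring any line that is invalid.

    At startup (previous is None), a parameter whose line was ignored
    takes its default, if it has one. At reload (previous given), it
    keeps the value that was in effect before the reload.
    """
    if previous is None:
        settings = {k: d for k, (_, d) in SCHEMA.items() if d is not None}
    else:
        settings = dict(previous)
    for raw in lines:
        key, sep, value = raw.partition("=")
        key, value = key.strip(), value.strip()
        validator = SCHEMA[key][0] if key in SCHEMA else None
        if not sep or validator is None or not validator(value):
            continue  # invalid line: ignored
        settings[key] = value
    return settings


startup = load(["interval=abc"])                     # invalid -> default
current = load(["interval=600"])                     # valid -> applied
reloaded = load(["interval=abc"], previous=current)  # invalid -> previous kept
print(startup, current, reloaded)
```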

Note on reloading settings
If the settings are reloaded after an error has been detected during remote host monitoring, the monitoring status of the target host is reset. If the failed host has still not recovered at the next poll, the health check function issues the error message or JP1 event again. If the failed host has recovered by then, no recovery message or JP1 event is issued.
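
This behavior can be modeled as a small per-host state machine. The monitor records that a host has failed, so it reports the error once and, presumably, the recovery once. A reload erases that record. As a result, a host that is still down is reported again, and a host that has already recovered produces no recovery notification. The class and method names below are hypothetical, and the model covers only the behavior described in this note.

```python
from __future__ import annotations

from typing import Optional


class HostMonitor:
    """Hypothetical model of the per-host monitoring status."""

    def __init__(self) -> None:
        self.failed: set[str] = set()  # hosts currently recorded as failed

    def poll(self, host: str, reachable: bool) -> Optional[str]:
        """Return the notification issued for one poll, or None."""
        if not reachable and host not in self.failed:
            self.failed.add(host)
            return f"error: {host}"
        if reachable and host in self.failed:
            self.failed.discard(host)
            return f"recovered: {host}"
        return None  # status unchanged: nothing is issued

    def reload(self) -> None:
        """Reloading the settings resets the monitoring status."""
        self.failed.clear()


monitor = HostMonitor()
print(monitor.poll("hostB", reachable=False))  # error: hostB
monitor.reload()
print(monitor.poll("hostB", reachable=False))  # error: hostB (issued again)
monitor.reload()
print(monitor.poll("hostB", reachable=True))   # None (no recovery message)
```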

(d) Disabling the health check function

To disable the health check function:

  1. Edit the common definition settings file (health check function).

    1-1 Copy the model file for the common definition settings file (health check function) to a file with any name.

    1-2 Edit the copied file.
    For details on the common definition settings file (health check function), see Common definition settings file (health check function) in 14. Definition Files.
  2. Execute the following command:
    jbssetcnf file-name-of-copied-file
    The health check function is disabled.
    For details on the jbssetcnf command, see jbssetcnf in 13. Commands.
  3. Restart all JP1/Base services and NNM (if using the SNMP trap converters).
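
Steps 1 and 2 follow a copy, edit, and register pattern, and step 1 in (a) follows the same pattern. The following Python sketch scripts the copy and the jbssetcnf call. The file paths are placeholders, because the location of the model file depends on the OS and the installation, and the editing step is left to you. With dry_run=True, the sketch only prints the command. If jbssetcnf is not on the PATH, replace it with its full path in argv.

```python
from __future__ import annotations

import shlex
import shutil
import subprocess
import tempfile
from pathlib import Path


def copy_model(model: Path, copy: Path) -> Path:
    """Copy the model file to a file with any name (step 1-1)."""
    shutil.copyfile(model, copy)
    return copy


def register(copy: Path, dry_run: bool = True) -> list[str]:
    """Register the edited copy with jbssetcnf (step 2).

    With dry_run=True the command is only printed, not executed.
    """
    argv = ["jbssetcnf", str(copy)]
    if dry_run:
        print("would run:", shlex.join(argv))
    else:
        # check=True raises CalledProcessError if jbssetcnf exits with an error.
        subprocess.run(argv, check=True)
    return argv


if __name__ == "__main__":
    # Demonstration with a placeholder model file in a temporary directory;
    # on a real host, pass the actual model file path instead.
    with tempfile.TemporaryDirectory() as tmp:
        model = Path(tmp, "jbshc_setup.conf.model")
        model.write_text("# placeholder, not the real model content\n")
        copy = copy_model(model, Path(tmp, "jbshc_disable.conf"))
        # Edit the copy here (step 1-2) before registering it.
        register(copy, dry_run=True)
```

After the command succeeds, step 3 (restarting the services) is still required.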

(e) Upgrading from JP1/Base 07-00 or earlier in a cluster environment

If you are using a cluster system with JP1/Base version 07-00 or earlier, you must upgrade the logical host environment after performing an overwrite installation of JP1/Base version 07-51 or later. For details on the upgrade procedure, see 2.2.3(5) Overwrite installation (for Windows) or 2.3.4(5) Overwrite installation (for UNIX).

After upgrading the logical host environment, perform the steps described in (a) Enabling the health check function.

(f) Notes

Note the following points when using the health check function.

(2) Detecting process termination and authentication server switching

When a process ends abnormally, or when the authentication server is switched automatically in a system with two authentication servers, JP1/Base outputs an error message to the integrated trace log. This message can also be issued as a JP1 event. For details on the JP1 events issued by JP1/Base, see 15. JP1 Events.

(a) Monitored processes

JP1/Base detects abnormal termination of the following processes managed by the process management service (jbs_spmd):

(b) Triggering of JP1 events

When JP1 event issuance is enabled, a JP1 event is issued in the following situations:

Process managed by the process management service
  • When a timeout occurs at process startup
  • When the process ends abnormally
  • When no startup notification is received and a timeout occurs at process startup
  • When restart of a managed process that ended abnormally is completed#
    #: Only if restart has been specified for the process.

Authentication server (in a system with a secondary authentication server)
  • When connection to the authentication server fails and the connection is automatically blocked
  • When a blocked status is automatically released
  • When connection is blocked to both the primary and secondary authentication servers
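
The lists above amount to a lookup from situation to whether a JP1 event is issued. The following Python sketch encodes that lookup. The enumeration names are hypothetical labels for the listed situations. Two cases are conditional: the restart-completed event requires that restart be specified for the process, and the authentication server events apply only to a system with a secondary authentication server.

```python
from enum import Enum, auto


class Situation(Enum):
    # Hypothetical labels for the situations listed above.
    STARTUP_TIMEOUT = auto()
    ABNORMAL_END = auto()
    NO_STARTUP_NOTIFICATION_TIMEOUT = auto()
    RESTART_COMPLETED = auto()
    AUTH_CONNECTION_BLOCKED = auto()
    AUTH_BLOCK_RELEASED = auto()
    BOTH_AUTH_SERVERS_BLOCKED = auto()


AUTH_SITUATIONS = {
    Situation.AUTH_CONNECTION_BLOCKED,
    Situation.AUTH_BLOCK_RELEASED,
    Situation.BOTH_AUTH_SERVERS_BLOCKED,
}


def issues_event(
    situation: Situation,
    issuance_enabled: bool,
    restart_specified: bool = False,
    has_secondary_auth_server: bool = False,
) -> bool:
    """Return True if a JP1 event is issued for the given situation."""
    if not issuance_enabled:
        return False  # no JP1 event when issuance is disabled
    if situation is Situation.RESTART_COMPLETED:
        # Footnote #: only if restart has been specified for the process.
        return restart_specified
    if situation in AUTH_SITUATIONS:
        # These apply only to a system with a secondary authentication server.
        return has_secondary_auth_server
    return True


print(issues_event(Situation.RESTART_COMPLETED, True))  # False
print(issues_event(Situation.RESTART_COMPLETED, True, restart_specified=True))  # True
```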

(c) Setup

To set up this functionality:

  1. Edit the JP1/Base parameter definition file (jp1bs_param_V7.conf).
    The Restart or not parameter is the fourth of the values separated by vertical bars (|). Specify 0 (do not restart; the default) or 1 (restart). Do not change the third value. For details on the JP1/Base parameter definition file, see JP1/Base parameter definition file in 14. Definition Files.
  2. Execute the jbssetcnf command.
    The settings in the JP1/Base parameter definition file (jp1bs_param_V7.conf) are reflected in the common definition information.
    For details on the jbssetcnf command, see jbssetcnf in 13. Commands.
  3. Restart JP1/Base and the programs that require JP1/Base.
    The settings are applied.
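
Step 1 changes only the fourth value separated by vertical bars. A script that edits such a value should split it on |, change index 3 (the fourth field), and leave every other field untouched, in particular the third. The following Python sketch does this. The sample value is a hypothetical placeholder: real values and their surrounding quoting come from jp1bs_param_V7.conf on your host.

```python
def set_restart(value: str, restart: bool) -> str:
    """Set the fourth '|'-separated field to 1 (restart) or 0 (do not restart).

    All other fields, including the third (which must not be changed),
    are preserved exactly, as is any trailing '|'.
    """
    fields = value.split("|")
    if len(fields) < 4:
        raise ValueError(f"expected at least four '|'-separated values: {value!r}")
    fields[3] = "1" if restart else "0"
    return "|".join(fields)


# Hypothetical placeholder value, not taken from a real definition file.
print(set_restart("a|b|c|0|", restart=True))  # a|b|c|1|
```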

(3) Restarting processes managed by the abnormally terminated process management function

Starting JP1/Base launches multiple processes. JP1/Base version 7 and later versions can automatically restart any of these processes that ends abnormally.

The process restart functionality described here is intended to restart JP1/Base in a non-cluster system. If you want to restart a process in a cluster system, use the cluster software.

(a) Target processes

The following target processes are managed by the process management function (jbs_spmd):

(b) Setup

To set up this functionality:

  1. Edit the extended startup process definition file (jp1bs_service_0700.conf).
    For details on the extended startup process definition file, see Extended startup process definition file in 14. Definition Files.
  2. Enable the setting.
    To enable the automatic restart setting, restart JP1/Base or execute the reload command (jbs_spmd_reload).
  3. Disable Dr. Watson error notification (Windows only).
    If an error occurs and the Dr. Watson message box is displayed, the process cannot be restarted, so you need to disable the message display.
    From the Start menu, choose Run, and then execute drwtsn32. In the Dr. Watson dialog box, clear the Visual Notification check box.
    Because the Dr. Watson settings apply to the whole system, the settings you make here affect all programs on the system.
    From the command prompt, execute the following command to enable the settings for Dr. Watson:
    drwtsn32 -i 
    This command installs Dr. Watson as the default application debugger.
  4. Disable Microsoft error reporting (Windows only).
    When an error occurs, a dialog box for reporting the error to Microsoft appears. This prevents the process from restarting. You must therefore disable such error reporting.
    4-1 In the Control Panel, double-click System.
    4-2 Select the Advanced tab, and then click Error Reporting.
    4-3 Select the Disable error reporting radio button, and make sure that the But notify me when critical errors occur check box is cleared.

(4) Restarting an abnormally-terminated event service process (UNIX only)

The UNIX version of JP1/Base version 9 or later can automatically restart an event service process on the physical host when the process terminates abnormally. This setting is disabled by default.

For the Windows version of JP1/Base, use the Windows Service Control Manager to configure the services to restart instead.

The process restart functionality described here is intended to restart JP1/Base in a non-cluster system. If you want to restart a process in a cluster system, use the cluster software.

(a) Target processes

The target process is the child jevservice (event service) process, which is managed by a parent jevservice (event service) process.

You can use the jevstat command to view the process ID of the parent jevservice process.

(b) Setup

To set up this functionality:

  1. Define the restart parameter in the event server settings file (conf).
  2. Start the event service.

For details on the event server settings file (conf), see Event server settings file in 14. Definition Files.

(5) Hitachi Network Objectplaza Trace Library (HNTRLib2)

JP1/Base outputs log files by using the Hitachi Network Objectplaza Trace Library (HNTRLib2). These log files trace the processing performed in JP1/Base and in program products for which JP1/Base is a prerequisite. The logged data can be used to investigate the cause of errors in a JP1 program.

The following defaults are set for the HNTRLib2:

Usually, there is no need to change the default settings, but you can view and change the default settings by executing the hntr2util, hntr2conf, or hntr2getconf command. For details on the commands, see hntr2util (Windows only), hntr2util (UNIX only), hntr2conf, and hntr2getconf.

Note
In Version 7, the Hitachi Network Objectplaza Trace Library was renamed from HNTRLib to HNTRLib2, and an automatic uninstallation function was added. If you have used JP1/Base version 6 or earlier, note that information related to the Hitachi Network Objectplaza Trace Library, such as command names and output destinations, differs between Version 6 and Version 7.

(6) Preparing to collect information when a problem occurs (Windows only)

Prepare the supplied data collection tool in advance. When a problem occurs, executing this tool collects the information needed to investigate the problem.

The data collection tool can collect memory dumps and crash dumps, among other information. To output these dumps, perform the following setup in advance. Completing this setup enables dump data to be collected by the data collection tool.

(a) Setting up the memory dump output

  1. In the Control Panel, double-click System.
  2. Select the Advanced tab, and then, under Startup and Recovery, click Settings.
  3. For the Write Debugging Information options, select Complete Memory Dump, and then in the Dump File entry box, specify the file to which you want to output memory dumps.

Note
The size of a memory dump depends on the amount of physical memory: the more physical memory the system has, the larger the dump. Allocate enough disk space for collecting memory dumps. For details, see STOP error in the Windows Help.
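
A complete memory dump records all physical memory, so its size is roughly the size of physical memory. A quick pre-check therefore compares the free space on the dump destination with the physical memory size. The following Python sketch uses shutil.disk_usage to get the free space. The physical memory size is passed in as an argument because the Python standard library has no portable call that returns it, and the 10% safety margin is an arbitrary assumption, not a documented requirement.

```python
import shutil

GiB = 1024 ** 3


def enough_space_for_dump(free_bytes: int, ram_bytes: int, margin: float = 0.10) -> bool:
    """Return True if free space covers physical memory plus a safety margin.

    A complete memory dump is roughly ram_bytes in size; margin
    (10% by default) is an assumed cushion, not a documented value.
    """
    return free_bytes >= ram_bytes * (1 + margin)


def check_dump_destination(path: str, ram_bytes: int) -> bool:
    """Check the volume that holds path (pass an existing folder, such as the dump folder)."""
    return enough_space_for_dump(shutil.disk_usage(path).free, ram_bytes)


print(enough_space_for_dump(free_bytes=5 * GiB, ram_bytes=4 * GiB))  # True
print(enough_space_for_dump(free_bytes=4 * GiB, ram_bytes=4 * GiB))  # False
```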

(b) Setting up the crash dump output

To set up the crash dump output:

  1. From the Start menu, choose Run.
  2. Type drwtsn32 in the text box and click the OK button.
    The Dr. Watson dialog box appears.
  3. Select Create Crash Dump File, and specify an output file in the Crash Dump text box.
  4. Click the OK button.

Note
Crash dumps contain error information not only for JP1 but also for other application programs. Each crash dump that is output reduces the available disk space, so when you set up crash dump output, make sure that there is enough disk space for it.




All Rights Reserved. Copyright (C) 2009, Hitachi, Ltd.