Hitachi

Job Management Partner 1 Version 10 Job Management Partner 1/Integrated Management - Manager Configuration Guide


2.18.4 Settings for handling JP1/IM - Manager failures

JP1/IM - Manager provides functions for handling its own failures. Examples are a tool that collects the data needed to resolve problems, and a function that automatically restarts processes that terminate abnormally.

This subsection describes the settings for handling JP1/IM - Manager failures.


(1) Preparations for collecting data in the event of a failure

JP1/IM - Manager provides a shell script (jim_log.sh) for collecting data when a problem occurs. This tool collects the data needed to resolve a problem in a single batch operation.

The data collection tool of JP1/IM - Manager can collect troubleshooting data for JP1/IM - Manager and JP1/Base. For details about the data that can be collected, see 10.3 Data that needs to be collected when a problem occurs in the Job Management Partner 1/Integrated Management - Manager Administration Guide.

About the data collection tool
  • About jim_log.sh

    See jim_log.sh (UNIX only) in 1. Commands in the manual Job Management Partner 1/Integrated Management - Manager Command and Definition File Reference.

When a problem occurs, a core dump can help you investigate its cause. Whether a core dump is output depends on the user environment settings, so check the settings described below.

(a) Setting the size of a core dump file

The maximum size of a core dump file depends on the root user's core dump file size setting (ulimit -c). In JP1/IM - Manager, the following setting is specified in the jco_start and jco_start.cluster scripts so that output of core dump files does not depend on the user's environment settings:

ulimit -c unlimited

If this setting violates the security policies for your machine, comment it out in both scripts, as shown below:

# ulimit -c unlimited

Important note

If the setting is commented out, the root user's core dump file size setting (ulimit -c) applies instead. In that case, a core dump might not be output when a JP1/IM - Manager process terminates because of a segmentation fault or a bus error, or when you try to output a core dump file by using the jcogencore command, and you might be unable to investigate the problem.
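To check whether the setting is still active in a start script, a minimal shell sketch such as the following can be used. The function name core_ulimit_active is hypothetical, and the script path is passed as an argument because the location of the jco_start and jco_start.cluster scripts is not given in this subsection.

```shell
#!/bin/sh
# Hypothetical helper: succeed if the given start script (for example
# jco_start or jco_start.cluster) contains an active, uncommented
# "ulimit -c unlimited" line.
core_ulimit_active() {
    # Leading whitespace is allowed; a line that starts with "#" does not match.
    grep -Eq '^[[:space:]]*ulimit[[:space:]]+-c[[:space:]]+unlimited' "$1"
}
```

For example, core_ulimit_active /path/to/jco_start && echo active. Running ulimit -c in a root shell shows the limit that applies when the line is commented out; on Linux, the limit of a running process appears in the "Max core file size" line of /proc/PID/limits.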

(b) Setting the kernel parameters regarding core dump (Linux only)

If the output directory or the file name of core dump files has been changed from the default by the Linux kernel parameter kernel.core_pattern, the data collection tool might not be able to collect the core dump file. To prevent this problem, we recommend that you do not change the kernel.core_pattern setting.

The data collection tool collects files whose names begin with core from the following default output directories:

  • For physical hosts: /var/opt/jp1cons/log/

  • For logical hosts: shared-directory/jp1cons/log/

If you have changed kernel.core_pattern, do the following before you execute the data collection tool:

  • When the output directory for a core dump file is changed

    Make a copy of the core dump file in the default output directory.

  • When the file name of a core dump file is changed

    Rename the core dump file so that its name begins with core.
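The two actions above can be combined in a small shell sketch. The function stage_core_dump is hypothetical: it copies a core dump written to a non-default location into the directory that the data collection tool searches, and adds the prefix core. to the name if the name does not already begin with core. The current pattern can be displayed with sysctl kernel.core_pattern.

```shell
#!/bin/sh
# Hypothetical helper: copy a core dump that kernel.core_pattern placed elsewhere
# (or named differently) into the directory the data collection tool searches,
# making sure the copied file's name begins with "core".
# Usage: stage_core_dump SOURCE_FILE [DEST_DIR]
stage_core_dump() {
    src=$1
    # Physical-host default; for a logical host pass shared-directory/jp1cons/log
    dest_dir=${2:-/var/opt/jp1cons/log}
    [ -f "$src" ] || { echo "no such file: $src" >&2; return 1; }
    name=$(basename "$src")
    case $name in
        core*) ;;                  # already begins with "core"
        *) name="core.$name" ;;    # add a prefix so the tool picks it up
    esac
    cp -p "$src" "$dest_dir/$name"
}
```

Copying rather than moving leaves the original file where kernel.core_pattern put it.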

(2) Restart settings in the event of abnormal process termination

To specify restart settings in the event of abnormal process termination:

  1. Define process restart.

    Edit the following extended startup process definition file (jp1co_service.conf) so that process restart is enabled:

    /etc/opt/jp1cons/conf/jp1co_service.conf

    The restart parameter is the fourth value in each entry, where values are separated by vertical bars (|). Set it to 0 (do not restart; the default) or 1 (restart).

  2. Apply the definition information.

    If JP1/IM - Manager is running, execute the following reload command to enable the process restart setting:

    /opt/jp1cons/bin/jco_spmd_reload
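Step 1 can also be scripted. The following is a minimal sketch with a hypothetical function name (enable_restart); it assumes, as stated above, that the restart parameter is the fourth |-separated value, and further assumes that lines beginning with # are comments. It enables restart for every entry, so edit the file by hand if you want restart only for specific processes. Do not use it on a cluster system.

```shell
#!/bin/sh
# Hypothetical helper: set the restart parameter (the fourth value separated
# by "|") to 1 on every entry of jp1co_service.conf. The original file is kept
# as FILE.bak. Lines starting with "#" and lines with fewer than four values
# are copied unchanged (an assumption about the file's layout).
enable_restart() {
    conf=${1:-/etc/opt/jp1cons/conf/jp1co_service.conf}
    cp -p "$conf" "$conf.bak" || return 1
    awk 'BEGIN { FS = "|"; OFS = "|" }
         /^#/ || NF < 4 { print; next }
         { $4 = 1; print }' "$conf.bak" > "$conf"
}
```

Usage, as root: enable_restart && /opt/jp1cons/bin/jco_spmd_reload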

About process restart definition
  • About the extended startup process definition file (jp1co_service.conf)

    See Extended startup process definition file (jp1co_service.conf) in 2. Definition Files in the manual Job Management Partner 1/Integrated Management - Manager Command and Definition File Reference.

Note:

In a cluster system, do not enable process restart in the event of abnormal process termination. If JP1/IM - Manager fails, the process restart function might also be affected. If you want to restart processes in the event of an abnormal process termination in a cluster system, use the cluster software (not JP1/IM - Manager) to control the restart.

(3) Setting JP1 event issuance in the event of abnormal process termination

To set JP1 event issuance in the event of abnormal process termination:

  1. Set JP1 event issuance.

    Edit the following IM parameter definition file (jp1co_param_V7.conf):

    /etc/opt/jp1cons/conf/jp1co_param_V7.conf

    In this file, SEND_PROCESS_TERMINATED_ABNORMALLY_EVENT and SEND_PROCESS_RESTART_EVENT are the parameters that control JP1 event issuance. To issue JP1 events, change their values to dword:1.

  2. Execute the jbssetcnf command to apply the definition information.

    /opt/jp1base/bin/jbssetcnf /etc/opt/jp1cons/conf/jp1co_param_V7.conf

  3. Restart JP1/Base and the products that require JP1/Base.

    The specified settings take effect after the restart.
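Steps 1 and 2 can be sketched in shell as follows. The function name enable_process_events is hypothetical, and the sketch assumes that each parameter appears on its own line in the form "NAME"=dword:value (the quotation marks are treated as optional). Check your copy of jp1co_param_V7.conf before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: set both JP1 event issuance parameters to dword:1.
# ASSUMPTION: each appears on its own line as "NAME"=dword:value.
# The original file is kept as FILE.bak; other lines are left unchanged.
enable_process_events() {
    conf=${1:-/etc/opt/jp1cons/conf/jp1co_param_V7.conf}
    cp -p "$conf" "$conf.bak" || return 1
    sed -e 's/^\("\{0,1\}SEND_PROCESS_TERMINATED_ABNORMALLY_EVENT"\{0,1\}\)=dword:[0-9A-Fa-f]*/\1=dword:1/' \
        -e 's/^\("\{0,1\}SEND_PROCESS_RESTART_EVENT"\{0,1\}\)=dword:[0-9A-Fa-f]*/\1=dword:1/' \
        "$conf.bak" > "$conf"
}
```

Usage, as root: enable_process_events && /opt/jp1base/bin/jbssetcnf /etc/opt/jp1cons/conf/jp1co_param_V7.conf, followed by the restart in step 3.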

About JP1 event issuance settings
  • About the IM parameter definition file (jp1co_param_V7.conf)

    See IM parameter definition file (jp1co_param_V7.conf) in 2. Definition Files in the manual Job Management Partner 1/Integrated Management - Manager Command and Definition File Reference.

(4) Setting the health check function

To set the health check function in order to detect JP1/IM - Manager process hang-ups:

  1. Open the health check definition file (jcohc.conf) and specify parameters.

    To enable the health check function, specify ENABLE=true.

    To issue a JP1 event when a hang-up is detected, specify EVENT=true. To execute a notification command when a hang-up is detected, specify COMMAND=command-to-be-executed.

  2. Use the jco_spmd_reload command to reload JP1/IM - Manager, or restart JP1/IM - Manager.

  3. If you specified a notification command, execute the jcohctest command to check whether the command specified in COMMAND executes correctly.

    If it does not, check the specification and revise it as necessary.
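Step 1 can be scripted with a generic key-setting helper such as the following sketch. The function name set_param is hypothetical, and it assumes that jcohc.conf holds one parameter per line written as KEY=value with no spaces around the equal sign. The path of jcohc.conf is not given in this subsection, so it is passed as an argument.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a KEY=value style file such as
# jcohc.conf, replacing an existing line for KEY or appending one.
# ASSUMPTION: one parameter per line, no spaces around "=".
# The original file is kept as FILE.bak.
set_param() {
    file=$1 key=$2 value=$3
    cp -p "$file" "$file.bak" || return 1
    # Pass key and value through the environment so awk does not
    # reinterpret backslashes in them.
    K=$key V=$value awk '
        BEGIN { k = ENVIRON["K"]; v = ENVIRON["V"] }
        index($0, k "=") == 1 { print k "=" v; done = 1; next }
        { print }
        END { if (!done) print k "=" v }' "$file.bak" > "$file"
}
```

For example, set_param /path/to/jcohc.conf ENABLE true and set_param /path/to/jcohc.conf EVENT true, followed by step 2 and, if COMMAND was set, the jcohctest check in step 3.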

About the health check function settings
  • About the health check definition file (jcohc.conf)

    See Health check definition file (jcohc.conf) in 2. Definition Files in the manual Job Management Partner 1/Integrated Management - Manager Command and Definition File Reference.

  • About the jcohctest command

    See jcohctest in 1. Commands in the manual Job Management Partner 1/Integrated Management - Manager Command and Definition File Reference.

(5) Automatic backup and recovery settings for a monitoring object database

Specify these settings if you use the functions of Central Scope.

If the OS shuts down while the monitoring tree is being updated, or if a failover occurs during cluster operation, the monitoring object database might be corrupted. Therefore, set the monitoring object database to be backed up and recovered automatically during monitoring tree updates.

After a new installation, these settings are enabled. If the settings were disabled in an earlier version of JP1/IM - Manager or JP1/IM - Central Scope, they remain disabled. Change the settings as appropriate for your operations.

To specify automatic backup and recovery settings for a monitoring object database:

  1. Terminate JP1/IM - Manager.

  2. Execute the jbssetcnf command, specifying one of the following files as the argument:

    To enable the automatic backup and recovery functions for the monitoring object database: auto_dbbackup_on.conf

    To disable the automatic backup and recovery functions for the monitoring object database: auto_dbbackup_off.conf

    When you execute the jbssetcnf command, the settings are applied to the JP1 common definition information.

    For details about the jbssetcnf command, see the Job Management Partner 1/Base User's Guide.

    About the settings in the file

    For details about the settings in the file, see Automatic backup and recovery settings file for the monitoring object database (auto_dbbackup_xxx.conf) in 2. Definition Files in the manual Job Management Partner 1/Integrated Management - Manager Command and Definition File Reference.

  3. Start JP1/IM - Manager.
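The three steps above can be wrapped in a shell sketch like the following. Everything named here is an assumption: the function set_auto_dbbackup is hypothetical, the directory that contains auto_dbbackup_on.conf and auto_dbbackup_off.conf is not given in this subsection (set CONF_DIR for your installation), and STOP_CMD and START_CMD stand for whatever commands you use to terminate and start JP1/IM - Manager.

```shell
#!/bin/sh
# Hypothetical wrapper for steps 1 to 3. ASSUMPTIONS (set these yourself):
#   CONF_DIR   directory holding auto_dbbackup_on.conf / auto_dbbackup_off.conf
#   STOP_CMD   command that terminates JP1/IM - Manager
#   START_CMD  command that starts JP1/IM - Manager
#   JBSSETCNF  path of jbssetcnf (defaults to /opt/jp1base/bin/jbssetcnf)
set_auto_dbbackup() {
    case $1 in
        on|off) ;;
        *) echo "usage: set_auto_dbbackup on|off" >&2; return 2 ;;
    esac
    if [ -z "$CONF_DIR" ] || [ -z "$STOP_CMD" ] || [ -z "$START_CMD" ]; then
        echo "set CONF_DIR, STOP_CMD and START_CMD first" >&2; return 1
    fi
    file="$CONF_DIR/auto_dbbackup_$1.conf"
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }
    $STOP_CMD || return 1                                          # step 1
    # Step 2; on failure, return without restarting so the cause can be checked.
    "${JBSSETCNF:-/opt/jp1base/bin/jbssetcnf}" "$file" || return 1
    $START_CMD                                                     # step 3
}
```

For example, after setting the variables, set_auto_dbbackup on applies auto_dbbackup_on.conf.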