Hitachi

JP1 Version 12 JP1/Integrated Management 2 - Manager Configuration Guide


2.17.5 Specifying settings for handling JP1/IM - Manager failures (for UNIX)

JP1/IM - Manager provides functions to protect against its own failures, such as the tool for collecting data needed for resolving problems and the function for automatic restart in the event of abnormal process termination.

This subsection describes the settings for handling JP1/IM - Manager failures.


(1) Preparations for collecting data in the event of a failure

JP1/IM - Manager provides a shell script (jim_log.sh) as a tool for collecting data in the event of a problem. This tool enables you to collect data needed for resolving problems in batch mode.

The data collection tool of JP1/IM - Manager can collect troubleshooting data for JP1/IM - Manager and JP1/Base. For details about the data that can be collected, see 11.3 Data that needs to be collected when a problem occurs in the JP1/Integrated Management 2 - Manager Administration Guide.

About the data collection tool
  • About jim_log.sh

    See jim_log.sh (UNIX only) in Chapter 1. Commands in the manual JP1/Integrated Management 2 - Manager Command, Definition File and API Reference.

In the event of a problem, you might need to obtain a core dump to investigate the cause. Whether a core dump is output depends on the user environment settings, so check the settings described below.

(a) Setting the size of a core dump file

The maximum size of a core dump file depends on the root user's core dump file size setting (ulimit -c). In JP1/IM - Manager, the following setting is specified in the jco_start and jco_start.cluster scripts so that output of core dump files does not depend on the user's environment settings:

ulimit -c unlimited

If this setting violates your machine's security policies, edit the scripts to set an acceptable value as shown below.

  • The following example limits the size to 8,388,608 blocks:

ulimit -c 8388608

Important

If this setting is commented out or set to a value other than unlimited, you might not be able to investigate a problem. When a core dump output event occurs, such as a segmentation fault or bus error in a JP1/IM - Manager process or execution of the jcogencore command, either no core dump or only a truncated core dump is output.
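To confirm which limit actually applies, you can compare the limit of your login shell with the limit inherited by a running JP1/IM - Manager process. The following is a minimal sketch; show_core_limit is a hypothetical helper name, the /proc part assumes a Linux host, and you supply the PID yourself (for example, from ps).

```shell
#!/bin/sh
# Hypothetical helper: show the core file size limit (ulimit -c) of the
# current shell and, on Linux, the limit in effect for a running process.
# A process started by jco_start inherits that script's limit, not the
# limit of the shell you are logged in to.
show_core_limit() {
    echo "current shell: $(ulimit -c)"
    # /proc/<PID>/limits is Linux-specific; it is skipped when unavailable.
    if [ -n "$1" ] && [ -r "/proc/$1/limits" ]; then
        grep 'Max core file size' "/proc/$1/limits"
    fi
    return 0
}
```

For example, show_core_limit 12345 (where 12345 stands for the PID of a JP1/IM - Manager process) prints both values, so you can see whether the value set in jco_start took effect.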

(b) Setting the kernel parameters regarding core dump (Linux only)

If the Linux kernel parameter kernel.core_pattern sets the output destination of core dump files to a directory other than a collection-target log file directory, or changes the core dump file name from the default, the data collection tool cannot collect the core dump file when it is executed.

The data collection tool collects files whose file names start with core in the following default collection-target log file directories.

For physical hosts:

  • /var/opt/jp1cons/log/

  • /var/opt/jp1scope/log/

  • /var/opt/jp1imm/log/imcf/

  • /var/opt/jp1imm/log/imdb/

  • /var/opt/jp1imm/log/imdd/

For logical hosts:

  • shared-directory/jp1cons/log/

  • shared-directory/jp1scope/log/

  • shared-directory/jp1imm/log/imcf/

  • shared-directory/jp1imm/log/imdb/

  • shared-directory/jp1imm/log/imdd/
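You can preview which core files the directories above currently hold before you run the data collection tool. The following sketch lists the files whose names start with core in those directories; list_core_files is a hypothetical helper and is not part of jim_log.sh. Pass /var/opt for a physical host or the shared directory for a logical host.

```shell
#!/bin/sh
# Hypothetical helper: list regular files whose names start with "core"
# in the collection-target log file directories under BASE.
# BASE is /var/opt for a physical host or the shared directory for a
# logical host. Directories that do not exist are skipped.
list_core_files() {
    base=${1:-/var/opt}
    for sub in jp1cons/log jp1scope/log jp1imm/log/imcf jp1imm/log/imdb jp1imm/log/imdd; do
        dir="$base/$sub"
        [ -d "$dir" ] || continue
        for f in "$dir"/core*; do
            # An unmatched glob stays literal, so test for a real file.
            if [ -f "$f" ]; then
                printf '%s\n' "$f"
            fi
        done
    done
    return 0
}
```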

Depending on the setting of kernel.core_pattern, it might be necessary to check and address the following points before executing the data collection tool. The default setting values vary depending on the OS version, so be sure to check the setting values.

  • When the output directory for a core dump file is not the collection-target log file directory

    Copy the core dump file to one of the collection-target log file directories listed above.

  • When the file name of a core dump file is changed

    Change the file name of the core dump file to a name beginning with core.

  • When core dump files are compressed

    Uncompress the core dump files.
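To decide which of these points apply, you can classify the current kernel.core_pattern value. The following is a minimal sketch; check_core_pattern is a hypothetical helper that reads /proc/sys/kernel/core_pattern when no value is passed. It only reports and does not copy or rename anything. A value beginning with | hands the dump to a program (such as systemd-coredump), and that program decides the location, name, and compression of the file.

```shell
#!/bin/sh
# Hypothetical helper: report where a kernel.core_pattern value sends core
# dumps and whether the resulting file name starts with "core".
check_core_pattern() {
    pattern=${1-$(cat /proc/sys/kernel/core_pattern)}
    case "$pattern" in
        '|'*)
            # ${pattern#?} drops the leading "|".
            echo "pipe: dumps are handed to ${pattern#?}"
            return 0 ;;
        /*) echo "absolute path: confirm that the directory is a collection-target directory" ;;
        *)  echo "relative: written to the working directory of the crashing process" ;;
    esac
    case "${pattern##*/}" in
        core*) echo "name starts with core" ;;
        *)     echo "name does not start with core: rename the file before collection" ;;
    esac
}
```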

(c) Setting ABRT for core dump files (Linux only)

On a Linux system where the Automatic Bug Reporting Tool (ABRT) is installed, ABRT can be configured to allow only specific processes, OS user accounts, or user groups to generate core dump files. In that case, a core dump file might not be generated when a core dump output event occurs, such as a segmentation fault or bus error in a JP1/IM - Manager process or execution of the jcogencore command, and you will not be able to investigate the problem.

As required by your operation, change the ABRT settings so that the processes, OS user accounts, or user groups that run JP1/IM - Manager are allowed to generate core dump files. For details, see the documentation for your Linux distribution.

(d) systemd settings related to core dump files (Linux only)

These settings apply to Linux environments in which the value in the core dump file name settings file (/proc/sys/kernel/core_pattern) begins with the character string "|/usr/lib/systemd/systemd-coredump".

If the operation settings file for core dump files (/etc/systemd/coredump.conf) specifies that no core dump files are to be created, no core dump file is output, and you will not be able to investigate the failure when, for example, a segmentation fault or bus error occurs in a JP1/IM - Manager process or the jcogencore command is executed.

As required by your operation, revise the settings in the operation settings file for core dump files (/etc/systemd/coredump.conf) so that core dump files are created. For details, see the documentation for your Linux distribution.
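With systemd-coredump, saving a dump as a file under /var/lib/systemd/coredump/ requires, among other things, Storage=external (the systemd default) and a nonzero ExternalSizeMax=. The following sketch checks those two settings. check_coredump_conf is a hypothetical helper; it reads only the file you pass (default /etc/systemd/coredump.conf) and ignores drop-in files in coredump.conf.d directories, so treat a clean result as a hint, not proof.

```shell
#!/bin/sh
# Hypothetical helper: warn about coredump.conf settings that prevent
# systemd-coredump from saving core dumps as files. Returns 1 if any such
# setting is found. Commented-out lines (the shipped defaults) are ignored,
# and the last uncommented assignment of each key wins.
check_coredump_conf() {
    conf=${1:-/etc/systemd/coredump.conf}
    storage=$(sed -n 's/^[[:space:]]*Storage=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$conf" | tail -n 1)
    extmax=$(sed -n 's/^[[:space:]]*ExternalSizeMax=[[:space:]]*\([^[:space:]]*\).*/\1/p' "$conf" | tail -n 1)
    status=0
    case "${storage:-external}" in
        external) ;;
        *) echo "Storage=$storage: core dumps are not saved as files"; status=1 ;;
    esac
    if [ "$extmax" = "0" ]; then
        echo "ExternalSizeMax=0: no core dump file is saved"; status=1
    fi
    if [ "$status" -eq 0 ]; then
        echo "no file-suppressing setting found in $conf"
    fi
    return "$status"
}
```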

(e) Notes for SUSE Linux

In SUSE Linux Enterprise Server 12 SP2, when the jcogencore command is used to output a core dump file, the core dump file is not output to the directory where the command stores core dump files, and one of the following messages is output:

- KAVB8428-W The core dump file was not found.

- KAVB8408-E The specified process is not running.

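In this case, use the following procedure to place the core dump files where the data collection tool can collect them. The procedure assumes that core dump files are set to be output to /var/lib/systemd/coredump/.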

  1. From /var/lib/systemd/coredump/, copy the files whose names contain the JP1/IM - Manager execution file name and whose timestamps correspond to the time when the jcogencore command was executed. Copy each file to the applicable copy-destination directory shown in the following table.

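    No. | JP1/IM - Manager execution file name | Copy-destination directory (physical host) | Copy-destination directory (logical host)
    1   | evflow       | /var/opt/jp1cons/log/     | <shared-directory>/jp1cons/log/
    2   | jcamain      | /var/opt/jp1cons/log/     | <shared-directory>/jp1cons/log/
    3   | evtcon       | /var/opt/jp1cons/log/     | <shared-directory>/jp1cons/log/
    4   | evgen        | /var/opt/jp1cons/log/     | <shared-directory>/jp1cons/log/
    5   | jcdmain      | /var/opt/jp1cons/log/     | <shared-directory>/jp1cons/log/
    6   | jcfmain      | /var/opt/jp1imm/log/imcf/ | <shared-directory>/jp1cons/log/
    7   | jcfallogtrap | /var/opt/jp1imm/log/imcf/ | <shared-directory>/jp1cons/log/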

  2. If the file is a compressed file, decompress the file. (If the file is not a compressed file, go to step 3.) If the extension of the copied file is .xz, use the following command to decompress the file:

    unxz <file-path-copied-in-step-1>

  3. Rename the files as follows.

    No. | Original file name (default)                      | New file name
    1   | File name that begins with "core.evflow."         | core.evflow
    2   | File name that begins with "core.jcamain."        | core.jcamain
    3   | File name that begins with "core.evtcon."         | core.java
    4   | File name that begins with "core.evgen."          | core.evgen
    5   | File name that begins with "core.jcdmain."        | core.jcdmain
    6   | File name that begins with "core.jcfmain."        | core.jcfmain
    7   | File name that begins with "core.jcfallogtrap."   | core.<PID>.jcfallogtrap#

#: For <PID>, specify the PID of the original file. The original file names follow the naming convention below; if a file is not compressed, its name does not have the .xz extension.

core.<execution-file-name>.<real-UID>.<boot-ID>.<PID>.<total-seconds>.xz

For example, in the following file name, the PID is 1378:

core.jcfallogtrap.0.71abdba3becd450a8ac5c4469dfcd648.1378.1493089252000000.xz

After this file is decompressed, rename it to core.1378.jcfallogtrap.
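Once you have identified the right file, steps 1 to 3 can be scripted for it. The following is a minimal sketch that assumes the systemd-coredump naming convention shown above and leaves the choice of file (the one whose timestamp matches the jcogencore run) to you. new_core_name and place_core_file are hypothetical helpers, and the destination directory must be taken from the table in step 1.

```shell
#!/bin/sh
# Hypothetical helper for step 3: derive the new file name from a name of
# the form core.<exe>.<real-UID>.<boot-ID>.<PID>.<total-seconds>[.xz],
# following the renaming table. Unlisted executables are rejected.
new_core_name() {
    name=$(basename "$1")
    name=${name%.xz}
    exe=$(printf '%s\n' "$name" | cut -d. -f2)
    pid=$(printf '%s\n' "$name" | cut -d. -f5)
    case "$exe" in
        evflow|jcamain|evgen|jcdmain|jcfmain) echo "core.$exe" ;;
        evtcon)       echo "core.java" ;;
        jcfallogtrap) echo "core.$pid.jcfallogtrap" ;;
        *) echo "not a listed JP1/IM - Manager execution file: $exe" >&2
           return 1 ;;
    esac
}

# Hypothetical helper for steps 1 to 3: copy SRC into DEST, decompress it
# with unxz if its name ends in .xz, then rename it.
place_core_file() {
    src=$1 dest=$2
    base=$(basename "$src")
    new=$(new_core_name "$base") || return 1   # validate before copying
    cp -p "$src" "$dest/" || return 1
    case "$base" in
        *.xz) unxz "$dest/$base" || return 1   # unxz removes the .xz copy
              base=${base%.xz} ;;
    esac
    mv "$dest/$base" "$dest/$new"
}
```

For example, place_core_file /var/lib/systemd/coredump/<file-name> /var/opt/jp1cons/log copies one evflow dump into the physical-host destination and leaves it as core.evflow.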

(2) Restart settings in the event of abnormal process termination

To specify restart settings in the event of abnormal process termination:

  1. Define process restart.

    Edit the following extended startup process definition file (jp1co_service.conf) so that process restart is enabled:

    /etc/opt/jp1cons/conf/jp1co_service.conf

    The restart parameter is the fourth of the values separated by vertical bars (|). Set it to 0 (do not restart; the default) or 1 (restart).

  2. Apply the definition information.

    If JP1/IM - Manager is running, execute JP1/IM - Manager's reload command so that the process restart setting is enabled:

    /opt/jp1cons/bin/jco_spmd_reload
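If you prefer to script step 1, the following sketch changes the restart parameter for one entry and writes the result to standard output, so that you can review it before replacing the file. It assumes that each definition is on one line and that its first |-separated value is the process name; verify that assumption against the file and its reference before use. set_restart is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print jp1co_service.conf with the fourth
# |-separated value (the restart parameter) of entry NAME set to VALUE.
# ASSUMPTION: one definition per line, first value = process name.
# Lines for other processes, and comment lines, are printed unchanged.
set_restart() {
    conf=$1 name=$2 value=$3
    case "$value" in
        0|1) ;;   # 0 = do not restart (default), 1 = restart
        *) echo "value must be 0 or 1" >&2; return 1 ;;
    esac
    awk -F'|' -v OFS='|' -v name="$name" -v value="$value" \
        '$1 == name && NF >= 4 { $4 = value } { print }' "$conf"
}
```

For example, set_restart /etc/opt/jp1cons/conf/jp1co_service.conf evflow 1 > /tmp/jp1co_service.conf.new produces an edited copy; after checking it with diff and copying it over the original, run /opt/jp1cons/bin/jco_spmd_reload as in step 2.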

About process restart definition
  • About the extended startup process definition file (jp1co_service.conf)

    See Extended startup process definition file (jp1co_service.conf) in Chapter 2. Definition Files in the manual JP1/Integrated Management 2 - Manager Command, Definition File and API Reference.

Note:

In a cluster system, do not enable process restart in the event of abnormal process termination. If JP1/IM - Manager fails, the process restart function might also be affected. If you want to restart processes in the event of an abnormal process termination in a cluster system, use the cluster software (not JP1/IM - Manager) to control the restart.

(3) Setting JP1 event issuance in the event of abnormal process termination

To set JP1 event issuance in the event of abnormal process termination:

  1. Set JP1 event issuance.

    Edit the following IM parameter definition file (jp1co_param_V7.conf):

    /etc/opt/jp1cons/conf/jp1co_param_V7.conf

    In this file, SEND_PROCESS_TERMINATED_ABNORMALLY_EVENT and SEND_PROCESS_RESTART_EVENT are the parameters that control JP1 event issuance. To issue these JP1 events, change the values of these parameters to dword:1.

  2. Execute the jbssetcnf command to apply the definition information.

    /opt/jp1base/bin/jbssetcnf /etc/opt/jp1cons/conf/jp1co_param_V7.conf

  3. Restart JP1/Base and the products that require JP1/Base.

    The specified settings take effect after the restart.
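Step 1 can also be done with a stream edit. The following sketch prints a copy of the file with both parameters set to dword:1. It assumes that each parameter is on its own line in the form NAME=dword:value, with NAME optionally enclosed in double quotes; check the actual file against the IM parameter definition file reference first. enable_abnormal_end_events is a hypothetical helper.

```shell
#!/bin/sh
# Hypothetical helper: print jp1co_param_V7.conf with
# SEND_PROCESS_TERMINATED_ABNORMALLY_EVENT and SEND_PROCESS_RESTART_EVENT
# set to dword:1. ASSUMPTION: NAME=dword:value on one line, NAME possibly
# in double quotes. All other lines are printed unchanged.
enable_abnormal_end_events() {
    sed -E 's/^("?(SEND_PROCESS_TERMINATED_ABNORMALLY_EVENT|SEND_PROCESS_RESTART_EVENT)"?)=dword:[0-9A-Fa-f]+/\1=dword:1/' "$1"
}
```

For example, write the output of enable_abnormal_end_events /etc/opt/jp1cons/conf/jp1co_param_V7.conf to a temporary file, review it, copy it over the original (keeping a backup), and then run /opt/jp1base/bin/jbssetcnf /etc/opt/jp1cons/conf/jp1co_param_V7.conf and restart as in steps 2 and 3.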

About JP1 event issuance settings
  • About the IM parameter definition file (jp1co_param_V7.conf)

    See IM parameter definition file (jp1co_param_V7.conf) in Chapter 2. Definition Files in the manual JP1/Integrated Management 2 - Manager Command, Definition File and API Reference.

(4) Setting the health check function

To configure the health check function so that JP1/IM - Manager process hang-ups are detected:

  1. Open the health check definition file (jcohc.conf) and specify parameters.

    To enable the health check function, specify ENABLE=true.

    Specify EVENT=true to issue a JP1 event and COMMAND=command-to-be-executed to execute a notification command when a hang-up is detected.

  2. Use the jco_spmd_reload command to reload JP1/IM - Manager, or restart JP1/IM - Manager.

  3. If you specified a notification command, execute the jcohctest command to check whether the command specified in COMMAND executes correctly.

    If it does not, check and, if necessary, revise the specification.
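Step 1 amounts to setting a few KEY=value lines. The following sketch sets one key, replacing an existing line that starts with KEY= or appending one. It assumes that jcohc.conf consists of plain KEY=value lines (check the Health check definition file reference); set_conf_value is a hypothetical helper. Run it on a copy and review the result before replacing the original.

```shell
#!/bin/sh
# Hypothetical helper: set KEY=VALUE in a KEY=value file such as a copy of
# jcohc.conf. Existing lines starting with "KEY=" are replaced; otherwise
# the line is appended. Values are passed through the environment, so
# characters such as |, & and / in a command line are kept verbatim.
set_conf_value() {
    conf=$1
    KEY=$2 VALUE=$3 awk '
        index($0, ENVIRON["KEY"] "=") == 1 { print ENVIRON["KEY"] "=" ENVIRON["VALUE"]; found = 1; next }
        { print }
        END { if (!found) print ENVIRON["KEY"] "=" ENVIRON["VALUE"] }
    ' "$conf" > "$conf.tmp" && mv "$conf.tmp" "$conf"
}
```

For example, set_conf_value jcohc.conf.copy ENABLE true, set_conf_value jcohc.conf.copy EVENT true, and set_conf_value jcohc.conf.copy COMMAND /usr/local/bin/notify_hangup.sh (a hypothetical notification script) cover step 1; then continue with steps 2 and 3.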

About the health check function settings
  • About the health check definition file (jcohc.conf)

    See Health check definition file (jcohc.conf) in Chapter 2. Definition Files in the manual JP1/Integrated Management 2 - Manager Command, Definition File and API Reference.

  • About the jcohctest command

    See jcohctest in Chapter 1. Commands in the manual JP1/Integrated Management 2 - Manager Command, Definition File and API Reference.

(5) Automatic backup and recovery settings for a monitoring object database

Specify these settings if you use the Central Scope functions.

If the OS shuts down or a failover occurs during cluster operation while the monitoring tree is being updated, the monitoring object database might be corrupted. Therefore, set up the monitoring object database so that it is automatically backed up and recovered when the monitoring tree is updated.

These settings are enabled by default after a new installation. After an upgrade installation, they remain disabled if they were disabled in the previous version of JP1/IM - Manager. Change the settings as appropriate for your operation.

To specify automatic backup and recovery settings for a monitoring object database:

  1. Terminate JP1/IM - Manager.

  2. Execute the jbssetcnf command, specifying one of the following files as the argument:

    To enable the automatic backup and recovery functions for the monitoring object database: auto_dbbackup_on.conf

    To disable the automatic backup and recovery functions for the monitoring object database: auto_dbbackup_off.conf

    When you execute the jbssetcnf command, the settings are applied to the JP1 common definition information.

    For details about the jbssetcnf command, see the JP1/Base User's Guide.

    About the settings in the file

    For details about the settings in the file, see Automatic backup and recovery settings file for the monitoring object database (auto_dbbackup_xxx.conf) in Chapter 2. Definition Files in the manual JP1/Integrated Management 2 - Manager Command, Definition File and API Reference.

  3. Start JP1/IM - Manager.
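Step 2 can be wrapped so that only the two documented file names are accepted. The following is a minimal sketch; the directory that holds auto_dbbackup_on.conf and auto_dbbackup_off.conf is not given in this subsection, so it is an argument here. set_auto_dbbackup and the JBSSETCNF override (added only so the sketch can be dry-run, for example with JBSSETCNF=echo) are hypothetical. Run it only while JP1/IM - Manager is stopped, between steps 1 and 3.

```shell
#!/bin/sh
# Hypothetical helper: apply auto_dbbackup_on.conf or auto_dbbackup_off.conf
# from DIR with jbssetcnf. JBSSETCNF defaults to the jbssetcnf path used
# elsewhere in this subsection.
set_auto_dbbackup() {
    mode=$1 dir=$2
    case "$mode" in
        on|off) ;;
        *) echo "usage: set_auto_dbbackup on|off conf-directory" >&2; return 1 ;;
    esac
    file="$dir/auto_dbbackup_$mode.conf"
    [ -f "$file" ] || { echo "not found: $file" >&2; return 1; }
    "${JBSSETCNF:-/opt/jp1base/bin/jbssetcnf}" "$file"
}
```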