Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


6.20.1 Settings in the files required for monitoring a LAN

If you monitor LANs and want hot standby processing to be performed automatically in the event of a failure, you must create and specify LAN monitor definition files. There are two types of LAN monitor definition files:

  • LAN monitor definition file

  • LAN monitoring script

Create these files under the directory for HA Monitor environment settings for each server. This subsection explains how to specify the LAN monitor definition files.

For details about monitoring LANs, see 3.4.1 LAN monitoring and automatic hot standby in the event of a failure.

(1) Specifying a LAN monitor definition file

You specify in a LAN monitor definition file the IP addresses of the check packet transmission targets for monitoring a LAN. If you monitor only the hbonding status, the LAN monitor definition file must be empty (0 bytes).

You must specify this file for each monitored target before you start HA Monitor. A LAN monitor definition file cannot be added, changed, or deleted while HA Monitor is running.

The file name must be LAN-interface-name.conf. For LAN-interface-name, specify the value of the switchbyfail operand that was specified in the server environment definition. A file whose name begins with a period (.) is disabled.

The following rules apply to creating LAN monitor definition files:

  • If you specify IP labels in a LAN monitor definition file, specify each IP address and the corresponding IP label in the /etc/hosts file.

  • Do not create in the directory for HA Monitor environment settings any file with the extension .conf other than a LAN monitor definition file.

An example of a LAN monitor definition file is shown below. In this example, the LAN interface name is eth0, the physical IP address is 192.168.0.1, and the network is 192.168.0.

File name: eth0.conf
192.168.0.2
192.168.0.3
192.168.0.4
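
As a rough aid for the /etc/hosts rule above, the following sketch lists the entries in a LAN monitor definition file that are IP labels without a matching entry in a hosts file. It assumes one target per line, as in the example. The function name unresolved_labels and its arguments are illustrative and are not part of HA Monitor.

```sh
#!/bin/sh
# unresolved_labels CONF HOSTS  (hypothetical helper, not an HA Monitor command)
# Prints each entry in CONF that is not a dotted-decimal IPv4 address and
# does not appear as a host name in HOSTS. Assumes one target per line.
unresolved_labels() {
    conf=$1
    hosts=$2
    while read -r target; do
        [ -n "$target" ] || continue
        # Entries that look like IPv4 addresses need no hosts entry.
        if printf '%s\n' "$target" | grep -Eq '^[0-9]+(\.[0-9]+){3}$'; then
            continue
        fi
        # Search the name columns of every non-comment hosts line.
        if ! awk -v name="$target" '
                /^[[:space:]]*#/ { next }
                { sub(/#.*/, "")
                  for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
                END { exit found ? 0 : 1 }' "$hosts"; then
            printf '%s\n' "$target"
        fi
    done < "$conf"
}
```

For example, unresolved_labels eth0.conf /etc/hosts prints nothing when every IP label in eth0.conf can be found in /etc/hosts.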

(2) Specifying a LAN monitoring script

Prepare this file if HA Monitor monitors a LAN by monitoring ping responses or the number of received packets. The name of the file to be set is lanpatrol.sh. If the monitored target is hbonding and only the hbonding status is to be monitored, there is no need to specify this file.

A sample is provided for this file under the directory for HA Monitor's sample files. Copy the sample file to the directory for HA Monitor environment settings and then use the copy. The following shows the contents of the sample file:

#!/bin/sh
#******************************************************************************
#*                                                                            *
#*    Linux(x86) HA Monitor                                                   *
#*    This is a sample of the LAN patrol command.                             *
#*                                                                            *
#*    [SYNOPSIS]                                                              *
#*        lanpatrol.sh interface [-r]                                         *
#*    [OPTIONS]                                                               *
#*        interface                                                           *
#*            The name of the LAN interface.                                  *
#*        -r                                                                  *
#*            Route monitoring method                                         *
#*    [DIAGNOSTICS]                                                           *
#*        0 : normal                                                          *
#*        1 : abnormal                                                        *
#*        2 : error                                                           *
#*                                                                            *
#*    All Rights Reserved. Copyright (C) 2011, 2018, Hitachi, Ltd.            *
#*                                                                            *
#******************************************************************************
#set -x

# retry interval(seconds) for receive packet number method. or
# return wait(seconds) of ping command for route monitoring method.
WAIT_TIME=3

# retry counter
RETRY=3

# environment variables
PATH=/usr/bin:/usr/sbin:/bin
export PATH

:
(Omitted)

Notes

Be careful if you create a LAN monitoring script from a sample file provided with a version of HA Monitor earlier than 01-69. In this case, HA Monitor monitors the number of received packets regardless of the setting of the lancheck_mode operand in the HA Monitor environment settings.

You can check the version of the sample file as follows:

  • If the [OPTIONS] section does not include a description of the -r option, the version is earlier than 01-69.

  • If the [OPTIONS] section includes a description of the -r option, the version is 01-69 or later.

The processing executed by this script differs depending on the monitoring method. The following shows the processing for each method.

If HA Monitor monitors ping responses:

  • The script issues ping commands in parallel to all IP addresses specified in the LAN monitor definition file, using the wait time (in seconds) specified in the WAIT_TIME environment variable.

  • HA Monitor checks the return values of all the ping commands that were issued in parallel. If at least one of the return values is 0, HA Monitor judges that the LAN is in a normal state.

    If none of the return values is 0, HA Monitor reissues the ping commands to all IP addresses specified in the LAN monitor definition file. The retry count can be set by using the RETRY environment variable. If none of the return values is 0 after all retries are completed, HA Monitor judges that the LAN is in an abnormal state.
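
The retry logic for the ping response method can be sketched as follows. This is an illustration only, not the omitted body of the sample file. The function names probe and check_route are hypothetical, and the ping options (-c, -W) assume the iputils ping command found on typical Linux systems. The return values follow the [DIAGNOSTICS] section of the sample header.

```sh
#!/bin/sh
# Hypothetical sketch of the ping response method; not the sample script itself.
WAIT_TIME=3   # seconds to wait for each ping reply
RETRY=3       # number of attempts

# probe TARGET: succeeds if TARGET answers one ping within WAIT_TIME seconds.
# Assumes iputils ping (-c count, -W reply timeout in seconds).
probe() {
    ping -c 1 -W "$WAIT_TIME" "$1" >/dev/null 2>&1
}

# check_route CONF: 0 = normal, 1 = abnormal, 2 = error
check_route() {
    conf=$1
    [ -r "$conf" ] || return 2
    try=0
    while [ "$try" -lt "$RETRY" ]; do
        pids=""
        # Start one probe per target in the background (in parallel).
        while read -r target; do
            [ -n "$target" ] || continue
            probe "$target" &
            pids="$pids $!"
        done < "$conf"
        # Collect every result; one success is enough.
        ok=1
        for pid in $pids; do
            if wait "$pid"; then
                ok=0
            fi
        done
        [ "$ok" -ne 0 ] || return 0
        try=$((try + 1))
    done
    return 1
}
```

Because every probe in one attempt runs in parallel, an attempt in this sketch takes about WAIT_TIME seconds regardless of how many targets the file lists.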

If HA Monitor monitors the number of received packets:

  • The script issues the ping command to the IP address specified in the LAN monitor definition file.

  • After the time (in seconds) specified in the WAIT_TIME environment variable has elapsed, the script obtains the number of packets received by the monitored LAN interface and compares it with the previous value. If the two values differ, the script determines that the LAN is functioning normally.

    If they do match, the script retries issuance of the ping command and again determines the number of packets received. The number of retries can be set for the RETRY environment variable. If the number of packets received remains the same over all retries, the script determines that an error has occurred.
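
The received-packet method can be sketched in the same way. Again, this is a hedged illustration and not the sample file: rx_packets, send_ping, and check_rx are hypothetical names, the packet counter is assumed to be readable from the Linux sysfs statistics file, and targets are pinged one after another for simplicity.

```sh
#!/bin/sh
# Hypothetical sketch of the received-packet method; not the sample script itself.
WAIT_TIME=3   # seconds to wait before re-reading the counter
RETRY=3       # number of monitoring rounds

# rx_packets IFACE: prints the received-packet counter of IFACE.
# Assumes the Linux sysfs statistics file is available.
rx_packets() {
    cat "/sys/class/net/$1/statistics/rx_packets" 2>/dev/null
}

# send_ping TARGET: provokes reply traffic; the ping result itself is ignored.
# Assumes iputils ping (-c count, -W reply timeout in seconds).
send_ping() {
    ping -c 1 -W 1 "$1" >/dev/null 2>&1
}

# check_rx IFACE CONF: 0 = normal, 1 = abnormal, 2 = error
check_rx() {
    iface=$1
    conf=$2
    [ -r "$conf" ] || return 2
    prev=$(rx_packets "$iface") || return 2
    try=0
    while [ "$try" -lt "$RETRY" ]; do
        while read -r target; do
            [ -n "$target" ] || continue
            send_ping "$target" || :
        done < "$conf"
        sleep "$WAIT_TIME"
        cur=$(rx_packets "$iface") || return 2
        # A changed counter means the interface received packets.
        [ "$cur" = "$prev" ] || return 0
        prev=$cur
        try=$((try + 1))
    done
    return 1
}
```

A call such as check_rx eth0 eth0.conf returns 1 only if the counter stays unchanged through all RETRY rounds.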

The command line to be specified in the LAN monitoring script differs depending on the monitoring method. The following shows the command line for each method.

If HA Monitor monitors ping responses:

lanpatrol.shΔLAN-interface-nameΔ-r

If HA Monitor monitors the number of received packets:

lanpatrol.shΔLAN-interface-name

Legend:

Δ: Indicates a single-byte space.
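
If you write a customized script that accepts the same command line, argument handling along the following lines may serve as a starting point. This is a minimal sketch: parse_args, IFACE, and METHOD are hypothetical names, and returning 2 on a usage error follows the "error" value in the [DIAGNOSTICS] section of the sample header.

```sh
#!/bin/sh
# parse_args INTERFACE [-r]  (hypothetical helper)
# Sets IFACE to the LAN interface name and METHOD to "route" (ping responses)
# or "packets" (number of received packets). Returns 2 on a usage error.
parse_args() {
    IFACE=
    METHOD=packets
    [ "$#" -ge 1 ] && [ "$#" -le 2 ] || return 2
    IFACE=$1
    if [ "$#" -eq 2 ]; then
        [ "$2" = "-r" ] || return 2
        METHOD=route
    fi
    return 0
}
```

For example, parse_args eth0 -r sets IFACE to eth0 and METHOD to route, while parse_args eth0 leaves METHOD as packets.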

You can edit the sample file, if necessary. The following subsections provide notes about the sample file when it is used as is and when it is edited.

(a) Using the sample file as is

If you use the sample file as is and specify a value in the lancheck_patrol operand that is smaller than the default (15) in the HA Monitor environment settings, you must reduce the LAN monitoring script execution time. You must specify a LAN monitoring script execution time that satisfies the following formula:

lancheck_patrol operand value > LAN monitoring script execution time + 2

The execution time of the LAN monitoring script differs depending on the monitoring method. The following shows the calculation expressions for each method.

If HA Monitor monitors ping responses:
WAIT_TIME variable value × RETRY variable value

In the sample file, the LAN monitoring script execution time is 3 × 3 = 9 seconds.

If HA Monitor monitors the number of received packets:
(WAIT_TIME variable value + 1) × RETRY variable value

In the sample file, the LAN monitoring script execution time is (3 + 1) × 3 = 12 seconds.

To reduce the LAN monitoring script execution time, reduce the value of the WAIT_TIME variable or the RETRY variable. The minimum values of the WAIT_TIME and RETRY variables are 0 and 1, respectively.

Note

If HA Monitor monitors the number of received packets, executing the ping command in the LAN monitoring script takes 1 second. One round of received-packet monitoring therefore takes this 1 second plus the time specified in the WAIT_TIME variable. Because monitoring is repeated as many times as specified in the RETRY variable, the LAN monitoring script execution time equals the time for one round multiplied by the value of the RETRY variable.
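
The formulas in this subsection can be checked with a few lines of shell arithmetic. The sketch below is only a calculation aid: script_time and patrol_ok are hypothetical names, and the method labels route and packets are chosen here for illustration.

```sh
#!/bin/sh
# script_time METHOD WAIT_TIME RETRY  (hypothetical helper)
# Prints the estimated execution time, in seconds, of the sample script.
# METHOD is "route" (ping responses) or "packets" (received packets).
script_time() {
    case $1 in
        route)   echo $(( $2 * $3 )) ;;
        packets) echo $(( ($2 + 1) * $3 )) ;;
        *)       return 2 ;;
    esac
}

# patrol_ok LANCHECK_PATROL METHOD WAIT_TIME RETRY  (hypothetical helper)
# Succeeds if: lancheck_patrol > LAN monitoring script execution time + 2
patrol_ok() {
    t=$(script_time "$2" "$3" "$4") || return 2
    [ "$1" -gt $(( t + 2 )) ]
}
```

With the sample values (WAIT_TIME=3, RETRY=3) and the received-packet method, patrol_ok 15 packets 3 3 succeeds because 15 > 12 + 2. With a lancheck_patrol value of 10 it fails, and reducing WAIT_TIME to 1 gives (1 + 1) × 3 = 6 seconds, which satisfies 10 > 6 + 2.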

(b) Editing the sample file

The following notes apply to editing the sample file:

  • A LAN monitoring script is executed with root permissions.

  • Grant execute permission for the LAN monitoring script.

  • In the LAN monitoring script, set any environment variables that the script requires.

  • Create the LAN monitoring script so that it terminates with the EXIT code shown in the table below, according to the status of the monitored LAN.

    Table 6‒21: EXIT code returned according to the LAN status

    | Status of monitored LAN | EXIT code | Description |
    |---|---|---|
    | Normal | 0 | If the script determines that the LAN can be used successfully, it returns 0 for the EXIT code. HA Monitor determines that the LAN is normal or has recovered. |
    | Failure | 1 | If the script determines that the LAN cannot be used, it returns 1 for the EXIT code. HA Monitor determines that a failure has occurred on the LAN. If the corresponding LAN interface is specified in the switchbyfail operand in the server environment definition, HA Monitor switches the active server. |
    | Not monitorable | Value other than 0 or 1 | If the script cannot identify the LAN status, it returns a value other than 0 or 1 for the EXIT code. Because the LAN cannot be monitored, HA Monitor will not switch the active server even if the corresponding LAN interface name is specified in the switchbyfail operand in the server environment definition. |

  • SIGTERM is sent to a running LAN monitoring script when HA Monitor terminates normally, or when a host failure is detected while the function for controlling hot standby based on the availability of LAN communications is in use.
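
The notes above can be pulled together in a skeleton such as the following. It is a sketch under stated assumptions, not a replacement for the sample file: link_state, state_to_exit, lan_check, and on_term are hypothetical names, the check reads the Linux sysfs operstate file purely as a placeholder for a real test, and the EXIT code used in the SIGTERM handler is an assumption because this manual does not specify one.

```sh
#!/bin/sh
# Hypothetical skeleton for a customized LAN monitoring script.

# Set the environment variables that the script requires.
PATH=/usr/bin:/usr/sbin:/bin
export PATH

# link_state IFACE: prints the kernel's operational state of IFACE.
# Placeholder check that assumes Linux sysfs; replace with a real LAN test.
link_state() {
    cat "/sys/class/net/$1/operstate" 2>/dev/null
}

# state_to_exit STATE: maps a state string to the EXIT code in Table 6-21.
state_to_exit() {
    case $1 in
        up)   return 0 ;;   # Normal
        down) return 1 ;;   # Failure
        *)    return 2 ;;   # Not monitorable
    esac
}

# lan_check IFACE: returns the EXIT code for IFACE.
lan_check() {
    [ -n "$1" ] || return 2
    state=$(link_state "$1") || return 2
    state_to_exit "$state"
}

# on_term: stop promptly when SIGTERM is received.
# Exiting with 2 is an assumption; the manual does not specify a code.
on_term() {
    exit 2
}
trap on_term TERM

# Typical last lines of the script (commented out in this sketch):
# lan_check "$1"
# exit $?
```

Keeping the mapping in one small function makes it easy to confirm that every path through the script ends in 0, 1, or a value other than 0 or 1.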