OpenTP1 Version 7 Description


3.2.3 Node management in OpenTP1

Multiple instances of OpenTP1 communicate with one another using RPCs implemented over TCP/IP. A TCP/IP connection is established between the servers, and the RPCs travel over that connection.

If communication fails due to a network error, the OpenTP1 instances cannot detect that the connection has been lost. For this reason, RPCs issued after a network error may fail. The following describes the OpenTP1 facilities for preventing RPC errors after a connection failure.

Organization of this subsection
(1) Startup notification facility
(2) Node monitoring facility
(3) Facility for monitoring nodes registered in the RPC suppression list
(4) Node information display

(1) Startup notification facility

When OpenTP1 starts, the startup of the local node is reported to the name services of the OpenTP1 instances running on other nodes, and any connection that was already established is forcibly closed. This functionality can be used at system switchover, for example.

To enable notification of OpenTP1 startup to other nodes, specify Y in the name_notify operand of the system common definition on both the sending and receiving nodes.

OpenTP1 on both nodes must be version 05-02 or later to use this functionality.
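
For reference, the corresponding entry in the system common definition might look as follows (a minimal sketch; surrounding definition statements are omitted):

  set name_notify = Y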

Figure 3-22 shows an example of a system configuration when using the startup notification facility at system switchover.

Figure 3-22 Example of using the startup notification facility at system switchover

[Figure]

  1. OpenTP1-B goes down due to a server failure or other error, and a system switchover occurs. OpenTP1-A cannot detect the failure in OpenTP1-B, so the connection remains open.
  2. The systems are switched, and OpenTP1-C starts on the standby node.
  3. If the startup notification facility is enabled, notification that OpenTP1-C has started is sent to OpenTP1-A.
  4. OpenTP1-A forcibly closes its connection to OpenTP1-B.

Because communication among OpenTP1-A, OpenTP1-B, and OpenTP1-C resumes in this way over a newly established connection, processing continues without communication errors.

If, for any reason, OpenTP1-A cannot be notified of the startup, message KFCA00642-W is output on OpenTP1-C. In this case, you must execute the namunavl command on OpenTP1-A. By specifying the -l option of the namunavl command, you can find out which nodes could not be notified that OpenTP1-C had started.
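
As a sketch, listing the nodes that could not be notified looks as follows (only the option described above is shown):

  namunavl -l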

Note
The startup notification facility cannot be used when multiple instances of OpenTP1 are running on a monitored host, or when multiple instances of OpenTP1 run with the same IP address after a system switchover (an environment with only one LAN board).

(2) Node monitoring facility

The node monitoring facility polls nodes at regular intervals and detects communication failures.

Using the node monitoring facility, you can monitor the status of OpenTP1 on nodes specified in the all_node operand and all_node_ex operand in the system common definition. If an OpenTP1 node cannot be detected as active, this facility deletes all cached service information relating to the node and closes the connection.

Node monitoring minimizes RPC errors because failures are detected promptly and failed nodes are forcibly disconnected.

Figure 3-23 shows an example of monitoring other nodes by using the node monitoring facility.

Figure 3-23 Monitoring other nodes by using the node monitoring facility

[Figure]

The node monitoring facility at OpenTP1-A periodically polls OpenTP1-B, OpenTP1-C, and OpenTP1-D.

  1. If the OpenTP1-C node goes down, the node monitoring facility detects that OpenTP1-C cannot be reached.
  2. OpenTP1-C is disconnected and message KFCA00650-I is output.
  3. The failed node is registered in the RPC suppression list#. Service information about the node is deleted from the cached service information.
    #
    An RPC suppression list contains information about nodes on which the OpenTP1 system is inactive.

The node monitoring facility checks whether nodes are active at the intervals specified in the name_audit_interval operand of the name service definition. To use the node monitoring facility, specify 1 or 2 in the name_audit_conf operand of the name service definition.
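
As an illustration, the corresponding name service definition entries might look as follows (a minimal sketch; the interval value is illustrative only):

  set name_audit_conf = 1
  set name_audit_interval = 60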

The behavior of the node monitoring facility depends on the value specified in the name_audit_conf operand. While the facility is enabled, the action taken depends on the status of each monitored node, as summarized in Table 3-4.

The namalivechk command is another way of checking whether nodes are active (a command sketch follows Table 3-4). Table 3-4 describes the differences between node monitoring using the node monitoring facility and using the namalivechk command.

Table 3-4 Comparison of node monitoring using the node monitoring facility and the namalivechk command

Monitored nodes
  Node monitoring facility:
    • All nodes specified in the all_node operand of the system common definition (whether active or not)
    • All nodes specified in the all_node_ex operand of the system common definition (whether active or not)
  namalivechk command:
    • Nodes specified in the all_node operand of the system common definition on which OpenTP1 has not been detected as inactive
    • All nodes specified in the all_node_ex operand of the system common definition (whether active or not)

Operation when an inactive node is detected
  Node monitoring facility:
    • If the node is specified in the all_node operand, information about the node is entered in the RPC suppression list unless it has already been entered; if it has, no action is taken.
    • The connection with the inactive node is closed.
    • Cached service information about the inactive node is deleted.
  namalivechk command:
    • Information about any node specified in the all_node operand that is found to be inactive is entered in the RPC suppression list.
    • The connection with the inactive node is closed.
    • Cached service information about the inactive node is deleted.

Operation when an active node is detected
  Node monitoring facility: If the node is specified in the all_node operand and has been entered in the RPC suppression list, information about the node is deleted from the list.
  namalivechk command: No action
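
For an on-demand check, a minimal sketch of the invocation (shown without options; the output layout is omitted here):

  namalivechk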

Note
  • The node monitoring facility cannot be used when multiple instances of OpenTP1 are running on a monitored host, or when multiple instances of OpenTP1 run with the same IP address after a system switchover (an environment with only one LAN board).
  • To tune the sensitivity with which a node-down condition is detected by the node monitoring facility, change the following operands (a tuning sketch follows this note):
    If 1 is specified in the name_audit_conf operand:
    Change the ipc_conn_interval operand in the system common definition.
    If 2 is specified in the name_audit_conf operand:
    Change the name_audit_watch_time operand in the name service definition.
  • A maximum of 60 nodes can be monitored concurrently by the node monitoring facility. If more than 60 nodes are specified in the all_node and all_node_ex operands in the system common definition, the nodes are monitored in groups of 60 or fewer, one group at a time.
  • If a large number of nodes are specified in the all_node and all_node_ex operands in the system common definition, use of the node monitoring facility may affect the RPCs issued by UAPs. In this case, avoid specifying too small a value in the name_audit_interval operand, and avoid executing the namalivechk command again soon after the previous execution.
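
As an illustration, the tuning entries might look as follows (a minimal sketch; the values are illustrative only and should be chosen to suit your environment):

  In the system common definition, when 1 is specified in the name_audit_conf operand:
    set ipc_conn_interval = 30

  In the name service definition, when 2 is specified in the name_audit_conf operand:
    set name_audit_watch_time = 30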

(3) Facility for monitoring nodes registered in the RPC suppression list

The name service can check at 180-second intervals whether nodes registered in the RPC suppression list are active again. This facility is separate from the node monitoring facility. Specify whether to use this facility in the name_rpc_control_list operand of the name service definition.
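
For example, the name service definition entry might look as follows (a minimal sketch; the value Y is an assumption here, so confirm the accepted values for this operand):

  set name_rpc_control_list = Y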

Decide whether to use the facility for monitoring nodes registered in the RPC suppression list according to the setting for the node monitoring facility. For example, monitoring of nodes registered in the RPC suppression list should be disabled if either of the following occurs:

If the facility for monitoring nodes registered in the RPC suppression list is disabled and a value of 180 seconds or longer is specified in the name_audit_interval operand, it takes longer than usual for a recovered node to be deleted from the RPC suppression list.

We recommend that you set the node monitoring facility and the facility for monitoring nodes registered in the RPC suppression list as follows:

(4) Node information display

By executing the namsvinf command, you can view the IP address, activity status, and name-service port number of OpenTP1 nodes. This information can be displayed for the OpenTP1 nodes specified in the all_node and all_node_ex operands in the system common definition.
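
A minimal sketch of the invocation (shown without options; the output layout is omitted here):

  namsvinf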