Hitachi

For Linux(R) (x86) Systems HA Monitor Cluster Software


4.1.5 Processing flow for order control of server switchover

If you organize multiple servers in a group, you can control the order in which the servers in the group are switched over. This subsection explains planned hot standby, the processing flow for controlling the order in which servers are switched over in the event of a server failure and in the event of a host failure, and the HA Monitor processing that occurs if the hot standby operation fails.


(1) Planned hot standby and the processing flow for controlling the switchover order after a server failure in the monitor mode

The figure below shows the order in which the servers are terminated during planned hot standby and the order in which the servers are started on the target host. For the parent-child relationships among the servers shown in this figure, see (1) Correspondence between server start order and parent-child relationships.

Figure 4‒6: Processing flow for controlling the order in which servers are switched over during a planned hot standby operation

[Figure]

The following are details of the processing flow shown in the figure. The numbers correspond to the numbers in the figure.

  1. Active servers 1 to 4 have started in the active system and standby servers 1 to 4 have started in the standby system. All of them are set up for grouped-system switchover.

  2. The server hot-standby switchover command (monswap command) is executed to start planned hot-standby switchover.

  3. HA Monitor starts termination processing on servers 3 and 4, which are at the bottom of the parent-child relationship chain. Servers 1 and 2 are placed in the termination wait state.

    If you use the server and host status display command (monshow command), ONL is displayed as the status of servers 1 and 2. ONL means that active server startup processing has been completed, but the servers themselves are actually in termination wait state. In other words, the status displayed by the command is not the same as the actual server status.

  4. HA Monitor performs termination processing on servers 3 and 4 in parallel. During this termination processing, HA Monitor first terminates the servers, and then disconnects the shared resources. When the shared resources have been disconnected, HA Monitor notifies the standby system so that it can prepare for hot standby. Next, servers 3 and 4 are placed in the wait state until the required server has started, because they must wait for completion of the hot standby operation for server 2, which is their parent server.

    If you use the server and host status display command (monshow command), ONL is displayed as the status of servers 3 and 4. ONL means that the active servers are running, but the servers themselves are actually in the wait state until the required server has started. In other words, the status displayed by the command is not the same as the actual server status.

  5. When termination processing is completed on both servers 3 and 4, HA Monitor starts termination processing on server 2, which is their parent server.

  6. When termination processing is completed on server 2, HA Monitor starts termination processing on server 1, which is server 2's parent server.

  7. When termination processing is completed on the highest-level server (server 1), HA Monitor notifies the standby system and then starts the hot standby operation first on server 1.

  8. When server 1 has started as the active server pursuant to the hot standby operation, HA Monitor starts the hot standby operation on its child server (server 2).

  9. When the hot standby operation has been completed on server 2, HA Monitor starts the hot standby operation on its child servers (servers 3 and 4) in parallel. When the hot standby operation has been completed on servers 3 and 4, grouped-system switchover is complete.
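The ordering in the steps above can be sketched as two passes over the parent-child relationships: termination proceeds from the lowest-level servers upward, and the hot standby operation proceeds from the highest-level server downward. The following Python sketch is illustrative only and is not part of HA Monitor; the server names and the `PARENT` map are assumptions chosen to mirror the figure.

```python
# Hypothetical parent-child map mirroring the figure:
# server1 is the parent of server2, which is the parent of server3 and server4.
PARENT = {
    "server1": None,
    "server2": "server1",
    "server3": "server2",
    "server4": "server2",
}


def termination_waves(parent):
    """Return groups of servers in termination order (children before parents)."""
    remaining = set(parent)
    waves = []
    while remaining:
        # A server can be terminated once none of its children remain.
        has_children = {parent[s] for s in remaining if parent[s] is not None}
        wave = sorted(remaining - has_children)
        waves.append(wave)
        remaining -= set(wave)
    return waves


def start_waves(parent):
    """Return groups of servers in hot standby order (parents before children)."""
    started = set()
    waves = []
    while len(started) < len(parent):
        # A server can start once its parent has started (or it has no parent).
        wave = sorted(
            s for s in parent
            if s not in started and (parent[s] is None or parent[s] in started)
        )
        waves.append(wave)
        started |= set(wave)
    return waves


print(termination_waves(PARENT))
print(start_waves(PARENT))
```

Servers that appear in the same group are the ones the steps describe as being processed in parallel. The sketch assumes the relationships form a tree without cycles.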

(2) Processing flow for controlling switchover after a server failure in the server mode

This subsection explains the order in which the servers are terminated after a server failure and the order in which the servers are started on the target host for the following two cases:

  • A failure on a server that has both parent and child servers

  • A failure on the lowest-level server in a chain of servers with parent-child relationships

For the parent-child relationships among the servers shown in the figures, see (1) Correspondence between server start order and parent-child relationships.

(a) For a failure on a server that has both parent and child servers

The following figure shows the processing flow for controlling the switchover order after a failure on a server that has both parent and child servers.

Figure 4‒7: Processing flow for controlling the switchover order after a server failure (when the failure has occurred on a server that has both parent and child servers)

[Figure]

The following explains the processing flow shown in the figure, where the item numbers correspond to the numbered sections in the figure:

  1. Active servers 1 to 4 have started in the active system and standby servers 1 to 4 have started in the standby system. All of them are set up for grouped-system switchover.

  2. If a failure occurs on active server 2, grouped-system switchover starts.

  3. Because the failure occurred on active server 2, HA Monitor starts termination processing on servers 3 and 4, which are at the bottom of the parent-child relationship chain. Server 1 is placed in the termination wait state.

    If you use the server and host status display command (monshow command), ONL is displayed as the status of server 1. ONL means that active server startup processing has been completed, but the server itself is actually in termination wait state. In other words, the status displayed by the command is not the same as the actual server status.

  4. HA Monitor performs termination processing on servers 2 through 4. During this termination processing, HA Monitor first terminates the servers, and then disconnects the shared resources. When the shared resources have been disconnected, HA Monitor notifies the standby system so that it can prepare for hot standby. Next, servers 2 through 4 are placed in the wait state until the required server has started.

    If you use the server and host status display command (monshow command), ONL is displayed as the status of servers 2 through 4. ONL means that the active servers are running, but the servers themselves are actually in the wait state until the required server has started. In other words, the status displayed by the command is not the same as the actual server status.

  5. When termination processing is completed on servers 2 through 4, HA Monitor starts termination processing on server 1, which is the highest-level server.

  6. When termination processing is completed on server 1, HA Monitor notifies the standby system, and then starts the hot standby operation first on server 1.

  7. When the hot standby operation is completed on server 1, HA Monitor starts the hot standby operation on its child server (server 2).

  8. When the hot standby operation is completed on server 2, HA Monitor starts the hot standby operation on its child servers (servers 3 and 4) in parallel. When the hot standby operation is completed on servers 3 and 4, grouped-system switchover is complete.
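One way to read steps 3 through 5 is that the failed server is handled together with the first group of servers to be terminated, and the remaining servers follow in the usual children-before-parents order. The Python sketch below encodes that reading. It is an interpretation of the steps, not HA Monitor's implementation; the server names, the `PARENT` map, and the function names are assumptions, and failure positions other than the one in this figure are not modeled.

```python
# Hypothetical parent-child map mirroring the figure.
PARENT = {
    "server1": None,
    "server2": "server1",
    "server3": "server2",
    "server4": "server2",
}


def leaf_first_waves(parent):
    """Group servers so that children always come before their parents."""
    remaining = set(parent)
    waves = []
    while remaining:
        has_children = {parent[s] for s in remaining if parent[s] is not None}
        wave = remaining - has_children
        waves.append(wave)
        remaining -= wave
    return waves


def termination_waves_after_failure(failed, parent):
    """Assumed model: the failed server joins the first termination group."""
    waves = leaf_first_waves(parent)
    for wave in waves:
        wave.discard(failed)
    waves[0].add(failed)
    # Drop any group that became empty after moving the failed server.
    return [sorted(wave) for wave in waves if wave]


print(termination_waves_after_failure("server2", PARENT))
```

For a failure on server 2, the sketch yields servers 2 through 4 as the first group and server 1 as the second, which matches steps 4 and 5.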

(b) For a failure on the lowest-level server in a chain of servers with parent-child relationships

The following figure shows the processing flow for controlling the switchover order after a failure on the lowest-level server in a chain of servers with parent-child relationships.

Figure 4‒8: Processing flow for controlling the switchover order after a server failure (when the failure has occurred on the lowest-level server in a chain of servers with parent-child relationships)

[Figure]

The following explains the processing flow shown in the figure, where the item numbers correspond to the numbered sections in the figure:

  1. Active servers 1 to 4 have started in the active system and standby servers 1 to 4 have started in the standby system. All of them are set up for grouped-system switchover.

  2. If a failure occurs on active server 3, grouped-system switchover starts.

  3. HA Monitor starts termination processing on servers 3 and 4, which are at the bottom of the parent-child relationship chain. Servers 1 and 2 are placed in the termination wait state.

    If you use the server and host status display command (monshow command), ONL is displayed as the status of servers 1 and 2. ONL means that active server startup processing is completed, but the servers themselves are actually in termination wait state. In other words, the status displayed by the command is not the same as the actual server status.

  4. HA Monitor performs termination processing on servers 3 and 4 in parallel. During this termination processing, HA Monitor first terminates the servers, and then disconnects the shared resources. When the shared resources have been disconnected, HA Monitor notifies the standby system so that it can prepare for hot standby. Next, servers 3 and 4 are placed in the wait state until the required server has started, because they must wait for completion of the hot standby operation for server 2, which is their parent server.

    If you use the server and host status display command (monshow command), ONL is displayed as the status of servers 3 and 4. ONL means that the active servers are running, but the servers themselves are actually in the wait state until the required server has started. In other words, the status displayed by the command is not the same as the actual server status.

  5. When termination processing is completed on both servers 3 and 4, HA Monitor starts termination processing on server 2, which is their parent server.

  6. When termination processing is completed on server 2, HA Monitor starts termination processing on server 1, which is its parent server.

  7. When termination processing is completed on the highest-level server (server 1), HA Monitor notifies the standby system, and then starts the hot standby operation first on server 1.

  8. When the hot standby operation has been completed on server 1, HA Monitor starts the hot standby operation on its child server (server 2).

  9. When the hot standby operation has been completed on server 2, HA Monitor starts the hot standby operation on its child servers (servers 3 and 4) in parallel. When the hot standby operation has been completed on servers 3 and 4, grouped-system switchover is complete.
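The notes in steps 3 and 4 mean that several distinct actual states are all displayed as ONL during order-controlled switchover. The Python sketch below makes that mapping explicit. It covers only the states mentioned in this flow; the state labels and function names are assumptions for illustration, and the sketch does not parse or reproduce real monshow output.

```python
# Actual states described in steps 3 and 4 (labels are illustrative).
RUNNING = "running as the active server"
TERMINATION_WAIT = "termination wait state"
WAIT_FOR_REQUIRED_SERVER = "wait state until the required server has started"


def displayed_status(actual_state):
    """Status displayed for the states covered in this flow."""
    if actual_state in (RUNNING, TERMINATION_WAIT, WAIT_FOR_REQUIRED_SERVER):
        return "ONL"
    raise ValueError(f"state not covered by this sketch: {actual_state}")


def displayed_but_not_running(snapshot):
    """Servers displayed as ONL whose actual state is not 'running'."""
    return sorted(
        server for server, state in snapshot.items()
        if displayed_status(state) == "ONL" and state != RUNNING
    )


# Hypothetical snapshot at the end of step 4.
step4 = {
    "server1": TERMINATION_WAIT,
    "server2": TERMINATION_WAIT,
    "server3": WAIT_FOR_REQUIRED_SERVER,
    "server4": WAIT_FOR_REQUIRED_SERVER,
}
print(displayed_but_not_running(step4))
```

At the end of step 4, all four servers are displayed as ONL even though none of them is running as an active server.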

(3) Processing flow for controlling the switchover order after a host failure

The figure below shows the order in which the servers are terminated after a host failure and the order in which the servers are started on the target host. For the parent-child relationships among the servers shown in this figure, see (1) Correspondence between server start order and parent-child relationships.

Figure 4‒9: Processing flow for controlling the switchover order after a host failure

[Figure]

The following explains the processing flow shown in the figure, where the item numbers correspond to the numbered sections in the figure:

  1. Active servers 1 to 4 have started in the active system and standby servers 1 to 4 have started in the standby system. All of them are set up for grouped-system switchover.

  2. If a host failure occurs in the active system, HA Monitor resets the active system and starts the hot standby operation.

  3. HA Monitor starts the hot standby operation on server 1, the highest-level server in the parent-child relationships.

  4. When the hot standby operation has been completed on server 1, HA Monitor starts the hot standby operation on its child server (server 2).

  5. When the hot standby operation has been completed on server 2, HA Monitor starts the hot standby operation on its child servers (servers 3 and 4) in parallel. When the hot standby operation has been completed on servers 3 and 4, grouped-system switchover is complete.
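Because the active system is reset after a host failure, the flow above has no termination phase; only the start order on the target host matters. The following Python sketch walks the relationships level by level from the highest-level server. The `CHILDREN` map, the server names, and the function name are assumptions for illustration and are not taken from HA Monitor.

```python
# Hypothetical child lists mirroring the figure.
CHILDREN = {
    "server1": ["server2"],
    "server2": ["server3", "server4"],
    "server3": [],
    "server4": [],
}


def hot_standby_levels(root, children):
    """Return servers grouped by level, starting from the highest-level server."""
    levels = []
    current = [root]
    while current:
        levels.append(current)
        # The next level consists of the children of every server just started.
        current = [child for server in current for child in children[server]]
    return levels


for step, servers in enumerate(hot_standby_levels("server1", CHILDREN), start=3):
    print(f"step {step}: hot standby on {', '.join(servers)}")
```

Each level corresponds to one of steps 3 through 5, and servers within a level are the ones started in parallel.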

(4) Processing when the hot standby operation fails

If the hot standby operation fails on a parent server, a hot standby error occurs on all of its child servers, and the standby servers corresponding to those child servers are forcibly terminated. If there is no parent server at the switchover destination, HA Monitor resumes the hot standby operation.

If the start retry function is used and the parent server's hot standby processing fails, the parent server is placed in the start retry state, and the child server is placed in the wait state until the required server is started. If the parent server is placed in the start retry state because of a failure that occurs after hot standby processing has been completed, the child server does not retry startup processing.
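The effect of a parent server's hot standby failure on its child servers can be sketched as follows. This Python sketch is illustrative only: the state labels, the `CHILDREN` map, and the function name are assumptions, and it assumes that "child servers" includes every server below the failed parent in the chain.

```python
# Hypothetical child lists for a four-server group.
CHILDREN = {
    "server1": ["server2"],
    "server2": ["server3", "server4"],
    "server3": [],
    "server4": [],
}


def states_after_parent_failure(failed, children, start_retry):
    """Return the resulting state of the failed parent and the servers below it."""
    states = {failed: "start retry" if start_retry else "hot standby failed"}
    pending = list(children[failed])
    while pending:
        server = pending.pop()
        if start_retry:
            # With the start retry function, children wait for the parent.
            states[server] = "waiting for required server"
        else:
            # Without it, the child's standby server is forcibly terminated.
            states[server] = "hot standby error, standby server forcibly terminated"
        pending.extend(children[server])
    return states


print(states_after_parent_failure("server2", CHILDREN, start_retry=False))
print(states_after_parent_failure("server2", CHILDREN, start_retry=True))
```

Servers above the failed parent (server 1 in this example) do not appear in the result because the text describes effects only on the failed parent and its child servers.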

(5) About servers on the source host

(6) Notes on performing order control for TP1/Server Base and HiRDB

If all of the following conditions are met, specify the settings so that TP1/Server Base and HiRDB do not automatically restart when they terminate abnormally:

Note

Be careful if switch is specified for the switchtype operand in the server environment definition of a server that belongs to a server group for which order control is enabled. In this case, to prevent TP1/Server Base or HiRDB from being automatically restarted when it terminates abnormally, specify MANUAL2 for the following operand:

  • For TP1/Server Base: mode_conf operand in the system environment definition

  • For HiRDB: pd_mode_conf operand in the system common definition

For details about these operands, see the manual OpenTP1 System Definition or the manual HiRDB System Definition.