Nonstop Database, HiRDB Version 9 System Operation Guide
If an error occurs in HiRDB while it is performing business processing, the system switches over to another unit and an active back-end server takes over the business processing. This is called the standby-less system switchover facility. Unlike with the standby system switchover facility, you do not need to prepare a standby system unit for the standby-less system switchover facility.
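The standby-less system switchover facility consists of the following two facilities: the standby-less system switchover (1:1) facility and the standby-less system switchover (effects distributed) facility.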
The standby-less system switchover facility can be applied only to back-end server units in a HiRDB parallel server configuration. It cannot be applied to a unit that contains any server other than back-end servers.
The standby-less system switchover (1:1) facility switches from a unit in which an error has occurred to a pre-designated back-end server unit, which assumes the processing. That is, there is a one-to-one relationship between the original unit and the unit to which processing is switched in the event of an error.
A back-end server whose processing is taken over when an error occurs is called a normal BES, and a back-end server that takes over that processing is called an alternate BES. The unit containing the normal BESs is called the normal BES unit, and the unit containing the alternate BESs is called the alternate BES unit. The following figure provides an overview of the standby-less system switchover (1:1) facility.
Figure 26-3 Overview of the standby-less system switchover (1:1) facility
To use the standby-less system switchover (1:1) facility, all of the following conditions must be satisfied:
The standby-less system switchover (1:1) facility provides the following advantages over the standby system switchover facility:
The following table lists the resources that are needed when a standby system unit is standing by, and after a system switchover is performed.
Table 26-5 Resources needed when a standby system unit is standing by and after a system switchover is performed
Item | HiRDB system server processes | HiRDB server processes | Shared memory for unit controller | Shared memory for lock pool | Shared memory for global buffers |
---|---|---|---|---|---|
Standby-less system switchover (1:1) facility | Yes#1 | --#2, #3 | Yes#4 | Yes | --#5 |
Standby-less system switchover (effects distributed) facility | #6, #7 | --#3, #8 | Yes#9 | Yes | --#10 |
Standby system switchover facility: user server hot standby | No | Yes | No | No | No |
Standby system switchover facility: rapid system switchover facility | Yes | Yes | Yes | Yes | Yes |
Standby system switchover facility: other | No | No | No | No | No |
For details about a back-end server's resource usage status when the standby-less system switchover (effects distributed) facility is used, see 26.1.3(2) Standby-less system switchover (effects distributed) facility.
The following explains the rules for defining normal BES units and alternate BES units:
Figure 26-4 Examples of valid configurations of a normal BES unit and an alternate BES unit shows examples of valid configurations of a normal BES unit and an alternate BES unit. Figure 26-5 Examples of invalid configurations of a normal BES unit and an alternate BES unit shows examples of invalid configurations.
Figure 26-4 Examples of valid configurations of a normal BES unit and an alternate BES unit
An alternate BES is defined with the -c option in the pdstart operand. The following examples show the pdstart operand specification for Examples 1 and 2 in Figure 26-4 Examples of valid configurations of a normal BES unit and an alternate BES unit.
Example 1
pdstart -t BES -s bes11 -u UNT1 -c bes21
pdstart -t BES -s bes21 -u UNT2
Example 2
pdstart -t BES -s bes11 -u UNT1 -c bes21
pdstart -t BES -s bes12 -u UNT1 -c bes22
pdstart -t BES -s bes21 -u UNT2
pdstart -t BES -s bes22 -u UNT2
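To make the -c relationship concrete, the following Python sketch reads pdstart lines in the form shown in Examples 1 and 2 and derives the unit to which each normal BES unit switches. It is an illustration only, not a HiRDB tool. It assumes that every option is followed by exactly one value, and its checks (one normal BES per alternate BES, one alternate BES unit per normal BES unit, and no alternate BES unit shared by two normal BES units) are inferred from the one-to-one description of this facility rather than from the product's full set of definition rules.

```python
import shlex
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class BesDef:
    name: str                 # -s: back-end server name
    unit: str                 # -u: unit identifier
    alternate: Optional[str]  # -c: alternate BES name (None when no -c is given)


def parse_pdstart(lines: List[str]) -> Dict[str, BesDef]:
    """Collect BES definitions from 'pdstart -t BES -s ... -u ... [-c ...]' lines.

    Assumes every option is followed by exactly one value, as in Examples 1 and 2.
    """
    defs: Dict[str, BesDef] = {}
    for line in lines:
        tokens = shlex.split(line)
        if not tokens or tokens[0] != "pdstart":
            continue
        opts = dict(zip(tokens[1::2], tokens[2::2]))  # {"-t": "BES", "-s": ..., ...}
        if opts.get("-t") == "BES":
            defs[opts["-s"]] = BesDef(opts["-s"], opts["-u"], opts.get("-c"))
    return defs


def alternate_unit_map(defs: Dict[str, BesDef]) -> Dict[str, str]:
    """Return {normal BES unit: alternate BES unit}, enforcing a one-to-one pairing."""
    unit_map: Dict[str, str] = {}
    used_alternates = set()
    for bes in defs.values():
        if bes.alternate is None:
            continue
        if bes.alternate not in defs:
            raise ValueError(f"alternate BES {bes.alternate} is not defined")
        alt = defs[bes.alternate]
        if alt.name in used_alternates:
            raise ValueError(f"{alt.name} is the alternate BES for more than one normal BES")
        used_alternates.add(alt.name)
        paired = unit_map.setdefault(bes.unit, alt.unit)
        if paired != alt.unit:
            raise ValueError(f"{bes.unit} would switch to both {paired} and {alt.unit}")
    if len(set(unit_map.values())) != len(unit_map):
        raise ValueError("two normal BES units share one alternate BES unit")
    return unit_map


example_2 = [
    "pdstart -t BES -s bes11 -u UNT1 -c bes21",
    "pdstart -t BES -s bes12 -u UNT1 -c bes22",
    "pdstart -t BES -s bes21 -u UNT2",
    "pdstart -t BES -s bes22 -u UNT2",
]
print(alternate_unit_map(parse_pdstart(example_2)))  # {'UNT1': 'UNT2'}
```

For Example 2, both normal BESs in UNT1 pair with alternate BESs in UNT2, so the sketch reports a single UNT1 to UNT2 pairing.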
Figure 26-5 Examples of invalid configurations of a normal BES unit and an alternate BES unit
When an error occurs, the standby-less system switchover (effects distributed) facility distributes the processing requests intended for the back-end servers in the failed unit among multiple running system units that can execute them. Because this facility requires no standby server machines or standby system units, it uses system resources more efficiently. After an error occurs, the processing workload increases at each unit that takes over server processing for the failed unit, so transaction-processing performance might degrade. However, because the processing requests intended for the servers in the failed unit are shared among multiple units, the additional load on each unit is kept low and the degradation of system performance is minimized.
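The following Python arithmetic sketch illustrates why spreading the work keeps the extra load per unit low. It assumes, hypothetically, that every unit runs the same number of BESs with equal workloads and that the failed unit's BESs are spread as evenly as possible. Actual placement follows the HA Monitor definition, and the "all to one unit" case is only a simplified point of comparison.

```python
import math


def max_extra_bes_per_unit(failed_bes_count: int, accepting_units: int) -> int:
    """Most guest BESs any single accepting unit must take on when the failed
    unit's BESs are spread as evenly as possible across the accepting units."""
    if accepting_units < 1:
        raise ValueError("at least one accepting unit is required")
    return math.ceil(failed_bes_count / accepting_units)


bes_per_unit = 4  # hypothetical: every unit runs 4 BESs with equal workloads
for label, accepting in [("all to one unit", 1), ("spread over four units", 4)]:
    extra = max_extra_bes_per_unit(bes_per_unit, accepting)
    print(f"{label}: busiest unit goes from {bes_per_unit} to {bes_per_unit + extra} BESs")
```

Under these assumptions, the busiest unit's load rises by 100% when one unit absorbs everything, but by only 25% when four units share it.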
The standby-less system switchover (effects distributed) facility switches over back-end servers individually, so the switchover destinations can be distributed among multiple units. Moreover, if an error occurs in a unit to which processing has already been switched, processing can be switched again to another running system unit and continue there. This is called a multi-step system switchover. Multi-step system switchovers cannot be performed in a system that uses the standby-less system switchover (1:1) facility. Therefore, with the standby-less system switchover (1:1) facility, if an error occurs in the unit to which processing was switched, the processing of the originally failed unit cannot be taken over and continued elsewhere.
The standby-less system switchover (effects distributed) facility is appropriate for a system whose resources must always be used efficiently, and in which performance degradation in the event of an error must be minimized.
In the standby-less system switchover (effects distributed) facility, a back-end server defined in the original unit is called a host BES, and a back-end server that is accepted by another unit is called a guest BES. The unit where the host BESs are defined is called the regular unit, and the unit where a guest BES is located is called the accepting unit. All accepting units must be defined as an HA group. The back-end server resources that correspond to a guest BES constitute a guest area.
The following figure provides an overview of the standby-less system switchover (effects distributed) facility (distributed workload transfer and multi-step system switchover).
Figure 26-6 Overview of the standby-less system switchover (effects distributed) facility (distributed workload transfer and multi-step system switchover)
To use the standby-less system switchover (effects distributed) facility, all of the following conditions must be satisfied:
The following table lists and describes the usage status of back-end server resources when the standby-less system switchover (effects distributed) facility is applied.
Table 26-6 Usage status of back-end server resources when the standby-less system switchover (effects distributed) facility is applied
Back-end server type | Back-end server status | Resource usage status |
---|---|---|
Host BES | Accepting status | An area of the size required by the back-end server's definition is created. |
Host BES | Running | An area of the size required by the back-end server's definition is used. |
Guest BES | Accepting status | For each resource, a guest area of the largest resource size is created in the guest server. |
Guest BES | Running | Within the prepared guest area, an area that matches the size required by the back-end server's definition is used. |
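The following minimal Python sketch illustrates the guest-area sizing rule in Table 26-6. Each resource in a guest area is sized to the largest requirement among the guest BESs that the unit can accept, and a running guest BES then uses only what its own definition requires. The resource names (global_buffers, lock_pool) and their sizes are hypothetical placeholders, not HiRDB operands.

```python
from typing import Dict, List

# Resource name -> size its BES definition requires (names and units are hypothetical).
Resources = Dict[str, int]


def guest_area_size(candidate_guests: List[Resources]) -> Resources:
    """Size one guest area: for each resource, take the largest requirement
    among the guest BESs that this unit may accept."""
    area: Resources = {}
    for bes in candidate_guests:
        for resource, size in bes.items():
            area[resource] = max(area.get(resource, 0), size)
    return area


def unused_when_running(area: Resources, guest: Resources) -> Resources:
    """A running guest BES uses only what its own definition needs; return the rest."""
    missing = [r for r in guest if guest[r] > area.get(r, 0)]
    if missing:
        raise ValueError(f"guest area too small for: {missing}")
    return {r: area[r] - guest.get(r, 0) for r in area}


candidates = [
    {"global_buffers": 300, "lock_pool": 40},
    {"global_buffers": 500, "lock_pool": 20},
]
area = guest_area_size(candidates)
print(area)                                      # {'global_buffers': 500, 'lock_pool': 40}
print(unused_when_running(area, candidates[0]))  # {'global_buffers': 200, 'lock_pool': 0}
```

Because each resource is sized independently, the guest area can be larger than any single candidate's definition, as with the two candidates above.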
When the standby-less system switchover (effects distributed) facility is used and an error occurs in a regular unit, that unit's host BESs are moved automatically to various accepting units, where they continue processing as guest BESs. If a back-end server in the failed unit is itself a guest BES, it is also moved automatically to an accepting unit, where it continues processing as a guest BES. As with the standby system switchover facility, no intervention by the HiRDB administrator is required.
The following table lists the various types of errors that can occur, and whether a system switchover occurs when the standby-less system switchover (effects distributed) facility is used.
Table 26-7 System switchover depending on the cause of the error when the standby-less system switchover (effects distributed) facility is used
Cause of error | Unit starting or stopping | Unit running, server starting or stopping | Unit running, server running |
---|---|---|---|
Slow-down detected | Not applicable | Unit terminates abnormally. System switchover occurs. | Unit terminates abnormally. System switchover occurs. |
System log full | Not applicable | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover does not occur. |
Database path error | Not applicable | Unit terminates abnormally. System switchover occurs (only the first time). | Unit terminates abnormally. System switchover occurs (only the first time). |
Back-end server terminated forcibly | Back-end server terminates abnormally. System switchover does not occur. | Back-end server terminates abnormally. System switchover does not occur. | Back-end server terminates abnormally. System switchover does not occur. |
System terminated forcibly | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover does not occur. |
System failure | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover occurs. |
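The following Python sketch transcribes Table 26-7 into a lookup function. It reads the three status columns as follows: the unit is starting or stopping; the unit is running while the server is starting or stopping; and the unit and server are both running. The name path_error_switched_before is a hypothetical flag that expresses the "only the first time" condition for database path errors.

```python
from enum import IntEnum
from typing import Dict, Optional, Tuple


class Phase(IntEnum):
    UNIT_STARTING_OR_STOPPING = 0
    SERVER_STARTING_OR_STOPPING = 1  # unit running
    RUNNING = 2                      # unit and server both running


UNIT, BES = "unit", "back-end server"
# Each cell: (what terminates abnormally, switchover) or None for "Not applicable".
Cell = Optional[Tuple[str, str]]
TABLE_26_7: Dict[str, Tuple[Cell, ...]] = {
    "slow-down detected": (None, (UNIT, "yes"), (UNIT, "yes")),
    "system log full": (None, (UNIT, "no"), (UNIT, "no")),
    "database path error": (None, (UNIT, "first time only"), (UNIT, "first time only")),
    "back-end server terminated forcibly": ((BES, "no"),) * 3,
    "system terminated forcibly": ((UNIT, "no"),) * 3,
    "system failure": ((UNIT, "no"), (UNIT, "no"), (UNIT, "yes")),
}


def switchover_occurs(cause: str, phase: Phase, path_error_switched_before: bool = False) -> bool:
    """Look up whether the given cause triggers a system switchover in the given phase."""
    cell = TABLE_26_7[cause][phase]
    if cell is None:
        raise ValueError(f"{cause!r} is not applicable while {phase.name.lower()}")
    _, switchover = cell
    if switchover == "first time only":
        return not path_error_switched_before
    return switchover == "yes"


print(switchover_occurs("system failure", Phase.RUNNING))                      # True
print(switchover_occurs("system failure", Phase.SERVER_STARTING_OR_STOPPING))  # False
print(switchover_occurs("database path error", Phase.RUNNING, path_error_switched_before=True))  # False
```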
In the event of a system switchover, the host BESs and any guest BESs that are running in the unit are switched over to other units. Each back-end server might be switched to a different destination.
The standby-less system switchover (effects distributed) facility also switches systems automatically when multiple errors occur. If an error occurs in an accepting unit after an error has already occurred in a regular unit, the host BESs and the guest BESs running in the failed accepting unit move to the remaining running system units and continue processing as guest BESs. No intervention by the HiRDB administrator is required in this case either. The destination of each back-end server is determined by the HA Monitor definition (or by the cluster software definition when Hitachi HA Toolkit Extension is used).
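The following Python sketch simulates distributed and multi-step switchovers. It is a simplified model under stated assumptions, not HA Monitor's algorithm. Each BES has a hypothetical ordered list of candidate destination units that stands in for the HA Monitor definition, each unit can accept a fixed number of guest BESs, and a BES that finds no running candidate with a free guest area is simply left unplaced.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set


@dataclass
class HaGroup:
    home: Dict[str, str]              # BES -> regular unit in which it is a host BES
    location: Dict[str, str]          # BES -> unit currently running it
    candidates: Dict[str, List[str]]  # BES -> ordered destinations (hypothetical stand-in)
    guest_areas: Dict[str, int]       # unit -> number of guest BESs it can accept
    down: Set[str] = field(default_factory=set)

    def guests_on(self, unit: str) -> List[str]:
        """BESs currently running in `unit` whose regular unit is elsewhere."""
        return [b for b, u in self.location.items() if u == unit and self.home[b] != unit]

    def fail_unit(self, unit: str) -> Dict[str, Optional[str]]:
        """Mark `unit` as failed and move each host or guest BES running there to the
        first running candidate with a free guest area (None if no such unit exists)."""
        self.down.add(unit)
        moved: Dict[str, Optional[str]] = {}
        for bes in sorted(b for b, u in self.location.items() if u == unit):
            dest = next(
                (c for c in self.candidates[bes]
                 if c not in self.down and len(self.guests_on(c)) < self.guest_areas[c]),
                None,
            )
            moved[bes] = dest
            if dest is not None:
                self.location[bes] = dest
        return moved


hosts = {"bes11": "UNT1", "bes12": "UNT1", "bes21": "UNT2", "bes31": "UNT3"}
group = HaGroup(
    home=hosts,
    location=dict(hosts),
    candidates={
        "bes11": ["UNT2", "UNT3"],
        "bes12": ["UNT3", "UNT2"],
        "bes21": ["UNT3", "UNT1"],
        "bes31": ["UNT2", "UNT1"],
    },
    guest_areas={"UNT1": 2, "UNT2": 2, "UNT3": 2},
)
first = group.fail_unit("UNT1")
print(first)   # {'bes11': 'UNT2', 'bes12': 'UNT3'}
second = group.fail_unit("UNT2")
print(second)  # {'bes11': 'UNT3', 'bes21': None}
```

In this run, bes11 is switched twice (UNT1 to UNT2, then UNT2 to UNT3), which is a multi-step system switchover, while bes21 cannot be placed because UNT3 has no free guest area left and UNT1 is already down.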
When a unit runs out of free guest area, the standby-less system switchover (effects distributed) facility cancels the accepting status of all guest BESs that are not running. The acceptability of a guest area is not affected by the operation of the host BESs. Table 26-8 shows how acceptability is canceled and reset automatically depending on the free space in the guest area.
When acceptability is reset automatically, all back-end servers that are running as active systems in other units within the HA group enter accepting status. This includes back-end servers whose acceptability was intentionally stopped with a command (monsbystp or pdstop -q -s back-end-server-name). However, if the number of BESs that can be accepted within the HA group is exceeded, resulting in reduced-mode operation, any stopped server is not returned to accepting status.
Table 26-8 Automatic cancellation and resetting of acceptability depending on the free space in the guest area
Unused guest area in the unit | Acceptability of guest BESs active in other units | Acceptability of guest BESs inactive in other units |
---|---|---|
Disappeared | Canceled automatically | No change (cancellation underway) |
Generated | Reset automatically | No change |
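The following minimal Python sketch applies Table 26-8 to a single unit. Each guest BES that the unit can accept is tracked as a pair: whether it is running as an active system in another unit, and its current acceptability. Resetting returns active BESs to accepting status even if their acceptability was stopped by command, as described above. The reduced-mode exception is not modeled, and the BES names are hypothetical.

```python
from enum import Enum
from typing import Dict, Tuple


class Acceptability(Enum):
    ACCEPTING = "accepting"
    CANCELED = "canceled"


# Guest BES name -> (running as an active system in another unit?, current acceptability)
GuestTable = Dict[str, Tuple[bool, Acceptability]]


def apply_table_26_8(free_before: int, free_after: int, guests: GuestTable) -> GuestTable:
    """Return updated acceptability after this unit's count of unused guest areas changes."""
    if free_before > 0 and free_after == 0:
        new_state = Acceptability.CANCELED    # unused guest area disappeared
    elif free_before == 0 and free_after > 0:
        new_state = Acceptability.ACCEPTING   # unused guest area generated
    else:
        return dict(guests)                   # no transition: no change
    # Only guest BESs that are active in other units change; inactive ones keep their state.
    return {
        name: (active, new_state if active else state)
        for name, (active, state) in guests.items()
    }


guests = {
    "bes21": (True, Acceptability.ACCEPTING),
    "bes31": (False, Acceptability.CANCELED),
    "bes41": (True, Acceptability.CANCELED),  # e.g. acceptability stopped earlier by command
}
full = apply_table_26_8(1, 0, guests)
freed = apply_table_26_8(0, 1, full)
print(full["bes21"][1].value, freed["bes21"][1].value, freed["bes41"][1].value)
# canceled accepting accepting
```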
Figure 26-7 Example of a system switchover during normal operation
Figure 26-8 Example of a system switchover at a host that has accepted guest BESs
Figure 26-9 Example of a system switchover when a series of errors occurs
Figure 26-10 Example of a system switchover when a series of errors occurs but the number of BESs that can be accepted is insufficient
Figure 26-11 Example of the action to take when multiple errors occur while the number of BESs that can be accepted is insufficient
Figure 26-12 Example of how to avoid a shortage in the number of BESs that can be accepted
Figure 26-13 Example in which a system switchover cannot be executed when a series of errors occurs
All Rights Reserved. Copyright (C) 2011, 2015, Hitachi, Ltd.