There are two types of system switchover. One is standby system switchover, whose facility was discussed above. The other is standby-less system switchover, which consists of two facilities:

- Standby-less system switchover (1:1) facility
- Standby-less system switchover (effects distributed) facility
A standby-less system switchover facility can be applied only to a HiRDB/Parallel Server's back-end servers; it cannot be applied to a unit that contains servers other than back-end servers.
Unlike the standby system switchover facility, the standby-less system switchover facilities do not require standby units to be prepared. When an error occurs, processing is not switched over to a standby unit. Instead, it is switched over to another unit of the running system, where an active back-end server takes over the work.
The standby-less system switchover (1:1) facility switches processing from a unit in which an error has occurred to a predesignated back-end server unit, which takes over the processing. That is, there is a one-to-one relationship between the original unit and the unit to which processing is switched in the event of an error.
A back-end server that hands over its processing when an error occurs is called a normal BES, and a back-end server that takes over that processing is called an alternate BES. The unit containing the normal BESs is called the normal BES unit, and the unit containing the alternate BESs is called the alternate BES unit. Figure 25-2 provides an overview of the standby-less system switchover (1:1) facility.
Figure 25-2 Overview of the standby-less system switchover (1:1) facility
All the following conditions must be satisfied to use the standby-less system switchover (1:1) facility:
The standby-less system switchover (1:1) facility provides the following advantages over the standby system switchover facility:
Table 25-1 lists the resources that are needed when a standby system unit is standing by and after system switchover is performed.
Table 25-1 Resources needed when a standby unit is standing by and after system switchover is performed
| Item | HiRDB system server processes | HiRDB server processes | Shared memory for unit controller | Shared memory for lock pool | Shared memory for global buffer |
|---|---|---|---|---|---|
| Standby-less system switchover (1:1) facility | Yes (note 1) | See notes 2 and 3 | Yes (note 4) | Yes | See note 5 |
| Standby-less system switchover (effects distributed) facility | See notes 6 and 7 | See note 8 | Yes (note 9) | Yes | See note 10 |
| Standby system switchover facility: user server hot standby | No | Yes | No | No | No |
| Standby system switchover facility: rapid system switchover facility | Yes | Yes | Yes | Yes | Yes |
| Standby system switchover facility: all others | No | No | No | No | No |
1 Some system server processes generate processes while they are standing by. Because the system server processes of the alternate BES unit are shared with the other system servers, no resources are needed specifically for the alternate portion.
2 The maximum number of back-end server processes is the value for pd_max_bes_process of the alternate BES. This value is the sum of the alternating processes and the non-alternating processes. Therefore, only a limited number of users may be able to connect after a system switchover.
3 If the value of pd_process_count (the number of resident processes) and the number of back-end server processes already activated when system switchover was performed is less than the value of pd_max_bes_process, additional back-end server processes can be activated. Be sure to set the operating system parameters so that the OS has enough processes, virtual memory, ports, and other resources after system switchover is performed. Note also that activating additional back-end server processes may temporarily degrade performance after system switchover.
4 Shared memory for the alternate portion is allocated when the alternate BES unit starts.
5 The global buffers used by the alternate BESs are shared with the processing that is taken over. Therefore, no additional global buffers are allocated after system switchover occurs. For details about allocation of global buffers during alternating, see 25.5.7 Definition of global buffers (standby-less system switchover (1:1) facility only).
6 Because system server processes are shared on a unit-by-unit basis with the accepting units, no resources are required exclusively for the guest areas.
7 A system server process for a back-end server generates a process when it becomes the running system.
8 The maximum permissible number of HiRDB server (back-end server) processes in a unit after system switchover can normally be defined as the combined total of the number of processes for each back-end server and the number of processes for the guests (pd_ha_max_server_process).
9 When an accepting unit is started, shared memory is allocated for the guest areas.
10 The global buffer that the back-end server normally uses is shared with the accepting unit. Therefore, no global buffer is allocated after system switchover. For details about sharing a global buffer, see 25.5.8 Definition of global buffers (standby-less system switchover (effects distributed) facility only).
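The process limits in notes 2 and 8 come down to simple arithmetic. The following Python sketch illustrates it with hypothetical values. The function names are illustrative only, and reading note 8 as "processes for the unit's own back-end servers plus pd_ha_max_server_process" is an interpretation of the note, not a documented HiRDB formula.

```python
from __future__ import annotations


def remaining_for_takeover(pd_max_bes_process: int, non_alternating_in_use: int) -> int:
    """Note 2 (1:1 facility): the alternate BES's pd_max_bes_process caps the sum of
    alternating and non-alternating processes, so only the remainder is left for
    the work taken over from the normal BES (never negative)."""
    return max(pd_max_bes_process - non_alternating_in_use, 0)


def unit_process_limit(host_bes_processes: list[int], pd_ha_max_server_process: int) -> int:
    """Note 8 (effects distributed), as interpreted here: the processes for each of
    the unit's own back-end servers plus the processes allowed for guests."""
    return sum(host_bes_processes) + pd_ha_max_server_process


# Hypothetical values: 70 of 100 processes are already used by the alternate BES's own work.
print(remaining_for_takeover(100, 70))   # 30
print(unit_process_limit([50, 50], 40))  # 140
```

The first case shows why only a limited number of users may be able to connect after a 1:1 switchover: the taken-over work competes with the alternate BES's own processes for the same limit.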
For details about a back-end server's resource usage status when the standby-less system switchover (effects distributed) facility is used, see 25.1.2(2) Standby-less system switchover (effects distributed) facility.
The rules for defining normal BES units and alternate BES units are explained below.
Figure 25-3 shows examples of valid configurations of a normal BES unit and alternate BES unit. Figure 25-4 shows examples of invalid configurations.
Figure 25-3 Examples of valid configurations of a normal BES unit and an alternate BES unit
An alternate BES is defined with the -c option in the pdstart operand. Example specifications of the pdstart operand are shown in Examples 1 and 2 below.
Example 1
pdstart -t BES -s bes11 -u UNT1 -c bes21
Example 2
pdstart -t BES -s bes11 -u UNT1 -c bes21
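To make the role of the -c option concrete, the following Python sketch reads pdstart operand lines like the one in Example 1 and builds the mapping from each normal BES to its alternate BES. It is not a HiRDB tool: the function name is hypothetical, it looks only at the -t, -s, and -c options, and it assumes every option takes exactly one value, as in the examples.

```python
from __future__ import annotations

import shlex


def alternate_bes_map(pdstart_lines: list[str]) -> dict[str, str]:
    """Return {normal BES: alternate BES} from pdstart operand lines (sketch)."""
    mapping: dict[str, str] = {}
    for line in pdstart_lines:
        tokens = shlex.split(line)
        if not tokens or tokens[0] != "pdstart":
            continue
        # Pair each option (-t, -s, -u, -c, ...) with the value that follows it.
        options = dict(zip(tokens[1::2], tokens[2::2]))
        # Only back-end servers defined with an alternate (-c) are of interest.
        if options.get("-t") == "BES" and "-s" in options and "-c" in options:
            mapping[options["-s"]] = options["-c"]
    return mapping


print(alternate_bes_map(["pdstart -t BES -s bes11 -u UNT1 -c bes21"]))
```

For Example 1 the result is {'bes11': 'bes21'}: if the unit UNT1 fails, the processing of bes11 is taken over by bes21.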
Figure 25-4 Examples of invalid configurations of a normal BES unit and an alternate BES unit
When an error occurs, the standby-less system switchover (effects distributed) facility distributes processing requests intended for the back-end servers in the failed unit to multiple running units, where these processing requests can be executed. Because the standby-less system switchover (effects distributed) facility does not require standby server machines or standby units, it uses system resources more efficiently. After an error occurs, the processing workload increases at each unit that takes over server processing for the failed unit, so transaction-processing performance may degrade. However, because the processing requests intended for the servers in the failed unit are shared among and executed by multiple units, the additional load on each unit stays low and the degradation of system performance is minimized.
The standby-less system switchover (effects distributed) facility distributes the back-end servers of the failed unit, so the switchover destinations can be spread across multiple units. Moreover, if an error occurs in a unit that has already taken over processing, the processing can be switched over again to other running units and continued there; this is called multi-step system switchover. Multi-step system switchover is not available with the standby-less system switchover (1:1) facility: if an error occurs in the unit to which processing was switched, the processing of the originally failed unit cannot be taken over and continued elsewhere.
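Distributed workload transfer and multi-step system switchover can be pictured with a small simulation. In the following Python sketch, the Unit class, the fail_unit function, the per-BES candidate lists (a stand-in for the destination order that the HA monitor definition determines), and guest_capacity (a simple count in place of real guest-area sizing) are all hypothetical simplifications, not HiRDB interfaces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Unit:
    name: str
    guest_capacity: int = 0  # hypothetical: how many guest BESs the unit can accept
    host_bes: list[str] = field(default_factory=list)
    guest_bes: list[str] = field(default_factory=list)
    running: bool = True


def fail_unit(units: dict[str, Unit], failed: str,
              candidates: dict[str, list[str]]) -> dict[str, str | None]:
    """Move the failed unit's host BESs and guest BESs to running units.

    candidates maps each BES to an ordered list of destination units. Returns
    BES -> destination unit, or None when no running unit could accept the BES.
    """
    unit = units[failed]
    unit.running = False
    moved: dict[str, str | None] = {}
    for bes in unit.host_bes + unit.guest_bes:
        moved[bes] = None
        for name in candidates.get(bes, []):
            dest = units[name]
            if dest.running and len(dest.guest_bes) < dest.guest_capacity:
                dest.guest_bes.append(bes)  # runs there as a guest BES
                moved[bes] = name
                break
    unit.guest_bes.clear()  # guests that were running here have moved on
    return moved


units = {
    "UNT1": Unit("UNT1", host_bes=["bes1", "bes2"]),
    "UNT2": Unit("UNT2", guest_capacity=1, host_bes=["bes3"]),
    "UNT3": Unit("UNT3", guest_capacity=3, host_bes=["bes4"]),
}
candidates = {"bes1": ["UNT2", "UNT3"], "bes2": ["UNT3", "UNT2"], "bes3": ["UNT3"]}
print(fail_unit(units, "UNT1", candidates))  # work spread over UNT2 and UNT3
print(fail_unit(units, "UNT2", candidates))  # multi-step: bes1 moves again
```

The second call is the multi-step case: bes1, already a guest in UNT2, is moved again to UNT3 together with UNT2's own host BES. With the 1:1 facility, the destination is fixed and no such second step exists.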
The standby-less system switchover (effects distributed) facility is appropriate for a system whose resources must always be used efficiently and in which performance degradation because of an error must be minimized.
In the standby-less system switchover (effects distributed) facility, a back-end server defined in the original unit is called a host BES, and a back-end server that is accepted by another unit is called a guest BES. The unit where the host BESs are defined is called the regular unit, and the unit where a guest BES is located is called the accepting unit. All accepting units must be defined as an HA group. The back-end server resources that correspond to a guest BES constitute a guest area.
Figure 25-5 provides an overview of the standby-less system switchover (effects distributed) facility (distributed workload transfer and multi-step system switchover).
Figure 25-5 Overview of the standby-less system switchover (effects distributed) facility (distributed workload transfer and multi-step system switchover)
All the following conditions must be satisfied to use the standby-less system switchover (effects distributed) facility:
Table 25-2 shows the usage status of back-end server resources when the standby-less system switchover (effects distributed) facility is being applied.
Table 25-2 Usage status of back-end server resources when the standby-less system switchover (effects distributed) facility is applied
| Back-end server type | Back-end server status | Resource usage status |
|---|---|---|
| Host BES | Accepting status | An area of the size required by the back-end server's definition is created. |
| Host BES | Running | An area of the size required by the back-end server's definition is used. |
| Guest BES | Accepting status | For each resource, a guest area of the largest resource size is created in the guest server. |
| Guest BES | Running | Within the prepared guest area, an area that matches the size required by the back-end server's definition is used. |
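The guest BES rows of Table 25-2 amount to a per-resource maximum followed by a fit check. The following Python sketch illustrates this. The function names, resource names, and sizes are hypothetical, and it computes the size of a single guest area only; how many such areas a unit prepares is not modeled.

```python
from __future__ import annotations


def guest_area_size(requirements: dict[str, dict[str, int]]) -> dict[str, int]:
    """Accepting status: for each resource, size the guest area to the largest
    requirement among the back-end servers that the unit may accept."""
    sizes: dict[str, int] = {}
    for per_resource in requirements.values():
        for resource, size in per_resource.items():
            sizes[resource] = max(sizes.get(resource, 0), size)
    return sizes


def fits(area: dict[str, int], requirement: dict[str, int]) -> bool:
    """Running: a guest BES uses only the part of the guest area that matches
    its own definition, so its requirement must fit in every resource."""
    return all(size <= area.get(resource, 0) for resource, size in requirement.items())


# Hypothetical requirements (in KB) of two back-end servers that may be accepted.
reqs = {
    "bes1": {"lock_pool": 4000, "global_buffer": 8000},
    "bes2": {"lock_pool": 6000, "global_buffer": 2000},
}
area = guest_area_size(reqs)
print(area)  # {'lock_pool': 6000, 'global_buffer': 8000}
```

Because each resource is sized independently, the guest area can be larger than any single back-end server needs (here neither bes1 nor bes2 needs both 6000 and 8000), which is the price of being able to accept either one.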
When the standby-less system switchover (effects distributed) facility is used and an error occurs in a regular unit, that unit's host BESs are moved automatically to various accepting units, where they execute their processing as guest BESs. If a back-end server in the failed unit is itself a guest BES, it is also moved automatically to an accepting unit, where it continues processing as a guest BES. As with the standby system switchover facility, no intervention by the HiRDB administrator is required.
Table 25-3 lists the various types of errors that can occur and whether or not system switchover occurs when standby-less system switchover (effects distributed) is used.
Table 25-3 System switchover depending on error cause when standby-less system switchover (effects distributed) facility is used
| Error cause | Unit starting or stopping, server starting or stopping | Unit running, server starting or stopping | Unit running, server running |
|---|---|---|---|
| Slow-down detected | Not applicable | Unit terminates abnormally. System switchover occurs. | Unit terminates abnormally. System switchover occurs. |
| System log full | Not applicable | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover does not occur. |
| Database path error | Not applicable | Unit terminates abnormally. System switchover occurs (only the first time). | Unit terminates abnormally. System switchover occurs (only the first time). |
| Back-end server terminated forcibly | Back-end server terminates abnormally. System switchover does not occur. | Back-end server terminates abnormally. System switchover does not occur. | Back-end server terminates abnormally. System switchover does not occur. |
| System terminated forcibly | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover does not occur. |
| System failure | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover does not occur. | Unit terminates abnormally. System switchover occurs. |
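Table 25-3 can be read as a lookup keyed by error cause and by the unit and server status. The following Python sketch encodes the table as written. The names are hypothetical, and treating "only the first time" as "no switchover when the database path error recurs" is an interpretation of that cell.

```python
from __future__ import annotations

from typing import NamedTuple


class Outcome(NamedTuple):
    terminates: str   # what terminates abnormally: "unit" or "back-end server"
    switchover: bool  # whether system switchover occurs


UNIT_SWITCH = Outcome("unit", True)
UNIT_STAY = Outcome("unit", False)
BES_STAY = Outcome("back-end server", False)

# Columns: unit starting/stopping; unit running + server starting/stopping;
# unit running + server running. None means "Not applicable".
TABLE_25_3 = {
    "Slow-down detected": (None, UNIT_SWITCH, UNIT_SWITCH),
    "System log full": (None, UNIT_STAY, UNIT_STAY),
    "Database path error": (None, UNIT_SWITCH, UNIT_SWITCH),
    "Back-end server terminated forcibly": (BES_STAY, BES_STAY, BES_STAY),
    "System terminated forcibly": (UNIT_STAY, UNIT_STAY, UNIT_STAY),
    "System failure": (UNIT_STAY, UNIT_STAY, UNIT_SWITCH),
}


def outcome(cause: str, unit_running: bool, server_running: bool,
            first_occurrence: bool = True) -> Outcome | None:
    if not unit_running:
        column = 0  # unit (and therefore its server) is starting or stopping
    elif not server_running:
        column = 1
    else:
        column = 2
    result = TABLE_25_3[cause][column]
    # Database path error: switchover occurs only the first time.
    if result is not None and cause == "Database path error" and not first_occurrence:
        result = result._replace(switchover=False)
    return result


print(outcome("System failure", unit_running=True, server_running=True))
```

Note that a system failure causes switchover only when both the unit and the server are already running.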
In the event of system switchover, the host BESs and any guest BESs that are running in the unit are switched over to other units. Each back-end server may be switched over to a different destination.
The standby-less system switchover (effects distributed) facility switches systems automatically when various types of errors occur. If an error occurs in an accepting unit after an error has already occurred in a regular unit, the host BESs and the guest BESs running in the failed accepting unit move to the remaining running units and execute their processing as guest BESs; no intervention by the HiRDB administrator is required. The destination of each back-end server is determined by the HA monitor definition (or by the cluster software definition when Hitachi HA Toolkit Extension is used).
When a unit runs out of free guest area, the standby-less system switchover (effects distributed) facility cancels the accepting status of all guest BESs that are not running. The acceptability of a guest area is not affected by the operation of the host BES. Table 25-4 shows automatic cancellation and resetting of acceptability depending on the free space in the guest area.
When acceptability is reset automatically, all back-end servers that are acting as running systems in other units within the HA group enter accepting status. This includes back-end servers whose acceptance was stopped intentionally with a command (monsbystp or pdstop -q -s back-end-server-name). However, if the number of BESs that can be accepted within the HA group is exceeded and reduced-mode operation results, a server that is stopped is not returned to accepting status.
Table 25-4 Automatic cancellation and resetting of acceptability depending on the free space in the guest area
| Unused guest area in the unit | Guest BES acceptability: guest BESs active in other units | Guest BES acceptability: guest BESs inactive in other units |
|---|---|---|
| Disappeared | Cancelled automatically | No change (being cancelled) |
| Generated | Reset automatically | No change |
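The acceptability rules of Table 25-4 can be sketched as a small state update. In the following Python sketch, the GuestBes class and the on_guest_area_change function are hypothetical. The sketch also applies the rule that a reset overrides a stop entered with monsbystp or pdstop -q -s, but it does not model the reduced-mode exception described above.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class GuestBes:
    name: str
    active_in_other_unit: bool  # running as the running system in another unit
    accepting: bool             # whether this unit currently accepts it as a guest
    stopped_by_command: bool = False  # acceptance stopped with monsbystp or pdstop -q -s


def on_guest_area_change(guests: list[GuestBes], free_area_generated: bool) -> None:
    """Apply Table 25-4 when unused guest area disappears (False) or is generated (True)."""
    for g in guests:
        if not g.active_in_other_unit:
            continue  # guest BESs inactive in other units: no change
        if free_area_generated:
            g.accepting = True  # reset automatically, even if stopped by a command
            g.stopped_by_command = False
        else:
            g.accepting = False  # cancelled automatically


guests = [
    GuestBes("bes1", active_in_other_unit=True, accepting=True),
    GuestBes("bes2", active_in_other_unit=False, accepting=False),
    GuestBes("bes3", active_in_other_unit=True, accepting=False, stopped_by_command=True),
]
on_guest_area_change(guests, free_area_generated=False)  # area disappeared
on_guest_area_change(guests, free_area_generated=True)   # area generated again
print([(g.name, g.accepting) for g in guests])
```

After the area is generated again, bes1 and bes3 are accepting (bes3 despite its earlier stop by command), while bes2, which is inactive in other units, stays unchanged.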
Figure 25-6 Example of system switchover during normal operations
Figure 25-7 Example of system switchover at a host that has accepted guest BESs
Figure 25-8 Example of system switchover when a series of errors occurs
Figure 25-9 Example of system switchover when a series of errors occurs but the number of BESs that can be accepted is insufficient
Figure 25-10 Example of the action to take when an error occurs while the number of BESs that can be accepted is insufficient
Figure 25-11 Example of how to avoid a shortage in the number of BESs that can be accepted
Figure 25-12 Example in which system switchover cannot be executed when a series of errors occurs