Overview of the system switchover facility

If a failure occurs on the HiRDB that is actively processing jobs, job processing can be automatically switched over to the standby HiRDB. This ability is called the system switchover facility. Job processing is interrupted from the time the failure occurs to the time processing is switched over to the standby HiRDB. The system switchover facility is used to keep system downtime to a minimum when a failure occurs.

There are two types of system switchover facilities: the standby system switchover facility and the standby-less system switchover facility. Moreover, the system switchover facility can be operated in monitor mode or server mode, depending on the types of failures to be monitored. The following table shows the system switchover facilities and the modes in which they can be used.

Table 8-1 Types of system switchover facility and operation modes

System switchover facility type	Subtype	Monitor mode	Server mode
Standby system switchover facility	Normal system switchover	Y	Y
Standby system switchover facility	User server hot standby	N	Y
Standby system switchover facility	Rapid system switchover	N	Y
Standby-less system switchover facility#	Standby-less system switchover (1:1) facility	N	Y
Standby-less system switchover facility#	Standby-less system switchover (effects distributed) facility	N	Y

Legend: Y: Can be used. N: Cannot be used.

#: To use the standby-less system switchover facility, you must have HiRDB Advanced High Availability.
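
The combinations in Table 8-1 can be expressed as a small lookup, which may be handy when checking a planned configuration. This is only an illustrative sketch: the dictionary keys, the mode strings, and the function `can_use` are hypothetical names chosen for this example and are not HiRDB operands or commands.

```python
# Hypothetical encoding of Table 8-1. The strings are labels for this sketch,
# not HiRDB system definition values.
SUPPORTED_MODES = {
    "normal system switchover": {"monitor", "server"},
    "user server hot standby": {"server"},
    "rapid system switchover": {"server"},
    "standby-less system switchover (1:1)": {"server"},
    "standby-less system switchover (effects distributed)": {"server"},
}

# Facilities marked with # in the table require HiRDB Advanced High Availability.
NEEDS_ADVANCED_HA = {
    "standby-less system switchover (1:1)",
    "standby-less system switchover (effects distributed)",
}


def can_use(facility: str, mode: str, has_advanced_ha: bool = False) -> bool:
    """Return True if the facility can be used in the given operation mode."""
    if facility in NEEDS_ADVANCED_HA and not has_advanced_ha:
        return False
    return mode in SUPPORTED_MODES[facility]


print(can_use("normal system switchover", "monitor"))
print(can_use("rapid system switchover", "monitor"))
```

Under these assumptions, only normal system switchover is accepted in monitor mode, and the standby-less variants are rejected unless `has_advanced_ha` is set.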

Organization of this subsection: (1) Overview of the standby system switchover facility; (2) Overview of the standby-less system switchover facility; (3) Cluster software supported by HiRDB; (4) Shared disk unit

(1) Overview of the standby system switchover facility

If you deploy a standby HiRDB separately from the HiRDB that is actively processing jobs, job processing is automatically switched over to the standby HiRDB when a failure occurs on the active HiRDB. This ability is called the standby system switchover facility.

To implement the standby system switchover facility, you need a cluster system configuration that uses multiple server machines. For a HiRDB single server configuration, the system is switched over on a per-system basis. However, note that system switchover cannot be performed for utility special units (UNIX edition only). For a HiRDB parallel server configuration, the system is switched over on a per-unit basis. Note that using the user server hot standby or rapid system switchover facility can shorten the time required for system switchover. For details about the user server hot standby or rapid system switchover facility, see Reducing system switchover time (user server hot standby, rapid system switchover facility) in the HiRDB Version 9 System Operation Guide.

The system on which jobs are currently being processed is called the running system, and the system that is currently in reserve is called the standby system. Whenever a system switchover occurs, the running system and the standby system are swapped. In addition, to distinguish between the two systems while you are building them and configuring the environments, the system that is initially started as the running system is called the primary system, and the system that is first started as the standby system is called the secondary system. Although the running and the standby systems change whenever a system switchover occurs, the primary system and secondary system do not. The following figure provides an overview of the system switchover facility (standby system switchover facility).

Figure 8-1 Overview of the system switchover facility (standby system switchover facility)

[Figure]

#1: In this manual, the product that performs the system switchover is referred to as cluster software. For details about the cluster software products supported by HiRDB, see (3) Cluster software supported by HiRDB below.

#2: For details about the shared disk unit, see (4) Shared disk unit below.

Explanation: If a failure occurs on the running system while it is processing jobs, the failure is reported to the standby system, and the system is switched over. The standby system becomes the running system and resumes job processing.
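
The distinction between the fixed primary/secondary designations and the changing running/standby roles can be illustrated with a toy model. This is a sketch under stated assumptions: the class `SystemPair`, its method `switch_over`, and the host names are hypothetical and do not correspond to any HiRDB or cluster software interface.

```python
from dataclasses import dataclass, field


@dataclass
class SystemPair:
    """Toy model of a primary/secondary pair (hypothetical, for illustration)."""

    primary: str    # system first started as the running system (never changes)
    secondary: str  # system first started as the standby system (never changes)
    running: str = field(init=False)
    standby: str = field(init=False)

    def __post_init__(self) -> None:
        # At initial startup the primary runs and the secondary stands by.
        self.running = self.primary
        self.standby = self.secondary

    def switch_over(self) -> None:
        """Swap the running and standby roles; primary and secondary are untouched."""
        self.running, self.standby = self.standby, self.running


pair = SystemPair(primary="hostA", secondary="hostB")
pair.switch_over()  # simulate a failure on the running system
print("running:", pair.running, "standby:", pair.standby)
print("primary:", pair.primary, "secondary:", pair.secondary)
```

After one switchover in this model, hostB is the running system while hostA is still the primary system, which mirrors the terminology defined above.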

(2) Overview of the standby-less system switchover facility

If a failure occurs on the HiRDB that is actively processing jobs, the system is switched over to another unit whose currently running back-end server takes over the processing. This is called the standby-less system switchover facility. In contrast with the standby system switchover facility, with the standby-less system switchover facility you do not have to allocate a standby unit.

The standby-less system switchover facility is further classified as follows:

Standby-less system switchover (1:1) facility
Standby-less system switchover (effects distributed) facility

The standby-less system switchover facility can be used in a back-end server unit of a HiRDB parallel server configuration. It cannot be used in a unit in which a server other than a back-end server resides.

(a) Standby-less system switchover (1:1) facility

With the standby-less system switchover (1:1) facility, there is a one-to-one relationship between the unit on which the failure occurs and the unit whose back-end server takes over processing.

A back-end server whose processing is transferred to another unit when a failure occurs is called a normal BES, and a back-end server that takes over processing is called an alternate BES. Similarly, the unit containing the normal BES is called the normal BES unit, and the unit containing the alternate BES is called the alternate BES unit. The following figure provides an overview of the standby-less system switchover (1:1) facility.

Figure 8-2 Overview of the standby-less system switchover (1:1) facility

[Figure]

Explanation

Normally, both BES1 and BES2 perform processing.
If a failure occurs on the normal BES unit (UNT1), the system switches over, and processing is taken over by the alternate BES. The area in which processing is taken over is called the alternate portion and, when the alternate portion is performing processing, it is said to be alternating.
After the failure is resolved and the normal BES unit is started, the processing taken over by the alternate BES is switched over to the normal BES, and returned to normal status. This is called reactivating the system.
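
The three states described in the explanation (normal operation, alternating, and reactivation) can be sketched as a small state model. The class and method names are hypothetical, and the sketch assumes that BES1 is the normal BES and BES2 is the alternate BES; it does not represent how HiRDB implements the facility.

```python
class OneToOnePair:
    """Toy model of a normal BES and its alternate BES (hypothetical names)."""

    def __init__(self, normal_bes: str, alternate_bes: str) -> None:
        self.normal_bes = normal_bes
        self.alternate_bes = alternate_bes
        self.alternating = False  # True while the alternate portion is processing

    def serving(self) -> dict:
        """Map each BES's workload to the BES that currently processes it."""
        owner = self.alternate_bes if self.alternating else self.normal_bes
        return {self.normal_bes: owner, self.alternate_bes: self.alternate_bes}

    def fail_normal_unit(self) -> None:
        """Failure on the normal BES unit: the alternate BES takes over."""
        self.alternating = True

    def reactivate(self) -> None:
        """Normal BES unit restarted: processing returns to the normal BES."""
        self.alternating = False


pair = OneToOnePair(normal_bes="BES1", alternate_bes="BES2")
print(pair.serving())    # both BESs process their own workload
pair.fail_normal_unit()
print(pair.serving())    # BES2 also processes the workload of BES1
pair.reactivate()
print(pair.serving())    # back to normal status
```

Note that in this model the alternate BES keeps processing its own workload throughout, which is why its load increases while it is alternating.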

Remarks

The concepts of primary, secondary, running, and standby systems used in the standby system switchover facility map onto the standby-less system switchover (1:1) facility as follows:

Think of the normal BES unit as the primary system, and the alternate BES unit as the secondary system.
Under normal conditions, think of the normal BES unit as the running system, and the alternate portion as the standby system. During alternating, think of the alternate portion as the running system, and the normal BES unit as the standby system.

Prerequisites

To use the standby-less system switchover (1:1) facility, all of the following must be satisfied:

HiRDB Advanced High Availability is installed.
Hitachi HA Toolkit Extension is installed (not required if HA Monitor is the cluster software).
The system switchover facility is running in server mode.

Advantages of the standby-less system switchover facility

The following describes the advantages of the standby-less system switchover facility over the standby system switchover facility:

You do not need to set aside a standby system unit, which means you can use system resources more efficiently. However, remember that the load increases on the back-end server that takes over processing when the system is switched over, which might adversely affect processing performance.
The server processes are already running, which allows you to reduce system switchover time so that it is about the same as when the rapid system switchover facility is used. For details about the rapid system switchover facility, see 8.1.5 Functions that reduce system switchover time (user server hot standby and the rapid system switchover facility).

(b) Standby-less system switchover (effects distributed) facility

When a failure occurs, processing requests directed to back-end servers in the failed unit can be distributed to and executed in multiple active units. This is called the standby-less system switchover (effects distributed) facility. This facility enables you to use your system resources more efficiently, without having to allocate a standby server machine or a standby unit. Of course, there might be adverse effects on transaction processing performance because of the increased processing load on the units that have taken over processing for the servers in the failed unit. However, because the processing requests directed to the failed servers are distributed to and executed in a number of units, the workload increase for each unit is minimized, reducing overall degradation of system performance.

The standby-less system switchover (effects distributed) facility switches processing over to multiple back-end servers, distributing the workload among them. The workload can also be distributed among multiple units. If a further failure occurs on a unit that was a switchover destination for an earlier failure, processing can continue by switching over again to another running unit (this is called multi-stage system switchover). Multi-stage system switchover cannot be performed with the standby-less system switchover (1:1) facility, so if a failure occurs at a switchover destination under that facility, processing for the failed unit cannot be continued.

It is appropriate to use the standby-less system switchover (effects distributed) facility in a system in which system resources must always be used at high efficiency and for which degradation of system performance must be minimized.

With the standby-less system switchover (effects distributed) facility, a back-end server that relinquishes processing when a failure occurs is called a host BES, and a back-end server that takes over processing is called a guest BES. The unit containing the host BESs is called the regular unit, and a unit containing a guest BES is called an accepting unit. All accepting units must be pre-defined as an HA group. The resources for back-end servers associated with guest BESs are called guest areas.
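
Distribution to multiple accepting units and multi-stage system switchover can be sketched as follows. This is a toy model, not HiRDB's actual behavior: the function `fail_unit`, the unit and server names, and in particular the rule of choosing the least-loaded running unit are assumptions made for illustration. The source does not state how HiRDB selects the accepting unit.

```python
from typing import Dict, Set


def fail_unit(placement: Dict[str, str], running: Set[str], failed: str) -> None:
    """Move every BES served by the failed unit to a running unit in the HA group.

    placement maps a BES name to the unit currently serving it (mutated in place).
    Picking the least-loaded unit is an assumption of this sketch.
    """
    running.discard(failed)
    if not running:
        raise RuntimeError("no running unit left in the HA group")
    # Includes guest BESs accepted earlier, which gives multi-stage switchover.
    for bes in sorted(b for b, unit in placement.items() if unit == failed):
        load = {unit: 0 for unit in running}
        for unit in placement.values():
            if unit in load:
                load[unit] += 1
        target = min(sorted(running), key=lambda unit: load[unit])
        placement[bes] = target


# Hypothetical HA group of three units, each with at least one host BES.
placement = {"BES1": "UNT1", "BES2": "UNT1", "BES3": "UNT2", "BES4": "UNT3"}
running = {"UNT1", "UNT2", "UNT3"}

fail_unit(placement, running, "UNT1")  # BES1 and BES2 become guest BESs elsewhere
print(placement)
fail_unit(placement, running, "UNT2")  # a switchover destination fails in turn
print(placement)
```

In this model the first failure spreads the two host BESs of UNT1 over UNT2 and UNT3, and the second failure moves both the host BES of UNT2 and the guest BES it had accepted to the remaining unit.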

The following figure provides an overview of the standby-less system switchover (effects distributed) facility (with distribution alternates and multi-stage system switchover).

Figure 8-3 Overview of standby-less system switchover (effects distributed) facility (with distribution alternates and multi-stage system switchover)

[Figure]

Prerequisites

To use the standby-less system switchover (effects distributed) facility, the following conditions must be satisfied:

HiRDB Advanced High Availability is installed.
The standby-less system switchover (effects distributed) facility can switch only to a unit dedicated to back-end servers (a unit that consists only of back-end servers).
A unit that uses the standby-less system switchover (effects distributed) facility must consist of one or more back-end servers for the primary system. It cannot be used as a dedicated accepting unit.

(3) Cluster software supported by HiRDB

Cluster software	HP-UX	Solaris	AIX	Linux	Windows
HA Monitor	Y	N	Y	Y	N
MC/ServiceGuard	Y	N	N	N	N
VERITAS Cluster Server	N	Y	N	N	N
Sun Cluster	N	Y	N	N	N
HACMP	N	N	Y	N	N
PowerHA	N	N	Y	N	N
ClusterPerfect	N	N	N	Y	N
LifeKeeper	N	N	N	Y	N
Microsoft Cluster Server (MSCS) or Microsoft Failover Cluster (MSFC)	N	N	N	N	Y
