13.3.5 Design considerations

Organization of this subsection
(1) Considerations common to both HiRDB/Single Server and HiRDB/Parallel Server
(2) HiRDB/Parallel Server

(1) Considerations common to both HiRDB/Single Server and HiRDB/Parallel Server

Following are the design considerations that are common to HiRDB/Single Servers and HiRDB/Parallel Servers:

(a) Row partitioning taking into account contention among disk accesses

If multiple UAPs access separate tables concurrently, these tables should be partitioned and stored in separate user RDAREAs on separate disks.

Figure 13-9 provides an overview of row partitioning taking into account contention among disk accesses.

Figure 13-9 Overview of row partitioning taking into account contention among disk accesses

[Figure]

Explanation:
Tables A and B are partitioned and stored in two sets of user RDAREAs, USR01-USR02 and USR03-USR04, which are on separate disks. If UAP1 and UAP2 attempt to access tables A and B concurrently, no contention occurs, thereby reducing their wait time.
If these tables are stored in user RDAREAs on the same disk, access contention occurs on the disk when multiple UAPs attempt to access the tables concurrently. In this case, one of the UAPs is placed in wait status until the other UAP completes its access processing, resulting in an increase in wait time.
(b) Row partitioning taking into account operability

Figure 13-10 provides an overview of row partitioning taking into account operability.

Figure 13-10 Overview of row partitioning taking into account operability

[Figure]

Explanation:
  • Storing table and index in the same user RDAREAs
    If operability for table creation, table reorganization, backing up of user RDAREAs, RDAREA recovery, etc., is more important than retrieval performance, a row-partitioned table and its indexes should be stored in the same user RDAREAs. Operations on individual user RDAREAs become easy.
    In the example shown in Figure 13-10, the portion of table AB for application A is stored together with its index in dedicated user RDAREA USR01. This enables the pdhold command (for shutting down RDAREAs) to be used to terminate application A. Additionally, it simplifies backup processing for each application that uses the database copy utility (pdcopy).
  • Placing related user RDAREAs on the same disk
    If a row-partitioned table and its indexes are stored in multiple user RDAREAs, the related user RDAREAs should be placed on the same disk. This enables user RDAREAs to be used individually by disk.
    In the example shown in Figure 13-10, the portion of table CD for application D is stored together with its index in user RDAREAs USR04 and USR06 on the same disk. This enables applications to be executed by disk.

(2) HiRDB/Parallel Server

Following are the design considerations for HiRDB/Parallel Servers:

(a) Row partitioning taking into account workload for disk accesses
(b) Row partitioning taking into account the degree of parallel input/output processing

If a table is partitioned and stored in as many server machines as possible, the input/output processing time can be reduced by parallel processing. If there is a limit to the number of server machines available for table row partitioning, the same effects can be achieved by increasing the number of back-end servers and disks for each server machine. Figure 13-11 provides an overview of input/output processing performance based on the number of back-end servers used for table row partitioning.

Figure 13-11 Overview of input/output processing performance based on number of back-end servers used for table row partitioning

[Figure]

If a table is partitioned and stored in too many back-end servers, there is an increase in the amount of communication required to return each back-end server's processing results to the front-end server. Therefore, the appropriate number of back-end servers must be determined, taking into account the type of database operation and SQL processing (whether SQLs are used to retrieve a large amount of data from a large table).

(c) Row partitioning taking into account table access frequency

A table must be partitioned so that table access frequency becomes uniform at each back-end server.

To do this, the considerations discussed below should be taken into account.

Key range partitioning
  • When table row partitioning is defined, specify UNIQUE for the partitioning key so that the amount of data becomes uniform.
  • When a table is partitioned and the number of accesses to the data in a specific key range is expected to be higher than in the other key ranges, divide the data in the heavy-accesses key range by finer key ranges.
Flexible hash partitioning or FIX hash partitioning
  • Change the hash function so that the data is distributed uniformly.
  • Select a partitioning key without uneven distribution or duplication so that the data is distributed uniformly.

Even when a table is partitioned and stored in multiple back-end servers, the performance of parallel processing of the table can vary depending on whether the table is partitioned so that uniform access frequency can be achieved.

Figure 13-12 shows the performance of parallel processing depending on the table access frequency.

Figure 13-12 Performance of parallel processing depending on table access frequency

[Figure]

Explanation:
The processing time that can be saved depends on whether or not the table is partitioned so that uniform access frequency is achieved. If the access frequency is not uniform, table processing does not terminate until processing at back-end server BES2 is completed, so the benefit of parallel processing is not obtained.
(d) Row partitioning taking into account complicated retrieval processing

For purposes of table partitioning taking into account complicated retrieval processing, such as retrieving or joining a large amount of data, the table should be designed using the following procedure:

  1. Determining the disk processing time and the number of disks to be used
    Obtain the disk access frequency (utilization factor) from the size of the data and the processing patterns, distribute data to disks on the basis of this value, and determine the object value for disk processing time. If join processing is to be executed, exclude the work disk required for join processing (the number of HiRDB file system areas where work table files are created for sort/merge processing) when distributing data. Exclude the time required for join processing from the object value of disk processing time. Determine the number of disks on the basis of data distribution to disks.
  2. Determining the number of server machines
    Obtain the overhead time for processing at the server machines on the basis of the data processing patterns. Determine the number of server machines (where back-end servers are installed) so that the disk processing time and overhead time at the server machines become uniform.
  3. Determining the number of server machines used for join processing
    Obtain the overhead time for join processing at the server machines on the basis of the data processing patterns. Then, determine the number of floating machines on the basis of this value and the disk processing time.
    A floating machine is a server machine where a floating server is installed, which is a back-end server dedicated to complicated retrieval processing such as join processing. User RDAREAs cannot be allocated to a back-end server defined as a floating server.
  4. Determining the number of work disks
    The data subject to join processing is distributed uniformly from each back-end server to the floating server. Determine the number of work disks (number of HiRDB file system areas used to create work table files) on the basis of the expected amount of this data.
  5. Determining the system configuration
    Determine the overall system configuration on the basis of the numbers of server machines and disks determined above.

Figure 13-13 shows the concept of a system configuration involving table row partitioning taking into account complicated retrieval processing.

Figure 13-13 System configuration involving table row partitioning taking into account complicated retrieval processing

[Figure]

Explanation:
Back-end servers BES1-BES3 and BES4-BES6 read the data subject to join processing from Tables A and B, respectively. Floating servers BES7-BES9 receive data from back-end servers BES1-BES6 and execute parallel match processing.
This type of system configuration can reduce the workload of back-end servers BES1-BES6 and reduce processing time. If no floating server is installed, one of the back-end servers BES1-BES6 must execute the join processing.