Preparing for application of hash join and subquery hash execution

Before hash join or hash execution of a subquery can be applied, the following items must be set:

Hash table size
Method for allocating the work table buffer
Work table buffer size
Hashing mode

(a) Hash table size

Use the pd_hash_table_size operand in the system definition or PDHASHTBLSIZE in the client environment definition to set the hash table size. Calculate the maximum hash table row length, and then set the hash table size to a value that is equal to or greater than the value obtained from the following formula:

hash-table-size (in kilobytes) [Figure]

(maximum-hash-table-row-length (in bytes) [Figure]

2 + 32)

128

maximum-hash-table-row-length

For each SELECT statement, calculate the hash table row lengths for the following units. Then select the largest calculated value (maximum hash table row length).

Query specifications that join multiple tables with =
Subqueries that correspond to one of the following:
Table subquery specified on the right side of an =ANY quantified predicate
Table subquery specified on the right side of an =SOME quantified predicate
Table subquery specified on the right side of an IN predicate
Other subquery that specifies an external reference column with = in a search condition

The calculation methods for hash table row length are shown as follows.

Query specification that joins multiple tables with =

For columns specified in the selection expressions and search conditions of the tables that are linked with =, calculate the row length in each table from the following formula:
From the table row lengths that were calculated in 1, use one that is not the smallest value and calculate the hash table length from the following formula:

Subquery

For columns specified in subquery selection expressions and columns specified in predicates that include an external reference column in the search condition, calculate the hash table row length from the following formula:

ai: Data length of the i-th data item. For details about data length, see the HiRDB Version 8 Installation and Design Guide. However, for character data (including national character data and mixed character data) that is specified only in a selection expression of a table joined with = and has a defined length of 256 bytes or more, the data length becomes 12.

Hash tables of the size calculated previously can store 1,500 to 2,000 rows. If the number of inner tables to be joined or the number of subquery searches is high, bucket partitioning is executed several times, and the performance may not improve. In this case, either calculate and set the hash table size for batch hash join shown as follows or see (2) to tune the hash table size.

hash-table-size-for-batch-hash-join (in kilobytes) = [Figure]

(number-of-hash-table-data-pages + number-of-hash-table-management-table-pages) [Figure]

number-of-one-segment-pages [Figure]

128

number-of-hash-table-data-pages = [Figure]

number-of-hash-table-rows [Figure]

MIN {

(hash-table-page-length 48) [Figure]

hash-table-row-length [Figure]

, 255}

+ 63

number-of-hash-table-management-table-pages = [Figure]

(16

number-of-hash-table-rows + ( [Figure]

(number-of-hash-table-data-pages [Figure]

hash-table-page-length + 16 [Figure]

number-of-hash-table-rows) [Figure]

(number-of-one-segment-pages [Figure]

hash-table-page-length) [Figure]

8) + 8)

hash-table-page-length [Figure]

hash-table-page-length

number-of-one-segment-pages = [Figure]

(128

1024)

hash-table-page-length [Figure]

Determine the hash table page length from the hash table row length as shown in the following table.

Hash table row length	Hash table page length
0 to 1,012	4,096
1,013 to 2,036	8,192
2,037 to 4,084	16,384
4,085 to 16,360	32,768
16,361 to 32,720	(hash-table-row-length + 48) 2048 2048 hash-table-row-length: Number of inner tables to be joined when the targets of batch hash join are joined. If the targets are subqueries, this value is the number of subquery searches excluding the predicates that include outer reference columns in the search conditions.

(b) Method for allocating the work table buffer

The method for allocating the work table buffer must be set to buffer batch allocation (pool) in server process units. Therefore specify pool in the pd_work_buff_mode operand of the system definition.

(c) Work table buffer size

Hash tables are allocated in the work table buffer. If the work table buffer size or the upper limit size for expansion allocation of the work table buffer is smaller than the specified hash table size, an error occurs because of insufficient space in the work table buffer. Therefore, in the pd_work_buff_size or pd_work_buff_expand_limit operand of the system definition, set a value that is equal to or larger than the value calculated with the following formula:

work-table-buffer-size (in kilobytes)  (hash-table-size (in kilobytes)  2 + 256)  maximum-number-of-hash-joins-in-SELECT-statement + 128

maximum-number-of-hash-joins-in-SELECT-statement

Calculate the number of hash joins in each SELECT statement from the following formula, and set the largest value as the maximum number of hash joins in a SELECT statement. The number of hash joins is determined by counting the items that have HASH JOIN as the join type in the join processing information that is output by the access path display utility (pdvwopt).

number-of hash-joins-in-SELECT-statement = ((number-of-tables-joined-with-=)  (number-of-query-specifications-joined-with-=)), + (number-of-=ANY-quantified-predicates) + (number-of-=SOME-quantified-predicates) + (number-of-IN-(subquery)-specifications) + (number-of-other-subqueries-that-specify-external-reference-columns-with-=-in-search-conditions)

If multiple cursors are to be opened at the same time and searched, total the values that are calculated for the individual cursors.

Example

SELECT A.A1,B.B2,C.C3 FROM A,B,C                  3-1
  WHERE A.A1=B.B1 AND A.A2=B.B2
    AND B.B3=C.C3
    AND A.A1=C.C1
    AND A.A4=ANY(SELECT D.D4 FROM D)               1
    AND A.A5=SOME(SELECT E.E5 FROM E)              1
    AND A.A6 IN(SELECT F.F6 FROM F
             WHERE F.F1=A.A1)                      1
    AND EXISTS(SELECT G.G1 FROM G WHERE G.G1=B.B1) 1

In this example, the values are (3-1) + 1 + 1 + 1 + 1, so the number of hash joins in this SELECT statement is 6.

Adding about 4,096 extra kilobytes to the value calculated from the work table buffer size formula shown previously increases the input/output unit size during bucket partitioning, which in turn improves the performance.

If batch hash join without bucket partitioning is to be executed on all data, the operation can be executed if the following formula is satisfied:

work-table-buffer-size (in kilobytes)  hash-table-size (in kilobytes)  maximum-number-of-hash-joins-in-SELECT-statement + 384

(d) Hashing mode

A retrieval that accompanies a hash join or subquery execution is processed by hashing.

You can select the hashing mode with the pd_hashjoin_hashing_mode operand of the system definition or with PDHJHASHINGMODE client environment definition. The default is TYPE1.

TYPE1 is the hashing mode of HiRDB versions before 07-02. If you use hash join for the first time in HiRDB version 07-02 or a more recent version, specify TYPE2.

If you specify TYPE1 in HiRDB version 07-02 or a more recent version, you may not obtain the desired performance. If this happens, specify TYPE2 as the hashing mode, or see (3) Tuning the hashing mode, and tune the mode.

(2) Tuning methods for hash table size

(a) Tuning information used

The hash table size can be tuned based on either of the following types of tuning information:

UAP statistical report (specify client environment definition PDUAPREPLVL)
UAP statistical information from the statistics analysis utility

For details about the UAP statistical report, see 10.1.4 UAP statistical report facility. For details about the statistics analysis utility, see the HiRDB Version 8 Command Reference manual.

(b) Items derived from tuning information

When tuning information about the hash table size is obtained, the following items can be determined:

Whether batch hash join, which expands data into a hash file all at once, or bucket partitioning hash join, which expands data into a hash table in bucket units, is set
Whether bucket repartitioning is being executed when bucket partitioning hash join is set
How large the hash table size should be set to execute batch hash join when bucket partitioning hash join is set
How large the hash table size should be set to avoid bucket repartitioning when bucket partitioning hash join is set

Bucket repartitioning refers to the repetition of bucket partitioning recursively for up to three levels when the bucket size exceeds the hash table size. An example is shown as follows.

[Figure]

The number of partitions in one bucket partitioning is determined from the following formula:

number-of-bucket-partitions = MIN { (hash-table-size  2)  hash-table-page-length , 64}

Even with batch hash join, level 1 bucket partitioning is executed for inner tables to be joined.

(c) Tuning methods

Table 4-14 describes the tuning methods for hash table size.

Table 4-14 Tuning methods for hash table size

Tuning information (unit: kilobytes)	Tuning method
Maximum batch hash table size	If a value that is equal to or greater than this value was set as the hash table size, batch hash join without bucket partitioning is set for all data.¹ If this value exceeds the maximum hash table size that can be specified, batch hatch join cannot be used. If this value is 0, hash join or subquery hash execution has not been performed.
Level 1 maximum bucket size	If a value that is equal to or greater than this value was set as the hash table size, bucket partitioning has terminated at level 1. If bucket partitioning is set to level 2 or higher, specifying this value as the hash table size terminates bucket partitioning at level 1.² If batch hash join without bucket partitioning was executed on all data, 0 is displayed for this value.
Level 2 maximum bucket size	If a value that is equal to or greater than this value was set as the hash table size, bucket partitioning has terminated at level 2. If bucket partitioning is set to level 3 or higher, specifying this value as the hash table size terminates bucket partitioning at level 2.² If level 2 bucket partitioning was not executed even once, 0 is displayed for this value.
Level 3 maximum bucket size	If a value that is equal to or greater than this value was set as the hash table size, bucket partitioning has terminated at level 3. If the hash table size is smaller than this value, each bucket is partially expanded into the hash table, and the processing efficiency becomes worse. In this case, set the hash table size to this value or larger.² In some cases, not applying hash join or subquery hash execution improves the performance. If level 3 bucket partitioning was not executed even once, 0 is displayed for this value.

¹ When the hash table size is increased, the number of bucket partitions at each partitioning execution may increase according to the formula for calculating the number of bucket partitions. Consequently, a hash table size that is larger than the size that was used during tuning information acquisition may be necessary.

If you used tuning information and increased the hash table size, get the tuning information again. If the intended results are not realized, increase the hash table size again according the new tuning information that was acquired.

² When the hash table size is increased, the number of bucket partitions at each partitioning execution may increase according to the formula for calculating the number of bucket partitions. Consequently, bucket partitioning may terminate at the intended level even if the hash table size is smaller than the size that was used during tuning information acquisition.

On the other hand, when the hash table size is decreased, the number of bucket partitions at each partitioning execution may decrease. Consequently, bucket partitioning may not terminate at the same level used during tuning information acquisition. You should therefore use this tuning information when the hash table size is increased.

(3) Tuning the hashing mode

(a) Tuning information used

The hash table size can be tuned based on either of the following types of tuning information:

UAP statistical report (specify client environment definition PDUAPREPLVL)
UAP statistical information from the statistics analysis utility

For details about the UAP statistical report, see 10.1.4 UAP statistical report facility. For details about the statistics analysis utility, see the manual HiRDB Version 8 Command Reference.

(b) Item derived from tuning information

By obtaining tuning information about the hash table size, you can determine the following item:

Retrieval performance of the specified hashing mode

(c) Tuning mode

Table 4-15 shows tuning information for the hashing mode.

Table 4-15 Tuning information for the hashing mode

Tuning information (unit: number)	Description
Maximum number of comparisons	Maximum number of comparisons during hash searching
Total number of comparisons	Total number of comparisons during hash searching
Number of searches	Number of hash table searches
Average number of comparisons	Total number of comparisons / number of searches

Note: If the tuning information value is 0, hash join or subquery hash execution has not been performed.

Tuning method example: Set TYPE1 and TYPE2 in client environment definition PDHJHASHINGMODE, and get the average number of comparisons from the statistical information for each mode. Compare the average number of comparisons, and set the hashing mode with the smaller value to the pd_hashjoin_hashing_mode operand.

4.5.10 Preparing for application of hash join and subquery hash execution

(1) Items to be preset

(a) Hash table size

(b) Method for allocating the work table buffer

(c) Work table buffer size

(d) Hashing mode

(2) Tuning methods for hash table size

(a) Tuning information used

(b) Items derived from tuning information

(c) Tuning methods

(3) Tuning the hashing mode

(a) Tuning information used

(b) Item derived from tuning information

(c) Tuning mode