Hitachi

Hitachi Advanced Database Application Development Guide


5.5.2 About hash join

HADB joins tables by matching the hash table created based on the joined column of the outer table with the results of hashing the joined column of the inner table. This joining method is called hash join.

Indexes defined for columns specified in the join condition used to join the outer and inner tables will not be used when evaluating the join condition. The join condition will be evaluated using hashing. However, if a range index is defined for a column of the inner table that is specified in a join condition, if conditions allow, the range index might be used.

Organization of this subsection

(1) Joining tables using the hash join method

The following explains how tables are joined by a hash join method, by way of an example in which the hash join is applied when the following SELECT statement is executed.

Example:
SELECT * FROM "T1","T2" WHERE "T1"."C2">10 AND "T1"."C1"="T2"."C1"

HADB determines which table is to be the outer table and which table is to be the inner table. In this example, table T1 is the outer table and table T2 is the inner table. The columns T1.C1 and T2.C1 specified in the underlined join condition become the joined column.

Figure 5‒7: Joining tables using the hash join method

[Figure]

Explanation:
  1. HADB creates a hash table and hash filter based on the values in the joined column in the outer table (table T1).

  2. Next, HADB matches the result of hashing the value in the joined column of the inner table (table T2) with the hash table, and joins the tables. Before matching the joined column in the inner table with the hash table, HADB performs filtering by using the hash filter. This reduces the number of times the joined column in the inner table is matched with the hash table.

For a hash join where table T1 is the outer table and table T2 is the inner table, even if an index is defined for the columns ("T1"."C1" and "T2"."C1") specified in the join condition, that index is not used when evaluating the join condition ("T1"."C1"="T2"."C1").

If an index is defined for "T1"."C2", that index might be used when evaluating "T1"."C2">10.

The range index might be used when a hash join is processed if both of the following two conditions are met:

If both of the preceding two conditions are met, when a hash table is created from a joined column of the outer table during hash join processing, the maximum and minimum values of that joined column are obtained. Then, when the inner table is searched, the range index is used to skip the inner table's chunks or segments that are not within the obtained maximum and minimum values of the joined column.

HADB creates the hash table in the hash table area. The size of the hash table area is specified in the adb_sql_exe_hashtbl_area_size operand in the server definition or the client definition.

A hash filter is created in the hash filter area. The size of the hash filter area is specified in the adb_sql_exe_hashflt_area_size operand in the server definition or the client definition.

(2) Example where hash join is applied

Example 1:
SELECT * FROM "T1","T2" WHERE "T1"."C1"="T2"."C1"

A hash join is applied when a single column specification is specified on each side of a join condition specified by a = operator.

Example 2:
SELECT * FROM "T1","T2" WHERE "T1"."C1"="T2"."C1"+10

A hash join is applied when the following condition is met:

  • A single column is specified on one side of a join condition specified by a = operator, and the other side specifies a scalar operation that includes a column specification.

Example 3:
SELECT * FROM "T1" INNER JOIN "T2"
    ON "T1"."C1"=CAST("T2"."C1" AS INTEGER)

A hash join is applied when the following condition is met:

  • A single column is specified on one side of a join condition specified by a = operator, and the other side specifies a scalar function that includes a column specification.

Example 4:
SELECT * FROM "T1" LEFT JOIN "T2"
    ON "T1"."C1"="T2"."C1"||"T2"."C2"

A hash join is applied when the following condition is met:

  • A single column is specified on one side of a join condition specified by a = operator, and the other side specifies a concatenation operation that includes a column specification.

(3) Notes on applying hash joins

Try to make the data type and data length of the value expressions on the left and right of the = of the join condition the same if possible. When the data type and data length of the value expressions on the left and right of the = of the join condition differ, they are converted to the same data type and length before creating the hash table. Hashing takes place after this process has finished. This conversion process incurs an overhead.

For details about the data types after conversion, see Data types that can be converted, assigned, and compared in the manual HADB SQL Reference.

Note that if the data type after conversion is DECIMAL, precision and scaling are determined based on the following equations:

Equations
Precision = Pmax + Smax
Scaling = Smax
Pmax = MAX (p1s1,p2s2)
Smax = MAX (s1,s2)

p1, s1: The precision and scaling of the value expression specified on the left side of the join condition specified by =

p2, s2: The precision and scaling of the value expression specified on the right side of the join condition specified by =

Note that if the data type prior to conversion is INTEGER, these equations are calculated as DECIMAL(20,0). If the data type prior to conversion is SMALLINT, these equations are calculated as DECIMAL(10,0).

Example:

The following shows an example of determining precision and scaling when the data type after conversion is DECIMAL.

Table definitions

CREATE TABLE "T1"("C1" INTEGER,"C2" CHAR(3),"C3" DATE) IN "DBAREA01"
CREATE TABLE "T2"("C1" DECIMAL(7,3),"C2" CHAR(3),"C3" DATE) IN "DBAREA01"

Example of SQL statement

SELECT * FROM "T1","T2" WHERE "T1"."C1"="T2"."C1"

The column specified on the left side ("T1"."C1") of the underlined join condition specified by the = operator has a different data type and data length from the column specified on the right side ("T2"."C1"). This means that the data type and data length undergo conversion. The column "T1"."C1" is INTEGER type data, and the column "T2"."C1" is DECIMAL type data.

In this scenario, the data types of the columns specified on the left and right sides of the join condition specified by the = operator are converted to DECIMAL type data. Column "T1"."C1" is treated as if it were DECIMAL(20,0). The precision and scaling of the converted DECIMAL type data are calculated as follows:

Pmax = MAX(p1s1,p2s2) = MAX (20−0,7−3) = 20
Smax = MAX (s1,s2) = MAX (0,3) = 3
Precision = Pmax + Smax = 23
Scaling = Smax = 3

The data type and data length of the value expressions on the left and right of the = of the join condition are converted to DECIMAL(23,3), the hash table is created, and hashing takes place.

(4) Action to take when the hash table area has insufficient space

▪ Flow of processing when hash table area has insufficient space

The following explains the flow of processing when the hash table area has insufficient space to create the hash table.

  1. If there is insufficient hash table area to create the hash table, the outer table data is stored in multiple work tables. Similarly, the inner table data is also stored in multiple work tables.

    [Figure]

    Note

    In the preceding example, three work tables are created for the outer table, and three work tables are created for the inner table. The number of work tables that are created differs depending on conditions, such as the specification of the SQL statement.

  2. A hash table is created with a work table for the outer table (work table B1), and then the hash table is matched with a work table for the inner table (work table P1).

    After matching between the hash table and work table P1 is complete, a hash table is created with work table B2, and the hash table is matched with work table P2.

    After matching between the hash table and work table P2 is complete, a hash table is created with work table B3, and the hash table is matched with work table P3.

    [Figure]

If the hash table area becomes insufficient during the processing in step 2

If the hash table area becomes insufficient during the processing in step 2, any data that cannot fit in the hash table is stored in another work table. In this case, in addition to matching between a work table for the inner table and a hash table, matching between a work table for the inner table and the newly created work table (work table B4) occurs.

[Figure]

Note

If a new work table (work table B4) is created due to insufficient hash table area during the processing in step 2, the KFAA51130-W message is output to the server message log file.

■ Action to take when the hash table area has insufficient space

If the hash table area is insufficient, processing time for the SQL statement might take longer due to work table creation and matching. To remedy a situation in which the hash table area has insufficient space, increase the value specified for the adb_sql_exe_hashtbl_area_size operand in the server definition or the client definition. This operand specifies the size of the hash table area.

(5) Conditions where a hash join is not applied

A hash join will not be applied as the join method when any of the following conditions are met:

Note

You can specify the outer table of a joined table by join method specification. For details about join method specification, see Specification format and rules for join method specifications in the manual HADB SQL Reference.

(6) Conditions where a hash filter is applied

  1. A hash filter is applied during hash join when all the following conditions are satisfied:

    • 0 is not specified for the adb_sql_exe_hashflt_area_size operand in the server definition or client definition.

    • Tables are joined by a comma join or INNER JOIN specified.

  2. If the size of the hash filter area specified for the adb_sql_exe_hashflt_area_size operand in the server definition or client definition is too small, a shortage might occur in the hash filter area allocated for each hash retrieval. As a result, a hash filter is not applied to any hash retrieval for which the size of the hash filter is insufficient. If you want to apply the hash filter to all types of hash retrieval, change the value of the adb_sql_exe_hashflt_area_size operand so that the following condition is satisfied:

    value-specified-for-adb_sql_exe_hashflt_area_size>↑A×B×number-of-processing-real-threads-in-SQL-statement÷1024↑
    A:

    Number of hash filters to be used for the hash retrieval to which a hash filter was not applied

    If there are two or more cases in which a hash filter was not applied to hash retrieval, determine the number of hash filters for each hash retrieval, and then assign the largest value among those values. The following shows the number of hash filters to be used for a hash retrieval process:

    • For hash join: Number of = join conditions for hash join

    • For subqueries to which hash execution is applied, and which do not contain an external reference: 1

    • For subqueries to which hash execution is applied, and which contain external references: The number of = conditions that contains an external reference column

    B:

    Sum total of all the following specified in SQL statements

    • Number of hash joins to which a hash filter is applied

    • Number of subqueries to which hash execution using a hash filter is applied