Scalable Database Server, HiRDB Version 8 UAP Development Guide

[Contents][Index][Back][Next]

4.5.6 Join methods

Organization of this subsection
(1) Join method types
(2) Processing methods

(1) Join method types

Table 4-8 describes the join method types (except direct product) and their features. If the join methods found in this table cannot be applied, direct product is applied.

Table 4-8 Join method types and features

Join method Processing method Initial data fetching Advantages Disadvantages
Merge join This method sorts the data by join column and executes matching in sequence from the smallest value in the join column. Slow The performance degradation is small compared with other methods because even tables with many hits can be joined with a small amount of memory.
Data can be searched rapidly if the join column data has already been sorted, and the sort processing for merge join can be cancelled.
If the data in the columns to be joined has not been sorted, the sort processing load increases, and the performance drops.
Nested-loops-join This method uses join column values from the outer table, searches the index defined in the join column of the inner table, and repeatedly processes nested matches. Fast Data can be searched rapidly if the inner table can be narrowed with the index specified in the join column. If the hit count of the outer table is high, the performance drops because the index is used to search the inner table each time a row is fetched from the outer table.
Hash join This method creates a hash table from the join column of the inner table, hashes the join column of the outer table, and executes matching with the hash table that was created from the inner table. Fast if the number of hits in the inner table is small (slower than next-loop-join, but faster than merge join) Data can be searched rapidly when the hit count is low for the inner table and high for the outer table. If the hit count in the inner table is high, memory usage becomes high. The performance drops because the hits for which memory is unavailable are first saved to a file.
SELECT-APSL If a condition contains a ? parameter, this method prepares several join method candidates, and determines the optimal search method when the value of the ? parameter is input. Differs depending on the search method that is selected The optimal search method can be selected when the value of the ? parameter is input. The optimizing information collection utility (pdgetcst) must be executed.* Also, the SQL object size becomes large because several join methods are prepared.
Distributed nested loops join This method uses join column values of the outer table to match up the values in a foreign table, which is the inner table. (This method is applied when HiRDB External Data Access is installed in the system.) Fast Data can be retrieved quickly if the number of outer table hits is small and the number of inner table hits is large. Because the foreign server containing the foreign table is searched each time a row is fetched from the outer table, the performance worsens as the number of outer table hits increases.

* In some cases, the optimal access path cannot be selected even if the optimizing information collection utility is executed. For details about the necessity of executing the optimizing information collection utility, see the manual HiRDB Version 8 Command Reference and verify the performance.

(2) Processing methods

(a) Merge join

Merge join is effective when the outer table cannot be narrowed very much.

SORT MERGE JOIN
This join method fetches rows from the outer and inner table, creates the respective work tables, and sorts the data. The join method then joins the rows if the join condition is satisfied.
Figure 4-17 shows the processing of SORT MERGE JOIN.

Figure 4-17 Processing of SORT MERGE JOIN

[Figure]

KEY SCAN MERGE JOIN
This join method system fetches rows from the outer and inner table by using KEY SCAN. The join method then joins the rows if the join condition is satisfied.
Figure 4-18 shows the processing of KEY SCAN MERGE JOIN.

Figure 4-18 Processing of KEY SCAN MERGE JOIN

[Figure]

LIST SCAN MERGE JOIN
This join method creates work tables from the outer and inner tables, and fetches rows in ascending join column order without sorting the data beforehand. The join method then joins the rows if the join condition is satisfied.
Figure 4-19 shows the processing of LIST SCAN MERGE JOIN.

Figure 4-19 Processing of LIST SCAN MERGE JOIN

[Figure]

L-KEY R-LIST MERGE JOIN
This join method fetches rows from the outer table by using KEY SCAN. The method creates a work table for the inner table and fetches rows without first sorting the data. The join method then joins the rows if the join condition is satisfied.

L-KEY R-SORT MERGE JOIN
This join method system fetches rows from the outer table by using KEY SCAN. The join method creates a work table for the inner table, sorts the data, and fetches rows. The join method then joins the rows if the join condition is satisfied.

L-LIST R-KEY MERGE JOIN
This join method creates a work table for the outer table and fetches rows without first sorting the data. The join method fetches rows from the inner table by using KEY SCAN. The join method then joins the rows if the join condition is satisfied.

L-LIST R-SORT MERGE JOIN
This join method creates a work table for the outer table and fetches rows without first sorting the data. The join method creates a work table for the inner table, sorts the data, and fetches rows. The join method then joins the rows if the join condition is satisfied.

L-SORT R-KEY MERGE JOIN
This join method creates a work table for the outer table, sorts the data, and fetches rows. The join method fetches rows from the inner table by using KEY SCAN. The join method then joins the rows if the join condition is satisfied.

L-SORT R-LIST MERGE JOIN
This join method creates a work table for the outer table, sorts the data, and fetches rows. The join method creates a work table for the inner table, sorts the data, and fetches rows. The join method then joins the rows if the join condition is satisfied.
(b) Nested-loops-join

Nested-loops-join is effective if an index is defined in the inner table, and the outer table can be narrowed significantly

NESTED LOOPS JOIN
This join method fetches rows one at a time from the outer table, matches them to individual rows in the inner table, and executes nested loop processing that fetches rows that satisfy the join condition.
Figure 4-20 shows the processing of NESTED LOOPS JOIN.

Figure 4-20 Processing of NESTED LOOPS JOIN

[Figure]
Note
In some cases, an index is used when the outer table is searched.

R-LIST NESTED LOOPS JOIN
This join method fetches rows from the inner table and creates a work table. The join method then fetches rows one at a time from the outer table, matches the work table that was created from the inner table to those individual rows, and executes nested loop processing that fetches rows that satisfy the join conditions.
Figure 4-21 shows the processing of R-LIST NESTED LOOPS JOIN.

Figure 4-21 Processing of R-LIST NESTED LOOPS JOIN

[Figure]
Note
In some cases, an index is used when the outer table is searched.
(c) Hash join

HASH JOIN
This join method first hashes the inner table with the join column values and creates a hash table. The join method then hashes the outer table with the join column values each time a row is fetched, and matches the outer table with the hash table that was created from the inner table.
Figure 4-22 shows the processing of HASH JOIN.

Figure 4-22 Processing of HASH JOIN

[Figure]

There are four hash join processing methods. Table 4-9 describes the hash join processing methods and features.

Table 4-9 Hash join processing methods and features

Processing method Description Advantages Disadvantages Selection method
Batch hash join This method performs hash join by expanding the hash table that was created from the inner table into the buffer area for all work tables. Hash join can be processed rapidly because this method expands the entire hash table in the work table buffer area before executing hash join. If the hash table for the inner table is large, the system's capability to execute SQL statements simultaneously is diminished because the work table work area becomes large. Change the hash table size.*
Bucket partitioning hash join This method partitions the inner and outer tables into several buckets, creates a hash table from some of the inner table buckets, and expands it in the work table buffer area. This method then saves the remaining buckets in a work table file and reads the contents of inner table buckets that were expanded in the work table buffer area and the outer table buckets of the area with the same value. The method then expands the inner table from the work table file to the work table buffer area a little at a time, and executes hash join.
The amount of memory used becomes small, and the processing performance drops slightly.
Hash join can be executed in environments that have a small work table buffer area. Because the rows of the inner and outer table are first saved to a work table file, the performance drops compared to when hash join is executed with only work table buffer area. Change the hash table size.*
Continuous hash join When three or more tables are searched, this method creates hash tables from the tables, except the outermost table, expands the hash tables in the work table buffer area, and executes hash join in succession.
The amount of memory used becomes large, and the processing performance improves.
Hash join can be processed rapidly because this method expands the entire hash table in the work table buffer area before executing hash join.
Also, hash join can be processed rapidly when only the outermost table is large.
When the number of tables to be executed becomes large, the work table buffer area that is used becomes large. This method cannot be selected. The HiRDB system automatically selects the optimal method based on the number of table rows.
Intermittent hash join When three or more tables are searched, this method executes hash join by saving the join results to a work table file each time tables or work tables are joined. Hash join involving three or more tables can be executed even in environments that have a small work table buffer area. The number of I/O operations increases and performance drops because the join results are first saved to a file each time tables or work tables are joined. This method cannot be selected. The HiRDB system automatically selects the optimal method based on the number of table rows.

* For details about how to change the hash table size, see 4.5.10 Preparing for application of hash join and subquery hash execution.

The processing methods are summarized as follows.

(d) SELECT-APSL

SELECT-APSL is a method that dynamically determines the join method during SQL execution.

SELECT-APSL (HiRDB/Parallel Server only)
If the conditions include the ? parameter, the optimal join method may change depending on the value of the ? parameter. Also, if the value of the ? parameter cannot be determined during SQL optimization processing, the optimal join method cannot be determined. The system therefore determines the join method by calculating the hit ratio during SQL execution.
SELECT-APSL is described as follows based on a display example of the access path display utility (pdvwopt).
Condition T1(outer-table).C1=? parameter
Reference value 0.047
[1] Nest-loop-join
[2] Merge join

Explanation
  • If the hit rate of the predicate T1(outer-table).C1=? parameter is less than the reference value (0.047), nested-loops-join is selected during execution because the hit rate is small and the outer table can be narrowed substantially.
  • If the hit rate of the predicate T1(outer-table).C1=? parameter is equal to or greater than the reference value (0.047), merge join is selected during execution because the hit rate is large and the outer table cannot be narrowed very much.
(e) Distributed nested loops join

DISTRIBUTED NESTED LOOPS JOIN (DNL JOIN)
The DISTRIBUTED NESTED LOOPS JOIN method performs a nested-type loop process that fetches rows that satisfy the join condition. To do this, the local HiRDB fetches rows from the outer table. Then for each row, the local HiRDB executes an SQL statement that uses a variable to pass the values of the outer table row to the foreign server where the inner table (foreign table) is located. The values of the outer table row are then matched with those in the inner table. The local HiRDB can get the retrieval search results from the foreign database (DBMS) by sending a foreign table acquisition request to the foreign table.
Figure 4-27 shows the processing method of DISTRIBUTED NESTED LOOPS JOIN.

Figure 4-27 Processing method of DISTRIBUTED NESTED LOOPS JOIN

[Figure]
(f) Cross join

CROSS JOIN
The CROSS JOIN process method combines and joins all rows of the outer table and all rows of the inner table. If there are conditions that apply across both tables, the conditions are judged after the tables are joined.
Figure 4-28 shows the CROSS JOIN processing method.

Figure 4-28 CROSS JOIN processing method

[Figure]

Note
Depending on the condition, sometimes a work table is not created.