Join methods

Table 4-8 describes the join method types (except direct product) and their features. If the join methods found in this table cannot be applied, direct product is applied.

Table 4-8 Join method types and features

Join method	Processing method	Initial data fetching	Advantages	Disadvantages
Merge join	This method sorts the data by join column and executes matching in sequence from the smallest value in the join column.	Slow	The performance degradation is small compared with other methods because even tables with many hits can be joined with a small amount of memory. Data can be searched rapidly if the join column data has already been sorted, and the sort processing for merge join can be cancelled.	If the data in the columns to be joined has not been sorted, the sort processing load increases, and the performance drops.
Nested-loops-join	This method uses join column values from the outer table, searches the index defined in the join column of the inner table, and repeatedly processes nested matches.	Fast	Data can be searched rapidly if the inner table can be narrowed with the index specified in the join column.	If the hit count of the outer table is high, the performance drops because the index is used to search the inner table each time a row is fetched from the outer table.
Hash join	This method creates a hash table from the join column of the inner table, hashes the join column of the outer table, and executes matching with the hash table that was created from the inner table.	Fast if the number of hits in the inner table is small (slower than next-loop-join, but faster than merge join)	Data can be searched rapidly when the hit count is low for the inner table and high for the outer table.	If the hit count in the inner table is high, memory usage becomes high. The performance drops because the hits for which memory is unavailable are first saved to a file.
SELECT-APSL	If a condition contains a ? parameter, this method prepares several join method candidates, and determines the optimal search method when the value of the ? parameter is input.	Differs depending on the search method that is selected	The optimal search method can be selected when the value of the ? parameter is input.	The optimizing information collection utility (pdgetcst) must be executed.^* Also, the SQL object size becomes large because several join methods are prepared.
Distributed nested loops join	This method uses join column values of the outer table to match up the values in a foreign table, which is the inner table. (This method is applied when HiRDB External Data Access is installed in the system.)	Fast	Data can be retrieved quickly if the number of outer table hits is small and the number of inner table hits is large.	Because the foreign server containing the foreign table is searched each time a row is fetched from the outer table, the performance worsens as the number of outer table hits increases.

^* In some cases, the optimal access path cannot be selected even if the optimizing information collection utility is executed. For details about the necessity of executing the optimizing information collection utility, see the manual HiRDB Version 8 Command Reference and verify the performance.

(a) Merge join

Merge join is effective when the outer table cannot be narrowed very much.

SORT MERGE JOIN: This join method fetches rows from the outer and inner table, creates the respective work tables, and sorts the data. The join method then joins the rows if the join condition is satisfied.; Figure 4-17 shows the processing of SORT MERGE JOIN.; Figure 4-17 Processing of SORT MERGE JOIN

KEY SCAN MERGE JOIN: This join method system fetches rows from the outer and inner table by using KEY SCAN. The join method then joins the rows if the join condition is satisfied.; Figure 4-18 shows the processing of KEY SCAN MERGE JOIN.; Figure 4-18 Processing of KEY SCAN MERGE JOIN

LIST SCAN MERGE JOIN: This join method creates work tables from the outer and inner tables, and fetches rows in ascending join column order without sorting the data beforehand. The join method then joins the rows if the join condition is satisfied.; Figure 4-19 shows the processing of LIST SCAN MERGE JOIN.; Figure 4-19 Processing of LIST SCAN MERGE JOIN

L-KEY R-LIST MERGE JOIN: This join method fetches rows from the outer table by using KEY SCAN. The method creates a work table for the inner table and fetches rows without first sorting the data. The join method then joins the rows if the join condition is satisfied.

L-KEY R-SORT MERGE JOIN: This join method system fetches rows from the outer table by using KEY SCAN. The join method creates a work table for the inner table, sorts the data, and fetches rows. The join method then joins the rows if the join condition is satisfied.

L-LIST R-KEY MERGE JOIN: This join method creates a work table for the outer table and fetches rows without first sorting the data. The join method fetches rows from the inner table by using KEY SCAN. The join method then joins the rows if the join condition is satisfied.

L-LIST R-SORT MERGE JOIN: This join method creates a work table for the outer table and fetches rows without first sorting the data. The join method creates a work table for the inner table, sorts the data, and fetches rows. The join method then joins the rows if the join condition is satisfied.

L-SORT R-KEY MERGE JOIN: This join method creates a work table for the outer table, sorts the data, and fetches rows. The join method fetches rows from the inner table by using KEY SCAN. The join method then joins the rows if the join condition is satisfied.

L-SORT R-LIST MERGE JOIN: This join method creates a work table for the outer table, sorts the data, and fetches rows. The join method creates a work table for the inner table, sorts the data, and fetches rows. The join method then joins the rows if the join condition is satisfied.

(b) Nested-loops-join

Nested-loops-join is effective if an index is defined in the inner table, and the outer table can be narrowed significantly

NESTED LOOPS JOIN

This join method fetches rows one at a time from the outer table, matches them to individual rows in the inner table, and executes nested loop processing that fetches rows that satisfy the join condition.

Figure 4-20 shows the processing of NESTED LOOPS JOIN.

Figure 4-20 Processing of NESTED LOOPS JOIN

Note: In some cases, an index is used when the outer table is searched.

R-LIST NESTED LOOPS JOIN

This join method fetches rows from the inner table and creates a work table. The join method then fetches rows one at a time from the outer table, matches the work table that was created from the inner table to those individual rows, and executes nested loop processing that fetches rows that satisfy the join conditions.

Figure 4-21 shows the processing of R-LIST NESTED LOOPS JOIN.

Figure 4-21 Processing of R-LIST NESTED LOOPS JOIN

Note: In some cases, an index is used when the outer table is searched.

(c) Hash join

HASH JOIN: This join method first hashes the inner table with the join column values and creates a hash table. The join method then hashes the outer table with the join column values each time a row is fetched, and matches the outer table with the hash table that was created from the inner table.; Figure 4-22 shows the processing of HASH JOIN.; Figure 4-22 Processing of HASH JOIN

There are four hash join processing methods. Table 4-9 describes the hash join processing methods and features.

Table 4-9 Hash join processing methods and features

Processing method	Description	Advantages	Disadvantages	Selection method
Batch hash join	This method performs hash join by expanding the hash table that was created from the inner table into the buffer area for all work tables.	Hash join can be processed rapidly because this method expands the entire hash table in the work table buffer area before executing hash join.	If the hash table for the inner table is large, the system's capability to execute SQL statements simultaneously is diminished because the work table work area becomes large.	Change the hash table size.^*
Bucket partitioning hash join	This method partitions the inner and outer tables into several buckets, creates a hash table from some of the inner table buckets, and expands it in the work table buffer area. This method then saves the remaining buckets in a work table file and reads the contents of inner table buckets that were expanded in the work table buffer area and the outer table buckets of the area with the same value. The method then expands the inner table from the work table file to the work table buffer area a little at a time, and executes hash join. The amount of memory used becomes small, and the processing performance drops slightly.	Hash join can be executed in environments that have a small work table buffer area.	Because the rows of the inner and outer table are first saved to a work table file, the performance drops compared to when hash join is executed with only work table buffer area.	Change the hash table size.^*
Continuous hash join	When three or more tables are searched, this method creates hash tables from the tables, except the outermost table, expands the hash tables in the work table buffer area, and executes hash join in succession. The amount of memory used becomes large, and the processing performance improves.	Hash join can be processed rapidly because this method expands the entire hash table in the work table buffer area before executing hash join. Also, hash join can be processed rapidly when only the outermost table is large.	When the number of tables to be executed becomes large, the work table buffer area that is used becomes large.	This method cannot be selected. The HiRDB system automatically selects the optimal method based on the number of table rows.
Intermittent hash join	When three or more tables are searched, this method executes hash join by saving the join results to a work table file each time tables or work tables are joined.	Hash join involving three or more tables can be executed even in environments that have a small work table buffer area.	The number of I/O operations increases and performance drops because the join results are first saved to a file each time tables or work tables are joined.	This method cannot be selected. The HiRDB system automatically selects the optimal method based on the number of table rows.

^* For details about how to change the hash table size, see 4.5.10 Preparing for application of hash join and subquery hash execution.

The processing methods are summarized as follows.

Batch hash join
This processing method expands the entire hash table created from the inner table in the work table buffer area and then executes hash join. Figure 4-23 shows the processing method of batch hash join.

Figure 4-23 Processing method of batch hash join
Bucket partitioning hash join
This processing method partitions the inner and outer table into buckets, expands part of the inner table into the work table buffer area, and saves the remaining sections to a work table file.
Bucket partitioning refers to hashing a table with join row values and partitioning the table into multiple small tables. Join processing is executed on an inner table section that was expanded in the work file buffer area. First, a hash table is created from the inner table, and rows are fetched one at a time from the outer table and then merged and joined with the hash table that was created from the inner table. After joining of the tables found in the work table buffer area is completed, the buckets of the outer and inner tables are expanded from the work table files into the work table buffer area, and join processing is executed in the same manner. Processing ends after all tables are expanded into the work table buffer area and joined.
Figure 4-24 shows the processing method of bucket partitioning hash join.

Figure 4-24 Processing method of bucket partitioning hash join
Continuous hash join
This processing method is applied to searches involving three or more tables.
First, hash tables are created from all target tables except the outermost table and expanded into the work table buffer area. Next, a row is fetched from the outer table, hashed, and then matched and joined with the hash table that was created from the inner table. If the join condition is satisfied, the row is hashed with the join result, and then matched and joined with the hash table.
When joining is completed to the last row or when the condition becomes false, processing returns to the outermost table, the next row is fetched, and join processing is repeated in the same manner. During join processing, if there is a location where the join key value in the inner table is duplicated, processing returns to that location and join processing is repeated. When processing of all duplicate key values is completed, processing returns to the outermost table, the next row is fetched, and join processing is repeated in the same manner.
Figure 4-25 shows the processing method of a continuous hash join.

Figure 4-25 Processing method of continuous hash join
Intermittent hash join
This processing method is applied to searches involving three or more tables.
First, a hash table is created from the inner table of the first join and expanded in the work table buffer area. Next, rows are fetched one at a time from the outer table, hashed with the join column values of the outer table, and then matched and joined with the hash table that was created from the inner table. After all lines from the outer table have been fetched and joined, processing proceeds to the next join process.
The processing changes depending on whether the join result becomes the outer table or the inner table.
If the join result becomes the outer table, a hash table is created from the next inner table to be joined, and rows are fetched one at a time from the join results and then matched and joined with the hash table that was created from the inner table.
If the join result becomes the inner table, a hash table is created from the join results, and rows are fetched one at a time from the outer table and then matched and joined with the hash table that was created from the join results.
Figure 4-26 shows the processing method of intermittent hash join. In the example shown for this processing method, that tables are joined as follows: outer table 1 ((outer table 2 inner table 1) inner table 2).

Figure 4-26 Processing method of intermittent hash join

(d) SELECT-APSL

SELECT-APSL is a method that dynamically determines the join method during SQL execution.

SELECT-APSL (HiRDB/Parallel Server only)

If the conditions include the ? parameter, the optimal join method may change depending on the value of the ? parameter. Also, if the value of the ? parameter cannot be determined during SQL optimization processing, the optimal join method cannot be determined. The system therefore determines the join method by calculating the hit ratio during SQL execution.

SELECT-APSL is described as follows based on a display example of the access path display utility (pdvwopt).

Condition T1(outer-table).C1=? parameter

Reference value 0.047

[1] Nest-loop-join

[2] Merge join

Explanation

If the hit rate of the predicate T1(outer-table).C1=? parameter is less than the reference value (0.047), nested-loops-join is selected during execution because the hit rate is small and the outer table can be narrowed substantially.
If the hit rate of the predicate T1(outer-table).C1=? parameter is equal to or greater than the reference value (0.047), merge join is selected during execution because the hit rate is large and the outer table cannot be narrowed very much.

(e) Distributed nested loops join

DISTRIBUTED NESTED LOOPS JOIN (DNL JOIN): The DISTRIBUTED NESTED LOOPS JOIN method performs a nested-type loop process that fetches rows that satisfy the join condition. To do this, the local HiRDB fetches rows from the outer table. Then for each row, the local HiRDB executes an SQL statement that uses a variable to pass the values of the outer table row to the foreign server where the inner table (foreign table) is located. The values of the outer table row are then matched with those in the inner table. The local HiRDB can get the retrieval search results from the foreign database (DBMS) by sending a foreign table acquisition request to the foreign table.; Figure 4-27 shows the processing method of DISTRIBUTED NESTED LOOPS JOIN.; Figure 4-27 Processing method of DISTRIBUTED NESTED LOOPS JOIN

(f) Cross join

CROSS JOIN: The CROSS JOIN process method combines and joins all rows of the outer table and all rows of the inner table. If there are conditions that apply across both tables, the conditions are judged after the tables are joined.; Figure 4-28 shows the CROSS JOIN processing method.; Figure 4-28 CROSS JOIN processing method

Note: Depending on the condition, sometimes a work table is not created.

4.5.6 Join methods