Scalable Database Server, HiRDB Version 8 UAP Development Guide
Table 4-8 describes the join method types (except direct product) and their features. If the join methods found in this table cannot be applied, direct product is applied.
Table 4-8 Join method types and features
Join method | Processing method | Initial data fetching | Advantages | Disadvantages |
---|---|---|---|---|
Merge join | This method sorts the data by join column and executes matching in sequence from the smallest value in the join column. | Slow | The performance degradation is small compared with other methods because even tables with many hits can be joined with a small amount of memory. Data can be searched rapidly if the join column data has already been sorted, and the sort processing for merge join can be cancelled. |
If the data in the columns to be joined has not been sorted, the sort processing load increases, and the performance drops. |
Nested-loops-join | This method uses join column values from the outer table, searches the index defined in the join column of the inner table, and repeatedly processes nested matches. | Fast | Data can be searched rapidly if the inner table can be narrowed with the index specified in the join column. | If the hit count of the outer table is high, the performance drops because the index is used to search the inner table each time a row is fetched from the outer table. |
Hash join | This method creates a hash table from the join column of the inner table, hashes the join column of the outer table, and executes matching with the hash table that was created from the inner table. | Fast if the number of hits in the inner table is small (slower than next-loop-join, but faster than merge join) | Data can be searched rapidly when the hit count is low for the inner table and high for the outer table. | If the hit count in the inner table is high, memory usage becomes high. The performance drops because the hits for which memory is unavailable are first saved to a file. |
SELECT-APSL | If a condition contains a ? parameter, this method prepares several join method candidates, and determines the optimal search method when the value of the ? parameter is input. | Differs depending on the search method that is selected | The optimal search method can be selected when the value of the ? parameter is input. | The optimizing information collection utility (pdgetcst) must be executed.* Also, the SQL object size becomes large because several join methods are prepared. |
Distributed nested loops join | This method uses join column values of the outer table to match up the values in a foreign table, which is the inner table. (This method is applied when HiRDB External Data Access is installed in the system.) | Fast | Data can be retrieved quickly if the number of outer table hits is small and the number of inner table hits is large. | Because the foreign server containing the foreign table is searched each time a row is fetched from the outer table, the performance worsens as the number of outer table hits increases. |
Merge join is effective when the outer table cannot be narrowed very much.
Figure 4-17 Processing of SORT MERGE JOIN
Figure 4-18 Processing of KEY SCAN MERGE JOIN
Figure 4-19 Processing of LIST SCAN MERGE JOIN
Nested-loops-join is effective if an index is defined in the inner table, and the outer table can be narrowed significantly
Figure 4-20 Processing of NESTED LOOPS JOIN
Figure 4-21 Processing of R-LIST NESTED LOOPS JOIN
Figure 4-22 Processing of HASH JOIN
There are four hash join processing methods. Table 4-9 describes the hash join processing methods and features.
Table 4-9 Hash join processing methods and features
Processing method | Description | Advantages | Disadvantages | Selection method |
---|---|---|---|---|
Batch hash join | This method performs hash join by expanding the hash table that was created from the inner table into the buffer area for all work tables. | Hash join can be processed rapidly because this method expands the entire hash table in the work table buffer area before executing hash join. | If the hash table for the inner table is large, the system's capability to execute SQL statements simultaneously is diminished because the work table work area becomes large. | Change the hash table size.* |
Bucket partitioning hash join | This method partitions the inner and outer tables into several buckets, creates a hash table from some of the inner table buckets, and expands it in the work table buffer area. This method then saves the remaining buckets in a work table file and reads the contents of inner table buckets that were expanded in the work table buffer area and the outer table buckets of the area with the same value. The method then expands the inner table from the work table file to the work table buffer area a little at a time, and executes hash join. The amount of memory used becomes small, and the processing performance drops slightly. |
Hash join can be executed in environments that have a small work table buffer area. | Because the rows of the inner and outer table are first saved to a work table file, the performance drops compared to when hash join is executed with only work table buffer area. | Change the hash table size.* |
Continuous hash join | When three or more tables are searched, this method creates hash tables from the tables, except the outermost table, expands the hash tables in the work table buffer area, and executes hash join in succession. The amount of memory used becomes large, and the processing performance improves. |
Hash join can be processed rapidly because this method expands the entire hash table in the work table buffer area before executing hash join. Also, hash join can be processed rapidly when only the outermost table is large. |
When the number of tables to be executed becomes large, the work table buffer area that is used becomes large. | This method cannot be selected. The HiRDB system automatically selects the optimal method based on the number of table rows. |
Intermittent hash join | When three or more tables are searched, this method executes hash join by saving the join results to a work table file each time tables or work tables are joined. | Hash join involving three or more tables can be executed even in environments that have a small work table buffer area. | The number of I/O operations increases and performance drops because the join results are first saved to a file each time tables or work tables are joined. | This method cannot be selected. The HiRDB system automatically selects the optimal method based on the number of table rows. |
The processing methods are summarized as follows.
Figure 4-23 Processing method of batch hash join
Figure 4-24 Processing method of bucket partitioning hash join
Figure 4-25 Processing method of continuous hash join
Figure 4-26 Processing method of intermittent hash join
SELECT-APSL is a method that dynamically determines the join method during SQL execution.
Figure 4-27 Processing method of DISTRIBUTED NESTED LOOPS JOIN
Figure 4-28 CROSS JOIN processing method
All Rights Reserved. Copyright (C) 2007, Hitachi, Ltd.