12.3.2 Types of table row partitioning

There are two ways to partition a table by rows:

These two methods of table row partitioning are explained as follows.

Organization of this subsection
(1) Key range partitioning
(2) Hash partitioning
(3) Differences among key range partitioning, flexible hash partitioning, and FIX hash partitioning
(4) Specification rules when table row partitioning is defined
(5) Examples of key range partitioning (with storage condition specified)
(6) Example of key range partitioning (with boundary values specified)
(7) Example of flexible hash partitioning and FIX hash partitioning

(1) Key range partitioning

Key range partitioning divides a table into groups of rows on the basis of ranges of values in a specific column in the table. The column used as the condition for table row partitioning is called the partitioning key.

When this method is used, it is possible to tell which table data is stored in which RDAREA.

Row partitioning can be specified in two ways:

(a) By specifying storage conditions

Comparison operators are used to specify conditions for determining which table data is to be stored in each RDAREA. Only one range of storage condition values can be specified for each RDAREA.

(b) By specifying boundary values

Literals are used to specify the boundary values of the data to be stored in each RDAREA. Multiple ranges determined by boundary values can be specified for one RDAREA. Boundary values can also be specified with matrix partitioning. For details about matrix partitioning, see 12.4 Table matrix partitioning.

(2) Hash partitioning

Hash partitioning uses the values of a table column as a hash function for dividing the table and storing it uniformly in RDAREAs. The column specified for partitioning the table is called the partitioning key. This method is used to distribute the table data uniformly among the RDAREAs without having to deal with key ranges. Hash partitioning can be combined with key range partitioning with boundary values specified to achieve matrix partitioning. For details about matrix partitioning, see 12.4 Table matrix partitioning.

The two types of hash partitioning are flexible hash partitioning and FIX hash partitioning.

When a table is divided and stored in multiple RDAREAs using flexible hash partitioning, there is no way to know which data is stored in which RDAREA. Therefore, in order to search for particular data in the table, all back-end servers containing the table are subject to search processing.

When a table is partitioned using FIX hash partitioning, HiRDB identifies which table data is stored in which RDAREA. Therefore, only the back-end server believed to contain the desired data is subject to search processing.

(a) Selecting the partitioning key

The key selected as the partitioning key should satisfy the following conditions:

For hash partitioning, either a single column or multiple columns can be selected as the partitioning key. If a single column is specified and there are too few different values in the column for purposes of partitioning or the key values are unevenly distributed, data may not be divided uniformly. In this case, more than one column on which partitioning can be based should be specified in order to distribute the data more uniformly among the RDAREAs.

(b) Types of hash functions

Twelve hash functions are available for hash partitioning, as listed in the following table.

Table 12-2 Hash functions

Hash functionExplanation
HASH0This hash function can be used with data types DATE, TIMESTAMP, CHAR(8) #1 or CHAR(6)#1 in the columns specified for partitioning. It uses year and month values to hash data and allocate, by circulating on a monthly basis, the RDAREAs in which the data is stored. You need to specify a single column for partitioning keys.
HASH1This hash function can be used with all data types in the columns specified for partitioning. It hashes all bytes#2 of the data from all columns specified for partitioning. This hash function can be specified for columns whose data length is 0 or more bytes.
HASH2This hash function can be used with all data types in the columns specified for partitioning. It hashes all bytes#2 of the data from all columns specified for partitioning. This hash function can be specified for columns whose data length is 0 or more bytes.
You use this hash function when HASH1 is unable to store the data uniformly among the RDAREAs.
HASH3Use this hash function when the data type of the columns to be partitioned is INTEGER or SMALLINT. It hashes the final 2 bytes#2 of all partitioned columns. This hash function can be specified only for columns whose data length is at least 2 bytes.
HASH4Use this hash function when the data type of the columns to be partitioned is DATE. It hashes the initial 4 bytes#2 of all partitioned columns. This hash function can be specified only for columns whose data length is at least 4 bytes.
HASH5Use this hash function when the data type of the columns to be partitioned is TIME. It hashes the initial 3 bytes#2 of all partitioned columns. This hash function can be specified only for columns whose data length is at least 3 bytes.
HASH6This hash function can be used with all data types in the columns specified for partitioning. The most appropriate type is DECIMAL. It hashes all bytes#2 of the data from all columns specified for partitioning. This hash function can be specified for columns whose data length is 0 or more bytes.
HASHAThis hash function can be used with all data types in the columns specified for partitioning. It hashes all bytes#2 of the data from all columns specified for partitioning. This hash function can be specified for columns whose data length is 0 or more bytes.
HASHBThis hash function can be used with all data types in the columns specified for partitioning. It hashes all bytes#2 of the data from all columns specified for partitioning. This hash function can be specified for columns whose data length is 0 or more bytes.
You use this hash function when HASHA is unable to store the data uniformly among the RDAREAs.
HASHCUse this hash function when the data type of the columns to be partitioned is INTEGER or SMALLINT. It hashes the final 2 bytes#2 of all partitioned columns. This hash function can be specified only for columns whose data length is at least 2 bytes.
HASHDUse this hash function when the data type of the columns to be partitioned is DATE. It hashes the initial 4 bytes#1 of all partitioned columns. This hash function can be specified only for columns whose data length is at least 4 bytes.
HASHEUse this hash function when the data type of the columns to be partitioned is TIME. It hashes the initial 3 bytes#2 of all partitioned columns. This hash function can be specified only for columns whose data length is at least 3 bytes.
HASHFThis hash function can be used with all data types in the columns specified for partitioning. The most appropriate type is DECIMAL. It hashes all bytes#2 of all data of all partitioned columns. This hash function can be specified for columns whose data length is 0 or more bytes.
Note 1
  • If the table is not a rebalancing table, specify one of HASH0-HASH6. HASH6 provides the most uniform hashing result, so normally you should specify HASH6. However, depending on the data in the partitioning key, the hashing result may not be uniform. In such a case, specify one of the other hash functions.
  • If the table is a rebalancing table, specify one of HASHA-HASHF. HASHF provides the most uniform hashing result, so normally you should specify HASHF. However, depending on the data in the partitioning key, the hashing result may not be uniform. In such a case, specify one of the other hash functions.
Note 2
A LOB column cannot be specified as the partitioning key.
#1: When the data type is CHAR(8) or CHAR(6), use the following format:
  • For CHAR(8): 'YYYYMMDD'
  • For CHAR(6): 'YYYYMM'
    YYYY: 0001 to 9999 (Year)
    MM: 01 to 12 (month)
    DD: 01 to the final day of the month of that year (date)
#2: For the VARCHAR, MVARCHAR, and NVARCHAR types, hashing ignores any trailing spaces. For the DECIMAL, INTERVAL YEAR TO DAY, and INTERVAL HOUR TO SECOND types, if the sign is F, it must be changed to C for hashing.
(c) Selecting a hash function
Selecting an appropriate hash function by actually storing data in a database
To select a hash function, use the following procedure:
  1. Specify a hash function appropriate to the partitioning key.
  2. Use the database condition analysis utility (pddbst) to check the number of rows stored in each RDAREA.
  3. If there is an uneven distribution in the number of rows stored in the RDAREAs, change the hash function so that a uniform number of rows is stored in each RDAREA.
  4. If the method described in step 3 does not result in an even distribution of stored rows, duplicate the specification of an RDAREA with fewer rows stored to make the number of stored rows even. The following figure shows an example.

    Figure 12-4 Example of duplicating a table storage RDAREA specification using hash partitioning

    [Figure]

Selecting an appropriate hash function by creating a UAP that uses a hash function for table partitioning
To select a hash function, use the following procedure:
  1. Create an application program that locates an uneven distribution in the number of data items in each RDAREA by using the hash functions for table partitioning (function that outputs a partitioning condition specification sequence from the data values for the partitioning key), available from a HiRDB library.
  2. For each hash function, obtain the number of data items in the partitioning condition specification sequence that is output by the hash function for table partitioning, then select the most evenly distributed hash function.

For details about how to create a UAP for using a hash function for table partitioning, see the HiRDB Version 9 UAP Development Guide.

(d) Times when hash functions are used

Hash functions are used at the following times:

(3) Differences among key range partitioning, flexible hash partitioning, and FIX hash partitioning

The following table lists and explains the differences between key range partitioning, flexible hash partitioning, and FIX hash partitioning.

Table 12-3 Differences between key range partitioning, flexible hash partitioning, and FIX hash partitioning

DifferenceKey range partitioningFlexible hash partitioningFIX hash partitioning
Database designKey ranges must be taken into account when database is designed.There is no need to take key ranges into account when database is designed.There is no need to take key ranges into account when database is designed.
RetrievalOnly back-end servers that may contain the applicable data according to the search condition are subject to retrieval processing.All back-end servers containing the table are subject to retrieval processing.When an = or an IN predicate for a partitioning column is specified in a search condition, only RDAREAs that might contain the data are searched. However, when HASH0 is used as the hash function, RDAREAs subject to retrieval processing when an = predicate, IN predicate, range condition (using <, >, <=, or >=), BETWEEN predicate, LIKE predicate (starts-with comparison), or SIMILAR predicate (starts-with comparison) is specified are searched.#1
Support for increasing the amount of dataIf keys increase, data may be concentrated in some RDAREAs.Data is already stored uniformly in RDAREAs even if data increases.Data is already stored uniformly in RDAREAs even if data increases.
Handling of RDAREA shutdownSQLs can be executed if their search condition is specified so that no shutdown RDAREA is accessed.If even one of the RDAREAs containing the table subject to retrieval is shut down, SQLs cannot be executed regardless of their search condition.SQLs can be executed if their search condition is specified so that no shutdown RDAREA is accessed.#2
Change in number of table partitionsTable must be re-created and reorganized.ALTER TABLE can be used to add RDAREAs, and reorganization of the table is not required.Table must be re-created and reorganized. ALTER TABLE can be used to add RDAREAs only if the table contains no data.
Data loading and reloading in units of RDAREAsData is checked to see if it is the correct data to be stored in the corresponding RDAREAs.Data is not checked to see if it is the correct data to be stored in the corresponding RDAREAs.Data is checked to see if it is the correct data to be stored in the corresponding RDAREAs.
Method for creating an input data file by RDAREA during data loadingInput data is classified by RDAREA taking into account the key ranges.Input data is classified so that the number of data items per RDAREA becomes uniform.An application is created using a hash function for table partitioning,#3 and input data is classified by RDAREA.
Updating of partitioning keyUpdating must use existing values.Can be updated.Updating must use existing values.
UNIQUE definition for cluster key and index definition with UNIQUE specifiedUNIQUE cannot be specified.UNIQUE cannot be specified.UNIQUE can be specified.
Changing the partition storage conditions by ALTER TABLEPartitioning storage conditions can be changed for the following method:
  • Boundary value specification
Storage condition specification (only = is used for the storage condition comparison operator)
Partition storage conditions cannot be changed. RDAREAs can be added by ALTER TABLE.Partition storage conditions cannot be changed. RDAREAs cannot be added by ALTER TABLE.
#1: The ASSIGN LIST statement results in a workload on the back-end servers to which the search condition is not applied.
#2: The ASSIGN LIST statement handles the entire table as being shut down.
#3: For details about how to create a UAP for using a hash function for table partitioning, see the HiRDB Version 9 UAP Development Guide.

(4) Specification rules when table row partitioning is defined

When table row partitioning is defined, the following specification rules apply:

#1: A column or repetition column of any of the following data types cannot be specified as the partitioning key:
  • CHAR, VARCHAR, MCHAR, or MVARCHAR type whose defined length is 256 bytes or greater
  • NCHAR or NVARCHAR type whose defined length is 28 characters or greater
  • BLOB type
  • BINARY type
  • Abstract data type
  • TIMESTAMP type whose decimal places precision is greater than 0
  • TIMESTAMP type whose default value is CURRENT_TIMESTAMP USING BES
#2: When multiple storage conditions are specified, the conditions are evaluated in the order they were specified, and data is stored in the RDAREA that is specified in the first storage condition whose evaluation result is true. If none of the conditions results in true, the system stores data in the RDAREA for which no storage condition was specified. If there is no such RDAREA, data is not stored in any of the RDAREAs. The table definition is invalid if it contains an RDAREA in which no row is stored as a result of evaluating the conditions.
#3: A literal is specified for a boundary value. A character string literal with a length of 0 is not permitted. If you specify multiple boundary values, they must be specified in ascending order. Also, you must specify at the end an RDAREA for which no boundary value is specified.

(5) Examples of key range partitioning (with storage condition specified)

The following figure shows an example of key range partitioning (with a storage condition specified).

Figure 12-5 Example of key range partitioning with a storage condition specified

[Figure]

Explanation:
The STOCK table is partitioned and stored in two user RDAREAs (USR01 and USR02) using ranges of values in the product code (PCODE) column as the condition; the specified ranges are 100L-399S and 400L-699S.

(6) Example of key range partitioning (with boundary values specified)

The following figure shows an example of key range partitioning (with boundary values specified).

Figure 12-6 Example of key range partitioning with boundary values specified

[Figure]

Explanation:
The STOCK table is partitioned and stored in two user RDAREAs (USR01 and USR02) using values in the product code (PCODE) column as boundary values; the specified boundary values are 302S and 591S.

(7) Example of flexible hash partitioning and FIX hash partitioning

The following figure shows an example of flexible hash partitioning and FIX hash partitioning.

Figure 12-7 Example of flexible hash partitioning and FIX hash partitioning

[Figure]

Explanation:
The STOCK table is partitioned and stored in two user RDAREAs (USR01 and USR02) using the product code (PCODE) column as the partitioning key and using hash function 6.
The target RDAREAs for storage of actual data may differ from this example.