5.2.2 Criteria for selecting row store tables and column store tables
This section explains the criteria for selecting whether to define a table as a row store table, or as a column store table.
- Organization of this subsection
(1) Restrictions on column store tables
The following operational restrictions apply to column store tables. If operational issues could occur as a result of any of these restrictions, define the table as a row store table.
-
You cannot define a text index for a column store table.
-
You cannot define a column store table as an archivable multi-chunk table.
-
You cannot define a column store table as a FIX table.
(2) Criteria for selecting whether to define a table as a row store table or column store table
Use the following criteria to determine whether to define a table as a row store table or a column store table:
Manner of data retrieval from table or salient characteristic of table being defined |
Suitable table type |
|
---|---|---|
Manner of data retrieval from table |
Data is retrieved in the following manner:
|
Row store table |
Data is retrieved in the following manner:
|
Column store table |
|
Operation performed for the table to be defined |
|
Row store table |
Size of table being defined |
|
Row store table |
|
Column store table |
|
Use in a data warehouse |
|
Row store table |
|
Column store table |
- #
-
You can prevent degradation of retrieval performance for column store tables by enabling the updated-row columnizing facility. For details about the updated-row columnizing facility, see 11.18 Using the updated-row columnizing facility (maintaining the retrieval performance for column store tables).
- Important
-
A table suited to definition as a column store table is also generally best defined as a multi-chunk table. For this reason, we recommend that you define these tables as multi-chunk column store tables. For details about the points to consider when defining a table as a multi-chunk table, see 5.2.4 Points to consider in defining a multi-chunk table.
To define a table as a row store table, either specify ROW in the STORAGE FORMAT option in the CREATE TABLE statement, or omit the STORAGE FORMAT option altogether. To define a table as a column store table, specify COLUMN in the STORAGE FORMAT option in the CREATE TABLE statement.
For details about the CREATE TABLE statement, see Definition SQL in the manual HADB SQL Reference.
(3) B-tree indexes used when retrieving data from column store tables
A B-tree index defined for a column store table can be used to search that column store table only in limited circumstances. The following are typical cases in which a B-tree index is used for a column store table. We recommend that you define a B-tree index for a column store table only when the range of the processing target can be narrowed as shown in these cases.
-
Case where an index that will be used for index specification is specified when an SQL statement is executed
-
Case where the minimum value is obtained by specifying the set function MIN or the maximum value is obtained by specifying the set function MAX
-
Case where the UPDATE or DELETE statement is executed by specifying a search condition
In cases such as accessing data for a specific year or month, we recommend that you define range indexes and use them to narrow the search range.
For details about the rules that govern the use of B-tree indexes when retrieving data from column store tables, see B-tree indexes and text indexes used during execution of SQL statements in the HADB Application Development Guide.
- Note
-
The updated-row columnizing facility is not applied to column store tables for which B-tree indexes are defined. For details about the updated-row columnizing facility, see 11.18 Using the updated-row columnizing facility (maintaining the retrieval performance for column store tables).
(4) Column-data compression types for column store tables
When importing data into a column store table, the data in each column is compressed using one of the compression types listed in the following table. This means that the same data will be smaller in a column store table than in a row store table. When importing data, the HADB server automatically selects one of the compression types in the following table based on the data being imported.
No. |
Compression type |
Description |
---|---|---|
1 |
Non-compression (NONE) |
The data is stored uncompressed. |
2 |
Run-length encoding (RUNLENGTH) |
When the same data value is repeated, the data is stored as the data value and number of consecutive occurrences. This compression type is effective on data where the same value occurs in consecutive elements. Compression example: AAAABB → 4A2B |
3 |
Delta encoding (DELTA) |
Data is stored as the first data value and the subsequent differences between adjacent data values. This compression type is effective when data is concentrated around a specific value. Compression example: 100,99,101,103 → 100,-1,2,2 |
4 |
Delta run-length encoding (DELTA_RUNLENGTH) |
This compression type applies run-length encoding to delta-encoded data. This compression type is effective on data that is incremented by a certain value (data that follows an arithmetic progression). This compression type is effective on data such as slip numbers that are incremented by one each time. |
5 |
Dictionary encoding (DICTIONARY) |
Data that appears often is stored in a dictionary, and the data in the table contains only index references to the dictionary. This compression type is effective on data such as character string data with low cardinality (low uniqueness of data). The maximum dictionary size is 16 kilobytes. |
As shown in the preceding table, there are compression methods that eliminate redundancy among data values, and those that make use of the differences between values. This means that compression is less effective when applied to data that has little repetition or little difference between data items, such as character string data and binary data of 128 bytes and longer.
- Note
-
-
Upon completing data import, HADB server outputs the compression type it selected in a message.
-
The compression type is determined when data is imported. Therefore, if the pattern of data changes while data is being imported into a table, the compression type might change even within the same column.
-
HADB server selects the compression type at the level of the individual threads that store the data being imported.
-
HADB server calculates the compression rate for the first 16 megabytes of column data to be stored, and selects the compression type with the best compression rate. If a poor compression rate means the size of the compressed data is the same as or larger than the uncompressed data, the HADB server selects non-compression as the compression type.
-
Note that you can also specify the compression type for each column in a column store table when performing the following operations. In this case, the column data will always be compressed using the specified compression type.
-
Using the CREATE TABLE statement to define a column store table
-
Using the ALTER TABLE statement to add a column to a column store table
- ■Relationship between a column store table for which the UPDATE statement or DELETE statement is executed and the compression type
-
In the case of column store tables that have an update as a prerequisite, we recommend specifying a compression type other than a delta compression type#1. Do not let HADB automatically select the compression type#2.
- Reason:
-
When an UPDATE statement or DELETE statement is executed on data in column-store format, HADB extends the length of all columns for all the targeted row data. When extending the length for DELTA compression, HADB needs to calculate the data values from the beginning of the data to the data subject to the lengthening. As a result, processing to extend the length of the columns takes longer than other compression methods. Therefore, the time required to execute the UPDATE statement or DELETE statement takes longer when a DELTA compression method is used for a column store table.
- #1
-
In the COMPRESSION TYPE specification in a CREATE TABLE statement, we recommend specifying a compression type other than DELTA and specifying something other than automatic selection (AUTO).
- #2
-
If the compression type specification is omitted, or if AUTO is specified for the compression type specification, HADB automatically selects the compression method. As a result, HADB might apply a DELTA compression method.
(5) Examples of when to use row store tables and column store tables
This subsection explains when you might use a row store table and when you might use a column store table, based on the following operational example:
- Operational example
-
Company A intends to operate a data warehouse for the purpose of analyzing the sales data collected from its sales offices. The analysis axes for sales data are dates, sales offices, products, and customers. Company A decided to implement the following star schema as the schema model for its data warehouse.
Figure 5‒2: Data warehouse schema model for analysis of Company A's sales data
In the preceding schema, the SALES table is a fact table. The PRODUCTS table, CUSTOMERS table, OFFICES table, and CALENDAR table are dimension tables. The following explains the data stored in each table:
-
SALES table
The SALES table stores the past 10 years of sales data. This consists of such data as product IDs of sold products, the number of products sold, unit prices, customer IDs, IDs of offices whose sales contributed to the totals, and the dates on which sales occurred. Because approximately 100,000 items of data are stored per day, this table will ultimately store approximately 360,000,000 items of sales data.
The data stored as product IDs, customer IDs, office IDs, and sales dates in this table serve as primary keys (foreign keys) for the PRODUCTS, CUSTOMERS, OFFICES, and CALENDAR tables respectively.
-
PRODUCTS table
The PRODUCTS table stores data about the products handled by Company A (such as product IDs, names, and categories).
-
CUSTOMERS table
The CUSTOMERS table stores data about the customers of Company A (such as their customer IDs, names, industries, and addresses).
-
OFFICES table
The OFFICES table stores data about the sales offices of Company A (such as office IDs, names, and addresses).
-
CALENDAR table
The CALENDAR table stores dimension data related to dates and times (date, month, business quarter, year).
When activity such as compiling monthly sales data and analyzing sales at the customer and office level takes place in relation to the preceding schema model, the following forms of data retrieval are likely to occur frequently:
-
Retrieving all data in the SALES table
-
Retrieving data in the SALES table for a specific period (such as the current month or the previous month)
Therefore, the SALES table is defined as a column store table. A range index is defined for the column that stores dates of sale.
The PRODUCTS table, CUSTOMERS table, OFFICES table, and CALENDAR table are likely to be joined using the foreign keys stored in the SALES table. Because these tables contain a relatively small number of data items, they are defined as row store tables. A B-tree index is defined for columns that correspond to primary keys.