12.21.7 How to measure the data compression rate
(1) How to measure the data compression rate before data is compressed and stored
The data compression rate depends greatly on the nature of the data to be compressed. An exact compression rate cannot be obtained until after data has actually been compressed and stored. However, you can obtain an estimate of the compression rate by using the approximate length of the data compressed by gzip.# The formula is shown below.
- #: gzip uses a compression algorithm (Deflate) that is equivalent to the one used by HiRDB.
- Formula:
compression rate (%) = {(data length after compression obtained by gzip x 1.05#) / data length before compression} x 100
- #: An extra 5% is added to the compressed data size to allow for differences between the zlib and gzip formats of the headers that are added during compression processing to manage compression information.
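The estimate in (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: it uses the `gzip` module of the Python standard library in place of the gzip command, the function names `rate_from_lengths` and `estimate_compression_rate` are hypothetical, and the sample value is made up. Compression levels and header details may differ from what HiRDB produces, so treat the result as a rough estimate.

```python
import gzip

# The extra 5% that the formula adds to the gzip-compressed length.
GZIP_HEADER_ALLOWANCE = 1.05


def rate_from_lengths(compressed_len: int, original_len: int) -> float:
    """Apply the formula: (compressed length x 1.05 / original length) x 100."""
    if original_len <= 0:
        raise ValueError("original_len must be positive")
    return compressed_len * GZIP_HEADER_ALLOWANCE / original_len * 100


def estimate_compression_rate(data: bytes) -> float:
    """Estimate the compression rate (%) of one sample value using gzip."""
    return rate_from_lengths(len(gzip.compress(data)), len(data))


if __name__ == "__main__":
    # Hypothetical sample; in practice, use data representative of the column.
    sample = b"sample BINARY column value " * 1000
    print(f"estimated compression rate: {estimate_compression_rate(sample):.1f}%")
```

Because the compression rate depends greatly on the nature of the data, it is probably worth running such an estimate on several representative values rather than on a single one.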
(2) How to measure the data compression rate after data has been compressed and stored
The following shows the formula for obtaining an approximate compression rate after data has actually been compressed and stored.
- Formula:
compression rate (%) = (sum of data lengths after compression#1 / sum of data lengths before compression#2)#3 x 100
- #1: The following shows the calculation procedure:
1. Execute the database condition analysis utility (pddbst) with the -d option specified for each RDAREA or table.
2. Based on the <BINARY segment> information in the output results, use the following formula to obtain the length of the compressed data:
length of compressed data = Σ (ni x a x b), summed over i = 1 to 10
ni: Maximum value of the i-th range indicated as Used Page Ratio (number of used pages for each ratio) for binary-only segments (for example, if Used Page Ratio is 1 to 10%, the maximum value is 0.1 (10%); if it is 11 to 20%, the maximum value is 0.2 (20%)).
a: Page value corresponding to ni
b: Page size of binary-only segment
3. If the RDAREA or table processed in step 1 contains both compressed and uncompressed columns of the BINARY type, subtract the data lengths of the uncompressed columns from the result obtained in step 2. The data lengths of uncompressed columns are obtained by executing the following SQL statement:
select sum(length(uncompressed-column-name)) from table-identifier [in RDAREA-name]
- #2: The data length before compression is obtained by executing the same SQL statement as is used for obtaining the data length of an uncompressed column (see step 3 in footnote #1).
- #3: A value of 1.0 or greater (that is, a compression rate of 100% or more) means that the data length has increased after compression processing or that the effects of compression are small. In such a case, we recommend that you remove the compression specifications by changing the definitions of the columns to be compressed. For details about removing the compression specification, see 12.21.5 How to change the definition of a compressed column (removing the compression specification for a column).
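The calculation in (2) can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the page counts for the ten Used Page Ratio ranges are copied by hand from the <BINARY segment> part of the pddbst output (the sketch does not parse that output), the two length totals come from the SQL statement shown in footnote #1, and all function and parameter names are hypothetical.

```python
# Maximum value (ni) of each Used Page Ratio range:
# 1-10% -> 0.1, 11-20% -> 0.2, ..., 91-100% -> 1.0
RATIO_UPPER_BOUNDS = [i / 10 for i in range(1, 11)]


def compressed_length(page_counts, page_size):
    """Footnote #1, step 2: sum ni x a x b over the ten ratio ranges.

    page_counts: page value (a) for each range, lowest range first
    page_size:   page size (b) of the binary-only segment, in bytes
    """
    if len(page_counts) != len(RATIO_UPPER_BOUNDS):
        raise ValueError("expected one page count for each of the 10 ranges")
    return sum(n * a * page_size for n, a in zip(RATIO_UPPER_BOUNDS, page_counts))


def compression_rate(page_counts, page_size, original_len, uncompressed_len=0):
    """Approximate compression rate (%) after data has been stored.

    original_len:     sum of data lengths before compression (footnote #2)
    uncompressed_len: total length of uncompressed BINARY columns
                      (footnote #1, step 3); 0 if there are none
    """
    if original_len <= 0:
        raise ValueError("original_len must be positive")
    after = compressed_length(page_counts, page_size) - uncompressed_len
    return after / original_len * 100


if __name__ == "__main__":
    # Made-up figures: 10 pages in the 91-100% range, 8,192-byte pages.
    counts = [0, 0, 0, 0, 0, 0, 0, 0, 0, 10]
    print(compression_rate(counts, 8192, original_len=163840))
```

A result of 100 or more corresponds to the value of 1.0 or greater described in footnote #3. Because each range is counted at its maximum value, the computed compressed length tends to err on the high side.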