Hitachi

Hitachi Advanced Database Setup and Operation Guide


11.4.23 Operation taking background import and chunks into consideration (Example 1: Adding and deleting data on a regular basis)

This subsection provides an operation example that takes into consideration background import and chunks. When you use background import, you can repeat data addition and deletion on a regular basis, as described below.

In the following description, the operation of Company A is used as an example.

■ Operation example of Company A

Company A, which has about 1,000 branch stores across the country, has adopted an analysis system for analyzing product purchasing histories. The database of Company A's analysis system is managed in the following manner:

  • The data of purchase histories of about 1,000 branch stores across the country is centrally managed by using a purchasing history table in the database of the analysis system.

  • The database of the analysis system also contains the shop table and product table that are used to manage the master data other than the purchasing history data, in addition to the purchasing history table.

  • Purchasing history data is collected from all branch stores once a day, and is collectively imported into the purchasing history table.

  • The system analyzes the product purchasing history data while importing the data into the purchasing history table.

  • Product purchasing history analysis can target a period of six months, three months, one month, one week, and one day.

  • To prevent degradation of the system's retrieval performance, every month, the purchasing history data chunks of the immediately preceding month are merged into a single chunk (to suppress increases in the number of chunks).

  • The analysis processing of Company A requires the purchasing history data of the past six months. Therefore, the purchasing history table, which stores purchasing history data, stores the data collected in the past six months.

  • The data older than the data collected in the last six months is backed up and then deleted to suppress increase of the data size of the purchasing history table.

  • The purchasing history table data is backed up every month.

  • If the purchasing history data has incorrect information, correct it.

■ Database schema model of the analysis system

The following figure shows the database schema model of the analysis system used by Company A.

Figure 11‒25: Database schema model of Company A's analysis system

[Figure]

Explanation:
  • In the preceding star schema, the purchasing history table (PURCHASE) functions as a fact table. The purchasing history table imports data every day, and is used for retrieval during analysis even while data is being imported. To enable simultaneous execution of data import and retrieval, the purchasing history table is defined as a multi-chunk table. In addition, the purchasing history table is also defined as a column store table because value summation for a specific field frequently occurs (for example, to obtain the average or total) during analysis. (3) Adding new data (background import) and the following subsections describe operations on the purchasing history table.

  • To manage the master data other than the purchasing history data, the shop table (SHOPLIST), product table (ITEMLIST), customer table (CUSTOMER), and trade time table (TRADETIME) are defined. These tables function as dimension tables in the star schema. These tables are defined as a single-chunk table that is a row store table.

■ Overview of operations related to the purchasing history table (PURCHASE)

The following figure shows an overview of operations related to the purchasing history table.

Figure 11‒26: Overview of operations related to the purchasing history table

[Figure]

Organization of this subsection

(1) Enabling the updated-row columnizing facility

Because the UPDATE statement is sometimes used to update the data of the purchasing history table (PURCHASE), execute the adbcolumnize command to enable the updated-row columnizing facility.

■ Command execution example
adbcolumnize --start
Note

If the UPDATE or INSERT statement is executed for a column store table, the updated or added data is stored in row store format. This might degrade the retrieval performance for the column store table. If the updated-row columnizing facility is enabled, HADB automatically converts the data from row store format to column store format, thus preventing degradation of retrieval performance for column store tables.

(2) Defining the database of the analysis system

Define the purchasing history table (PURCHASE) to centrally manage the purchasing history data of all branch stores. Also, define range indexes for the purchasing history table.

In addition, define the following four tables to manage the master data other than the purchasing history data:

(a) Defining the purchasing history table (PURCHASE)

Define the purchasing history table (PURCHASE) as follows:

  • Define the purchasing history table as a multi-chunk table because the table will be subject to background import.

  • Define the purchasing history table as a column store table because the table will be used as a fact table of a star schema.

  • Store the purchasing history table in DB area TBLDAT_001.

  • Set the maximum number of chunks in the purchasing history table to 44. The following shows how this value was estimated.

■ Estimating the maximum number of chunks in the purchasing history table

The purchasing history table will store the data of the past six months. This is the basis for estimating the maximum number of chunks to be specified in the chunk specification of the CREATE TABLE statement.

Assuming 31 days in a month, background import is performed 31 times a month, and therefore the number of chunks needed is 31. For the remaining five months, the number of chunks is kept low by grouping data into a single chunk every month in order to maintain a high level of retrieval performance.

Assuming a safety factor of 1.2, the maximum number of chunks that can be created is as follows:

maximum-number-of-chunks-to-be-created = ↑(31 + 5) × 1.2 (safety factor)↑ = 44
Note

If the number of chunks increases, this negatively impacts retrieval performance. Therefore, we recommend that you design and operate your database system to minimize the number of chunks that will be used.

Based on this estimate, specify 44 as the maximum number of chunks. Specify this value for CHUNK of the chunk specification of the CREATE TABLE statement. The following shows an example of specifying the CREATE TABLE statement to define the purchasing history table.

■ Example of specifying the CREATE TABLE statement
CREATE TABLE "PURCHASE"
            ("SHOP_ID"     INTEGER,
             "TRADE_TIME"  TIMESTAMP,
             "ITEM_ID"     INTEGER,
             "ITEM_NAME"   VARCHAR(100),
             "CUSTOMER_ID" INTEGER,
             "PURCHASE_ID" CHAR(8))
    IN "TBLDAT_001"
    CHUNK=44                                ...1
    STORAGE FORMAT COLUMN                   ...2
Explanation:
  1. This specification sets the maximum number of chunks. This specification defines the purchasing history table as a multi-chunk table.

  2. This specification defines the purchasing history table as a column store table.

For details about the CREATE TABLE statement, see CREATE TABLE (define a table) in Definition SQL in the manual HADB SQL Reference.

(b) Defining range indexes for the purchasing history table (PURCHASE)

Define range indexes for the purchasing history table (PURCHASE). Define the range indexes so that they are stored in DB area RIXDAT_001.

Define range indexes for the trade time (TRADE_TIME column). The following shows an example of specifying the CREATE INDEX statement to define range indexes.

■ Example of specifying the CREATE INDEX statement
CREATE INDEX "TRADE_TIME_RIX"
   ON "PURCHASE"("TRADE_TIME") IN "RIXDAT_001" 
      EMPTY INDEXTYPE RANGE

For details about the CREATE INDEX statement, see CREATE INDEX (define an index) in Definition SQL in the manual HADB SQL Reference.

(c) Defining tables other than the purchasing history table (PURCHASE)

Define also the following tables, which are required to manage the master data other than the purchasing history data:

  • Shop table (SHOPLIST)

  • Product table (ITEMLIST)

  • Customer table (CUSTOMER)

  • Trade time table (TRADETIME)

Define the preceding tables as follows:

  • Define the preceding tables as single-chunk tables because they will not be subject to background import.

  • Define the preceding tables as row store tables because they will be used as dimension tables of a star schema.

  • Define the preceding tables so that they are stored in DB areas TBLDAT_002 to TBLDAT_005, respectively.

  • B-tree indexes must be defined for some columns of the preceding tables. The following lists the relevant columns. Define the B-tree indexes so that they are stored in DB areas IDXDAT_002 to IDXDAT_005, respectively.

    • Branch store code (SHOP_ID column) of the shop table (SHOPLIST)

    • Product ID (ITEM_ID column) of the product table (ITEMLIST)

    • Customer number (CUSTOMER_ID column) of the customer table (CUSTOMER)

    • Trade time (TRADE_TIME column) of the trade time table (TRADETIME)

(d) Table used in this operation example

The following figure shows the purchasing history table (PURCHASE) used in this operation example.

Figure 11‒27: Purchasing history table (PURCHASE) used in this operation example

[Figure]

(3) Adding new data (background import)

Once a day, new purchasing history data collected from branch stores all over the country is added to the purchasing history table (PURCHASE).

New data is added using background import. The group of data that is stored in the multi-chunk table in a single background import operation is managed together with its indexes as a group (chunk). For details about a chunk, see 2.14.2 Managing data in data-import units (chunks).

Note

By managing data in units of chunks, you can acquire backups and delete data on a chunk-by-chunk basis. When you perform background import or merging chunks, you can add a comment to a chunk. Adding a comment to a chunk makes it easier to identify a certain chunk you want to manage.

To perform background import, use the adbimport command with the -b option specified. The following shows an example of executing the adbimport command.

■ Command execution example
adbimport -u ADBUSER01 -p '#HelloHADB_01' -k "'" -s , -g 10
          -w /home/adbmanager/tmp
          -z /home/adbmanager/imp_file/imp_opt_file01.txt
          -b -m 'January 2014_daily'
          PURCHASE
          /home/adbmanager/imp_file/imp_data_path01.txt
Explanation:

Use background import to import the contents of the input data file, set up in the input data path file (/home/adbmanager/imp_file/imp_data_path01.txt), into the purchasing history table (PURCHASE).

For details about the adbimport command, see adbimport (Import Data) in the manual HADB Command Reference. Also, see 11.4.2 Storing data in a multi-chunk table (background import).

(4) Correcting the data imported by background import

If you find errors in the data that was imported by background import in (3) Adding new data (background import) and the amount of data to be corrected is small, use the UPDATE or DELETE statement to correct the data. The following shows an example of correcting one record by using the UPDATE statement.

■ Example of executing an SQL statement
UPDATE "PURCHASE" SET "ITEM_ID"=103 WHERE "PURCHASE_ID"='A0015309'

Explanation:

The preceding SQL statement changes the product ID (ITEM_ID) to 103 for the record whose purchase ID (PURCHASE_ID) is A0015309 in the purchase history table (PURCHASE).

For details about the UPDATE statement, see UPDATE (update rows) in the manual HADB SQL Reference.

If the amount of data to be corrected is large, we recommend that you correct the contents of the input data file and execute the adbimport command to replace the current data with the corrected data. For details about how to replace the current data with the corrected data, see (4) Operational example (replacing the data in a chunk) in 11.4.12 Changing the chunk status.

(5) Retrieving data

You can also retrieve data from the purchasing history table (PURCHASE) while data is being added to the table by background import.

By using background import to add data, you can import data even if another user is retrieving data from the same database (data can be imported during data retrieval).

The following figure shows the relationship between background import and data retrieval.

Figure 11‒28: Relationship between background import and data retrieval

[Figure]

The purchasing history data stored in the purchasing history table is analyzed for six-month, three-month, one-month, one-week, and one-day periods.

If range indexes are defined for the purchasing history table, which is a multi-chunk table, each range index holds the maximum and minimum values (a range) of the values existing in a key field within the chunk. Since chunks that store data not included in the search condition are skipped during retrieval, retrieval efficiency improves.

The following figure shows the processing that takes place when a range index is defined for the purchasing history table.

Figure 11‒29: Processing that takes place when a range index is defined for the purchasing history table

[Figure]

Explanation:

In this case, a range index is defined for the trade time (TRADE_TIME column). Therefore, during retrieval using the trade time, retrieval of chunks that do not store trade time data not included in the search condition is skipped.

(6) Merging chunks

By merging the data chunks of a month into one chunk in the purchasing history table (PURCHASE), you can prevent retrieval performance degradation that tends to result from an increase in the number of chunks. In the example shown later, one month's worth of data chunks (data chunks for January 2014) are merged into a single chunk.

(a) Identifying the chunks to be merged

To determine the chunk IDs of the chunks to be merged, retrieve the system table STATUS_CHUNKS as explained below.

The following shows an SQL statement example.

■ SQL statement example
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC
   WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='PURCHASE' 
      AND "CHUNK_COMMENT"='January 2014_daily'
■ Retrieval result
 CHUNK_ID
 --------------------
                    1
           (omitted)
                   31
Explanation:

In this example, the chunk IDs (1 to 31) of the chunks to be merged are identified by using the chunk comment ('January 2014_daily'), added during background import, as the key during retrieval of the STATUS_CHUNKS table.

Check the chunks to be merged and their chunk IDs based on the explanation in 11.4.8 Checking the chunk status and the number of chunks created.

(b) Checking whether chunk reorganization is necessary

For the merge-target chunks identified in (a) Identifying the chunks to be merged, you can check whether reorganization of the chunks is necessary. To do this, use the adbdbstatus command to obtain information about the need for reorganization.

The following shows an example of executing the command.

■ Command execution example
adbdbstatus -d reorginfo -n ADBUSER01.PURCHASE -c 1-31

Explanation:

In this example, the command obtains the information about the need for reorganization in terms of the chunk IDs of the merge-target chunks (1 to 31).

The information about the need for reorganization (Reorganization_necessity) is output for each chunk. If Reorganization_necessity is Recommended, perform reorganization of the chunk. For details about the procedure for reorganizing chunks, see (c) Reorganizing chunks.

After reorganizing chunks, merge them into a single chunk as described in (d) Merging chunks.

(c) Reorganizing chunks

As a result of the check in (b) Checking whether chunk reorganization is necessary, if you find that there are chunks for which the need for reorganization (Reorganization_necessity) is Recommended, reorganize them. This subsection describes the chunk reorganization procedure in an example case where the chunk with the chunk ID 2 needs to be reorganized.

■ Procedure for reorganizing chunks
  1. Stop the commands, jobs, and application programs that update the purchase history table (PURCHASE).

  2. Search the system table STATUS_CHUNK, and check the status and comment of the chunk with the chunk ID 2.

    ■ Example of executing an SQL statement

    SELECT CHUNK_ID, CHUNK_COMMENT, CHUNK_STATUS
      FROM "MASTER"."STATUS_CHUNKS"
        WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='PURCHASE'
          AND "CHUNK_ID"=2

    ■ Execution result of the SQL statement

    CHUNK_ID    CHUNK_COMMENT CHUNK_STATUS
    ----------- ------------- ------------------------
              2 daily         Normal
  3. Use the adbexport command to output the data of the chunk with the chunk ID 2.

    Execute the command twice: once by specifying the export option file (/home/adbmanager/exp_file/exp_opt_file01.txt), and once by specifying the output data path file (/home/adbmanager/exp_file/exp_filepath.txt).

    ■ Command execution example

    adbexport -u ADBUSER01 -p '#HelloHADB_01'
              -z /home/adbmanager/exp_file/exp_opt_file01.txt
              -n PURCHASE
              -c 2
              /home/adbmanager/exp_file/exp_filepath.txt

    ■ Content of the output data path file (/home/adbmanager/exp_file/exp_filepath.txt)

    /home/adbmanager/exp_file/purchase_data0.csv
    /home/adbmanager/exp_file/purchase_data1.csv
    /home/adbmanager/exp_file/purchase_data2.csv
    /home/adbmanager/exp_file/purchase_data3.csv
    /home/adbmanager/exp_file/purchase_data4.csv
    /home/adbmanager/exp_file/purchase_data5.csv
    /home/adbmanager/exp_file/purchase_data6.csv
    /home/adbmanager/exp_file/purchase_data7.csv
    /home/adbmanager/exp_file/purchase_data8.csv
    /home/adbmanager/exp_file/purchase_data9.csv
  4. Use the adbimport command to import the output data.

    Execute the command twice: once by specifying the import option file (/home/adbmanager/imp_file/imp_opt_file01.txt), and once by specifying the input data path file (/home/adbmanager/exp_file/exp_filepath.txt).

    For the input data path file, use the output data path file, as is, that you used when executing the adbexport command. For the -m option, specify the string 'daily' of the chunk comment that you checked in step 2.

    ■ Command execution example

    adbimport -u ADBUSER01 -p '#HelloHADB_01' -k "'" -s , -g 10
              -w /home/adbmanager/tmp
              -z /home/adbmanager/imp_file/imp_opt_file01.txt
              -b --status wait
              -m 'daily'
              PURCHASE
              /home/adbmanager/exp_file/exp_filepath.txt
  5. Use the adbchgchunkstatus command to change the status of the chunk.

    Change the status of the reorganization-target chunk (chunk with the chunk ID 2) and the chunk that you created in step 4 (chunk with the chunk ID 32).

    ■ Command execution example

    adbchgchunkstatus -u ADBUSER01 -p '#HelloHADB_01'
                      -w 2 -n 32 PURCHASE

    When the preceding command is executed, the chunk with the chunk ID 2 is placed in wait status, and the chunk with the chunk ID 32 is placed in normal status.

  6. Use the PURGE CHUNK statement to delete the reorganization-target chunk (chunk with the chunk ID 2).

    ■ Example of executing an SQL statement

    PURGE CHUNK PURCHASE WHERE CHUNKID=2
  7. Start the commands, jobs, and application programs that update the purchase history table (PURCHASE).

(d) Merging chunks

To merge chunks, use the adbmergechunk command. The following shows an execution example of the adbmergechunk command.

■ Command execution example
adbmergechunk -u ADBUSER01 -p '#HelloHADB_01' -g 2
              -w /home/adbmanager/tmp
              -z /home/adbmanager/merge_file/merge_opt_file01.txt
              -m 'January 2014'
              -c 1-31
              PURCHASE
Explanation:

Using the value specified in the merge chunk option file (/home/adbmanager/merge_file/merge_opt_file01.txt) as the input information, perform chunk merging on the purchasing history table (PURCHASE) as specified in the -c option of the adbmergechunk command.

In this example, the data for January 2014 is grouped together by merging chunks with chunk IDs 1 through 31 into a single chunk.

The following figure shows an example of chunk merging.

Figure 11‒30: Chunk merging example

[Figure]

Chunks with chunk IDs of 1 through 31 are automatically deleted when the adbmergechunk command terminates normally.

Note

Merging chunks can improve retrieval performance. There is also a limit to the number of chunks that can be managed by a table or DB area. If the number of chunks is insufficient, you can reduce the number by merging multiple chunks into a single chunk.

You can merge chunks even when another user is retrieving the merge-target chunks.

For details about the adbmergechunk command, see adbmergechunk (Merge Chunks) in the manual HADB Command Reference. Also, see 11.4.9 Merging chunks (to reduce the number of chunks).

(e) Checking whether reorganization of merged chunks is necessary

In this subsection, you merge the chunks of January 2014 into one chunk, and then execute the adbdbstatus command to check the information about the need for reorganization (Reorganization_necessity) of the chunk. The following shows the procedure.

Procedure

  1. Check the chunk ID of the merge-result chunk.

    ■ Example of executing an SQL statement

    SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC
        WHERE "TABLE_SCHEMA"='ADBUSER' AND "TABLE_NAME"='PURCHASE'
          AND "CHUNK_COMMENT"='January 2014'

    ■ Execution result of the SQL statement

     CHUNK_ID
     -----------------
                    32

    Explanation:

    By executing the preceding SQL statement, you can check the chunk ID of the chunk into which the chunks of January 2014 have been merged. In this example, the chunk ID of the merge-result chunk is 32.

  2. Check the information about the need for reorganization.

    ■ Command execution example

    adbdbstatus -d reorginfo -n ADBUSER01.PURCHASE -c 32

    This command outputs the information about the need for reorganization (Reorganization_necessity) of the chunk whose chunk ID is 32. If Reorganization_necessity is Recommended, perform reorganization of this chunk.

(7) Backing up data

To prevent data size of the purchasing history table (PURCHASE) from increasing, the purchasing history table is used to store the data for only the past six months. The data older than six months is backed up and then deleted. A backup is created once a month.

The following describes the detailed procedure.

(a) Identifying the chunks with the earliest creation date and time

Chunks that are older than 6 months (chunks with the earliest creation date and time) are identified in the purchasing history table (PURCHASE). To identify these chunks in the purchasing history table, retrieve the system table STATUS_CHUNKS as described below.

The following shows an SQL statement example.

■ SQL statement example
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC
   WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='PURCHASE'
      AND "CREATE_TIME"=(SELECT MIN("CREATE_TIME")
         FROM "MASTER"."STATUS_CHUNKS"
            WHERE TC."TABLE_SCHEMA"='ADBUSER01'
               AND TC."TABLE_NAME"='PURCHASE')
■ Retrieval result
 CHUNK_ID
 --------------------
                   32
KFAA96404-I 1 rows were selected.
Explanation:

When the STATUS_CHUNKS table is retrieved, the result shows that the chunk with chunk ID 32 is the chunk with the earliest creation date and time (older than 6 months) in this example.

Note

When chunks are merged, the earliest creation date and time among the merge-source chunks is stored in the STATUS_CHUNKS table.

For details about how to retrieve data from the STATUS_CHUNKS table, see C.9 Searching system tables.

(b) Backup acquisition

Now that the chunk ID of the chunk with the earliest creation date and time in the purchasing history table (PURCHASE) has been identified as 32, back up the data in that chunk in a CSV file.

To make a backup, use the adbexport command. The following shows an execution example of the adbexport command:

■ Command execution example
adbexport -u ADBUSER01 -p '#HelloHADB_01'
          -z /home/adbmanager/exp_file/exp_opt_file01.txt
          -n PURCHASE
          -c 32
          /home/adbmanager/exp_file/exp_data_path01.txt
■ Content of the output data path file (/home/adbmanager/exp_file/exp_data_path01.txt)
/home/adbmanager/exp_backup/backup0001.csv
/home/adbmanager/exp_backup/backup0002.csv
/home/adbmanager/exp_backup/backup0003.csv
Explanation:

The data contained in the chunk with chunk ID 32 is backed up in CSV files (/home/adbmanager/exp_backup/backup0001.csv, /home/adbmanager/exp_backup/backup0002.csv, and /home/adbmanager/exp_backup/backup0003.csv).

In this example, since chunks are merged every month, one month's worth of data is backed up.

Note

Although you can back up the files into a single file, distributing the files into multiple files can improve performance by distributing the output-related overhead.

For details about the adbexport command, see adbexport (Export Data) in the manual HADB Command Reference. Also, see 11.4.5 Exporting data in units of chunks.

(8) Deleting old data (chunks)

To prevent the size of the purchasing history table (PURCHASE) from increasing, delete any data that is older than six months.

Delete the chunk with the earliest creation date and time from the purchasing history table (PURCHASE). To delete a chunk, execute the PURGE CHUNK statement with a chunk ID specified in the search condition.

■ SQL statement example
PURGE CHUNK "PURCHASE" WHERE CHUNKID=32
Explanation:

The data contained in the chunk with chunk ID 32 is deleted.

In this example, since chunks are merged every month, one month's worth of data is deleted.

For details about the PURGE CHUNK statement, see PURGE CHUNK (delete all rows in a chunk) in Data Manipulation SQL in the manual HADB SQL Reference. Also, see 11.4.6 Deleting data in units of chunks.

(9) Repeatedly adding new data, merging, and deleting old data

By repeatedly performing background import, merging chunks, and deleting chunks using the PURGE CHUNK statement, you can continue to analyze purchasing histories without increasing the volume of data managed by HADB.