Hitachi

Hitachi Advanced Database Setup and Operation Guide


11.4.24 Operation taking background import and chunks into consideration (Example 2: Using chunks in wait status)

This subsection provides an operation example of background import that uses chunks in wait status. When you use chunks in wait status, you can retrieve data or prevent it from being retrieved at a desired time, as explained below.

Explanations in this example assume that operations are taking place in the following system:

The following figure shows the flow of operations in this system.

Figure 11‒31: Flow of operations in this system (usage of chunks in wait status)

[Figure]

Organization of this subsection

(1) Defining a multi-chunk table and indexes

To manage product data centrally, you need to define a product table (ITEMLIST) and indexes.

(a) Defining the product table (ITEMLIST)

Define the product table (ITEMLIST) to be stored in the DB area TBLDAT_002. Also, define the product table as a multi-chunk table to perform background import.

In this operation example, the following number of chunks is required:

  • The chunk containing the current month's product data that is to be analyzed (one chunk)

  • The chunks containing product data sent from branch offices (11 chunks)

    Because there are 10 branch offices, 10 chunks are needed to import the data of those branch offices. In addition, one chunk is needed to merge those 10 chunks. Therefore, 11 chunks are needed in total.

  • The chunks containing product data that are not subject to analysis (six chunks)

Assuming a safety factor of 1.2, the maximum number of chunks that can be created is as follows:

maximum-number-of-chunks-to-be-created = ↑(1 + 11 + 6) × 1.2 (safety factor)↑ = 22
Note

If the number of chunks increases, this negatively impacts retrieval performance. Therefore, we recommend that you design and operate your database system to minimize the number of chunks that are used.

Based on this estimate, specify 22 as the maximum number of chunks. Specify this value for CHUNK of the chunk specification in the CREATE TABLE statement. The following shows an SQL statement example.

■ SQL statement example
CREATE TABLE "ITEMLIST"
      ("SHOP_ID" INT,"DELIVERY_TIME" TIMESTAMP,
       "ITEM_ID" INT, "ITEM_NAME" VARCHAR(100))
    IN "TBLDAT002" CHUNK=22

For details about the CREATE TABLE statement, see CREATE TABLE (define a table) in Definition SQL in the manual HADB SQL Reference.

(b) Defining indexes for the product table (ITEMLIST)

Define a B-tree index and a range index for the product table. Define these indexes so that they will be stored in DB areas IDXDAT_002 and RIXDAT_002, respectively.

Define a B-tree index for the product ID (ITEM_ID column) and a range index for the delivery time (DELIVERY_TIME column). The following shows an SQL statement example.

■ SQL statement example
CREATE INDEX "ITEM_ID_IDX"
   ON "ITEMLIST"("ITEM_ID") IN "IDXDAT_002"
      EMPTY
 
CREATE INDEX "DELIVERY_TIME_RIX"
   ON "ITEMLIST"("DELIVERY_TIME") IN "RIXDAT_002"
      EMPTY INDEXTYPE RANGE

For details about the CREATE INDEX statement, see CREATE INDEX (define an index) in Definition SQL in the manual HADB SQL Reference.

(c) Table used in this operation example

The following figure shows a portion of the table used in this operation example.

Figure 11‒32: Portion of the table used in this operation example

[Figure]

(2) Adding the next month's product data to chunks in wait status (background import)

The next month's product data is sent from each branch office by the 25th of each month. At the main office, the next month's product data is added to the product table (ITEMLIST) as it arrives from the branch offices.

Background import is used to add the next month's product data. During this process, the next month's product data needs to be kept excluded from analysis until the next month begins. Therefore, the data is added as chunks in wait status.

The group of data that is stored in the multi-chunk table in a single background import operation is managed together with its indexes as a group (chunk). For details about a chunk, see 2.14.2 Managing data in data-import units (chunks).

Note

By managing data in units of chunks, you can acquire backups and delete data on a chunk-by-chunk basis. When you perform background import or merge chunks, you can add a comment to a chunk. Adding a comment to a chunk makes it easier to identify a certain chunk you want to manage.

To perform background import to create a chunk in wait status, use the adbimport command with the -b and --status wait options specified. The following shows an example of executing the adbimport command.

■ Command execution example
adbimport -u ADBUSER01 -p '#HelloHADB_01' -k "'" -s , -g 10
          -w /home/adbmanager/tmp
          -z /home/adbmanager/imp_file/imp_opt_file02.txt
          -b --status wait
          -m 'October 2014'
          ITEMLIST
          /home/adbmanager/imp_file/imp_data_path02.txt
Explanation

Use background import to import the contents of the input data file, set up in the input data path file (/home/adbmanager/imp_file/imp_data_path02.txt), into the product table (ITEMLIST).

Whenever the next month's product data arrives from the ten branch offices, background import is performed. As a result, ten chunks in wait status are created.

For details about the adbimport command, see adbimport (Import Data) in the manual HADB Command Reference. Also, see 11.4.4 Temporarily excluding data to be imported to a multi-chunk table from retrieval (creating a chunk in wait status).

(3) Retrieving data

Even while the next month's product data is being added to the database, the current month's product data is retrieved and analyzed.

By using background import to add data, you can import data even if another user is retrieving data from the same database (data can be imported during data retrieval).

The following figure shows the relationship between background import and data retrieval.

Figure 11‒33: Relationship between background import that creates chunks in wait status and data retrieval

[Figure]

Although the current month's product data can be analyzed, the next month's product data cannot be analyzed until the new month begins because it is excluded from analysis (chunks in wait status cannot be retrieved).

(4) Merging chunks in wait status (to reduce the number of chunks)

Performing background import whenever the next month's product data arrives from the branch offices creates multiple chunks in wait status. When these chunks in wait status are changed to chunks in normal status to be included in the analysis, the increase in the number of chunks might degrade the retrieval performance.

Therefore, by merging the multiple chunks in wait status into a single chunk, you can prevent retrieval performance degradation that tends to result from an increase in the number of chunks. In this example, all product data that has been sent from the branch offices is first added to the database and is then merged into a single chunk.

(a) Identifying the chunks in wait status to be merged

To identify the chunk IDs of the chunks in wait status to be merged, retrieve the system table STATUS_CHUNKS as explained below.

The following show an SQL statement example.

■ SQL statement example
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC
   WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='ITEMLIST' 
      AND "CHUNK_COMMENT"='October 2014'
■ Retrieval result
 CHUNK_ID
 --------------------
                    2
           (omitted)
                   11
Explanation

In this example, the chunk IDs (2 to 11) of the chunks to be merged are identified by using the chunk comment ('October 2014'), added during background import, as the key during retrieval of the STATUS_CHUNKS table.

Check the chunks to be merged and their chunk IDs based on the explanation in 11.4.8 Checking the chunk status and the number of chunks created.

(b) Merging chunks in wait status

To merge chunks in wait status, use the adbmergechunk command. The following shows an execution example of the adbmergechunk command.

■ Command execution example
adbmergechunk -u ADBUSER01 -p '#HelloHADB_01' -g 2
              -w /home/adbmanager/tmp
              -z /home/adbmanager/merge_file/merge_opt_file02.txt
              -m 'October 2014_merge'
              -c 2-11
              ITEMLIST
Explanation

Using the values specified in the merge chunk option file (/home/adbmanager/merge_file/merge_opt_file02.txt) as the input information, perform chunk merging on the product table (ITEMLIST) as specified in the -c option of the adbmergechunk command.

In this example, the next month's product data (the data for October 2014) is grouped together by merging chunks with chunk IDs 2 through 11 into a single chunk in wait status.

The following figure shows an example of merging chunks in wait status.

Figure 11‒34: Example of merging chunks in wait status

[Figure]

Chunks with chunk IDs 2 through 11 are automatically deleted when the adbmergechunk command terminates normally.

Note

Merging chunks in wait status can prevent degradation in retrieval performance when these chunks are changed to chunks in normal status. There is also a limit to the number of chunks that can be managed by a table or DB area. If the number of chunks is insufficient, you can reduce the number by merging multiple chunks into a single chunk.

For details about the adbmergechunk command, see adbmergechunk (Merge Chunks) in the manual HADB Command Reference. Also, see 11.4.9 Merging chunks (to reduce the number of chunks).

(5) Changing the chunk status (switching the product data to be analyzed)

When the new month begins, the product data that was excluded from analysis is made subject to analysis. Conversely, the product data that had been subject to analysis is now excluded from analysis.

When the new month begins, the chunk status is changed as follows:

In this example, the product data for September 2014 is now excluded from analysis, and the product data for October 2014 is made subject to analysis.

(a) Identifying the chunk ID of the chunk whose status is to be changed

To identify the chunk ID of the chunk whose status is to be changed, retrieve the system table STATUS_CHUNKS as explained in the following.

First, identify the chunk ID of the product data for September 2014.

■ SQL statement example (changing a chunk in normal status to a chunk in wait status)
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC
   WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='ITEMLIST'
      AND "CHUNK_COMMENT"='September 2014_merge'
■ Retrieval result
 CHUNK_ID
 --------------------
                    1
Explanation

In this example, the chunk ID of the chunk containing the product data for September 2014 is identified using the chunk comment ('September 2014_merge') that was added during chunk merging, as the key during retrieval of the STATUS_CHUNKS table.

Next, identify the chunk ID of the product data for October 2014.

■ SQL statement example (changing a chunk in wait status to a chunk in normal status)
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC
   WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='ITEMLIST'
      AND "CHUNK_COMMENT"='October 2014_merge'
■ Retrieval result
 CHUNK_ID
 --------------------
                   12
Explanation

In this example, the chunk ID of the chunk containing the product data for October 2014 is identified using the chunk comment ('October 2014_merge') that was added during chunk merging, as the key during retrieval of the STATUS_CHUNKS table.

(b) Changing the chunk status

To change the chunk status, use the adbchgchunkstatus command. Chunk statuses are changed as described in this subsection.

  • Changing a chunk in normal status to a chunk in wait status

    Below, the chunk containing the product data for September 2014 (chunk ID: 1) is the one whose status is changed.

  • Changing a chunk in wait status to a chunk in normal status

    Below, the chunk containing the product data for October 2014 (chunk ID: 12) is the one whose status is changed.

The following is an execution example of the adbchgchunkstatus command.

■ Command execution example
adbchgchunkstatus -u ADBUSER01 -p '#HelloHADB_01'
                  -w 1 -n 12 ITEMLIST
Explanation

The chunk status in the product table (ITEMLIST) is changed as specified in the -w and -n options.

In this example, the chunk with chunk ID 1 is changed to wait status, and the chunk with chunk ID 12 is changed to normal status.

The following figure shows an example of changing chunk statuses.

Figure 11‒35: Example of changing chunk statuses

[Figure]

Explanation

The chunk containing the product data for September 2014 (chunk ID: 1) is excluded from analysis, and the chunk containing the product data for October 2014 (chunk ID: 12) becomes subject to analysis.

For details about the adbchgchunkstatus command, see adbchgchunkstatus (Change Chunk Status) in the manual HADB Command Reference. Also, see 11.4.12 Changing the chunk status.

(6) Backing up data

To prevent the amount of data from becoming unmanageable, data older than 6 months is backed up and then deleted. The database is backed up every month.

(a) Identifying the chunks with the earliest creation date and time

Chunks that are older than 6 months (chunks with the earliest creation date and time) are identified in the product table (ITEMLIST). To identify these chunks in the product table, retrieve the system table STATUS_CHUNKS as described below.

The following shows an SQL statement example.

■ SQL statement example
SELECT "CHUNK_ID" FROM "MASTER"."STATUS_CHUNKS" TC
   WHERE "TABLE_SCHEMA"='ADBUSER01' AND "TABLE_NAME"='ITEMLIST'
      AND "CREATE_TIME"=(SELECT MIN("CREATE_TIME")
         FROM "MASTER"."STATUS_CHUNKS"
            WHERE TC."TABLE_SCHEMA"='ADBUSER01'
               AND TC."TABLE_NAME"='ITEMLIST')
■ Retrieval result
 CHUNK_ID
 --------------------
                    1
Explanation

When the STATUS_CHUNKS table is retrieved, the result shows that the chunk with chunk ID 1 is the chunk with the earliest creation date and time (older than 6 months).

Note

When chunks are merged, the earliest creation date and time among the merge-source chunks is stored in the STATUS_CHUNKS table.

For details about how to retrieve data from the STATUS_CHUNKS table, see C.9 Searching system tables.

(b) Making a backup

Now that the chunk ID of the chunk with the earliest creation date and time in the product table (ITEMLIST) has been identified as 1, back up the data in that chunk in a CSV file.

To make a backup, use the adbexport command. The following shows an execution example of the adbexport command.

■ Command execution example
adbexport -u ADBUSER01 -p '#HelloHADB_01'
          -z /home/adbmanager/exp_file/exp_opt_file02.txt
          -n ITEMLIST
          -c 1
          /home/adbmanager/exp_file/exp_data_path02.txt
■ Content of the output data path file (/home/adbmanager/exp_file/exp_data_path02.txt)
/home/adbmanager/exp_backup/backup0101.csv
/home/adbmanager/exp_backup/backup0102.csv
/home/adbmanager/exp_backup/backup0103.csv
Explanation

The data contained in the chunk with chunk ID 1 is backed up in CSV files (/home/adbmanager/exp_backup/backup0101.csv, /home/adbmanager/exp_backup/backup0102.csv, and /home/adbmanager/exp_backup/backup0103.csv).

In this example, since chunks are merged every month, one month's worth of data is backed up.

Note

Although you can back up the files into a single file, distributing the files into multiple files can improve performance by distributing the output-related overhead.

For details about the adbexport command, see adbexport (Export Data) in the manual HADB Command Reference. Also, see 11.4.5 Exporting data in units of chunks.

(7) Deleting old data (chunks)

To prevent the amount of data from becoming unmanageable, delete data that is older than 6 months.

Delete the chunk with the earliest creation date and time from the product table (ITEMLIST). To delete a chunk, execute the PURGE CHUNK statement with a chunk ID specified in the search condition.

■ SQL statement example
PURGE CHUNK "ITEMLIST" WHERE CHUNKID=1
Explanation

The data contained in the chunk with chunk ID 1 is deleted.

In this example, since chunks are merged every month, one month's worth of data is deleted.

For details about the PURGE CHUNK statement, see PURGE CHUNK (delete all rows in a chunk) in Data Manipulation SQL in the manual HADB SQL Reference. Also, see 11.4.6 Deleting data in units of chunks.

(8) Repeatedly adding new data, merging, changing status, and deleting old data

By repeatedly performing background import to create chunks in wait status, by merging chunks in wait status, changing chunk statuses, and by deleting chunks using the PURGE CHUNK statement, you can continue to analyze only the product data of a specific month without increasing the overall volume of data managed by HADB.