Hitachi

Hitachi Advanced Database Setup and Operation Guide


2.16 Retrieving data from CSV files

HADB enables you to use the SELECT statement to retrieve data in CSV files (CSV-format files). For example, if your last year's sales management data is stored in a CSV file, you can retrieve the sales management data without having to import it to a table.

To do this, execute the SELECT statement with the path of the target CSV file specified. If you specify a search condition in the SELECT statement, you can retrieve the data that satisfies the condition in the same manner as when you retrieve data that has been imported to a table. The following figure provides an overview of data retrieval from CSV files.

Figure 2‒57: Overview of data retrieval from CSV files

[Figure]

#

Data can be retrieved from CSV files that are not compressed or that are compressed in GZIP format.

Explanation
  1. Executes the SELECT statement with the CSV file path specified.

    Specify the CSV file path in a function provided by HADB (ADB_CSVREAD function for retrieving data from CSV files).

  2. The HADB server accesses the CSV file whose path is specified in the function.

  3. The HADB server retrieves data from the CSV file and stores it temporarily as a table. This set of data stored temporarily in tabular format is called a table-function derived table.

  4. The HADB server retrieves the table-function derived table according to the search condition specified in the SELECT statement and returns the retrieval results to the application program.

■ Example of SELECT statement with the ADB_CSVREAD function specified

This example retrieves data in the following CSV files that have been compressed in GZIP format:

  • /dir/file1.csv.gz

  • /dir/file2.csv.gz

  • /dir/file3.csv.gz

SELECT * FROM TABLE(ADB_CSVREAD(
           MULTISET['/dir/file1.csv.gz','/dir/file2.csv.gz','/dir/file3.csv.gz'],
          'COMPRESSION_FORMAT=GZIP;
           FIELD_NUM=1,2;
           ENCLOSING_CHAR=";
           DELIMITER_CHAR=,;'))
           AS "T1" ("C1" INTEGER, "C2" VARCHAR(32))
Explanation

The underlined parts indicate the information specified in the ADB_CSVREAD function.

The information specified in the ADB_CSVREAD function includes the absolute path names of the CSV files and the data to be retrieved from the CSV files.

For details about the ADB_CSVREAD function, see ADB_CSVREAD function under System-defined functions in Constituent Elements in the manual HADB SQL Reference.

Important
  • The CSV files must meet any of the following conditions:

    ∙ CSV files that are not compressed

    ∙ CSV files that are compressed in GZIP format by using the OS command gzip

    ∙ CSV files that are compressed in GZIP format and exported by using the adbexport command

  • The data in such a CSV file cannot be updated.

  • Retrieval of data from a CSV file requires more time than retrieval of data that has been imported to a table because the CSV file must be analyzed.