HiRDB Dataextractor Version 8 Description, User's Guide and Operator's Guide


3.1.4 Converting the character codes of extracted data

When you extract or import data between systems that use different character code sets, you can convert the extracted data to the character code set used by the target system.

This subsection describes the character code sets that can be converted. For details about how to use this function, see 4.2.3 Additional data extraction and import functions.

Organization of this subsection
(1) Data types supported for conversion of character codes
(2) Convertible character code sets
(3) Character code set conversion ranges
(4) Converting from SJIS to EUC
(5) Converting from SJIS to UTF-8
(6) Converting from EUC to SJIS
(7) Converting from EUC to UTF-8
(8) Converting from UTF-8 to SJIS or EUC

(1) Data types supported for conversion of character codes

For extracted data, the following data types are supported for conversion of character codes:

1 You can use the null value information file to exclude desired columns from code conversion. For details about how to specify the null value information file, see 4.2.4 Contents of files specified with the xtrep command.

2 If you are only creating a file without importing data into a HiRDB table, the data type is excluded from code conversion because it is treated as having the BLOB attribute. In such a case, you can use one of the following methods to convert the codes of such data:

(2) Convertible character code sets

The following table shows the combinations of character code sets that can be converted by HiRDB Dataextractor:

Source character    Target character code set
code set            SJIS    EUC     UTF-8
SJIS                --      Y       Y
EUC                 Y       --      Y
UTF-8               Y       Y       --

Legend:
Y: Can be converted.
--: Cannot be converted.
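
The six combinations marked Y can be tried out with a minimal Python sketch. It uses Python's built-in shift_jis, euc_jp, and utf-8 codecs as stand-ins; it is not HiRDB Dataextractor code. The function name convert and the CODECS mapping are assumptions made for illustration, and Python's mapping tables may differ from the product's, particularly for Gaiji and other non-standard codes.

```python
# Illustrative only: Python's built-in codecs, not HiRDB Dataextractor.
# CODECS and convert() are hypothetical names used for this sketch.
CODECS = {"SJIS": "shift_jis", "EUC": "euc_jp", "UTF-8": "utf-8"}


def convert(data: bytes, source: str, target: str) -> bytes:
    """Re-encode data from one of the three code sets to another."""
    if source == target:
        # Mirrors the "--" cells in the table: same-to-same is not a conversion.
        raise ValueError("source and target code sets must differ")
    return data.decode(CODECS[source]).encode(CODECS[target])


sjis_a = b"\x82\xa0"  # HIRAGANA LETTER A encoded in SJIS
print(convert(sjis_a, "SJIS", "EUC").hex())    # a4a2
print(convert(sjis_a, "SJIS", "UTF-8").hex())  # e38182
```

The same character occupies two bytes in SJIS and EUC but three bytes in UTF-8, so converted data can change length.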

(3) Character code set conversion ranges

Figure 3-8 shows the conversion range for the SJIS character code set.

Figure 3-8 SJIS character code set conversion range

[Figure]

Figure 3-9 shows the conversion range for the EUC character code set.

Figure 3-9 EUC character code set conversion range

[Figure]

Table 3-9 shows the conversion range for the UTF-8 character code set.

Table 3-9 UTF-8 character code set

1st byte          2nd byte          3rd byte          Conversion rule
0x00-0x7F         --                --                Recognize as a 1-byte code, and convert to the corresponding code.
0x80-0xBF         0x00-0xFF         --                Convert to 0x20.
0x80-0xBF         --                --                Convert to 0x20.
0xC2-0xDE         0x80-0xFF         --                Recognize as a 2-byte code, and convert to the corresponding code.
0xC2-0xDE         Other than above  --                Convert according to the specification of the XTUNDEF environment variable.
0xC2-0xDE         --                --                Recognize as an incomplete code, and skip without converting it.
0xDF              0x80-0xBF         --                Recognize as a 2-byte code, and convert to the corresponding code.
0xDF              Other than above  --                Convert according to the specification of the XTUNDEF environment variable.
0xDF              --                --                Recognize as an incomplete code, and skip without converting it.
0xE0              0xA0-0xFF         0x80-0xFF         Recognize as a 3-byte code, and convert to the corresponding code.
0xE0              0xA0-0xFF         Other than above  Convert according to the specification of the XTUNDEF environment variable.
0xE0              0xA0-0xFF         --                Recognize as an incomplete code, and skip without converting it.
0xE0              Other than above  0x80-0xFF         Convert according to the specification of the XTUNDEF environment variable.
0xE0              Other than above  Other than above  Convert according to the specification of the XTUNDEF environment variable.
0xE0              Other than above  --                Recognize as an incomplete code, and skip without converting it.
0xE0              --                --                Recognize as an incomplete code, and skip without converting it.
0xE1-0xEE         0x80-0xFF         0x80-0xFF         Recognize as a 3-byte code, and convert to the corresponding code.
0xE1-0xEE         0x80-0xFF         Other than above  Convert according to the specification of the XTUNDEF environment variable.
0xE1-0xEE         0x80-0xFF         --                Recognize as an incomplete code, and skip without converting it.
0xE1-0xEE         Other than above  0x80-0xFF         Convert according to the specification of the XTUNDEF environment variable.
0xE1-0xEE         Other than above  --                Recognize as an incomplete code, and skip without converting it.
0xE1-0xEE         --                --                Recognize as an incomplete code, and skip without converting it.
0xEF              0x80-0xBF         0x80-0xBF         Recognize as a 3-byte code, and convert to the corresponding code.
0xEF              0x80-0xBF         Other than above  Convert according to the specification of the XTUNDEF environment variable.
0xEF              0x80-0xBF         --                Recognize as an incomplete code, and skip without converting it.
0xEF              Other than above  0x80-0xBF         Convert according to the specification of the XTUNDEF environment variable.
0xEF              Other than above  Other than above  Convert according to the specification of the XTUNDEF environment variable.
0xEF              Other than above  --                Recognize as an incomplete code, and skip without converting it.
0xEF              --                --                Recognize as an incomplete code, and skip without converting it.
Other than above  --                --                Convert according to the specification of the XTUNDEF environment variable.

Legend:
--: The character codes are not converted.
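
The decision rules in Table 3-9 can be sketched as a small Python classifier. This is only an illustration of the table, not HiRDB Dataextractor code. The names RULES, in_range, and classify, and the returned labels, are hypothetical. The sketch assumes that the 2-byte lead range starts at 0xC2, as in standard UTF-8, and that a non-matching second or third byte is handled the same way for every 3-byte lead byte.

```python
# Sketch of the Table 3-9 decision rules; not HiRDB Dataextractor code.
# Assumption: the 2-byte lead range starts at 0xC2, as in standard UTF-8.

# (lead-byte range, allowed 2nd-byte range, allowed 3rd-byte range or None)
RULES = [
    ((0xC2, 0xDE), (0x80, 0xFF), None),
    ((0xDF, 0xDF), (0x80, 0xBF), None),
    ((0xE0, 0xE0), (0xA0, 0xFF), (0x80, 0xFF)),
    ((0xE1, 0xEE), (0x80, 0xFF), (0x80, 0xFF)),
    ((0xEF, 0xEF), (0x80, 0xBF), (0x80, 0xBF)),
]


def in_range(value: int, bounds: tuple) -> bool:
    return bounds[0] <= value <= bounds[1]


def classify(code: bytes) -> str:
    """Name the rule that applies to the bytes available for one code.

    Returns "1-byte", "2-byte", or "3-byte" (convert to the corresponding
    code), "space" (convert to 0x20), "undefined" (follow XTUNDEF), or
    "incomplete" (skip without converting).
    """
    if not code:
        raise ValueError("empty input")
    b1 = code[0]
    if b1 <= 0x7F:
        return "1-byte"
    if b1 <= 0xBF:
        return "space"
    for lead, second, third in RULES:
        if not in_range(b1, lead):
            continue
        needed = 2 if third is None else 3
        if len(code) < needed:
            return "incomplete"  # the data ends before the code is complete
        ok = in_range(code[1], second) and (
            third is None or in_range(code[2], third)
        )
        return f"{needed}-byte" if ok else "undefined"
    return "undefined"  # any other first byte


print(classify(b"\xe3\x81\x82"))  # 3-byte
print(classify(b"\xe3\x81"))      # incomplete
print(classify(b"\xe0\x80\x80"))  # undefined
```

Keeping the ranges in a single list makes it easy to compare the sketch against the table row by row.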

(4) Converting from SJIS to EUC

(a) Single-byte codes ((1) - (4) in Figure 3-8)
(b) Double-byte codes (SJIS standard Kanji area; (6) and (7) in Figure 3-8)
(c) Double-byte codes (Gaiji area; (8) in Figure 3-8)
(d) Double-byte codes (other than (b) or (c))

(5) Converting from SJIS to UTF-8

(a) Single-byte codes
(b) Double-byte codes (standard character set)
(c) Double-byte codes (Gaiji codes)
(d) Double-byte codes (other than (b) or (c))

(6) Converting from EUC to SJIS

(a) Single-byte codes ((1) - (4) in Figure 3-9)
(b) Double-byte codes (Standard Kanji codes; (6) in Figure 3-9)
(c) Double-byte codes (Gaiji area; (5) in Figure 3-9)
(d) Double-byte codes (other than (b) or (c))

(7) Converting from EUC to UTF-8

(a) Single-byte codes
(b) Double-byte codes (standard character set)
(c) Double-byte codes (Gaiji code)
(d) Double-byte codes (other than (b) or (c))

(8) Converting from UTF-8 to SJIS or EUC

(a) Single-byte codes
(b) Double-byte codes (standard character set)
(c) Double-byte codes (Gaiji codes)
(d) Double-byte codes (other than (b) or (c))