HiRDB Datareplicator Version 8 Description, User's Guide and Operator's Guide
This section explains how to design for the character code sets that are to be supported.
The source and target databases can use the following character code sets: EBCDIK/KEIS, EBCDIK, JIS8/Shift JIS, EUC, and UTF-8.
If the source and target databases do not use the same character code set, Datareplicator converts character codes in accordance with a provided definition. If the source and target databases use the same character code set, there is no need to convert character codes.
The target Datareplicator executes character code conversion in accordance with the correspondence between the character code sets at the source and target databases. You use the dblocale operand in the import system definition to specify the target database's character code set.
If the source database is a mainframe database, use the ebcdic_type operand in the import environment definition to specify the type of EBCDIK/KEIS character code set for the source database. The following table shows the correspondence of character code sets between the source and target databases.
Table 4-15 Correspondence of character code sets between the source and target databases
| Source database's character code set | Target: EBCDIK/KEIS#1 | Target: EBCDIK | Target: JIS8/Shift JIS | Target: EUC#2 | Target: UTF-8 |
|---|---|---|---|---|---|
| EBCDIK/KEIS#1 | -- | -- | Y | Y | Y |
| EBCDIK | -- | -- | Y | -- | -- |
| JIS8/Shift JIS | Y | Y | -- | Y | Y |
| EUC#2 | Y | -- | Y | -- | Y |
| UTF-8 | Y | -- | Y | Y | -- |
Y: The character code set used in the source database can be converted to the character code set used in the target database.
--: There is no need to convert the character code set.
Datareplicator uses the following character code conversion rules:
When converting from EBCDIK/KEIS to JIS8/Shift JIS, Datareplicator converts a single-byte code to the corresponding Shift JIS character code.
Datareplicator uses the provided mapping table for converting character codes to convert double-byte Gaiji codes. If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.
Datareplicator converts the space character in shift code. The following table shows the conversion rules for the space character (from EBCDIK/KEIS to JIS8/Shift JIS).
Table 4-16 Conversion rules for the space character (from EBCDIK/KEIS to JIS8/Shift JIS)
| EBCDIK/KEIS character code | JIS8/Shift JIS character code |
|---|---|
| Double-byte space character (A1A1)16 | Double-byte space character (8140)16 |
| Two consecutive single-byte space characters (40)16(40)16 | Converted to one double-byte space character or two consecutive single-byte space characters, depending on the shiftspace_cnv operand value specified in the import environment definition |
| Single-byte space character (40)16 | Single-byte space character (20)16 |
When converting from EBCDIK/KEIS to EUC, Datareplicator assigns the last 8,836 characters of the 9,024-character Gaiji area as the EUC Gaiji area for conversion purposes.
Datareplicator converts any other code in accordance with the user-created mapping table for converting character codes. If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.
Note that characters in EUC code set 3 might cause an error when an SQL statement is issued.
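The Gaiji-area figures are consistent with the EUC code set 3 layout in Table 4-27. In that layout, the two bytes after 0x8F each range over 0xA1 to 0xFE. Under this assignment, the first 188 positions of the 9,024-character Gaiji area have no EUC counterpart:

$$
(\mathrm{FE}_{16} - \mathrm{A1}_{16} + 1)^2 = 94^2 = 8836, \qquad 9024 - 8836 = 188
$$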
Datareplicator converts the space character. The following table shows the conversion rules for the space character (from EBCDIK/KEIS to EUC).
Table 4-17 Conversion rules for the space character (from EBCDIK/KEIS to EUC)
| EBCDIK/KEIS character code | EUC character code |
|---|---|
| Double-byte space character (A1A1)16 | Double-byte space character (A1A1)16 |
| Two consecutive single-byte space characters (40)16(40)16 | Converted to one double-byte space character or two consecutive single-byte space characters, depending on the shiftspace_cnv operand value specified in the import environment definition |
| Single-byte space character (40)16 | Single-byte space character (20)16 |
Datareplicator converts a single-byte kana character to a double-byte character. If the source table contains single-byte kana characters, the data length increases after conversion and data overflow might occur. The following table shows how overflow is handled (from EBCDIK/KEIS to EUC).
Table 4-18 Handling of overflow (from EBCDIK/KEIS to EUC)
| Location of overflow | Datareplicator's action | User's action |
|---|---|---|
| Literal or update data | SQL error (applicable if the data exceeds the defined length after conversion of all bytes) | Change the defined length and re-execute import processing. |
When converting from EBCDIK/KEIS to UTF-8, Datareplicator uses the provided mapping table for converting character codes to convert double-byte Gaiji codes.
If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.
Datareplicator converts the space character in shift code. The following table shows the conversion rules for the space character (from EBCDIK/KEIS to UTF-8).
Table 4-19 Conversion rules for the space character (from EBCDIK/KEIS to UTF-8)
| EBCDIK/KEIS character code | UTF-8 character code |
|---|---|
| Double-byte space character (A1A1)16 | Double-byte space character (E38080)16 |
| Two consecutive single-byte space characters (40)16(40)16 | Converted to one double-byte space character or two consecutive single-byte space characters, depending on the shiftspace_cnv operand value specified in the import environment definition |
| Single-byte space character (40)16 | Single-byte space character (20)16 |
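Tables 4-16, 4-17, and 4-19 share one pattern: only the double-byte space code in the target set differs. The sketch below models that pattern as a lookup. It is not Datareplicator's implementation. The function name and the boolean `pair_to_double` are hypothetical. `pair_to_double` stands in for the shiftspace_cnv operand, whose actual values are not reproduced here.

```python
# Hypothetical sketch of Tables 4-16, 4-17 and 4-19: mapping EBCDIK/KEIS
# space characters to each target character code set.

# Double-byte space in each target set, as listed in the tables.
DOUBLE_BYTE_SPACE = {
    "sjis": b"\x81\x40",      # Table 4-16: (8140)16
    "euc": b"\xa1\xa1",       # Table 4-17: (A1A1)16
    "utf8": b"\xe3\x80\x80",  # Table 4-19: (E38080)16
}


def convert_keis_space(src: bytes, target: str, pair_to_double: bool) -> bytes:
    """Convert one EBCDIK/KEIS space sequence to the target set.

    pair_to_double is a hypothetical stand-in for the shiftspace_cnv operand.
    """
    if src == b"\xa1\xa1":  # KEIS double-byte space
        return DOUBLE_BYTE_SPACE[target]
    if src == b"\x40\x40":  # two consecutive EBCDIK single-byte spaces
        return DOUBLE_BYTE_SPACE[target] if pair_to_double else b"\x20\x20"
    if src == b"\x40":      # one EBCDIK single-byte space
        return b"\x20"
    raise ValueError(f"not a space sequence: {src!r}")


print(convert_keis_space(b"\x40\x40", "utf8", pair_to_double=True).hex())  # e38080
```

The only behavior that depends on the operand is the handling of a (40)16(40)16 pair. The other two rows are fixed mappings.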
Datareplicator converts a single-byte kana character to a three-byte character and a double-byte standard kanji character to a three-byte character. If the source table contains single-byte kana characters or standard kanji characters, the data length increases after conversion and data overflow might occur.
The following table shows how overflow is handled (from EBCDIK/KEIS to UTF-8).
Table 4-20 Handling of overflow (from EBCDIK/KEIS to UTF-8)
| Location of overflow | Datareplicator's action | User's action |
|---|---|---|
| Literal or update data | SQL error (applicable if the data exceeds the defined length after conversion of all bytes) | Change the defined length and re-execute import processing. |
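Tables 4-18 and 4-20 tell you to enlarge the defined length when converted data overflows. A rough byte count can help size the target column beforehand. The helper below is a hypothetical sketch based on the growth rules stated in the text: single-byte kana become two bytes in EUC and three bytes in UTF-8, and double-byte standard kanji become three bytes in UTF-8.

The sketch makes these assumptions:
- Every double-byte character counted is a standard kanji.
- Other single-byte codes stay one byte.
- Gaiji, which can become three-byte EUC code set 3 codes, is not modeled.
- Shift codes (0x0A41, 0x0A42) are left out of the counts.

```python
# Hypothetical sizing helper for EBCDIK/KEIS -> EUC and EBCDIK/KEIS -> UTF-8.
# Assumptions: double-byte characters are standard kanji (Gaiji not modeled),
# other single-byte codes stay one byte, and shift codes are not counted.
BYTES_AFTER = {
    "euc": {"single_other": 1, "single_kana": 2, "double_byte": 2},
    "utf8": {"single_other": 1, "single_kana": 3, "double_byte": 3},
}


def converted_length(single_other: int, single_kana: int, double_byte: int,
                     target: str) -> int:
    """Byte length of a value after conversion to the target set."""
    w = BYTES_AFTER[target]
    return (single_other * w["single_other"]
            + single_kana * w["single_kana"]
            + double_byte * w["double_byte"])


def fits(defined_length: int, single_other: int, single_kana: int,
         double_byte: int, target: str) -> bool:
    """True if the converted value fits the column's defined length."""
    return converted_length(single_other, single_kana, double_byte,
                            target) <= defined_length


# 4 alphanumerics + 3 single-byte kana + 2 kanji = 11 bytes at the source
print(converted_length(4, 3, 2, "euc"))   # 14
print(converted_length(4, 3, 2, "utf8"))  # 19
```

With a defined length of 16, this value fits after conversion to EUC but overflows after conversion to UTF-8.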
If the target database is a HiRDB database that uses UTF-8, columns of the NCHAR or NVARCHAR type cannot be created in the target table. To store data from such columns in a different data type, such as MCHAR, use a column data editing UOC routine.
When converting from EBCDIK to JIS8/Shift JIS, Datareplicator converts single-byte codes to the corresponding JIS8 character codes (single-byte codes). Because only single-byte codes can be specified in a column for which EBCDIK is specified in the character set specification, the following might result:
When converting from JIS8/Shift JIS to EBCDIK, Datareplicator converts single-byte codes to the corresponding EBCDIK character codes.
Datareplicator treats bytes 1 and 2 of a double-byte code as single-byte codes and converts them to single-byte EBCDIK code characters. Therefore, the following might result:
When converting from JIS8/Shift JIS to EBCDIK/KEIS, Datareplicator converts a single-byte code to the corresponding EBCDIK/KEIS character code.
Datareplicator converts a double-byte code to the corresponding EBCDIK/KEIS character code. If the last character is the first byte of a double-byte code, Datareplicator converts it to a space character ((40)16).
Datareplicator converts a double-byte Gaiji code in accordance with the mapping table for converting character codes. If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to the space character ((4040)16).
Datareplicator converts a double-byte space character ((8140)16) to the corresponding double-byte space character ((A1A1)16) and a single-byte space character ((20)16) to the corresponding single-byte space character ((40)16).
When converting from JIS8/Shift JIS to UTF-8, Datareplicator converts double-byte codes to the corresponding UTF-8 character codes.
Datareplicator uses the provided mapping table for converting character codes to convert double-byte Gaiji codes.
If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.
Datareplicator converts the space character in shift code. The following table shows the conversion rules for the space character (from JIS8/Shift JIS to UTF-8).
Table 4-21 Conversion rules for the space character (from JIS8/Shift JIS to UTF-8)
| JIS8/Shift JIS character code | UTF-8 character code |
|---|---|
| Double-byte space character (8140)16 | Double-byte space character (E38080)16 |
| Single-byte space character (20)16 | Single-byte space character (20)16 |
Datareplicator converts a single-byte kana character to a three-byte character and a double-byte standard kanji character to a three-byte character. If the source table contains single-byte kana characters or standard kanji characters, the data length increases after conversion and data overflow might occur. The following table shows how overflow is handled (from JIS8/Shift JIS to UTF-8).
Table 4-22 Handling of overflow (from JIS8/Shift JIS to UTF-8)
| Location of overflow | Datareplicator's action | User's action |
|---|---|---|
| Literal or update data | SQL error (applicable if the data exceeds the defined length after conversion of all bytes) | Change the defined length and re-execute import processing. |
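Table 4-22 asks you to change the defined length when a converted value overflows. To check a sample value before import, you can measure its UTF-8 length. The sketch below uses Python's built-in `shift_jis` codec as a stand-in for Datareplicator's conversion; it is not Datareplicator's own mapping. The helper names are hypothetical. Gaiji codes (lead bytes 0xF0 to 0xFC) are not handled by this codec and raise `UnicodeDecodeError`, whereas Datareplicator converts them through a mapping table.

```python
# Sketch: UTF-8 length of a JIS8/Shift JIS value, using Python's codecs as a
# stand-in for Datareplicator. Gaiji (lead bytes 0xF0-0xFC) is not supported
# by the codec and raises UnicodeDecodeError.


def utf8_length(sjis_value: bytes, codec: str = "shift_jis") -> int:
    """Return the UTF-8 byte length of a Shift JIS value."""
    return len(sjis_value.decode(codec).encode("utf-8"))


def fits_after_conversion(sjis_value: bytes, defined_length: int) -> bool:
    """True if the converted value fits a column of defined_length bytes."""
    return utf8_length(sjis_value) <= defined_length


kanji = "\u6f22\u5b57".encode("shift_jis")  # two standard kanji, 2 bytes each
half_kana = b"\xb1"                          # one single-byte kana (JIS8)
print(len(kanji), utf8_length(kanji))          # 4 6
print(len(half_kana), utf8_length(half_kana))  # 1 3
```

Both growth cases described above appear here. A double-byte kanji grows from two bytes to three, and a single-byte kana grows from one byte to three.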
Table 4-23 Character mapping in SJIS Kanji encoding and Unicode
| Method | Target | Range of kanji | Description |
|---|---|---|---|
| JIS method | SJIS to JIS X0221 | JIS standard level 1 and JIS standard level 2 | Depends on the mapping stipulated by JIS X0221. |
| MS method | Windows (Microsoft) Shift JIS character set to Unicode | JIS standard level 1, JIS standard level 2, and vendor-extended characters | Depends on the mapping defined by Microsoft. |
Datareplicator converts a single-byte code to the corresponding Shift JIS character code.
Datareplicator converts a double-byte code to the corresponding Shift JIS character code. If the last character is the first byte of a double-byte code, Datareplicator converts it to a space character ((20)16).
Datareplicator converts a double-byte Gaiji code in accordance with the mapping table for converting character codes. If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.
Datareplicator converts a double-byte space character ((A1A1)16) to the corresponding code ((8140)16). Datareplicator converts two consecutive single-byte space characters ((40)16) to a double-byte space character ((8140)16). One single-byte space character ((40)16) is converted to the corresponding code ((20)16).
When converting from EUC to UTF-8, Datareplicator converts a single-byte code, excluding kana characters, to the corresponding UTF-8 character code.
Datareplicator converts a double-byte code (Gaiji) in accordance with the user-created mapping table for converting character codes.
If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.
Datareplicator converts the space character in shift code. The following table shows the conversion rules for the space character (from EUC to UTF-8).
Table 4-24 Conversion rules for the space character (from EUC to UTF-8)
| EUC character code | UTF-8 character code |
|---|---|
| Double-byte space character (A1A1)16 | Double-byte space character (E38080)16 |
| Single-byte space character (20)16 | Single-byte space character (20)16 |
Datareplicator converts a double-byte kana character to a three-byte character and a double-byte standard kanji character to a three-byte character. If the source table contains such kana characters or standard kanji characters, the data length increases after conversion and data overflow might occur. The following table shows how overflow is handled (from EUC to UTF-8).
Table 4-25 Handling of overflow (from EUC to UTF-8)
| Location of overflow | Datareplicator's action | User's action |
|---|---|---|
| Literal or update data | SQL error (applicable if the data exceeds the defined length after conversion of all bytes) | Change the defined length and re-execute import processing. |
Datareplicator converts a single-byte code to the corresponding character code.
Datareplicator converts a double-byte or three-byte code to the corresponding UTF-8 character code.
Datareplicator converts a three-byte code in accordance with the user-created mapping table for converting character codes.
If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.
You can suppress character code conversion for a column at the target Datareplicator by specifying suppression of character code conversion for the desired extracted column. For details about the specification to suppress character code conversion, see the update information field definition in 5.10 Import definition.
The following table shows the conversion rules for JIS8/Shift JIS codes.
Table 4-26 Conversion rules for JIS8/Shift JIS codes
| 1-byte | 2-byte | 3-byte | Code conversion rule |
|---|---|---|---|
| 0x00 to 0x80 | -- | -- | Treats as JIS8 and converts it to the corresponding code. |
| 0x81 to 0x9F | 0x40 to 0xFC (excluding 0x7F) | -- | Treats as SJIS (Kanji) and converts it to the corresponding code. |
| 0x81 to 0x9F | Other | -- | Converts according to the specification of undefcode_cnv. |
| 0x81 to 0x9F | -- | -- | Treats as incomplete code and skips it without converting. |
| 0xA0 to 0xDF | -- | -- | Treats as JIS8 and converts it to the corresponding code. |
| 0xE0 to 0xEF | 0x40 to 0xFC (excluding 0x7F) | -- | Treats as SJIS (Kanji) and converts it to the corresponding code. |
| 0xE0 to 0xEF | Other | -- | Converts according to the specification of undefcode_cnv. |
| 0xE0 to 0xEF | -- | -- | Treats as incomplete code and skips it without converting. |
| 0xF0 to 0xFC | 0x40 to 0xFC (excluding 0x7F) | -- | Treats as SJIS (Gaiji) and converts it to the corresponding code. |
| 0xF0 to 0xFC | Other | -- | Converts according to the specification of undefcode_cnv. |
| 0xF0 to 0xFC | -- | -- | Treats as incomplete code and skips it without converting. |
| 0xFD to 0xFF | -- | -- | Treats as JIS8 and converts it to the corresponding code. |
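Table 4-26 describes how a byte stream is split into JIS8 and SJIS units, which is easier to see as a scanner. The sketch below classifies each unit and returns a rule label for it. The labels are hypothetical, and the function does not convert any codes. One behavior is an assumption that the table does not state: a lead byte followed by an invalid second byte consumes both bytes, which are then handled according to undefcode_cnv.

```python
# Sketch of the byte classification in Table 4-26 (JIS8/Shift JIS).
# Returns (rule, bytes) pairs; rule names are hypothetical labels.
# Assumption: a lead byte with an invalid second byte consumes both bytes.


def classify_sjis(data: bytes) -> list[tuple[str, bytes]]:
    out = []
    i, n = 0, len(data)
    while i < n:
        b1 = data[i]
        if 0x81 <= b1 <= 0x9F or 0xE0 <= b1 <= 0xFC:
            if i + 1 >= n:
                # Lead byte with no second byte: incomplete, skipped
                out.append(("incomplete", data[i:]))
                break
            b2 = data[i + 1]
            if 0x40 <= b2 <= 0xFC and b2 != 0x7F:
                rule = "gaiji" if b1 >= 0xF0 else "kanji"
            else:
                rule = "undefcode_cnv"
            out.append((rule, data[i:i + 2]))
            i += 2
        else:
            # 0x00-0x80, 0xA0-0xDF, 0xFD-0xFF: single-byte JIS8
            out.append(("jis8", data[i:i + 1]))
            i += 1
    return out


print(classify_sjis(b"A\x8a\xbf\xb1\xf0\x40\x81"))
```

For this input, the scanner reports a JIS8 byte, a kanji pair, a single-byte kana, a Gaiji pair, and a trailing incomplete lead byte.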
The following table shows the conversion rules for EUC codes.
Table 4-27 Conversion rules for EUC codes
| 1-byte | 2-byte | 3-byte | Code conversion rule |
|---|---|---|---|
| 0x00 to 0x8D | -- | -- | Treats as code set 0 and converts it to the corresponding code. |
| 0x8E | 0xA0 to 0xFF | -- | Treats as code set 2 (kana characters) and converts it to the corresponding code. |
| 0x8E | Other | -- | Converts according to the specification of undefcode_cnv. |
| 0x8E | -- | -- | Treats as incomplete code and skips it without converting. |
| 0x8F | 0xA1 to 0xFE | 0xA1 to 0xFE | Treats as code set 3 (Gaiji) and converts it to the corresponding code. |
| 0x8F | 0xA1 to 0xFE | Other | Converts according to the specification of undefcode_cnv. |
| 0x8F | 0xA1 to 0xFE | -- | Treats as incomplete code and skips it without converting. |
| 0x8F | Other | 0xA1 to 0xFE | Converts according to the specification of undefcode_cnv. |
| 0x8F | Other | Other | Converts according to the specification of undefcode_cnv. |
| 0x8F | Other | -- | Treats as incomplete code and skips it without converting. |
| 0x8F | -- | -- | Treats as incomplete code and skips it without converting. |
| 0x90 to 0x9F | -- | -- | Treats as code set 0 and converts it to the corresponding code. |
| 0xA0 | -- | -- | Converts to 0x20. |
| 0xA1 to 0xFE | 0xA1 to 0xFE | -- | Treats as code set 1 (Kanji) and converts it to the corresponding code. |
| 0xA1 to 0xFE | Other | -- | Converts according to the specification of undefcode_cnv. |
| 0xA1 to 0xFE | -- | -- | Treats as incomplete code and skips it without converting. |
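The EUC rules in Table 4-27 can be read as a scanner that decides the unit length from the first byte: one byte, two bytes for 0x8E and 0xA1 to 0xFE, or three bytes for 0x8F. The sketch below follows the table and returns hypothetical rule labels. It makes two assumptions: an invalid multibyte sequence consumes its full expected length, and lead byte 0xFF, which the table does not list, is treated as undefined.

```python
# Sketch of the byte classification in Table 4-27 (EUC). Rule names are
# hypothetical labels. Assumptions: an invalid multibyte sequence consumes its
# full expected length; lead byte 0xFF (not in the table) is treated as undefined.


def _in_a1_fe(b: int) -> bool:
    return 0xA1 <= b <= 0xFE


def classify_euc(data: bytes) -> list[tuple[str, bytes]]:
    out = []
    i, n = 0, len(data)
    while i < n:
        b1 = data[i]
        if b1 <= 0x8D or 0x90 <= b1 <= 0x9F:
            out.append(("code_set_0", data[i:i + 1]))
            size = 1
        elif b1 == 0xA0:
            out.append(("space_0x20", data[i:i + 1]))
            size = 1
        elif b1 == 0x8E or b1 == 0x8F or _in_a1_fe(b1):
            size = 3 if b1 == 0x8F else 2
            chunk = data[i:i + size]
            if len(chunk) < size:
                # Cut off by the end of data: incomplete, skipped
                out.append(("incomplete", chunk))
                break
            if b1 == 0x8E:
                rule = "code_set_2" if chunk[1] >= 0xA0 else "undefcode_cnv"
            elif b1 == 0x8F:
                ok = _in_a1_fe(chunk[1]) and _in_a1_fe(chunk[2])
                rule = "code_set_3" if ok else "undefcode_cnv"
            else:
                rule = "code_set_1" if _in_a1_fe(chunk[1]) else "undefcode_cnv"
            out.append((rule, chunk))
        else:  # 0xFF
            out.append(("undefcode_cnv", data[i:i + 1]))
            size = 1
        i += size
    return out


print(classify_euc(b"A\xa4\xa2\x8e\xb1\x8f\xa1\xa1\xa0\xa1"))
```

This input exercises code sets 0 through 3, the 0xA0-to-space rule, and a trailing incomplete code.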
The following table shows the conversion rules for EBCDIK/KEIS codes.
Table 4-28 Conversion rules for EBCDIK/KEIS codes
| Shift state | 1-byte | 2-byte | 3-byte | Code conversion rule |
|---|---|---|---|---|
| Single-byte shift on | 0x00 to 0x09 | -- | -- | Converts to the corresponding code. |
| Single-byte shift on | 0x0A | 0x41 | -- | Toggles single-byte shift without performing conversion. |
| Single-byte shift on | 0x0A | 0x42 | -- | Toggles double-byte shift without performing conversion. |
| Single-byte shift on | 0x0A | Other | -- | Converts byte 1 or byte 2 as a single-byte code. |
| Single-byte shift on | 0x0A | -- | -- | Treats as incomplete code and skips it without converting. |
| Single-byte shift on | 0x0B to 0xFF | -- | -- | Converts to the corresponding code. |
| Double-byte shift on | 0x00 to 0x40 | -- | -- | Converts byte 1 or byte 2 as a single-byte code. |
| Double-byte shift on | 0x41 to 0xA0 | 0xA1 to 0xFE | -- | Converts to the corresponding code. |
| Double-byte shift on | 0x41 to 0xA0 | Other | -- | Converts according to the specification of undefcode_cnv. |
| Double-byte shift on | 0x41 to 0xA0 | -- | -- | Treats as incomplete code and skips it without converting. |
| Double-byte shift on | 0xA1 to 0xFE | 0xA1 to 0xFE | -- | Converts to the corresponding code. |
| Double-byte shift on | 0xA1 to 0xFE | Other | -- | Converts according to the specification of undefcode_cnv. |
| Double-byte shift on | 0xA1 to 0xFE | -- | -- | Treats as incomplete code and skips it without converting. |
| Double-byte shift on | 0xFF | 0x01 to 0xFF | -- | Converts according to the specification of undefcode_cnv. |
| Double-byte shift on | 0xFF | -- | -- | Treats as incomplete code and skips it without converting. |
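Unlike the other code sets, EBCDIK/KEIS (Table 4-28) is stateful: the sequences 0x0A 0x41 and 0x0A 0x42 switch between single-byte and double-byte shift, and no output is produced for them. The sketch below tracks that state and returns hypothetical rule labels. It makes three assumptions that the table does not state:
- Data starts in single-byte shift.
- The two shift sequences are honored in either shift state.
- An invalid double-byte pair consumes both bytes.

```python
# Sketch of the shift-state handling in Table 4-28 (EBCDIK/KEIS).
# Rule names are hypothetical labels. Assumptions: data starts in single-byte
# shift; 0x0A41 / 0x0A42 are honored in either state; an invalid double-byte
# pair consumes both bytes.

SHIFT_PREFIX = 0x0A
TO_SINGLE = 0x41  # 0x0A 0x41: switch to single-byte shift
TO_DOUBLE = 0x42  # 0x0A 0x42: switch to double-byte shift


def classify_keis(data: bytes) -> list[tuple[str, bytes]]:
    out = []
    i, n = 0, len(data)
    double = False
    while i < n:
        b1 = data[i]
        if b1 == SHIFT_PREFIX and i + 1 < n and data[i + 1] in (TO_SINGLE, TO_DOUBLE):
            double = data[i + 1] == TO_DOUBLE  # toggle only; nothing emitted
            i += 2
            continue
        if b1 == SHIFT_PREFIX and i + 1 >= n and not double:
            out.append(("incomplete", data[i:]))  # 0x0A at end of data
            break
        if not double or b1 <= 0x40:
            # Single-byte shift, or double-byte shift with byte 1 in 0x00-0x40
            out.append(("single", data[i:i + 1]))
            i += 1
            continue
        if i + 1 >= n:
            out.append(("incomplete", data[i:]))
            break
        b2 = data[i + 1]
        if 0x41 <= b1 <= 0xFE and 0xA1 <= b2 <= 0xFE:
            rule = "double"
        else:
            rule = "undefcode_cnv"  # includes lead byte 0xFF
        out.append((rule, data[i:i + 2]))
        i += 2
    return out


print(classify_keis(b"\xc1\x0a\x42\xb0\xa1\x40\x0a\x41\xc2"))
```

In the example, the shift sequences disappear from the output. The 0x40 inside double-byte shift is handled as a single-byte code, as the row for 0x00 to 0x40 specifies.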
The following table shows the conversion rules for UTF-8 codes.
Table 4-29 Conversion rules for UTF-8 codes
| 1-byte | 2-byte | 3-byte | Code conversion rule |
|---|---|---|---|
| 0x00 to 0x7F | -- | -- | Treats as single-byte code and converts it to the corresponding code. |
| 0x80 to 0xBF | 0x00 to 0xFF | -- | Converts to 0x20. |
| 0x80 to 0xBF | -- | -- | Converts to 0x20. |
| 0xC2 to 0xDE | 0x80 to 0xFF | -- | Treats as double-byte code and converts it to the corresponding code. |
| 0xC2 to 0xDE | Other | -- | Converts according to the specification of undefcode_cnv. |
| 0xC2 to 0xDE | -- | -- | Treats as incomplete code and skips it without converting. |
| 0xDF | 0x80 to 0xBF | -- | Treats as double-byte code and converts it to the corresponding code. |
| 0xDF | Other | -- | Converts according to the specification of undefcode_cnv. |
| 0xDF | -- | -- | Treats as incomplete code and skips it without converting. |
| 0xE0 | 0xA0 to 0xFF | 0x80 to 0xFF | Treats as three-byte code and converts it to the corresponding code. |
| 0xE0 | 0xA0 to 0xFF | Other | Converts according to the specification of undefcode_cnv. |
| 0xE0 | 0xA0 to 0xFF | -- | Treats as incomplete code and skips it without converting. |
| 0xE0 | Other | 0x80 to 0xFF | Converts according to the specification of undefcode_cnv. |
| 0xE0 | Other | Other | Converts according to the specification of undefcode_cnv. |
| 0xE0 | Other | -- | Treats as incomplete code and skips it without converting. |
| 0xE0 | -- | -- | Treats as incomplete code and skips it without converting. |
| 0xE1 to 0xEE | 0x80 to 0xFF | 0x80 to 0xFF | Treats as three-byte code and converts it to the corresponding code. |
| 0xE1 to 0xEE | 0x80 to 0xFF | Other | Converts according to the specification of undefcode_cnv. |
| 0xE1 to 0xEE | 0x80 to 0xFF | -- | Treats as incomplete code and skips it without converting. |
| 0xE1 to 0xEE | Other | 0x80 to 0xFF | Converts according to the specification of undefcode_cnv. |
| 0xE1 to 0xEE | Other | -- | Treats as incomplete code and skips it without converting. |
| 0xE1 to 0xEE | -- | -- | Treats as incomplete code and skips it without converting. |
| 0xEF | 0x80 to 0xBF | 0x80 to 0xBF | Treats as three-byte code and converts it to the corresponding code. |
| 0xEF | 0x80 to 0xBF | Other | Converts according to the specification of undefcode_cnv. |
| 0xEF | 0x80 to 0xBF | -- | Treats as incomplete code and skips it without converting. |
| 0xEF | Other | 0x80 to 0xBF | Converts according to the specification of undefcode_cnv. |
| 0xEF | Other | Other | Converts according to the specification of undefcode_cnv. |
| 0xEF | Other | -- | Treats as incomplete code and skips it without converting. |
| 0xEF | -- | -- | Treats as incomplete code and skips it without converting. |
| Other | -- | -- | Converts according to the specification of undefcode_cnv. |
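Table 4-29 can be expressed as a small rule table keyed by lead-byte range, where each entry lists the allowed range for each following byte. The sketch below copies the byte ranges exactly as Table 4-29 lists them and returns hypothetical rule labels. It assumes that a stray byte in 0x80 to 0xBF and an "Other" lead byte each consume one byte. It also assumes that an invalid sequence consumes its full expected length.

```python
# Sketch of the byte classification in Table 4-29 (UTF-8). The byte ranges are
# copied exactly as the table lists them. Rule names are hypothetical labels.
# Assumptions: a stray byte 0x80-0xBF and an "Other" lead byte each consume one
# byte; an invalid sequence consumes its full expected length.

# Lead-byte range -> allowed range for each following byte
MULTIBYTE_RULES = [
    ((0xC2, 0xDE), [(0x80, 0xFF)]),
    ((0xDF, 0xDF), [(0x80, 0xBF)]),
    ((0xE0, 0xE0), [(0xA0, 0xFF), (0x80, 0xFF)]),
    ((0xE1, 0xEE), [(0x80, 0xFF), (0x80, 0xFF)]),
    ((0xEF, 0xEF), [(0x80, 0xBF), (0x80, 0xBF)]),
]


def classify_utf8(data: bytes) -> list[tuple[str, bytes]]:
    out = []
    i, n = 0, len(data)
    while i < n:
        b1 = data[i]
        if b1 <= 0x7F:
            out.append(("single", data[i:i + 1]))
            i += 1
            continue
        if b1 <= 0xBF:
            # Stray byte 0x80-0xBF: converted to 0x20
            out.append(("space_0x20", data[i:i + 1]))
            i += 1
            continue
        for (lead_lo, lead_hi), trail in MULTIBYTE_RULES:
            if lead_lo <= b1 <= lead_hi:
                size = 1 + len(trail)
                chunk = data[i:i + size]
                if len(chunk) < size:
                    # Sequence cut off by the end of data: skipped
                    out.append(("incomplete", chunk))
                    return out
                valid = all(low <= b <= high
                            for b, (low, high) in zip(chunk[1:], trail))
                if valid:
                    rule = "double" if size == 2 else "triple"
                else:
                    rule = "undefcode_cnv"
                out.append((rule, chunk))
                i += size
                break
        else:
            # "Other" lead bytes (0xC0, 0xC1, 0xF0 to 0xFF)
            out.append(("undefcode_cnv", data[i:i + 1]))
            i += 1
    return out


print(classify_utf8(b"A\xe3\x80\x80\xc3\xa9\x80\xf0"))
```

Keeping the ranges in `MULTIBYTE_RULES` makes it easy to compare the sketch row by row against Table 4-29, including the narrower ranges listed for 0xDF and 0xEF.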
All rights reserved. Copyright (C) 2007, 2013, Hitachi, Ltd.