HiRDB Datareplicator Version 8 Description, User's Guide and Operator's Guide


4.3.5 Designing for character code sets

This section explains how to design for the character code sets that are to be supported.

Organization of this subsection
(1) Types of character code sets
(2) Character code conversion
(3) Rules for character code conversion
(4) Rules for character code conversion from EBCDIK/KEIS to JIS8/Shift JIS
(5) Rules for character code conversion from EBCDIK/KEIS to EUC
(6) Rules for character encoding conversion from EBCDIK/KEIS to UTF-8
(7) Rules for character encoding conversion from EBCDIK to JIS8
(8) Rules for character encoding conversion from JIS8 to EBCDIK
(9) Rules for character code conversion from JIS8/Shift JIS or EUC to EBCDIK/KEIS
(10) Rules for character encoding conversion from JIS8/Shift JIS to UTF-8
(11) Rules for character code conversion from EUC to JIS8/Shift JIS
(12) Rules for character code conversion from EUC to UTF-8
(13) Rules for character code conversion from UTF-8 to EUC or JIS8/Shift JIS
(14) Suppression of character code conversion
(15) Details of conversion rules for each character code

(1) Types of character code sets

The source and target databases can use the following types of character code sets: EBCDIK/KEIS, EBCDIK, JIS8/Shift JIS, EUC, and UTF-8.

(2) Character code conversion

If the source and target databases do not use the same character code set, Datareplicator converts character codes in accordance with a provided definition. If the source and target databases use the same character code set, there is no need to convert character codes.

(a) Character code conversion by the target Datareplicator

The target Datareplicator executes character code conversion in accordance with the correspondence between the character code sets at the source and target databases. You use the dblocale operand in the import system definition to specify the target database's character code set.

If the source database is a mainframe database, use the ebcdic_type operand in the import environment definition to specify the type of EBCDIK/KEIS character code set for the source database. The following table shows the correspondence of character code sets between the source and target databases.

Table 4-15 Correspondence of character code sets between the source and target databases

Source database's   | Target database's character code set
character code set  | EBCDIK/KEIS#1 | EBCDIK | JIS8/Shift JIS | EUC#2 | UTF-8
EBCDIK/KEIS#1       | --            | --     | Y              | Y     | Y
EBCDIK              | --            | --     | Y              | --    | --
JIS8/Shift JIS      | Y             | Y      | --             | Y     | Y
EUC#2               | Y             | --     | Y              | --    | Y
UTF-8               | Y             | --     | Y              | Y     | --

Y: The character code set used in the source database can be converted to the character code set used in the target database.

--: There is no need to convert the character code set.

#1
Only XDM/DS and VOS3 Database Datareplicator support EBCDIK/KEIS.

#2
Conversion results might not be correct for EUC data that uses a code set other than code sets 0 to 2.
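
The correspondence in Table 4-15 can be read as a simple lookup. The following Python sketch is not part of Datareplicator; the names CONVERTIBLE and can_convert are hypothetical, and the dictionary only transcribes the Y entries of the table.

```python
from typing import Dict, Set

# Hypothetical transcription of Table 4-15: for each source character code
# set, the target character code sets that the table marks with "Y".
CONVERTIBLE: Dict[str, Set[str]] = {
    "EBCDIK/KEIS": {"JIS8/Shift JIS", "EUC", "UTF-8"},
    "EBCDIK": {"JIS8/Shift JIS"},
    "JIS8/Shift JIS": {"EBCDIK/KEIS", "EBCDIK", "EUC", "UTF-8"},
    "EUC": {"EBCDIK/KEIS", "JIS8/Shift JIS", "UTF-8"},
    "UTF-8": {"EBCDIK/KEIS", "JIS8/Shift JIS", "EUC"},
}


def can_convert(source: str, target: str) -> bool:
    """Return True if Table 4-15 marks the source/target pair with Y."""
    return target in CONVERTIBLE.get(source, set())


print(can_convert("EUC", "UTF-8"))    # True
print(can_convert("EBCDIK", "EUC"))   # False (marked -- in the table)
```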

(3) Rules for character code conversion

Datareplicator uses the following character code conversion rules:

Character code conversion method
Uses a mapping table for converting character codes.

Definition of Gaiji conversion method
For conversion of Gaiji characters, you must use the hdsccnvedt command to edit and define a mapping table for converting character codes.

(4) Rules for character code conversion from EBCDIK/KEIS to JIS8/Shift JIS

(a) Single-byte codes

Datareplicator converts a single-byte code to the corresponding Shift JIS character code.

(b) Double-byte codes (standard character codes)
(c) Double-byte codes (Gaiji)

Datareplicator uses the provided mapping table for converting character codes to convert double-byte Gaiji codes. If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.
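
The lookup-with-fallback behavior described above can be sketched in Python as follows. This is an illustration only: the mapping entries are made-up sample values (real mapping tables are edited with the hdsccnvedt command), the function name convert_gaiji is hypothetical, and the replacement parameter merely stands in for whatever the undefcode_cnv operand selects.

```python
from typing import Dict, Optional

# Hypothetical mapping table: source Gaiji code -> Shift JIS Gaiji code.
# These two entries are invented sample values for illustration only.
GAIJI_MAP: Dict[int, int] = {0x41A1: 0xF040, 0x41A2: 0xF041}

# Double-byte space in Shift JIS, as listed in Table 4-16.
SJIS_DOUBLE_BYTE_SPACE = 0x8140


def convert_gaiji(code: int,
                  mapping: Optional[Dict[int, int]],
                  replacement: int = SJIS_DOUBLE_BYTE_SPACE) -> int:
    """Look up a Gaiji code; fall back to a space character if the code is
    undefined or if no mapping table was provided."""
    if mapping is None:
        return replacement
    return mapping.get(code, replacement)


print(hex(convert_gaiji(0x41A1, GAIJI_MAP)))  # 0xf040 (defined)
print(hex(convert_gaiji(0x41FF, GAIJI_MAP)))  # 0x8140 (undefined Gaiji)
print(hex(convert_gaiji(0x41A1, None)))       # 0x8140 (no mapping table)
```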

(d) Space character

Datareplicator converts the space character in shift code. The following table shows the conversion rules for the space character (from EBCDIK/KEIS to JIS8/Shift JIS).

Table 4-16 Conversion rules for the space character (from EBCDIK/KEIS to JIS8/Shift JIS)

EBCDIK/KEIS character code | JIS8/Shift JIS character code
Double-byte space character (A1A1)16 | Double-byte space character (8140)16
Two consecutive single-byte space characters (40)16(40)16 | Converted to one double-byte space character or two consecutive single-byte space characters, depending on the shiftspace_cnv operand value specified in the import environment definition
Single-byte space character (40)16 | Single-byte space character (20)16
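
As an illustration of the single-byte rows of Table 4-16, the following Python sketch converts a run of single-byte EBCDIK space characters. It is a sketch under stated assumptions: the function name convert_space_run is hypothetical, the boolean pair_to_double_byte stands in for the shiftspace_cnv operand (whose actual values are not given in this section), and pairing runs longer than two spaces from left to right is an assumption.

```python
def convert_space_run(count: int, pair_to_double_byte: bool) -> bytes:
    """Convert a run of `count` single-byte EBCDIK spaces ((40)16) to
    JIS8/Shift JIS bytes.

    pair_to_double_byte is a hypothetical stand-in for shiftspace_cnv:
    True  -> each pair of spaces becomes one double-byte space (8140)16
    False -> every space becomes a single-byte space (20)16
    """
    if pair_to_double_byte:
        pairs, remainder = divmod(count, 2)
        return b"\x81\x40" * pairs + b"\x20" * remainder
    return b"\x20" * count


print(convert_space_run(2, True).hex())   # 8140
print(convert_space_run(2, False).hex())  # 2020
print(convert_space_run(1, True).hex())   # 20
```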

(5) Rules for character code conversion from EBCDIK/KEIS to EUC

(a) Single-byte codes
(b) Double-byte codes (standard character codes)
(c) Double-byte codes (Gaiji)

Datareplicator assigns the last 8,836 characters in the 9,024-character Gaiji area as the Gaiji area for EUC character codes for conversion purposes.
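
The figure 8,836 equals the number of codes that two bytes can address when each byte lies in the range 0xA1 to 0xFE, which is the range Table 4-27 lists for bytes 2 and 3 of code set 3. The short Python calculation below checks the arithmetic; reading 9,024 as 96 rows of 94 characters is an assumption made here, not a statement from this manual.

```python
# Bytes 2 and 3 of an EUC code set 3 character each range over 0xA1..0xFE.
values_per_byte = 0xFE - 0xA1 + 1                 # 94
euc_gaiji_codes = values_per_byte ** 2            # 94 * 94

source_gaiji_area = 9024                          # size stated in the text
not_assigned = source_gaiji_area - euc_gaiji_codes

print(values_per_byte)                  # 94
print(euc_gaiji_codes)                  # 8836
print(not_assigned)                     # 188
print(not_assigned // values_per_byte)  # 2 (rows of 94, assumed layout)
```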

Datareplicator converts any other code in accordance with the user-created mapping table for converting character codes. If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.

Note that characters in code set 3 might cause an error when an SQL statement is issued.

(d) Space character

Datareplicator converts the space character. The following table shows the conversion rules for the space character (from EBCDIK/KEIS to EUC).

Table 4-17 Conversion rules for the space character (from EBCDIK/KEIS to EUC)

EBCDIK/KEIS character code | EUC character code
Double-byte space character (A1A1)16 | Double-byte space character (A1A1)16
Two consecutive single-byte space characters (40)16(40)16 | Converted to one double-byte space character or two consecutive single-byte space characters, depending on the shiftspace_cnv operand value specified in the import environment definition
Single-byte space character (40)16 | Single-byte space character (20)16
(e) Handling of overflow

Datareplicator converts a single-byte kana character to a double-byte character. If the source table contains single-byte kana characters, the data length increases after conversion and data overflow might occur. The following table shows how overflow is handled (from EBCDIK/KEIS to EUC).

Table 4-18 Handling of overflow (from EBCDIK/KEIS to EUC)

Location of overflow | Datareplicator's action | User's action
Literal or update data | SQL error (applicable if the data exceeds the defined length after conversion of all bytes) | Change the defined length and re-execute import processing.

Note:
If an identifier exceeds 30 bytes or update data exceeds 32,000 bytes, Datareplicator is unable to continue processing. Before starting data linkage, make sure that source system identifiers and update data will not exceed the permitted maximum length.

(6) Rules for character encoding conversion from EBCDIK/KEIS to UTF-8

(a) Single-byte codes
(b) Double-byte codes (standard character codes)
(c) Double-byte codes (Gaiji)

Datareplicator uses the provided mapping table for converting character codes to convert double-byte Gaiji codes.

If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.

(d) Space character

Datareplicator converts the space character in shift code. The following table shows the conversion rules for the space character (from EBCDIK/KEIS to UTF-8).

Table 4-19 Conversion rules for the space character (from EBCDIK/KEIS to UTF-8)

EBCDIK/KEIS character code | UTF-8 character code
Double-byte space character (A1A1)16 | Double-byte space character (E38080)16
Two consecutive single-byte space characters (40)16(40)16 | Converted to one double-byte space character or two consecutive single-byte space characters, depending on the shiftspace_cnv operand value specified in the import environment definition
Single-byte space character (40)16 | Single-byte space character (20)16
(e) Handling of overflow

Datareplicator converts a single-byte kana character to a three-byte character and a double-byte standard kanji character to a three-byte character. If the source table contains single-byte kana characters or standard kanji characters, the data length increases after conversion and data overflow might occur.

The following table shows how overflow is handled (from EBCDIK/KEIS to UTF-8).

Table 4-20 Handling of overflow (from EBCDIK/KEIS to UTF-8)

Location of overflow | Datareplicator's action | User's action
Literal or update data | SQL error (applicable if the data exceeds the defined length after conversion of all bytes) | Change the defined length and re-execute import processing.

Note:
If an identifier exceeds 30 bytes or update data exceeds 32,000 bytes, Datareplicator is unable to continue processing. Before starting data linkage, make sure that source system identifiers and update data will not exceed the permitted maximum lengths.
(f) Notes

If the target database is HiRDB using UTF-8, a column of the NCHAR or NVARCHAR type cannot be created in the table. To store such data in a different data type, such as MCHAR, use a column data editing UOC routine.

(7) Rules for character encoding conversion from EBCDIK to JIS8

(a) Single-byte codes

Datareplicator converts such single-byte codes to the corresponding JIS8 character codes (single-byte codes). Because only single-byte codes can be specified in a column for which EBCDIK is specified in the character set specification, the following might result:

(8) Rules for character encoding conversion from JIS8 to EBCDIK

(a) Single-byte codes

Datareplicator converts single-byte codes to the corresponding EBCDIK character codes.

(b) Double-byte codes (standard character codes)

Datareplicator treats bytes 1 and 2 of a double-byte code as single-byte codes and converts them to single-byte EBCDIK code characters. Therefore, the following might result:

(9) Rules for character code conversion from JIS8/Shift JIS or EUC to EBCDIK/KEIS

(a) Single-byte codes

Datareplicator converts a single-byte code to the corresponding EBCDIK/KEIS character code.

(b) Double-byte codes (standard character codes)

Datareplicator converts a double-byte code to the corresponding EBCDIK/KEIS character code. If the last character is the first byte of a double-byte code, Datareplicator converts it to a space character ((40)16).

(c) Double-byte codes (Gaiji)

Datareplicator converts a double-byte Gaiji code in accordance with the mapping table for converting character codes. If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to the space character ((4040)16).

(d) Space character

Datareplicator converts a double-byte space character ((8140)16) to the corresponding double-byte space character ((A1A1)16) and a single-byte space character ((20)16) to the corresponding single-byte space character ((40)16).

(10) Rules for character encoding conversion from JIS8/Shift JIS to UTF-8

(a) Single-byte codes
(b) Double-byte codes (standard character codes)

Datareplicator converts double-byte codes to the corresponding UTF-8 character codes.

(c) Double-byte codes (Gaiji)

Datareplicator uses the provided mapping table for converting character codes to convert double-byte Gaiji codes.

If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.

(d) Space character

Datareplicator converts the space character in shift code. The following table shows the conversion rules for the space character (from JIS8/Shift JIS to UTF-8).

Table 4-21 Conversion rules for the space character (from JIS8/Shift JIS to UTF-8)

JIS8/Shift JIS character code | UTF-8 character code
Double-byte space character (8140)16 | Double-byte space character (E38080)16
Single-byte space character (20)16 | Single-byte space character (20)16
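
The byte values in Table 4-21 can be cross-checked with Python's built-in shift_jis and utf-8 codecs. This check uses the standard Unicode mappings shipped with Python, not Datareplicator's own mapping table, so it only confirms that the tabulated values are the conventional ones.

```python
# Transcode the two Shift JIS space characters from Table 4-21 to UTF-8
# with Python's standard codecs (not Datareplicator's mapping table).
double_byte_space = b"\x81\x40".decode("shift_jis").encode("utf-8")
single_byte_space = b"\x20".decode("shift_jis").encode("utf-8")

print(double_byte_space.hex())  # e38080
print(single_byte_space.hex())  # 20
```
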
(e) Handling of overflow

Datareplicator converts a single-byte kana character to a three-byte character and a double-byte standard kanji character to a three-byte character. If the source table contains single-byte kana characters or standard kanji characters, the data length increases after conversion and data overflow might occur. The following table shows how overflow is handled (from JIS8/Shift JIS to UTF-8).

Table 4-22 Handling of overflow (from JIS8/Shift JIS to UTF-8)

Location of overflow | Datareplicator's action | User's action
Literal or update data | SQL error (applicable if the data exceeds the defined length after conversion of all bytes) | Change the defined length and re-execute import processing.

Note:
If an identifier exceeds 30 bytes or update data exceeds 32,000 bytes, Datareplicator is unable to continue processing. Before starting data linkage, make sure that source system identifiers and update data will not exceed the permitted maximum lengths.
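
To estimate whether existing data would overflow a target column, the growth in byte length can be measured before starting data linkage. The following Python sketch uses Python's built-in codecs as a stand-in for Datareplicator's conversion; the function names utf8_length and overflows are hypothetical, and Gaiji characters are outside the scope of this check.

```python
def utf8_length(sjis_data: bytes) -> int:
    """Byte length of Shift JIS data after transcoding to UTF-8."""
    return len(sjis_data.decode("shift_jis").encode("utf-8"))


def overflows(sjis_data: bytes, defined_length: int) -> bool:
    """True if the converted data no longer fits the defined length."""
    return utf8_length(sjis_data) > defined_length


kana = b"\xb1\xb2\xb3\xb4"    # four single-byte kana characters (4 bytes)
kanji = b"\x8a\xbf\x8e\x9a"   # two double-byte kanji characters (4 bytes)

print(utf8_length(kana))      # 12: each kana grows from 1 byte to 3 bytes
print(utf8_length(kanji))     # 6: each kanji grows from 2 bytes to 3 bytes
print(overflows(kana, 4))     # True: a 4-byte column is too short
```
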
(f) Notes

(11) Rules for character code conversion from EUC to JIS8/Shift JIS

(a) Single-byte codes

Datareplicator converts a single-byte code to the corresponding Shift JIS character code.

(b) Double-byte codes (standard character codes)

Datareplicator converts a double-byte code to the corresponding Shift JIS character code. If the last character is the first byte of a double-byte code, Datareplicator converts it to a space character ((20)16).

(c) Double-byte codes (Gaiji)

Datareplicator converts a double-byte Gaiji code in accordance with the mapping table for converting character codes. If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.

(d) Space character

Datareplicator converts a double-byte space character ((A1A1)16) to the corresponding code ((8140)16). Datareplicator converts two consecutive single-byte space characters ((40)16) to a double-byte space character ((8140)16). One single-byte space character ((40)16) is converted to the corresponding code ((20)16).

(12) Rules for character code conversion from EUC to UTF-8

(a) Single-byte codes

Datareplicator converts a single-byte code, excluding kana characters, to the corresponding UTF-8 character code.

(b) Double-byte codes (standard character codes)
(c) Three-byte codes (Gaiji)

Datareplicator converts a three-byte code (Gaiji) in accordance with the user-created mapping table for converting character codes.

If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.

(d) Space character

Datareplicator converts the space character in shift code. The following table shows the conversion rules for the space character (from EUC to UTF-8).

Table 4-24 Conversion rules for the space character (from EUC to UTF-8)

EUC character code | UTF-8 character code
Double-byte space character (A1A1)16 | Double-byte space character (E38080)16
Single-byte space character (20)16 | Single-byte space character (20)16
(e) Handling of overflow

Datareplicator converts a double-byte kana character to a three-byte character and a double-byte standard kanji character to a three-byte character. If the source table contains double-byte kana characters or standard kanji characters, the data length increases after conversion and data overflow might occur. The following table shows how overflow is handled (from EUC to UTF-8).

Table 4-25 Handling of overflow (for conversion from EUC to UTF-8)

Location of overflow | Datareplicator's action | User's action
Literal or update data | SQL error (applicable if the data exceeds the defined length after conversion of all bytes) | Change the defined length and re-execute import processing.

Note:
If an identifier exceeds 30 bytes or update data exceeds 32,000 bytes, Datareplicator is unable to continue processing. Before starting data linkage, make sure that source system identifiers and update data will not exceed the permitted maximum lengths.
(f) Notes

(13) Rules for character code conversion from UTF-8 to EUC or JIS8/Shift JIS

(a) Single-byte codes

Datareplicator converts a single-byte code to the corresponding character code.

(b) Double-byte and three-byte codes (standard kanji)

Datareplicator converts a double-byte or three-byte code to the corresponding EUC or JIS8/Shift JIS character code.

(c) Three-byte codes (Gaiji)

Datareplicator converts a three-byte code in accordance with the user-created mapping table for converting character codes.

If a Gaiji character is not defined or no mapping table for converting character codes has been provided, Datareplicator converts the Gaiji code to a space character according to the value of the undefcode_cnv operand specified in the import environment definition.

(14) Suppression of character code conversion

You can suppress character code conversion for a column at the target Datareplicator by specifying suppression of character code conversion for the desired extracted column. For details about the specification to suppress character code conversion, see the update information field definition in 5.10 Import definition.

(15) Details of conversion rules for each character code

(a) Conversion rules for JIS8/Shift JIS codes

The following table shows the conversion rules for JIS8/Shift JIS codes.

Table 4-26 Conversion rules for JIS8/Shift JIS codes

1-byte       | 2-byte                        | 3-byte | Code conversion rule
0x00 to 0x80 | --                            | --     | Treats as JIS8 and converts it to the corresponding code.
0x81 to 0x9F | 0x40 to 0xFC (excluding 0x7F) | --     | Treats as SJIS (Kanji) and converts it to the corresponding code.
0x81 to 0x9F | Other                         | --     | Converts according to the specification of undefcode_cnv.
0x81 to 0x9F | --                            | --     | Treats as incomplete code and skips it without converting.
0xA0 to 0xDF | --                            | --     | Treats as JIS8 and converts it to the corresponding code.
0xE0 to 0xEF | 0x40 to 0xFC (excluding 0x7F) | --     | Treats as SJIS (Kanji) and converts it to the corresponding code.
0xE0 to 0xEF | Other                         | --     | Converts according to the specification of undefcode_cnv.
0xE0 to 0xEF | --                            | --     | Treats as incomplete code and skips it without converting.
0xF0 to 0xFC | 0x40 to 0xFC (excluding 0x7F) | --     | Treats as SJIS (Gaiji) and converts it to the corresponding code.
0xF0 to 0xFC | Other                         | --     | Converts according to the specification of undefcode_cnv.
0xF0 to 0xFC | --                            | --     | Treats as incomplete code and skips it without converting.
0xFD to 0xFF | --                            | --     | Treats as JIS8 and converts it to the corresponding code.
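
The rules in Table 4-26 amount to a classification by the first byte, followed by a range check on the second byte. The following Python sketch transcribes the table; the function name classify_sjis and the returned labels are hypothetical, and the number of bytes reported as consumed for the undefcode_cnv case is an assumption.

```python
from typing import Tuple


def classify_sjis(data: bytes, pos: int = 0) -> Tuple[str, int]:
    """Classify the code starting at data[pos] according to Table 4-26.

    Returns (rule, bytes_consumed). The labels are illustrative only.
    """
    b1 = data[pos]
    # 0x00-0x80, 0xA0-0xDF, and 0xFD-0xFF are treated as JIS8.
    if b1 <= 0x80 or 0xA0 <= b1 <= 0xDF or b1 >= 0xFD:
        return ("JIS8", 1)
    # Remaining first bytes: 0x81-0x9F and 0xE0-0xEF (Kanji), 0xF0-0xFC (Gaiji).
    kind = "SJIS (Gaiji)" if b1 >= 0xF0 else "SJIS (Kanji)"
    if pos + 1 >= len(data):
        return ("incomplete", 1)       # second byte is missing
    b2 = data[pos + 1]
    if 0x40 <= b2 <= 0xFC and b2 != 0x7F:
        return (kind, 2)
    return ("undefcode_cnv", 2)        # assumed to consume both bytes


print(classify_sjis(b"\x81\x40"))  # ('SJIS (Kanji)', 2)
print(classify_sjis(b"\xf0\x40"))  # ('SJIS (Gaiji)', 2)
print(classify_sjis(b"\x81\x7f"))  # ('undefcode_cnv', 2)
print(classify_sjis(b"\x81"))      # ('incomplete', 1)
```
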
(b) Conversion rules for EUC codes

The following table shows the conversion rules for EUC codes.

Table 4-27 Conversion rules for EUC codes

1-byte       | 2-byte       | 3-byte       | Code conversion rule
0x00 to 0x8D | --           | --           | Treats as code set 0 and converts it to the corresponding code.
0x8E         | 0xA0 to 0xFF | --           | Treats as code set 2 (kana characters) and converts it to the corresponding code.
0x8E         | Other        | --           | Converts according to the specification of undefcode_cnv.
0x8E         | --           | --           | Treats as incomplete code and skips it without converting.
0x8F         | 0xA1 to 0xFE | 0xA1 to 0xFE | Treats as code set 3 (Gaiji) and converts it to the corresponding code.
0x8F         | 0xA1 to 0xFE | Other        | Converts according to the specification of undefcode_cnv.
0x8F         | 0xA1 to 0xFE | --           | Treats as incomplete code and skips it without converting.
0x8F         | Other        | 0xA1 to 0xFE | Converts according to the specification of undefcode_cnv.
0x8F         | Other        | Other        | Converts according to the specification of undefcode_cnv.
0x8F         | Other        | --           | Treats as incomplete code and skips it without converting.
0x8F         | --           | --           | Treats as incomplete code and skips it without converting.
0x90 to 0x9F | --           | --           | Treats as code set 0 and converts it to the corresponding code.
0xA0         | --           | --           | Converts to 0x20.
0xA1 to 0xFE | 0xA1 to 0xFE | --           | Treats as code set 1 (Kanji) and converts it to the corresponding code.
0xA1 to 0xFE | Other        | --           | Converts according to the specification of undefcode_cnv.
0xA1 to 0xFE | --           | --           | Treats as incomplete code and skips it without converting.
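
The EUC rules in Table 4-27 can be sketched as a classifier over one to three bytes. The Python below is a transcription for illustration only: the function name classify_euc and the returned labels are hypothetical, and because the table does not list a first byte of 0xFF, the label returned for that byte is an assumption.

```python
def classify_euc(data: bytes) -> str:
    """Classify the code at the start of `data` according to Table 4-27."""
    b1, rest = data[0], data[1:]
    if b1 <= 0x8D or 0x90 <= b1 <= 0x9F:
        return "code set 0"
    if b1 == 0xA0:
        return "converted to 0x20"
    if b1 == 0x8E:                      # single shift 2: kana
        if not rest:
            return "incomplete"
        return "code set 2 (kana)" if rest[0] >= 0xA0 else "undefcode_cnv"
    if b1 == 0x8F:                      # single shift 3: Gaiji, three bytes
        if len(rest) < 2:
            return "incomplete"
        in_range = all(0xA1 <= b <= 0xFE for b in rest[:2])
        return "code set 3 (Gaiji)" if in_range else "undefcode_cnv"
    if b1 == 0xFF:
        return "not listed in Table 4-27"   # assumption: table has no row
    # Remaining first bytes are 0xA1 to 0xFE.
    if not rest:
        return "incomplete"
    return "code set 1 (Kanji)" if 0xA1 <= rest[0] <= 0xFE else "undefcode_cnv"


print(classify_euc(b"\x8e\xb1"))      # code set 2 (kana)
print(classify_euc(b"\x8f\xa1\xa1"))  # code set 3 (Gaiji)
print(classify_euc(b"\x8f\xa1"))      # incomplete
print(classify_euc(b"\xa1\xa1"))      # code set 1 (Kanji)
```
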
(c) Conversion rules for EBCDIK/KEIS codes

The following table shows the conversion rules for EBCDIK/KEIS codes.

Table 4-28 Conversion rules for EBCDIK/KEIS codes

Shift state          | 1-byte       | 2-byte       | 3-byte | Code conversion rule
Single-byte shift on | 0x00 to 0x09 | --           | --     | Converts to the corresponding code.
Single-byte shift on | 0x0A         | 0x41         | --     | Toggles single-byte shift without performing conversion.
Single-byte shift on | 0x0A         | 0x42         | --     | Toggles double-byte shift without performing conversion.
Single-byte shift on | 0x0A         | Other        | --     | Converts byte 1 or byte 2 as a single-byte code.
Single-byte shift on | 0x0A         | --           | --     | Treats as incomplete code and skips it without converting.
Single-byte shift on | 0x0B to 0xFF | --           | --     | Converts to the corresponding code.
Double-byte shift on | 0x00 to 0x40 | --           | --     | Converts byte 1 or byte 2 as a single-byte code.
Double-byte shift on | 0x41 to 0xA0 | 0xA1 to 0xFE | --     | Converts to the corresponding code.
Double-byte shift on | 0x41 to 0xA0 | Other        | --     | Converts according to the specification of undefcode_cnv.
Double-byte shift on | 0x41 to 0xA0 | --           | --     | Treats as incomplete code and skips it without converting.
Double-byte shift on | 0xA1 to 0xFE | 0xA1 to 0xFE | --     | Converts to the corresponding code.
Double-byte shift on | 0xA1 to 0xFE | Other        | --     | Converts according to the specification of undefcode_cnv.
Double-byte shift on | 0xA1 to 0xFE | --           | --     | Treats as incomplete code and skips it without converting.
Double-byte shift on | 0xFF         | 0x01 to 0xFF | --     | Converts according to the specification of undefcode_cnv.
Double-byte shift on | 0xFF         | --           | --     | Treats as incomplete code and skips it without converting.
(d) Conversion rules for UTF-8 codes

The following table shows the conversion rules for UTF-8 codes.

Table 4-29 Conversion rules for UTF-8 codes

1-byte       | 2-byte       | 3-byte       | Code conversion rule
0x00 to 0x7F | --           | --           | Treats as single-byte code and converts it to the corresponding code.
0x80 to 0xBF | 0x00 to 0xFF | --           | Converts to 0x20.
0x80 to 0xBF | --           | --           | Converts to 0x20.
0xC2 to 0xDE | 0x80 to 0xFF | --           | Treats as double-byte code and converts it to the corresponding code.
0xC2 to 0xDE | Other        | --           | Converts according to the specification of undefcode_cnv.
0xC2 to 0xDE | --           | --           | Treats as incomplete code and skips it without converting.
0xDF         | 0x80 to 0xBF | --           | Treats as double-byte code and converts it to the corresponding code.
0xDF         | Other        | --           | Converts according to the specification of undefcode_cnv.
0xDF         | --           | --           | Treats as incomplete code and skips it without converting.
0xE0         | 0xA0 to 0xFF | 0x80 to 0xFF | Treats as three-byte code and converts it to the corresponding code.
0xE0         | 0xA0 to 0xFF | Other        | Converts according to the specification of undefcode_cnv.
0xE0         | 0xA0 to 0xFF | --           | Treats as incomplete code and skips it without converting.
0xE0         | Other        | 0x80 to 0xFF | Converts according to the specification of undefcode_cnv.
0xE0         | Other        | Other        | Converts according to the specification of undefcode_cnv.
0xE0         | Other        | --           | Treats as incomplete code and skips it without converting.
0xE0         | --           | --           | Treats as incomplete code and skips it without converting.
0xE1 to 0xEE | 0x80 to 0xFF | 0x80 to 0xFF | Treats as three-byte code and converts it to the corresponding code.
0xE1 to 0xEE | 0x80 to 0xFF | Other        | Converts according to the specification of undefcode_cnv.
0xE1 to 0xEE | 0x80 to 0xFF | --           | Treats as incomplete code and skips it without converting.
0xE1 to 0xEE | Other        | 0x80 to 0xFF | Converts according to the specification of undefcode_cnv.
0xE1 to 0xEE | Other        | --           | Treats as incomplete code and skips it without converting.
0xE1 to 0xEE | --           | --           | Treats as incomplete code and skips it without converting.
0xEF         | 0x80 to 0xBF | 0x80 to 0xBF | Treats as three-byte code and converts it to the corresponding code.
0xEF         | 0x80 to 0xBF | Other        | Converts according to the specification of undefcode_cnv.
0xEF         | 0x80 to 0xBF | --           | Treats as incomplete code and skips it without converting.
0xEF         | Other        | 0x80 to 0xBF | Converts according to the specification of undefcode_cnv.
0xEF         | Other        | Other        | Converts according to the specification of undefcode_cnv.
0xEF         | Other        | --           | Treats as incomplete code and skips it without converting.
0xEF         | --           | --           | Treats as incomplete code and skips it without converting.
Other        | --           | --           | Converts according to the specification of undefcode_cnv.