1.3 Character sets

Organization of this section
(1) Description
(2) Format
(3) Rules
(4) Notes

(1) Description

A character set defines the properties of character data, based on the following three attributes: usage format, character repertoire, and default collation sequence.

(2) Format

character-set-specification ::= [MASTER.]character-set-name

character-set-name ::= {EBCDIK|UTF16}
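
The format above can be checked mechanically. The following is a minimal Python sketch, not part of HiRDB: the function name `parse_character_set_specification` is hypothetical, and case-insensitive matching is an assumption, because the format does not state how letter case is handled.

```python
import re

# Optional "MASTER." qualifier followed by one of the names allowed by the
# character-set-name production (EBCDIK or UTF16).
_SPEC_PATTERN = re.compile(r"(MASTER\.)?(EBCDIK|UTF16)")


def parse_character_set_specification(text):
    """Return (has_master_qualifier, character_set_name).

    Raises ValueError if text does not match the format.
    """
    # Assumption: names are matched case-insensitively by upper-casing first.
    match = _SPEC_PATTERN.fullmatch(text.strip().upper())
    if match is None:
        raise ValueError(f"not a character-set-specification: {text!r}")
    return (match.group(1) is not None, match.group(2))


print(parse_character_set_specification("MASTER.UTF16"))  # (True, 'UTF16')
print(parse_character_set_specification("EBCDIK"))        # (False, 'EBCDIK')
```

Any name other than EBCDIK or UTF16, such as UTF8, is rejected with a ValueError in this sketch.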

(3) Rules

  1. The following table lists the character sets that are available in HiRDB.

    Table 1-9 Character sets available in HiRDB

    Character set name | Usage format | Character repertoire | Default collation sequence
    EBCDIK | EBCDIK code. Characters are represented by 8-bit (single-byte) character codes. | All EBCDIK-encoded characters | Code ordering based on bit combinations
    UTF16 | Characters are represented in the character encoding format defined by JIS X 0221 (ISO/IEC 10646), in which each character is encoded as two or four bytes. Byte order is big-endian. | All Unicode characters | Code ordering based on bit combinations
  2. EBCDIK can be specified as the character set only if sjis is specified as the character code classification in the pdntenv command (pdsetup command in the UNIX edition).
  3. UTF16 can be specified as the character set only if utf-8 is specified as the character code classification in the pdntenv command (pdsetup command in the UNIX edition).
  4. A character set can be specified in any place where a character data type can be specified. A character set cannot be specified for the mixed character data type or national character data type.
  5. If no character set is specified, the character set is determined by the character code classification specified in the pdntenv command (pdsetup command in the UNIX edition). The character set that is assumed when no character set is specified is called the default character set. The following table lists the default character set that is assumed based on the character code classification specified in the pdntenv command (pdsetup command in the UNIX edition).

    Table 1-10 Default character sets for character codes specified in the pdntenv (pdsetup) command

    Character code specified in command | Default character set
    sjis | Shift JIS kanji code
    chinese | EUC Chinese kanji code
    ujis | EUC Japanese kanji code
    utf-8 | Unicode (UTF-8)
    lang-c | Single-byte character code
    chinese-gb18030 | Chinese kanji code (GB18030)
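
Rules 2, 3, and 5 can be summarized as a lookup on the character code classification. The following Python sketch only illustrates that logic and is not how HiRDB implements it; the names `DEFAULT_CHARACTER_SETS`, `EXPLICIT_CHARACTER_SETS`, and `resolve_character_set` are hypothetical.

```python
# Default character set for each character code classification (Table 1-10).
DEFAULT_CHARACTER_SETS = {
    "sjis": "Shift JIS kanji code",
    "chinese": "EUC Chinese kanji code",
    "ujis": "EUC Japanese kanji code",
    "utf-8": "Unicode (UTF-8)",
    "lang-c": "Single-byte character code",
    "chinese-gb18030": "Chinese kanji code (GB18030)",
}

# Character set that may be specified explicitly (rules 2 and 3).
# Classifications not listed here allow no explicit character set.
EXPLICIT_CHARACTER_SETS = {
    "sjis": "EBCDIK",
    "utf-8": "UTF16",
}


def resolve_character_set(classification, specified=None):
    """Return the character set that applies to a character data type."""
    if classification not in DEFAULT_CHARACTER_SETS:
        raise ValueError(f"unknown character code classification: {classification!r}")
    if specified is None:
        # Rule 5: no character set specified, so the default applies.
        return DEFAULT_CHARACTER_SETS[classification]
    if EXPLICIT_CHARACTER_SETS.get(classification) != specified:
        raise ValueError(
            f"{specified} cannot be specified when the classification is {classification}"
        )
    return specified


print(resolve_character_set("sjis"))            # Shift JIS kanji code
print(resolve_character_set("utf-8", "UTF16"))  # UTF16
```

In this sketch, a combination such as UTF16 with the ujis classification raises a ValueError, which mirrors the restriction in rule 3.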

(4) Notes

To use data encoded as UTF-16 in the ? parameter, specify the character set name in the character set descriptor area. Specifying UTF-16 data handling in the preprocessing options or embedded variable definitions allows data encoded as UTF-16 to also be used in embedded variables. In this case, the SQL preprocessor determines the character set name based on the specified preprocessing options and the embedded variable.

In addition to UTF16, either UTF-16LE or UTF-16BE can be specified as the character set name.

In the following descriptions, the UTF-16 character set name is assumed to include UTF-16LE and UTF-16BE.

For details about specifying the character set in preprocessing options, embedded variable definitions, or the character set descriptor area, see the HiRDB Version 9 UAP Development Guide.
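
To make the byte-level difference between big-endian and little-endian UTF-16 concrete, the following Python sketch encodes a few characters with the standard `utf-16-be` and `utf-16-le` codecs. It is an illustration only: the helper names are hypothetical, and reading "code ordering based on bit combinations" as a plain comparison of the encoded bytes is an assumption.

```python
def encode_utf16(text, byte_order="big"):
    """Encode text as UTF-16 without a byte order mark."""
    if byte_order not in ("big", "little"):
        raise ValueError(f"unknown byte order: {byte_order!r}")
    codec = "utf-16-be" if byte_order == "big" else "utf-16-le"
    return text.encode(codec)


def sort_by_encoded_bytes(strings):
    # Assumption: ordering by bit combinations is modeled as comparing the
    # big-endian encoded bytes of each string.
    return sorted(strings, key=encode_utf16)


print(encode_utf16("A").hex())                # 0041
print(encode_utf16("A", "little").hex())      # 4100
print(encode_utf16("\U0001F600").hex())       # d83dde00
print(len(encode_utf16("\U0001F600")))        # 4
```

Characters in the Basic Multilingual Plane occupy two bytes, and all other characters occupy four bytes as a surrogate pair. Under the byte-comparison assumption, a four-byte character such as U+10000 (bytes D8 00 DC 00) sorts before a two-byte character such as U+FF5E (bytes FF 5E), which differs from ordering by code point.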