Hitachi Advanced Database Setup and Operation Guide


5.4.6 Selecting the delimiting characters for word-context searches (DELIMITER)

When you define a text index for a word-context search, you must specify which type of delimiting characters separates one word from another. For details about text indexes that support word-context search, see 5.4.5 Specifying a text index for a word-context search (TEXT WORDCONTEXT).

To specify the delimiting characters, you use the DELIMITER (text-index delimiter specification) operand of the CREATE INDEX statement. You must specify DELIMITER if you specify TEXT WORDCONTEXT for INDEXTYPE.

As the delimiting character type, you can specify DEFAULT or ALL. You must specify one or the other when specifying DELIMITER. The following shows which characters are handled as delimiting characters when DEFAULT is specified and when ALL is specified.

DEFAULT

When performing a word-context search, the following characters are handled as delimiting characters:

  • Single-byte spaces (0x20)

  • Tabs (0x09)

  • Line feeds (0x0A)

  • Carriage returns (0x0D)

  • Periods (0x2E)

  • Question marks (0x3F)

  • Exclamation marks (0x21)

ALL

When performing a word-context search, the following characters are handled as delimiting characters:

  • Single-byte spaces (0x20)

  • Tabs (0x09)

  • Line feeds (0x0A)

  • Carriage returns (0x0D)

  • Half-width symbols including periods, question marks, and exclamation marks (0x21 to 0x2F, 0x3A to 0x40, 0x5B to 0x60, and 0x7B to 0x7E)
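The two character sets listed above can be modeled as sets of byte values to see how they differ. This is only an illustrative sketch: the names WHITESPACE, DEFAULT_DELIMITERS, ALL_DELIMITERS, and only_in_all are hypothetical and are not part of HADB.

```python
# Illustrative model of the two delimiter sets; this is not HADB code.
WHITESPACE = {0x20, 0x09, 0x0A, 0x0D}  # space, tab, line feed, carriage return

# DEFAULT: whitespace plus period, question mark, and exclamation mark.
DEFAULT_DELIMITERS = WHITESPACE | {0x2E, 0x3F, 0x21}


def _byte_range(first, last):
    """Return the set of byte values from first to last, inclusive."""
    return set(range(first, last + 1))


# ALL: whitespace plus the four ranges of half-width symbols.
ALL_DELIMITERS = (
    WHITESPACE
    | _byte_range(0x21, 0x2F)
    | _byte_range(0x3A, 0x40)
    | _byte_range(0x5B, 0x60)
    | _byte_range(0x7B, 0x7E)
)


def only_in_all():
    """Characters that delimit words under ALL but not under DEFAULT."""
    return "".join(chr(b) for b in sorted(ALL_DELIMITERS - DEFAULT_DELIMITERS))


print(only_in_all())
```

In this model every DEFAULT delimiter is also an ALL delimiter, so ALL can only split text into more words than DEFAULT does, never fewer.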

The result of a word-context search of English text that contains symbols differs according to the delimiting characters. Keep this in mind when selecting the type of delimiting characters. The following explains cases that benefit from specifying DEFAULT, and cases that benefit from specifying ALL.

■ Cases that benefit from specifying DEFAULT

When you use a word-context search to find data that includes symbols, we recommend that you specify DEFAULT. Typical examples are searches for URLs and email addresses. If you specify DEFAULT and search with taro@hitachi.com as the search term, the period is a delimiting character but the at sign (@) is not, so taro@hitachi is handled as one word. This allows you to retrieve only data that contains taro@hitachi.

If you specify ALL, the at sign is also a delimiting character, so taro and hitachi are handled as separate words. This means that the word-context search will also retrieve data such as xxxxx@taro.hitachi.com and taro.hitachi@com.
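The email address example can be reproduced with a small model of word splitting. The sketch below assumes that a word-context search matches when the words of the search term appear consecutively in the data. That matching rule, and the names split_words and matches, are assumptions made for illustration; the actual search behavior is defined by HADB.

```python
# Illustrative model only; this is not HADB code.
DEFAULT_CHARS = " \t\n\r.?!"
ALL_CHARS = " \t\n\r" + "".join(
    chr(code)
    for first, last in [(0x21, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E)]
    for code in range(first, last + 1)
)


def split_words(text, delimiters):
    """Split text into words at any run of delimiting characters."""
    words, current = [], []
    for ch in text:
        if ch in delimiters:
            if current:
                words.append("".join(current))
                current = []
        else:
            current.append(ch)
    if current:
        words.append("".join(current))
    return words


def matches(term, data, delimiters):
    """Assumed rule: the words of the term occur consecutively in the data."""
    t = split_words(term, delimiters)
    d = split_words(data, delimiters)
    return any(d[i:i + len(t)] == t for i in range(len(d) - len(t) + 1))


TERM = "taro@hitachi.com"
for data in ["taro@hitachi.com", "xxxxx@taro.hitachi.com", "taro.hitachi@com"]:
    # Columns: data, match under DEFAULT, match under ALL
    print(data, matches(TERM, data, DEFAULT_CHARS), matches(TERM, data, ALL_CHARS))
```

Under this model, only the first data value matches with the DEFAULT set, while all three match with the ALL set, which agrees with the behavior described above.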

■ Cases that benefit from specifying ALL

When you use a word-context search to find data that does not include symbols, we recommend that you specify ALL. Typical examples are searches for phrases that consist of multiple words, such as high∆speed. If you specify ALL and search with high∆speed as the search term, the word-context search will retrieve not only high∆speed but also similar phrases such as high-speed and high_speed. Here, ∆ represents a half-width space.
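The phrase example can be checked the same way by comparing the word lists that each delimiter type produces. This is a sketch under the assumption that two strings are treated as the same phrase when they split into the same words; the helper name words is hypothetical and not part of HADB.

```python
# Illustrative model only; this is not HADB code.
DEFAULT_CHARS = " \t\n\r.?!"
ALL_CHARS = " \t\n\r" + "".join(
    chr(code)
    for first, last in [(0x21, 0x2F), (0x3A, 0x40), (0x5B, 0x60), (0x7B, 0x7E)]
    for code in range(first, last + 1)
)


def words(text, delimiters):
    """Replace every delimiting character with a space, then split on spaces."""
    table = {ord(ch): " " for ch in delimiters}
    return [w for w in text.translate(table).split(" ") if w]


for phrase in ["high speed", "high-speed", "high_speed"]:
    # Columns: phrase, words under ALL, words under DEFAULT
    print(repr(phrase), words(phrase, ALL_CHARS), words(phrase, DEFAULT_CHARS))
```

With the ALL set, all three phrases split into the words high and speed. With the DEFAULT set, the hyphen and underscore are not delimiting characters, so high-speed and high_speed each stay a single word.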

■ Notes

The DELIMITER specification might be ignored if the table targeted by the word-context search is an internal derived table.

  • If the derived table is expanded

    The DELIMITER specification is valid for the text index for the word-context search defined for columns in the result of expanding the internal derived table.

  • If the derived table is not expanded

    If ALL is specified for DELIMITER, the specification is ignored. The word-context search is executed with the same delimiting characters as if DEFAULT were specified for DELIMITER.

Note that this rule does not apply to internal derived tables obtained by equivalent exchange of an archivable multi-chunk table.

For details about the rules for derived table expansion, see Internal derived tables in Constituent Elements in the manual HADB SQL Reference.
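The two cases in the notes can be summarized as a small decision function. This is only a sketch of the rule as stated here: the function effective_delimiter and its parameters are hypothetical, and it deliberately does not model the exception for internal derived tables obtained by equivalent exchange of an archivable multi-chunk table.

```python
# Illustrative summary of the notes; this is not HADB code.
def effective_delimiter(specified, derived_table_expanded):
    """Return the delimiter type that applies to an internal derived table.

    specified: "DEFAULT" or "ALL", as given for DELIMITER.
    derived_table_expanded: True if the internal derived table is expanded.
    """
    if specified not in ("DEFAULT", "ALL"):
        raise ValueError("DELIMITER must be DEFAULT or ALL")
    if derived_table_expanded:
        # The DELIMITER specification is valid as written.
        return specified
    # Not expanded: ALL is ignored and DEFAULT behavior applies.
    return "DEFAULT"


print(effective_delimiter("ALL", derived_table_expanded=False))
```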