2.17.1 Correction search
This subsection describes correction search for text data.
(1) Overview of correction search
Correction search is a function that searches text data while ignoring the differences between uppercase and lowercase letters, between half-width and full-width characters, and between Japanese hiragana and katakana characters. For example, if you specify Hitachi as the search string, the function finds not only Hitachi but also notational variants such as HITACHI in the same search. The function is useful for text data that has notational inconsistencies, because a single search has the same effect as executing an SQL statement in which multiple search conditions are combined by using the OR operator.
Notational inconsistencies are likely to arise in text data created by multiple people, such as call center log records and applications for use of products. Correction search is convenient when you want to retrieve from such text data all records that include a specific keyword (such as the name of a company, product, or person in charge).
The following figure shows an example of correction search.
- Explanation: If Hitachi is specified as a search condition, the function retrieves all character strings that include the word Hitachi, irrespective of character case or width.
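The behavior described above can be approximated outside the database to see which variants a keyword would match. The following is a minimal sketch, assuming that Unicode NFKC normalization, case folding, and a katakana-to-hiragana shift are an acceptable stand-in for the correction rules. The names `fold` and `correction_contains` and the sample records are hypothetical, and this is not how HADB implements the function.

```python
import unicodedata

# Full-width katakana block that has hiragana counterparts (assumption of this sketch).
KATAKANA_START, KATAKANA_END = 0x30A1, 0x30F6
KANA_OFFSET = 0x60  # distance between a katakana code point and its hiragana counterpart


def fold(text):
    """Fold width, case, and hiragana/katakana differences into one form."""
    # NFKC maps full-width letters to half-width and half-width katakana to full-width.
    text = unicodedata.normalize("NFKC", text).casefold()
    return "".join(
        chr(ord(ch) - KANA_OFFSET) if KATAKANA_START <= ord(ch) <= KATAKANA_END else ch
        for ch in text
    )


def correction_contains(text, keyword):
    """Return True if text contains keyword after both are folded."""
    return fold(keyword) in fold(text)


# Hypothetical log records with notational inconsistencies.
records = ["Hitachi support call", "HITACHI 製品", "ｈｉｔａｃｈｉ ＤＢ", "Toshiba"]
matches = [r for r in records if correction_contains(r, "Hitachi")]
```

Here one folded comparison takes the place of several OR-combined conditions, one per notational variant.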
To perform correction search, use the scalar function CONTAINS. For details about the scalar function CONTAINS, see CONTAINS in Character string functions (acquisition of character string information) in Scalar Functions in the manual HADB SQL Reference.
Note that if you perform correction search, you can reduce the number of pages to be loaded by defining a text index that supports correction search. This improves table retrieval performance.
- Important: The correction search function can be used if the character encoding used on the HADB server is Unicode (the value specified for the environment variable ADBLANG is UTF8). It cannot be used if the character encoding is Shift-JIS (the value specified for the environment variable ADBLANG is SJIS).
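The encoding condition above can be checked before a job that relies on correction search is started. The following is a minimal sketch, assuming the check runs in an environment where ADBLANG is visible. The function name `correction_search_available` is hypothetical, and treating an unset variable as "not available" is a choice of this sketch, not documented HADB behavior.

```python
import os


def correction_search_available(env=None):
    """Return True only if ADBLANG is set to UTF8 in the given environment."""
    env = os.environ if env is None else env
    # Assumption of this sketch: an unset ADBLANG is treated as "not available".
    return env.get("ADBLANG") == "UTF8"
```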
(2) Rules of correction search
The following table describes the rules of correction search.
No. | Character type | Rule | Example of correction search |
---|---|---|---|
1 | Alphabetic character | Correction search ignores the differences among the following character types: | If max is specified as a search condition, the correction search function searches for the following character strings: |
2 | Number | Correction search ignores the differences between the following character types: | |
3 | | ▪ Japanese hiragana and katakana characters: Correction search ignores the differences between the following character types: ▪ Full-width and half-width katakana characters: Correction search ignores the differences between the following character types: ▪ Japanese dakuten and handakuten marks: If one of the following characters is followed by a full-width or half-width dakuten mark (voiced sound mark), the correction search function assumes that the character and sign make up a single character. The same rule also applies to a full-width or half-width handakuten mark (semi-voiced sound mark). ▪ Japanese youon and sokuon signs: The correction search function equates Japanese youon and sokuon signs to their corresponding regular-sized characters. ▪ Japanese ombiki sign: The correction search function does not ignore whether the ombiki sign is used, and treats the sign as a single ordinary character. | ▪ Example 1 ▪ Example 2 ▪ Example 3 |
4 | Diacritical marks | The correction search function ignores the difference between the following character types: | The correction search function assumes the following characters to be the same: Therefore, if MAX is specified as a search condition, the correction search function searches for the following character strings: |
5 | Single character representing a specific character string | If the correction search function encounters a character that represents a specific character string, the function expands the character to the character string. The function equates the character to the character string. | The correction search function assumes the following characters to be the same: |
For hiragana, full-width katakana, and half-width katakana characters, a character with a dakuten or handakuten sign and a character without a dakuten or handakuten sign are assumed to be different. Therefore, the correction search function does not ignore the difference between those characters. The following shows examples. In the following combinations of characters, each combination uses the same (dakuten or handakuten) mark. Therefore, both characters are assumed to be the same character.
-
and
-
and
-
and
On the other hand, , , and do not have the same (dakuten or handakuten) mark. Therefore, these characters are assumed to be different. For example, if is specified as a search condition, and can be retrieved. However, , , , and cannot be retrieved because the correction search function does not ignore the difference among them.
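The kana rules above can be illustrated with a small folding function. The following is a minimal sketch, assuming Unicode normalization as a stand-in for the product's rules. The name `fold_kana` and the sample strings are hypothetical and chosen for this illustration only. A kana followed by a standalone dakuten or handakuten mark is joined into one character, small youon and sokuon kana are mapped to regular size, and the ombiki sign is left in place as an ordinary character.

```python
import unicodedata

# Standalone full-width and half-width marks, mapped to the combining marks
# so that normalization can join them to the preceding kana.
MARKS = str.maketrans({
    "\u309b": "\u3099",  # full-width dakuten
    "\uff9e": "\u3099",  # half-width dakuten
    "\u309c": "\u309a",  # full-width handakuten
    "\uff9f": "\u309a",  # half-width handakuten
})
# Small (youon and sokuon) hiragana mapped to regular-sized hiragana.
SMALL_TO_REGULAR = str.maketrans("ぁぃぅぇぉっゃゅょゎ", "あいうえおつやゆよわ")


def fold_kana(text):
    """Fold kana width, script, and size; keep dakuten and ombiki significant."""
    text = text.translate(MARKS)
    # NFKC widens half-width katakana and composes kana + combining mark.
    text = unicodedata.normalize("NFKC", text)
    # Shift full-width katakana to the corresponding hiragana.
    text = "".join(
        chr(ord(c) - 0x60) if 0x30A1 <= ord(c) <= 0x30F6 else c for c in text
    )
    return text.translate(SMALL_TO_REGULAR)


# Character + mark is one character, whatever the width of each part.
same_voiced = fold_kana("ｶﾞ") == fold_kana("ガ") == fold_kana("か゛") == fold_kana("が")
# A character with a dakuten differs from the same character without one.
voiced_differs = fold_kana("が") != fold_kana("か")
# The ombiki sign is not ignored.
ombiki_differs = fold_kana("サーバ") != fold_kana("サバ")
```

In this sketch the dakuten and handakuten marks stay significant because they are composed into the character rather than stripped.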
Correction search uses sort codes, which are codes specified in the ISO/IEC 14651:2011 standard for sorting and comparing characters. Characters that have the same sort code are assumed to be the same character.
- Note
  - For symbols and other characters that are assumed to be non-existent in the sort codes, the bytecode is used for sorting and comparison.
  - The sort code is used if SORTCODE(simple-string-specification) is specified as notation-correction-search-specification of the scalar function CONTAINS.
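The idea that characters with the same sort code are the same character, with a fallback for characters that have no usable code, can be sketched as follows. This is only an illustration under stated assumptions: the key produced by the hypothetical `fold_char` is not an ISO/IEC 14651 sort code, the rule "non-alphanumeric characters have no code" is an assumption of this sketch, and the sample characters are chosen for the example rather than taken from the manual.

```python
import unicodedata

# Combining dakuten and handakuten stay significant (see the kana rules).
KANA_MARKS = {"\u3099", "\u309a"}


def fold_char(ch):
    """Return a stand-in 'sort code' for one character, or the raw character."""
    decomposed = unicodedata.normalize("NFKD", ch)
    # Drop diacritical marks, but keep the kana voicing marks.
    kept = "".join(
        c for c in decomposed if not unicodedata.combining(c) or c in KANA_MARKS
    )
    folded = unicodedata.normalize("NFC", kept).casefold()
    if folded and folded.isalnum():
        return folded
    # No code in this sketch: the character is compared by its own value,
    # which for equality is the same as comparing its encoded bytes.
    return ch


def sort_key(text):
    """Concatenate the per-character keys of a string."""
    return "".join(fold_char(ch) for ch in unicodedata.normalize("NFC", text))


def same(a, b):
    return sort_key(a) == sort_key(b)
```

With this key, `same("MÄX", "max")` and `same("㎏", "kg")` hold because the diacritical mark is dropped and the single character is expanded to its character string, while a symbol such as `!` keeps its own value.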