7.19.5 LIKE_REGEX predicate

Use the LIKE_REGEX predicate to search data by using a regular expression.

Organization of this subsection

(1) Specification format
(2) Explanation of specification format
(3) Regular expression specification examples
(4) Evaluation of predicates
(5) Rules
(6) Considerations for performance
(7) Example

(1) Specification format

LIKE_REGEX-predicate ::= match-value [NOT] LIKE_REGEX
                         regular-expression-string [FLAG {I | IGNORECASE}]
  match-value ::= value-expression
  regular-expression-string ::= character-string-literal

To Page Top

(2) Explanation of specification format

match-value:

Specifies the data to be retrieved in the form of a value expression. For details about value expressions, see 7.20 Value expression.

You must specify CHAR or VARCHAR type data for match-value.

NOT:

If you specify NOT, character strings that do not include any character string elements represented by the specified regular expression string are retrieved.

regular-expression-string:

Specify a regular expression in the form of a character string literal. For details about character string literals, see 6.3 Literals.

The regular expression string must be no more than 1,024 bytes long.

Specify the regular expression string in the following format:

regular-expression::={[regular-item]|regular-expression vertical-bar regular-expression}

regular-item::={regular-factor|regular-item regular-factor}

regular-factor::={regular-primary|regular-primary *|regular-primary +|regular-primary ?
             |regular-primary repetition-factor}

repetition-factor::=left-curly-bracket lower-limit [,[upper-limit]]right-curly-bracket

regular-primary::={character-specifier|character-class|.|^|$|regular-character-set|(regular-expression)}

character-specifier::={non-escaped-character|escaped-character}

regular-character-set::={[character-list...]|[^character-list...]}

character-list::={character-specifier|character-specifier - character-specifier|regular-character-set-identifier-specification}

regular-character-set-identifier-specification::=[:regular-character-set-identifier:]

The following table describes the regular expression rules.

Table 7‒7: Regular expression rules
No.	Regular expression		Meaning
1	Character specifier		Means a character string of length 1 (unit: characters).
2	`.` (period)		Means any character of length 1 (unit: characters).
3	`^` (caret)		Means the beginning of a match value. For a match value that includes a line break, this symbol does not mean the beginning of the line after the line break. If a caret (`^`) is specified in a pair of square brackets that enclose a regular character set, this pattern means any characters other than the listed characters.
4	`$` (dollar sign)		Means the end of a match value. For a match value that includes a line break, this symbol does not mean the end of the line before the line break.
5	regular-primary`*`		Means zero or more repetitions of the preceding regular primary.
6	regular-primary`+`		Means one or more repetitions of the preceding regular primary.
7	regular-primary`?`		Means zero or one repetition of the preceding regular primary.
8	regular-expression \| regular-expression		Means the regular expression that is specified to the left or right of the vertical bar (\|).
9	regular-primary`{n}` regular-primary`{n,m}` regular-primary`{n,}`		Means a repetition of the preceding regular primary. The following describes the specification patterns: `{n}`: The preceding regular primary is repeated `n` times. `{n,m}`: The preceding regular primary is repeated by the number of times in the range from `n` to `m`. `{n,}`: The preceding regular primary is repeated at least `n` times.
10	upper-limit		An integer in the range from 0 to 256.
11	lower-limit		An integer in the range from 0 to 256.
12	`[`character-list...`]`		Means any of the listed characters.
13	`[^`character-list...`]`		Means any characters other than the listed characters.
14	character-specifier-1 - character-specifier-2		Means any character in the range (of character codes) from character-specifier-1 to character-specifier-2. If an option that ignores case (`FLAG I` or `FLAG IGNORECASE` is specified), this pattern means the following, in addition to the half-width letters in the specified range of character codes: the corresponding half-width uppercase letter for any half-width lowercase letter in the specified range, and the corresponding half-width lowercase letter for any half-width uppercase letter in the specified range.
15	regular-character-set-identifier	`alpha`	Means any half-width uppercase letter (other than `\`, `@`, and `#`) or half-width lowercase letter. The meaning is the same as `[a-zA-Z]`.
16		`upper`	Means any half-width uppercase letter (other than `\`, `@`, and `#`). The meaning is the same as `[A-Z]`.
17		`lower`	Means any half-width lowercase letters. The meaning is the same as `[a-z]`.
18		`digit`	Means any number. The meaning is the same as `[0-9]`.
19		`alnum`	Means any half-width uppercase letter (other than `\`, `@`, and `#`), half-width lowercase letter, or number. The meaning is the same as `[a-zA-Z0-9]`.
20		`space`	Means a half-width space, tab, carriage return, linefeed, vertical tab, or page break character.
21		`blank`	Means a half-width space or tab.
22		`cntrl`	Means a control character. Specifically, this means 0x7f or any of the character codes in the range from 0x00 to 0x1f.
23		`graph`	Means any of the character codes in the range from 0x21 to 0x7e.
24		`print`	Means any of the character codes in the range from 0x20 to 0x7e.
25		`punct`	Means a single-byte symbolic character whose code is 0x7e or lower. The meaning is the same as [!-/\:-@\[-`\{-~].
26		`xdigit`	Means a hexadecimal character. The meaning is the same as `[a-fA-F0-9]`.
27	character-class	`\d`	Means any number. The meaning is the same as `[0-9]`.
28		`\D`	Means any character that is not a number. The meaning is the same as `[^0-9]`.
29		`\w`	Means any half-width uppercase letter (other than `\`, `@`, and `#`), any half-width lowercase letter, any number, or an underscore (`_`). The meaning is the same as `[a-zA-Z0-9_]`.
30		`\W`	Means any character that is not a half-width uppercase letter (other than `\`, `@`, and `#`), half-width lowercase letter, number, or underscore (`_`). The meaning is the same as `[^a-zA-Z0-9_]`.
31		`\s`	Means a half-width space, tab, carriage return, linefeed, vertical tab, or page break character.
32		`\S`	Means any character that is not a half-width space, tab, carriage return, linefeed, vertical tab, or page break character.
33		`\A`	Means the beginning of a match value.
34		`\Z`	Means the end of a match value.

FLAG {I|IGNORECASE}:

Specify this option to perform a search that ignores the difference between half-width uppercase letters (other than \, @, and #) and half-width lowercase letters.

Specifications of I and IGNORECASE are equivalent.

Note that if the character encoding that is used on the HADB server is Shift-JIS, this option cannot be specified.

To Page Top

(3) Regular expression specification examples

The following shows typical regular expression specification examples.

Table 7‒8: Typical regular expression specification examples
No.	Method	Pattern	Meaning	Example	Matched string
1	Forward match	`^nnn`	Begins with `nnn`	`^ACT`	`ACT`, `ACTOR`, `ACTION`, and other character strings that begin with `ACT`
2	Backward match	`nnn$`	Ends with `nnn`	`ING$`	`ING`, `BEING`, `HAVING`, and other character string that end with `ING`
3	Partial match	`nnn`	Includes `nnn` in any place	`Sun`	`Sun`, `Sunday`, `Sundays`, and other character strings that include `Sun`
4	Exact match	`^nnn$`	Equals to `nnn`	`^EQUAL$`	`EQUAL`
5	Middle match	`.nnn.`	Includes `nnn` that follows any character and is followed by any character	`.I.`	`BIT`, `HIT`, `KIT`, and other three-character strings whose second character is `I`
6	One or more repetitions	`mmm[0-9]+` or `mmm[[:digit:]]+`	Includes `mmm` in any place, and `mmm` is followed by any number	`KFAA[0-9]+` or `KFAA[[:digit:]]+`	`KFAA123`, `KFAA11104-E`, `KFAA11901-E`, and other character strings that begin with `KFAA` that is followed by any number
7	Selection of some characters	`^mmm.(n\|o)` or `^mmm.[no]`	Begins with `mmm` and contains n or o at the `i`-th character (`i` is a numeric value)	`^KFAA.(W\|E)` or `^KFAA.[WE]`	`KFAA20008-W`, `KFAA11901-E`, and other character strings that begin with `KFAA`, followed by `W` or `E`
8	`n` repetitions	`mmm{n}`	Begins with `mmm`, the last character of which is repeated `n` times	`123{3}`	`12333`

To Page Top

(4) Evaluation of predicates

If the specified match value includes an element of the character string set that is represented by the regular expression string, the predicate evaluates to true. In other cases, the predicate evaluates to false. Note that if the length of the regular expression string is 0, the predicate evaluates to true when the match value is not a null value.

If NOT is specified, the predicate evaluates to true when the specified match value does not include any string elements that are represented by the regular expression string. In other cases, the predicate evaluates to false. Note that if the length of the regular expression string is 0, the predicate evaluates to false when the match value is not a null value.

If the match value is a null value, the predicate will have no value.

To Page Top

(5) Rules

(a) Rules pertaining to the match value

If a dynamic parameter is specified by itself as the match value, the assumed data type of the dynamic parameter will be VARCHAR(32000).

(b) Rules pertaining to the escape character

If a backslash (\) is included in a regular expression string, the backslash (\) is treated as an escape character.
A special character that follows the escape character is treated as an ordinary character. The special characters that can be escaped are as follows:
- . (period)
- * (asterisk)
- + (plus sign)
- ? (question mark)
- | (vertical bar)
- ( (left parenthesis)
- ) (right parenthesis)
- { (left curly bracket)
- } (right curly bracket)
- [ (left square bracket)
- ] (right square bracket)
- \ (backslash)
- - (minus sign)^#
- : (colon)^#
- ^ (caret)
- $ (dollar sign)
#: Handled as a special character only if specified in a character list.
If the escape character is followed by an ordinary character, the escape character is skipped.
If the escape character is followed by no character, the escape character is skipped.
Two consecutive escape characters are treated as a single ordinary character.

To Page Top

(6) Considerations for performance

If a text index has been defined, the literal character in the regular expression is used to filter the pages by the text index. Therefore, in the same way as the LIKE predicate or scalar function CONTAINS, if the literal character in the regular expression is short simple text, such as a or 0, the effect of page filtering is lowered. Also, if the number of patterns that are represented by meta characters (parentheses, brackets, and quantifiers) increases, the text index is not used during a search because text-index-based page filtering takes time.

Therefore, you can improve the effect of text-index-based page filtering if you do not use meta characters to represent patterns. For example, when you search for the strings HADB and HiRDB, you can specify H(A|iR)DB or HADB | HiRDB as a search condition. In this case, the latter expression provides more effective text-index-based page filtering.

This also applies to repetition factors. For example, (abc){1,3} and abc | abcabc | abcabcabc have the same meaning. In this case, the latter expression provides more effective text-index-based page filtering.

To Page Top

(7) Example

Example

In this example, you search the data in column MSG of table T_MSG for the rows that contain a character string that begins with KFAA30 followed by a three-digit number, and ends with -E.

SELECT * FROM "T_MSG" 
    WHERE "MSG" LIKE_REGEX 'KFAA30[0-9]{3}-E'

The underlined portion is the specification of the LIKE_REGEX predicate.

The preceding LIKE_REGEX predicate specification matches, for example, the string KFAA30101-E.

To Page Top