7.19.5 LIKE_REGEX predicate
Use the LIKE_REGEX predicate to search data by using a regular expression.
- Organization of this subsection
(1) Specification format
LIKE_REGEX-predicate ::= match-value [NOT] LIKE_REGEX regular-expression-string [FLAG {I | IGNORECASE}] match-value ::= value-expression regular-expression-string ::= character-string-literal
(2) Explanation of specification format
- match-value:
-
Specifies the data to be retrieved in the form of a value expression. For details about value expressions, see 7.20 Value expression.
You must specify CHAR or VARCHAR type data for match-value.
- NOT:
-
If you specify NOT, character strings that do not include any character string elements represented by the specified regular expression string are retrieved.
- regular-expression-string:
-
Specify a regular expression in the form of a character string literal. For details about character string literals, see 6.3 Literals.
The regular expression string must be no more than 1,024 bytes long.
Specify the regular expression string in the following format:
regular-expression::={[regular-item]|regular-expression vertical-bar regular-expression} regular-item::={regular-factor|regular-item regular-factor} regular-factor::={regular-primary|regular-primary *|regular-primary +|regular-primary ? |regular-primary repetition-factor} repetition-factor::=left-curly-bracket lower-limit [,[upper-limit]]right-curly-bracket regular-primary::={character-specifier|character-class|.|^|$|regular-character-set|(regular-expression)} character-specifier::={non-escaped-character|escaped-character} regular-character-set::={[character-list...]|[^character-list...]} character-list::={character-specifier|character-specifier - character-specifier|regular-character-set-identifier-specification} regular-character-set-identifier-specification::=[:regular-character-set-identifier:]
The following table describes the regular expression rules.
Table 7‒7: Regular expression rules No.
Regular expression
Meaning
1
Character specifier
Means a character string of length 1 (unit: characters).
2
. (period)
Means any character of length 1 (unit: characters).
3
^ (caret)
Means the beginning of a match value. For a match value that includes a line break, this symbol does not mean the beginning of the line after the line break.
If a caret (^) is specified in a pair of square brackets that enclose a regular character set, this pattern means any characters other than the listed characters.
4
$ (dollar sign)
Means the end of a match value. For a match value that includes a line break, this symbol does not mean the end of the line before the line break.
5
regular-primary*
Means zero or more repetitions of the preceding regular primary.
6
regular-primary+
Means one or more repetitions of the preceding regular primary.
7
regular-primary?
Means zero or one repetition of the preceding regular primary.
8
regular-expression | regular-expression
Means the regular expression that is specified to the left or right of the vertical bar (|).
9
regular-primary{n}
regular-primary{n,m}
regular-primary{n,}
Means a repetition of the preceding regular primary. The following describes the specification patterns:
{n}: The preceding regular primary is repeated n times.
{n,m}: The preceding regular primary is repeated by the number of times in the range from n to m.
{n,}: The preceding regular primary is repeated at least n times.
10
upper-limit
An integer in the range from 0 to 256.
11
lower-limit
An integer in the range from 0 to 256.
12
[character-list...]
Means any of the listed characters.
13
[^character-list...]
Means any characters other than the listed characters.
14
character-specifier-1 - character-specifier-2
Means any character in the range (of character codes) from character-specifier-1 to character-specifier-2.
If an option that ignores case (FLAG I or FLAG IGNORECASE is specified), this pattern means the following, in addition to the half-width letters in the specified range of character codes: the corresponding half-width uppercase letter for any half-width lowercase letter in the specified range, and the corresponding half-width lowercase letter for any half-width uppercase letter in the specified range.
15
regular-character-set-identifier
alpha
Means any half-width uppercase letter (other than \, @, and #) or half-width lowercase letter. The meaning is the same as [a-zA-Z].
16
upper
Means any half-width uppercase letter (other than \, @, and #). The meaning is the same as [A-Z].
17
lower
Means any half-width lowercase letters. The meaning is the same as [a-z].
18
digit
Means any number. The meaning is the same as [0-9].
19
alnum
Means any half-width uppercase letter (other than \, @, and #), half-width lowercase letter, or number. The meaning is the same as [a-zA-Z0-9].
20
space
Means a half-width space, tab, carriage return, linefeed, vertical tab, or page break character.
21
blank
Means a half-width space or tab.
22
cntrl
Means a control character. Specifically, this means 0x7f or any of the character codes in the range from 0x00 to 0x1f.
23
graph
Means any of the character codes in the range from 0x21 to 0x7e.
24
print
Means any of the character codes in the range from 0x20 to 0x7e.
25
punct
Means a single-byte symbolic character whose code is 0x7e or lower. The meaning is the same as [!-/\:-@\[-`\{-~].
26
xdigit
Means a hexadecimal character. The meaning is the same as [a-fA-F0-9].
27
character-class
\d
Means any number. The meaning is the same as [0-9].
28
\D
Means any character that is not a number. The meaning is the same as [^0-9].
29
\w
Means any half-width uppercase letter (other than \, @, and #), any half-width lowercase letter, any number, or an underscore (_). The meaning is the same as [a-zA-Z0-9_].
30
\W
Means any character that is not a half-width uppercase letter (other than \, @, and #), half-width lowercase letter, number, or underscore (_). The meaning is the same as [^a-zA-Z0-9_].
31
\s
Means a half-width space, tab, carriage return, linefeed, vertical tab, or page break character.
32
\S
Means any character that is not a half-width space, tab, carriage return, linefeed, vertical tab, or page break character.
33
\A
Means the beginning of a match value.
34
\Z
Means the end of a match value.
- FLAG {I|IGNORECASE}:
-
Specify this option to perform a search that ignores the difference between half-width uppercase letters (other than \, @, and #) and half-width lowercase letters.
Specifications of I and IGNORECASE are equivalent.
Note that if the character encoding that is used on the HADB server is Shift-JIS, this option cannot be specified.
(3) Regular expression specification examples
The following shows typical regular expression specification examples.
No. |
Method |
Pattern |
Meaning |
Example |
Matched string |
---|---|---|---|---|---|
1 |
Forward match |
^nnn |
Begins with nnn |
^ACT |
ACT, ACTOR, ACTION, and other character strings that begin with ACT |
2 |
Backward match |
nnn$ |
Ends with nnn |
ING$ |
ING, BEING, HAVING, and other character string that end with ING |
3 |
Partial match |
nnn |
Includes nnn in any place |
Sun |
Sun, Sunday, Sundays, and other character strings that include Sun |
4 |
Exact match |
^nnn$ |
Equals to nnn |
^EQUAL$ |
EQUAL |
5 |
Middle match |
.nnn. |
Includes nnn that follows any character and is followed by any character |
.I. |
BIT, HIT, KIT, and other three-character strings whose second character is I |
6 |
One or more repetitions |
mmm[0-9]+ or mmm[[:digit:]]+ |
Includes mmm in any place, and mmm is followed by any number |
KFAA[0-9]+ or KFAA[[:digit:]]+ |
KFAA123, KFAA11104-E, KFAA11901-E, and other character strings that begin with KFAA that is followed by any number |
7 |
Selection of some characters |
^mmm.*(n|o) or ^mmm.*[no] |
Begins with mmm and contains n or o at the i-th character (i is a numeric value) |
^KFAA.*(W|E) or ^KFAA.*[WE] |
KFAA20008-W, KFAA11901-E, and other character strings that begin with KFAA, followed by W or E |
8 |
n repetitions |
mmm{n} |
Begins with mmm, the last character of which is repeated n times |
123{3} |
12333 |
(4) Evaluation of predicates
If the specified match value includes an element of the character string set that is represented by the regular expression string, the predicate evaluates to true. In other cases, the predicate evaluates to false. Note that if the length of the regular expression string is 0, the predicate evaluates to true when the match value is not a null value.
If NOT is specified, the predicate evaluates to true when the specified match value does not include any string elements that are represented by the regular expression string. In other cases, the predicate evaluates to false. Note that if the length of the regular expression string is 0, the predicate evaluates to false when the match value is not a null value.
If the match value is a null value, the predicate will have no value.
(5) Rules
(a) Rules pertaining to the match value
-
If a dynamic parameter is specified by itself as the match value, the assumed data type of the dynamic parameter will be VARCHAR(32000).
(b) Rules pertaining to the escape character
-
If a backslash (\) is included in a regular expression string, the backslash (\) is treated as an escape character.
-
A special character that follows the escape character is treated as an ordinary character. The special characters that can be escaped are as follows:
-
. (period)
-
* (asterisk)
-
+ (plus sign)
-
? (question mark)
-
| (vertical bar)
-
( (left parenthesis)
-
) (right parenthesis)
-
{ (left curly bracket)
-
} (right curly bracket)
-
[ (left square bracket)
-
] (right square bracket)
-
\ (backslash)
-
- (minus sign)#
-
: (colon)#
-
^ (caret)
-
$ (dollar sign)
#: Handled as a special character only if specified in a character list.
-
-
If the escape character is followed by an ordinary character, the escape character is skipped.
-
If the escape character is followed by no character, the escape character is skipped.
-
Two consecutive escape characters are treated as a single ordinary character.
(6) Considerations for performance
If a text index has been defined, the literal character in the regular expression is used to filter the pages by the text index. Therefore, in the same way as the LIKE predicate or scalar function CONTAINS, if the literal character in the regular expression is short simple text, such as a or 0, the effect of page filtering is lowered. Also, if the number of patterns that are represented by meta characters (parentheses, brackets, and quantifiers) increases, the text index is not used during a search because text-index-based page filtering takes time.
Therefore, you can improve the effect of text-index-based page filtering if you do not use meta characters to represent patterns. For example, when you search for the strings HADB and HiRDB, you can specify H(A|iR)DB or HADB | HiRDB as a search condition. In this case, the latter expression provides more effective text-index-based page filtering.
This also applies to repetition factors. For example, (abc){1,3} and abc | abcabc | abcabcabc have the same meaning. In this case, the latter expression provides more effective text-index-based page filtering.
(7) Example
- Example
-
In this example, you search the data in column MSG of table T_MSG for the rows that contain a character string that begins with KFAA30 followed by a three-digit number, and ends with -E.
SELECT * FROM "T_MSG" WHERE "MSG" LIKE_REGEX 'KFAA30[0-9]{3}-E'
The underlined portion is the specification of the LIKE_REGEX predicate.
The preceding LIKE_REGEX predicate specification matches, for example, the string KFAA30101-E.