Hitachi

Hitachi Advanced Database SQL Reference


7.19.5 LIKE_REGEX predicate

Use the LIKE_REGEX predicate to search data by using a regular expression.

Organization of this subsection

(1) Specification format

LIKE_REGEX-predicate ::= match-value [NOT] LIKE_REGEX
                         regular-expression-string [FLAG {I | IGNORECASE}]
  match-value ::= value-expression
  regular-expression-string ::= character-string-literal

(2) Explanation of specification format

match-value:

Specifies the data to be retrieved in the form of a value expression. For details about value expressions, see 7.20 Value expression.

You must specify CHAR or VARCHAR type data for match-value.

NOT:

If you specify NOT, character strings that do not include any character string elements represented by the specified regular expression string are retrieved.

regular-expression-string:

Specify a regular expression in the form of a character string literal. For details about character string literals, see 6.3 Literals.

The regular expression string must be no more than 1,024 bytes long.

Specify the regular expression string in the following format:

regular-expression::={[regular-item]|regular-expression vertical-bar regular-expression}

regular-item::={regular-factor|regular-item regular-factor}

regular-factor::={regular-primary|regular-primary *|regular-primary +|regular-primary ?
             |regular-primary repetition-factor}

repetition-factor::=left-curly-bracket lower-limit [,[upper-limit]]right-curly-bracket

regular-primary::={character-specifier|character-class|.|^|$|regular-character-set|(regular-expression)}

character-specifier::={non-escaped-character|escaped-character}

regular-character-set::={[character-list...]|[^character-list...]}

character-list::={character-specifier|character-specifier - character-specifier|regular-character-set-identifier-specification}

regular-character-set-identifier-specification::=[:regular-character-set-identifier:]

The following table describes the regular expression rules.

Table 7‒7: Regular expression rules

No.

Regular expression

Meaning

1

Character specifier

Means a character string of length 1 (unit: characters).

2

. (period)

Means any character of length 1 (unit: characters).

3

^ (caret)

Means the beginning of a match value. For a match value that includes a line break, this symbol does not mean the beginning of the line after the line break.

If a caret (^) is specified in a pair of square brackets that enclose a regular character set, this pattern means any characters other than the listed characters.

4

$ (dollar sign)

Means the end of a match value. For a match value that includes a line break, this symbol does not mean the end of the line before the line break.

5

regular-primary*

Means zero or more repetitions of the preceding regular primary.

6

regular-primary+

Means one or more repetitions of the preceding regular primary.

7

regular-primary?

Means zero or one repetition of the preceding regular primary.

8

regular-expression | regular-expression

Means the regular expression that is specified to the left or right of the vertical bar (|).

9

regular-primary{n}

regular-primary{n,m}

regular-primary{n,}

Means a repetition of the preceding regular primary. The following describes the specification patterns:

{n}: The preceding regular primary is repeated n times.

{n,m}: The preceding regular primary is repeated by the number of times in the range from n to m.

{n,}: The preceding regular primary is repeated at least n times.

10

upper-limit

An integer in the range from 0 to 256.

11

lower-limit

An integer in the range from 0 to 256.

12

[character-list...]

Means any of the listed characters.

13

[^character-list...]

Means any characters other than the listed characters.

14

character-specifier-1 - character-specifier-2

Means any character in the range (of character codes) from character-specifier-1 to character-specifier-2.

If an option that ignores case (FLAG I or FLAG IGNORECASE is specified), this pattern means the following, in addition to the half-width letters in the specified range of character codes: the corresponding half-width uppercase letter for any half-width lowercase letter in the specified range, and the corresponding half-width lowercase letter for any half-width uppercase letter in the specified range.

15

regular-character-set-identifier

alpha

Means any half-width uppercase letter (other than \, @, and #) or half-width lowercase letter. The meaning is the same as [a-zA-Z].

16

upper

Means any half-width uppercase letter (other than \, @, and #). The meaning is the same as [A-Z].

17

lower

Means any half-width lowercase letters. The meaning is the same as [a-z].

18

digit

Means any number. The meaning is the same as [0-9].

19

alnum

Means any half-width uppercase letter (other than \, @, and #), half-width lowercase letter, or number. The meaning is the same as [a-zA-Z0-9].

20

space

Means a half-width space, tab, carriage return, linefeed, vertical tab, or page break character.

21

blank

Means a half-width space or tab.

22

cntrl

Means a control character. Specifically, this means 0x7f or any of the character codes in the range from 0x00 to 0x1f.

23

graph

Means any of the character codes in the range from 0x21 to 0x7e.

24

print

Means any of the character codes in the range from 0x20 to 0x7e.

25

punct

Means a single-byte symbolic character whose code is 0x7e or lower. The meaning is the same as [!-/\:-@\[-`\{-~].

26

xdigit

Means a hexadecimal character. The meaning is the same as [a-fA-F0-9].

27

character-class

\d

Means any number. The meaning is the same as [0-9].

28

\D

Means any character that is not a number. The meaning is the same as [^0-9].

29

\w

Means any half-width uppercase letter (other than \, @, and #), any half-width lowercase letter, any number, or an underscore (_). The meaning is the same as [a-zA-Z0-9_].

30

\W

Means any character that is not a half-width uppercase letter (other than \, @, and #), half-width lowercase letter, number, or underscore (_). The meaning is the same as [^a-zA-Z0-9_].

31

\s

Means a half-width space, tab, carriage return, linefeed, vertical tab, or page break character.

32

\S

Means any character that is not a half-width space, tab, carriage return, linefeed, vertical tab, or page break character.

33

\A

Means the beginning of a match value.

34

\Z

Means the end of a match value.

FLAG {I|IGNORECASE}:

Specify this option to perform a search that ignores the difference between half-width uppercase letters (other than \, @, and #) and half-width lowercase letters.

Specifications of I and IGNORECASE are equivalent.

Note that if the character encoding that is used on the HADB server is Shift-JIS, this option cannot be specified.

(3) Regular expression specification examples

The following shows typical regular expression specification examples.

Table 7‒8: Typical regular expression specification examples

No.

Method

Pattern

Meaning

Example

Matched string

1

Forward match

^nnn

Begins with nnn

^ACT

ACT, ACTOR, ACTION, and other character strings that begin with ACT

2

Backward match

nnn$

Ends with nnn

ING$

ING, BEING, HAVING, and other character string that end with ING

3

Partial match

nnn

Includes nnn in any place

Sun

Sun, Sunday, Sundays, and other character strings that include Sun

4

Exact match

^nnn$

Equals to nnn

^EQUAL$

EQUAL

5

Middle match

.nnn.

Includes nnn that follows any character and is followed by any character

.I.

BIT, HIT, KIT, and other three-character strings whose second character is I

6

One or more repetitions

mmm[0-9]+

or

mmm[[:digit:]]+

Includes mmm in any place, and mmm is followed by any number

KFAA[0-9]+

or

KFAA[[:digit:]]+

KFAA123, KFAA11104-E, KFAA11901-E, and other character strings that begin with KFAA that is followed by any number

7

Selection of some characters

^mmm.*(n|o)

or

^mmm.*[no]

Begins with mmm and contains n or o at the i-th character (i is a numeric value)

^KFAA.*(W|E)

or

^KFAA.*[WE]

KFAA20008-W, KFAA11901-E, and other character strings that begin with KFAA, followed by W or E

8

n repetitions

mmm{n}

Begins with mmm, the last character of which is repeated n times

123{3}

12333

(4) Evaluation of predicates

If the specified match value includes an element of the character string set that is represented by the regular expression string, the predicate evaluates to true. In other cases, the predicate evaluates to false. Note that if the length of the regular expression string is 0, the predicate evaluates to true when the match value is not a null value.

If NOT is specified, the predicate evaluates to true when the specified match value does not include any string elements that are represented by the regular expression string. In other cases, the predicate evaluates to false. Note that if the length of the regular expression string is 0, the predicate evaluates to false when the match value is not a null value.

If the match value is a null value, the predicate will have no value.

(5) Rules

(a) Rules pertaining to the match value

  1. If a dynamic parameter is specified by itself as the match value, the assumed data type of the dynamic parameter will be VARCHAR(32000).

(b) Rules pertaining to the escape character

  1. If a backslash (\) is included in a regular expression string, the backslash (\) is treated as an escape character.

  2. A special character that follows the escape character is treated as an ordinary character. The special characters that can be escaped are as follows:

    • . (period)

    • * (asterisk)

    • + (plus sign)

    • ? (question mark)

    • | (vertical bar)

    • ( (left parenthesis)

    • ) (right parenthesis)

    • { (left curly bracket)

    • } (right curly bracket)

    • [ (left square bracket)

    • ] (right square bracket)

    • \ (backslash)

    • - (minus sign)#

    • : (colon)#

    • ^ (caret)

    • $ (dollar sign)

    #: Handled as a special character only if specified in a character list.

  3. If the escape character is followed by an ordinary character, the escape character is skipped.

  4. If the escape character is followed by no character, the escape character is skipped.

  5. Two consecutive escape characters are treated as a single ordinary character.

(6) Considerations for performance

If a text index has been defined, the literal character in the regular expression is used to filter the pages by the text index. Therefore, in the same way as the LIKE predicate or scalar function CONTAINS, if the literal character in the regular expression is short simple text, such as a or 0, the effect of page filtering is lowered. Also, if the number of patterns that are represented by meta characters (parentheses, brackets, and quantifiers) increases, the text index is not used during a search because text-index-based page filtering takes time.

Therefore, you can improve the effect of text-index-based page filtering if you do not use meta characters to represent patterns. For example, when you search for the strings HADB and HiRDB, you can specify H(A|iR)DB or HADB | HiRDB as a search condition. In this case, the latter expression provides more effective text-index-based page filtering.

This also applies to repetition factors. For example, (abc){1,3} and abc | abcabc | abcabcabc have the same meaning. In this case, the latter expression provides more effective text-index-based page filtering.

(7) Example

Example

In this example, you search the data in column MSG of table T_MSG for the rows that contain a character string that begins with KFAA30 followed by a three-digit number, and ends with -E.

SELECT * FROM "T_MSG" 
    WHERE "MSG" LIKE_REGEX 'KFAA30[0-9]{3}-E'

The underlined portion is the specification of the LIKE_REGEX predicate.

The preceding LIKE_REGEX predicate specification matches, for example, the string KFAA30101-E.