G.2 Syntax of regular expressions
The following regular expressions can be used in JP1/IM. Use them in accordance with the coding conventions explained below.
- Important
We advise against using regular expressions other than those described here because the specifications differ according to the computer model and operating system. Use only the regular expressions described below.
(1) Ordinary characters
An ordinary character is a character that matches only itself when specified in a regular expression. All characters other than control codes and special characters are handled as ordinary characters.
(2) Special characters
The special characters are ^ $ . * + ? | ( ) { } [ ] and \. Each of them is explained below.
- ^
The caret (^) matches the start of a line. It is a special character only when used as the first character in a regular expression. When used elsewhere, the caret is handled as an ordinary character.
When a caret is specified as a special character, lines beginning with the string that follows it make a match.
- $
The dollar sign ($) matches the end of a line. It is a special character only when used as the last character in a regular expression. When used elsewhere, the dollar sign is handled as an ordinary character.
When a dollar sign is specified as a special character, lines ending with the string that precedes it make a match. When ^ and $ are used together, only lines consisting entirely of the specified string make a match.
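The anchoring behavior of ^ and $ can be sketched as follows. This is only an illustration using Python's re module as a stand-in engine; it is not the JP1/IM implementation, and the sample strings are hypothetical. Note one difference: Python treats a caret or dollar sign elsewhere in a pattern as special too, unlike the rules described here.

```python
import re

# Hypothetical sample lines, used only for illustration.
lines = ["error: disk full", "disk error", "error"]

# ^ as the first character: lines that begin with "error".
starts = [s for s in lines if re.search(r"^error", s)]

# $ as the last character: lines that end with "error".
ends = [s for s in lines if re.search(r"error$", s)]

# ^ and $ together: lines that consist only of "error".
only = [s for s in lines if re.search(r"^error$", s)]

print(starts)  # ['error: disk full', 'error']
print(ends)    # ['disk error', 'error']
print(only)    # ['error']
```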
- . (period)
The period (.) matches any single character other than a linefeed character.
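As a rough illustration of the period, the following sketch again assumes Python's re module as a stand-in for the regular expression engine; the pattern and sample strings are hypothetical.

```python
import re

pattern = r"a.c"  # "a", then any single character, then "c"

matches_letter = re.search(pattern, "abc") is not None    # "b" fills the period
matches_space = re.search(pattern, "a c") is not None     # a space also counts
matches_newline = re.search(pattern, "a\nc") is not None  # a linefeed does not
matches_nothing = re.search(pattern, "ac") is not None    # exactly one character is required

print(matches_letter, matches_space, matches_newline, matches_nothing)
# True True False False
```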
- *
The asterisk (*) means zero or more occurrences of the preceding character.
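A minimal sketch of the asterisk, assuming Python's re module as a stand-in engine (the sample strings are hypothetical). The key point is that zero occurrences also match.

```python
import re

pattern = r"^ab*c$"  # "a", then zero or more "b", then "c"

samples = ["ac", "abc", "abbbc", "adc"]
results = {s: bool(re.search(pattern, s)) for s in samples}

print(results)
# {'ac': True, 'abc': True, 'abbbc': True, 'adc': False}
```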
- +
In JP1-specific regular expressions and basic regular expressions, the plus sign (+) is handled as an ordinary character.
As a special character, + means one or more occurrences of the preceding character.
- ?
In JP1-specific regular expressions and basic regular expressions, the question mark (?) is handled as an ordinary character.
As a special character, ? means zero or one occurrence of the preceding character.
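The difference between + and ? when they act as special characters can be sketched as follows. Python's re module is assumed here as a stand-in; it always treats + and ? as special, so it does not reproduce the JP1-specific or basic behavior in which they are ordinary characters.

```python
import re

plus = r"^ab+c$"      # one or more "b"
question = r"^ab?c$"  # zero or one "b"

samples = ["ac", "abc", "abbc"]  # hypothetical sample strings
plus_hits = [s for s in samples if re.search(plus, s)]
question_hits = [s for s in samples if re.search(question, s)]

print(plus_hits)      # ['abc', 'abbc']
print(question_hits)  # ['ac', 'abc']
```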
- |
In JP1-specific regular expressions and basic regular expressions, the vertical bar (|) is handled as an ordinary character.
As a special character, | means an OR condition between the regular expressions on either side. It is used in combination with the special characters ( ).
- ( )
In JP1-specific regular expressions and basic regular expressions, left and right parentheses are handled as ordinary characters.
As special characters, ( ) group the enclosed regular expression.
Parentheses explicitly delimit the enclosed characters as a single regular expression. They are mainly used with a vertical bar (|). (See G.4 Tips on using regular expressions.)
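The following sketch shows why | is usually combined with ( ). It assumes Python's re module as a stand-in for an engine in which both are special characters; the patterns and sample lines are hypothetical.

```python
import re

grouped = r"^(error|warning): "   # ^ and ": " apply to both alternatives
ungrouped = r"^error|warning: "   # means "^error" OR "warning: "

samples = [
    "error: disk full",
    "warning: low memory",
    "disk warning: full",
    "info: started",
]
grouped_hits = [s for s in samples if re.search(grouped, s)]
ungrouped_hits = [s for s in samples if re.search(ungrouped, s)]

print(grouped_hits)
# ['error: disk full', 'warning: low memory']
print(ungrouped_hits)
# ['error: disk full', 'warning: low memory', 'disk warning: full']
```

Without the parentheses, the OR condition splits the whole expression, so the second alternative is no longer anchored to the start of the line.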
- { }
In JP1-specific regular expressions and basic regular expressions, curly brackets are handled as ordinary characters.
As special characters, { } match the preceding character repeated the number of times specified inside the curly brackets.
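A small sketch of a repetition count, assuming Python's re module as a stand-in for an engine in which { } are special characters. Only an exact count is shown, and the sample strings are hypothetical.

```python
import re

pattern = r"^a{3}$"  # exactly three occurrences of "a"

samples = ["aa", "aaa", "aaaa"]
results = {s: bool(re.search(pattern, s)) for s in samples}

print(results)
# {'aa': False, 'aaa': True, 'aaaa': False}
```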
- [ ]
In JP1-specific regular expressions and basic XPG4 regular expressions, square brackets are handled as ordinary characters.
As special characters, [ ] mean a match with any of the characters enclosed in the square brackets (or with any character not enclosed if a caret (^) is the first character).
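The two forms of a bracket expression can be illustrated as follows. Python's re module is assumed as a stand-in engine, and the sample words are hypothetical.

```python
import re

any_of = r"^[abc]at$"    # first character is one of a, b, c
none_of = r"^[^abc]at$"  # first character is anything except a, b, c

samples = ["bat", "cat", "hat"]
any_hits = [s for s in samples if re.search(any_of, s)]
none_hits = [s for s in samples if re.search(none_of, s)]

print(any_hits)   # ['bat', 'cat']
print(none_hits)  # ['hat']
```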
- \
The backslash (\) cancels a special character (^ $ . * + ? | ( ) { } [ ] \).#
A special character preceded by a backslash is handled as an ordinary character. Use the backslash only to cancel a special character. In some environments, a backslash followed by an alphanumeric character represents a control code (a linefeed code or tab character, for example). However, relying on this can lead to unintended behavior, because such regular expressions are handled differently according to the operating system and product.
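Canceling a special character with a backslash can be sketched as follows, again assuming Python's re module as a stand-in engine. The helper re.escape is specific to Python and is shown only as a convenience; it is not part of JP1/IM.

```python
import re

escaped = r"1\.5"   # the period is canceled and matches only "."
unescaped = r"1.5"  # the period matches any single character

escaped_matches_literal = re.search(escaped, "1.5") is not None
escaped_matches_other = re.search(escaped, "125") is not None
unescaped_matches_other = re.search(unescaped, "125") is not None

print(escaped_matches_literal, escaped_matches_other, unescaped_matches_other)
# True False True

# Python-only helper: prefix every special character with a backslash.
literal = "(a+b)*c"  # hypothetical search text containing special characters
found = re.search(re.escape(literal), "x = (a+b)*c") is not None
print(found)  # True
```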
- #
In JP1-specific regular expressions and basic XPG4 regular expressions, the following are handled as ordinary characters: + ? | ( ) { } [ ]
In basic POSIX 1003.2 regular expressions, the following are handled as ordinary characters: + ? | ( ) { }