Hitachi

Job Management Partner 1 Version 10 Job Management Partner 1/Advanced Shell Description, User's Guide, Reference, and Operator's Guide


awk command (performs text processing and pattern matching)

Format

awk [-F input-field-separator] [-v variable-name=variable-value]... [-f script-file-path-name|script]
   [[target-path-name...]|[built-in-variable-name=variable-value...]]...

Description

This command retrieves lines (referred to hereafter as records) in a text file that match a particular pattern and performs specified processing on the retrieved lines.

Arguments

-F input-field-separator

Specifies the value to be used as the input field separator. The specified value becomes the value of the awk command's FS built-in variable.

-v variable-name=variable-value

Specifies a variable name and its value. The variable and its value are passed to the script, whether the script is given in the script argument or in a script file specified in the -f option. Multiple -v options can be specified. If you specify the same variable name more than once, the last specification takes effect.

-f script-file-path-name

Specifies the path name of a file (script file) that contains the patterns to be matched in the input files and the processing instructions for the records that match the patterns.

  • If - is specified as the path name, the script is read from the standard input.

  • Up to 19 -f options can be specified.

script

Specifies a pattern to be matched in the input files and the processing instructions for the records that match the pattern.

target-path-name

Specifies the path name of a file to be processed. Multiple path names can be specified.

If no path name is specified or - is specified as the path name, input is read from the standard input. Note that if only the BEGIN pattern is executed, no records are retrieved from the specified file or from the standard input.

built-in-variable-name=variable-value

Specifies the name of a built-in variable and its value. The variable and its value are passed to the script, whether the script is given in the script argument or in a script file specified in the -f option.

  • If you specify a name that is not for a listed built-in variable, it is treated the same as the -v option.

  • A built-in variable that is specified before all the target path names is enabled for all file processing, except for BEGIN pattern processing, and is also enabled for END pattern processing.

  • A built-in variable that is specified after all the target path names is enabled for END pattern processing only.

  • A built-in variable that is specified between target path names is enabled for processing of the path names specified after the variable specification and for END pattern processing.
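
The following is a minimal sketch of how the -F option, the -v option, and a built-in variable specified between target path names work together. It assumes a POSIX-style shell and an awk command that behaves as described above. The file names part1.txt and part2.txt and the variable name label are hypothetical.

```shell
# Work in a scratch directory so that the hypothetical input files
# do not clash with real files.
dir=$(mktemp -d)
printf 'a:b\n' > "$dir/part1.txt"
printf 'c,d\n' > "$dir/part2.txt"

# -F sets FS to ":" for the first file. The FS=, operand takes effect
# only for the path name specified after it. -v passes label to the script.
result=$(awk -F: -v label=first-field '{ print label, $1 }' \
    "$dir/part1.txt" FS=, "$dir/part2.txt")

printf '%s\n' "$result"
rm -r "$dir"
```

Each file is split with the separator in effect when that file is processed, so the first field is extracted correctly from both.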

Scripts (patterns and actions)

The following is the descriptive format of a script executed by the awk command:

[pattern] [{[action]}]

A pattern to be searched for in the input files is defined in pattern. For details about the pattern specification, see Types of patterns below. Processing instructions for records that match the pattern are defined in action.

Each successive record from the input file is compared to each specified pattern, and the action specified for a pattern is executed when a match is found for that pattern. A specified action can be performed on all records by not specifying a pattern (omitting pattern and specifying action only).

In action, you specify the control statements and functions that perform the desired processing on the records that match the specified pattern. An action can include control statements, built-in functions, user-defined functions, variables, and operators. Multiple statements can be specified, separated by an end-of-line code or a semicolon. When the entire {action} portion is omitted, including the braces, the matching records are output to the standard output. If you specify empty braces without an action ({ }), no processing is performed.

To include a comment, specify a hash mark (#) and then the comment string. Everything from the hash mark to the end of the line is treated as a comment.
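
As a rough illustration of the script format described above, the following sketch runs the three combinations (pattern with action, pattern only, and action only) against the same input. The input records are made up for the example, and a POSIX-style shell is assumed.

```shell
# Three hypothetical input records.
input=$(printf 'apple 3\nbanana 12\ncherry 7\n')

# Pattern and action: print the first field of records whose second field exceeds 5.
with_action=$(printf '%s\n' "$input" | awk '$2 > 5 { print $1 }   # text after # is a comment')

# Pattern only: matching records are written unchanged to the standard output.
pattern_only=$(printf '%s\n' "$input" | awk '/an/')

# Action only: the action is performed on every record.
action_only=$(printf '%s\n' "$input" | awk '{ print $2 }')

printf '%s\n---\n%s\n---\n%s\n' "$with_action" "$pattern_only" "$action_only"
```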

Records and fields

A record is a unit that is obtained by using the input record separator to split up the input. In awk, an end-of-line code serves as the input record separator. In Windows, an end-of-line code is denoted by [CR] + [LF] or by [LF]. In UNIX, an end-of-line code is denoted by [LF]. Note that in UNIX, if [CR] + [LF] is used for the end-of-line code, the [CR] part will be included in the resulting record.

The input record separator can be changed by setting in the RS built-in variable any single-byte character to serve as the new record separator. If a character string is specified, only the first character in the character string becomes the input record separator.

Records are divided by field separators into units called fields. The default field separator is the space. To change the field separator, specify any character string as the new field separator in the -F option or set it in the FS built-in variable.

The input passed to the specified action consists of the contents of the record currently being read from the input file and the value of each field in the record. The entire contents of the record are stored in field variable $0. The first field of the record is stored in field variable $1, the second field is stored in field variable $2, and so on.
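
The relationship between records, fields, and the field variables can be sketched as follows. The example assumes a POSIX-style shell, and the input data is hypothetical. The first command uses the default separators, and the second changes both the record separator (RS) and the field separator.

```shell
# Default separators: one record per line, fields separated by a space.
defaults=$(printf 'one two three\n' | awk '{ print $0; print $1; print $3 }')

# RS=";" ends a record at each semicolon, and -F, splits each record at commas.
custom=$(printf 'id1,alice;id2,bob;' | awk -F, -v RS=';' '{ print $2 " <- " $0 }')

printf '%s\n---\n%s\n' "$defaults" "$custom"
```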

Types of patterns

The following types of patterns can be specified.

Control statements

The control statements that can be used are described in the table below. The if, while, for, do, break, continue, and return statements are subject to the C language syntax rules. An exception is the for statement, which is limited to a single initialization expression and a single increment expression.

if statement

  • Syntax: if (conditional-expression) processing [else processing]
    Description: Branch conditionally.

  • Syntax: if (variable in array) processing [else processing]
    Description: Branch based on whether the index specified in variable exists in array.

while statement

  • Syntax: while (conditional-expression) processing
    Description: Repeat as long as the condition is true.

for statement

  • Syntax: for (initialization-expression; continuation-conditional-expression; increment-expression) processing
    Description: Execute repeatedly.

  • Syntax: for (variable in array) processing
    Description: Perform processing while setting variable to successive indexes of the array. Note that the indexes are retrieved in no particular order.

do statement

  • Syntax: do processing while (continuation-conditional-expression)
    Description: Repeat as long as the condition at the end remains true.

break statement

  • Syntax: break
    Description: Exit immediately from a loop.

continue statement

  • Syntax: continue
    Description: Interrupt loop processing and return to the beginning of the next cycle of the loop.

next statement

  • Syntax: next
    Description: Stop processing the current input record after this control statement, and start processing the next input record.

nextfile statement

  • Syntax: nextfile
    Description: Stop processing the current input file after this control statement, and start processing the next input file.

return statement

  • Syntax: return [expr]
    Description: Exit a user-defined function. The value specified in the expression expr is returned to the caller. If no value is specified for the expr expression, the return value of the user-defined function will be 0.

delete statement

  • Syntax: delete array
    Description: Delete an array.

  • Syntax: delete array[element]
    Description: Delete an element of the array.

exit statement

  • Syntax: exit [expr]
    Description: Stop execution of a script during processing. The value specified in the expr expression is returned as the return code of the command. If no value is specified for the expr expression, the return code of the command will be 0.

    The value specified in the expr expression is treated as a signed four-byte numeric value. In Windows, the value specified in the expression expr will be the return code of the command. In UNIX, when the value specified in the expr expression is outside the range 0 to 255, the low-order 8 bits of the value will be the return code of the command. If you are executing a job definition script in JP1/Advanced Shell, specify a value in the range 0 to 255.

    If you are executing a job definition script in JP1/Advanced Shell in Windows and the value specified in the expr expression is outside the range 0 to 255, the return code that is returned to the caller of the command will be different from the value specified in the expr expression. For details about how the return codes of commands are handled in JP1/Advanced Shell, see 5.8.8 Return codes of jobs, job steps, and commands.
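
Several of these control statements can be combined as in the following sketch, which counts input values in an array. It assumes a POSIX-style shell. The input data and the array name seen are hypothetical.

```shell
counts=$(printf '# header\nred\nblue\nred\n' | awk '
    /^#/ { next }                 # skip this record and start on the next one
    { seen[$1]++ }                # count each value in an array
    END {
        if ("red" in seen)        # check whether the index exists in the array
            print "red", seen["red"]
        delete seen["red"]        # delete one element of the array
        n = 0
        for (k in seen) n++       # visit the remaining indexes (in no particular order)
        print "left", n
    }')

printf '%s\n' "$counts"
```

Because the for (variable in array) form retrieves indexes in no particular order, the sketch only counts the remaining elements instead of printing them.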

Built-in functions

The following built-in functions can be used.

User-defined functions

In addition to the built-in functions, you can also define your own functions. The syntax of a user-defined function is as follows:

function | func name([param[, ...]]) { statements }

The function name (name) must consist of alphanumeric characters and underscores (_), and its first character must not be a numeric character.

In param, you specify the function's arguments by using the names of user-defined variables or arrays. A user-defined variable is passed to the function by value, and an array is passed by reference.

No parsing error occurs if the number of arguments specified in a function call differs from the number of arguments specified in the function definition. However, if the function call specifies more arguments than the function definition, a warning message is output. The arguments specified in the function definition are treated as local variables, whereas any extra arguments specified in the function call are treated as global variables.

A maximum of 50 arguments can be specified in a function definition. Similarly, a maximum of 50 arguments can be specified when the function is called. A check is performed at the time the function is called to confirm that the number of arguments does not exceed 50.
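
The argument-passing rules above can be sketched as follows, assuming a POSIX-style shell. The function name fill and all variable names are hypothetical. The caller omits the third argument, so i serves as a local variable inside the function.

```shell
result=$(awk '
    # arr is an array (passed by reference), n is a variable (passed by value),
    # and i is an extra parameter that the caller omits.
    function fill(arr, n, i) {
        for (i = 1; i <= n; i++) arr[i] = i * i
        n = 0                     # changes only the copy inside the function
        return i - 1              # number of elements stored
    }
    BEGIN {
        count = 3
        made = fill(squares, count)
        print made, count, squares[3]
    }')

printf '%s\n' "$result"
```

The array filled inside the function is visible to the caller, while the assignment to n leaves count unchanged.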

Variables

The types of variables used in scripts include user-defined variables, field variables, built-in variables, and arrays. User-defined variables and arrays are generated the first time they are used in a script. Note that the initial value of an uninitialized variable (one that has not been used in an arithmetic or assignment expression) is 0 when it is used as a numeric value and the null (empty) character string when it is used as a character string.

The type of a variable's value changes to numeric or character string depending on the context in which it is used. A non-numeric character string has a numeric value of 0. In the following example, both print functions output 7:

x = "3" + "4"
y = 3 + 4
print x
print y
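
The remaining conversion rules (non-numeric character strings, numeric values used as character strings, and uninitialized variables) might be checked with a sketch such as the following. It assumes a POSIX-style shell, and the variable name never_set is hypothetical.

```shell
out=$(awk 'BEGIN {
    n = "abc" + 5            # "abc" is non-numeric, so it counts as 0
    s = 3 " apples"          # the numeric value 3 is used as a character string
    print n
    print s
    print never_set + 1      # an uninitialized variable is 0 as a numeric value
    print "[" never_set "]"  # and empty as a character string
}')

printf '%s\n' "$out"
```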

The following describes each type of variable.

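For example, a field variable can be referenced through a variable that holds the field number. When a is 1, $a and $1 both refer to the first field, so the two print functions below produce the same output:

a = 1
print $a
print $1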

Operators

The operators that can be used are listed and described in the table below, in order of lowest to highest priority. For an expression with operators at the same priority level, the operators are listed from left to right in order of highest to lowest priority.

Operator                          Description
=, +=, -=, *=, /=, %=, ^=, **=    Assignment operators
?:                                Ternary operator
||                                Logical OR
&&                                Logical AND
~, !~                             Regular expression match (~) and non-match (!~) operators
<, <=, >, >=, !=, ==              Relational operators
space                             Concatenation of character strings
+, -                              Addition and subtraction
*, /, %                           Multiplication, division, and modulus
+, -, !                           Unary plus, unary minus, and logical negation
^, **                             Exponentiation
++, --                            Increment and decrement operators
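
A few of these priority relationships can be observed with the following sketch, which assumes a POSIX-style shell. The expressions are illustrative only. The ternary expression is enclosed in parentheses so that > is not interpreted as output redirection.

```shell
out=$(awk 'BEGIN {
    print 1 + 2 * 3          # multiplication has a higher priority than addition
    print 2 * 3 ^ 2          # exponentiation has a higher priority than multiplication
    print 1 " " 2 + 3        # addition has a higher priority than concatenation
    x = 5; x += 2            # assignment operators
    print x % 4
    print (x > 6 ? "big" : "small")
}')

printf '%s\n' "$out"
```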

Output format

The following table lists and describes the conversion specifiers that follow % to indicate conversion specifications in the printf and sprintf functions:

Character  Description
c          Output as a single-byte character.
s          Output as a character string.
d, i       Output as a signed decimal integer.
o          Output as an unsigned octal integer.
x          Output as an unsigned hexadecimal integer. The values 10 through 15 use abcdef.
X          Output as an unsigned hexadecimal integer. The values 10 through 15 use ABCDEF.
u          Output as an unsigned decimal integer.
f          Output as a floating point number. It is converted to the format [-]dddd.dddd.
e          Output as a floating point number. It is converted to the format [-]d.dddde[+-]dd[d].
g          Output in the signed format of conversion specifier e or f, depending on which is able to represent the specified value and precision in the shortest way. Trailing zeros are not output.
E          Output as a floating point number. It is converted to the format [-]d.ddddE[+-]dd[d].
G          Output in the signed format of conversion specifier E or f, depending on which is able to represent the specified value and precision in the shortest way. Trailing zeros are not output.
%          Output as the % character.
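
The following sketch exercises several of these conversion specifiers with printf and sprintf. It assumes a POSIX-style shell on a UNIX-like system, where the exponent produced by e has two digits. The values are arbitrary.

```shell
out=$(awk 'BEGIN {
    printf "%d|%o|%x|%X\n", 255, 8, 255, 255   # decimal, octal, and hexadecimal
    printf "%s|%c|%%\n", "text", "A"           # string, single character, literal %
    printf "%8.3f|%e\n", 3.14159, 1234.5       # width 8 with 3 decimals, exponent form
    line = sprintf("%g %g", 0.5, 1000000)      # g chooses the shorter representation
    print line
}')

printf '%s\n' "$out"
```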

Escape characters

You can use escape characters as follows:

The following table shows the escape characters that can be used:

Escape character  Meaning
\a                Alert character (bell)
\b                Backspace character
\f                Formfeed character (page break)
\n                Linefeed character
\r                Carriage return character
\t                Tab character
\v                Vertical tab character
\d, \dd, \ddd     Character represented by one, two, or three octal digits.#1 You cannot specify a numeric value that denotes 0.
\xhex             Character represented by a hexadecimal value (0 to 9, a to f, A to F).#1, #2 You cannot specify a numeric value that denotes 0.
\c                Any literal character (for example, \" for ")
\\                A single backslash character

#1

If you specify a pattern or regular expression enclosed in forward slashes (/), there are values that cannot be specified depending on the character encoding at the time of execution. The hexadecimal representations of the permissible values for each character encoding are shown below. If you specify a value that is outside these values, termination with an error will occur.

Character encoding  Permitted values (in hexadecimal)
Shift JIS           0x01 to 0x80, 0xA0 to 0xDF, 0xFD to 0xFF
UTF-8               0x01 to 0xBF, 0xFE to 0xFF
EUC                 0x01 to 0x8D, 0x90 to 0xA0, 0xFF
C                   0x01 to 0xFF

#2

If \xhex is specified in a character string enclosed in double quotation marks ("), the hexadecimal digits are assumed to extend from \x to the first non-hexadecimal character. If there are more than 98 hexadecimal digits, only the first 98 digits are used. If there are more than two hexadecimal digits, the result of the conversion is not guaranteed.
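
As a small sketch of escape characters inside character strings, the following commands use the tab, linefeed, octal, quotation mark, and backslash forms from the table above. A POSIX-style shell and an ASCII-compatible character encoding are assumed.

```shell
out=$(awk 'BEGIN {
    printf "name\tvalue\n"       # \t is a tab and \n is a linefeed
    print "\101\102"             # octal 101 and 102 are A and B in ASCII
    print "say \"hi\" \\ done"   # \" is a literal quotation mark, \\ is one backslash
}')

printf '%s\n' "$out"
```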

A backslash (\) specified in variable values in the -v option and in built-in variable values in arguments is treated as an escape character (except when it is enclosed in single quotation marks (')). Specify path names carefully. The following shows examples.

Example 1: The correct path name, c:\a\b\c, cannot be passed to the awk command's scripts.

In this example, \ is deleted because the shell processes it as an escape character. As a result, c:\a\b\c is set in CCC01.

After that, \ is processed as an escape character again when the value is set in the variable in the awk command. As a result, c:abc is set in VAR001:

CCC01="c:\\a\\b\\c"
awk -v VAR001="${CCC01}"  -f prog01.awk

Similarly, the following two examples also cannot pass the correct path name:

awk -v VAR001=c:\\a\\b\\c  -f prog01.awk
awk -v VAR001='c:\a\b\c'  -f prog01.awk

Example 2: The correct path name, c:\a\b\c, is passed to the awk command's scripts.

In this example, c:\\a\\b\\c is stored in both CCC01 and CCC02.

After that, when the value is stored in VAR001 in the awk command, it becomes c:\a\b\c, thereby processing correctly:

CCC01='c:\\a\\b\\c'
CCC02="c:\\\\a\\\\b\\\\c"
awk -v VAR001="${CCC01}"  -f prog01.awk
awk -v VAR001="${CCC02}"  -f prog01.awk

Similarly, the following two examples can also pass the correct path name:

awk -v VAR001=c:\\\\a\\\\b\\\\c  -f prog01.awk
awk -v VAR001='c:\\a\\b\\c'  -f prog01.awk
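
The last form in Example 2 can be checked with a sketch like the following, which prints the value that the script actually receives. It assumes the single-quotation-mark rules of a POSIX-style shell. An inline BEGIN script stands in for prog01.awk for the purposes of this illustration.

```shell
# Single quotation marks stop the shell from processing the backslashes, so the
# awk command receives c:\\a\\b\\c and reduces each \\ to a single backslash.
path=$(awk -v VAR001='c:\\a\\b\\c' 'BEGIN { print VAR001 }')

# printf is used instead of echo because some shells process backslashes in echo.
printf '%s\n' "$path"
```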

Special file names

Special file names can be used to represent the input source and output destination when you use the getline function to read from the standard input or the print or printf function to output to the standard output or standard error output. The table below lists the special file names that are available. Note that attempting to apply the close function to a special file name will have no effect.

Special file name  Meaning
/dev/stdin         Standard input
/dev/stdout        Standard output
/dev/stderr        Standard error output
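
The following sketch shows one way the special file names might be used as output destinations of the print function. It assumes a POSIX-style shell on a UNIX-like system. The message texts are hypothetical.

```shell
script='BEGIN { print "report line" > "/dev/stdout"; print "warning" > "/dev/stderr" }'

# Capture each stream separately to show where each message goes.
std_out=$(awk "$script" 2>/dev/null)
std_err=$(awk "$script" 2>&1 >/dev/null)

printf 'stdout: %s\nstderr: %s\n' "$std_out" "$std_err"
```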

Return codes

Return code                            Meaning
0                                      Normal termination
1 or greater                           Error termination
Value specified in the exit statement  Command return code specified in the exit control statement
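
A caller might pick up a return code set by the exit statement as in the following sketch. It assumes a POSIX-style shell. The input data and the value 3 are arbitrary choices within the range 0 to 255.

```shell
# Stop at the first record that contains ERROR and report it through the return code.
code=0
printf 'ok\nERROR disk\nok\n' | awk '/ERROR/ { exit 3 }' || code=$?

printf 'return code: %s\n' "$code"
```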

Notes

Usage examples