Hitachi

Job Management Partner 1 Version 10 Job Management Partner 1/Advanced Shell Description, User's Guide, Reference, and Operator's Guide


sort command (sorts text files)

Format

sort [-c|-m] [-b] [-f] [-n] [-r] [-u] [-z]
     [-k start-position[, end-position]] [-o output-path-name]
     [-T temporary-file-directory] [-t field-delimiter]
     [input-path-name ...]

Description

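This command reads input from files or from the standard input and sorts it, merges pre-sorted input (-m option), or checks whether it is already sorted (-c option). The results of sorting and merging are sent to the standard output unless the -o option is specified.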

Arguments

Specifying the operation

If no operation is specified, the default is to sort. The -r option specifies whether items are to be sorted in ascending or descending order.

-c

Specifies that a single specified file is to be checked to determine whether it is already correctly sorted.

If the file is in sorted order, the command terminates with a return code of 0. If the file is not in sorted order, the command outputs a message (sort: found disorder: field-contents) to the standard error output and terminates with a return code of 1.

Specifying more than one file when this option is specified results in an error (sort: too many input files for the -c option). This option takes precedence when it is specified at the same time as any other option except for the -u option. Specifying this option more than once does not result in an error.

If this option is not specified, the default operation is to sort.
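The check described above can be sketched as follows, assuming a POSIX-style shell in which mktemp and printf are available. The work directory and file names are hypothetical, and LC_ALL=C is set only to keep comparisons byte-wise in such an environment. Only the return code is examined, because the exact wording of the message written to the standard error output may differ between implementations.

```shell
# Hypothetical work area; the file names are for illustration only.
workdir=$(mktemp -d)
printf 'apple\nbanana\ncherry\n' > "$workdir/sorted.txt"
printf 'banana\napple\ncherry\n' > "$workdir/unsorted.txt"

# -c checks a single file: 0 means already sorted, 1 means disorder was found.
rc_sorted=0
LC_ALL=C sort -c "$workdir/sorted.txt" 2>/dev/null || rc_sorted=$?
rc_unsorted=0
LC_ALL=C sort -c "$workdir/unsorted.txt" 2>/dev/null || rc_unsorted=$?

echo "sorted.txt: $rc_sorted, unsorted.txt: $rc_unsorted"
rm -rf "$workdir"
```

Capturing the return code with "|| rc=$?" keeps the sketch usable in scripts that stop on the first failing command.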

-m

Specifies that input files are to be merged (and assumes that they are already sorted). The -m option is ignored if it is specified at the same time as the -c option. Specifying this option more than once does not result in an error.

If this option is not specified, the default operation is to sort.
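As a minimal sketch of merging, assuming a POSIX-style shell, the following combines two files that are each already in ascending order. The file names odd.txt and even.txt are hypothetical.

```shell
# Hypothetical pre-sorted inputs.
workdir=$(mktemp -d)
printf '1\n3\n5\n' > "$workdir/odd.txt"
printf '2\n4\n' > "$workdir/even.txt"

# Both inputs are already in ascending order, so -m only interleaves them.
merged=$(LC_ALL=C sort -m "$workdir/odd.txt" "$workdir/even.txt")
printf '%s\n' "$merged"
rm -rf "$workdir"
```

Because the inputs are genuinely sorted, the merged output is itself in sorted order.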

Input and output specifications

-o output-path-name

Specifies a destination for the output when the output is not to be sent to the standard output.

The output file is created if it does not already exist. In UNIX, the permissions for a newly created file are set according to the umask.

If the file already exists, the sort command first sends the output to a temporary file, then renames the temporary file to the output file, which overwrites the original file. The temporary output file is created in the same directory as the input files. In UNIX, the permissions for the file are reset according to the umask.

If this option is specified more than once, the last specification takes effect.

In UNIX, if you specify /dev/stdout (also written in lowercase /dev/stdout in Windows) for the output path name, the standard output is used.

If you specify a symbolic link for the output path name, the link is deleted and a new file is created.
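The following sketch, which assumes a POSIX-style shell, writes sorted output over a file that already exists. The file names fruit.txt and result.txt are hypothetical.

```shell
# Hypothetical input file and a pre-existing output file.
workdir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$workdir/fruit.txt"
printf 'stale contents\n' > "$workdir/result.txt"

# result.txt already exists, so its previous contents are replaced.
LC_ALL=C sort -o "$workdir/result.txt" "$workdir/fruit.txt"
result=$(cat "$workdir/result.txt")
printf '%s\n' "$result"
rm -rf "$workdir"
```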

-T temporary-file-directory

Specifies the directory to be used internally by the sort command for creating temporary files.

A temporary file is a work file that is used for sort and merge operations that cannot be performed entirely in memory.

If this option is specified more than once, the last specification takes effect.

If this option is omitted, the following directory is used:

Windows: common-application-data-folder\HITACHI\JP1AS\misc

UNIX: Directory specified in the TMPDIR environment variable (/var/tmp if the TMPDIR environment variable is not defined)
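As an illustration, assuming a POSIX-style shell, a dedicated work directory can be passed with -T. The directory name sorttmp is hypothetical; for input this small, no temporary file is likely to be needed, so the sketch shows only the syntax.

```shell
# Hypothetical directory reserved for the sort command's work files.
workdir=$(mktemp -d)
mkdir "$workdir/sorttmp"
printf '3\n1\n2\n' > "$workdir/in.txt"

# -T only changes where temporary files are placed; the result is unaffected.
sorted=$(LC_ALL=C sort -T "$workdir/sorttmp" "$workdir/in.txt")
printf '%s\n' "$sorted"
rm -rf "$workdir"
```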

input-path-name

Specifies an input file. If this argument is omitted or specified as -, the standard input is read as the input. The standard input is also read when /dev/stdin (also written as lowercase /dev/stdin in Windows) is specified.

Sort key specifications

-b

Specifies that leading spaces are to be ignored in determining the start and end positions of a sort key specified with the -k option. The -b option is valid when a sort key is specified with the -k option. The -b option cannot be specified after the -k option.
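The effect of -b can be sketched as follows, assuming a POSIX-style shell and whitespace-delimited fields (no -t option). The two sample records are hypothetical; the first has two spaces before its second field.

```shell
# Two records; "a  z" has an extra leading space in front of its second field.
input=$(printf 'a  z\nb y')

# Without -b, the leading spaces count as part of the second field.
without_b=$(printf '%s\n' "$input" | LC_ALL=C sort -k 2,2)
# With -b (specified before -k), the key starts at the first non-space byte.
with_b=$(printf '%s\n' "$input" | LC_ALL=C sort -b -k 2,2)

printf 'without -b:\n%s\nwith -b:\n%s\n' "$without_b" "$with_b"
```

Without -b the record with the extra space sorts first, because a space has a smaller byte value than y. With -b the keys are z and y, so the order is reversed.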

-f

Specifies that lowercase letters are not to be distinguished from uppercase letters for purposes of sorting. Specifying this option more than once does not result in an error.

-n

Specifies numeric sorting, with the initial numeric character string in each line handled as a number.

The -n option takes precedence over the -f option. This option can be specified multiple times.

Numeric values are handled as follows.

  • A numeric value is a character string composed of the ASCII characters 0 (0x30) through 9 (0x39).

  • Leading whitespaces (0x20 and 0x09) and zeros (0x30) are ignored.

  • A minus sign (0x2d) is allowed to precede a numeric value.

  • No more than one decimal point can be specified.

  • A numeric value can include a digits grouping character in the integer portion.

  • The decimal point and the digits grouping character in the integer portion depend on the locale. Typically, the period (.) is used as the decimal point, and the comma (,) is used as the digits grouping character.

  • Anything other than a numeric character string is treated as 0.

  • Do not specify for a sort key a numeric value that consists of more than 61 digits in the integer portion or more than 61 digits following the decimal point.
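The difference between byte-wise and numeric sorting can be sketched as follows, assuming a POSIX-style shell. The sample values are hypothetical and were chosen to include a minus sign and leading zeros.

```shell
# Sample values with a negative number and leading zeros.
input=$(printf '10\n9\n-3\n007')

# Default comparison is byte-by-byte, so "10" sorts before "9".
bytewise=$(printf '%s\n' "$input" | LC_ALL=C sort)
# -n compares the values as numbers; leading zeros are ignored.
numeric=$(printf '%s\n' "$input" | LC_ALL=C sort -n)

printf 'byte-wise:\n%s\nnumeric:\n%s\n' "$bytewise" "$numeric"
```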

-r

Specifies sorting in descending order. If this option is not specified, the default is sorting in ascending order. This option can be specified multiple times.

Specifying the field separator

-t field-delimiter

Specifies the field delimiter. The field delimiter is not considered part of the field in determining the offset for the sort key. Consecutive field delimiters denote an empty field between them. You cannot specify the same character as the record delimiter.

If the -t option is omitted, fields are delimited by one or more consecutive whitespaces (consecutive spaces do not denote an empty field between them). Leading spaces are considered part of the field in determining the offset for the sort key.

If you specify more than one character for the field delimiter or if you specify a multibyte character, only the initial byte is used as the field delimiter (which cannot be the same byte value as the record delimiter).

If -t is specified but no field delimiter value is specified, the option or file name that follows immediately will be interpreted as the field delimiter during processing. To prevent this from happening, you must make sure to specify a delimiter. It is an error to specify this option multiple times (sort: multiple field-delimiters).
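The handling of consecutive delimiters can be sketched as follows, assuming a POSIX-style shell. The colon-delimited sample records are hypothetical; the first one has an empty second field.

```shell
# "a::3" has three fields: "a", an empty field, and "3".
input=$(printf 'a::3\nb:x:1')

# Sort on the third field only; the empty second field still counts as a field.
by_third=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 3,3)
printf '%s\n' "$by_third"
```

The record b:x:1 is output first because its third field (1) is smaller than 3, which shows that the empty field in a::3 was counted.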

Specifying a sort key

-k start-position[, end-position]

Specifies the start and end positions of the sort key. If you specify more than one sort key, then all lines with the same value for the first sort key can be distinguished on the basis of the next sort key.

If start-position is greater than end-position, or if a specified field does not exist, the command assumes that no sort key is specified and all comparisons against this sort key are considered to be the same.

Specify start-position and end-position in the following format:

field-position[.indent][bfnr]

  • field-position

    Specifies the position of the field in the record. It is an error to specify a non-numeric value (sort: missing field number) or a negative value (sort: field numbers must be positive).

    You cannot specify 0 for the start position.

    If you specify 0 for the end position, the sort key is considered to extend to the end of the record.

    The maximum value that can be specified for field-position is the maximum value for the int type (overflow will occur if you specify a greater value).

    If you specify 0 for the field position of an end position, you cannot specify the indent described below.

  • indent

    Specifies an offset within the field. It is an error to specify a non-numeric or a negative value (sort: missing offset).

    The unit for the offset indentation is bytes. If the middle of a multibyte character is specified, evaluation occurs from that byte position.

    You cannot specify 0 for the indent of the start position.

    If you specify 0 for the indent of the end position, it is treated as though no indent were specified.

    The maximum value that can be specified for indent is the maximum value for the int type (overflow will occur if you specify a greater value).

    If you omit indent in start-position, the default is the first byte position of the field.

    If you omit indent in end-position, the default is the last byte position of the field.

  • Sort key options

    Specifies the b, f, n, or r option for sorting.

    The b option ignores leading spaces in determining the start or end position.

    The f option does not distinguish between lowercase and uppercase letters in sorting.

    The n option sorts numerically, treating the initial numeric character string in each line as a number.

    The r option sorts in descending order.

    The b option specified in start-position is valid only for start-position, and the b option specified in end-position is valid only for end-position. If no indent is specified in end-position, specification of the b option is disabled. For the options other than b, it does not matter whether they are specified for start-position or end-position (they function the same regardless of where they are specified).

Other specifications

-u

Specifies that when multiple records have the same sort key value, only one of them is to be output. If the -u option is specified at the same time as the -c option, a check is performed for whether there are records with the same sort key value. Specifying this option more than once does not result in an error.
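The two uses of -u can be sketched as follows, assuming a POSIX-style shell. Whole records serve as the sort key here, and the sample data is hypothetical.

```shell
# With -u, records that compare equal are output only once.
unique=$(printf 'b\na\nb\na\n' | LC_ALL=C sort -u)
printf '%s\n' "$unique"

# The input "a", "a" is in sorted order, so -c alone reports success.
rc_check=0
printf 'a\na\n' | LC_ALL=C sort -c 2>/dev/null || rc_check=$?
# Adding -u makes the check fail as well when equal keys are found.
rc_check_unique=0
printf 'a\na\n' | LC_ALL=C sort -c -u 2>/dev/null || rc_check_unique=$?

echo "-c: $rc_check, -c -u: $rc_check_unique"
```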

-z

Specifies that the record delimiter is to be changed to NULL (0x00). It is an error to specify this option more than once (sort: multiple record delimiters).

In Windows, end-of-line codes are removed from the input data when the input is read and then are added back during output. For this reason, binary files must not be used as the input.
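As a sketch of NULL-delimited records, assuming a UNIX-like environment with a POSIX-style shell, the following sorts three records separated by 0x00. The tr command is used only to make the result readable on a terminal.

```shell
# Three records delimited by NULL (0x00) instead of a newline.
sorted=$(printf 'pear\0apple\0fig\0' | LC_ALL=C sort -z | tr '\0' '\n')
printf '%s\n' "$sorted"
```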

Sort function

Sorting works by reading one or more input files and running comparisons against one or more sort keys. The -k option is used to specify fields as sort keys. The -t option is used to specify a field delimiter for separating each record into fields.

If no sort key is specified, the entire record is considered to constitute the sort key. Sort keys are compared on a byte-by-byte basis.

If there are multiple sort keys, the first sort key specified is compared. If a match is found, the next sort key is compared, and comparing of sort keys continues until no match is found.

If there is a match on all the sort keys, the entire record is then compared byte-by-byte. Output is produced in descending order with the -r option or in ascending order without the -r option.
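Comparison on more than one sort key can be sketched as follows, assuming a POSIX-style shell. The colon-delimited records (name, group, number) are hypothetical. The second field is the first key, and the third field is the second key, compared numerically in descending order by means of the local n and r options.

```shell
# Hypothetical records in the form name:group:number.
input=$(printf 'alice:dev:30\nbob:ops:25\ncarol:dev:45')

# First key: field 2. Second key: field 3, numeric and descending.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -k 2,2 -k 3,3nr)
printf '%s\n' "$sorted"
```

The two dev records match on the first key, so the second key decides their order, and 45 is output before 30.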

Sort key options

Two types of options apply to sort keys: global and local. For each global option (-b, -f, -n, and -r) specified for the sort command as a whole, there is a corresponding local option (b, f, n, and r) that can be specified within the -k option. The global options cannot be specified after the -k option.

b

This option is enabled globally for both the start position and the end position specified in the -k option. However, it is disabled for the end position if no indent is specified for the end position or if an indent of 0 is specified for the end position.

The -b option is valid only when the -k option is specified.

f | n | r

When any of these options is specified locally, the local specification replaces the global specifications for the applicable field.
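The replacement of global options by local options can be sketched as follows, assuming a POSIX-style shell. The sample records are hypothetical. The global -r option applies to the first key, which has no local options, whereas the second key specifies n locally and is therefore not affected by -r.

```shell
# Hypothetical records in the form letter:number.
input=$(printf 'a:10\na:9\nb:1')

# Key 1 inherits the global -r (descending).
# Key 2 has the local option n, which replaces the global options for that key.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -t : -r -k 1,1 -k 2,2n)
printf '%s\n' "$sorted"
```

The first field is output in descending order (b before a), but within the a records the second field is in ascending numeric order (9 before 10).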

The following example illustrates global options:

-bfnr -k 1,1 -k 2,2

In this case, the -bnfr options are enabled for both the first and second fields. They are applied to the first and second fields as follows:

The following examples illustrate the range of sort keys when the global -b option is not specified.

-k 1

The sort key extends from the first field through the end of the record.

-k 1,1

The sort key is the entire first field.

-k 1,5

The sort key extends from the initial byte of the first field through the final byte of the fifth field.

-k 1.2,5.11

The sort key extends from the second byte of the first field through the eleventh byte of the fifth field.

-k 2,1

No sort key applies, because the fields are specified in reverse order, from the second to the first.

-k 2.1b,5.1b

The sort key extends from the first byte (excluding leading whitespaces) of the second field through the first byte (excluding leading whitespaces) of the fifth field.

-k 2.1b,5.0b

The sort key extends from the first byte (excluding leading whitespaces) of the second field through the final byte of the fifth field.
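A byte-offset key of the kind listed above can be sketched as follows, assuming a POSIX-style shell. The sample records are hypothetical; each consists of a single field whose second and third bytes hold a two-digit code.

```shell
# Each record is one field; bytes 2 and 3 hold the value to sort on.
input=$(printf 'x20a\ny10b\nz15c')

# The key runs from the second byte through the third byte of field 1.
sorted=$(printf '%s\n' "$input" | LC_ALL=C sort -k 1.2,1.3)
printf '%s\n' "$sorted"
```

The keys are 20, 10, and 15, so the records are output in the order y10b, z15c, x20a regardless of their first byte.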

Merge function

The merge function aligns and integrates the input data by comparing the records of each pre-sorted input file. Even if the input files are not actually sorted, merging proceeds on the assumption that they are sorted. The example below illustrates merging of file1 and file2, whose contents are as follows:

file1

AAA
DDD

file2

BBB
AAA

The following command will merge file1 and file2:

# sort -m file1 file2

The output will be as follows:

AAA   (1st line of file1)    <-- Result of comparing 1st line of file1 to 1st line of file2
BBB   (1st line of file2)    <-- Result of comparing 2nd line of file1 to 1st line of file2
AAA   (2nd line of file2)    <-- Result of comparing 2nd line of file1 to 2nd line of file2
DDD   (2nd line of file1)    <-- 2nd line of file1, because no 3rd line in file2

Option to not distinguish between lowercase and uppercase (-f option)

In the examples below, the following input records are sorted:

file1

a:B
A:b

Sort with lowercase and uppercase letters distinguished:

$ sort -t : -k 2,2 file1

The sort key is set to the second field, which is delimited by :.

The following is the output of sorting with lowercase and uppercase letters distinguished:

a:B
A:b

Because B is smaller than b, a:B is output first.

Sort without distinguishing between lowercase and uppercase letters:

$ sort -f -t : -k 2,2 file1

In this case, the -f option is specified.

The following is the output of sorting without distinguishing between lowercase and uppercase letters:

A:b
a:B

In this case, the second fields are regarded as having the same value because they are compared without distinguishing between lowercase and uppercase, as specified by the -f option. Because the sort key values are the same, the records are compared byte-by-byte in their entirety. As a result, since A is smaller than a, A:b is output first.

Return codes

Return code    Meaning

0              Normal termination

1              Normal termination

                 • The input data is not sorted (when the -c option is specified).

                 • Duplicated key values exist (when the -c and -u options are specified).

2              Error termination

Notes

Usage examples

The following shows the format of the files used in the examples below to illustrate the results of executing the sort command.

The files listed above are used as input files in the following examples.