sort command (sorts text files)

Organization of this page

Format
Description
Arguments
Sort function
Sort key options
Merge function
Option to not distinguish between lowercase and uppercase (-f option)
Return codes
Notes
Usage examples

Format

sort [-c|-m] [-b] [-f] [-n] [-r] [-u] [-z]
     [-k start-position[, end-position]] [-o output-path-name]
     [-T temporary-file-directory] [-t field-delimiter]
     [input-path-name ...]

Description

This command reads input from files or from the standard input and performs one of the following operations, then sends the results to the standard output:

Sort
Merge
Check whether the input is already sorted

Arguments

Specifying the operation

If no operation is specified, the default is to sort. The -r option specifies whether items are to be sorted in ascending or descending order.

-c

Specifies that a single file is to be checked to determine whether it is already correctly sorted.

If the file is in sorted order, the command terminates with a return code of 0. If the file is not in sorted order, the command outputs a message (sort: found disorder: field-contents) to the standard error output and terminates with a return code of 1.

Specifying more than one file when this option is specified results in an error (sort: too many input files for the -c option). This option takes precedence when it is specified at the same time as any other option except for the -u option. Specifying this option more than once does not result in an error.

If this option is not specified, the default operation is to sort.
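The check can be used in a script by testing the return code. The following is a minimal sketch, assuming a POSIX-style shell and that mktemp is available; the scratch directory and the file names are hypothetical, and the wording of the disorder message can differ between platforms, so it is discarded here.

```shell
# Compare bytes rather than locale collation (an assumption for this sketch).
export LC_ALL=C
work=$(mktemp -d)                      # hypothetical scratch directory
printf 'AAA\nBBB\nCCC\n' > "$work/sorted.txt"
printf 'BBB\nAAA\n' > "$work/unsorted.txt"

# -c produces no sorted output; only the return code is of interest.
sorted_rc=0
sort -c "$work/sorted.txt" || sorted_rc=$?

# The disorder message goes to the standard error output, so discard it.
unsorted_rc=0
sort -c "$work/unsorted.txt" 2>/dev/null || unsorted_rc=$?

echo "sorted.txt: $sorted_rc, unsorted.txt: $unsorted_rc"
rm -rf "$work"
```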

-m

Specifies that input files are to be merged (and assumes that they are already sorted). The -m option is ignored if it is specified at the same time as the -c option. Specifying this option more than once does not result in an error.

If this option is not specified, the default operation is to sort.

Input and output specifications

-o output-path-name

Specifies a destination for the output when the output is not to be sent to the standard output.

The output file is created if it does not already exist. In UNIX, the permissions for a newly created file are set according to the umask.

If the file already exists, the sort command first sends the output to a temporary file, then renames the temporary file to the output file, which overwrites the original file. The temporary output file is created in the same directory as the input files. In UNIX, the permissions for the file are reset according to the umask.

If this option is specified more than once, the last specification takes effect.

If you specify /dev/stdout for the output path name, the standard output is used. This applies in both UNIX and Windows; in Windows, write it in lowercase as /dev/stdout.

If you specify a symbolic link for the output path name, the link is deleted and a new file is created.
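Because the output is written to a temporary file first when the output file already exists, the same file can be named as both input and output. The sketch below illustrates this under the assumption of a POSIX-style shell with mktemp available; the file name fruits.txt is hypothetical. Shell redirection such as `sort file > file` is not equivalent, because the shell truncates the file before the sort command reads it.

```shell
export LC_ALL=C
work=$(mktemp -d)                      # hypothetical scratch directory
printf 'pear\napple\nfig\n' > "$work/fruits.txt"

# Input and output are the same file; the sorted result replaces the original.
sort -o "$work/fruits.txt" "$work/fruits.txt"

in_place=$(cat "$work/fruits.txt")
printf '%s\n' "$in_place"
rm -rf "$work"
```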

-T temporary-file-directory

Specifies the directory to be used internally by the sort command for creating temporary files.

A temporary file is a work file that is used for sort and merge operations that cannot be performed entirely in memory.

If this option is specified more than once, the last specification takes effect.

If this option is omitted, the following directory is used:

Windows: common-application-data-folder\HITACHI\JP1AS\misc

UNIX: Directory specified in the TMPDIR environment variable (/var/tmp if the TMPDIR environment variable is not defined)
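As an illustration, the following sketch directs temporary files to a dedicated directory. The directory and file names are hypothetical, and mktemp is assumed to be available. With input this small the sort is likely to complete in memory, so the directory may never actually be written to.

```shell
export LC_ALL=C
work=$(mktemp -d)                      # hypothetical scratch directory
mkdir "$work/sorttmp"                  # directory reserved for sort work files
printf '3\n1\n2\n' > "$work/data.txt"

# -T names the directory used for work files if the sort spills to disk.
sorted=$(sort -T "$work/sorttmp" "$work/data.txt")
printf '%s\n' "$sorted"
rm -rf "$work"
```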

input-path-name

Specifies an input file. If this argument is omitted or specified as -, the standard input is read. The standard input is also read when /dev/stdin is specified (in Windows, write it in lowercase as /dev/stdin).
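The following sketch combines the standard input with a regular file by passing - as one of the input path names. It assumes a POSIX-style shell with mktemp available; the file name is hypothetical.

```shell
export LC_ALL=C
work=$(mktemp -d)                      # hypothetical scratch directory
printf 'delta\nalpha\n' > "$work/file.txt"

# "-" stands for the standard input and is read together with file.txt.
combined=$(printf 'charlie\nbravo\n' | sort - "$work/file.txt")
printf '%s\n' "$combined"
rm -rf "$work"
```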

Sort key specifications

-b

Specifies that leading spaces are to be ignored in determining the start and end positions of a sort key specified with the -k option. The -b option is valid when a sort key is specified with the -k option. The -b option cannot be specified after the -k option.

-f

Specifies that lowercase letters are not to be distinguished from uppercase letters for purposes of sorting. Specifying this option more than once does not result in an error.

-n

Specifies numeric sorting, with the initial numeric character string in each line handled as a number.

The -n option takes precedence over the -f option. This option can be specified multiple times.

Numeric values are handled as follows.

A numeric value is a character string composed of the ASCII characters 0 (0x30) through 9 (0x39).
Leading whitespaces (0x20 and 0x09) and zeros (0x30) are ignored.
A minus sign (0x2d) is allowed to precede a numeric value.
No more than one decimal point can be specified.
A numeric value can include a digits grouping character in the integer portion.
The decimal point and the digits grouping character in the integer portion depend on the locale. Typically, the period (.) is used as the decimal point, and the comma (,) is used as the digits grouping character.
Anything other than a numeric character string is treated as 0.
Do not specify for a sort key a numeric value that consists of more than 61 digits in the integer or more than 61 digits following the decimal point in a decimal value.
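The difference between byte-by-byte and numeric comparison can be seen with a small input. This is a sketch assuming a POSIX-style shell; decimal points and digit grouping are left out because they depend on the locale.

```shell
export LC_ALL=C
input='10
9
-1
007
abc'

# Without -n, records are compared byte by byte.
bytewise=$(printf '%s\n' "$input" | sort)

# With -n, leading zeros are ignored and non-numeric text counts as 0.
numeric=$(printf '%s\n' "$input" | sort -n)

printf 'bytewise:\n%s\nnumeric:\n%s\n' "$bytewise" "$numeric"
```

In the numeric result, abc is placed between -1 and 007 because it is treated as 0.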

-r

Specifies sorting in descending order. If this option is not specified, the default is sorting in ascending order. This option can be specified multiple times.

Specifying the field separator

-t field-delimiter

Specifies the field delimiter. The field delimiter is not considered part of the field in determining the offset for the sort key. Consecutive field delimiters denote an empty field between them. The field delimiter cannot be the same character as the record delimiter.

If the -t option is omitted, fields are delimited by one or more consecutive whitespaces (consecutive spaces do not denote an empty field between them). Leading spaces are considered part of the field in determining the offset for the sort key.

If you specify more than one character for the field delimiter or if you specify a multibyte character, only the initial byte is used as the field delimiter (which cannot be the same byte value as the record delimiter).

If -t is specified but no field delimiter value is specified, the option or file name that follows immediately will be interpreted as the field delimiter during processing. To prevent this from happening, you must make sure to specify a delimiter. It is an error to specify this option multiple times (sort: multiple field-delimiters).
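The handling of an empty field can be sketched as follows, assuming a POSIX-style shell. The record names r1 to r3 are made up for the illustration.

```shell
export LC_ALL=C

# The record r2 has two adjacent delimiters, so its second field is empty.
by_second=$(printf 'r1:b:x\nr2::y\nr3:a:z\n' | sort -t : -k 2,2)
printf '%s\n' "$by_second"
```

The empty field compares lower than any non-empty field, so r2::y is output first.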

Specifying a sort key

-k start-position[, end-position]

Specifies the start and end positions of the sort key. If you specify more than one sort key, then all lines with the same value for the first sort key can be distinguished on the basis of the next sort key.

If start-position is greater than end-position, or if a specified field does not exist, the command assumes that no sort key is specified and all comparisons against this sort key are considered to be the same.

Specify start-position and end-position in the following format:

field-position[.indent][bfnr]

field-position

Specifies the position of the field in the record. It is an error to specify a non-numeric value (sort: missing field number) or a negative value (sort: field numbers must be positive).

You cannot specify 0 for the start position.

If you specify 0 for the end position, the sort key is considered to extend to the end of the record.

The maximum value that can be specified for field-position is the maximum value for the int type (overflow will occur if you specify a greater value).

If you specify 0 for the field position of an end position, you cannot specify the indent described below.

indent

Specifies an offset within the field. It is an error to specify a non-numeric or a negative value (sort: missing offset).

The offset is measured in bytes. If the offset points to the middle of a multibyte character, evaluation starts from that byte position.

You cannot specify 0 for the indent of the start position.

If you specify 0 for the indent of the end position, it is treated as though no indent were specified.

The maximum value that can be specified for indent is the maximum value for the int type (overflow will occur if you specify a greater value).

If you omit indent in start-position, the default is the first byte position of the field.

If you omit indent in end-position, the default is the last byte position of the field.

Sort key options

Specifies the b, f, n, or r option for sorting.

The b option ignores leading spaces in determining the start or end position.

The f option does not distinguish between lowercase and uppercase letters in sorting.

The n option sorts numerically, treating the initial numeric character string in each line as a number.

The r option sorts in descending order.

The b option specified in start-position is valid only for start-position, and the b option specified in end-position is valid only for end-position. If no indent is specified in end-position, specification of the b option is disabled. For the options other than b, it does not matter whether they are specified for start-position or end-position (they function the same regardless of where they are specified).

Other specifications

-u

Specifies that when multiple records have the same sort key value, only one of them is to be output. If the -u option is specified at the same time as the -c option, a check is performed for whether there are records with the same sort key value. Specifying this option more than once does not result in an error.
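The following sketch shows both uses of -u, assuming a POSIX-style shell. Which of the records with equal keys is kept is not stated above, so only the number of output records is examined.

```shell
export LC_ALL=C
records='a:1
b:1
c:2'

# Two records share the key value 1, so only two records are output.
unique_count=$(printf '%s\n' "$records" | sort -t : -k 2,2 -u | wc -l)

# With -c, -u also treats records with equal keys as a failed check.
dup_rc=0
printf '%s\n' "$records" | sort -t : -k 2,2 -c -u 2>/dev/null || dup_rc=$?

echo "records kept: $unique_count, check return code: $dup_rc"
```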

-z

Specifies that the record delimiter is to be changed to NULL (0x00). It is an error to specify this option more than once (sort: multiple record delimiters).
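A minimal sketch of sorting NULL-delimited records, assuming a POSIX-style shell whose printf accepts the \0 escape. The tr command is used only to make the result printable.

```shell
export LC_ALL=C

# Each record ends with a NULL byte instead of an end-of-line code.
z_sorted=$(printf 'pear\0apple\0fig\0' | sort -z | tr '\0' '\n')
printf '%s\n' "$z_sorted"
```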

In Windows, end-of-line codes are removed from the input data when the input is read and then are added back during output. For this reason, binary files must not be used as the input.

Sort function

Sorting works by reading one or more input files and running comparisons against one or more sort keys. The -k option is used to specify fields as sort keys. The -t option is used to specify a field delimiter for separating each record into fields.

If no sort key is specified, the entire record is considered to constitute the sort key. Sort keys are compared on a byte-by-byte basis.

If there are multiple sort keys, the first sort key specified is compared first. If the values match, the next sort key is compared, and comparison of sort keys continues until a difference is found.

If there is a match on all the sort keys, the entire record is then compared byte-by-byte. Output is produced in descending order with the -r option or in ascending order without the -r option.

Sort key options

Two types of options apply to sort keys: global options and local options. The -b, -f, -n, and -r options are specified for the sort command globally, and each has a corresponding local version (b, f, n, or r) that can be specified within the -k option. The global options cannot be specified after the -k option.

b

This option is enabled globally for both the start position and the end position specified in the -k option. However, it is disabled for the end position if no indent is specified for the end position or if an indent of 0 is specified for the end position.

The -b option is valid only when the -k option is specified.

f | n | r

When any of these options is specified locally, the local specification replaces the global specifications for the applicable field.

The following example illustrates global options:

-bfnr -k 1,1 -k 2,2

In this case, the -bfnr options are enabled for both the first and second fields. They are applied to the first and second fields as follows:

-b option: Ignores leading blanks when determining the position of the sort key.
-f option: Does not distinguish between lowercase and uppercase letters when sorting; this is disabled if the -n option is specified.
-n option: Sorts numerically, handling the initial numeric character string in each line as a number.
-r option: Sorts in descending order.

The following examples illustrate the range of sort keys when the global -b option is not specified.

-k 1: The sort key extends from the first field through the end of the record.
-k 1,1: The sort key is the entire first field.
-k 1,5: The sort key extends from the initial byte of the first field through the final byte of the fifth field.
-k 1.2,5.11: The sort key extends from the second byte of the first field through the eleventh byte of the fifth field.
-k 2,1: No sort key applies, because the fields are specified in reverse order, from the second to the first.
-k 2.1b, 5.1b: The sort key extends from the first byte (excluding leading whitespaces) of the second field through the first byte (excluding leading whitespaces) of the fifth field.
-k 2.1b, 5.0b: The sort key extends from the first byte (excluding leading whitespaces) of the second field through the final byte of the fifth field.
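Two of the cases above can be tried with a small input. This is a sketch assuming a POSIX-style shell; the records are made up for the illustration.

```shell
export LC_ALL=C
records='c15:z
a20:x
b10:y'

# Key = bytes 2 through 3 of the first field (15, 20, 10).
by_offset=$(printf '%s\n' "$records" | sort -t : -k 1.2,1.3)

# Reversed fields: no sort key applies, so whole records are compared.
no_key=$(printf '%s\n' "$records" | sort -t : -k 2,1)

printf 'by offset:\n%s\nno key:\n%s\n' "$by_offset" "$no_key"
```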

Merge function

The merge function aligns and integrates the input data by comparing the records of each pre-sorted input file. Even if the input files are not actually sorted, merging proceeds on the assumption that they are sorted. The example below illustrates merging of file1 and file2, whose contents are as follows:

file1

AAA
DDD

file2

BBB
AAA

The following command will merge file1 and file2:

# sort -m file1 file2

The output will be as follows:

AAA   (1st line of file1)    <-- Result of comparing 1st line of file1 to 1st line of file2
BBB   (1st line of file2)    <-- Result of comparing 2nd line of file1 to 1st line of file2
AAA   (2nd line of file2)    <-- Result of comparing 2nd line of file1 to 2nd line of file2
DDD   (2nd line of file1)    <-- 2nd line of file1, because no 3rd line in file2
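Because merging does not verify its inputs, one possible safeguard is to check each file with -c first and fall back to a full sort when any file is out of order. The following is a sketch of that approach, not a required procedure; it assumes a POSIX-style shell with mktemp available and reuses the contents of file1 and file2 from the example.

```shell
export LC_ALL=C
work=$(mktemp -d)                      # hypothetical scratch directory
printf 'AAA\nDDD\n' > "$work/file1"
printf 'BBB\nAAA\n' > "$work/file2"

# Check every input; any return code other than 0 disables merging.
all_sorted=yes
for f in "$work/file1" "$work/file2"; do
    sort -c "$f" 2>/dev/null || all_sorted=no
done

if [ "$all_sorted" = yes ]; then
    result=$(sort -m "$work/file1" "$work/file2")
else
    # file2 is out of order here, so a full sort is used instead.
    result=$(sort "$work/file1" "$work/file2")
fi
printf '%s\n' "$result"
rm -rf "$work"
```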

Option to not distinguish between lowercase and uppercase (-f option)

In the examples below, the following input records are sorted:

file1

a:B
A:b

Sort with lowercase and uppercase letters distinguished:

$ sort -t : -k 2,2 file1

The sort key is set to the second field, which is delimited by :.

The following is the output of sorting with lowercase and uppercase letters distinguished:

a:B
A:b

Because B is smaller than b, a:B is output first.

Sort without distinguishing between lowercase and uppercase letters:

$ sort -f -t : -k 2,2 file1

In this case, the -f option is specified.

The following is the output of sorting without distinguishing between lowercase and uppercase letters:

A:b
a:B

In this case, the second fields are regarded as having the same value because they are compared without distinguishing between lowercase and uppercase, as specified by the -f option. Because the sort key values are the same, the records are compared byte-by-byte in their entirety. As a result, since A is smaller than a, A:b is output first.

Return codes

Return code	Meaning
`0`	Normal termination
`1`	Normal termination. The input data is not sorted (when the `-c` option is specified), or duplicated key values exist (when the `-c` and `-u` options are specified).
`2`	Error termination
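A script can branch on these return codes as sketched below. The function name describe_rc and the file names are hypothetical, and a POSIX-style shell with mktemp is assumed. Any code other than 0 or 1 is treated as an error here.

```shell
export LC_ALL=C
work=$(mktemp -d)                      # hypothetical scratch directory
printf 'b\na\n' > "$work/in.txt"

# Map a return code of the sort command to a short description.
describe_rc() {
    case "$1" in
        0) echo "normal termination" ;;
        1) echo "not sorted or duplicate keys" ;;
        *) echo "error termination" ;;
    esac
}

# Checking an unsorted file.
rc=0
sort -c "$work/in.txt" 2>/dev/null || rc=$?
check_result=$(describe_rc "$rc")

# Reading a file that does not exist.
rc=0
sort "$work/missing.txt" >/dev/null 2>&1 || rc=$?
missing_result=$(describe_rc "$rc")

echo "check: $check_result, missing file: $missing_result"
rm -rf "$work"
```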

Notes

If processing cannot be carried out in memory, it is performed using a temporary file. If the system runs out of disk space in the course of using the temporary file, the following error message is output:
```
sort: fwrite: No space left on device
```
If you receive this message, use the -T option to specify a disk with sufficient free space.
If you interrupt execution of the sort command, the temporary file might remain in the directory containing the output file that was specified with the -o option. In such a case, it must be deleted manually. Similarly, in cases where the -o option is omitted, the temporary file might remain and will have to be deleted manually.
References to whitespaces in the sort command include the tab character (\t) as well as the space character (0x20). Also, when the -z option is specified, \n (end-of-line) is also considered to be a whitespace.
When the record delimiter is missing from the last record of the input file, the result of the sort or merge operation is output with the record delimiter appended.
Input can be processed whether its end-of-line codes are [CR] + [LF] or [LF], but in the case of UNIX, the [CR] is treated as data. Regardless of the format of the end-of-line codes in the input file, the output results will follow the end-of-line code conventions of the platform.
If a record cannot be accommodated, memory is expanded so that it can be stored. An error results if sufficient memory cannot be allocated.
The size of the sort buffer is 16 megabytes. If this amount of space is not adequate, a temporary file is created. Therefore, this command is not recommended for sorting large amounts of data.
If the sort command is cancelled during processing because sort or merge processing cannot be performed only in memory, a temporary file with the name shown below might remain. Delete such a temporary file manually.

In Windows:

sortuuuu.tmp (uuuu: any hexadecimal character string)

In UNIX:

sortppppp.XXXXXX (ppppp: process ID consisting of five or more digits; XXXXXX: any character string consisting of six characters)
If the -o option is specified and the sort command processing is cancelled, an intermediate file with the name shown below might remain. Delete such a temporary file manually.

In Windows:

first-three-characters-of-the-output-destination-path-nameuuuu.tmp (uuuu: any hexadecimal character string)

In UNIX:

output-destination-file-namepppppXXXXXX (ppppp: process ID consisting of five or more digits; XXXXXX: any character string consisting of six characters)
If multiple sort commands with the same output destination path name specified in the -o option are executed concurrently, they might terminate with an error. In such a case, the operation cannot be guaranteed.

Usage examples

The following shows the format of the files used in the examples below to illustrate the results of executing the sort command.

file1
```
yyyy:101
tttt:8
ppppppp:14
```

file2

cccccc:101
ggggg:31
rrrrrrrr:5
mmmmmmm:14

The files listed above are used as input files in the following examples.

Combine and sort the two text files.

$ sort file1 file2
cccccc:101
ggggg:31
mmmmmmm:14
ppppppp:14
rrrrrrrr:5
tttt:8
yyyy:101

Sort the two combined text files in descending order based on the numeric portion.

$ sort -t: -n -r -k 2 file1 file2
yyyy:101
cccccc:101
ggggg:31
ppppppp:14
mmmmmmm:14
tttt:8
rrrrrrrr:5

Merge three files, using the first field as the sort key.

$ cat s1.txt
AAA s1
DDD s1
 
$ cat s2.txt
BBB s2
AAA s2
 
$ cat s3.txt
CCC s3
111 s3
 
$ sort -m -k 1,1 s1.txt s2.txt s3.txt
AAA s1
BBB s2
AAA s2
 
CCC s3
111 s3
 
DDD s1
 
$

Sort data for which the keys are the same.

$ cat zr1.txt
aaa:999
$ cat zr2.txt
bbb:999
 
$ sort -k 2,2 -t : zr2.txt zr1.txt
 
aaa:999
bbb:999
$

Sort the first field numerically and the second field as a character string.
- Input command
```
sort -t : -k 1n,1 -k 2,2
```
- Input data
```
0010:aaa
10:AAA
-1:aaa
-1.00:ZZZ
1:zzz
```
- Execution results
```
-1.00:ZZZ
-1:aaa
1:zzz
10:AAA
0010:aaa
```
Sort from the beginning of the third field through the end of the line without distinguishing between lowercase and uppercase, and secondarily with the second field in descending order. In this example, because the second field is specified with a local option, it does not inherit the global options, so lowercase is distinguished from uppercase in the second field.

Input command
```
sort -t : -f -k 3  -k 2,2r
```
Input data
```
aaa:aaa:cccc
aaa:AAA:cccc
aaa:aaa:AAAA
aaa:AAA:aaaa
aaa:aaa:BBBB
aaa:AAA:bbbb
```
Execution results
```
aaa:aaa:AAAA
aaa:AAA:aaaa
aaa:aaa:BBBB
aaa:AAA:bbbb
aaa:aaa:cccc
aaa:AAA:cccc
```

Display an option error message.

Windows example

C:\TEMP>%ADSH_OSCMD_DIR%\sort -w
sort: illegal option -- w
usage: sort [-cm][-bfnruz] [-k field1[, field2]] [-o output]
       [-T dir] [-t char] [file ...]

Linux example

$ sort -w
sort: invalid option -- w
usage: sort [-cm][-bfnruz] [-k field1[, field2]] [-o output]
        [-T dir] [-t char] [file ...]

AIX example

$ sort -w
sort: illegal option -- w
usage: sort [-cm][-bfnruz] [-k field1[, field2]] [-o output]
        [-T dir] [-t char] [file ...]

Display the message that is output when you specify a directory for the input file.
```
$ ./sort dir01
sort: dir01: Is a directory
```
Display the message that is output when you specify a nonexistent file as an input file.
```
$ ./sort xxxx
sort: xxxx: No such file or directory
```

Display the message that is output when you specify a temporary file directory that does not exist.

Windows example

C:\TEMP>%ADSH_OSCMD_DIR%\sort -mTxxx s0.txt s0.txt s0.txt s0.txt s0.txt s0.t
xt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt
s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt
sort: xxx\sort: The directory name is invalid.

Linux example

$ ./sort -mT xxxx s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt
sort: xxxx/sort.SDm1yr: No such file or directory

AIX example

$ ./sort -mT xxxx s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt s0.txt
sort: xxxx/sort.XXXXXX: No such file or directory

Display the message that is output when you specify an invalid field position.
```
C:\TEMP>%ADSH_OSCMD_DIR%\sort -k xx
sort: missing field number
```

Display the message that is output when you specify 0 for the field position.

C:\TEMP>%ADSH_OSCMD_DIR%\sort -k 0 s0.txt
sort: field numbers must be positive

Display the message that is output when you specify an invalid indent for the field position.
```
C:\TEMP>%ADSH_OSCMD_DIR%\sort -k 1.0 s0.txt
sort: illegal offset
```
