SORT(1) | User Commands | SORT(1) |
sort - sort, merge, or sequence check text files
/usr/bin/sort [-bcdfimMnru] [-k keydef] [-o output]
[-S kmem] [-t char] [-T directory] [-y [kmem]]
[-z recsz] [+pos1 [-pos2]] [file]...
/usr/xpg4/bin/sort [-bcdfimMnru] [-k keydef] [-o output]
[-S kmem] [-t char] [-T directory] [-y [kmem]]
[-z recsz] [+pos1 [-pos2]] [file]...
The sort command sorts lines of all the named files together and writes the result on the standard output.
Comparisons are based on one or more sort keys extracted from each line of input. By default, there is one sort key, the entire input line. Lines are ordered according to the collating sequence of the current locale.
The following options alter the default behavior:
-c
-c
-m
-o output
-S kmem
-T directory
-u
-y kmem
-z recsz
The default sort order depends on the value of LC_COLLATE. If LC_COLLATE is set to C, sorting is in ASCII order. If LC_COLLATE is set to en_US, sorting is case insensitive except when the two strings are otherwise equal and one has an uppercase letter earlier than the other. Other locales have other sort orders.
The following options override the default ordering rules. When ordering options appear independent of any key field specifications, the requested field ordering rules are applied globally to all sort keys. When attached to a specific key (see Sort Key Options), the specified ordering options override all global ordering options for that key. In the obsolescent forms, if one or more of these options follows a +pos1 option, it affects only the key field specified by that preceding option.
-d
-f
-i
-M
-n
-r
The treatment of field separators can be altered using the following options:
-b
-t char
Sort keys can be specified using the options:
-k keydef
-k field_start [type] [,field_end [type] ]
where:
field_start and field_end
type
When there are multiple key fields, later keys are compared only after all earlier keys compare equal. Except when the -u option is specified, lines that otherwise compare equal are ordered as if none of the options -d, -f, -i, -n or -k were present (but with -r still in effect, if it was specified) and with all bytes in the lines significant to the comparison.
The notation:
-k field_start[type][,field_end[type]]
defines a key field that begins at field_start and ends at field_end inclusive, unless field_start falls beyond the end of the line or after field_end, in which case the key field is empty. A missing field_end means the last character of the line.
A field comprises a maximal sequence of non-separating characters and, in the absence of option -t, any preceding field separator.
The field_start portion of the keydef option-argument has the form:
field_number[.first_character]
Fields and characters within fields are numbered starting with 1. field_number and first_character, interpreted as positive decimal integers, specify the first character to be used as part of a sort key. If .first_character is omitted, it refers to the first character of the field.
The field_end portion of the keydef option-argument has the form:
field_number[.last_character]
The field_number is as described above for field_start. last_character, interpreted as a non-negative decimal integer, specifies the last character to be used as part of the sort key. If last_character evaluates to zero or .last_character is omitted, it refers to the last character of the field specified by field_number.
If the -b option or b type modifier is in effect, characters within a field are counted from the first non-blank character in the field. (This applies separately to first_character and last_character.)
[+pos1 [-pos2]]
pos1 and pos2 each have the form m.n optionally followed by one or more of the flags bdfiMnr. A starting position specified by +m.n is interpreted to mean the n+1st character in the m+1st field. A missing .n means .0, indicating the first character of the m+1st field. If the b flag is in effect n is counted from the first non-blank in the m+1st field; +m.0b refers to the first non-blank character in the m+1st field.
A last position specified by −m.n is interpreted to mean the nth character (including separators) after the last character of the mth field. A missing .n means .0, indicating the last character of the mth field. If the b flag is in effect n is counted from the last leading blank in the m+1st field; −m.1b refers to the first non-blank in the m+1st field.
The fully specified +pos1 −pos2 form with type modifiers T and U:
+w.xT -y.zU
is equivalent to:
undefined (z==0 & U contains b & -t is present) -k w+1.x+1T,y.0U (z==0 otherwise) -k w+1.x+1T,y+1.zU (z > 0)
Implementations support at least nine occurrences of the sort keys (the -k option and obsolescent +pos1 and −pos2) which are significant in command line order. If no sort key is specified, a default sort key of the entire line is used.
The following operand is supported:
file
See largefile(7) for the description of the behavior of sort when encountering files greater than or equal to 2 Gbyte ( 2^31 bytes).
In the following examples, first the preferred and then the obsolete way of specifying sort keys are given as an aid to understanding the relationship between the two forms.
Example 1 Sorting with the Second Field as a sort Key
Either of the following commands sorts the contents of infile with the second field as the sort key:
example% sort -k 2,2 infile example% sort +1 −2 infile
Example 2 Sorting in Reverse Order
Either of the following commands sorts, in reverse order, the contents of infile1 and infile2, placing the output in outfile and using the second character of the second field as the sort key (assuming that the first character of the second field is the field separator):
example% sort -r -o outfile -k 2.2,2.2 infile1 infile2 example% sort -r -o outfile +1.1 −1.2 infile1 infile2
Example 3 Sorting Using a Specified Character in One of the Files
Either of the following commands sorts the contents of infile1 and infile2 using the second non-blank character of the second field as the sort key:
example% sort -k 2.2b,2.2b infile1 infile2 example% sort +1.1b −1.2b infile1 infile2
Example 4 Sorting by Numeric User ID
Either of the following commands prints the passwd(5) file (user database) sorted by the numeric user ID (the third colon-separated field):
example% sort -t : -k 3,3n /etc/passwd example% sort -t : +2 −3n /etc/passwd
Example 5 Printing Sorted Lines Excluding Lines that Duplicate a Field
Either of the following commands prints the lines of the already sorted file infile, suppressing all but one occurrence of lines having the same third field:
example% sort -um -k 3.1,3.0 infile example% sort -um +2.0 −3.0 infile
Example 6 Sorting by Host IP Address
Either of the following commands prints the hosts(5) file (IPv4 hosts database), sorted by the numeric IP address (the first four numeric fields):
example$ sort -t . -k 1,1n -k 2,2n -k 3,3n -k 4,4n /etc/hosts example$ sort -t . +0 -1n +1 -2n +2 -3n +3 -4n /etc/hosts
Since '.' is both the field delimiter and, in many locales, the decimal separator, failure to specify both ends of the field leads to results where the second field is interpreted as a fractional portion of the first, and so forth.
See environ(7) for descriptions of the following environment variables that affect the execution of sort: LANG, LC_ALL, LC_COLLATE, LC_MESSAGES, and NLSPATH.
LC_CTYPE
LC_NUMERIC
The following exit values are returned:
0
1
>1
/var/tmp/stm???
See attributes(7) for descriptions of the following attributes:
ATTRIBUTE TYPE | ATTRIBUTE VALUE |
CSI | Enabled |
ATTRIBUTE TYPE | ATTRIBUTE VALUE |
CSI | Enabled |
Interface Stability | Standard |
comm(1), join(1), uniq(1), nl_langinfo(3C), strftime(3C), hosts(5), passwd(5), attributes(7), environ(7), largefile(7), standards(7)
Comments and exits with non-zero status for various trouble conditions (for example, when input lines are too long), and for disorders discovered under the -c option.
When the last line of an input file is missing a new-line character, sort appends one, prints a warning message, and continues.
sort does not guarantee preservation of relative line ordering on equal keys.
One can tune sort performance for a specific scenario using the -S option. However, one should note in particular that sort has greater knowledge of how to use a finite amount of memory for sorting than the virtual memory system. Thus, a sort invoked to request an extremely large amount of memory via the -S option could perform extremely poorly.
As noted, certain of the field modifiers (such as -M and -d) cause the interpretation of input data to be done with reference to locale-specific settings. The results of this interpretation can be unexpected if one's expectations are not aligned with the conventions established by the locale. In the case of the month keys, sort does not attempt to compensate for approximate month abbreviations. The precise month abbreviations from nl_langinfo(3C) or strftime(3C) are the only ones recognized. For printable or dictionary order, if these concepts are not well-defined by the locale, an empty sort key might be the result, leading to the next key being the significant one for determining the appropriate ordering.
November 19, 2001 | OmniOS |