What is the difference between "sort -u" and "sort | uniq"?
Everywhere I see someone needing to get a sorted, unique list, they always pipe to `sort | uniq`. I've never seen any examples where someone uses `sort -u` instead. Why not? What's the difference, and why is it better to use `uniq` than the unique flag to `sort`?
`sort | uniq` existed before `sort -u`, and is compatible with a wider range of systems, although almost all modern systems do support `-u` -- it's POSIX. It's mostly a throwback to the days when `sort -u` didn't exist (and people don't tend to change their methods if the way that they know continues to work, just look at
The two were likely merged because removing duplicates within a file requires sorting (at least, in the standard case), and is an extremely common use case of `sort`. It is also faster internally as a result of being able to do both operations at the same time (and due to the fact that it doesn't require IPC between `sort` and `uniq`). Especially if the file is big, `sort -u` will likely use fewer intermediate files to sort the data.
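As a rough illustration that the two forms give the same result for a plain sort, here is a small sketch. The sample words and variable names are made up for the example, and `LC_ALL=C` is set only to make the ordering predictable.

```shell
# Hypothetical sample input containing duplicate lines.
input=$(printf 'pear\napple\npear\nbanana\napple\n')

# One process: sort and drop duplicates in a single step.
with_flag=$(printf '%s\n' "$input" | LC_ALL=C sort -u)

# Two processes: sort first, then let uniq drop adjacent duplicates.
with_pipe=$(printf '%s\n' "$input" | LC_ALL=C sort | uniq)

printf '%s\n' "$with_flag"
[ "$with_flag" = "$with_pipe" ] && echo "identical output"
```

With no key options such as `-n` or `-k`, both pipelines should print `apple`, `banana`, `pear`.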
On my system I consistently get results like this:

    $ dd if=/dev/urandom of=/dev/shm/file bs=1M count=100
    100+0 records in
    100+0 records out
    104857600 bytes (105 MB) copied, 8.95208 s, 11.7 MB/s
    $ time sort -u /dev/shm/file >/dev/null
    real    0m0.500s
    user    0m0.767s
    sys     0m0.167s
    $ time sort /dev/shm/file | uniq >/dev/null
    real    0m0.772s
    user    0m1.137s
    sys     0m0.273s

`sort -u` also doesn't mask the return code of `sort` the way a pipeline does, which may be important (in modern shells there are ways to get this, for example, the `$PIPESTATUS` array in bash, but this wasn't always true).
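To make the return-code point concrete, here is a minimal sketch. It assumes the path in `missing` does not exist (the path is hypothetical) and that `pipefail` is not enabled in the shell.

```shell
# Hypothetical path that is assumed not to exist, so sort fails on it.
missing=/nonexistent/input.txt

# Single command: the failure of sort is what the shell reports.
status_flag=0
sort -u "$missing" 2>/dev/null || status_flag=$?

# Pipeline: without pipefail, $? is the status of uniq, the last command.
sort "$missing" 2>/dev/null | uniq
status_pipe=$?

echo "sort -u exit status:     $status_flag"
echo "sort | uniq exit status: $status_pipe"

# In bash, reading ${PIPESTATUS[0]} immediately after the pipeline
# would recover the exit status of sort itself.
```

The first status should be nonzero and the second zero, which is the masking described above.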
I tend to use `sort | uniq` because 9 times out of 10, I'm actually piping to `uniq -c`.
Note that `sort -u` was part of 7th Edition UNIX, circa 1979. Versions of `sort` without support for `-u` are truly archaic, or were written without attention to the de facto standard before POSIX's de jure standard. See also the Stack Overflow question "Sort & uniq in Linux shell" from 2010.
+1 because of `ip`. It's 2016 and this post is from 2013, but I only learned about the `ip` command now.
+1 for "9 times out of 10, I'm actually piping to `uniq -c`" (and maybe piping once more to `sort -nr | head`). I was wondering what the equivalent of `sort | uniq` is in Vim when I found out that Vim has a `:sort u` command. And TIL `sort -u` exists as well.
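For anyone who hasn't seen the `uniq -c` idiom mentioned in these comments, a short sketch follows. The sample data and variable names are invented for illustration.

```shell
# Hypothetical sample data: one word per line, with repeats.
words=$(printf 'b\na\nb\nc\nb\na\n')

# Count each distinct line, order by count (highest first), keep the top two.
top=$(printf '%s\n' "$words" | sort | uniq -c | sort -nr | head -n 2)

printf '%s\n' "$top"
```

This should list `b` with a count of 3 and then `a` with a count of 2; the exact padding in front of the counts varies between `uniq` implementations. `sort -u` has no counting option, so it cannot replace this pipeline.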
Note that there is a difference between `sort -n | uniq` and `sort -n -u`. With `-u`, `sort` treats lines whose sort keys compare equal as duplicates, so lines that differ only in leading or trailing whitespace are collapsed by `sort -n -u` but not by `sort -n | uniq`. For example, `echo -e 'test \n test' | sort -n -u` returns a single `test` line, but `echo -e 'test \n test' | sort -n | uniq` returns both lines.
Another problem with `sort -n -u` becomes apparent with `echo -e '14a-foo\n14b-bar\n15' | sort -n -u`, i.e. the `14b-bar` line will be deleted! Not sure if this is a bug or not, though. This does not happen with `sort -n | uniq`. IMO you should never use `sort -n -u`; it only leads to trouble.
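The behaviour in the last two comments can be reproduced with the sketch below, which reuses the sample input from the comment (the variable names are mine). POSIX describes `-u` as keeping one line from each set of lines with equal keys, and under `-n` both `14...` lines have the key 14, so this looks like documented behaviour rather than a bug.

```shell
# Sample input taken from the comment above.
input=$(printf '14a-foo\n14b-bar\n15\n')

# -u compares only the numeric key, so both "14..." lines count as equal.
by_key=$(printf '%s\n' "$input" | sort -n -u)

# uniq compares whole lines, so nothing is dropped here.
by_line=$(printf '%s\n' "$input" | sort -n | uniq)

key_count=$(printf '%s\n' "$by_key" | grep -c '')
line_count=$(printf '%s\n' "$by_line" | grep -c '')

echo "sort -n -u kept $key_count lines"
echo "sort -n | uniq kept $line_count lines"
```

This should report 2 lines for `sort -n -u` and 3 for `sort -n | uniq`. Which of the two `14` lines survives is up to the implementation, so the sketch only counts lines.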