Sorting Data from the CLI
Sometimes you just need to put a bit of text in order. This could be
numerically or alphabetically. If you need to do this from a shell script or the
terminal, the sort command and its friends can help. According to Wikipedia,
the Unix command was originally written by none other than Ken Thompson and
appeared in Version 1 Unix, and a sort command existed even earlier in Multics.
So it’s safe to say you’ll find it installed on most systems. Here are some
examples, starting with the basics and working up to something slightly more
complicated.
Sort the lines of a file
Let’s say we start with a file called unsorted.txt with the following contents.
2,this
3,that
11,last
1,original
44,larger than last
44,duplicate
The lines of this file can be sorted simply by passing the name of the file as the sole argument.
sort unsorted.txt
11,last
1,original
2,this
3,that
44,duplicate
44,larger than last
This isn’t too impressive, because by default the lines are sorted
lexicographically and not numerically. For example, 11 comes before 1, which
might not be what you want. Before we fix this issue, let’s reverse the order
using the -r switch, because we want to see the larger numbers at the top of
the list.
sort -r unsorted.txt
44,larger than last
44,duplicate
3,that
2,this
1,original
11,last
It still isn’t what we want, because we want to sort by the numeric value, not
the alphabetical value. Add the -n switch to tell sort to use numeric order.
sort -r -n unsorted.txt
44,larger than last
44,duplicate
11,last
3,that
2,this
1,original
Now we are getting somewhere.
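The lexicographic order shown earlier depends on the locale in effect, and -n works here only because each line happens to start with its number. The sketch below is one way to make both points explicit. It assumes a sort that supports the standard -t and -k options, recreates the sample file in a scratch directory, and pins LC_ALL=C so that comparisons are done byte by byte.

```shell
#!/bin/sh
# Recreate the sample file from this post in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/unsorted.txt" <<'EOF'
2,this
3,that
11,last
1,original
44,larger than last
44,duplicate
EOF

# LC_ALL=C compares raw bytes, so the result no longer depends on the
# user's locale. Here "1,original" sorts before "11,last" because ','
# is a lower byte value than '1'.
bytewise=$(LC_ALL=C sort "$workdir/unsorted.txt")

# -t, sets the field separator to a comma; -k1,1n sorts numerically on
# the first field only, instead of relying on the start of the line.
by_field=$(LC_ALL=C sort -t, -k1,1n "$workdir/unsorted.txt")

printf '%s\n' "$bytewise"
echo "---"
printf '%s\n' "$by_field"

rm -r "$workdir"
```

If the plain sort on your system gives a different order than the output in this post, the locale is the likely reason.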
Counting and Sorting
Maybe we don’t have numeric data to start with. Let’s say our file called
unsorted-names.txt contains the following and we want to count the number of
times each name appears.
cat unsorted-names.txt
bill
arun
michael
ben
alfred
lakshmi
joshua
jeff
jeff
arun
sarah
jeff
A command that is often used with sort is uniq. Let’s take a look at the output
of uniq without the benefit of first sorting the contents.
uniq unsorted-names.txt
bill
arun
michael
ben
alfred
lakshmi
joshua
jeff
arun
sarah
jeff
As you can see, uniq only collapses adjacent duplicate lines, so jeff and arun
each appear twice. To get the truly unique values, the contents first need to be
sorted and then piped to uniq.
sort unsorted-names.txt | uniq
alfred
arun
ben
bill
jeff
joshua
lakshmi
michael
sarah
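As an aside, when only the distinct lines are needed, sort can drop the duplicates itself with its -u option. The following is a small sketch that compares the two forms on the names from this section; the scratch directory and the LC_ALL=C setting are assumptions made to keep the run reproducible.

```shell
#!/bin/sh
# Recreate the names file from this section in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' bill arun michael ben alfred lakshmi joshua jeff jeff arun sarah jeff \
  > "$workdir/unsorted-names.txt"

# Two-step form from the text: sort, then drop adjacent duplicates.
piped=$(LC_ALL=C sort "$workdir/unsorted-names.txt" | uniq)

# One-step form: -u tells sort to emit each distinct line once.
direct=$(LC_ALL=C sort -u "$workdir/unsorted-names.txt")

if [ "$piped" = "$direct" ]; then
  echo "same result"
fi

rm -r "$workdir"
```

The one-step form cannot report how often each line occurred, so uniq remains the tool for counting.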
We still haven’t counted anything, though. To get the count, add the -c switch
to uniq.
sort unsorted-names.txt | uniq -c
1 alfred
2 arun
1 ben
1 bill
3 jeff
1 joshua
1 lakshmi
1 michael
1 sarah
Now we can combine what we demonstrated at the beginning of this post with the previous example to achieve the following.
sort unsorted-names.txt | uniq -c | sort -r -n
3 jeff
2 arun
1 sarah
1 michael
1 lakshmi
1 joshua
1 bill
1 ben
1 alfred
Here the names are sorted, counted, and then sorted again so that the most frequent names appear at the top.
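For repeated use, the whole pipeline can be wrapped in a small shell function. This is only a sketch: top_counts is a hypothetical name chosen for illustration, and the default of three lines is arbitrary. The final sort uses -n as well as -r so that the counts are compared as numbers, which matters once they no longer all have the same number of digits.

```shell
#!/bin/sh
# top_counts is a hypothetical helper name, not a standard command.
# Usage: top_counts FILE [N]  prints the N most frequent lines (default 3).
top_counts() {
  LC_ALL=C sort -- "$1" | uniq -c | LC_ALL=C sort -rn | head -n "${2:-3}"
}

# Recreate the names file from this post in a scratch directory.
workdir=$(mktemp -d)
printf '%s\n' bill arun michael ben alfred lakshmi joshua jeff jeff arun sarah jeff \
  > "$workdir/unsorted-names.txt"

# Ask for the two most frequent names.
top=$(top_counts "$workdir/unsorted-names.txt" 2)
printf '%s\n' "$top"

rm -r "$workdir"
```

With the sample names, the two lines printed are the counts for jeff and arun, padded with leading spaces by uniq.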