Hypercipient

Sorting Data from the CLI

Sometimes you just need to put a bit of text in order, either numerically or alphabetically. If you need to do this from a shell script or the terminal, the sort command and its friends can help. According to Wikipedia, the command was originally written by none other than Ken Thompson and appeared in Version 1 Unix. It actually appeared even earlier than that, in Multics. So it's safe to say you'll find it installed on most systems. Here are some examples, starting with the basics and then moving on to slightly more complicated uses.

Sort the lines of a file

Let’s say we start with a file called unsorted.txt with the following contents.

2,this
3,that
11,last
1,original
44,larger than last
44,duplicate
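If you want to follow along, one way to create unsorted.txt from the shell is with printf, which reuses its '%s\n' format for every argument and so writes one line per argument. This is only a convenience sketch, not part of the original walkthrough, and it overwrites any existing file with that name.

```shell
# Write each argument on its own line into unsorted.txt
# (this overwrites the file if it already exists)
printf '%s\n' \
  '2,this' \
  '3,that' \
  '11,last' \
  '1,original' \
  '44,larger than last' \
  '44,duplicate' > unsorted.txt

# Show the result
cat unsorted.txt
```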

The lines of this file can be sorted simply by passing the name of the file as the sole argument.

sort unsorted.txt

11,last
1,original
2,this
3,that
44,duplicate
44,larger than last

This isn’t too impressive, because by default sort compares lines as text (lexicographically, following your locale's collation rules) rather than as numbers. For example, 11 comes before 1, which might not be what you want. Before we fix this, let’s reverse the order with the -r switch, because we want to see the larger numbers at the top of the list.

sort -r unsorted.txt

44,larger than last
44,duplicate
3,that
2,this
1,original
11,last
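Neither order so far is numeric, and the exact text order also depends on your locale. The outputs above match what a locale such as en_US.UTF-8 typically produces, since its collation rules largely ignore punctuation like the comma. As a quick check, this sketch forces plain byte-by-byte comparison with LC_ALL=C. The sample_lines function is a made-up helper that prints the same six lines as unsorted.txt, so the example runs without the file.

```shell
# Hypothetical helper: prints the same lines as unsorted.txt
sample_lines() {
  printf '%s\n' '2,this' '3,that' '11,last' '1,original' \
    '44,larger than last' '44,duplicate'
}

# LC_ALL=C makes sort compare raw bytes instead of using locale rules.
# ',' (0x2C) is smaller than '1' (0x31), so 1,original now precedes 11,last,
# but 11,last still lands before 2,this: the order is still textual.
sample_lines | LC_ALL=C sort
# 1,original
# 11,last
# 2,this
# 3,that
# 44,duplicate
# 44,larger than last
```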

It still isn’t what we want, because we want to sort by numeric value, not alphabetical value. Add the -n switch to tell sort to compare lines numerically.

sort -r -n unsorted.txt

44,larger than last
44,duplicate
11,last
3,that
2,this
1,original

Now we are getting somewhere.
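Two small variations, shown as a sketch with a made-up sample_lines helper standing in for unsorted.txt. Single-letter options can be combined, so sort -rn means the same as sort -r -n. And if the number were not at the start of each line, -t sets the field separator and -k picks the field to sort on; here -k1,1 limits the key to the first comma-separated field. For this data both commands should print the same order as the -r -n example.

```shell
# Hypothetical helper: prints the same lines as unsorted.txt
sample_lines() {
  printf '%s\n' '2,this' '3,that' '11,last' '1,original' \
    '44,larger than last' '44,duplicate'
}

# -r and -n combined into a single argument
sample_lines | sort -rn

# Same order, but the numeric key is taken explicitly from field 1
# (-t, makes the comma the separator, -k1,1 selects only the first field)
sample_lines | sort -t, -k1,1 -rn
```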

Counting and Sorting

Maybe we don’t have numeric data to start with. Let’s say our file called unsorted-names.txt contains the following, and we want to count the number of times each name appears.

cat unsorted-names.txt
bill
arun
michael
ben
alfred
lakshmi
joshua
jeff
jeff
arun
sarah
jeff
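To follow along, one way to create unsorted-names.txt is with printf, which writes one line per argument when given the '%s\n' format. This is just a convenience sketch, and it overwrites any existing file with that name.

```shell
# Write one name per line into unsorted-names.txt
# (this overwrites the file if it already exists)
printf '%s\n' bill arun michael ben alfred lakshmi joshua \
  jeff jeff arun sarah jeff > unsorted-names.txt

# Show the result
cat unsorted-names.txt
```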

A command often paired with sort is uniq. Let’s take a look at the output of uniq without first sorting the contents.

uniq unsorted-names.txt
bill
arun
michael
ben
alfred
lakshmi
joshua
jeff
arun
sarah
jeff

As you can see, uniq only collapses adjacent duplicate lines: the two consecutive jeff lines became one, but jeff and arun each still appear twice. To get truly unique values, the contents first need to be sorted and then piped to uniq.

sort unsorted-names.txt | uniq
alfred
arun
ben
bill
jeff
joshua
lakshmi
michael
sarah
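For this list, sort can also drop the duplicates by itself with its -u option, so the pipeline above and a single sort -u should print the same nine names. The sketch uses a made-up sample_names helper in place of unsorted-names.txt so it runs on its own.

```shell
# Hypothetical helper: prints the same lines as unsorted-names.txt
sample_names() {
  printf '%s\n' bill arun michael ben alfred lakshmi joshua \
    jeff jeff arun sarah jeff
}

# Two-step version: sort, then collapse adjacent duplicates
sample_names | sort | uniq

# One-step version: sort -u keeps a single copy of each group of equal lines
sample_names | sort -u
```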

We still haven’t counted anything, though. To get the count, add the -c switch to uniq.

sort unsorted-names.txt | uniq -c
      1 alfred
      2 arun
      1 ben
      1 bill
      3 jeff
      1 joshua
      1 lakshmi
      1 michael
      1 sarah
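If you only need to know which names repeat, not how often, uniq's -d option prints a single copy of each duplicated line. Like plain uniq, it only compares adjacent lines, so the input still needs sorting first. The sample_names function is a made-up helper standing in for unsorted-names.txt.

```shell
# Hypothetical helper: prints the same lines as unsorted-names.txt
sample_names() {
  printf '%s\n' bill arun michael ben alfred lakshmi joshua \
    jeff jeff arun sarah jeff
}

# -d prints one copy of each line that repeats in an adjacent run,
# so sort first to bring all copies of a name together
sample_names | sort | uniq -d
# arun
# jeff
```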

Now we can combine the sorting options from the beginning of this post with the counting pipeline to achieve the following.

sort unsorted-names.txt | uniq -c | sort -r -n
      3 jeff
      2 arun
      1 sarah
      1 michael
      1 lakshmi
      1 joshua
      1 bill
      1 ben
      1 alfred

Here the results are sorted, counted and then sorted again in numerically descending order.
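Since the result is still plain lines of text, you can keep piping. As a sketch, head can trim the list down to the most frequent names. top_names and sample_names are made-up helper names; top_names reads one item per line from standard input and takes an optional count, defaulting to 3.

```shell
# Hypothetical helper: print the N most frequent lines read from stdin,
# most frequent first (N defaults to 3 when no argument is given)
top_names() {
  sort | uniq -c | sort -r -n | head -n "${1:-3}"
}

# Hypothetical helper: prints the same lines as unsorted-names.txt
sample_names() {
  printf '%s\n' bill arun michael ben alfred lakshmi joshua \
    jeff jeff arun sarah jeff
}

# Show the two most common names (jeff with 3, then arun with 2)
sample_names | top_names 2

# With the real file instead:
# top_names 2 < unsorted-names.txt
```

The exact padding in front of each count differs between uniq implementations, but the order does not.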
