Building JSON from Raw Input
The command jq is great for processing JSON. It can be used to convert, extract, filter and transform JSON. Its utility doesn’t stop if you are starting with something other than JSON, though. This article walks through several examples of constructing JSON from plain text. It starts with a simple example to demonstrate object construction, progresses through several more complex scenarios to build familiarity with some of the core constructs and syntax of the command, and closes with a more real-world example that combines everything learned in the earlier exercises.
JSON From (Almost) Nothing
jq can create JSON from almost nothing. Importantly for the concept of constructing JSON, it can be created from non-JSON. Take the following example.
echo "Hello World" | jq -R '{greeting: .}'
This sends the text Hello World, which jq reads from stdin. The -R option indicates the input is “raw”; that is, jq treats each line as a string instead of JSON. The contents between the single quotes is the “filter”, which is something like a program written in jq’s language. The opening { and closing } indicate construction, specifically, object construction. The elements are constructed using key-value pairs. The value can be any valid jq expression. In this example, the value is ., which is the “identity” filter. The . filter simply produces the same value that was passed into the expression: Hello World.
The output is as follows.
{
"greeting": "Hello World"
}
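As an aside (not part of the original example), the value in a key-value pair need not be the identity filter alone; any jq expression works. Here the built-in length function supplies a second, computed element:

```shell
# Construct an object whose second value is computed by the built-in
# length function (11 characters in "Hello World").
echo "Hello World" | jq -R '{greeting: ., characters: length}'
# {
#   "greeting": "Hello World",
#   "characters": 11
# }
```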
It is also possible to construct arrays. Specifying a filter with the form [f], where f is a valid filter, creates an array.
echo "Hello World" | jq -R '{greetings: [., . + ", again"]}'
This outputs the following.
{
"greetings": [
"Hello World",
"Hello World, again"
]
}
This example introduces another operator, the comma. When constructing arrays, the comma operator can be used to specify multiple values. The example also demonstrates string concatenation, a subject for another time.
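Although string concatenation is a subject for another time, a minimal sketch may help here: the + operator joins two strings.

```shell
# Concatenate the raw input string with a literal suffix.
echo "Hello World" | jq -R '. + ", again"'
# "Hello World, again"
```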
JSON Arrays From Non-JSON Input
In the previous example we showed a contrived way of creating a JSON array using [] construction. A more common scenario is input data that represents multiple values but is not syntactically represented as a JSON array. Let’s suppose you have the input 1 2 3 4 and want to calculate the sum. This operation can be accomplished using jq as follows.
echo "1 2 3 4" | jq --slurp 'add'
Here we already have the numbers as JSON, but they are not a valid array. They are not enclosed by [], nor are they separated by commas. The --slurp option reads all input values into an array and passes that array as a single value to the filter. We can see this option at work by replacing add with the identity filter, and just outputting the JSON value jq is actually passing to add.
echo "1 2 3 4" | jq --slurp '.'
This will generate the following output.
[
1,
2,
3,
4
]
In this scenario array construction is not required. The --slurp option does the heavy lifting for us.
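As a further aside, --slurp is not the only way to gather a stream of values: the inputs builtin, combined with the -n option (which suppresses the automatic reading of the first input), collects every value from stdin and can be wrapped in array construction.

```shell
# -n starts the program with null input; [inputs] gathers every value
# from stdin into an array, which is then piped to add.
echo "1 2 3 4" | jq -n '[inputs] | add'
# 10
```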
The key takeaway in this example is that array construction, depending on the input, may not be the appropriate method of obtaining an array. When starting with multiple values not specified as a JSON array, the solution may instead require the --slurp option. Many of the interesting operations with jq require an array or an object, so this technique is indispensable when the starting point is non-JSON data that needs to be treated as a JSON array.
Extracting JSON from Comma Separated Values
You will seldom encounter a list of numbers neatly stored as individual values in a file. A more likely scenario is that you will start with data stored in a file, perhaps as comma-separated values (CSV). Let’s say that, as in the previous example, you need to find the sum of a collection of numbers, but the source is a CSV file and the numbers are stored in one of the columns. For example, let’s say you have the following CSV file.
00000,location 0, 34
00001,location 0, 37
00002,location 1, 36
00003,location 2, 35
00004,location 1, 139
00005,location 0, 39
00006,location 3, 34
00007,location 1, 32
In this example, the last column contains the numbers that need to be added. For the sake of exploring more ground, let’s also stipulate that each column is the same width in every row. (Splitting the columns will be explored in a later example.) From this we can infer that the number (last) column always starts at the same index, even though the number itself may not take up the entire width of the column.
To get the last column as numbers, we can combine the two previous examples to get an array of strings, piping the raw output back into jq with the --slurp option.
cat numbers-to-add.csv | jq --raw-input '.' | jq --slurp '.'
This will output the following list of strings.
[
"00000,location 0, 34",
"00001,location 0, 37",
"00002,location 1, 36",
"00003,location 2, 35",
"00004,location 1, 139",
"00005,location 0, 39",
"00006,location 3, 34",
"00007,location 1, 32"
]
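A note on why two jq processes are used: combining -R with --slurp in a single invocation does not produce this array. Instead, jq reads the entire input as one string, newlines included. A single-process alternative, sketched here on inline data rather than the file, splits that string on newlines and drops the trailing empty entry:

```shell
# -R with --slurp yields one big string; split it on newlines and
# discard the empty string left after the final newline.
printf '00000,location 0, 34\n00001,location 0, 37\n' \
  | jq -R --slurp 'split("\n") | map(select(length > 0))'
# [
#   "00000,location 0, 34",
#   "00001,location 0, 37"
# ]
```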
Now that we have an array of strings, and know the index at which the number column starts, each value can be mapped to contain only the number column.
cat numbers-to-add.csv | jq --raw-input '.' | jq --slurp 'map(.[17:])'
Inside map, the filter . produces a string. In jq, strings, like arrays, support the slice filter: a colon-separated pair of numbers in brackets indicating the start and end indices of the slice. As in other languages that have slices, either index can be omitted to indicate the start or end of the value. In this example the start index is 17 and the end is omitted, so the slice runs to the end of the string. This produces the following JSON.
[
" 34",
" 37",
" 36",
" 35",
" 139",
" 39",
" 34",
" 32"
]
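A quick way to get a feel for the slice filter is to apply it to a standalone string, omitting one index at a time:

```shell
# Slice a plain JSON string; omitting an index extends the slice
# to the corresponding end of the string.
echo '"Hello World"' | jq '.[:5]'   # "Hello"
echo '"Hello World"' | jq '.[6:]'   # "World"
```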
We now have the last column as strings, but we need them as numbers so that they can be added. This can be accomplished using the tonumber function.
cat numbers-to-add.csv | jq --raw-input '.' | jq --slurp 'map(.[17:] | tonumber)'
[
34,
37,
36,
35,
139,
39,
34,
32
]
The final step is to pipe the array to the add function.
cat numbers-to-add.csv | jq --raw-input '.' | jq --slurp 'map(.[17:] | tonumber) | add'
This produces 386
.
It would be a shame to leave this example without extending it to also calculate the average. However, to achieve this, one final topic needs to be covered: variables. This will take the exercise away from the main topic of this article, though, so the solution will be presented without further explanation.
cat numbers-to-add.csv | jq -R '.' | jq --slurp 'map(.[17:] | tonumber) | length as $len | add / $len'
This section detailed several techniques for constructing and processing JSON from CSVs. Using the --raw-input option, each line of the file gets converted to a JSON string. This command’s output can then be piped to jq with the --slurp option, which wraps the input in a JSON array. The data of interest can then be extracted from each value using map and the appropriate filter. Finally, when a type conversion is required, a to<type> function such as tonumber can be used to convert from a string to the required type. In this example, however, we only required one column from each row. Converting each row to a JSON object with multiple elements extracted from multiple columns is explored in the next section.
JSON From Comma Separated Values
Data from a CSV file can be converted to JSON. Let’s say you have the following CSV.
00000,place 000,18391
00001,place 001,18923
00002,"place" 002,18322
There are a few interesting details to note. The first is that there is a mix of string-like and number-like values. Another is that one of the values contains quote characters. We are assuming here that no values contain commas and that no values are quoted in the CSV sense, so the quote characters are simply part of the value. It is not uncommon to encounter data that is not constructed from a set of well-documented production rules. Let’s further stipulate that the last column should be interpreted as an integer.
In this example we are going to combine raw input and slurp into a pipeline of commands. Save the rows above to a file (here, lines-to-json.csv), cat the file, and pipe the output to jq with the --raw-input option.
cat lines-to-json.csv | jq --raw-input '.'
This will output the following list of strings.
"00000,place 000,18391"
"00001,place 001,18923"
"00002,\"place\" 002,18322"
Now we can use the --slurp option to make this an array of strings.
cat lines-to-json.csv | jq --raw-input '.' | jq --slurp '.'
[
"00000,place 000,18391",
"00001,place 001,18923",
"00002,\"place\" 002,18322"
]
The input is now in a form that can easily be manipulated using jq filters. First, each value of the array must be mapped to an array of strings, resulting in an array of arrays of strings. The map function takes a filter that is applied to each value and produces an array. The split(str) function takes a string and produces an array of strings, using str as the separator.
cat lines-to-json.csv | jq -R '.' | jq --slurp 'map(split(","))'
This command produces the following.
[
[
"00000",
"place 000",
"18391"
],
[
"00001",
"place 001",
"18923"
],
[
"00002",
"\"place\" 002",
"18322"
]
]
A JSON object can then be constructed using the indices of each array of strings. So instead of mapping each line only to an array of strings, the filter passed to map will map each line first to an array of strings and then to a JSON object.
cat lines-to-json.csv | jq -R '.' | jq --slurp 'map(split(",") | {id: .[0], name: .[1], value: .[2] })'
This produces an array of JSON objects.
[
{
"id": "00000",
"name": "place 000",
"value": "18391"
},
{
"id": "00001",
"name": "place 001",
"value": "18923"
},
{
"id": "00002",
"name": "\"place\" 002",
"value": "18322"
}
]
We are nearly there. Note that the value is still a string, but it must be represented as an integer. As in the previous example, the value can be piped to the tonumber function.
cat lines-to-json.csv | jq -R '.' | jq --slurp 'map(split(",") | {id: .[0], name: .[1], value: (.[2] | tonumber) })'
With this final addition, the output is as follows.
[
{
"id": "00000",
"name": "place 000",
"value": 18391
},
{
"id": "00001",
"name": "place 001",
"value": 18923
},
{
"id": "00002",
"name": "\"place\" 002",
"value": 18322
}
]
This example shows how jq can be used to construct JSON from CSVs. There are, however, some caveats. The solution assumes the input data is relatively well-formed. Splitting CSV input on a comma is a naive solution, working only on input data that doesn’t also contain commas within values (that is, commas that are not delimiters). This is not unique to this solution; the same approach in other tools is equally naive. It may therefore be desirable to use other tools with more robust CSV parsing capabilities, though similarly robust solutions could likely be produced in jq. Regardless, the techniques of object and array construction are applicable in other contexts.
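To make the comma caveat concrete, here is the naive split applied to a row whose first value is quoted and contains a comma; the value is torn in two at the embedded comma:

```shell
# split(",") has no notion of CSV quoting, so the quoted value
# is broken apart at its internal comma.
echo '"field one, with comma",field two' | jq -R 'split(",")'
# [
#   "\"field one",
#   " with comma\"",
#   "field two"
# ]
```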
Summary
This article discussed creating JSON output from non-JSON input. In doing so it covered object and array construction, mapping, type conversion and the split function. It also covered using the --raw-input option when starting with non-JSON data, and the --slurp option when individual values need to be wrapped in an array.