Hypercipient

Building JSON from Raw Input

The command jq is great for processing JSON. It can be used to convert, extract, filter and transform JSON. Its utility doesn’t stop if you are starting with something other than JSON, though. This article walks through several examples of constructing JSON from plain text. It starts with a simple example that demonstrates object construction, progresses through several more complex scenarios to build familiarity with the command’s core constructs and syntax, and closes with a more real-world example that combines everything learned in the earlier exercises.

JSON From (Almost) Nothing

jq can create JSON from almost nothing. Importantly for constructing JSON, it can create it from input that is not JSON at all. Take the following example.

echo "Hello World" | jq -R '{greeting: .}'

This sends the text Hello World to jq, which reads it from stdin. The -R option indicates the input is “raw”; that is, each line is treated as a string instead of JSON. The contents between the single quotes are the “filter”, which is something like a program written in jq’s language. The opening { and closing } indicate construction, specifically object construction. The elements are constructed from key-value pairs, where the value can be any valid jq expression. In this example, the value is ., which is the “identity” filter. The . filter simply produces the same value that was passed into the expression - Hello World.

The output is as follows.

{
  "greeting": "Hello World"
}

It is also possible to construct arrays. Specifying a filter of the form [f], where f is a valid filter, creates an array.

echo "Hello World" | jq -R '{greetings: [., . + ", again"]}'

This outputs the following.

{
  "greetings": [
    "Hello World",
    "Hello World, again"
  ]
}

This example introduces another operator, the comma. When constructing arrays, the comma operator can be used to specify multiple values. It also demonstrates string concatenation, a subject for another time.
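To see that the comma operator is not specific to array construction, remove the brackets: on its own, the comma produces a stream of multiple output values, and the surrounding [ and ] are what collect that stream into an array. A small sketch:

```shell
# Without brackets the comma operator emits two separate JSON values
echo "Hello World" | jq -R '., . + ", again"'
```

This prints the two strings as separate JSON values rather than as a single array.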

JSON Arrays From Non-JSON Input

The previous example demonstrated a contrived way of creating a JSON array using [] construction. A more common scenario is input data that represents multiple values but is not syntactically a JSON array. Let’s suppose you have the input 1 2 3 4 and want to calculate the sum. This operation can be accomplished using jq as follows.

echo "1 2 3 4" | jq --slurp 'add'

Here the numbers are already valid JSON, but they do not form an array. They are not enclosed by [], nor are they separated by commas. The --slurp option reads all input values into an array and passes that array as a single value to the filter. We can see this option at work by eliminating the filter and simply outputting the JSON value jq actually passes to add.

echo "1 2 3 4" | jq --slurp 

This will generate the following output.

[
  1,
  2,
  3,
  4
]

In this scenario array construction is not required. The --slurp option does the heavy lifting for us.

The key takeaway in this example is as follows. Array construction, depending on the input, may not be the appropriate method of obtaining an array. When starting with multiple values not already expressed as a JSON array, the solution may instead be the --slurp option. Many of the interesting operations with jq require an array or an object, so this technique is indispensable when the input is not already the JSON array you need.
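Once the values have been slurped into an array, add is not the only option; any filter that operates on arrays will work. A quick sketch using the same input:

```shell
# length, min, and max each operate on the slurped array [1, 2, 3, 4]
echo "1 2 3 4" | jq --slurp 'length, min, max'
```

Each filter in the comma-separated list produces one output value.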

Extracting JSON from Comma Separated Values

You will seldom encounter a list of numbers neatly stored as individual values in a file. A more likely scenario is that you will start with data stored in a file, perhaps as comma separated values (CSV). Let’s say, as in the previous example, you need to find the sum of a collection of numbers, but this time the source is a CSV file and the numbers are stored in one of the columns. For example, let’s say you have the following CSV file.

00000,location 0,   34
00001,location 0,   37
00002,location 1,   36
00003,location 2,   35
00004,location 1,  139
00005,location 0,   39
00006,location 3,   34
00007,location 1,   32

In this example, the last column contains the numbers that need to be added. For the sake of exploring more ground, let’s also stipulate that each column is the same width in every row. (Splitting the columns will be explored in a later example.) From this we can infer that the number (last) column always starts at the same index, even though the number itself may not take up the entire width of the column.

To get the last column as numbers, we can combine the two previous examples to get an array of strings, piping the raw output back into jq with the --slurp option.

cat numbers-to-add.csv | jq --raw-input | jq --slurp

This will output the following list of strings.

[
  "00000,location 0,   34",
  "00001,location 0,   37",
  "00002,location 1,   36",
  "00003,location 2,   35",
  "00004,location 1,  139",
  "00005,location 0,   39",
  "00006,location 3,   34",
  "00007,location 1,   32"
]

Now that we have an array of strings, combined with the fact that we know the index at which the number column starts, each value can be mapped to contain only the number column.

cat numbers-to-add.csv | jq --raw-input | jq --slurp 'map(.[17:])'

Within map, the filter . produces each string. In jq, strings have the property, like arrays, of supporting the slice filter: a colon-separated pair of numbers in brackets indicating the start and end indices of the slice. As in other languages that have slices, either index can be omitted to indicate the start or end of the value. In this example the start index is 17 and the end is omitted, so the slice runs to the end of the string. This produces the following JSON.

[
  "   34",
  "   37",
  "   36",
  "   35",
  "  139",
  "   39",
  "   34",
  "   32"
]
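The slice filter can also be tried in isolation. The following sketch slices a plain JSON string from both directions:

```shell
# Keep only the first five characters
echo '"Hello World"' | jq '.[0:5]'
# Omit the end index to slice through to the end of the string
echo '"Hello World"' | jq '.[6:]'
```

This prints "Hello" and then "World".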

We now have the last column as strings, but we need numbers so that they can be added. This can be accomplished using the tonumber function.

cat numbers-to-add.csv | jq --raw-input | jq --slurp 'map(.[17:] | tonumber)'
[
  34,
  37,
  36,
  35,
  139,
  39,
  34,
  32
]

The final step is to pipe the array to the add function.

cat numbers-to-add.csv | jq --raw-input | jq --slurp 'map(.[17:] | tonumber) | add'

This produces 386.
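As an aside, the two jq invocations are used here for clarity. The same result can be sketched in a single invocation by combining --raw-input with --slurp, which reads the whole input as one string; splitting on newlines then yields the array of lines. This assumes the file ends with a trailing newline, which produces a final empty element that must be dropped before mapping:

```shell
# One invocation: slurp the raw text, split into lines, drop the trailing
# empty element, then slice, convert and add as before
cat numbers-to-add.csv | jq --raw-input --slurp 'split("\n") | .[:-1] | map(.[17:] | tonumber) | add'
```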

It would be a shame to leave this example without extending it to also calculate the average. However, to achieve this, one final topic needs to be covered: variables. This will take the exercise away from the main topic of this article, though, so the solution will be presented without further explanation.

cat numbers-to-add.csv | jq -R  | jq --slurp 'map(.[17:] | tonumber) | length as $len | add / $len'
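For the curious, a minimal sketch of the binding used above: the expression expr as $name binds the result of expr to the variable $name, which can then be referenced later in the pipeline.

```shell
# Bind the array's length to $len, then divide the sum by it
echo "[10, 20, 30]" | jq 'length as $len | add / $len'
```

This prints 20, the average of the three values.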

This section detailed several techniques for constructing and processing JSON from CSVs. Using the --raw-input option, each line of the file is converted to a JSON string. That output can then be piped to jq with the --slurp option, which wraps the input in a JSON array. The data of interest can then be extracted from each value using map and an appropriate filter. Finally, when a type conversion is required, a function such as tonumber can convert a string to the required type. In this example, however, only one column from each row was required. Converting each row to a JSON object with multiple elements extracted from multiple columns is explored in the next section.

JSON From Comma Separated Values

Data from a CSV file can be converted to JSON. Let’s say you have the following CSV data.

00000,place 000,18391
00001,place 001,18923
00002,"place" 002,18322

There are a few interesting details to note. First, there is a mix of string-like and number-like values. Second, one of the values contains quote characters. We are assuming here that no values contain commas, and that quotes are literal characters rather than CSV quoting. It is not uncommon to encounter data that is not constructed from a set of well-documented production rules. Let’s further stipulate that the last column should be interpreted as an integer.

In this example we are going to combine raw input and slurp into a pipeline of commands. Save the data above to a file; the commands below assume it is named lines-to-json.csv. Cat the file and pipe the output to jq with the --raw-input option.

cat lines-to-json.csv | jq --raw-input

This will output the following list of strings.

"00000,place 000,18391"
"00001,place 001,18923"
"00002,\"place\" 002,18322"

Now we can use the --slurp option to make this an array of strings.

cat lines-to-json.csv | jq --raw-input | jq --slurp
[
  "00000,place 000,18391",
  "00001,place 001,18923",
  "00002,\"place\" 002,18322"
]

The input is now in a form that can easily be manipulated using jq filters. First, each value of the array must be mapped to an array of strings, resulting in an array of arrays of strings. The map function takes a filter that is applied to each value and produces an array. The split(str) function takes a string and produces an array of strings using str as the separator.

cat lines-to-json.csv | jq -R | jq --slurp 'map(split(","))'

This command produces the following.

[
  [
    "00000",
    "place 000",
    "18391"
  ],
  [
    "00001",
    "place 001",
    "18923"
  ],
  [
    "00002",
    "\"place\" 002",
    "18322"
  ]
]

A JSON object can then be constructed using the indices of each inner array. Instead of mapping each line to an array of strings, the filter passed to map will split each line and then construct a JSON object from the resulting array.

cat lines-to-json.csv | jq -R  | jq --slurp 'map(split(",") | {id: .[0], name: .[1], value: .[2] })'

This produces an array of JSON objects.

[
  {
    "id": "00000",
    "name": "place 000",
    "value": "18391"
  },
  {
    "id": "00001",
    "name": "place 001",
    "value": "18923"
  },
  {
    "id": "00002",
    "name": "\"place\" 002",
    "value": "18322"
  }
]

We are nearly there. Note that the value is still a string, but it must be represented as an integer. As in the previous example, the value can be piped to the tonumber function.

cat lines-to-json.csv | jq -R  | jq --slurp 'map(split(",") | {id: .[0], name: .[1], value: (.[2] | tonumber) })'

With this final addition, the output is as follows.

[
  {
    "id": "00000",
    "name": "place 000",
    "value": 18391
  },
  {
    "id": "00001",
    "name": "place 001",
    "value": 18923
  },
  {
    "id": "00002",
    "name": "\"place\" 002",
    "value": 18322
  }
]

This example shows how jq can be used to construct JSON from CSVs. There are, however, some caveats. The solution assumes the input data is relatively well-formed. Splitting CSV input on commas is naive, working only when values do not themselves contain commas (that is, commas that are not delimiters). This is not unique to jq; the same approach in other tools is equally naive. It may therefore be preferable to use a tool with more robust CSV parsing capabilities, though a similarly robust solution could likely be built in jq as well. Regardless, the techniques of object and array construction are applicable in other contexts.
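The caveat about embedded commas is easy to demonstrate. Even when a value is quoted in the usual CSV manner, the naive split still breaks it at the internal comma:

```shell
# The quoted field "place, with comma" is split at its embedded comma
echo '00000,"place, with comma",18391' | jq -R 'split(",")'
```

The result is four elements instead of three, with the quoted field split in two.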

Summary

This article discussed creating JSON output from non-JSON input. In doing so it covered object and array construction, mapping, type conversion and the split function. It also covered using the --raw-input option when starting with non-JSON data, and the --slurp option when individual values need to be wrapped in an array.
