Hypercipient

Restructure JSON Objects Using jq

You may encounter a scenario in which you want to produce JSON containing a subset of the data in some input JSON. Further, the data may need a slightly different structure than the original. As you may already know, jq provides this capability. This post describes a specific scenario: mapping a JSON object whose elements are themselves objects to a new object in which each element keeps its original name, but its value is a single value taken from the original element’s object.

To clarify the use case, start with the following input stored in addrs.json.

{
  "addrs": {
    "o0": {
      "addr": "addr0",
      "v": "val0"
    },
    "o1": {
      "addr": "addr1",
      "v": "val1"
    },
    "o2": {
      "addr": "addr2",
      "v": "val2"
    }
  }
}

The goal is to produce a mapping from the name of each element of addrs to the value of that element’s addr. That is, produce the following.

{
  "o0": "addr0",
  "o1": "addr1",
  "o2": "addr2"
}

The first method shown encapsulates a few intermediate steps using the function with_entries. To deepen the reader’s understanding of how this function works, the intermediate steps are elaborated. To solidify these concepts, some additional examples are provided.

Converting an Object to List of Objects

jq excels at extracting data from JSON. The goal of producing the JSON described in the introduction can be achieved, after extracting the value of addrs, with a single command.

cat addrs.json | jq '.addrs | with_entries({key: .key, value: .value.addr})'

This takes the value of the element .addrs and pipes it to with_entries(f), where f is a filter applied to each element.

On first encounter this may be an unsatisfying solution. Without an understanding of what with_entries does, it may be unclear how to formulate the filter, or how to extend it when a slightly different scenario arises. For example, where did .key come from? It is not in the original JSON, and neither is .value. The filter does reference one element from the original JSON, addr, but it does so through .value, which the input does not contain.

The answers to these questions can be obtained by understanding the equivalent expression that with_entries encapsulates. It is an encapsulation of three functions connected into a pipeline: to_entries | map({key: .key, value: .value.addr}) | from_entries. Let’s break down each expression in the pipeline in turn. As a hint, from_entries requires input objects with a specific structure.

Produce an Array of Key-Value Pairs

The first expression in the pipeline is simply the function to_entries. If the input is an object, it converts the object to an array of objects, one per element, each with two elements: one named key and one named value.

cat addrs.json | jq '.addrs | to_entries'

This produces the following array.

[
  {
    "key": "o0",
    "value": {
      "addr": "addr0",
      "v": "val0"
    }
  },
  {
    "key": "o1",
    "value": {
      "addr": "addr1",
      "v": "val1"
    }
  },
  {
    "key": "o2",
    "value": {
      "addr": "addr2",
      "v": "val2"
    }
  }
]

The elements key and value have made an appearance. As mentioned, to_entries always produces objects with exactly two elements: key and value. How these are used in the filter is explained in the next section.
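
As a quick check of that claim, here is a minimal sketch that runs to_entries on a small inline object whose values are a number and a string rather than nested objects. The input and the variable name entries are made up for illustration, and the -c option only makes jq print compact output.

```shell
# Hypothetical inline input; it is not part of addrs.json.
entries=$(echo '{"a": 1, "b": "two"}' | jq -c 'to_entries')
echo "$entries"
# [{"key":"a","value":1},{"key":"b","value":"two"}]
```

Whatever the type of each value, every item in the resulting array has the same key and value elements.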

Map Elements to Key-Value Pairs

It might seem like we are heading in the wrong direction. However, now we have an array of objects that can be mapped to simpler objects using map(f). This function applies the filter f to each item in the array. In other words, . is the current item, and the values of key and value’s addr element can thus be referenced with the filter .key and .value.addr respectively.

cat addrs.json | jq '.addrs | to_entries | map({key: .key, value: .value.addr})'

Notice that the value of f is the same as the value used for with_entries. This produces the following array of objects.

[
  {
    "key": "o0",
    "value": "addr0"
  },
  {
    "key": "o1",
    "value": "addr1"
  },
  {
    "key": "o2",
    "value": "addr2"
  }
]

We now have exactly the data we need. However, the structure still needs alteration. The elements key and value already made their appearance in to_entries, and are still present in this output. The value of the filter and the reason behind its specific structure will become clear in the next section.

Map Back to an Object

The last step is to map this array of objects back to an object where the element names and values are obtained from key and value respectively. This can be accomplished with the from_entries function.

cat addrs.json | jq '.addrs | to_entries | map({key: .key, value: .value.addr}) | from_entries'

This results in the desired JSON, exactly as in the version using with_entries.

{
  "o0": "addr0",
  "o1": "addr1",
  "o2": "addr2"
}

The function from_entries takes each item of the input array and turns it into an element of a single object, essentially reversing the initial mapping; map(f) gets the list of objects into the structure required by from_entries. To recap, to_entries produces objects with two elements, key and value, and from_entries takes objects with those two elements and converts each to an element whose name is the value of key and whose value is the value of value. For example, the object

{
  "key": "o1",
  "value": "addr1"
}

gets converted to the following.

{
  "o1": "addr1"
}
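
The same conversion can be tried in isolation. The following is a small sketch, assuming a hand-written array of key-value objects as input (the variable names entries_json and obj are only for illustration); it shows that from_entries merges all the items into one object.

```shell
# Hypothetical input: an array already in the structure from_entries expects.
entries_json='[{"key":"o0","value":"addr0"},{"key":"o1","value":"addr1"}]'
obj=$(echo "$entries_json" | jq -c 'from_entries')
echo "$obj"
# {"o0":"addr0","o1":"addr1"}
```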

The function with_entries simply does the work of all three expressions in a single expression. The filter passed to it (and likewise to map in the expanded version) must create objects that conform to the structure required by from_entries.
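
With that structure in mind, adapting the filter to a slightly different scenario becomes more approachable. The sketch below uses a shortened inline copy of the addrs input and two variations that are not from the original example: one maps each name to its v value, and the other swaps the roles so that each addr becomes the name. The variable names are illustrative only.

```shell
# Shortened, inline stand-in for addrs.json.
input='{"addrs":{"o0":{"addr":"addr0","v":"val0"},"o1":{"addr":"addr1","v":"val1"}}}'

# Variation 1: keep each name, but replace the value with its "v" element.
by_v=$(echo "$input" | jq -c '.addrs | with_entries(.value |= .v)')
echo "$by_v"
# {"o0":"val0","o1":"val1"}

# Variation 2: use each addr as the name and the original name as the value.
swapped=$(echo "$input" | jq -c '.addrs | with_entries({key: .value.addr, value: .key})')
echo "$swapped"
# {"addr0":"o0","addr1":"o1"}
```

In both cases the filter still emits objects with key and value elements, which is all from_entries needs.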

Hopefully showing the intermediate steps and explaining the input and output structures between them helps clarify the use of all these operations. To further solidify your understanding, let’s proceed with a few more examples that use jq’s other features.

Add Some Numbers

It is no problem for jq to add numbers in an array. The function add takes an array of numbers and produces a number. The following example demonstrates this.

echo "[123, 456, 789]" | jq 'add'
1368

Seldom will input data be so simple, however. Let’s take the following JSON, stored in numbers.json, as our input. Maybe this is the list of books you have read so far this month and you are curious how many total pages you have read.

{
  "books": [
    {
      "title": "The Count of Monte Cristo",
      "pages": 1200
    },
    {
      "title": "How to Do the Right Thing",
      "pages": 201
    },
    {
      "title": "Consider the Lobster",
      "pages": 437
    }
  ]
}

We know the input to add is an array of numbers. So let’s make our first goal to get the list of books. This can be obtained using the filter .books.

cat numbers.json | jq '.books'

This produces the following.

[
  {
    "title": "The Count of Monte Cristo",
    "pages": 1200
  },
  {
    "title": "How to Do the Right Thing",
    "pages": 201
  },
  {
    "title": "Consider the Lobster",
    "pages": 437
  }
]

With the list of books as our current output, each item of the array can be mapped to a number using map(.pages). The filter .pages is applied to each book, and map collects the results into an array.

cat numbers.json | jq '.books | map(.pages)'
[
  1200,
  201,
  437
]

This is precisely the required input for add. The last step is thus to pipe this output to it.

cat numbers.json | jq '.books | map(.pages) | add'

And we get the following output.

1838
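
One detail worth knowing is that add produces null when its input array is empty. The following sketch, which uses made-up inline data rather than numbers.json, shows the same total computed with an array constructor, and one possible way to fall back to 0 with the alternative operator (//) when there are no books.

```shell
# Hypothetical inline inputs for illustration.
books='{"books":[{"title":"A","pages":100},{"title":"B","pages":250}]}'

# Same result as '.books | map(.pages) | add', written with an array constructor.
total=$(echo "$books" | jq '[.books[].pages] | add')
echo "$total"
# 350

# With no books, add yields null; '// 0' substitutes 0 instead.
empty_total=$(echo '{"books":[]}' | jq '.books | map(.pages) | add // 0')
echo "$empty_total"
# 0
```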

The next section expands on this construct.

Perform a Mathematical Operation On An Array of Values

A related use case is to perform a mathematical operation on each value in a list of values. To keep things brief while still conveying the general concept, let’s take the case of rounding each value in an array.

echo '[123.4, 124.5, 125.6]' | jq '[.[] | round]'

As you might expect, this rounds each value in the input array and produces an array of the rounded values.

[
  123,
  125,
  126
]

Here is the equivalent version using map(f).

echo '[123.4, 124.5, 125.6]' | jq 'map(round)'

This is perhaps more readable, especially when used in a map-reduce style of operation. For example, to find the maximum value (that is, the reduce operation), pipe the result to the max function.

echo '[123.4, 124.5, 125.6]' | jq 'map(round) | max'
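
For reference, here is a small sketch that captures the result of that command, along with the analogous min, in shell variables (the variable names are illustrative).

```shell
# The rounded array is [123, 125, 126], so max and min pick from those values.
max_rounded=$(echo '[123.4, 124.5, 125.6]' | jq 'map(round) | max')
min_rounded=$(echo '[123.4, 124.5, 125.6]' | jq 'map(round) | min')
echo "$max_rounded"
# 126
echo "$min_rounded"
# 123
```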

Get All the Values of a Given Element

For the final example, let’s say we want to find all the values in the names arrays nested under the addrs element of each object in the things array. The input is stored in things-addrs.json.

{
  "things": [
    {
      "name": "thing-000",
      "addrs": [
        {
          "names": [ "addr-001", "addr-000" ]
        },
        {
          "names": [ "addr-001", "addr-002" ]
        }
      ]
    },
    {
      "name": "thing-001",
      "addrs": [
        {
          "names": [ "addr-002", "addr-003" ]
        },
        {
          "names": [ "addr-004", "addr-003" ]
        }
      ]
    },
    {
      "name": "thing-002",
      "addrs": [
        {
          "names": [ "addr-004", "addr-005" ]
        },
        {
          "names": [ "addr-004", "addr-006" ]
        },
        {
          "names": [ "addr-004", "addr-005" ]
        }
      ]
    }
  ]
}

Again, we call upon the map(f) function to map from an array of objects to an array of values.

cat things-addrs.json | jq '.things | map(.addrs[].names)'

This does not produce a flat array of values, though. Instead we get a list of lists, because each output of the filter (.addrs[].names) is itself an array.

[
  [
    "addr-001",
    "addr-000"
  ],
  [
    "addr-001",
    "addr-002"
  ],
  [
    "addr-002",
    "addr-003"
  ],
  [
    "addr-004",
    "addr-003"
  ],
  [
    "addr-004",
    "addr-005"
  ],
  [
    "addr-004",
    "addr-006"
  ],
  [
    "addr-004",
    "addr-005"
  ]
]

This is quite a common case, and the flatten function takes each item of each array and produces an array with all the values.

cat things-addrs.json | jq '.things | map(.addrs[].names) | flatten'

This has all the values contained in all the arrays, as follows.

[
  "addr-001",
  "addr-000",
  "addr-001",
  "addr-002",
  "addr-002",
  "addr-003",
  "addr-004",
  "addr-003",
  "addr-004",
  "addr-005",
  "addr-004",
  "addr-006",
  "addr-004",
  "addr-005"
]

We might be, however, only interested in unique entries. Piping this output to unique removes duplicates.

cat things-addrs.json | jq '.things | map(.addrs[].names) | flatten | unique'
[
  "addr-000",
  "addr-001",
  "addr-002",
  "addr-003",
  "addr-004",
  "addr-005",
  "addr-006"
]
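
As an alternative sketch, the nested arrays can also be iterated directly inside an array constructor, which avoids the separate flatten step. The inline input below is a shortened, made-up stand-in for things-addrs.json, and the variable names are illustrative.

```shell
# Shortened, hypothetical stand-in for things-addrs.json.
things='{"things":[{"name":"t0","addrs":[{"names":["addr-001","addr-000"]},{"names":["addr-001","addr-002"]}]},{"name":"t1","addrs":[{"names":["addr-002","addr-003"]}]}]}'

# .names[] emits each string individually, so the constructor collects a flat array.
uniq_names=$(echo "$things" | jq -c '[.things[].addrs[].names[]] | unique')
echo "$uniq_names"
# ["addr-000","addr-001","addr-002","addr-003"]
```

Which form reads better is a matter of taste; the map and flatten version makes each step visible, while this one is more compact.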

Summary

Perhaps you have inherited some code that used with_entries and did not understand what it was doing. Or maybe you wrote that code, forgot what it did, and need to explain it to a colleague. Hopefully this explanation helps in these situations, and others as well. To close, here are a few key takeaways.

  • Pipe arrays into map(f) to output arrays with items resulting from f.

  • Pass the filter f to map such that f produces the desired JSON type (string, object, array, etc.).

  • When starting with an object and needing to combine the name of each element with its value, first convert it to name-value pairs using to_entries, then use map(f) to perform further mapping.

  • When you need to calculate a specific resulting value from values contained in an element of each object contained in an array, use map(f) to extract the values and pipe its output to your required reduce operation (for example, add or max).
