Using jq to extract values and format in CSV

  • I have the below JSON file:

    {
    "data": [
        {
            "displayName": "First Name",
            "rank": 1,
            "value": "VALUE"
        },
        {
            "displayName": "Last Name",
            "rank": 2,
            "value": "VALUE"
        },
        {
            "displayName": "Position",
            "rank": 3,
            "value": "VALUE"
        },
        {
            "displayName": "Company Name",
            "rank": 4,
            "value": "VALUE"
        },
        {
            "displayName": "Country",
            "rank": 5,
            "value": "VALUE"
        },
    ]
    }
    

    I would like to have a CSV file in this format:

    First Name, Last Name, Position, Company Name, Country
    VALUE, VALUE, VALUE, VALUE, VALUE, VALUE
    

    Is this possible by using only jq? I don't have any programming skills.

    I provided an answer below, but I'm now looking closer at your question and I can't help wondering - where is the 6th *VALUE* supposed to come from?

  • jq has a filter, @csv, for converting an array to a CSV string. This filter takes into account most of the complexities associated with the CSV format, beginning with commas embedded in fields. (jq 1.5 has a similar filter, @tsv, for generating tab-separated-value files.)

    Of course, if the headers and values are all guaranteed to be free of commas and double quotation marks, then there may be no need to use the @csv filter. Otherwise, it would probably be better to use it.
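    To see what `@csv` guards against, here is a quick Python sketch (the field values are made up) using the standard `csv` module, which applies the same minimal-quoting rules: fields containing commas or double quotes get quoted, and embedded quotes are doubled.

```python
import csv
import io

# A field containing a comma must be quoted, or the CSV gains a column.
row = ["First Name", "Smith, Smith and Smith", 'John ("Johnnie")']

# Naive join: the embedded comma splits one field into two.
naive = ", ".join(row)

# csv.writer quotes fields containing commas or quotes, as jq's @csv does.
buf = io.StringIO()
csv.writer(buf).writerow(row)
proper = buf.getvalue().strip()

print(naive)    # a CSV reader would see four columns here
print(proper)   # correctly quoted, three columns
```

    A naive comma join of these fields yields four apparent columns; the quoted form stays at three.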

    For example, if the 'Company Name' were 'Smith, Smith and Smith', and if the other values were as shown below, invoking jq with the "-r" option would produce valid CSV:

    $ jq -r '.data | map(.displayName), map(.value) | @csv' so.json2csv.json
    "First Name","Last Name","Position","Company Name","Country"
    "John (""Johnnie"")","Doe","Director, Planning and Posterity","Smith, Smith and Smith","Transylvania"
    

    I was able to do `somestuff | map(.) | @csv`, very handy, thanks!

    Your example is going to put all of the display names on the first line and all of the values on the second line, instead of having one line per record.

  • I prefer to make each record a row in my CSV.

    jq '.data | map([.displayName, .rank, .value] | join(", ")) | join("\n")'
    

    What if .value is a number? I receive the error "string and number cannot be added"

    @Cos try something like `.value|tostring` instead of `.value` in the above example.

    @Cos, I found parentheses are required: `(.value|tostring)`

    Also, use `jq -r` to strip the quotes
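    For comparison, the same "string and number cannot be added" problem exists in Python, and the fix is the same idea as the `tostring` suggestion above. A small sketch (using a trimmed copy of the question's data):

```python
import json

doc = json.loads('''{"data": [
    {"displayName": "First Name", "rank": 1, "value": "VALUE"},
    {"displayName": "Last Name",  "rank": 2, "value": "VALUE"}
]}''')

# str() plays the role of jq's tostring here: without it, join()
# would raise TypeError on the numeric rank.
lines = [", ".join([item["displayName"], str(item["rank"]), item["value"]])
         for item in doc["data"]]
print("\n".join(lines))
```

    This produces one `displayName, rank, value` line per record, mirroring the answer's filter.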

  • Given just this file, you can do something like:

    <testfile jq -r '.data | map(.displayName), map(.value) | join(", ")'
    

    The . operator selects a field from an object/hash. Thus, we start with .data, which returns the array with the data in it. We then map over the array twice, first selecting the displayName, then selecting the value, giving us two arrays with just the values of those keys. For each array, we join the elements with ", " forming two lines. The -r argument tells jq to not quote the resulting strings.

    If your actual file is longer (i.e., it has entries for more than one person), you will likely need something a bit more complicated.
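    To illustrate the multi-person case, suppose (hypothetically, since the question only shows one record) the top level were a list of such `data` blocks. One header row plus one line per person could then be produced like this:

```python
import json

# Hypothetical extension of the question's format: a list of
# per-person "data" blocks rather than a single one.
text = '''[
  {"data": [{"displayName": "First Name", "value": "John"},
            {"displayName": "Last Name",  "value": "Doe"}]},
  {"data": [{"displayName": "First Name", "value": "Jane"},
            {"displayName": "Last Name",  "value": "Roe"}]}
]'''
people = json.loads(text)

# Header comes from the first person's displayNames; then one row per person.
header = [field["displayName"] for field in people[0]["data"]]
rows = [[field["value"] for field in person["data"]] for person in people]
print(", ".join(header))
for row in rows:
    print(", ".join(row))
```

    This assumes every person lists the same fields in the same order; real data may need sorting by `rank` first.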

    This isn't working for me. In a related question, the answer at http://stackoverflow.com/questions/32960857/how-to-convert-arbirtrary-simple-json-to-csv-using-jq both works and is very well explained!

  • I've found jq hard to wrap my head around. Here's some Ruby:

    ruby -rjson -rcsv -e '
      data = JSON.parse(File.read "file.json")
      data["data"].collect {|item| [item["displayName"], item["value"]]}
                  .transpose
                  .each {|row| puts row.to_csv}
    '
    
    First Name,Last Name,Position,Company Name,Country
    VALUE,VALUE,VALUE,VALUE,VALUE
    

    The ruby JSON parser barfed about the trailing comma before the close bracket.
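    Python's `json` module rejects the trailing comma too. A quick-and-dirty preprocessing sketch is to strip commas that directly precede a closing bracket or brace; note the regex would also fire inside string values that happen to contain such a sequence, so this is only for trusted input:

```python
import json
import re

text = '{"data": [{"a": 1}, {"b": 2},]}'   # trailing comma: invalid JSON

# Drop any comma immediately followed (modulo whitespace) by ] or }.
# Caution: this can corrupt string values containing ",]" or ",}".
cleaned = re.sub(r',\s*([\]}])', r'\1', text)
doc = json.loads(cleaned)
print(doc["data"])
```

    The safer fix is to repair the JSON at the source rather than patch it after the fact.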

  • Since you tagged this question python, and assuming the JSON file is named x.json:

    import json
    with open('x.json') as f:
        x = json.load(f)
    print(', '.join(y['displayName'] for y in x['data']))
    print(', '.join(y['value'] for y in x['data']))
    First Name, Last Name, Position, Company Name, Country
    VALUE, VALUE, VALUE, VALUE, VALUE
    
  • Though I had to remove the trailing comma in your example input to make it work (jq complained about expecting another array element), this:

    INPUT | jq -r '[.[][].displayName], [.[][].value]| join(", ")'
    

    ...got me...

    First Name, Last Name, Position, Company Name, Country
    VALUE, VALUE, VALUE, VALUE, VALUE
    

    How it works in a nutshell:

    1. I traversed through to the third level of data objects using the empty [] index field form and .dot notation.
    2. Once deep enough I specified the data fields I wanted by name like .[][].displayName.
    3. I assured that my desired fields were self-associated by returning them as separate array objects like [.[][].displayName], [.[][].value]
    4. And then piped those objects to the join(", ") function to be joined as separate entities.

    In truth, doing `[.field]` is merely another way of writing `map(.field)`, but it is a little more specific in that it specifies the depth at which the desired data is retrieved.
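    The two-level walk described in the steps above can be mirrored in Python, where the nesting of the comprehension corresponds to the two empty-bracket hops (a sketch on a trimmed copy of the data):

```python
import json

doc = json.loads('{"data": [{"displayName": "First Name", "value": "V1"},'
                 ' {"displayName": "Last Name", "value": "V2"}]}')

# .[][] walks into each top-level value (the "data" array) and then into
# each of its elements; collecting a field at that depth is the analogue
# of [.[][].displayName] and [.[][].value].
names  = [obj["displayName"] for arr in doc.values() for obj in arr]
values = [obj["value"] for arr in doc.values() for obj in arr]
print(", ".join(names))
print(", ".join(values))
```

    As in the jq version, the headers and the values end up as two separate joined lines.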

License under CC-BY-SA with attribution