How to parse JSON with shell scripting in Linux?

  • I have a JSON output from which I need to extract a few parameters in Linux.

    This is the JSON output:

    {
            "OwnerId": "121456789127",
            "ReservationId": "r-48465168",
            "Groups": [],
            "Instances": [
                {
                    "Monitoring": {
                        "State": "disabled"
                    },
                    "PublicDnsName": null,
                    "RootDeviceType": "ebs",
                    "State": {
                        "Code": 16,
                        "Name": "running"
                    },
                    "EbsOptimized": false,
                    "LaunchTime": "2014-03-19T09:16:56.000Z",
                    "PrivateIpAddress": "10.250.171.248",
                    "ProductCodes": [
                        {
                            "ProductCodeId": "aacglxeowvn5hy8sznltowyqe",
                            "ProductCodeType": "marketplace"
                        }
                    ],
                    "VpcId": "vpc-86bab0e4",
                    "StateTransitionReason": null,
                    "InstanceId": "i-1234576",
                    "ImageId": "ami-b7f6c5de",
                    "PrivateDnsName": "ip-10-120-134-248.ec2.internal",
                    "KeyName": "Test_Virginia",
                    "SecurityGroups": [
                        {
                            "GroupName": "Test",
                            "GroupId": "sg-12345b"
                        }
                    ],
                    "ClientToken": "VYeFw1395220615808",
                    "SubnetId": "subnet-12345314",
                    "InstanceType": "t1.micro",
                    "NetworkInterfaces": [
                        {
                            "Status": "in-use",
                            "SourceDestCheck": true,
                            "VpcId": "vpc-123456e4",
                            "Description": "Primary network interface",
                            "NetworkInterfaceId": "eni-3619f31d",
                            "PrivateIpAddresses": [
                                {
                                    "Primary": true,
                                    "PrivateIpAddress": "10.120.134.248"
                                }
                            ],
                            "Attachment": {
                                "Status": "attached",
                                "DeviceIndex": 0,
                                "DeleteOnTermination": true,
                                "AttachmentId": "eni-attach-9210dee8",
                                "AttachTime": "2014-03-19T09:16:56.000Z"
                            },
                            "Groups": [
                                {
                                    "GroupName": "Test",
                                    "GroupId": "sg-123456cb"
                                }
                            ],
                            "SubnetId": "subnet-31236514",
                            "OwnerId": "109030037527",
                            "PrivateIpAddress": "10.120.134.248"
                        }
                    ],
                    "SourceDestCheck": true,
                    "Placement": {
                        "Tenancy": "default",
                        "GroupName": null,
                        "AvailabilityZone": "us-east-1c"
                    },
                    "Hypervisor": "xen",
                    "BlockDeviceMappings": [
                        {
                            "DeviceName": "/dev/sda",
                            "Ebs": {
                                "Status": "attached",
                                "DeleteOnTermination": false,
                                "VolumeId": "vol-37ff097b",
                                "AttachTime": "2014-03-19T09:17:00.000Z"
                            }
                        }
                    ],
                    "Architecture": "x86_64",
                    "KernelId": "aki-88aa75e1",
                    "RootDeviceName": "/dev/sda1",
                    "VirtualizationType": "paravirtual",
                    "Tags": [
                        {
                            "Value": "Server for testing RDS feature in us-east-1c AZ",
                            "Key": "Description"
                        },
                        {
                            "Value": "RDS_Machine (us-east-1c)",
                            "Key": "Name"
                        },
                        {
                            "Value": "1234",
                            "Key": "cost.centre",
                          },
                        {
                            "Value": "Jyoti Bhanot",
                            "Key": "Owner",
                          }
                    ],
                    "AmiLaunchIndex": 0
                }
            ]
        }
    

    I want to write a file that contains heading like instance id, tag like name, cost center, owner. and below that certain values from the JSON output. The output here given is just an example.

    How can I do that using sed and awk?

    Expected output :

     Instance id         Name                           cost centre             Owner
        i-1234576          RDS_Machine (us-east-1c)        1234                   Jyoti
    

    Pipe your CLI call into python, suggested because it's native to EC2 instances. Python can easily interpret JSON. See the answer below for an example. Of course, you could use any other SS language too, but they will require installs whereas Python is already there.

  • The availability of parsers in nearly every programming language is one of the advantages of JSON as a data-interchange format.

    Rather than trying to implement a JSON parser, you are likely better off using either a tool built for JSON parsing such as jq or a general purpose script language that has a JSON library.

    For example, using jq, you could pull out the ImageID from the first item of the Instances array as follows:

    jq '.Instances[0].ImageId' test.json
    

    Alternatively, to get the same information using Ruby's JSON library:

    ruby -rjson -e 'j = JSON.parse(File.read("test.json")); puts j["Instances"][0]["ImageId"]'
    

    I won't answer all of your revised questions and comments but the following is hopefully enough to get you started.

    Suppose that you had a Ruby script that could read a from STDIN and output the second line in your example output[0]. That script might look something like:

    #!/usr/bin/env ruby
    require 'json'
    
    data = JSON.parse(ARGF.read)
    instance_id = data["Instances"][0]["InstanceId"]
    name = data["Instances"][0]["Tags"].find {|t| t["Key"] == "Name" }["Value"]
    owner = data["Instances"][0]["Tags"].find {|t| t["Key"] == "Owner" }["Value"]
    cost_center = data["Instances"][0]["SubnetId"].split("-")[1][0..3]
    puts "#{instance_id}\t#{name}\t#{cost_center}\t#{owner}"
    

    How could you use such a script to accomplish your whole goal? Well, suppose you already had the following:

    • a command to list all your instances
    • a command to get the json above for any instance on your list and output it to STDOU

    One way would be to use your shell to combine these tools:

    echo -e "Instance id\tName\tcost centre\tOwner"
    for instance in $(list-instances); do
        get-json-for-instance $instance | ./ugly-ruby-scriptrb
    done
    

    Now, maybe you have a single command that give you one json blob for all instances with more items in that "Instances" array. Well, if that is the case, you'll just need to modify the script a bit to iterate through the array rather than simply using the first item.

    In the end, the way to solve this problem, is the way to solve many problems in Unix. Break it down into easier problems. Find or write tools to solve the easier problem. Combine those tools with your shell or other operating system features.

    [0] Note that I have no idea where you get cost-center from, so I just made it up.

    i have installed jq on my machine. but i dont know how to get the information. i am updating the question

    How to do that. the command ec2-describe instance gives reslut like this. this is data for 1 instance, there are 100 instance. How to do that in a script

    i have aws cli tools that gives me the output. now how to parse the output and the required tags that i really dont know

    @user3086014 I'm sorry, but I won't be putting more work into this answer. Take a look at the Ruby example I have there. It should give you a good place to start on how to get tags out of various parts of the JSON you want.

    In the moltitude of json tools available jq is my favorite https://stedolan.github.io/jq/manual/. Available in std distribution as well. A playground for testing filters is available at https://jqplay.org/jq?q=.&;j=%22Hello%2C%20world!%22

    I have no idea why you don't appear higher in the results. jq is by far the simplest and more portable solution (ease of use, well documented, there is an online tester available, and the program consists of one single binary file with no dependencies). I tried with jsawk and with a combo of awk/sed/grep before finding out jq, I think jq is best.

    kudos for jq library

  • You can use following python script to parse that data. Lets assume that you have JSON data from arrays in files like array1.json, array2.json and so on.

    import json
    import sys
    from pprint import pprint
    
    jdata = open(sys.argv[1])
    
    data = json.load(jdata)
    
    print "InstanceId", " - ", "Name", " - ", "Owner"
    print data["Instances"][0]["InstanceId"], " - " ,data["Instances"][0]["Tags"][1]["Value"], " - " ,data["Instances"][0]["Tags"][2]["Value"] 
    
    jdata.close()
    

    And then just run:

    $ for x in `ls *.json`; do python parse.py $x; done
    InstanceId  -  Name  -  Owner
    i-1234576  -  RDS_Machine (us-east-1c)  -  Jyoti Bhanot
    

    I haven't seen cost in your data, that's why I didn't include that.

    According to discussion in comments, I have updated parse.py script:

    import json
    import sys
    from pprint import pprint
    
    jdata = sys.stdin.read()
    
    data = json.loads(jdata)
    
    print "InstanceId", " - ", "Name", " - ", "Owner"
    print data["Instances"][0]["InstanceId"], " - " ,data["Instances"][0]["Tags"][1]["Value"], " - " ,data["Instances"][0]["Tags"][2]["Value"] 
    

    You can try to run following command:

    #ec2-describe-instance <instance> | python parse.py
    

    but this is just one array there are similar arrays returned by the command. how to do that

    and this data is generated by ec2-describe instance command at runtime. how to handle that

    I have modified a bit this python script: `import json from pprint import pprint jdata = open('example.json') data = json.load(jdata) print "InstanceId", " - ", "Name", " - ", "Owner" print data["Instances"][0]["InstanceId"], " - " ,data["Instances"][0]["Tags"][1]["Value"], " - " ,data["Instances"][0]["Tags"][2]["Value"] jdata.close()` If you have all json data from arrays in files like array1.json, array2.json, ... and so on, you could try to run it like this: `# for x in `ls *.json`; do python parse.py $x; done`

    you can update the answer itself . its not readable

    also i have arrays.100 of arrays like this

    jdata will come from ec2-describe-instance command

    I've just updated above answer.

    how can i fetch the things from a command itfelf.

    Note : sir the data will be generated at runtime from the command, so it will not be stored in file in json format. We need to parse it only at the run time itslef

    I have updated my answer.

  • Others have provided general answers for your question which demonstrate good ways of parsing json however I, like you, were looking for a way to extract an aws instance id using a core tool like awk or sed without depending on other packages. To accomplish this you can pass the "--output=text" argument to your aws command which will give you an awk parsable string. With that you can simply get the instance ID using something like the following...

    aws ec2 run-instances --output text  | awk -F"\t" '$1=="INSTANCES" {print $8}'
    

    exactly what i needed. thanks

    Life saver! their cli manual doesn't show the option of output :/. Thanks a ton though

  • The following jq code:

    .Instances[] | (.Tags | map(.value=.Value | .key=.Key) | from_entries) as $tags | "\(.InstanceId) | \($tags.Name) | \($tags["cost.centre"]) | \($tags.Owner)"
    

    used like:

    json_producer | jq -r '<jq code...>'
    

    would output:

    i-1234576 | RDS_Machine (us-east-1c) | 1234 | Jyoti Bhanot
    

    A few pointers to understand the code:

    • from_entries takes an array of objects like {key:a, value:b} and turns it into an object with corresponding key/value pairs ({a: b});
    • The Key and Value keys in the Tags array had to be converted to lowercase;
    • The last string uses jq's string interpolation feature. You can tweak it as needed.

    Fore more details, go see jq's tutorial and manual at https://stedolan.github.io/jq/

    You can now shorten the extraction of tags using `(.Tags | map({Value, Key}) | from_entries) as $tags`, without converting keys to lowercase.

  • Jshon is available in several distributions :

    $ echo your_JSON|jshon -e Instances -a -e InstanceId -u -p -e Tags -a -e Key -u -p -e Value -u
    i-1234576
    Description
    Server for testing RDS feature in us-east-1c AZ
    Name
    RDS_Machine (us-east-1c)
    cost.centre
    1234
    Owner
    Jyoti Bhanot
    

    Poor explanation : -e uu will extract object uu, -a will make the array usable (not sure I correctly phrased this one but anyway…), -u will decode the string, -p will go back to previous item (seems that -i N, N being any number, has the same effect).

    Depending on your case, the output may require some post-treatment (like yours, as you can see).

    Jshon doesn's seem robust against JSON malformation, though (your "Tags" with commas before the closing curly bracket will raise an error).

    Someone mentioned jsawk in another thread, but I have not tested it.

  • If this is limited to the AWS use case provided above, you should use --query and --output flags for your CLI API call

    http://docs.aws.amazon.com/cli/latest/userguide/controlling-output.html

  • Here is one-liner suggestion:

    pr -mt \
     <(grep -o ".*: .*," in.json | grep -iw InstanceId | cut -d: -f2) \
     <(grep -o ".*: .*," in.json | grep -iw Value      | cut -d: -f2) \
     <(grep -o ".*: .*," in.json | grep -iw Key        | cut -d: -f2)
    

    Not perfect, but it'd work if you tweak it a bit.

    It basically using pr to print each set result per column. Each result set is returned by process substitution which parse JSON file and return values based on the key.

    This works similar as described in: Given key-value content, how do I group values by key and sort by value?

  • Take a look at jtc cli tool:

    it lets easily extracting required information from your json (assuming it's in file.json, btw, your JSON needs to be fixed, there's couple extra commas there):

    bash $ cat file.json | jtc -x '<InstanceId>l+0[-1]' -y '[InstanceId]' -y "[Key]:<Name>[-1][Value]" -y "[Key]:<cost.centre>[-1][Value]" -y "[Key]:<Owner>[-1][Value]" | sed 's/"/\\"/g' | xargs -L4 echo
    "i-1234576" "RDS_Machine (us-east-1c)" "1234" "Jyoti Bhanot"
    bash $ 
    
  • So many suggestions and enough to confuse about choosing one. I just want to keep things simple here with good simple practical examples.

    To parse JSON there are various solution. With Shell one best tool is

    "jq"

    jq tutorial: https://stedolan.github.io/jq/tutorial/

    Install jq at centos as:

    yum install -y jq
    

    . But I always suggest to know enough about JSON format. It will helps in understanding the way how we go for parsing.

    suppose I want to see number of volumes associated with my docker container I will use below command:

    To see complete output:

    docker inspect <container_name> | jq .
    

    To see all mounts:

    docker inspect <container_name> | jq .[0].Mounts
    

    To see 1st Item (0th) of Mount (with screenshot):

    docker inspect <container_name> | jq .[0].Mounts[0]
    

    'jq' output sample for reference

    To see specifically 'Source' field of first Mount item:

    docker inspect <container_name> | jq .[0].Mounts[1].Source
    

    To see the total counts of volumes (lenght):

    docker inspect <container_name> | jq '.[0].Mounts | length'
    

    There are already two answers here suggesting jq

  • jq "." recovery.js | head -n 20

    translates your jason file to something readeable like this:

    {
      "version": [
        "sessionrestore",
        1
      ],
      "windows": [
        {
          "tabs": [
            {
              "entries": [
                {
                  "url": "http://orf.at/#/stories/2.../",
                  "title": "news.ORF.at",
                  "charset": "UTF-8",
                  "ID": 9588,
                  "docshellID": 298,
                  "docIdentifier": 10062,
                  "persist": true
                },
    ...
    

    Now it should be possible to parse your data with any standard tools

License under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM