How to turn JSON array into Postgres array?

  • I have a column data of type json that holds JSON documents like this:

    {
        "name": "foo",
        "tags": ["foo", "bar"]
    }
    

    I would like to turn the nested tags array into a concatenated string ('foo, bar'). That would be easily possible with the array_to_string() function in theory. However, this function does not accept json input. So I wonder how to turn this JSON array into a Postgres array (type text[])?

    Is `json_extract_path_text(your_column, 'tags')` what you are looking for?

    @a_horse_with_no_name: The remaining problem: array elements are still quoted for JSON format. Text is not properly extracted ...

  • Postgres 9.4 or newer

    Obviously inspired by this post, Postgres 9.4 added the missing function(s):
    Thanks to Laurence Rowe for the patch and Andrew Dunstan for committing!

    To unnest the JSON array. Then use array_agg() or an ARRAY constructor to build a Postgres array from it. Or string_agg() to build a text string.

    Aggregate unnested elements per row in a LATERAL or correlated subquery. Then original order is preserved and we don't need ORDER BY, GROUP BY or even a unique key in the outer query. See:

    Replace 'json' with 'jsonb' for jsonb in all following SQL code.

    SELECT t.tbl_id, d.list
    FROM   tbl t
    CROSS  JOIN LATERAL (
       SELECT string_agg(d.elem::text, ', ') AS list
       FROM   json_array_elements_text(t.data->'tags') AS d(elem)
       ) d;
    

    Short syntax:

    SELECT t.tbl_id, d.list
    FROM   tbl t, LATERAL (
       SELECT string_agg(value::text, ', ') AS list
       FROM   json_array_elements_text(t.data->'tags')  -- col name default: "value"
       ) d;
    

    Related:

    ARRAY constructor in correlated subquery:

    SELECT tbl_id, ARRAY(SELECT json_array_elements_text(t.data->'tags')) AS txt_arr
    FROM   tbl t;
    

    Related:

    Subtle difference: null elements are preserved in actual arrays. This is not possible in the above queries producing a text string, which cannot contain null values. The true representation is an array.

    Function wrapper

    For repeated use, to make this even simpler, encapsulate the logic in a function:

    CREATE OR REPLACE FUNCTION json_arr2text_arr(_js json)
      RETURNS text[] LANGUAGE sql IMMUTABLE PARALLEL SAFE AS
    'SELECT ARRAY(SELECT json_array_elements_text(_js))';
    

    Make it an SQL function, so it can be inlined in bigger queries.
    Make it IMMUTABLE (because it is) to avoid repeated evaluation in bigger queries and allow it in index expressions.
    Make it PARALLEL SAFE (in Postgres 9.6 or later!) to not stand in the way of parallelism. See:

    Call:

    SELECT tbl_id, json_arr2text_arr(data->'tags')
    FROM   tbl;
    

    db<>fiddle here


    Postgres 9.3 or older

    Use the function json_array_elements(). But we get double quoted strings from it.

    Alternative query with aggregation in the outer query. CROSS JOIN removes rows with missing or empty arrays. May also be useful for processing elements. We need a unique key to aggregate:

    SELECT t.tbl_id, string_agg(d.elem::text, ', ') AS list
    FROM   tbl t
    CROSS  JOIN LATERAL json_array_elements(t.data->'tags') AS d(elem)
    GROUP  BY t.tbl_id;
    

    ARRAY constructor, still with quoted strings:

    SELECT tbl_id, ARRAY(SELECT json_array_elements(t.data->'tags')) AS quoted_txt_arr
    FROM   tbl t;
    

    Note that null is converted to the text value "null", unlike above. Incorrect, strictly speaking, and potentially ambiguous.

    Poor man's unquoting with trim():

    SELECT t.tbl_id, string_agg(trim(d.elem::text, '"'), ', ') AS list
    FROM   tbl t, json_array_elements(t.data->'tags') d(elem)
    GROUP  BY 1;
    

    Retrieve a single row from tbl:

    SELECT string_agg(trim(d.elem::text, '"'), ', ') AS list
    FROM   tbl t, json_array_elements(t.data->'tags') d(elem)
    WHERE  t.tbl_id = 1;
    

    Strings form correlated subquery:

    SELECT tbl_id, (SELECT string_agg(trim(value::text, '"'), ', ')
                    FROM   json_array_elements(t.data->'tags')) AS list
    FROM   tbl t;
    

    ARRAY constructor:

    SELECT tbl_id, ARRAY(SELECT trim(value::text, '"')
                         FROM   json_array_elements(t.data->'tags')) AS txt_arr
    FROM   tbl t;
    

    Original (outdated) SQL Fiddle.
    db<>fiddle here.

    Related:

    Notes (outdated since pg 9.4)

    We would need a json_array_elements_text(json), the twin of json_array_elements(json) to return proper text values from a JSON array. But that seems to be missing from the provided arsenal of JSON functions. Or some other function to extract a text value from a scalar JSON value. I seem to be missing that one, too.
    So I improvised with trim(), but that will fail for non-trivial cases ...

    Good post like always, but with your knowledge of the internals why isn't the cast from array->jsonb there. I can understand not implementing the other cast because the sql-array is more strongly typed. Is it just because PostgreSQL is averse to auto generating code to cast (int[], bigint[], text[]) etc.

    @Evan: You'd use **`to_jsonb()`** for array->jsonb conversion.

    Does `SELECT ARRAY(SELECT json_array_elements_text(_js))` really guarantee that the ordering of the array is preserved? Isn't the optimizer allowed to theoretically alter the order of the rows coming out of json_array_elements_text?

    @Felix: there is no formal guarantee in the SQL standard. (then again, set returning functions aren't even allowed in the SELECT list in standard SQL to begin with.) but there is an informal assertion in the Postgres manual. see: https://dba.stackexchange.com/a/185862/3684 To be explicit - at the cost of a minor perfomance penalty - see: https://dba.stackexchange.com/a/27287/3684. Personally, I am 100 % sure this particular expression works as expected in every present and future version of Postgres since 9.4.

    @ErwinBrandstetter thank you so much for confirming this! I'm currently doing some research for an article that summarizes the formal and informal guarantees ordering guarantees provided by PostgreSQL and your answers have been incredibly helpful! If you'd be interested to review the article let me know, but no worries if not. I'm incredibly grateful for your StackOverflow contributions and learned a lot from you over the years!

  • PG 9.4+

    The accepted answer is definitely what you need, but for the sake of simplicity here is a helper I use for this:

    CREATE OR REPLACE FUNCTION jsonb_array_to_text_array(p_input jsonb)
     RETURNS text[]
     LANGUAGE sql
     IMMUTABLE
    AS $function$
    
    SELECT array_agg(ary)::text[] FROM jsonb_array_elements_text(p_input) AS ary;
    
    $function$;
    

    Then just do:

    SELECT jsonb_array_to_text_array('["a", "b", "c"]'::jsonb);
    

    Updated 2/23/2020 in response to comments: Comments are correct that this could be more efficient. At the time I posted there was no modularized solution offered so I offered one in earnest, if non-optimal. Since then Erwin has updated his answer with a simple and efficient function so I never updated mine. Updating it now since there is still attention coming to this answer

    One more update, because this just bit me: The above function will return null if there are no values. This may not be desirable depending on your situation. Here's a function which returns an empty array if the value is not null, but still returns null if the input is null.

    CREATE OR REPLACE FUNCTION jsonb_array_to_text_array_strict(p_input jsonb)
     RETURNS text[]
     LANGUAGE sql
     IMMUTABLE
    AS $function$
    
    SELECT 
      CASE 
        WHEN p_input IS null 
        THEN null 
        ELSE coalesce(ary_out, ARRAY[]::text[]) 
      END
    FROM (
      SELECT array_agg(ary)::text[] AS ary_out
      FROM jsonb_array_elements_text(p_input) AS ary
    ) AS extracted;
    
    $function$
    ;
    

    I added some faster expressions to my answer and a simpler function. This can be substantially cheaper.

    This function should be pure SQL so that the optimizer can peek into it. No need to use pgplsql here.

    ... and it should be IMMUTABLE not VOLATILE, since it does not touch the database and any given input will always generate the same output.

  • This question was asked on the PostgreSQL mailing lists and I came up with this hackish way of converting JSON text to PostgreSQL text type via the JSON field extraction operator:

    CREATE FUNCTION json_text(json) RETURNS text IMMUTABLE LANGUAGE sql
    AS $$ SELECT ('['||$1||']')::json->>0 $$;
    
    db=# select json_text(json_array_elements('["hello",1.3,"\u2603"]'));
     json_text 
    -----------
     hello
     1.3
     ☃
    

    Basically it converts the value into a single-element array and then asks for the first element.

    Another approach would be to use this operator to extract all fields one-by-one. But for large arrays this is likely slower, as it needs to parse the whole JSON string for each array element, leading to O(n^2) complexity.

    CREATE FUNCTION json_array_elements_text(json) RETURNS SETOF text IMMUTABLE LANGUAGE sql
    AS $$ SELECT $1->>i FROM generate_series(0, json_array_length($1)-1) AS i $$;
    
    db=# select json_array_elements_text('["hello",1.3,"\u2603"]');
     json_array_elements_text 
    --------------------------
     hello
     1.3
     ☃
    
  • I've tested a few options. Here is my favorite query. Suppose we have a table containing id and json field. The json field contains array, which we want to turn into pg array.

    SELECT * 
    FROM   test 
    WHERE  TRANSLATE(jsonb::jsonb::text, '[]','{}')::INT[] 
           && ARRAY[1,2,3];
    

    It is working anywhere and faster than others, but looks crutchy)

    Firstly json array is casted as text, and then we just change square brackets to parenthesis. Finally the text is being casted as array of required type.

    SELECT TRANSLATE('[1]'::jsonb::text, '[]','{}')::INT[];
    

    and if you prefer text[] arrays

    SELECT TRANSLATE('[1]'::jsonb::text, '[]','{}')::TEXT[];
    

    `SELECT TRANSLATE('{"name": "foo", "tags": ["foo", "bar"]}'::jsonb::text, '[]','{}')::INT[]; ERROR: malformed array literal: "{"name": "foo", "tags": {"foo", "bar"}}"` I think you have to add some explanation about how this is supposed to work.

    The question is how to turn JSON array(!) into pg array. Suppose I have the table containing id and jsonb columns. JSONb column contains json array. Then

    TRANSLATE(jsonb::jsonb::text, '[]','{}')::INT[] converts json array to pg array.

    `SELECT translate('["foo", "bar"]'::jsonb::text, '[]','{}')::INT[]; ERROR: invalid input syntax for integer: "foo"` It's not so bomb-proof...

    Consider using text[] for these arrays

    This helped me - my case was guaranteed to have only integers in the array.

  • These few functions, taken from answers to this question, are what I'm using and they're working great

    CREATE OR REPLACE FUNCTION json_array_casttext(json) RETURNS text[] AS $f$
        SELECT array_agg(x) || ARRAY[]::text[] FROM json_array_elements_text($1) t(x);
    $f$ LANGUAGE sql IMMUTABLE;
    
    CREATE OR REPLACE FUNCTION jsonb_array_casttext(jsonb) RETURNS text[] AS $f$
        SELECT array_agg(x) || ARRAY[]::text[] FROM jsonb_array_elements_text($1) t(x);
    $f$ LANGUAGE sql IMMUTABLE;
    
    CREATE OR REPLACE FUNCTION json_array_castint(json) RETURNS int[] AS $f$
        SELECT array_agg(x)::int[] || ARRAY[]::int[] FROM json_array_elements_text($1) t(x);
    $f$ LANGUAGE sql IMMUTABLE;
    
    CREATE OR REPLACE FUNCTION jsonb_array_castint(jsonb) RETURNS int[] AS $f$
        SELECT array_agg(x)::int[] || ARRAY[]::int[] FROM jsonb_array_elements_text($1) t(x);
    $f$ LANGUAGE sql IMMUTABLE;
    

    In each of them, by concatenating with an empty array, they handle a case that had me racking my brain for a bit, in that if you try and cast an empty array from json/jsonb without it you'll get nothing returned, instead of an empty array ({}) as you would expect. I'm certain there's some optimization for them, but they're left as is for simplicity in explaining the concept.

License under CC-BY-SA with attribution


Content dated before 6/26/2020 9:53 AM