How do I insert a row which contains a foreign key?

  • Using PostgreSQL v9.1. I have the following tables:

        CREATE TABLE foo (
            id   BIGSERIAL PRIMARY KEY,
            type VARCHAR(60) NOT NULL UNIQUE
        );

        CREATE TABLE bar (
            id          BIGSERIAL PRIMARY KEY,
            description VARCHAR(40) NOT NULL UNIQUE,
            foo_id      BIGINT NOT NULL REFERENCES foo (id)
        );

    Say the first table foo is populated like this:

    INSERT INTO foo (type) VALUES
        ( 'red' ),
        ( 'green' ),
        ( 'blue' );

    Is there any way to insert rows into bar easily by referencing the foo table? Or must I do it in two steps, first by looking up the foo type I want, and then inserting a new row into bar?

    Here is an example of pseudo-code showing what I was hoping could be done:

    INSERT INTO bar (description, foo_id) VALUES
        ( 'testing',     SELECT id from foo WHERE type='blue' ),
        ( 'another row', SELECT id from foo WHERE type='red'  );
  • Your syntax is almost good - it just needs parentheses around the subqueries, and then it will work:

    INSERT INTO bar (description, foo_id) VALUES
        ( 'testing',     (SELECT id from foo WHERE type='blue') ),
        ( 'another row', (SELECT id from foo WHERE type='red' ) );

    Tested at DB-Fiddle

    Another way, with shorter syntax if you have a lot of values to insert:

    WITH ins (description, type) AS
    ( VALUES
        ( 'more testing',    'blue' ) ,
        ( 'yet another row', 'green' )
    )
    INSERT INTO bar
        (description, foo_id)
    SELECT
        ins.description, foo.id
    FROM
      foo JOIN ins
        ON ins.type = foo.type ;

    It took reading it a few times, but I now understand the 2nd solution you provided. I like it. I'm using it now to bootstrap my database with a handful of known values when the system first comes up.

  • Plain INSERT

    INSERT INTO bar (description, foo_id)
    SELECT val.description, f.id
    FROM  (
       VALUES
          (text 'testing', text 'blue')  -- explicit type declaration; see below
        , ('another row', 'red'   )
        , ('new row1'   , 'purple')      -- purple does not exist in foo, yet
        , ('new row2'   , 'purple')
       ) val (description, type)
    LEFT   JOIN foo f USING (type);
    • The use of a LEFT [OUTER] JOIN instead of [INNER] JOIN means that rows from val aren't dropped when no match is found in foo. Instead, NULL is entered for foo_id.

    • The VALUES expression in the subquery does the same as @ypercube's CTE. Common Table Expressions offer additional features and are easier to read in big queries, but they also act as optimization barriers. So subqueries are typically a bit faster when none of those features is needed.

    • id as a column name is a widespread anti-pattern. It should be foo_id and bar_id, or anything descriptive. When joining a bunch of tables, you otherwise end up with multiple columns all named id ...

    • Consider plain text or varchar instead of varchar(n). If you really need to impose a length restriction, add a CHECK constraint:
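A length check of that kind might look like this (a sketch; the constraint name and the limit of 60 are illustrative, not from the original answer):

```sql
-- hypothetical example: switch foo.type to plain text,
-- enforcing the length limit via CHECK instead of varchar(n)
ALTER TABLE foo
    ALTER COLUMN type TYPE text,
    ADD CONSTRAINT foo_type_len CHECK (length(type) <= 60);
```

Unlike varchar(n), the CHECK constraint can later be dropped or changed without rewriting the column type.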

    • You may need to add explicit type casts. Since the VALUES expression is not directly attached to a table (like in INSERT ... VALUES ...), types cannot be derived and default data types are used without explicit type declaration, which may not work in all cases. It's enough to do it in the first row, the rest will fall in line.

    INSERT missing FK rows at the same time

    If you want to create non-existent entries in foo on the fly, in a single SQL statement, CTEs are instrumental:

    WITH sel AS (
       SELECT val.description, val.type, f.id AS foo_id
       FROM  (
          VALUES
             (text 'testing', text 'blue')
           , ('another row', 'red'   )
           , ('new row1'   , 'purple')
           , ('new row2'   , 'purple')
          ) val (description, type)
       LEFT   JOIN foo f USING (type)
       )
    , ins AS (
       INSERT INTO foo (type)
       SELECT DISTINCT type FROM sel WHERE foo_id IS NULL
       RETURNING id AS foo_id, type
       )
    INSERT INTO bar (description, foo_id)
    SELECT sel.description, COALESCE(sel.foo_id, ins.foo_id)
    FROM   sel
    LEFT   JOIN ins USING (type);

    Note the two new dummy rows to insert. Both are purple, which does not exist in foo, yet. Two rows to illustrate the need for DISTINCT in the first INSERT statement.

    Step-by-step explanation

    1. The 1st CTE sel provides multiple rows of input data. The subquery val with the VALUES expression can be replaced with a table or subquery as source. Immediately LEFT JOIN to foo to append the foo_id for pre-existing type rows. All other rows get foo_id IS NULL this way.

    2. The 2nd CTE ins inserts distinct new types (foo_id IS NULL) into foo, and returns the newly generated foo_id - together with the type to join back to insert rows.

    3. The final outer INSERT can now insert a foo_id for every row: either the type pre-existed, or it was inserted in step 2.

    Strictly speaking, both inserts happen "in parallel", but since this is a single statement, default FOREIGN KEY constraints will not complain. Referential integrity is enforced at the end of the statement by default.

    SQL Fiddle for Postgres 9.3. (Works the same in 9.1.)

    There is a tiny race condition if you run multiple of these queries concurrently. Read more under related questions here and here and here. It really only happens under heavy concurrent load, if ever. Compared to caching solutions as advertised in another answer, the chance is super-tiny.

    Function for repeated use

    For repeated use I would create an SQL function that takes an array of records as parameter and use unnest(param) in place of the VALUES expression.

    Or, if the syntax for arrays of records is too messy for you, use a comma-separated string as parameter _param, e.g. of the form 'description1,type1;description2,type2'.


    Then use this to replace the VALUES expression in above statement:

    SELECT split_part(x, ',', 1) AS description
         , split_part(x, ',', 2) AS type
    FROM   unnest(string_to_array(_param, ';')) x;
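Putting the pieces together, a minimal wrapper for the string-parameter variant might look like this (a sketch; the function name f_insert_bar is mine, and it assumes every type already exists in foo - no on-the-fly FK inserts):

```sql
-- hypothetical sketch: insert rows from a string like
-- 'descr1,type1;descr2,type2' into bar, looking up foo_id per type
CREATE OR REPLACE FUNCTION f_insert_bar(_param text)
  RETURNS void AS
$func$
   INSERT INTO bar (description, foo_id)
   SELECT v.description, f.id
   FROM  (
      SELECT split_part(x, ',', 1) AS description
           , split_part(x, ',', 2) AS type
      FROM   unnest(string_to_array(_param, ';')) x
      ) v
   JOIN   foo f USING (type);  -- inner join: silently drops unknown types
$func$  LANGUAGE sql;

-- call:
SELECT f_insert_bar('more testing,blue;yet another,green');
```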

    Function with UPSERT in Postgres 9.5

    Create a custom row type for parameter passing. We could do without it, but it's simpler:

    CREATE TYPE foobar AS (description text, type text);


    CREATE OR REPLACE FUNCTION f_insert_foobar(VARIADIC _val foobar[])
      RETURNS void AS
    $func$
       WITH val AS (SELECT * FROM unnest(_val))    -- well-known row type
       ,    ins AS (
          INSERT INTO foo AS f (type)
          SELECT DISTINCT v.type                   -- DISTINCT!
          FROM   val v
          ON     CONFLICT (type) DO UPDATE         -- type already exists
          SET    type = excluded.type WHERE FALSE  -- never executed, but lock rows
          RETURNING f.type, f.id AS foo_id
          )
       INSERT INTO bar AS b (description, foo_id)
       SELECT v.description, COALESCE(f.id, i.foo_id)  -- assuming most types pre-exist
       FROM        val v
       LEFT   JOIN foo f USING (type)              -- already existed
       LEFT   JOIN ins i USING (type)              -- newly inserted
       ON     CONFLICT (description) DO UPDATE     -- description already exists
       SET    foo_id = excluded.foo_id             -- real UPSERT this time
       WHERE  b.foo_id IS DISTINCT FROM excluded.foo_id  -- only if actually changed
    $func$  LANGUAGE sql;


    Call:

    SELECT f_insert_foobar(
         '(testing,blue)'
       , '(another row,red)'
       , '(new row1,purple)'
       , '(new row2,purple)'
       , '("with,comma",green)'  -- added to demonstrate row syntax
       );
    Fast and rock-solid for environments with concurrent transactions.

    In addition to the queries above, this ...

    • ... applies SELECT or INSERT on foo: Any type that doesn't exist in the FK table, yet, is inserted. Assuming most types pre-exist. To be absolutely sure and rule out race conditions, existing rows we need are locked (so that concurrent transactions cannot interfere). If that's too paranoid for your case you can replace:

        ON     CONFLICT (type) DO UPDATE         -- type already exists
        SET    type = excluded.type WHERE FALSE  -- never executed, but lock rows

      with the simpler:

        ON     CONFLICT (type) DO NOTHING
    • ... applies INSERT or UPDATE (true "UPSERT") on bar: If the description already exists its type is updated:

        ON     CONFLICT (description) DO UPDATE     -- description already exists
        SET    foo_id = excluded.foo_id             -- real UPSERT this time
        WHERE  b.foo_id IS DISTINCT FROM excluded.foo_id  -- only if actually changed

      But only if foo_id actually changes - the final WHERE clause avoids empty updates.

    • ... passes values as well-known row types with a VARIADIC parameter. Note the default maximum of 100 parameters!

      There are many other ways to pass multiple rows ...


    In your `INSERT missing FK rows at the same time` example, would putting this in a Transaction reduce the risk of race conditions in SQL Server?

    @element11: The answer is for Postgres, but since we are talking about a *single* SQL command, it's a single transaction in any case. Executing it inside a bigger transaction would only *increase* the time window for possible race conditions. As for SQL Server: data-modifying CTEs are not supported at all (only `SELECT` inside a `WITH` clause). Source: MS documentation.

    You can also do this with `INSERT ... RETURNING \gset` in `psql` then use the returned values as psql `:'variables'`, but this only works for single row inserts.
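As a sketch of that psql technique (table and value names borrowed from the examples above; works for single-row inserts only):

```sql
-- psql only: \gset replaces the terminating semicolon and stores
-- the returned column into the psql variable :id
INSERT INTO foo (type) VALUES ('purple') RETURNING id \gset
INSERT INTO bar (description, foo_id) VALUES ('psql-inserted row', :id);
```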

    @ErwinBrandstetter this is great, but I'm too new to sql to understand it all, could you add some comments to "INSERT missing FK rows at the same time" explaining how it works? also, thanks for the SQLFiddle working examples!

    @glallen: I added a step-by-step explanation. There are also many links to related answers and the manual with more explanation. You *need* to understand what the query does or you may be in over your head.

    Bravo on the single statement. I understand that MySQL won't support CTEs until 8.0, would this single statement be feasible in MySQL without CTEs?

    @Jon: I don't think so. But please ask your new question as *question*. You can always link to this one for context.

    Brilliant explanation and examples!

  • Lookup. You basically need the foo id's to insert them into bar.

    This is not Postgres-specific, btw (and you did not tag it as such) - it is generally how SQL works. No shortcuts here.

    Application wise, though, you may have a cache of foo items in memory. My tables often have up to 3 unique fields:

    • Id (integer or something) that is the table level primary key.
    • Identifier, which is a GUID that is used as stable ID application level wise (and may be exposed to the customer in URL's etc.)
    • Code - a string that may be there and has to be unique if it is there (sql server: filtered unique index on not null). That is a customer set identifier.


    • Account (in a trading application):
      • Id is an int used for foreign keys.
      • Identifier is a GUID used in the web portals etc. - always accepted.
      • Code is manually set. Rule: once set, it does not change.

    Obviously, when you want to link something to an account, you first must, technically, get the Id - but given that both Identifier and Code never change once they are there, a positive cache in memory can stop most lookups from hitting the database.

    You are aware that you can let the RDBMS do the lookup for you, in a single SQL statement, avoiding error-prone cache?

    You are aware that looking up non-changing elements is not error-prone? Also, typically, the RDBMS is the least scalable and most expensive element in the game, due to licensing costs. Taking as much load off it as possible is not exactly bad. Also, not many ORMs support that to start with.

    Non-changing elements? Most expensive element? Licensing costs (for PostgreSQL)? ORMs defining what's sane? No I wasn't aware of all of that.

License under CC-BY-SA with attribution

Content dated before 6/26/2020 9:53 AM